Pseudo-likelihood is a proxy for the actual likelihood term. For a probability distribution

$$p(x) = \frac{1}{Z} \tilde{p}(x)$$

with unnormalized probability $\tilde{p}$, if the partition function $Z$ is intractable, we can optimize the pseudo-likelihood instead.

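This objective is motivated by the observation that a ratio of probabilities cancels out the partition function:

$$\frac{p(x)}{p(y)} = \frac{\frac{1}{Z} \tilde{p}(x)}{\frac{1}{Z} \tilde{p}(y)} = \frac{\tilde{p}(x)}{\tilde{p}(y)}$$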

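We can use this idea to compute conditional probabilities without the partition function, because a conditional probability is itself a ratio of probabilities. Since we can break up the log likelihood via the chain rule into

$$\log p(x) = \log p(x_1) + \log p(x_2 \mid x_1) + \dots + \log p(x_n \mid x_{1:n-1})$$

we can calculate every term except the first, $\log p(x_1)$, in this manner.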
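
As a minimal sketch of the cancellation, consider the last chain-rule term for a small binary model. The model `p_tilde`, its weights `W` and `b`, and both function names are hypothetical choices made for illustration; they are not part of the notes above. The first function uses only unnormalized values, and the second computes the same quantity through $Z$ by brute-force enumeration for comparison.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n = 4
W = rng.normal(size=(n, n))  # hypothetical pairwise weights
b = rng.normal(size=n)       # hypothetical biases

def p_tilde(x):
    """Unnormalized probability of a binary vector x (illustrative model)."""
    x = np.asarray(x, dtype=float)
    return np.exp(x @ W @ x + b @ x)

def conditional_last(x):
    """p(x_n | x_1, ..., x_{n-1}) as a ratio of unnormalized values; Z never appears."""
    x = np.asarray(x)
    other = x.copy()
    other[-1] = 1 - x[-1]  # the only alternative value of the last variable
    return p_tilde(x) / (p_tilde(x) + p_tilde(other))

def conditional_last_with_Z(x):
    """The same conditional computed the expensive way, through Z."""
    x = np.asarray(x)
    states = [np.array(s) for s in itertools.product([0, 1], repeat=n)]
    Z = sum(p_tilde(s) for s in states)
    joint = p_tilde(x) / Z
    # Marginal of the first n-1 variables: sum out the last one.
    marginal = sum(
        p_tilde(s) for s in states if np.array_equal(s[:-1], x[:-1])
    ) / Z
    return joint / marginal

x = np.array([1, 0, 1, 1])
print(conditional_last(x), conditional_last_with_Z(x))
```

The two printed values should agree up to floating-point error, while the first function needs 2 evaluations of `p_tilde` and the second needs all $2^n$.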

However, the challenge is that the first few terms require marginalizing over large sets of variables: any variable that is not conditioned on needs to be marginalized out.
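
To make the cost concrete, here is the second chain-rule term written out, assuming $n$ binary variables and writing $\tilde{p}$ for the unnormalized distribution (both are assumptions made for this illustration):

$$p(x_2 \mid x_1) = \frac{p(x_1, x_2)}{p(x_1)} = \frac{\sum_{x_3, \dots, x_n} \tilde{p}(x_1, x_2, x_3, \dots, x_n)}{\sum_{x_2', x_3, \dots, x_n} \tilde{p}(x_1, x_2', x_3, \dots, x_n)}$$

The partition function still cancels, but the numerator has $2^{n-2}$ terms and the denominator has $2^{n-1}$. By contrast, the last term $p(x_n \mid x_{1:n-1})$ needs only two evaluations of $\tilde{p}$.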

Thus, instead of computing the exact likelihood via the chain rule, we can compute the log pseudo-likelihood, defined as

$$\sum_{i=1}^{n} \log p(x_i \mid x_{-i})$$

where $x_{-i}$ denotes all variables except $x_i$.
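
A minimal sketch of this objective for binary variables follows. It assumes the model is given as a function `log_p_tilde` returning the unnormalized log-probability; the function names and the example models are hypothetical. Each conditional compares the observed configuration with the single configuration in which variable $i$ is flipped, so the cost is $n + 1$ evaluations rather than $2^n$.

```python
import itertools
import numpy as np

def log_pseudo_likelihood(x, log_p_tilde):
    """Sum over i of log p(x_i | x_{-i}) for a binary vector x."""
    x = np.asarray(x)
    log_a = log_p_tilde(x)  # unnormalized log-probability of the observed x
    total = 0.0
    for i in range(len(x)):
        flipped = x.copy()
        flipped[i] = 1 - x[i]
        log_b = log_p_tilde(flipped)
        # log p(x_i | x_{-i}) = log_a - log(exp(log_a) + exp(log_b))
        total += log_a - np.logaddexp(log_a, log_b)
    return total

def exact_log_likelihood(x, log_p_tilde):
    """Brute-force log p(x); only feasible for tiny n, used here as a reference."""
    x = np.asarray(x)
    states = itertools.product([0, 1], repeat=len(x))
    log_Z = np.logaddexp.reduce([log_p_tilde(np.array(s)) for s in states])
    return log_p_tilde(x) - log_Z

# Hypothetical example: an independent model, where the two objectives coincide.
b = np.array([0.3, -1.2, 0.7])
independent = lambda x: float(b @ x)
x = np.array([1, 0, 1])
print(log_pseudo_likelihood(x, independent), exact_log_likelihood(x, independent))
```

When the variables are independent, $p(x_i \mid x_{-i}) = p(x_i)$, so the two printed values should match. With couplings between variables they generally differ, which is why the pseudo-likelihood is a proxy rather than the likelihood itself.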

Maximizing this objective is asymptotically consistent with maximizing the actual log likelihood. In the context of Probabilistic Graphical Models, due to conditional independences, conditioning on $x_{-i}$ (all variables except $x_i$) is equivalent to conditioning on the neighborhood of $x_i$, denoted $N(x_i)$, so another form of the pseudo-likelihood is

$$\sum_{i=1}^{n} \log p(x_i \mid N(x_i))$$
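
As an illustration of the neighborhood form, here is a sketch for a pairwise binary model on a chain graph. The model is an assumption made for this example: $\log \tilde{p}(x) = \tfrac{1}{2} x^\top W x + b^\top x$ with symmetric `W`, zero diagonal, and nonzero entries only between neighbors. Under that assumption each conditional is a sigmoid of a sum over neighbors only. All function names are hypothetical.

```python
import numpy as np

def chain_weights(n, rng):
    """Symmetric weights for a chain 0 - 1 - ... - (n-1); zero elsewhere."""
    W = np.zeros((n, n))
    for i in range(n - 1):
        W[i, i + 1] = W[i + 1, i] = rng.normal()
    return W

def log_p_tilde(x, W, b):
    """Unnormalized log-probability of the assumed pairwise model."""
    return 0.5 * x @ W @ x + b @ x

def conditional_log_probs(x, W, b):
    """Vector of log p(x_i | N(x_i)) for every i.

    Since W[i, j] is zero unless j neighbors i, the logit for variable i
    depends only on its neighbors: logit_i = b_i + sum_j W[i, j] * x_j.
    """
    logits = b + W @ x
    # log sigmoid(logit) if x_i = 1, log(1 - sigmoid(logit)) if x_i = 0
    return x * logits - np.logaddexp(0.0, logits)

def log_pseudo_likelihood(x, W, b):
    return conditional_log_probs(x, W, b).sum()

rng = np.random.default_rng(0)
W = chain_weights(5, rng)
b = rng.normal(size=5)
x = np.array([1.0, 0.0, 1.0, 1.0, 0.0])
print(log_pseudo_likelihood(x, W, b))
```

Flipping a variable that is not a neighbor of $x_i$ leaves $\log p(x_i \mid N(x_i))$ unchanged, which is the conditional independence the text relies on.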

Generalized Pseudo-Likelihood

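Writing the above in terms of sets of variable indices $S^{(1)}, \dots, S^{(m)}$, we have the generalized pseudo-likelihood

$$\sum_{i=1}^{m} \log p(x_{S^{(i)}} \mid x_{-S^{(i)}})$$

where $m = 1$ and $S^{(1)} = \{1, \dots, n\}$ gives the normal log likelihood, and $m = n$ and $S^{(i)} = \{i\}$ gives the pseudo-likelihood.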
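
The two special cases can be checked numerically with a small sketch. It assumes binary variables and a model supplied as an unnormalized log-probability function; `generalized_log_pl`, the random weights, and the choice of blocks are all hypothetical. Each block of size $k$ costs $2^k$ evaluations, so the block size trades accuracy of the proxy against computation.

```python
import itertools
import numpy as np

def generalized_log_pl(x, log_p_tilde, blocks):
    """Sum over blocks S of log p(x_S | x_{-S}) for a binary vector x."""
    x = np.asarray(x)
    log_a = log_p_tilde(x)
    total = 0.0
    for S in blocks:
        S = list(S)
        # Enumerate every joint setting of the block, holding the rest fixed.
        alternatives = []
        for vals in itertools.product([0, 1], repeat=len(S)):
            y = x.copy()
            y[S] = vals
            alternatives.append(log_p_tilde(y))
        total += log_a - np.logaddexp.reduce(alternatives)
    return total

# Hypothetical pairwise model used only for this demonstration.
rng = np.random.default_rng(0)
n = 4
W = rng.normal(size=(n, n))
b = rng.normal(size=n)
log_p_tilde = lambda v: float(v @ W @ v + b @ v)

x = np.array([1, 0, 1, 1])
full = generalized_log_pl(x, log_p_tilde, [range(n)])               # m = 1
pseudo = generalized_log_pl(x, log_p_tilde, [[i] for i in range(n)])  # m = n
pairs = generalized_log_pl(x, log_p_tilde, [[0, 1], [2, 3]])          # in between
print(full, pseudo, pairs)
```

With a single block containing every index, the enumeration inside the loop is exactly the sum defining $Z$, so `full` is the exact log likelihood. With singleton blocks, `pseudo` is the ordinary log pseudo-likelihood.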