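Pseudo-likelihood is a proxy for the actual likelihood term. For a probability distribution $p(x) = \tilde{p}(x) / Z$ with unnormalized density $\tilde{p}$, if our partition function $Z$ is intractable, the likelihood cannot be evaluated directly, and pseudo-likelihood provides an objective that never requires $Z$.
This objective is motivated by the observation that ratios of probabilities cancel out the partition function,
$$\frac{p(x)}{p(x')} = \frac{\tilde{p}(x)/Z}{\tilde{p}(x')/Z} = \frac{\tilde{p}(x)}{\tilde{p}(x')}.$$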
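We can use this idea to compute conditional probabilities without the partition function, because a conditional probability is itself a ratio of probabilities. Since we can break up the log likelihood via the chain rule into
$$\log p(x) = \log p(x_1) + \log p(x_2 \mid x_1) + \dots + \log p(x_n \mid x_{1:n-1}),$$
we can calculate all values except the first in this manner.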
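As a rough illustration of the cancellation, the sketch below builds a tiny model where the partition function can still be brute-forced and checks that ratios and conditionals agree with and without it. The function `p_tilde` and its coefficients are made up for this example and are not part of the notes.

```python
import itertools
import math

# Hypothetical unnormalized model over three binary variables (illustration only).
def p_tilde(x):
    x1, x2, x3 = x
    return math.exp(1.5 * x1 * x2 - 0.7 * x2 * x3 + 0.3 * x1)

# Brute-force partition function: feasible only because there are 2^3 states.
states = list(itertools.product([0, 1], repeat=3))
Z = sum(p_tilde(s) for s in states)

def p(x):
    return p_tilde(x) / Z

# Ratio of two probabilities: Z cancels, so the unnormalized ratio is identical.
a, b = (1, 1, 0), (0, 1, 1)
ratio_normalized = p(a) / p(b)
ratio_unnormalized = p_tilde(a) / p_tilde(b)

# Conditional p(x3 = 1 | x1 = 1, x2 = 1): the denominator sums over x3 only.
cond_unnormalized = p_tilde((1, 1, 1)) / sum(p_tilde((1, 1, v)) for v in (0, 1))
cond_normalized = p((1, 1, 1)) / sum(p((1, 1, v)) for v in (0, 1))

print(ratio_normalized, ratio_unnormalized)
print(cond_normalized, cond_unnormalized)
```

The conditional here is cheap because only one variable is left unconditioned; conditioning on fewer variables would enlarge the sum in the denominator.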
However, the challenge is that in the first few terms, we would need to marginalize the denominator over large sets of variables: any variable that is not conditioned on needs to be marginalized over.
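Thus, instead of computing the exact likelihood via the equation above, we can compute the log pseudo-likelihood, defined as
$$\sum_{i=1}^{n} \log p(x_i \mid x_{-i}),$$
where $x_{-i}$ denotes all variables other than $x_i$. Each term normalizes over the values of a single variable only.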
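A minimal sketch of the log pseudo-likelihood for a small pairwise model, assuming spins in {-1, +1}. The weights in `W` and the names `score`, `log_pseudo_likelihood`, and `log_likelihood` are hypothetical, chosen only for illustration. The exact log likelihood is included for comparison and is tractable here only because the model has four variables.

```python
import itertools
import math

# Hypothetical pairwise model on 4 spins in {-1, +1}; weights are made up.
W = {(0, 1): 0.8, (1, 2): -0.5, (2, 3): 0.3, (0, 3): 0.6}
N = 4

def score(x):
    """Unnormalized log-probability, log p~(x)."""
    return sum(w * x[i] * x[j] for (i, j), w in W.items())

def log_pseudo_likelihood(x):
    """Sum over i of log p(x_i | x_{-i}); never touches the partition function."""
    total = 0.0
    for i in range(N):
        # Normalize over the two values of x_i only; all other variables stay fixed.
        alternatives = []
        for v in (-1, 1):
            y = list(x)
            y[i] = v
            alternatives.append(score(y))
        log_norm = math.log(sum(math.exp(s) for s in alternatives))
        total += score(x) - log_norm
    return total

def log_likelihood(x):
    """Exact log p(x); requires a sum over all 2^N states."""
    all_states = itertools.product((-1, 1), repeat=N)
    log_Z = math.log(sum(math.exp(score(s)) for s in all_states))
    return score(x) - log_Z

x = (1, -1, -1, 1)
print(log_pseudo_likelihood(x), log_likelihood(x))
```

The pseudo-likelihood costs N local normalizations of two terms each, while the exact likelihood needs 2^N terms; the two values generally differ for a single sample.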
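Maximizing this objective is asymptotically consistent with maximizing the actual log likelihood: in the limit of infinite data, and provided the model family contains the true distribution, both recover the same parameters. In the context of Probabilistic Graphical Models, due to conditional independences, conditioning on all other variables $x_{-i}$ is equivalent to conditioning on the Markov blanket of $x_i$, so each term involves only the neighbors of $x_i$.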
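To make the graphical-model point concrete, the following sketch compares two ways of computing a single conditional in a chain-structured pairwise model: one uses the full unnormalized score, the other uses only the edges touching the variable. The chain, its weights, and the function names are assumptions introduced for this example.

```python
import math

# Hypothetical chain-structured pairwise model: 0 - 1 - 2 - 3 - 4, spins in {-1, +1}.
W = {(0, 1): 0.9, (1, 2): -0.4, (2, 3): 0.7, (3, 4): 0.2}

def score(x):
    """Unnormalized log-probability, log p~(x)."""
    return sum(w * x[i] * x[j] for (i, j), w in W.items())

def conditional_full(x, i):
    """p(x_i | x_{-i}) computed from the full unnormalized score."""
    weights = {}
    for v in (-1, 1):
        y = list(x)
        y[i] = v
        weights[v] = math.exp(score(y))
    return weights[x[i]] / (weights[-1] + weights[1])

def conditional_blanket(x, i):
    """Same quantity using only the edges that touch i (its neighbors)."""
    field = sum(
        w * x[b if a == i else a]
        for (a, b), w in W.items()
        if i in (a, b)
    )
    # For +-1 spins the local conditional is a logistic function of the field.
    return 1.0 / (1.0 + math.exp(-2.0 * x[i] * field))

x = (1, 1, -1, 1, 1)
print(conditional_full(x, 2), conditional_blanket(x, 2))

# Flipping variables outside the blanket of x_2 (here x_0 and x_4) changes nothing.
x_far = (-1, 1, -1, 1, -1)
print(conditional_full(x_far, 2))
```

Terms that do not involve the variable cancel between numerator and denominator, which is why the neighbor-only computation gives the same value.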
Generalized Pseudo-Likelihood
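Writing the above in terms of sets of variables gives the generalized log pseudo-likelihood
$$\sum_{i=1}^{m} \log p(x_{S^{(i)}} \mid x_{-S^{(i)}}),$$
where each $S^{(i)}$ is a set of variable indices and $x_{-S^{(i)}}$ denotes the variables outside that set. Choosing $m = n$ with $S^{(i)} = \{i\}$ recovers the pseudo-likelihood, while $m = 1$ with $S^{(1)} = \{1, \dots, n\}$ recovers the exact log likelihood.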
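A small sketch of the generalized objective, assuming the same kind of pairwise model over spins in {-1, +1}. The weights, the block choices, and the function names are hypothetical. Each block is normalized jointly over all of its variables, so the cost of a block of size k is 2^k terms.

```python
import itertools
import math

# Hypothetical pairwise model on 4 spins in {-1, +1}; weights are made up.
W = {(0, 1): 0.8, (1, 2): -0.5, (2, 3): 0.3, (0, 3): 0.6}
N = 4

def score(x):
    """Unnormalized log-probability, log p~(x)."""
    return sum(w * x[i] * x[j] for (i, j), w in W.items())

def log_conditional_block(x, block):
    """log p(x_S | x_{-S}): normalize over all joint values of the variables in S."""
    block = list(block)
    log_terms = []
    for values in itertools.product((-1, 1), repeat=len(block)):
        y = list(x)
        for idx, v in zip(block, values):
            y[idx] = v
        log_terms.append(score(y))
    return score(x) - math.log(sum(math.exp(t) for t in log_terms))

def generalized_log_pl(x, blocks):
    """Sum of block conditionals over the given index sets."""
    return sum(log_conditional_block(x, S) for S in blocks)

x = (1, -1, -1, 1)
pl = generalized_log_pl(x, [[0], [1], [2], [3]])      # m = n: ordinary pseudo-likelihood
ll = generalized_log_pl(x, [[0, 1, 2, 3]])            # m = 1: exact log likelihood
blocked = generalized_log_pl(x, [[0, 1], [2, 3]])     # intermediate choice of blocks
print(pl, blocked, ll)
```

Larger blocks move the objective closer to the exact likelihood at the price of larger local normalizations, so the block size acts as a cost and accuracy dial.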