Belief propagation is an inference algorithm for computing marginals of a Markov Random Field.
Cluster Graph
First, we need to organize our random variables into a cluster graph: an undirected graph whose nodes are clusters $C_i$ of random variables and whose edges are labeled with sepsets $S_{i,j} \subseteq C_i \cap C_j$.
This graph needs to satisfy two conditions:
- Family preservation: for each factor $\phi$, there exists a cluster $C_i$ such that $\mathrm{Scope}[\phi] \subseteq C_i$.
- Running intersection: for each pair of clusters $C_i, C_j$ and variable $X \in C_i \cap C_j$, there exists a unique path between $C_i$ and $C_j$ for which all clusters and sepsets along the path contain $X$. Equivalently, for any $X$, the set of clusters and sepsets containing $X$ forms a tree.
Note that an MRF may already satisfy these properties; specifically, a tree MRF is a perfect candidate for the cluster graph, with each cluster being simply a node and each edge's sepset containing its endpoints.
Bethe Cluster Graph
Alternatively, for other more complicated probabilistic graphs, a simple cluster graph called the Bethe cluster graph is commonly used. There are two types of clusters: for each factor $\phi_k$, a "big" cluster $C_k = \mathrm{Scope}[\phi_k]$, and for each variable $X_i$, a "small" singleton cluster $\{X_i\}$. We connect $C_k$ and $\{X_i\}$ with an edge (whose sepset is $\{X_i\}$) whenever $X_i \in \mathrm{Scope}[\phi_k]$.
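The Bethe construction above is mechanical, so it is easy to sketch in code. The following is a minimal illustration (factor scopes as tuples of hypothetical variable names, not any particular library's API):

```python
def bethe_cluster_graph(factors):
    """Build a Bethe cluster graph: one 'big' cluster per factor scope,
    one 'small' singleton cluster per variable, and an edge (with sepset
    {X}) between a big cluster and X's singleton whenever X is in the
    factor's scope."""
    big = [tuple(scope) for scope, _ in factors]                 # one per factor
    small = sorted({x for scope, _ in factors for x in scope})   # one per variable
    edges = [(i, x) for i, scope in enumerate(big) for x in scope]
    return big, small, edges

# Hypothetical chain MRF A - B - C with factors on {A, B} and {B, C}.
factors = [(("A", "B"), None), (("B", "C"), None)]
big, small, edges = bethe_cluster_graph(factors)
# big clusters: (A, B) and (B, C); small clusters: A, B, C;
# B's singleton is connected to both big clusters.
```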
Assigning Factors
After constructing the cluster graph, we assign each factor $\phi_k$ to a cluster $C_{\alpha(k)}$ satisfying $\mathrm{Scope}[\phi_k] \subseteq C_{\alpha(k)}$, and define the initial potentials of a cluster as the product of its assigned factors,
$$\psi_i(C_i) = \prod_{k \,:\, \alpha(k) = i} \phi_k.$$
This completes the initialization part of the algorithm, and we now "mix" these potentials together via message passing.
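A small sketch of the initial-potential computation, assuming binary variables and factors represented as dicts from assignment tuples to values (the names `phi_ab`, `phi_a`, etc. are illustrative):

```python
from itertools import product

def initial_potential(cluster, assigned_factors):
    """psi_i(C_i): multiply together all factors assigned to cluster i.
    cluster: tuple of variable names; assigned_factors: list of
    (scope, table) pairs, where table maps scope assignments to values."""
    table = {}
    for assignment in product([0, 1], repeat=len(cluster)):
        env = dict(zip(cluster, assignment))
        val = 1.0
        for scope, phi in assigned_factors:
            val *= phi[tuple(env[v] for v in scope)]
        table[assignment] = val
    return table

# Cluster (A, B) assigned a pairwise factor and a unary factor on A.
phi_ab = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 4.0}
phi_a = {(0,): 0.5, (1,): 2.0}
psi = initial_potential(("A", "B"), [(("A", "B"), phi_ab), (("A",), phi_a)])
# e.g. psi[(1, 1)] = 4.0 * 2.0 = 8.0
```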
Message Passing
With a cluster graph, we now pass messages between clusters. The goal of a message from cluster $C_i$ to cluster $C_j$ is to summarize everything $C_i$ believes about the variables in the sepset $S_{i,j}$.
Let $\delta_{i \to j}$ denote this message, a factor over $S_{i,j}$, and let $N_i$ denote the neighbors of cluster $C_i$.
To find marginal probabilities, we use sum-product message passing. For MAP inference (finding the most likely assignment), we use max-sum.
Sum-Product
In sum-product, we marginalize over the product of our desired quantities. This is somewhat analogous to Variable Elimination, where we combined factors via multiplication and then marginalized over a random variable.
Formally, a message from cluster $C_i$ to cluster $C_j$ is
$$\delta_{i \to j}(S_{i,j}) = \sum_{C_i \setminus S_{i,j}} \psi_i \prod_{k \in N_i \setminus \{j\}} \delta_{k \to i}.$$
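The message equation can be sketched directly, under the same assumptions as before (binary variables, factors and messages as dicts from assignment tuples to values):

```python
from itertools import product

def sum_product_message(cluster, psi, incoming, sepset):
    """delta_{i->j}(S) = sum over C_i \\ S of psi_i times the product of
    incoming messages from all neighbors except the target j.
    incoming: list of (scope, table) messages already excluding j."""
    msg = {}
    for assignment in product([0, 1], repeat=len(cluster)):
        env = dict(zip(cluster, assignment))
        val = psi[assignment]
        for scope, table in incoming:
            val *= table[tuple(env[v] for v in scope)]
        key = tuple(env[v] for v in sepset)          # project onto the sepset
        msg[key] = msg.get(key, 0.0) + val           # ...and sum out the rest
    return msg

# Leaf cluster (A, B) with no other neighbors, sending a message over {B}.
phi_ab = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 4.0}
m = sum_product_message(("A", "B"), phi_ab, [], ("B",))
# m[(0,)] = 1 + 3 = 4, m[(1,)] = 2 + 4 = 6
```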
Max-Sum
Observe that to find the max-marginals, we can simply replace the summation with maximization. However, multiplying many small probabilities introduces numerical instability, so we can maximize the log instead, giving us the max-sum message
$$\delta_{i \to j}(S_{i,j}) = \max_{C_i \setminus S_{i,j}} \Big[ \log \psi_i + \sum_{k \in N_i \setminus \{j\}} \delta_{k \to i} \Big].$$
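The max-sum variant is a small change to the sum-product sketch: work with log-potentials, add instead of multiply, and take a max instead of a sum (same illustrative representation as above):

```python
import math
from itertools import product

def max_sum_message(cluster, log_psi, incoming, sepset):
    """Max-sum message: log-space potentials, incoming messages are added,
    and the non-sepset variables are maximized out rather than summed."""
    msg = {}
    for assignment in product([0, 1], repeat=len(cluster)):
        env = dict(zip(cluster, assignment))
        val = log_psi[assignment]
        for scope, table in incoming:
            val += table[tuple(env[v] for v in scope)]
        key = tuple(env[v] for v in sepset)
        msg[key] = max(msg.get(key, -math.inf), val)  # max out C_i \ S
    return msg

# Same leaf cluster (A, B) as before, but in log space.
log_phi = {(0, 0): 0.0, (0, 1): math.log(2),
           (1, 0): math.log(3), (1, 1): math.log(4)}
m = max_sum_message(("A", "B"), log_phi, [], ("B",))
# m[(0,)] = max(log 1, log 3) = log 3; m[(1,)] = max(log 2, log 4) = log 4
```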
Algorithm
The belief propagation algorithm initializes all messages to $1$ (or $0$ in log space for max-sum), then repeatedly passes messages along the edges until the messages converge. The final belief of each cluster is its initial potential combined with all incoming messages,
$$\beta_i(C_i) = \psi_i \prod_{k \in N_i} \delta_{k \to i}.$$
With sum-product, our beliefs are proportional to the marginal probabilities over their scopes,
so to answer a query for $P(X)$, we pick any cluster containing $X$ and sum its belief over all other variables,
which we then normalize by dividing by the partition function $Z$.
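Extracting a marginal from a converged belief is a one-function sketch (belief tables represented as in the earlier snippets; the belief values here are illustrative):

```python
def marginal_from_belief(cluster, belief, query_var):
    """P(X): sum the cluster's belief over all variables other than X,
    then normalize by the partition function Z (the total mass)."""
    idx = cluster.index(query_var)
    totals = {0: 0.0, 1: 0.0}
    for assignment, val in belief.items():
        totals[assignment[idx]] += val
    Z = sum(totals.values())
    return {x: v / Z for x, v in totals.items()}

# Converged belief over cluster (A, B); query the marginal of B.
belief_ab = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 4.0}
p_b = marginal_from_belief(("A", "B"), belief_ab, "B")
# P(B=0) = (1+3)/10 = 0.4, P(B=1) = (2+4)/10 = 0.6
```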
With max-sum, our beliefs are the max-marginals, from which we can read off the most likely assignment.
Note that we can answer any query by caching the results of our messages to avoid computing everything for each query.
Unfortunately, cluster graphs with loops may not lead to accurate inference. The solution to this is to use Junction Trees, which guarantee that message passing converges to the true marginals, thus making belief propagation an exact inference algorithm.