Belief propagation is an inference algorithm to calculate marginals of a 🗳️ Markov Random Field.

Cluster Graph

First, we need to organize our random variables into a cluster graph, which is an undirected graph with nodes representing clusters $C_i$ (sets of random variables) and edges associated with a sepset (separation set) $S_{ij} \subseteq C_i \cap C_j$.

This graph needs to satisfy two conditions:

  1. Family preservation: for each Factor $\phi_k$, there exists a cluster $C_i$ such that $\mathrm{Scope}[\phi_k] \subseteq C_i$.
  2. Running intersection: for each pair of clusters $C_i$, $C_j$ and variable $X \in C_i \cap C_j$, there exists a unique path between $C_i$ and $C_j$ for which all clusters and sepsets along the path contain $X$. Equivalently, for any variable $X$, the set of clusters and sepsets containing $X$ forms a tree.

Note that an MRF may already satisfy these properties; specifically, a tree MRF is a perfect candidate for the cluster graph, with each cluster being an edge of the tree (containing its two endpoints) and each sepset being the variable shared by adjacent clusters. For example, the chain $A - B - C$ yields clusters $\{A, B\}$ and $\{B, C\}$ joined by the sepset $\{B\}$.
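To make this concrete, here is a minimal sketch in plain Python (the variable names are my own) of the chain example above, with sanity checks for family preservation and for each sepset lying inside both of its endpoint clusters:

```python
# Chain MRF A - B - C with pairwise factors, used directly as a cluster
# graph: one cluster per edge, sepsets given by the shared variable.
clusters = [frozenset("AB"), frozenset("BC")]
sepsets = {(0, 1): frozenset("B")}  # edge between clusters 0 and 1

factor_scopes = [frozenset("AB"), frozenset("BC")]

# Family preservation: every factor's scope fits inside some cluster.
assert all(any(scope <= c for c in clusters) for scope in factor_scopes)

# A sepset must be contained in both of its endpoint clusters.
assert all(s <= clusters[i] and s <= clusters[j]
           for (i, j), s in sepsets.items())
```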

Bethe Cluster Graph

Alternatively, for other more complicated probabilistic graphs, a simple cluster graph called the Bethe cluster graph is commonly used. There are two types of clusters: for each factor $\phi_k$, we create a factor cluster $C_k = \mathrm{Scope}[\phi_k]$, and for each random variable $X_i$, we create a singleton cluster $\{X_i\}$. To connect them, we add the edge $C_k - \{X_i\}$ with sepset $\{X_i\}$ whenever $X_i \in C_k$, for all $k$ and $i$.
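As a rough sketch of this construction (the helper name `bethe_cluster_graph` is hypothetical, and factors are assumed to be given by their scopes):

```python
def bethe_cluster_graph(factor_scopes):
    """Build the two-layer Bethe cluster graph from factor scopes.

    Returns (clusters, edges): factor clusters first, then singleton
    clusters, with an edge C_k -- {X_i} (sepset {X_i}) whenever X_i
    is in Scope[phi_k].
    """
    variables = sorted(set().union(*factor_scopes))
    factor_clusters = [frozenset(s) for s in factor_scopes]
    singletons = [frozenset([x]) for x in variables]
    clusters = factor_clusters + singletons

    edges = []  # (cluster index, cluster index, sepset)
    for k, ck in enumerate(factor_clusters):
        for i, x in enumerate(variables):
            if x in ck:
                edges.append((k, len(factor_clusters) + i, frozenset([x])))
    return clusters, edges

# Example: a loopy MRF A-B, B-C, A-C yields a valid (loopy) cluster graph.
clusters, edges = bethe_cluster_graph([{"A", "B"}, {"B", "C"}, {"A", "C"}])
```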

Assigning Factors

After constructing the cluster graph, we assign each factor $\phi_k$ to a cluster $C_{\alpha(k)}$ that contains the factor's scope,

$$\mathrm{Scope}[\phi_k] \subseteq C_{\alpha(k)},$$

and define the initial potential of a cluster as the product of its assigned factors,

$$\psi_i(C_i) = \prod_{k : \alpha(k) = i} \phi_k.$$

This completes the initialization part of the algorithm, and we now "mix" these potentials together via message passing.
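Here is a minimal sketch of this initialization, assuming binary variables and factors stored as `(scope, table)` pairs; `initial_potential` is my own naming, not a library function. Each factor must be assigned to exactly one cluster:

```python
from itertools import product

def initial_potential(cluster_vars, assigned_factors, domain=(0, 1)):
    """psi_i(C_i): product of the cluster's assigned factors, as a
    table over all joint assignments of the cluster's variables."""
    psi = {}
    for assignment in product(domain, repeat=len(cluster_vars)):
        values = dict(zip(cluster_vars, assignment))
        p = 1.0
        for scope, table in assigned_factors:
            p *= table[tuple(values[v] for v in scope)]
        psi[assignment] = p
    return psi

# Hypothetical pairwise factor phi(A, B) that favors agreement.
phi_AB = (("A", "B"), {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0})
psi = initial_potential(("A", "B"), [phi_AB])  # here psi equals phi_AB's table
```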

Message Passing

With a cluster graph, we now pass messages between clusters. The goal of a message from $C_i$ to $C_j$ is to relay information from all the other neighbors of $C_i$ to $C_j$.

Let $N_i$ denote the neighbors of cluster $C_i$. The information we want to send involves the potential $\psi_i$ of $C_i$ and the messages $\delta_{k \to i}$ sent to $C_i$ from its other neighbors, for $k \in N_i \setminus \{j\}$. However, since $C_j$ may not include some random variables in $C_i$, we marginalize over the unneeded ones, $C_i \setminus S_{ij}$.

To find marginal probabilities, we use sum-product message passing. For MAP inference (finding the most likely assignment), we use max-sum.

Sum-Product

In sum-product, we marginalize over the product of our desired quantities. This is somewhat analogous to 🔫 Variable Elimination, where we combined factors via multiplication and then marginalized over a random variable.

Formally, a message from cluster $C_i$ to cluster $C_j$ is defined as

$$\delta_{i \to j}(S_{ij}) = \sum_{C_i \setminus S_{ij}} \psi_i \prod_{k \in N_i \setminus \{j\}} \delta_{k \to i}.$$
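Continuing the table-based convention from the sketches above, the message computation might look like this (a hypothetical helper; a real implementation would also normalize messages for numerical stability):

```python
from itertools import product

def sum_product_message(ci_vars, sij_vars, psi, incoming, domain=(0, 1)):
    """delta_{i -> j}: multiply psi_i with every incoming message except
    the one from C_j, then sum out the variables of C_i not in S_ij.

    psi: table over assignments of ci_vars; incoming: list of
    (scope, table) messages from the neighbors k in N_i other than j.
    """
    keep = [ci_vars.index(v) for v in sij_vars]
    message = {}
    for assignment in product(domain, repeat=len(ci_vars)):
        values = dict(zip(ci_vars, assignment))
        p = psi[assignment]
        for scope, table in incoming:
            p *= table[tuple(values[v] for v in scope)]
        key = tuple(assignment[k] for k in keep)
        message[key] = message.get(key, 0.0) + p
    return message
```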

Max-Sum

Observe that to find the max, we can simply replace the summation with maximization. However, multiplying many small values this way introduces numerical instability, so we can work with the log of the potentials and messages instead, turning the product into a sum and giving us the max-sum message

$$\delta_{i \to j}(S_{ij}) = \max_{C_i \setminus S_{ij}} \Big[ \log \psi_i + \sum_{k \in N_i \setminus \{j\}} \delta_{k \to i} \Big].$$
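The max-sum message is the same computation with log-space addition replacing multiplication and `max` replacing the sum (again a hypothetical helper under the same table conventions, now with `log_psi` and incoming messages already in log space):

```python
import math
from itertools import product

def max_sum_message(ci_vars, sij_vars, log_psi, incoming, domain=(0, 1)):
    """Max-sum variant: add log psi_i and the incoming log-messages,
    then maximize over the variables of C_i not in S_ij."""
    keep = [ci_vars.index(v) for v in sij_vars]
    message = {}
    for assignment in product(domain, repeat=len(ci_vars)):
        values = dict(zip(ci_vars, assignment))
        s = log_psi[assignment]
        for scope, table in incoming:
            s += table[tuple(values[v] for v in scope)]
        key = tuple(assignment[k] for k in keep)
        message[key] = max(message.get(key, -math.inf), s)
    return message
```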

Algorithm

The belief propagation algorithm initializes all messages to $1$ (or $0$ in log space for max-sum) and repeats the message passing operation again and again on all edges. Upon convergence, for each cluster $C_i$, we have the belief

$$\beta_i(C_i) = \psi_i \prod_{k \in N_i} \delta_{k \to i}.$$

With sum-product, our beliefs are proportional to the marginal probabilities over their scopes,

$$\beta_i(C_i) \propto \sum_{\mathcal{X} \setminus C_i} \tilde{P}(\mathcal{X}),$$

where $\tilde{P}(\mathcal{X})$ is the unnormalized joint distribution (the product of all factors). So to answer a query for $P(X)$, we find a cluster $C_i$ for which $X \in C_i$ and get

$$\tilde{P}(X) = \sum_{C_i \setminus \{X\}} \beta_i(C_i),$$

which we need to normalize by dividing by the partition function $Z = \sum_X \tilde{P}(X)$.

With max-sum, our beliefs are the max-marginals

$$\beta_i(C_i) = \max_{\mathcal{X} \setminus C_i} \log \tilde{P}(\mathcal{X}).$$

Note that by caching the results of our messages, we can answer any query without recomputing everything from scratch.
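Putting the pieces together, a bare-bones loopy BP driver might look like the following sketch, which reuses `sum_product_message` from above and simply runs a fixed number of rounds instead of testing convergence:

```python
from itertools import product

def run_bp(clusters, edges, psis, domain=(0, 1), rounds=20):
    """Sum-product loopy BP: clusters is a list of variable tuples,
    edges a list of (i, j, sepset_vars) with sepset_vars a tuple,
    psis the initial potential table for each cluster.
    Returns the unnormalized belief table for each cluster."""
    neighbors = {i: [] for i in range(len(clusters))}
    for i, j, s in edges:
        neighbors[i].append((j, s))
        neighbors[j].append((i, s))

    # Initialize every directed message delta_{i -> j} to the constant 1.
    msgs = {}
    for i, j, s in edges:
        ones = {a: 1.0 for a in product(domain, repeat=len(s))}
        msgs[(i, j)] = (s, dict(ones))
        msgs[(j, i)] = (s, dict(ones))

    # Repeatedly recompute messages on all edges (asynchronous updates).
    for _ in range(rounds):
        for i, j, s in edges:
            for src, dst in ((i, j), (j, i)):
                incoming = [msgs[(k, src)] for k, _ in neighbors[src] if k != dst]
                msgs[(src, dst)] = (s, sum_product_message(
                    clusters[src], s, psis[src], incoming, domain))

    # Belief: psi_i times all incoming messages (keeping all variables).
    return [sum_product_message(ci, ci, psis[i],
                                [msgs[(k, i)] for k, _ in neighbors[i]], domain)
            for i, ci in enumerate(clusters)]

# To answer a query P(X): take any belief whose cluster contains X, sum
# out the other variables, then divide by the total (the partition sum).
```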

Unfortunately, cluster graphs with loops may not lead to accurate inference. The solution to this is to use 🌲 Junction Trees, which guarantee that message passing converges to the true marginals, thus making belief propagation an exact inference algorithm.