Markov Chain Monte Carlo (MCMC) methods perform approximate sampling from an intractable distribution $p(x)$.
The core idea is to repeatedly update a state $x$ by sampling from a transition distribution $T(x' \mid x)$.
Consider infinitely many states updated in parallel: the fraction of states taking each value defines a distribution $q_t(x)$ that evolves with every update.
Our choice of transition distribution $T$ determines the equilibrium distribution the chain converges to.
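As a concrete sketch of repeatedly updating a state, the two-state chain below (the transition matrix is a hypothetical example) settles into stable state frequencies when we keep sampling the next state from the current one:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical column-stochastic transition matrix:
# T[i, j] = P(next state = i | current state = j).
T = np.array([[0.5, 0.25],
              [0.5, 0.75]])

def update(x):
    """One MCMC step: draw the next state from the column of T for state x."""
    return rng.choice(T.shape[0], p=T[:, x])

x = 0
counts = np.zeros(2)
for _ in range(20000):
    x = update(x)
    counts[x] += 1

# Empirical state frequencies approach the chain's equilibrium distribution,
# which for this T is (1/3, 2/3).
freq = counts / counts.sum()
```

Running many independent copies of this loop and histogramming the states at step $t$ gives the distribution $q_t$ described above.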
Convergence
To converge to an equilibrium distribution, we need to satisfy two constraints:
- Irreducibility: it's possible to get from any state to any other state with non-zero probability in a finite number of steps.
- Aperiodicity: it's possible to return to any state at any time.
Irreducibility guarantees the absence of absorbing states that prevent the chain from leaving, and aperiodicity prevents the chain from alternating back-and-forth periodically.
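For a finite chain, both conditions can be checked mechanically. A small sketch, assuming a hypothetical 3-state column-stochastic matrix:

```python
import numpy as np

# Hypothetical 3-state transition matrix (each column sums to 1: column j
# gives the distribution of the next state given current state j).
T = np.array([[0.5, 0.2, 0.3],
              [0.3, 0.5, 0.2],
              [0.2, 0.3, 0.5]])

n = T.shape[0]

# Irreducibility: every state is reachable from every other state within
# n - 1 steps, i.e. (I + T)^(n-1) has no zero entries.
reachable = np.linalg.matrix_power(np.eye(n) + T, n - 1)
assert (reachable > 0).all()

# Aperiodicity: for an irreducible chain, a self-loop (nonzero diagonal
# entry) is sufficient to rule out a fixed cycle.
assert (np.diag(T) > 0).all()
```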
Theory
The convergence of the random variable's distribution may not be intuitively obvious. The following theory explains it using the principal eigenvector of the corresponding transition matrix.
Consider a discrete variable $x$ whose distribution at step $t$ is a vector $q_t$. One update of the chain applies a transition matrix $T$:

$$q_{t+1} = T q_t$$

where $T_{ij} = P(x_{t+1} = i \mid x_t = j)$. Since the columns of $T$ sum to $1$, $T$ is a stochastic matrix: its largest eigenvalue is $\lambda_1 = 1$, and all other eigenvalues satisfy $|\lambda_i| \leq 1$. Expanding $q_0$ in the eigenvectors of $T$, each application of $T$ scales the $i$-th component by $\lambda_i$. Thus, all eigenvalues less than $1$ in magnitude decay to zero, leaving only the principal eigenvector:

$$q_\infty = v_1, \quad \text{where } T v_1 = v_1$$

where $v_1$, normalized to sum to $1$, is the equilibrium distribution.
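The decay of the non-principal eigenvector components can be demonstrated numerically; the 2x2 matrix below is a hypothetical example:

```python
import numpy as np

# Column-stochastic transition matrix (hypothetical): q_{t+1} = T q_t.
# Eigenvalues are 1 and 0.8, so the second component decays as 0.8^t.
T = np.array([[0.9, 0.1],
              [0.1, 0.9]])

q = np.array([1.0, 0.0])  # start with all probability mass on state 0
for _ in range(200):
    q = T @ q

# The chain settles onto the eigenvector of T with eigenvalue 1.
vals, vecs = np.linalg.eig(T)
v1 = vecs[:, np.argmax(vals.real)].real
v1 = v1 / v1.sum()  # normalize to a probability distribution
assert np.allclose(q, v1, atol=1e-6)
```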
Generalizing this equilibrium to continuous variables, the above equation is equivalent to the following:

$$q_{t+1}(x') = \int q_t(x)\, T(x' \mid x)\, dx$$

with equilibrium reached when $q_{t+1} = q_t$.
Detailed Balance Condition
The above explanation shows that MCMC will converge, but not what it converges to. The following framework relates our transitions $T(x' \mid x)$ to the target distribution $p(x)$ we want to sample from.
Let $p(x)$ be the target distribution and $T(x' \mid x)$ our transition distribution. If the detailed balance condition

$$p(x)\, T(x' \mid x) = p(x')\, T(x \mid x')$$

is satisfied, we will converge to an approximation for $p(x)$: integrating both sides over $x$ shows that $p$ is a fixed point of the transition.
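For instance, the Metropolis acceptance rule with a symmetric proposal satisfies detailed balance by construction; a numerical check on a hypothetical 3-state target:

```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])     # hypothetical target distribution
n = len(p)
prop = np.full((n, n), 1.0 / n)   # symmetric (uniform) proposal distribution

# Metropolis transition probabilities for moving j -> i (off-diagonal):
# T[i, j] = proposal(i | j) * min(1, p(i) / p(j))
T = prop * np.minimum(1.0, p[:, None] / p[None, :])

# Detailed balance: p(x) T(x'|x) == p(x') T(x|x') for every pair of states.
# With x = column index and x' = row index, this says the matrix of flows
# p(x) T(x'|x) is symmetric.
flows = T * p[None, :]
assert np.allclose(flows, flows.T)
```

Each off-diagonal flow works out to $\frac{1}{n}\min(p(x), p(x'))$, which is visibly symmetric in $x$ and $x'$.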
Sampling
Once the Markov chain reaches the equilibrium distribution (called "burning in" the chain during the "mixing time"), we can sample indefinitely from the new distribution by continuing to draw $x' \sim T(x' \mid x)$.
Note that this is the same formula as changing the distribution of $q_t$ by the transition: at equilibrium, applying $T$ leaves the distribution unchanged.
Note
There's no analytical way to check when we've burned in the chain. Most methods rely on heuristics, and the point at which we declare the chain burned in and start approximately sampling is ultimately arbitrary.
However, consecutive samples are identically distributed but not independent, so we often take every $k$-th sample (thinning) to reduce the correlation between samples.
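Burn-in and thinning can be sketched together in a minimal Metropolis sampler for a standard normal target (the function names and parameter values below are hypothetical choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def log_p(x):
    """Log-density of the target, up to a constant: standard normal."""
    return -0.5 * x**2

def metropolis(n_samples, burn_in=1000, thin=10, step=1.0):
    x = 0.0
    out = []
    total = burn_in + n_samples * thin
    for t in range(total):
        # Symmetric random-walk proposal, accepted with prob min(1, p(x')/p(x)).
        x_new = x + step * rng.normal()
        if np.log(rng.uniform()) < log_p(x_new) - log_p(x):
            x = x_new
        # Discard the burn-in portion, then keep every `thin`-th state.
        if t >= burn_in and (t - burn_in) % thin == 0:
            out.append(x)
    return np.array(out)

samples = metropolis(5000)
```

The kept samples should have roughly the target's moments (mean 0, standard deviation 1).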
Modes
A single chain generally hovers around a low-energy region (a mode of the distribution) and rarely leaves it. As such, it's difficult for MCMC's samples to cover multiple modes of a complex distribution.
One solution is to modify our distribution to have higher entropy (less extreme peaks and valleys). In ⚡️ Energy-Based Models, we can change the temperature: scaling the energy by $1/\tau$ with $\tau > 1$ flattens the peaks and valleys, making it easier for the chain to move between modes.
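A sketch of this idea, assuming a Boltzmann distribution $p(x) \propto e^{-E(x)/\tau}$ over a few discrete states (the energies below are a hypothetical example): raising the temperature $\tau$ increases entropy and flattens the modes.

```python
import numpy as np

# Hypothetical energy function over 3 discrete states; state 1 is a
# high-energy barrier separating two low-energy modes.
E = np.array([0.0, 5.0, 0.5])

def boltzmann(E, temperature):
    """p(x) proportional to exp(-E(x) / temperature)."""
    logits = -E / temperature
    w = np.exp(logits - logits.max())  # subtract max for numerical stability
    return w / w.sum()

def entropy(p):
    return -(p * np.log(p)).sum()

p_cold = boltzmann(E, temperature=1.0)
p_hot = boltzmann(E, temperature=5.0)

# Higher temperature -> higher entropy -> the barrier state keeps more
# probability mass, so a chain crosses between modes more easily.
assert entropy(p_hot) > entropy(p_cold)
```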