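Boltzmann machines are a type of ⚡️ Energy-Based Model that defines an energy function $E(x)$ on binary variables $x \in \{0, 1\}^d$, which is used to calculate the probability distribution $p(x) = \frac{\exp(-E(x))}{Z}$, where $Z = \sum_{x} \exp(-E(x))$ is the normalizing partition function.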

In a Boltzmann machine,

$$E(x) = -x^\top U x - b^\top x$$

where $U$ is the weight matrix and $b$ is the bias. Intuitively, the energy looks at the variables that are $1$ and takes the negated weights between them as well as their negated biases. The energy is low, and the probability correspondingly high, when units connected by large positive weights are active together.
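As a minimal sketch of how the energy turns into probabilities, the snippet below assumes the quadratic energy $E(x) = -x^\top U x - b^\top x$ and normalizes $\exp(-E(x))$ by enumerating every binary state. The function names and the 3-unit weights are hypothetical and chosen only for illustration; enumeration is feasible only for very small $d$.

```python
import itertools
import numpy as np

def energy(x, U, b):
    """Energy E(x) = -x^T U x - b^T x for a binary state vector x."""
    return -x @ U @ x - b @ x

def exact_distribution(U, b):
    """Enumerate all 2^d binary states and normalize exp(-E) by brute force."""
    d = len(b)
    states = np.array(list(itertools.product([0, 1], repeat=d)), dtype=float)
    energies = np.array([energy(x, U, b) for x in states])
    unnormalized = np.exp(-energies)
    return states, unnormalized / unnormalized.sum()

# Hypothetical 3-unit example: units 0 and 1 share a positive weight,
# unit 2 is unconnected and all biases are zero.
U = np.array([[0.0, 2.0, 0.0],
              [0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])
b = np.zeros(3)

states, probs = exact_distribution(U, b)
most_likely = states[np.argmax(probs)]
```

In this toy setup the states with units 0 and 1 both active have the lowest energy, so they receive the most probability mass.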

Latent Variables

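To model hidden features that we don't know about, we split the binary variables $x$ into sets of observable units $v$ and latent units $h$. Our energy function can then be factored into

$$E(v, h) = -v^\top R v - v^\top W h - h^\top S h - b^\top v - c^\top h$$

where $R$, $W$, and $S$ hold the visible-visible, visible-hidden, and hidden-hidden weights, and $b$ and $c$ are the visible and hidden biases.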

By introducing hidden units, the Boltzmann machine becomes a universal approximator of probability mass functions over discrete variables.
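To make the role of the latent units concrete, here is a small sketch that assumes the split energy $E(v, h) = -v^\top R v - v^\top W h - h^\top S h - b^\top v - c^\top h$ and obtains the distribution over visible units by summing the joint over every hidden configuration. The sizes and random parameters are hypothetical, and brute-force enumeration only works for a handful of units.

```python
import itertools
import numpy as np

def joint_energy(v, h, R, W, S, b, c):
    """E(v, h) with visible-visible (R), visible-hidden (W), hidden-hidden (S) weights."""
    return -v @ R @ v - v @ W @ h - h @ S @ h - b @ v - c @ h

def binary_states(n):
    """All 2^n binary vectors of length n, one per row."""
    return np.array(list(itertools.product([0, 1], repeat=n)), dtype=float)

def visible_marginal(R, W, S, b, c):
    """p(v) = sum_h exp(-E(v, h)) / Z, computed by exhaustive enumeration."""
    vs = binary_states(len(b))
    hs = binary_states(len(c))
    # unnorm[i, j] = exp(-E(v_i, h_j))
    unnorm = np.array([[np.exp(-joint_energy(v, h, R, W, S, b, c)) for h in hs]
                       for v in vs])
    return vs, unnorm.sum(axis=1) / unnorm.sum()

# Hypothetical parameters: 3 visible and 2 hidden units.
rng = np.random.default_rng(0)
n_v, n_h = 3, 2
R = np.triu(rng.normal(size=(n_v, n_v)), k=1)  # no self-connections
W = rng.normal(size=(n_v, n_h))
S = np.triu(rng.normal(size=(n_h, n_h)), k=1)
b = rng.normal(size=n_v)
c = rng.normal(size=n_h)

vs, p_v = visible_marginal(R, W, S, b, c)
```

The resulting `p_v` is a probability mass function over the 8 visible configurations; the hidden units never appear in it directly, but they shape it through `W` and `S`.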

A slight variation of this form gives us the 🚫 Restricted Boltzmann Machine.

Optimization

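We generally optimize the weights $R$, $W$, $S$ (visible-visible, visible-hidden, and hidden-hidden) and the biases $b$ and $c$ using maximum likelihood methods. One special property is that the update for a weight between two units depends only on the statistics of the two units $x_i$ and $x_j$ it connects, specifically how often they are active together under the data distribution versus under the model distribution; this makes the update local, unlike many other probabilistic models.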
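The locality of the update can be illustrated with a minimal sketch. It assumes a fully visible Boltzmann machine with energy $E(x) = -x^\top U x - b^\top x$, for which the log-likelihood gradient with respect to $U_{ij}$ is the data average of $x_i x_j$ minus the model average of $x_i x_j$. The model average is computed here by exact enumeration, which is only possible for tiny models; practical training would estimate it by sampling. The dataset, learning rate, and function names are hypothetical.

```python
import itertools
import numpy as np

def model_distribution(U, b):
    """Exact p(x) over all 2^d states for E(x) = -x^T U x - b^T x."""
    d = len(b)
    states = np.array(list(itertools.product([0, 1], repeat=d)), dtype=float)
    # -E(x) = x^T U x + b^T x, evaluated for every state at once
    neg_energy = np.einsum("si,ij,sj->s", states, U, states) + states @ b
    p = np.exp(neg_energy - neg_energy.max())  # subtract max for stability
    return states, p / p.sum()

def log_likelihood(data, U, b):
    """Mean log-probability of the data rows under the model."""
    _, p = model_distribution(U, b)
    # Row index of each data point in the enumeration order (first unit is
    # the most significant bit).
    idx = data.astype(int) @ (2 ** np.arange(data.shape[1])[::-1])
    return np.log(p[idx]).mean()

def gradient(data, U, b):
    """Gradient of the mean log-likelihood: data statistics minus model statistics."""
    states, p = model_distribution(U, b)
    data_pair = data.T @ data / len(data)           # <x_i x_j> under the data
    model_pair = states.T @ (states * p[:, None])   # <x_i x_j> under the model
    grad_U = data_pair - model_pair                 # entry (i, j) uses only units i and j
    grad_b = data.mean(axis=0) - p @ states
    return grad_U, grad_b

# Hypothetical toy dataset in which units 0 and 1 always agree.
data = np.array([[1, 1, 0],
                 [1, 1, 1],
                 [0, 0, 1],
                 [1, 1, 0]], dtype=float)
U = np.zeros((3, 3))
b = np.zeros(3)

ll_before = log_likelihood(data, U, b)
for _ in range(200):
    grad_U, grad_b = gradient(data, U, b)
    U += 0.1 * grad_U   # gradient ascent on the log-likelihood
    b += 0.1 * grad_b
ll_after = log_likelihood(data, U, b)
```

Each entry of `grad_U` is a difference of two pairwise co-activation averages, so updating one weight never requires information about any other pair of units.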