Boltzmann machines are a type of โก๏ธ Energy-Based Model that defines energy function on binary variables , which is used to calculate the probability distribution
In a Boltzmann machine,
where is the weight matrix and is the bias. Intuitively, the energy looks at the variables that are and takes the negated weights between them as well as their biases. Energy is small when variables associated with high weights are active.
To model hidden features that we donโt know about, we split the binary variables into sets of observable units and latent units . Our energy function can then be factored into
By introducing hidden units, the Boltzmann machine becomes a universal approximator of probability mass functions over discrete variables.
We generally optimize the weights , , and biases and using maximum likelihood methods. One special property is that the update for a weight between two units depends only on the statistics of the two units and ; this makes the update local, unlike many other probabilistic models.