Energy
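The energy function $E_\theta(x)$ assigns a scalar energy to each input $x$, with lower energy indicating a more plausible (more compatible) configuration.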
Info
We can also use the energy function to model the compatibility between two variables, $E_\theta(x, y)$. Unlike feed-forward models that explicitly compute $y$ from $x$, the energy function models their dependency implicitly. This makes it possible to find multiple $y$ that have high compatibility with a given $x$, which cannot be done with an explicit model that returns a single output per input.
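To make the "multiple compatible outputs" point concrete, here is a minimal sketch. The energy $E(x, y) = (y^2 - x)^2$ and the helper `compatible_ys` are hypothetical choices made for illustration only; a learned energy function would take their place. A brute-force grid search stands in for whatever inference procedure is actually used.

```python
import numpy as np

def energy(x, y):
    # Hypothetical energy: low when y**2 is close to x,
    # so both +sqrt(x) and -sqrt(x) are compatible with x.
    return (y ** 2 - x) ** 2

def compatible_ys(x, candidates, tol=1e-2):
    # Return every candidate y whose energy is within `tol` of the lowest energy found.
    e = energy(x, candidates)
    return candidates[e <= e.min() + tol]

candidates = np.linspace(-3.0, 3.0, 601)  # grid over y with spacing 0.01
ys = compatible_ys(4.0, candidates)
print(ys)  # two clusters of low-energy values, one near -2 and one near +2
```

A function that maps $x$ to a single $y$ would have to pick one of the two clusters (or an average of them, which has high energy here).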
Probability Distribution
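To convert the energy function into a probability distribution, we define

$$p_\theta(x) = \frac{\exp(-E_\theta(x))}{Z_\theta},$$

where the partition function $Z_\theta = \int \exp(-E_\theta(x)) \, dx$ normalizes the distribution so that it integrates to one.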
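As a small illustration, the sketch below turns a handful of energies into probabilities, assuming the standard form $p(x) = \exp(-E(x)) / Z$ and a finite set of states, so that the partition function is a sum rather than an integral. The function name `boltzmann` and the example energies are made up for this example.

```python
import numpy as np

def boltzmann(energies):
    # p_i = exp(-E_i) / Z, with Z = sum_j exp(-E_j).
    # Subtracting the minimum energy first avoids overflow; the shift
    # cancels between the numerator and Z, so the result is unchanged.
    energies = np.asarray(energies, dtype=float)
    weights = np.exp(-(energies - energies.min()))
    return weights / weights.sum()

probs = boltzmann([0.0, 1.0, 2.0])
print(probs)  # lower energy gets higher probability
```

In continuous or high-dimensional spaces the sum over all states is not available in closed form, which is why the partition function is usually the hard part.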
Info
The choice of this distribution is not arbitrary: since we have no constraints on the system beyond its expected energy, we want the distribution with maximum entropy. Solving this optimization problem gives the distribution above.
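The maximum-entropy argument can be sketched as follows. This assumes the usual setup, which the note does not spell out: entropy is maximized subject to normalization and a fixed expected energy $U$, with Lagrange multipliers $\lambda$ and $\beta$ introduced here for the derivation.

$$
\begin{aligned}
\mathcal{L}[p] &= -\int p(x) \log p(x) \, dx - \lambda \left( \int p(x) \, dx - 1 \right) - \beta \left( \int p(x) E(x) \, dx - U \right), \\
\frac{\delta \mathcal{L}}{\delta p(x)} &= -\log p(x) - 1 - \lambda - \beta E(x) = 0, \\
p(x) &= \frac{\exp(-\beta E(x))}{Z}, \qquad Z = \int \exp(-\beta E(x)) \, dx .
\end{aligned}
$$

Absorbing $\beta$ into the energy (equivalently, setting $\beta = 1$) gives the form $\exp(-E(x)) / Z$.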
Optimization
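With our parameterization, we cannot directly optimize the log-likelihood $\log p_\theta(x)$, because it depends on the partition function $Z_\theta$, which is intractable to compute for most energy functions.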
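To see where the difficulty comes from, it helps to write out the gradient of the log-likelihood. The derivation below assumes the form $p_\theta(x) = \exp(-E_\theta(x)) / Z_\theta$ with $Z_\theta = \int \exp(-E_\theta(x')) \, dx'$.

$$
\begin{aligned}
\log p_\theta(x) &= -E_\theta(x) - \log Z_\theta, \\
\nabla_\theta \log Z_\theta &= \frac{1}{Z_\theta} \int \nabla_\theta \exp(-E_\theta(x')) \, dx' = -\mathbb{E}_{x' \sim p_\theta} \left[ \nabla_\theta E_\theta(x') \right], \\
\nabla_\theta \log p_\theta(x) &= -\nabla_\theta E_\theta(x) + \mathbb{E}_{x' \sim p_\theta} \left[ \nabla_\theta E_\theta(x') \right].
\end{aligned}
$$

The first term is easy to evaluate on a training example. The second is an expectation under the model itself, which has no closed form in general and has to be estimated from samples.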
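The solution to this problem is 🖖 Contrastive Divergence. The core idea is to sample $x^-$ from the model distribution $p_\theta$ and use it to estimate the log-likelihood gradient for a training example $x^+$:

$$\nabla_\theta \log p_\theta(x^+) \approx -\nabla_\theta E_\theta(x^+) + \nabla_\theta E_\theta(x^-).$$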
Intuitively, this makes the training data more likely than a random sample from the model.
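The following is a minimal sketch of this training loop on a toy problem, not a definitive implementation. Everything specific in it is an assumption made for illustration: the one-parameter energy $E_\theta(x) = \tfrac{1}{2}(x - \theta)^2$, the use of short Langevin chains started at the training data to draw model samples, and the step sizes and iteration counts.

```python
import numpy as np

def grad_energy_theta(x, theta):
    # Toy energy E_theta(x) = 0.5 * (x - theta)**2, so dE/dtheta = theta - x.
    return theta - x

def langevin_samples(x_init, theta, rng, n_steps=10, step=0.1):
    # Approximate samples from p_theta with Langevin dynamics on x (dE/dx = x - theta).
    x = x_init.copy()
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = x - step * (x - theta) + np.sqrt(2.0 * step) * noise
    return x

def train_cd(data, n_iters=500, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    theta = 0.0
    for _ in range(n_iters):
        # Negative samples: short chains initialized at the training data.
        x_neg = langevin_samples(data, theta, rng)
        # Gradient estimate: lower the energy of the data, raise the energy of the samples.
        grad = np.mean(-grad_energy_theta(data, theta) + grad_energy_theta(x_neg, theta))
        theta += lr * grad  # gradient ascent on the log-likelihood
    return theta

data = np.random.default_rng(1).normal(loc=3.0, scale=1.0, size=1000)
theta_hat = train_cd(data)
print(theta_hat, data.mean())  # theta_hat should end up close to the data mean
```

For this toy energy the update reduces to moving $\theta$ by the difference between the mean of the data and the mean of the model samples, so training stops changing $\theta$ once the two match.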