Exploration via information gain chooses the action that maximizes our information gain about the parameters $\theta$ of our state dynamics model $p_\theta(s_{t+1} \mid s_t, a_t)$. However, directly optimizing this is intractable.

A close substitute is prediction gain, defined as

$$ \log p_{\theta'}(s) - \log p_{\theta}(s), $$

where $p_{\theta'}$ is the density after updating on our new state $s$. Intuitively, if our state density changed a lot, then our state is novel. This method is closely related to Density Modeling.
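To make this concrete, here is a minimal sketch of a prediction-gain bonus, assuming a toy count-based density model over discrete states (the `CountDensity` class and its `log_prob`/`update` interface are illustrative, not part of the method itself):

```python
from collections import Counter
import math

class CountDensity:
    """Toy density model over discrete states: p(s) = (N(s) + 1) / (n + 2)."""
    def __init__(self):
        self.counts = Counter()
        self.total = 0

    def log_prob(self, state):
        # Laplace smoothing keeps unseen states at a finite log-probability.
        return math.log((self.counts[state] + 1) / (self.total + 2))

    def update(self, state):
        self.counts[state] += 1
        self.total += 1

def prediction_gain(model, state):
    """log p_theta'(s) - log p_theta(s): large when the new state is novel."""
    log_p_before = model.log_prob(state)
    model.update(state)                    # theta -> theta'
    log_p_after = model.log_prob(state)
    return log_p_after - log_p_before

model = CountDensity()
print(prediction_gain(model, "s0"))   # first visit: larger gain
print(prediction_gain(model, "s0"))   # repeated visit: smaller gain
```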

Alternatively, we can use variational inference. Our information gain is

$$ D_{\mathrm{KL}}\!\left( p(\theta \mid h, s_t, a_t, s_{t+1}) \,\|\, p(\theta \mid h) \right), $$

where $h$ is a history of all prior transitions. If we introduce a tractable distribution

$$ q(\theta \mid \phi) \approx p(\theta \mid h), $$

we can optimize its parameters $\phi$ via the 🧬 Evidence Lower Bound

$$ \mathbb{E}_{q(\theta \mid \phi)}\!\left[ \log p(h \mid \theta) \right] - D_{\mathrm{KL}}\!\left( q(\theta \mid \phi) \,\|\, p(\theta) \right). $$
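As a sketch of how that optimization might look, assume a fully factorized Gaussian posterior $q(\theta \mid \phi) = \mathcal{N}(\mu, \sigma^2)$ over the dynamics model's weights and a standard-normal prior; `log_likelihood` below is a hypothetical stand-in for the dynamics model's log-likelihood of the history under sampled weights:

```python
import torch
from torch.distributions import Normal, kl_divergence

def negative_elbo(mu, log_sigma, log_likelihood, n_samples=8):
    """Monte-Carlo estimate of -ELBO = KL(q || prior) - E_q[log p(h | theta)]."""
    q = Normal(mu, log_sigma.exp())
    prior = Normal(torch.zeros_like(mu), torch.ones_like(mu))
    theta = q.rsample((n_samples,))              # reparameterized samples of theta
    expected_ll = log_likelihood(theta).mean()   # E_q[log p(h | theta)]
    kl = kl_divergence(q, prior).sum()           # KL(q(theta | phi) || p(theta))
    return kl - expected_ll

# One gradient step on phi = (mu, log_sigma).
mu = torch.zeros(10, requires_grad=True)
log_sigma = torch.zeros(10, requires_grad=True)
opt = torch.optim.Adam([mu, log_sigma], lr=1e-2)

def log_likelihood(theta):                       # hypothetical dynamics likelihood
    return -((theta - 1.0) ** 2).sum(dim=-1)

opt.zero_grad()
loss = negative_elbo(mu, log_sigma, log_likelihood)
loss.backward()
opt.step()
```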

Then, given a new transition $(s_t, a_t, s_{t+1})$, we update $\phi$ to get $\phi'$ and use

$$ D_{\mathrm{KL}}\!\left( q(\theta \mid \phi') \,\|\, q(\theta \mid \phi) \right) $$
as our approximate bonus on top of the actual reward.
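Under the same factorized-Gaussian assumption, the bonus itself is a closed-form KL divergence between the posterior after and before the update (variable names here are illustrative):

```python
import torch
from torch.distributions import Normal, kl_divergence

def info_gain_bonus(mu_old, sigma_old, mu_new, sigma_new):
    """KL( q(theta | phi') || q(theta | phi) ), summed over parameters."""
    q_new = Normal(mu_new, sigma_new)   # posterior after seeing the transition
    q_old = Normal(mu_old, sigma_old)   # posterior before the transition
    return kl_divergence(q_new, q_old).sum()

bonus = info_gain_bonus(mu_old=torch.zeros(10), sigma_old=torch.ones(10),
                        mu_new=0.1 * torch.ones(10), sigma_new=0.9 * torch.ones(10))
```

In practice the bonus is typically scaled by a small coefficient before being added to the environment reward.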