Exploration via information gain chooses the action that maximizes the information gain about the parameters $\theta$ of our state dynamics $p_\theta(s_{t+1} \mid s_t, a_t)$. However, directly optimizing this is intractable.
A close substitute is prediction gain, defined as
$$\log p_{\theta'}(s) - \log p_\theta(s)$$
where $p_\theta$ is our current state density and $p_{\theta'}$ is the density including our new state. Intuitively, if our state density changed a lot, then our state is novel. This method is closely related to Density Modeling.
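As a rough illustration of prediction gain, the sketch below uses a count-based categorical density with add-one smoothing over a small discrete state space. The model choice and the names `CountDensity` and `prediction_gain` are assumptions made for this example only; any density model that can be updated on a new state could be substituted.

```python
import math
from collections import Counter


class CountDensity:
    """Hypothetical categorical density over discrete states (add-one smoothing)."""

    def __init__(self, num_states):
        self.num_states = num_states
        self.counts = Counter()
        self.total = 0

    def log_prob(self, state):
        # p(s) = (count(s) + 1) / (total + num_states)
        return math.log((self.counts[state] + 1) / (self.total + self.num_states))

    def update(self, state):
        # Fit the density to one more observation of `state`.
        self.counts[state] += 1
        self.total += 1


def prediction_gain(model, state):
    """Return log p_theta'(s) - log p_theta(s), updating the model on s."""
    before = model.log_prob(state)  # log p_theta(s)
    model.update(state)             # theta -> theta'
    after = model.log_prob(state)   # log p_theta'(s)
    return after - before


# Usage: a state seen many times yields a smaller gain than an unseen one.
model = CountDensity(num_states=4)
for _ in range(10):
    model.update(0)
print("familiar state:", prediction_gain(model, 0))
print("novel state:   ", prediction_gain(model, 3))
```

Under these assumptions the gain shrinks as a state's visit count grows, which is why it can serve as a novelty bonus.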
Alternatively, we can use variational inference. Our information gain is
$$D_{KL}\big(p(\theta \mid h, s_t, a_t, s_{t+1}) \,\|\, p(\theta \mid h)\big)$$
where $h$ is the history of all prior transitions. If we introduce a tractable distribution