Feature matching is an ⚖️ Inverse Reinforcement Learning technique that uses a linear reward function based on features derived from the state and action. For some feature function $f(s, a)$, we define the reward as

$$r_\psi(s, a) = \psi^\top f(s, a)$$

for a weight vector $\psi$.
The naive matching principle seeks to equalize the expected features across the expert and derived policies; that is, find $\psi$ such that

$$\mathbb{E}_{\pi^{r_\psi}}\big[f(s, a)\big] = \mathbb{E}_{\pi^*}\big[f(s, a)\big],$$

where $\pi^{r_\psi}$ is the optimal policy under the learned reward and $\pi^*$ is the expert policy.
This is similar to Imitation Learning techniques, since we're just trying to get a policy to match the states we visit in our demonstrations.
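As a concrete illustration, here is a minimal sketch of estimating feature expectations from sampled trajectories and comparing the expert against a candidate policy. The one-hot `featurize` function, the `feature_expectations` helper, and the toy trajectories are all hypothetical stand-ins, assuming a small tabular setting with trajectories stored as lists of `(state, action)` pairs:

```python
import numpy as np

def featurize(state, action, n_states, n_actions):
    # Hypothetical feature function: one-hot encoding of the (state, action) pair.
    f = np.zeros(n_states * n_actions)
    f[state * n_actions + action] = 1.0
    return f

def feature_expectations(trajectories, n_states, n_actions, gamma=0.99):
    # Monte Carlo estimate of E[sum_t gamma^t * f(s_t, a_t)] over trajectories.
    mu = np.zeros(n_states * n_actions)
    for traj in trajectories:
        for t, (s, a) in enumerate(traj):
            mu += (gamma ** t) * featurize(s, a, n_states, n_actions)
    return mu / len(trajectories)

# Toy data: expert demonstrations vs. rollouts from a candidate policy.
expert_trajs = [[(0, 1), (1, 1), (2, 0)], [(0, 1), (1, 0), (2, 0)]]
policy_trajs = [[(0, 0), (1, 0), (2, 1)], [(0, 1), (1, 0), (2, 0)]]

mu_expert = feature_expectations(expert_trajs, n_states=3, n_actions=2)
mu_policy = feature_expectations(policy_trajs, n_states=3, n_actions=2)

# Feature matching asks us to drive this gap toward zero.
print("feature expectation gap:", np.linalg.norm(mu_expert - mu_policy))
```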
However, this approach is ambiguous since there can be many reward functions that give the same trajectory distribution and thus the same expectation. To reduce ambiguity, we can instead borrow the maximum margin idea from 🛩️ Support Vector Machines and seek to find some reward function that makes the expert policy much better than all other policies. That is, we want to find weights $\psi$ with the largest margin $m$:

$$\max_{\psi, m}\; m \quad \text{such that} \quad \psi^\top \mathbb{E}_{\pi^*}\big[f(s, a)\big] \ge \max_{\pi \in \Pi}\; \psi^\top \mathbb{E}_{\pi}\big[f(s, a)\big] + m.$$
However, in complex or continuous settings, there can be policies that are slightly different from the expert yet behave nearly as well, so forcing a fixed margin against them is unreasonable; the margin should instead grow with how different a policy is from the expert.
Borrowing the SVM primal definition, we can instead optimize the objective

$$\min_{\psi}\; \frac{1}{2}\lVert \psi \rVert^2 \quad \text{such that} \quad \psi^\top \mathbb{E}_{\pi^*}\big[f(s, a)\big] \ge \max_{\pi \in \Pi}\; \psi^\top \mathbb{E}_{\pi}\big[f(s, a)\big] + D(\pi, \pi^*),$$

where $D(\pi, \pi^*)$ measures how different $\pi$ is from the expert policy (for example, the gap between their feature expectations or state visitation distributions). Policies close to $\pi^*$ then only need to be beaten by a small margin, while very different policies must be clearly worse.
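To make the objective concrete, here is a minimal sketch of the resulting quadratic program, assuming a small finite set of candidate policies whose feature expectations and divergences from the expert have already been estimated (the inputs `mu_candidates` and `divergences` are hypothetical), solved with a generic convex solver such as `cvxpy`:

```python
import numpy as np
import cvxpy as cp

def max_margin_weights(mu_expert, mu_candidates, divergences):
    """SVM-style primal: minimize ||psi||^2 subject to the expert's expected
    reward beating each candidate policy by at least its divergence."""
    psi = cp.Variable(len(mu_expert))
    constraints = [
        psi @ mu_expert >= psi @ mu_i + d_i
        for mu_i, d_i in zip(mu_candidates, divergences)
    ]
    problem = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(psi)), constraints)
    problem.solve()
    return psi.value

# Toy example: expert feature expectations vs. two candidate policies.
mu_expert = np.array([1.0, 0.5, 0.0])
mu_candidates = [np.array([0.8, 0.6, 0.1]), np.array([0.2, 0.1, 0.9])]
divergences = [0.1, 1.0]  # candidates further from the expert demand a larger margin

psi = max_margin_weights(mu_expert, mu_candidates, divergences)
print("learned reward weights:", psi)
```

In practice the max over all policies cannot be enumerated, so methods in this family alternate between solving this program against the current candidate set and adding the policy that is optimal under the latest reward weights.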