Feature matching is an Inverse Reinforcement Learning technique that uses a linear reward function based on features derived from the state and action. For some feature function $f$, the reward is defined as
$$r_\psi(s, a) = \psi^\top f(s, a).$$
The naive matching principle seeks to equalize the expected features across the expert and derived policies; that is, find a reward $r_\psi$ whose policy $\pi^{r_\psi}$ satisfies
$$\mathbb{E}_{\pi^{r_\psi}}\big[f(s, a)\big] = \mathbb{E}_{\pi^*}\big[f(s, a)\big].$$
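As a minimal sketch of this matching principle, the snippet below estimates feature expectations from sampled trajectories with NumPy and measures how far apart the expert and the learned policy are. The discount factor, the toy feature function, and the trajectory data are all hypothetical choices for illustration, not part of the original formulation.

```python
import numpy as np

def feature_expectations(trajectories, feature_fn, gamma=0.99):
    """Monte Carlo estimate of the discounted expected features under a policy,
    given sampled trajectories of (state, action) pairs."""
    estimates = []
    for traj in trajectories:
        total = sum((gamma ** t) * feature_fn(s, a) for t, (s, a) in enumerate(traj))
        estimates.append(total)
    return np.mean(estimates, axis=0)

# Toy scalar states/actions with a hypothetical 2-D feature function.
feature_fn = lambda s, a: np.array([s, a], dtype=float)
expert_trajs = [[(1.0, 0.0), (2.0, 1.0)], [(1.5, 1.0)]]
policy_trajs = [[(0.5, 1.0), (1.0, 0.0)]]

mu_expert = feature_expectations(expert_trajs, feature_fn)
mu_policy = feature_expectations(policy_trajs, feature_fn)
print("feature gap:", np.linalg.norm(mu_expert - mu_policy))
```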
This is similar to Imitation Learning techniques since we're just trying to get a policy to match the states visited in our demonstrations.
However, this approach is ambiguous since many reward functions can induce the same trajectory distribution and thus the same feature expectations. To reduce ambiguity, we can instead borrow the maximum margin idea from Support Vector Machines and seek a reward function that makes the expert policy much better than all other policies. That is, we want to find
$$\max_{\psi,\, m} \; m \quad \text{such that} \quad \psi^\top \mathbb{E}_{\pi^*}\big[f(s, a)\big] \ge \max_{\pi \in \Pi} \psi^\top \mathbb{E}_{\pi}\big[f(s, a)\big] + m.$$
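To make the max-margin idea concrete, here is a small sketch that searches for reward weights maximizing the worst-case margin over a handful of candidate policies via projected subgradient ascent. The feature-expectation vectors, step size, and iteration count are made-up illustrative values, not anything prescribed by the method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature expectations for the expert and a few candidate policies.
mu_expert = np.array([1.0, 0.8, 0.2])
mu_others = np.array([[0.6, 0.5, 0.4],
                      [0.9, 0.2, 0.1],
                      [0.3, 0.7, 0.6]])

# Maximize the worst-case margin  min_i psi^T (mu_expert - mu_i)  over ||psi|| <= 1
# with projected subgradient ascent.
psi = rng.normal(size=3)
psi /= np.linalg.norm(psi)
for _ in range(500):
    gaps = (mu_expert - mu_others) @ psi   # expert's advantage over each candidate
    worst = np.argmin(gaps)                # tightest constraint
    psi += 0.05 * (mu_expert - mu_others[worst])
    psi /= max(1.0, np.linalg.norm(psi))   # project back onto the unit ball

print("reward weights:", psi, "margin:", np.min((mu_expert - mu_others) @ psi))
```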
However, in complex or continuous settings, there can be policies that are only slightly different from $\pi^*$ yet are required to satisfy the same large margin, which is overly strict; the margin should instead scale with how different a policy is from the expert.
Borrowing the SVM primal definition, we can instead optimize the objective
$$\min_\psi \; \frac{1}{2}\lVert\psi\rVert^2 \quad \text{such that} \quad \psi^\top \mathbb{E}_{\pi^*}\big[f(s, a)\big] \ge \max_{\pi \in \Pi} \Big[ \psi^\top \mathbb{E}_{\pi}\big[f(s, a)\big] + D(\pi, \pi^*) \Big],$$
where $D(\pi, \pi^*)$ measures how different $\pi$ is from the expert, for example the distance between their feature expectations.
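Below is a minimal sketch of this objective under a soft (penalized) version of the constraint, optimized with plain subgradient descent. The choice of $D$ as the Euclidean gap in feature expectations, the penalty weight, and the toy numbers are all assumptions made for illustration.

```python
import numpy as np

# Same hypothetical feature expectations as above.
mu_expert = np.array([1.0, 0.8, 0.2])
mu_others = np.array([[0.6, 0.5, 0.4],
                      [0.9, 0.2, 0.1],
                      [0.3, 0.7, 0.6]])

# Required margin for each policy: distance of its feature expectations from the
# expert's, so near-expert policies only need a small margin.
D = np.linalg.norm(mu_expert - mu_others, axis=1)

# Subgradient descent on the soft-constraint relaxation
#   0.5 * ||psi||^2 + C * max(0, max_i [psi^T mu_i + D_i] - psi^T mu_expert)
C = 10.0
psi = np.zeros(3)
for _ in range(2000):
    scores = mu_others @ psi + D
    j = np.argmax(scores)                  # most-violated constraint
    grad = psi.copy()
    if scores[j] - mu_expert @ psi > 0:    # hinge is active
        grad += C * (mu_others[j] - mu_expert)
    psi -= 0.01 * grad

print("reward weights:", psi)
print("expert advantage:", (mu_expert - mu_others) @ psi, "required margin:", D)
```

Enforcing the constraint exactly (as a quadratic program) is also possible; the hinge relaxation above just keeps the sketch dependency-free.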