DAgger, short for Dataset Aggregation, is an enhanced Behavioral Cloning algorithm that enriches the dataset with mistake-correction examples. Specifically, we loop the following:
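- Train the policy $\pi_\theta$ from human data $\mathcal{D}$.
- Run $\pi_\theta$ to get a dataset $\mathcal{D}_\pi$ of states.
- Ask an expert to label $\mathcal{D}_\pi$ with correct actions.
- Aggregate, $\mathcal{D} \leftarrow \mathcal{D} \cup \mathcal{D}_\pi$, repeat.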
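
The loop above can be sketched on a toy problem. Everything here is an assumption made for illustration rather than part of DAgger itself: a hypothetical 1-D world with positions 0 to 10, a scripted `expert_action` that steps toward `GOAL`, and a tabular learner that stays put in states it has never seen. The human data only covers positions 0 to 5, so plain cloning gets stuck when started at position 10, while each DAgger iteration lets the expert correct one more state the policy actually reaches.

```python
from collections import Counter, defaultdict

GOAL = 5        # hypothetical target position
N_STATES = 11   # positions 0..10


def expert_action(state):
    """Hypothetical expert: always step toward GOAL."""
    if state < GOAL:
        return 1
    if state > GOAL:
        return -1
    return 0


def step(state, action):
    """Toy dynamics: move along the line, clamped to [0, N_STATES - 1]."""
    return min(max(state + action, 0), N_STATES - 1)


def train(dataset):
    """Fit a tabular policy: majority-vote action for each labeled state."""
    votes = defaultdict(Counter)
    for state, action in dataset:
        votes[state][action] += 1
    table = {s: c.most_common(1)[0][0] for s, c in votes.items()}
    # Assumption: in an unseen state the learner does nothing (action 0).
    return lambda s: table.get(s, 0)


def rollout(policy, start, horizon=12):
    """Run the policy and return the list of states it visits."""
    states, s = [], start
    for _ in range(horizon):
        states.append(s)
        s = step(s, policy(s))
    return states


def dagger(human_data, n_iters=5, starts=(0, 10)):
    dataset = list(human_data)
    policy = train(dataset)                      # train on human data
    for _ in range(n_iters):
        visited = [s for start in starts for s in rollout(policy, start)]  # run the policy
        labeled = [(s, expert_action(s)) for s in visited]                 # expert labels
        dataset += labeled                                                 # aggregate
        policy = train(dataset)                                            # retrain, repeat
    return policy, dataset


# Human demonstrations only cover positions 0..GOAL.
human_data = [(s, expert_action(s)) for s in range(GOAL + 1)]
bc_policy = train(human_data)
dagger_policy, dataset = dagger(human_data)

print(rollout(bc_policy, 10)[-1])      # 10: plain cloning never leaves the unseen state
print(rollout(dagger_policy, 10)[-1])  # 5: the corrected policy reaches the goal
```

In this toy setup the expert is a function call, so labeling is free. In practice the labeling step is where the human effort goes, and the number of iterations and rollouts per iteration would be chosen with that cost in mind.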
By incorporating the states the policy actually visits into our dataset, the data's distribution of observations converges, over many iterations, to the policy's own distribution, thus allowing our model to learn correct responses to the states it encounters.
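
One informal way to see this convergence is to write the aggregated dataset as a mixture. The symbols below are assumptions introduced for this sketch, not notation from the text above: $N_0$ is the number of human observations, $M$ is the number of observations added per iteration, $p_{\text{human}}$ is the observation distribution of the human data, and $p_{\pi_i}$ is the observation distribution induced by the policy used in iteration $i$. After $K$ iterations, the observation distribution of the aggregated dataset is

$$
q_K(o) = \frac{N_0 \, p_{\text{human}}(o) + M \sum_{i=1}^{K} p_{\pi_i}(o)}{N_0 + K M}
$$

The weight on the human data, $N_0 / (N_0 + K M)$, shrinks toward zero as $K$ grows. If the successive policies also settle down, so that $p_{\pi_i}$ approaches some $p_{\pi}$, then the average of the $p_{\pi_i}$ approaches $p_{\pi}$ as well and $q_K$ moves toward the distribution the policy actually encounters. This is a heuristic argument under those assumptions, not a formal guarantee.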