🧩 ACT

Action Chunking with Transformers (ACT) is an imitation learning system that addresses the compounding-error problem in 🐵 Behavioral Cloning by using action chunking and a temporal ensemble.

Action chunking groups multiple actions together. For a chunk size $k$, our policy predicts the next $k$ actions and executes them in sequence; this effectively decreases the task horizon by a factor of $k$, reducing opportunities for compounding error.
A naive action chunking implementation would transition abruptly between action chunks. To smooth this out, we can use a temporal ensemble: query the policy at every time step, so that chunks overlap, then execute a weighted average of all the predictions made so far for the current time step.
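As a rough illustration of the temporal ensemble, here is a minimal Python sketch. It assumes scalar actions, a hypothetical `policy` callable that returns a list of `chunk_size` actions for a given observation, and exponential weights `exp(-decay * i)` with `i = 0` for the oldest prediction. The weighting scheme, the `decay` parameter, and all names are assumptions for illustration, not details taken from this note.

```python
import math


def temporal_ensemble(policy, observations, chunk_size, decay=0.01):
    """Query `policy` at every step and blend the overlapping chunk predictions.

    `policy(obs)` is a hypothetical callable returning a list of `chunk_size`
    scalar actions for time steps t, t+1, ..., t+chunk_size-1.
    """
    horizon = len(observations)
    # predictions[t] collects every action proposed for step t, oldest first.
    predictions = [[] for _ in range(horizon)]
    executed = []
    for t, obs in enumerate(observations):
        chunk = policy(obs)
        for offset, action in enumerate(chunk[:chunk_size]):
            if t + offset < horizon:
                predictions[t + offset].append(action)
        candidates = predictions[t]
        # Assumed weighting: exponential in the prediction's index, so a
        # larger `decay` shifts weight toward older predictions.
        weights = [math.exp(-decay * i) for i in range(len(candidates))]
        blended = sum(w * a for w, a in zip(weights, candidates)) / sum(weights)
        executed.append(blended)
    return executed


if __name__ == "__main__":
    # Toy policy: repeats the current observation for the whole chunk.
    actions = temporal_ensemble(lambda obs: [obs] * 2, [0.0, 3.0, 6.0], chunk_size=2, decay=0.0)
    print(actions)
```

With `decay=0.0` the weights are uniform, so each executed action is the plain mean of the overlapping predictions; the blend changes gradually from step to step instead of jumping at chunk boundaries.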

Since human behavior is stochastic, the action chunking policy can be modeled as a generative model. Specifically, we can use a conditional variational autoencoder (CVAE) with the encoder (orange) and decoder (blue) implemented as transformers.
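For reference, a CVAE of this kind is usually trained with a reconstruction term plus a weighted KL regularizer. The following is a sketch of that standard objective, not a formula taken from this note: $o_t$ is the observation, $a_{t:t+k}$ is the demonstrated chunk of $k$ actions, $z$ is the latent variable, $q_\phi$ is the encoder, $\pi_\theta$ is the decoder (the policy), and $\beta$ is a weighting hyperparameter. The notation and the choice of an L1 reconstruction norm are assumptions.

$$
\mathcal{L} = \mathcal{L}_{\text{recon}} + \beta \, D_{\mathrm{KL}}\!\left( q_\phi(z \mid a_{t:t+k}, o_t) \,\|\, \mathcal{N}(0, I) \right),
\qquad
\mathcal{L}_{\text{recon}} = \left\lVert a_{t:t+k} - \pi_\theta(o_t, z) \right\rVert_1
$$

In a setup like this the encoder is only needed during training; at test time $z$ can be fixed to the prior mean (zero), which makes the decoder deterministic.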
