Asynchronous Advantage Actor-Critic (A3C) is a variant of Actor-Critic that trains in parallel. That is, to stabilize training of the value function, we can run multiple actor-critic instances at once, all following the same policy.
Each instance accumulates gradients and applies them to the global model asynchronously, without waiting for the other instances. This is loosely analogous to mini-batch gradient descent, where multiple samples contribute to each update of the model.
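The asynchronous update pattern can be sketched on a toy problem. The example below is only an illustration under stated assumptions: the two-armed bandit `REWARDS`, the `GlobalModel` class, and the function names are hypothetical, each worker applies its gradient after a single step instead of accumulating over several, and a lock keeps the shared writes consistent for simplicity.

```python
import math
import random
import threading

REWARDS = [0.0, 1.0]  # hypothetical two-armed bandit: arm 1 always pays more


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def actor_critic_grads(logits, value, rng):
    """One-step actor-critic update directions from a single sampled action."""
    probs = softmax(logits)
    action = rng.choices(range(len(probs)), weights=probs)[0]
    advantage = REWARDS[action] - value  # reward minus the critic's estimate
    # Gradient of log pi(action) w.r.t. the logits is onehot(action) - probs.
    actor_grad = [advantage * ((1.0 if i == action else 0.0) - p)
                  for i, p in enumerate(probs)]
    # Negative gradient of 0.5 * (reward - value)**2 w.r.t. value.
    critic_grad = advantage
    return actor_grad, critic_grad


class GlobalModel:
    """Shared parameters that every worker reads from and writes to."""

    def __init__(self, n_actions):
        self.logits = [0.0] * n_actions
        self.value = 0.0
        self.updates = 0
        self.lock = threading.Lock()  # assumption: lock used to keep the toy simple

    def snapshot(self):
        with self.lock:
            return list(self.logits), self.value

    def apply(self, actor_grad, critic_grad, lr):
        with self.lock:
            for i, g in enumerate(actor_grad):
                self.logits[i] += lr * g
            self.value += lr * critic_grad
            self.updates += 1


def worker(model, steps, lr, seed):
    rng = random.Random(seed)
    for _ in range(steps):
        # The snapshot may be stale by the time the gradient is applied.
        logits, value = model.snapshot()
        actor_grad, critic_grad = actor_critic_grads(logits, value, rng)
        model.apply(actor_grad, critic_grad, lr)  # no waiting for other workers


def train_a3c(n_workers=4, steps=500, lr=0.1):
    model = GlobalModel(len(REWARDS))
    threads = [threading.Thread(target=worker, args=(model, steps, lr, seed))
               for seed in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return model


model = train_a3c()
print(softmax(model.logits))  # probability mass should shift toward arm 1
```

Each worker reads the global parameters, computes its own gradient, and writes back whenever it is ready, so the number of global updates equals the number of workers times the steps per worker.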
A2C
One weakness of A3C is that, because the parallel instances run asynchronously, some may be using older parameters than others. The updates then come from slightly different policies, which makes the combined update suboptimal.
Advantage Actor-Critic (A2C) is a synchronous version of A3C that eliminates this problem. A coordinator waits for all parallel actors to finish, then updates the global parameters in unison, so every actor starts the next round from the same parameters.
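A minimal sketch of the synchronous scheme, under the same kind of assumptions: the two-armed bandit `REWARDS` and the names `actor_grads` and `train_a2c` are hypothetical, and averaging the actors' gradients is one reasonable way for the coordinator to combine them, not necessarily the only one.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

REWARDS = [0.0, 1.0]  # hypothetical two-armed bandit: arm 1 always pays more


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def actor_grads(params, rng):
    """One actor's one-step actor-critic update directions for fixed parameters."""
    logits, value = params
    probs = softmax(logits)
    action = rng.choices(range(len(probs)), weights=probs)[0]
    advantage = REWARDS[action] - value  # reward minus the critic's estimate
    actor_grad = [advantage * ((1.0 if i == action else 0.0) - p)
                  for i, p in enumerate(probs)]
    # The advantage is also the critic's update direction for a squared-error loss.
    return actor_grad, advantage


def train_a2c(n_actors=4, rounds=500, lr=0.1):
    logits, value = [0.0] * len(REWARDS), 0.0
    rngs = [random.Random(seed) for seed in range(n_actors)]
    with ThreadPoolExecutor(max_workers=n_actors) as pool:
        for _ in range(rounds):
            params = (tuple(logits), value)  # every actor sees the same policy
            # list(...) blocks until all actors have returned their gradients.
            results = list(pool.map(lambda rng: actor_grads(params, rng), rngs))
            # Coordinator: average the gradients and apply one joint update.
            for i in range(len(logits)):
                logits[i] += lr * sum(r[0][i] for r in results) / n_actors
            value += lr * sum(r[1] for r in results) / n_actors
    return logits, value


logits, value = train_a2c()
print(softmax(logits), value)
```

Because the parameters change only between rounds, no actor ever computes a gradient from a stale policy. The cost is that each round runs at the pace of the slowest actor.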