Theory

GRUs tackle the vanishing gradient problem in 💬 Recurrent Neural Networks by selecting which parts of the hidden state to modify at each step; the idea is similar to 🎥 Long Short-Term Memory, but a GRU maintains only the hidden state and no separate long-term cell state.

GRUs modify the hidden state using two gates.

  1. Reset gate $r_t$ uses $x_t$ and $h_{t-1}$ to choose which parts of $h_{t-1}$ to zero out.
  2. Update gate $z_t$ uses $x_t$ and $h_{t-1}$ to select how much of the past information from $h_{t-1}$ to keep (a minimal code sketch follows this list).
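
To make the two gates concrete, here is a minimal NumPy sketch of a single GRU step. The parameter dictionary `p` and the weight names `W_*`, `U_*`, `b_*` are assumptions for this example, not something defined in the note.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x_t, h_prev, p):
    """One GRU step; p maps names like "W_r" to weights (illustrative layout)."""
    # Reset gate: chooses which parts of h_prev to zero out.
    r_t = sigmoid(p["W_r"] @ x_t + p["U_r"] @ h_prev + p["b_r"])
    # Update gate: chooses how much of h_prev to keep.
    z_t = sigmoid(p["W_z"] @ x_t + p["U_z"] @ h_prev + p["b_z"])
    # Candidate state, built from the reset-gated past.
    h_tilde = np.tanh(p["W_h"] @ x_t + p["U_h"] @ (r_t * h_prev) + p["b_h"])
    # Keep z_t of the old state, take (1 - z_t) of the candidate.
    return z_t * h_prev + (1.0 - z_t) * h_tilde

# Tiny demo with random weights: input size 4, hidden size 3.
rng = np.random.default_rng(0)
p = {}
for g in ("r", "z", "h"):
    p[f"W_{g}"] = rng.standard_normal((3, 4)) * 0.1
    p[f"U_{g}"] = rng.standard_normal((3, 3)) * 0.1
    p[f"b_{g}"] = np.zeros(3)
h = np.zeros(3)
for x in rng.standard_normal((5, 4)):  # a length-5 input sequence
    h = gru_cell(x, h, p)
```

Over a sequence, the cell is applied step by step, feeding each returned $h_t$ back in as the previous hidden state; when $z_t$ is close to $1$, the old state passes through nearly unchanged, which is what keeps gradients from vanishing.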

Model

The model structure is depicted below.

Note that sigmoids (in red) are used for selection since they're bounded from $0$ to $1$, and tanh (in blue) is used for activations.
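
Written out, the computation in the diagram amounts to the standard GRU update below, where $\sigma$ is the sigmoid, $\odot$ is elementwise multiplication, and $W_*$, $U_*$, $b_*$ are learned weights and biases (the symbol names follow the usual convention in which $z_t$ weights the old state; some references swap $z_t$ and $1 - z_t$, which is equivalent up to relabeling).

$$
\begin{aligned}
r_t &= \sigma(W_r x_t + U_r h_{t-1} + b_r) \\
z_t &= \sigma(W_z x_t + U_z h_{t-1} + b_z) \\
\tilde{h}_t &= \tanh\big(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h\big) \\
h_t &= z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t
\end{aligned}
$$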