VQ-VAE is a quantized Variational Autoencoder that uses discrete latent variables trained via vector quantization (VQ). We maintain a codebook $E = \{e_1, e_2, \dots, e_K\}$ containing $K$ embedding vectors $e_i \in \mathbb{R}^D$; for a given continuous variable $z_e(x)$ from the encoder, we set it to the closest embedding vector:

$$z_q(x) = e_k, \quad \text{where } k = \arg\min_i \| z_e(x) - e_i \|_2$$

This gives us the (deterministic) posterior distribution:

$$q(z = k \mid x) = \begin{cases} 1 & \text{if } k = \arg\min_i \| z_e(x) - e_i \|_2 \\ 0 & \text{otherwise} \end{cases}$$

Our quantized $z_q(x)$ is then passed through the decoder $D$, which aims to reconstruct our input $x$.
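To make the lookup concrete, here is a minimal PyTorch sketch of the quantization step; the function name, tensor shapes, and `codebook` argument are illustrative assumptions, not part of the original formulation:

```python
import torch

def quantize(z_e: torch.Tensor, codebook: torch.Tensor) -> torch.Tensor:
    """Map each encoder output to its nearest codebook embedding.

    z_e:      (batch, D) continuous encoder outputs z_e(x)
    codebook: (K, D)     the K embedding vectors e_1, ..., e_K
    """
    # Pairwise L2 distances between encoder outputs and embeddings: (batch, K)
    distances = torch.cdist(z_e, codebook)
    # k = argmin_i ||z_e(x) - e_i||_2 for each row
    k = distances.argmin(dim=1)
    # z_q(x) = e_k
    return codebook[k]
```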

Training

To train this system, we need to not only optimize the encoder and decoder but also the embedding codebook: the embedding vectors should be set to accurately quantize the encoding. Thus, we have the reconstruction loss, the VQ objective (which moves embedding vectors $e_i$ closer to the encoder output $z_e(x)$), and the commitment loss (to move the encoder output toward the embedding vectors, preventing divergence):

$$L = \underbrace{\| x - D(e_k) \|_2^2}_{\text{reconstruction}} + \underbrace{\| \text{sg}[z_e(x)] - e_k \|_2^2}_{\text{VQ}} + \underbrace{\beta \| z_e(x) - \text{sg}[e_k] \|_2^2}_{\text{commitment}}$$

where $\text{sg}[\cdot]$ is the stop-gradient operator.
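In PyTorch, the stop-gradient $\text{sg}[\cdot]$ corresponds to `.detach()`. A minimal sketch of the three terms, assuming the tensor names below and the paper's suggested $\beta = 0.25$:

```python
import torch.nn.functional as F

def vq_vae_loss(x, x_recon, z_e, z_q, beta: float = 0.25):
    """x: input; x_recon: decoder output D(e_k);
    z_e: continuous encoder output z_e(x); z_q: selected embeddings e_k."""
    recon = F.mse_loss(x_recon, x)          # ||x - D(e_k)||^2
    vq = F.mse_loss(z_q, z_e.detach())      # ||sg[z_e(x)] - e_k||^2, updates the codebook only
    commit = F.mse_loss(z_e, z_q.detach())  # ||z_e(x) - sg[e_k]||^2, updates the encoder only
    return recon + vq + beta * commit
```

Because of the `detach()` calls, the VQ term pulls embeddings toward the (frozen) encoder outputs, while the commitment term pulls encoder outputs toward the (frozen) embeddings, exactly splitting the gradient flow as the loss above prescribes.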

Also, since it's impossible to pass the decoder gradient through the non-differentiable quantization step to the encoder, we approximate it with a straight-through estimator: we simply copy the gradient from the decoder input to the encoder output, skipping the quantization step.
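This straight-through copy is often implemented in a single line: add the quantization residual as a constant, so the forward pass uses $z_q(x)$ while the backward pass treats quantization as the identity. A sketch (the `decoder` call is illustrative):

```python
# Forward value equals z_q; backward gradient flows to z_e unchanged.
z_q_st = z_e + (z_q - z_e).detach()
x_recon = decoder(z_q_st)  # reconstruction gradients now reach the encoder via z_e
```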