Theory

Independent Component Analysis (ICA) is an algorithm for source separation, also known as disentanglement. For example, if we have $n$ recordings of $n$ people talking, ICA outputs $n$ audio sources, one for each voice. Generally, given an input $x \in \mathbb{R}^n$, our goal is to find an output $s$ of the same dimension so that its components $s_1, \dots, s_n$ are as independent as possible.

Info

Note that though ICA shares a similar name with 🗜️ Principal Component Analysis, their objectives are unrelated.

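Let $A$ be the mixing matrix such that $x = As$. We want to find the un-mixing matrix $W = A^{-1}$ so that $s = Wx$. Note that if $s$ is non-Gaussian, our solution is determined only up to permutation and scaling of the sources: reordering or rescaling the rows of $W$ gives an equally valid answer.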
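As a minimal sketch of the mixing model, the snippet below builds two synthetic sources, mixes them, and un-mixes them with the exact inverse. The Laplace sources, the matrix `A`, and the matrix `P` are made-up illustrations, not values from these notes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical non-Gaussian sources (Laplace noise), one sample per row.
S = rng.laplace(size=(1000, 2))

# Hypothetical mixing matrix A; each row of X is x = A s for one sample.
A = np.array([[1.0, 0.5],
              [0.3, 1.0]])
X = S @ A.T

# With A known, the un-mixing matrix is simply its inverse.
W = np.linalg.inv(A)
S_rec = X @ W.T  # each row is s = W x

# Permuting and rescaling the rows of W is an equally valid un-mixing:
# the recovered sources are only reordered and rescaled.
P = np.array([[0.0, 2.0],
              [-1.0, 0.0]])
S_alt = X @ (P @ W).T
```

Here `S_alt` holds the second source doubled and the first source negated, which is why ICA cannot pin down the order or scale of the sources. In practice `A` is unknown, so `W` has to be learned from `X` alone.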

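First, assume independence across sources, and let each of them come from some non-Gaussian distribution $p_s$. We solve ICA by maximizing the likelihood

$$p(s) = \prod_{j=1}^{n} p_s(s_j).$$

Converting to $x$ via $s = Wx$, where $w_j^T$ denotes the $j$-th row of the un-mixing matrix $W$, we have

$$p(x) = \prod_{j=1}^{n} p_s(w_j^T x) \cdot |\det W|.$$

Next, we assume that the cdf of the sources is a sigmoid $g(s) = 1 / (1 + e^{-s})$, so the pdf is

$$p_s(s) = g'(s) = g(s)\,(1 - g(s)).$$

Then, our log likelihood over $m$ training examples $x^{(1)}, \dots, x^{(m)}$ is

$$\ell(W) = \sum_{i=1}^{m} \left( \sum_{j=1}^{n} \log g'(w_j^T x^{(i)}) + \log |\det W| \right).$$

Finally, this can be maximized via ⛰️ Gradient Descent (applied to the negative log likelihood) with the gradient

$$\nabla_W \ell(W) = \sum_{i=1}^{m} \left( \left(1 - 2 g(W x^{(i)})\right) x^{(i)T} + (W^T)^{-1} \right),$$

where $g$ is applied element-wise.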
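The sigmoid-cdf log likelihood and its gradient can be sketched in NumPy as follows. This assumes samples are stored as rows of `X`; the function names and the finite-difference check are illustrative additions rather than part of the original notes.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(W, X):
    """Sum over samples of sum_j log g'(w_j^T x) + log|det W|."""
    m = X.shape[0]
    G = sigmoid(X @ W.T)  # G[i, j] = g(w_j^T x^(i))
    return np.sum(np.log(G * (1.0 - G))) + m * np.log(abs(np.linalg.det(W)))

def gradient(W, X):
    """Analytic gradient of log_likelihood with respect to W."""
    m = X.shape[0]
    G = sigmoid(X @ W.T)
    # d/dz log g'(z) = 1 - 2 g(z), and d/dW log|det W| = (W^T)^{-1}.
    return (1.0 - 2.0 * G).T @ X + m * np.linalg.inv(W.T)

def numerical_gradient(W, X, eps=1e-6):
    """Central finite differences, used only to sanity-check gradient()."""
    num = np.zeros_like(W)
    for j in range(W.shape[0]):
        for k in range(W.shape[1]):
            E = np.zeros_like(W)
            E[j, k] = eps
            num[j, k] = (log_likelihood(W + E, X) - log_likelihood(W - E, X)) / (2 * eps)
    return num

# Hypothetical test data: 50 samples of 3-dimensional Laplace noise.
rng = np.random.default_rng(0)
X = rng.laplace(size=(50, 3))
W = np.eye(3) + 0.1 * rng.standard_normal((3, 3))
max_err = np.max(np.abs(gradient(W, X) - numerical_gradient(W, X)))
```

A small `max_err` suggests the analytic gradient matches the log likelihood it was derived from.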

Training

Our loss function is the negative of the log likelihood above, and we optimize with gradient descent.
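A training loop along these lines might look like the sketch below. It uses plain batch gradient descent on the average negative log likelihood; the learning rate, iteration count, identity initialization, and synthetic Laplace data are all assumptions chosen for illustration, not settings from these notes.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_and_grad(W, X):
    """Average negative log likelihood and its gradient (rows of X are samples)."""
    m = X.shape[0]
    G = sigmoid(X @ W.T)
    loss = -(np.sum(np.log(G * (1.0 - G))) / m + np.log(abs(np.linalg.det(W))))
    grad = -((1.0 - 2.0 * G).T @ X / m + np.linalg.inv(W.T))
    return loss, grad

def fit_ica(X, lr=0.1, n_iters=3000):
    """Batch gradient descent on the loss, starting from the identity matrix."""
    W = np.eye(X.shape[1])
    losses = []
    for _ in range(n_iters):
        loss, grad = loss_and_grad(W, X)
        losses.append(loss)
        W = W - lr * grad
    return W, losses

# Hypothetical data: two Laplace sources mixed by a made-up matrix A.
rng = np.random.default_rng(0)
S = rng.laplace(size=(5000, 2))
A = np.array([[1.0, 0.5],
              [0.3, 1.0]])
X = S @ A.T

W, losses = fit_ica(X)

# If separation worked, W @ A should be close to a scaled permutation matrix,
# i.e. one dominant entry per row.
C = W @ A
```

Because the true `A` is known in this toy setup, `C` gives a direct check on separation quality; with real recordings only `losses` would be available for monitoring.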

Prediction

To get the sources $s$ from an input $x$, we apply our learned un-mixing matrix $W$ to $x$, giving $s = Wx$.