Theory

Principal Component Regression (PCR) uses Principal Component Analysis (PCA) to provide a regularization effect for regression. After PCA, we can apply OLS linear regression on the embeddings of the inputs.

If we're given partially labeled data, we can still use PCR: PCA needs no labels, so it can be fit on every input, while the regression is trained only on the labeled subset. This is an example of semi-supervised learning, where our dataset is partially unlabeled.

  1. Calculate PCA on all inputs (labeled and unlabeled), then project the labeled inputs to get their scores and train OLS regression only on the labeled data.
  2. Unlabeled data gives some information about the structure of the input space, allowing us to train a stronger regression model.
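The two steps above can be sketched with NumPy. This is only an illustration under assumptions of our own: the synthetic data, the choice of 2 components, the 20 labeled points, and every variable name (`X_all`, `y_all`, `labeled`, and so on) are made up for the example and are not part of the notes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption): 500 inputs lying near a 2-D subspace of R^10.
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 10))
X_all = latent @ mixing + 0.05 * rng.normal(size=(500, 10))
y_all = latent @ np.array([1.5, -0.5]) + 0.1 * rng.normal(size=500)

# Pretend only the first 20 inputs come with labels.
labeled = np.arange(20)

# Step 1a: PCA on all inputs, labeled and unlabeled alike.
mean = X_all.mean(axis=0)
_, _, Vt = np.linalg.svd(X_all - mean, full_matrices=False)
loadings = Vt[:2].T  # top-2 principal directions, shape (10, 2)

# Step 1b: project the labeled inputs and fit OLS (with intercept) on their scores.
Z_lab = (X_all[labeled] - mean) @ loadings
A = np.column_stack([Z_lab, np.ones(len(labeled))])
coef, *_ = np.linalg.lstsq(A, y_all[labeled], rcond=None)

# Predict for every input, including the ones never seen with a label.
Z_all = (X_all - mean) @ loadings
y_pred = Z_all @ coef[:-1] + coef[-1]
mse_unlabeled = np.mean((y_pred[20:] - y_all[20:]) ** 2)
```

The labels never touch the PCA step, so the 480 unlabeled inputs still shape the subspace in which the regression is fit.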

Model

PCR contains the PCA parameters (scores and loadings) and the linear regression weights.

Training

Given training data (inputs paired with labels), train a PCA on the inputs in the training set; apply it to those inputs to get the scores.

We'll train a regression model with the scores in place of the original inputs, using the labels as normal.
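A minimal training sketch, assuming NumPy, PCA computed via an SVD of the mean-centered inputs, and an OLS fit that includes an intercept. The function name `fit_pcr`, its arguments, and the toy data are hypothetical and chosen only for illustration.

```python
import numpy as np

def fit_pcr(X, y, n_components):
    """Fit PCA on the inputs X, then OLS on the resulting scores."""
    # Center the inputs; PCA is defined on mean-centered data.
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:n_components].T   # shape (n_features, n_components)
    scores = Xc @ loadings           # shape (n_samples, n_components)
    # OLS on the scores, with a column of ones for the intercept.
    A = np.column_stack([scores, np.ones(len(scores))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return mean, loadings, coef[:-1], coef[-1]

# Toy data (an assumption): 200 samples, 5 features, noisy linear labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=200)

mean, loadings, weights, intercept = fit_pcr(X, y, n_components=3)
```

Keeping fewer components than features is where the regularization effect comes from; with all components kept, the fit coincides with ordinary OLS on the original inputs.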

Prediction

Given a new input, compute its scores with the trained PCA, then apply our regression weights to the scores to predict the output.
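Prediction can be sketched the same way. The helper `predict_pcr` and the hand-written parameter values below are hypothetical, standing in for whatever a trained PCR model would hold; the sketch assumes the inputs were mean-centered before PCA and that the regression has an intercept.

```python
import numpy as np

def predict_pcr(x, mean, loadings, weights, intercept=0.0):
    """Project inputs onto the principal directions, then apply the weights."""
    # Center with the training mean and project to get the scores.
    scores = (np.atleast_2d(x) - mean) @ loadings
    # Linear regression on the scores.
    return scores @ weights + intercept

# Hypothetical fitted parameters for 3-feature inputs and 2 components.
mean = np.array([1.0, 2.0, 3.0])
loadings = np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [0.0, 0.0]])
weights = np.array([2.0, -1.0])
intercept = 0.5

y_hat = predict_pcr(np.array([2.0, 4.0, 10.0]), mean, loadings, weights, intercept)
```

Note that the new input is centered with the mean computed at training time, not with any statistic of its own.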