Greedy Search is a feature selection method used to optimize $L_0$ regularization, whose loss function is non-convex and so cannot be solved with ⛰️ Gradient Descent. There are three main methods, each adding features one at a time in some greedy fashion; we first initialize all weights to $0$, then perform one of the following.
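
The notes do not pin down the penalized error, but a common choice for $L_0$-style selection is training error plus a penalty proportional to the number of features used. The helper below is a minimal sketch of that criterion; the name `penalized_error`, the squared-error loss, and the trade-off weight `lam` are illustrative assumptions reused by the sketches that follow.

```python
import numpy as np

def penalized_error(y, y_hat, num_features, lam=0.1):
    """Squared error plus an L0-style penalty on the number of features used.

    The exact form is an assumption; any complexity-penalized criterion
    (AIC, BIC, ...) could be swapped in.
    """
    return np.mean((y - y_hat) ** 2) + lam * num_features
```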

Streamwise Regression

For each feature $x_j$, try adding it to the model and retrain; if the penalized error improves, accept this new model.
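
As a rough illustration, here is a streamwise pass in Python, assuming a design matrix `X`, targets `y`, and the hypothetical `penalized_error` helper above: each feature is visited exactly once, in order, and kept only if it lowers the penalized error.

```python
import numpy as np

def streamwise_select(X, y, lam=0.1):
    """Single pass over features; keep a feature only if it lowers the penalized error."""
    n, d = X.shape
    selected = []                                           # indices of accepted features
    best_err = penalized_error(y, np.zeros(n), 0, lam)      # baseline: all weights at 0
    for j in range(d):
        trial = selected + [j]
        w, *_ = np.linalg.lstsq(X[:, trial], y, rcond=None) # retrain on the candidate set
        err = penalized_error(y, X[:, trial] @ w, len(trial), lam)
        if err < best_err:                                  # accept only on improvement
            selected, best_err = trial, err
    return selected
```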

Stepwise Regression

Iterate $k$ times.

  1. For each feature $x_j$, try adding it to the model and retrain.
  2. Pick the feature with the lowest error; if the penalized error improves, accept this new model.
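
A minimal stepwise sketch under the same assumptions (`X`, `y`, the hypothetical `penalized_error`): each of up to $k$ rounds retrains the model with every remaining feature added in turn, keeps the single best one if it improves the penalized error, and stops otherwise.

```python
import numpy as np

def stepwise_select(X, y, k=10, lam=0.1):
    """Up to k rounds; each round adds the single best remaining feature, if it helps."""
    n, d = X.shape
    selected = []
    best_err = penalized_error(y, np.zeros(n), 0, lam)       # baseline: all weights at 0
    for _ in range(k):
        best_j, best_round_err = None, best_err
        for j in range(d):
            if j in selected:
                continue
            trial = selected + [j]
            w, *_ = np.linalg.lstsq(X[:, trial], y, rcond=None)  # retrain all weights
            err = penalized_error(y, X[:, trial] @ w, len(trial), lam)
            if err < best_round_err:
                best_j, best_round_err = j, err
        if best_j is None:                                   # no feature improves the criterion
            break
        selected.append(best_j)
        best_err = best_round_err
    return selected
```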

Stagewise Regression

Iterate $k$ times.

  1. For each feature $x_j$, try adding it to the model and retrain only the weight for $x_j$; in other words, fit $x_j$ on the residual $y - \hat{y}$.
  2. Pick the feature with the lowest error.
  3. Regress the residual on this feature to find its scaling coefficient $w_j$.
  4. Update our model: $\hat{y} \leftarrow \hat{y} + w_j x_j$.
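
A minimal stagewise sketch under the same assumptions: unlike stepwise, weights already in the model are never retrained; each round regresses the current residual on every feature separately, picks the one with the lowest residual error, and adds its coefficient to the model.

```python
import numpy as np

def stagewise_select(X, y, k=10):
    """k rounds; fit each feature to the residual alone, add the best one's coefficient."""
    n, d = X.shape
    w = np.zeros(d)                                   # all weights start at 0
    for _ in range(k):
        residual = y - X @ w                          # current residual of the model
        best_j, best_err, best_coef = None, np.inf, 0.0
        for j in range(d):
            xj = X[:, j]
            coef = (xj @ residual) / (xj @ xj)        # regress residual on feature j alone
            err = np.mean((residual - coef * xj) ** 2)
            if err < best_err:
                best_j, best_err, best_coef = j, err, coef
        w[best_j] += best_coef                        # update this weight; all others unchanged
    return w
```

Because only a single coefficient is fit per round, each stagewise iteration is far cheaper than a full stepwise retrain, at the cost of never revisiting earlier weights.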