Theory

Logistic regression uses a similar idea to Linear Regression but transforms the output to probabilities in the range $[0, 1]$ using the sigmoid function

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

We'll apply weights $w$ to input $x$, then run it through the sigmoid function to get a probability for each label. In the binary case, if we let labels be $y \in \{0, 1\}$, then the probability is as follows.

$$P(y = 1 \mid x) = \sigma(w^\top x) = \frac{1}{1 + e^{-w^\top x}}$$
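As a quick numeric illustration (a hypothetical sketch in NumPy, with made-up weights and input):

```python
import numpy as np

# Hypothetical example: weights w and input x give score w^T x = 0.9,
# so the model assigns P(y = 1 | x) = sigmoid(0.9) ≈ 0.71.
w = np.array([0.5, -0.2])
x = np.array([2.0, 0.5])
score = w @ x                      # 0.9
p = 1.0 / (1.0 + np.exp(-score))   # ≈ 0.711
print(p)
```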

The equation above enforces a linear decision boundary at $w^\top x = 0$, where the two class probabilities are equal. Furthermore, we find that the log odds

$$\log \frac{P(y = 1 \mid x)}{P(y = 0 \mid x)} = w^\top x$$

is linear in $x$. The decision boundary makes more sense as it's where the log odds equal $0$, giving $P(y = 1 \mid x) = P(y = 0 \mid x) = \frac{1}{2}$.

Now, the likelihood of our data can be calculated as a product,

$$L(w) = \prod_{i=1}^N P(y_i \mid x_i) = \prod_{i=1}^N \sigma(w^\top x_i)^{y_i} \left(1 - \sigma(w^\top x_i)\right)^{1 - y_i}$$

It's easier to optimize the log likelihood $\ell(w) = \log L(w)$, which simplifies to the following.

$$\ell(w) = \sum_{i=1}^N y_i \log \sigma(w^\top x_i) + (1 - y_i) \log\left(1 - \sigma(w^\top x_i)\right)$$

With simple logistic regression, this is a concave function with a global optimum, which can be optimized via gradient ascent; this is also equivalent to minimizing its negation, which can be seen as optimizing the logistic loss.
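As a minimal sketch of this objective (assuming NumPy; the function name is hypothetical):

```python
import numpy as np

def log_likelihood(w, X, y):
    # X: (N, d) inputs, y: (N,) labels in {0, 1}.
    p = 1.0 / (1.0 + np.exp(-X @ w))  # sigma(w^T x_i) for every sample
    # Negating this sum gives the logistic loss we would minimize instead.
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
```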

Note

We can also view our loss as a 💧 Cross Entropy loss between the true one-hot encoded labels and the probabilities generated by our model.
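A small check of this equivalence (hypothetical numbers, NumPy assumed):

```python
import numpy as np

# For a single sample with true label y = 1 and predicted probability p,
# the cross entropy between the one-hot label [0, 1] and the model
# distribution [1 - p, p] equals the negative log likelihood -log(p).
p = 0.8
one_hot = np.array([0.0, 1.0])
probs = np.array([1.0 - p, p])

cross_entropy = -np.sum(one_hot * np.log(probs))
neg_log_likelihood = -np.log(p)
assert np.isclose(cross_entropy, neg_log_likelihood)
```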

We can generalize logistic regression to softmax regression, which classifies between multiple classes. The probability of each class is the ratio of its exponentiated score relative to all classes, mathematically computed as

$$P(y = k \mid x) = \frac{e^{w_k^\top x}}{\sum_{j=1}^K e^{w_j^\top x}}$$

Model

Like linear regression, our model consists of weights $w$; the main difference is that we apply the sigmoid function after applying the weights to get a probability.

For multi-class classification, we'll calculate individual probabilities for each class. With $K$ classes, we use $K$ sets of weights $w_1, \ldots, w_K$.
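A minimal sketch of both models, assuming NumPy and hypothetical function names:

```python
import numpy as np

def sigmoid(z):
    # Logistic sigmoid: squashes any real score into a probability in (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def binary_proba(w, x):
    # P(y = 1 | x) = sigmoid(w^T x)
    return sigmoid(w @ x)

def softmax_proba(W, x):
    # W is a (K, d) matrix holding one weight vector per class.
    scores = W @ x
    scores -= scores.max()            # shift scores for numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()
```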

Training

Given training data $\{x_i\}_{i=1}^N$ and labels $\{y_i\}_{i=1}^N$, assume weight prior $w \sim \mathcal{N}(0, \sigma^2 I)$. Since there's no closed-form solution, we'll employ gradient ascent.

  1. Randomly initialize weights $w$.
  2. Repeatedly perform gradient ascent steps; the gradient of the log posterior is calculated as

$$\nabla_w \ell = \sum_{i=1}^N \left( y_i - \sigma(w^\top x_i) \right) x_i - \frac{1}{\sigma^2} w$$

Note that for MLE, we drop the regularization term $-\frac{1}{\sigma^2} w$ that comes from the prior. The actual gradient ascent update is

$$w \leftarrow w + \eta \nabla_w \ell$$
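Putting the steps together, a minimal gradient ascent sketch (NumPy assumed; names and hyperparameters are hypothetical):

```python
import numpy as np

def train_logistic_map(X, y, sigma2=10.0, lr=0.1, steps=1000, seed=0):
    """Hypothetical gradient-ascent trainer for binary logistic regression.

    X: (N, d) data matrix, y: (N,) labels in {0, 1}.
    sigma2 is the prior variance; drop the penalty term for plain MLE.
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.01, size=X.shape[1])  # step 1: random init
    for _ in range(steps):                       # step 2: ascent steps
        p = 1.0 / (1.0 + np.exp(-X @ w))         # sigma(w^T x_i) for all i
        grad = X.T @ (y - p) - w / sigma2        # log-posterior gradient
        w += lr * grad
    return w
```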

Prediction

Given input $x$, calculate

$$P(y = 1 \mid x) = \sigma(w^\top x)$$

If it's above threshold $0.5$, then classify as $y = 1$; otherwise, classify as $y = 0$. For multi-class, return the class with the highest probability.
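A minimal prediction sketch under the same assumptions as above (NumPy, hypothetical names):

```python
import numpy as np

def predict(w, X, threshold=0.5):
    # Binary: classify as 1 when P(y = 1 | x) exceeds the threshold.
    probs = 1.0 / (1.0 + np.exp(-X @ w))
    return (probs > threshold).astype(int)

def predict_multiclass(W, X):
    # Multi-class: return the class with the highest probability; the argmax
    # over raw scores equals the argmax over softmax probabilities.
    return np.argmax(X @ W.T, axis=1)
```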