Statistics is about the organization, modeling, analysis, and interpretation of data. Standard tasks include summarizing data and inferring conclusions from the data.

To model data, we commonly use probability. Within statistics, there are two schools of thought that differ in how they interpret probabilities:

  1. Classical or Frequentist statistics interprets probability as a long-run frequency (over many trials). To build a model, we assume the data is generated following some unknown fixed parameters $\theta$, and we use our data samples to make point estimates for $\theta$.
  2. Bayesian statistics interprets probability as a degree of belief, and a model combines both our belief about the unknown parameter $\theta$ and the information provided by the data to make a more informed probabilistic estimate for $\theta$.

Fundamentally, the difference between Classical and Bayesian statistics is the treatment of the unknown parameter $\theta$: is $\theta$ some fixed unknown value, or is it a random variable?

Classical Statistics

To explain our observations, we assume a probability model that links the observed data $x = (x_1, \dots, x_n)$ to unknown parameters $\theta$,

$$x_i \sim p(x \mid \theta).$$

For example, we can have a normal distribution,

$$p(x \mid \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x - \mu)^2}{2\sigma^2}\right),$$

where $\theta = (\mu, \sigma^2)$.

In Classical statistics, we assume $\theta$ are fixed values, and we find them with point estimation: the single best value for $\theta$. We define “best” as the $\theta$ that maximizes the likelihood of generating the data we observe, so our estimate, called the maximum likelihood estimate (MLE), is

$$\hat{\theta} = \arg\max_{\theta} \prod_{i=1}^{n} p(x_i \mid \theta).$$

For our normal model, this is

$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \hat{\mu})^2.$$
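As a quick sanity check of these formulas, here is a minimal sketch in Python (assuming NumPy is available; the data and parameter values are made up for illustration) that computes the normal MLE from simulated data:

```python
import numpy as np

# Hypothetical data: 1,000 draws from a normal with mu = 2.0, sigma = 1.5
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=1_000)

# MLE for the normal model: the sample mean and the (biased) sample variance
mu_hat = x.mean()
sigma2_hat = ((x - mu_hat) ** 2).mean()

print(f"mu_hat = {mu_hat:.3f}, sigma2_hat = {sigma2_hat:.3f}")
```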

However, because our observations are random samples, there's some uncertainty in $\hat{\theta}$. The sampling distribution of $\hat{\theta}$ is the distribution of values that $\hat{\theta}$ takes across all possible random samples of the data; it can be found with the central limit theorem or bootstrapping, and we can use it for hypothesis tests and confidence intervals.
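To illustrate, here is a minimal bootstrap sketch (again assuming NumPy, with made-up data): we resample the observed data with replacement, recompute the MLE of the mean each time, and read an approximate confidence interval off the resulting distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=1_000)  # hypothetical observed sample

# Bootstrap: resample the data with replacement and recompute mu_hat each time
n_boot = 5_000
boot_means = np.array([
    rng.choice(x, size=x.size, replace=True).mean()
    for _ in range(n_boot)
])

# The spread of boot_means approximates the sampling distribution of mu_hat;
# percentiles give an approximate 95% confidence interval for mu.
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])
print(f"bootstrap 95% CI for mu: ({ci_low:.3f}, {ci_high:.3f})")
```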

Bayesian Statistics

In the Bayesian approach, we instead assume $\theta$ are random variables following their own probability distributions, and our goal is to estimate a distribution for $\theta$ given the data we observe.

Like above, we assume our data comes from a model with parameters $\theta$,

$$x_i \sim p(x \mid \theta).$$

However, we also assume our parameters come from their own distribution,

$$\theta \sim p(\theta),$$

called the prior distribution.

To estimate our distribution for $\theta$ given $x$, called the posterior distribution, we use 🪙 Bayes’ Theorem,

$$p(\theta \mid x) = \frac{p(x \mid \theta)\, p(\theta)}{p(x)},$$

which combines both the data model $p(x \mid \theta)$ and the prior $p(\theta)$. We can interpret this calculation as up-weighting the probability of $\theta$ values that have high likelihood, thus giving us a more informed distribution on $\theta$ compared to the prior.
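To make this concrete, here is a small sketch of Bayes’ Theorem in action (assuming NumPy and SciPy; the data, the known $\sigma$, and the prior are all made-up choices for illustration). It evaluates the posterior for the normal mean $\mu$ on a grid by multiplying likelihood and prior and normalizing:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sigma = 1.5                                    # assumed known, for simplicity
x = rng.normal(loc=2.0, scale=sigma, size=50)  # hypothetical observed data

# Grid of candidate values for the unknown mean mu (the parameter theta here)
mu_grid = np.linspace(-5.0, 10.0, 2_001)
d_mu = mu_grid[1] - mu_grid[0]

# Prior belief: mu ~ N(0, 3^2), chosen purely for illustration
prior = stats.norm.pdf(mu_grid, loc=0.0, scale=3.0)

# Likelihood of the whole dataset at each candidate mu (sum of log-densities)
log_lik = stats.norm.logpdf(x[:, None], loc=mu_grid[None, :], scale=sigma).sum(axis=0)
lik = np.exp(log_lik - log_lik.max())          # rescaled for numerical stability

# Bayes' Theorem on the grid: posterior is proportional to likelihood * prior,
# so multiply and normalize to integrate to 1
posterior = lik * prior
posterior /= posterior.sum() * d_mu

# Point and spread summaries from the posterior
post_mean = (mu_grid * posterior).sum() * d_mu
post_sd = np.sqrt(((mu_grid - post_mean) ** 2 * posterior).sum() * d_mu)
print(f"posterior mean = {post_mean:.3f}, posterior sd = {post_sd:.3f}")
```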

With our posterior $p(\theta \mid x)$, we can make point and variance estimates. We can also make predictions about unobserved or future data $\tilde{x}$ using the posterior predictive distribution,

$$p(\tilde{x} \mid x) = \int p(\tilde{x} \mid \theta)\, p(\theta \mid x)\, d\theta.$$

Note that this distribution accounts for all variability: both from the parameter values $\theta$ and the new data $\tilde{x}$.
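As a final sketch (assuming NumPy, with the same made-up normal example and an assumed conjugate normal prior on $\mu$ so the posterior has a closed form), we can draw from the posterior predictive by first sampling a parameter value from the posterior and then sampling new data from the model:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 1.5                                    # assumed known
x = rng.normal(loc=2.0, scale=sigma, size=50)  # hypothetical observed data

# Conjugate normal-normal update (assumed prior mu ~ N(0, 3^2), sigma known):
# the posterior for mu is also normal, with a precision-weighted mean.
mu0, tau0 = 0.0, 3.0
n = x.size
post_prec = 1 / tau0**2 + n / sigma**2
post_mean = (mu0 / tau0**2 + x.sum() / sigma**2) / post_prec
post_sd = post_prec ** -0.5

# Posterior predictive: draw mu from the posterior, then x_tilde from the model
mu_draws = rng.normal(post_mean, post_sd, size=10_000)
x_tilde = rng.normal(mu_draws, sigma)

# x_tilde's spread reflects both parameter uncertainty and new-data noise
print(f"predictive sd = {x_tilde.std():.3f}  (vs. posterior sd = {post_sd:.3f})")
```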