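The normal model measures continuous values and is commonly used for random variables whose distributions are unknown. For mean $\theta$ and variance $\sigma^2$, the model produces

$$y \sim \text{Normal}(\theta, \sigma^2)$$

with the probability density

$$p(y \mid \theta, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y - \theta)^2}{2\sigma^2}\right).$$

For multiple independent values $y_1, \ldots, y_n$,

$$p(y_1, \ldots, y_n \mid \theta, \sigma^2) = (2\pi\sigma^2)^{-n/2} \exp\left(-\frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \theta)^2\right).$$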
Known Variance

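Since this model has two parameters, it can be hard to analyze. We start with an easier version, where $\sigma^2$ is known. With the 🥂 Conjugate prior

$$\theta \sim \text{Normal}(\mu_0, \tau_0^2),$$

the posterior is

$$\theta \mid y_1, \ldots, y_n \sim \text{Normal}(\mu_n, \tau_n^2), \qquad \mu_n = \frac{\mu_0/\tau_0^2 + n\bar{y}/\sigma^2}{1/\tau_0^2 + n/\sigma^2}, \qquad \frac{1}{\tau_n^2} = \frac{1}{\tau_0^2} + \frac{n}{\sigma^2},$$

where $\bar{y}$ is the mean of the $n$ observations.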
Interpretation

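We can see the prior mean $\mu_0$ as a prior guess for $\theta$ and the prior variance $\tau_0^2$ as the uncertainty in our guess. The posterior mean $\mu_n$ is a balance between the data mean $\bar{y}$ and the prior mean $\mu_0$, weighted by their precisions $n/\sigma^2$ and $1/\tau_0^2$.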
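The precision-weighted balance can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the standard known-variance update $1/\tau_n^2 = 1/\tau_0^2 + n/\sigma^2$ and $\mu_n = \tau_n^2(\mu_0/\tau_0^2 + n\bar{y}/\sigma^2)$; the function name, argument names, and data are made up for the example.

```python
import numpy as np

def normal_known_variance_posterior(y, sigma2, mu0, tau0_sq):
    """Return (mu_n, tau_n_sq) for the mean when the variance sigma2 is known."""
    y = np.asarray(y, dtype=float)
    n = y.size
    prior_precision = 1.0 / tau0_sq   # how strongly we trust the prior guess
    data_precision = n / sigma2       # how strongly the data speak
    tau_n_sq = 1.0 / (prior_precision + data_precision)
    # Posterior mean is the precision-weighted average of mu0 and the data mean
    mu_n = tau_n_sq * (prior_precision * mu0 + data_precision * y.mean())
    return mu_n, tau_n_sq

# Hypothetical data with mean 10.5; prior and data precision are both 1 here,
# so the posterior mean lands halfway between mu0 = 0 and the data mean.
y = np.array([9.0, 10.0, 11.0, 12.0])
mu_n, tau_n_sq = normal_known_variance_posterior(y, sigma2=4.0, mu0=0.0, tau0_sq=1.0)

# A very diffuse prior pushes the posterior mean toward the data mean
mu_flat, tau_flat_sq = normal_known_variance_posterior(y, sigma2=4.0, mu0=0.0, tau0_sq=1e12)
```

Making `tau0_sq` very large mimics the flat prior: the posterior mean approaches the data mean and the posterior variance approaches `sigma2 / n`.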

Special Priors

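A non-informative prior has $\tau_0^2 \to \infty$. Note that at infinity, we get $p(\theta) \propto 1$, which is an improper prior. 💁‍♂️ Jeffreys' Prior is the same as the improper prior $p(\theta) \propto 1$.

With this “flat” improper prior, the posterior is

$$\theta \mid y_1, \ldots, y_n \sim \text{Normal}\left(\bar{y}, \frac{\sigma^2}{n}\right).$$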
Posterior Prediction

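Since the Normal is conjugate to itself, the posterior predictive distribution for a new observation $\tilde{y}$ can also be simplified:

$$p(\tilde{y} \mid y_1, \ldots, y_n) = \int p(\tilde{y} \mid \theta, \sigma^2)\, p(\theta \mid y_1, \ldots, y_n)\, d\theta$$

becomes

$$\tilde{y} \mid y_1, \ldots, y_n \sim \text{Normal}(\mu_n, \tau_n^2 + \sigma^2),$$

where $\mu_n$ and $\tau_n^2$ are the posterior mean and variance of $\theta$.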
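One way to sanity-check the predictive formula is to simulate it in two stages: draw the mean from its posterior, then draw a new observation given that mean. The sketch below assumes a Normal posterior for the mean and the closed-form predictive variance "posterior variance plus data variance"; the numeric values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior parameters for the mean, and the known data variance
mu_n, tau_n_sq, sigma2 = 5.25, 0.5, 4.0

# Stage 1: draw the mean from its posterior
theta = rng.normal(mu_n, np.sqrt(tau_n_sq), size=200_000)
# Stage 2: draw one new observation for each sampled mean
y_new = rng.normal(theta, np.sqrt(sigma2))

# Closed form for comparison: the two sources of uncertainty add
closed_form_var = tau_n_sq + sigma2
```

The sample mean and variance of `y_new` should sit close to `mu_n` and `closed_form_var`, up to Monte Carlo error.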
Unknown Variance

Now, with unknown variance $\sigma^2$, we need a joint prior over both $\theta$ and $\sigma^2$.

Fully Conjugate Prior

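The 🥂 Conjugate joint prior is

$$\theta \mid \sigma^2 \sim \text{Normal}\left(\mu_0, \frac{\sigma^2}{\kappa_0}\right), \qquad \sigma^2 \sim \text{InvGamma}\left(\frac{\nu_0}{2}, \frac{\nu_0 \sigma_0^2}{2}\right).$$

Note that for this fully conjugate prior, the prior for $\theta$ is dependent on $\sigma^2$. Also, note that $\sigma^2 \sim \text{InvGamma}(a, b)$ means $1/\sigma^2 \sim \text{Gamma}(a, b)$ (shape $a$, rate $b$).

The joint posterior is a large equation, but we can break it up into

$$p(\theta, \sigma^2 \mid y_1, \ldots, y_n) = p(\theta \mid \sigma^2, y_1, \ldots, y_n)\, p(\sigma^2 \mid y_1, \ldots, y_n).$$

The marginal posterior is

$$\sigma^2 \mid y_1, \ldots, y_n \sim \text{InvGamma}\left(\frac{\nu_n}{2}, \frac{\nu_n \sigma_n^2}{2}\right), \qquad \nu_n = \nu_0 + n, \qquad \sigma_n^2 = \frac{1}{\nu_n}\left[\nu_0 \sigma_0^2 + (n - 1)s^2 + \frac{\kappa_0 n}{\kappa_n}(\bar{y} - \mu_0)^2\right],$$

where $\kappa_n = \kappa_0 + n$ and $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2$ is the sample variance.

The conditional posterior is

$$\theta \mid \sigma^2, y_1, \ldots, y_n \sim \text{Normal}\left(\mu_n, \frac{\sigma^2}{\kappa_n}\right), \qquad \mu_n = \frac{\kappa_0 \mu_0 + n\bar{y}}{\kappa_n}.$$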
Interpretation

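We can interpret $\mu_0$ as the prior guess for $\theta$ with uncertainty set by $\kappa_0$, and $\sigma_0^2$ as the prior guess for $\sigma^2$ with uncertainty set by $\nu_0$ (larger values mean a more confident guess).

The posterior mean with a fully conjugate prior for $\theta$ is

$$\mu_n = \frac{\kappa_0 \mu_0 + n\bar{y}}{\kappa_0 + n},$$

so we can also interpret $\kappa_0$ as the prior equivalent of a pseudo-count (like $n$). Similarly, we can also interpret $\nu_0$ as a pseudo-count for the posterior mean of $\sigma^2$.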
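The pseudo-count reading is easy to see when the update is written out. The following is a sketch, assuming the usual Normal-Inverse-Gamma update with hyperparameters named `mu0`, `kappa0`, `nu0`, and `sigma0_sq`; these names and the data are illustrative assumptions.

```python
import numpy as np

def normal_inv_gamma_posterior(y, mu0, kappa0, nu0, sigma0_sq):
    """Return (mu_n, kappa_n, nu_n, sigma_n_sq) for the fully conjugate prior."""
    y = np.asarray(y, dtype=float)
    n = y.size
    ybar = y.mean()
    sum_sq = np.sum((y - ybar) ** 2)      # equals (n - 1) * s^2
    kappa_n = kappa0 + n                  # prior pseudo-count plus real count
    nu_n = nu0 + n
    # kappa0 prior "observations" at mu0 are averaged with n real ones at ybar
    mu_n = (kappa0 * mu0 + n * ybar) / kappa_n
    # Prior sum of squares + data sum of squares + prior/data disagreement term
    sigma_n_sq = (nu0 * sigma0_sq + sum_sq
                  + kappa0 * n / kappa_n * (ybar - mu0) ** 2) / nu_n
    return mu_n, kappa_n, nu_n, sigma_n_sq

# Hypothetical data: n = 4, mean 10.5, sum of squared deviations 5
y = np.array([9.0, 10.0, 11.0, 12.0])
mu_n, kappa_n, nu_n, sigma_n_sq = normal_inv_gamma_posterior(
    y, mu0=0.0, kappa0=1.0, nu0=1.0, sigma0_sq=1.0)
```

With `kappa0 = 1`, the prior counts as one extra observation located at `mu0`, so the posterior mean is the average of five "observations": four real ones and one pseudo-observation at zero.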

Special Priors

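The non-informative prior has $\kappa_0 \to 0$ and $\nu_0 \to 0$. At $0$, we have the improper prior $p(\theta \mid \sigma^2) \propto 1$ and $p(\sigma^2) \propto 1/\sigma^2$. With an improper prior, our posterior is greatly simplified, becoming

$$\sigma^2 \mid y_1, \ldots, y_n \sim \text{InvGamma}\left(\frac{n - 1}{2}, \frac{(n - 1)s^2}{2}\right), \qquad \theta \mid \sigma^2, y_1, \ldots, y_n \sim \text{Normal}\left(\bar{y}, \frac{\sigma^2}{n}\right),$$

where $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2$ is the sample variance.

Sampling

To sample from this joint posterior, we can perform the following steps:

  1. Sample the precision $\phi \sim \text{Gamma}\left(\frac{n - 1}{2}, \frac{(n - 1)s^2}{2}\right)$.
  2. Set $\sigma^2 = 1/\phi$.
  3. Sample $\theta \sim \text{Normal}\left(\bar{y}, \frac{\sigma^2}{n}\right)$.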
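The three steps can be sketched with NumPy as follows. This assumes the improper-prior posterior in which the precision is Gamma with shape $(n-1)/2$ and rate $(n-1)s^2/2$, and the mean given the variance is Normal around the sample mean with variance $\sigma^2/n$. The function name and data are hypothetical. Note that NumPy's gamma sampler is parameterized by scale, so the rate is inverted.

```python
import numpy as np

def sample_joint_posterior(y, size, rng):
    """Draw (theta, sigma2) pairs from the improper-prior joint posterior."""
    y = np.asarray(y, dtype=float)
    n = y.size
    ybar = y.mean()
    s2 = y.var(ddof=1)  # sample variance with the n - 1 denominator
    # Step 1: precision ~ Gamma(shape=(n-1)/2, rate=(n-1)*s2/2); scale = 1/rate
    precision = rng.gamma(shape=(n - 1) / 2, scale=2.0 / ((n - 1) * s2), size=size)
    # Step 2: invert the precision to get a variance draw
    sigma2 = 1.0 / precision
    # Step 3: draw the mean given each sampled variance
    theta = rng.normal(ybar, np.sqrt(sigma2 / n))
    return theta, sigma2

rng = np.random.default_rng(0)
y = np.arange(20.0)  # made-up data: mean 9.5, sample variance 35
theta, sigma2 = sample_joint_posterior(y, 100_000, rng)
```

Each pair `(theta[i], sigma2[i])` is one joint draw, so summaries of either marginal (means, quantiles) can be read directly from the arrays.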

Semi-Conjugate Prior

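The fully conjugate prior has $\theta$ dependent on $\sigma^2$, which makes it difficult to set the hyperparameter $\kappa_0$ for $\theta$. We can get rid of this dependency with a slightly different prior,

$$\theta \sim \text{Normal}(\mu_0, \tau_0^2), \qquad \sigma^2 \sim \text{InvGamma}\left(\frac{\nu_0}{2}, \frac{\nu_0 \sigma_0^2}{2}\right).$$

Now, $\tau_0^2$ takes the place of $\sigma^2/\kappa_0$, making $\theta$'s uncertainty hyperparameter independent of $\sigma^2$. However, this causes our posterior to be semi-conjugate: part of it does not lead to a standard distribution. The conditional posterior is still standard,

$$\theta \mid \sigma^2, y_1, \ldots, y_n \sim \text{Normal}(\mu_n, \tau_n^2),$$

but the marginal posterior follows

$$p(\sigma^2 \mid y_1, \ldots, y_n) \propto \tau_n\, \text{Normal}(\mu_n \mid \mu_0, \tau_0^2)\, \text{InvGamma}\left(\sigma^2 \,\Big|\, \frac{\nu_0}{2}, \frac{\nu_0 \sigma_0^2}{2}\right) \prod_{i=1}^{n} \text{Normal}(y_i \mid \mu_n, \sigma^2),$$

where

$$\mu_n = \frac{\mu_0/\tau_0^2 + n\bar{y}/\sigma^2}{1/\tau_0^2 + n/\sigma^2}, \qquad \tau_n^2 = \frac{1}{1/\tau_0^2 + n/\sigma^2},$$

both of which depend on $\sigma^2$.

Because of this non-standard distribution, we need to adjust our sampling approach. One solution is 🧱 Grid Sampling, which we can use to sample $\sigma^2$ in lieu of the first and second steps above.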
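A grid sampler for the semi-conjugate case might look like the sketch below. It assumes the marginal posterior of the variance is proportional to $\tau_n \cdot \text{Normal}(\mu_n \mid \mu_0, \tau_0^2) \cdot \text{InvGamma}(\sigma^2 \mid \nu_0/2, \nu_0\sigma_0^2/2) \cdot \prod_i \text{Normal}(y_i \mid \mu_n, \sigma^2)$, evaluates it on an evenly spaced grid, and draws grid points with probability proportional to that density. The grid range, hyperparameter values, and function names are illustrative assumptions, not part of the original notes.

```python
import numpy as np

def log_marginal_sigma2(grid, y, mu0, tau0_sq, nu0, sigma0_sq):
    """Unnormalized log p(sigma^2 | y) at every grid point."""
    n, ybar = y.size, y.mean()
    # Conditional posterior of the mean for each candidate variance
    tau_n_sq = 1.0 / (1.0 / tau0_sq + n / grid)
    mu_n = tau_n_sq * (mu0 / tau0_sq + n * ybar / grid)
    # Log inverse-gamma prior on the variance (constants dropped)
    a, b = nu0 / 2.0, nu0 * sigma0_sq / 2.0
    log_prior_sigma2 = -(a + 1.0) * np.log(grid) - b / grid
    # Log normal prior on the mean, evaluated at mu_n
    log_prior_theta = -0.5 * (mu_n - mu0) ** 2 / tau0_sq
    # Log likelihood of the data, evaluated at mu_n
    sq_err = ((y[:, None] - mu_n[None, :]) ** 2).sum(axis=0)
    log_lik = -0.5 * n * np.log(grid) - 0.5 * sq_err / grid
    return 0.5 * np.log(tau_n_sq) + log_prior_theta + log_prior_sigma2 + log_lik

def grid_sample(y, grid, size, rng, mu0, tau0_sq, nu0, sigma0_sq):
    """Draw sigma2 from the grid, then theta from its conditional posterior."""
    y = np.asarray(y, dtype=float)
    log_p = log_marginal_sigma2(grid, y, mu0, tau0_sq, nu0, sigma0_sq)
    weights = np.exp(log_p - log_p.max())  # subtract the max for stability
    weights /= weights.sum()
    sigma2 = rng.choice(grid, size=size, p=weights)
    # Final step is unchanged: theta | sigma2, y is still Normal
    tau_n_sq = 1.0 / (1.0 / tau0_sq + y.size / sigma2)
    mu_n = tau_n_sq * (mu0 / tau0_sq + y.size * y.mean() / sigma2)
    theta = rng.normal(mu_n, np.sqrt(tau_n_sq))
    return theta, sigma2

rng = np.random.default_rng(0)
y = np.arange(20.0)                      # made-up data
grid = np.linspace(1.0, 300.0, 2000)     # assumed plausible range for sigma2
theta, sigma2 = grid_sample(y, grid, 50_000, rng,
                            mu0=10.0, tau0_sq=100.0, nu0=1.0, sigma0_sq=30.0)
```

The grid has to cover the region where the posterior has appreciable mass; a range that is too narrow silently truncates the distribution, so it is worth checking that the weights are negligible at both ends.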