What is a Conjugate Prior?
A conjugate prior is a prior distribution that, when combined with a given likelihood function, produces a posterior distribution in the same family as the prior. In Bayesian inference this is convenient because updating a belief with new data reduces to updating the prior's parameters: the prior expresses the initial degree of belief, the likelihood expresses how probable the observed data are under each candidate parameter value, and the posterior expresses the updated belief after seeing the data. When the prior is not conjugate to the likelihood, the posterior generally does not share the prior's functional form, and computing it requires explicit normalization or integration, often done numerically.
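The classic illustration of conjugacy is the Beta prior paired with a binomial likelihood: the posterior is again a Beta distribution, so the update is just parameter arithmetic. The sketch below uses hypothetical numbers (a Beta(2, 2) prior and 7 successes out of 10 trials) purely for illustration.

```python
from fractions import Fraction

def beta_binomial_update(alpha, beta, successes, failures):
    """Conjugate update: a Beta(alpha, beta) prior combined with a
    binomial likelihood yields a Beta posterior -- only the two
    shape parameters change."""
    return alpha + successes, beta + failures

# Hypothetical example: Beta(2, 2) prior over a coin's bias,
# then observe 7 heads and 3 tails.
post_a, post_b = beta_binomial_update(2, 2, 7, 3)
posterior_mean = Fraction(post_a, post_a + post_b)
print(post_a, post_b, posterior_mean)  # 9 5 9/14
```

No integration was needed: conjugacy guarantees the posterior is Beta(9, 5), with mean 9/14.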
In frequentist inference, prior probability and likelihood play different roles. A frequentist does not place a probability distribution over the parameters themselves: probabilities describe the long-run frequency of random outcomes, and parameters are treated as fixed but unknown quantities estimated from repeated observation.
The likelihood function, by contrast, measures how plausible each candidate parameter value is in light of the data actually observed. Conjugacy itself is a Bayesian notion: a family of prior distributions is conjugate to a particular likelihood function when the resulting posterior always belongs to the same family as the prior.
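To make the likelihood function concrete, the sketch below evaluates a binomial likelihood over a grid of candidate parameter values and picks the most plausible one. The data (7 successes in 10 trials) are hypothetical.

```python
from math import comb

def binomial_likelihood(p, n, k):
    """Likelihood of success probability p, given k successes in n trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Hypothetical data: 7 successes in 10 trials. Scan a grid of candidate
# values of p and keep the one under which the data are most probable.
grid = [i / 100 for i in range(101)]
mle = max(grid, key=lambda p: binomial_likelihood(p, 10, 7))
print(mle)  # 0.7 -- the maximum-likelihood estimate k/n
```

The grid maximum lands at p = k/n = 0.7, matching the closed-form maximum-likelihood estimate for a binomial model.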
Common Probability Distribution Parameterizations in Machine Learning:
Whether a model is fit with Bayesian or frequentist methods, results can differ substantially depending on which distribution is chosen to describe the data, since each distribution encodes different assumptions and is controlled by a different set of parameters.
- Bernoulli distribution – one parameter (success probability p)
- Beta distribution – two parameters (shapes α and β)
- Binomial distribution – two parameters (trial count n and success probability p)
- Exponential distribution – one parameter (rate λ)
- Gamma distribution – two parameters (shape and rate, or shape and scale)
- Geometric distribution – one parameter (success probability p)
- Gaussian (normal) distribution – two parameters (mean μ and variance σ²)
- Lognormal distribution – two parameters (μ and σ of the underlying normal)
- Negative binomial distribution – two parameters (failure count r and success probability p)
- Poisson distribution – one parameter (rate λ)
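A few of the discrete distributions above can be made concrete by writing out their probability mass functions, each fully specified by the parameters listed. The evaluation points below are arbitrary examples.

```python
from math import comb, exp, factorial

def bernoulli_pmf(k, p):        # one parameter: p
    """P(X = k) for a single trial with success probability p."""
    return p if k == 1 else 1 - p

def binomial_pmf(k, n, p):      # two parameters: n, p
    """P(X = k) successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):        # one parameter: lam
    """P(X = k) events, given rate lam."""
    return lam**k * exp(-lam) / factorial(k)

print(bernoulli_pmf(1, 0.3))    # 0.3
print(binomial_pmf(2, 4, 0.5))  # 0.375
print(poisson_pmf(0, 2.0))      # exp(-2), roughly 0.135
```

Each function takes exactly as many distribution parameters as the list indicates, which is what a conjugate update ultimately modifies.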