What are the Probability Interpretations in Machine Learning?
Quantifying the probability of an outcome, the likelihood that an input matches a class, or the confidence in a hypothesis is a basic task for any machine learning model to be useful in the real world, let alone to learn and improve itself. While humans can rely on intuition and their “gut feelings,” artificial intelligence needs a more scientific way to turn these abstract concepts into functions an algorithm can use and learn from. For the purposes of machine learning, regardless of the specific method used to calculate and interpret probability, the interpretations can be divided into two broad categories:
Frequentist (Physical Properties) Probability:
This interpretation assumes that all variables being studied are governed by random processes, at least over a large enough data set. In such an environment, the probability of a future event is given solely by its past frequency of occurrence (prior probability) after repeating the process a large number of times under the same conditions. Frequency probability is also the underlying rationale behind rejecting or failing to reject a null hypothesis in statistical testing.
The primary advantages of this method are the speed with which it can calculate the “odds” of something occurring and the way it eliminates the potential for subjective interference.
The primary downside of this simplified probability approach is that it has no mechanism to incorporate new information unrelated to frequency that might influence the outcome.
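As a rough illustration, the sketch below estimates a probability purely from relative frequency. The coin-flip setup, the “true” bias value, and the trial count are illustrative assumptions, not part of any particular library or model:

```python
import random

random.seed(0)
TRUE_HEADS_PROB = 0.6   # hidden "physical" property of the coin (illustrative assumption)
n_trials = 10_000       # repeat the process many times under the same conditions

# Count how often the event of interest ("heads") occurs across repeated trials.
heads = sum(random.random() < TRUE_HEADS_PROB for _ in range(n_trials))

# Frequentist probability: the relative frequency of the event over many trials.
p_heads = heads / n_trials
print(f"Estimated P(heads) after {n_trials:,} trials: {p_heads:.3f}")
```

Note that the estimate reflects nothing but the observed counts; any outside knowledge about the coin is ignored, which is exactly the limitation described above.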
Bayesian (Evidentiary) Probability:
Bayesian probability makes initial assumptions about the odds (prior probability) that incorporate more information than just relative frequency, including even subjective guesses from human researchers. As evidence arrives, that initial belief is updated, and probability is then expressed in the model not as a definitive yes or no, but as a degree of confidence, or belief (posterior probability).
This approach resolves the issues faced with frequency-based probability and generally increases accuracy when good prior information is available, but it has the downside that a poorly chosen prior can introduce larger margins of error into the results.
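For comparison, here is a minimal sketch of a Bayesian update on the same kind of coin-flip question, using a Beta prior so the posterior has a closed form. The prior parameters and the observed counts are made-up illustrative values:

```python
# Prior belief: Beta(alpha, beta). alpha = beta = 2 encodes a weak, subjective
# guess that the coin is roughly fair (illustrative assumption).
alpha, beta = 2.0, 2.0

# Observed evidence: 7 heads and 3 tails (made-up data for illustration).
heads, tails = 7, 3

# Conjugate update: the posterior is also a Beta distribution.
alpha_post = alpha + heads
beta_post = beta + tails

# The posterior mean expresses the degree of belief that the next flip is heads.
prior_mean = alpha / (alpha + beta)
posterior_mean = alpha_post / (alpha_post + beta_post)
print(f"Prior belief P(heads): {prior_mean:.3f}")
print(f"Posterior belief P(heads): {posterior_mean:.3f}")
```

The output is not a yes-or-no answer but a revised degree of belief, and how far it moves from the prior depends on how much evidence has been observed.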
In practice, though, any approach that yields consistent, reproducible results within a small margin of error can be interpreted under either framework.