What is Probability in a Machine Learning Context?
Probability in deep learning is used to mimic human common sense by allowing a machine to interpret phenomena for which it has no frame of reference. While a human has a lifetime of experiences and various senses to evaluate new information, a deep learning program requires a mathematical representation of logic, intuition and “gut feelings” to function.
In programming terms, probability functions allow the program to concentrate on certain parts of the afferent system (its inputs) while narrowing the scope of possible actions for the efferent system (its outputs).
Bayesian versus Frequentist Probability
Probability theory in machine learning tends to follow one of two models: Bayesian or frequentist. Bayesian probability states that the probability of an event occurring in the future can be inferred from past conditions related to that event. While that might seem like an obvious statement of fact, the Bayesian approach is at odds with another popular statistical school of thought, frequentist probability. Frequentist modeling states that an event’s probability is most accurately measured by its relative frequency, or how often the event occurred in past samples.
For a simplified example, a frequentist probability model would give the odds of the sun rising tomorrow as 100%, since this has occurred every day in recorded history. A Bayesian probability approach would account for uncertainty in prerequisite conditions that may or may not be relevant, such as fluctuations in gravity or the speed of the Earth’s rotation, and assign odds slightly below 100%.
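The contrast can be sketched in a few lines of code. This is my own illustration, not from the article: the frequentist estimate is the raw relative frequency, while the Bayesian estimate shown here is the posterior mean under a uniform Beta(1, 1) prior, known as Laplace’s “rule of succession,” which never commits to absolute certainty.

```python
def frequentist_estimate(successes, trials):
    # Relative frequency: how often the event occurred in past samples.
    return successes / trials

def bayesian_estimate(successes, trials):
    # Posterior mean under a uniform Beta(1, 1) prior on the unknown
    # probability (Laplace's rule of succession): (k + 1) / (n + 2).
    return (successes + 1) / (trials + 2)

n = 10_000  # days of recorded sunrises, every one a "success"
print(frequentist_estimate(n, n))  # 1.0 -- absolute certainty
print(bayesian_estimate(n, n))     # just under 1.0
```

Note that with very little data the two estimates diverge sharply: after a single observed sunrise, the frequentist estimate is already 100%, while the Bayesian posterior mean is a far more cautious 2/3.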
Which answer is more “accurate” is really a philosophical debate. In practical terms, though, Bayesian probability is popular among deep learning practitioners simply because it doesn’t require a vast, well-labeled historical dataset to make future predictions.