Hypergeometric Distribution

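What is the Hypergeometric Distribution?

The hypergeometric distribution is a discrete probability distribution that describes the number of successes in a fixed number of draws made without replacement from a finite population. Because each draw changes what remains in the population, the draws are dependent. Compare this to the binomial distribution, which models the number of successes across independent trials, such as draws made with replacement.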

A Real-World Example

Imagine an urn containing 50 colored balls: 12 are blue and the other 38 are red. You plan to draw 10 balls without looking and without putting any back between draws. What is the probability that exactly 3 of those 10 balls are blue?

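This is the kind of question that the hypergeometric distribution answers. If the balls were replaced after each draw, the binomial distribution would be used instead. Let N be the total population size (50 in our example), K the number of items in the population that have the feature being tracked (12), n the number of draws (10), and k the number of drawn items that have the feature (3). The probability of getting exactly k such items is given by the following plug-and-play formula:

$$P(X = k) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}$$

For the urn example this works out to

$$P(X = 3) = \frac{\binom{12}{3}\binom{38}{7}}{\binom{50}{10}} = \frac{220 \times 12{,}620{,}256}{10{,}272{,}278{,}170} \approx 0.27,$$

or about 27%.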
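
As a quick check on the 27% figure, the calculation can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names `hypergeom_pmf` and `binom_pmf` are made up for this example, and the code assumes Python 3.8 or later, where `math.comb` is available.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """Probability of exactly k marked items in n draws without replacement.

    N: population size, K: marked items in the population,
    n: number of draws, k: marked items among the draws.
    (Hypothetical helper written for this example.)
    """
    # k cannot be negative, exceed the draws or the marked items,
    # or leave more unmarked draws than there are unmarked items.
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Urn example: 50 balls, 12 blue, draw 10, want exactly 3 blue.
p_without = hypergeom_pmf(3, N=50, K=12, n=10)  # no replacement
p_with = binom_pmf(3, n=10, p=12 / 50)          # with replacement

print(f"without replacement: {p_without:.4f}")  # about 0.27
print(f"with replacement:    {p_with:.4f}")     # about 0.24
```

The two results differ because, without replacement, every blue ball removed makes the next blue draw less likely. As the population grows relative to the number of draws, the two figures move closer together.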


Applications in Artificial Intelligence

Statistical machine learning relies heavily on probabilistic relationships among abstract variables in multidimensional spaces, and probability distributions and parameter spaces underlie nearly all of it. Much of AI therefore depends on detecting, replicating, and predicting patterns without any ability to “understand” the underlying concepts, which makes probability distributions and formulas like this one essential.