What is a Cumulative Distribution Function?
A cumulative distribution function (CDF) gives the probability that a random variable takes a value less than or equal to a given point. From it you can also derive the probability that the variable falls above a point or between two points. Similar to a frequency table that counts the accumulated frequency of an occurrence up to a certain value, the CDF tracks the cumulative probability up to a certain threshold.
In algebraic terms, the function accumulates probability from negative infinity up to a value x of the random variable X. It is expressed as:
F(x) = P(X≤x)
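To make the frequency-table analogy concrete, here is a minimal sketch of an empirical CDF built from observed data. The function name `empirical_cdf` and the sample values are hypothetical, chosen only for illustration; the standard-library `bisect_right` call counts how many sorted samples are less than or equal to x.

```python
from bisect import bisect_right


def empirical_cdf(samples):
    """Return a function F where F(x) is the fraction of samples <= x."""
    ordered = sorted(samples)
    n = len(ordered)

    def F(x):
        # bisect_right gives the number of sorted values that are <= x.
        return bisect_right(ordered, x) / n

    return F


# Hypothetical observations, for illustration only.
observations = [2, 3, 3, 5, 7, 7, 7, 9]
F = empirical_cdf(observations)

print(F(3))  # share of observations at or below 3
print(F(7))  # share of observations at or below 7
```

The returned function never decreases as x grows, and it reaches 1 at the largest observation, which mirrors the behavior of a true CDF.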
How is a Cumulative Distribution Function Used?
Besides finding the probability that a random variable falls below a point or between two points, you can find the probability that it falls above a particular threshold. The latter is given by the complementary cumulative distribution function, or tail distribution, which is quite useful in hypothesis testing.
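Both quantities follow directly from the definition F(x) = P(X≤x). Written out, with a and b as arbitrary thresholds introduced here for illustration (a < b):

```latex
% Tail (complementary CDF): probability of exceeding x
P(X > x) = 1 - F(x)

% Interval: probability of landing above a and at or below b
P(a < X \le b) = F(b) - F(a)
```

For a variable that only takes whole-number values, an interval that includes its lower endpoint a is F(b) - F(a - 1).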
For a simple example, if a machine learning logistics program for a hospital used a CDF that tracked venomous snake bite patients, you could determine:
- The probability of receiving more than 12 snake bite patients per year.
- The probability of receiving fewer than 12 snake bite patients per year.
- The probability of receiving between 12 and 15 snake bite patients per year.
In this example, the hospital could more accurately predict how many antivenom doses it should keep in stock.
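The three probabilities in the list above can be sketched in a few lines. This is only an illustration under stated assumptions: the text does not say how patient counts are distributed, so the sketch assumes a Poisson distribution with a hypothetical mean of 10 patients per year. The helper names `poisson_pmf` and `poisson_cdf` are likewise illustrative, and "between 12 and 15" is read as including both endpoints.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson random variable with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def poisson_cdf(k: int, lam: float) -> float:
    """F(k) = P(X <= k): the PMF summed from 0 through k."""
    if k < 0:
        return 0.0
    return sum(poisson_pmf(i, lam) for i in range(k + 1))


# ASSUMPTION: hypothetical yearly mean, not taken from the article.
MEAN_PATIENTS_PER_YEAR = 10.0

# More than 12 patients: the tail, 1 - F(12).
p_more_than_12 = 1.0 - poisson_cdf(12, MEAN_PATIENTS_PER_YEAR)

# Fewer than 12 patients: counts are whole numbers, so this is F(11).
p_fewer_than_12 = poisson_cdf(11, MEAN_PATIENTS_PER_YEAR)

# Between 12 and 15 patients, endpoints included: F(15) - F(11).
p_between_12_and_15 = (
    poisson_cdf(15, MEAN_PATIENTS_PER_YEAR)
    - poisson_cdf(11, MEAN_PATIENTS_PER_YEAR)
)

print(f"P(X > 12)        = {p_more_than_12:.4f}")
print(f"P(X < 12)        = {p_fewer_than_12:.4f}")
print(f"P(12 <= X <= 15) = {p_between_12_and_15:.4f}")
```

With real admission records, the assumed mean would be replaced by an estimate from the data, and a different distribution might fit better. The same three CDF expressions would still apply.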