What are Inferential Statistics?
It is easiest to explain inferential statistics by first defining descriptive statistics. Descriptive statistics summarize only the data that was collected. Inferential statistics use that collected data (a sample) to draw conclusions or make predictions about a larger group (a population) that was not measured in full.
For example, say you survey 100 people and ask each of them whether they like to ride horses. You find that 35 of the 100 people, or 35%, answer yes. A chart of this result is descriptive statistics, because it only summarizes the data collected. If we instead used this survey to conclude that about 35% of the American population likes to ride horses, we would be using inferential statistics. We did not survey all of America, and it would be impractical to do so. Instead, we estimate the population figure from the data we did collect. That estimate is only trustworthy if the 100 people are representative of the population, and even then it carries some uncertainty.
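The difference between describing the sample and inferring about the population can be sketched in a few lines. This assumes the 100 respondents were a simple random sample, which the survey above does not guarantee. A common way to express the uncertainty of the inference is a confidence interval. The function name `proportion_confidence_interval` is hypothetical, and the normal-approximation (Wald) interval used here is only one of several interval methods.

```python
import math

def proportion_confidence_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float, float]:
    """Return (p_hat, lower, upper) using the normal-approximation (Wald) interval.

    Hypothetical helper; assumes a simple random sample.
    """
    p_hat = successes / n                      # sample proportion: a descriptive statistic
    se = math.sqrt(p_hat * (1 - p_hat) / n)    # estimated standard error of p_hat
    margin = z * se                            # z = 1.96 gives roughly 95% coverage
    return p_hat, p_hat - margin, p_hat + margin

# 35 of 100 surveyed people said they like to ride horses.
p_hat, low, high = proportion_confidence_interval(35, 100)
print(f"sample: {p_hat:.0%}, 95% CI for the population: {low:.1%} to {high:.1%}")
```

Here the 35% describes the sample, while the interval of roughly 25.7% to 44.3% is the inferential claim about the population.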
Two Different Purposes
Inferential statistics serve two main purposes. The first is estimating parameters: we compute a statistic from our sample, such as a proportion or a standard deviation, and use it to estimate the corresponding parameter of the entire population. This is what we did above when we used the 35% from our survey to estimate the share of all Americans who like to ride horses.
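As a minimal sketch of parameter estimation, the snippet below uses Python's standard `statistics` module on a small hypothetical sample. The helper name `estimate_parameters` and the data values are illustrative assumptions. Note the two standard-deviation functions: `stdev` divides by n - 1 and is the usual estimator of a population standard deviation from a sample, while `pstdev` divides by n and simply describes the values at hand.

```python
import statistics

def estimate_parameters(sample: list[float]) -> tuple[float, float]:
    """Estimate the population mean and standard deviation from a sample (hypothetical helper)."""
    mean_hat = statistics.mean(sample)
    # stdev() divides by n - 1 (Bessel's correction), the usual choice
    # when the data are a sample rather than the whole population.
    sd_hat = statistics.stdev(sample)
    return mean_hat, sd_hat

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample values
mean_hat, sd_hat = estimate_parameters(data)
print(mean_hat, round(sd_hat, 3))          # inferential: estimates for the population
print(round(statistics.pstdev(data), 3))   # descriptive: spread of these 8 values only
```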
The second purpose is hypothesis testing. Hypothesis tests are especially useful when something can only be tried on a small group, such as a new diabetes drug given to a limited number of trial participants. To judge whether the drug is likely to work for all diabetes patients (the entire population), we test the trial data against a hypothesis. This is often done by calculating a test statistic such as a z-score and asking how likely a result that extreme would be if the drug had no effect.
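A simple version of this idea is a one-proportion z-test, sketched below under loudly hypothetical numbers: 60 of 100 trial patients improve, and the assumed baseline improvement rate without the drug is 50%. The function name `one_proportion_z_test` is illustrative. Real clinical trials usually compare a treatment group against a control group with more careful designs, so treat this only as a demonstration of how a z-score feeds into a test.

```python
import math
from statistics import NormalDist

def one_proportion_z_test(successes: int, n: int, p0: float) -> tuple[float, float]:
    """Two-sided z-test of H0: population proportion == p0 (normal approximation)."""
    p_hat = successes / n
    se0 = math.sqrt(p0 * (1 - p0) / n)             # standard error if H0 were true
    z = (p_hat - p0) / se0                         # distance from p0 in standard errors
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # chance of a result at least this extreme under H0
    return z, p_value

# Hypothetical trial: 60 of 100 patients improve; assumed baseline rate is 50%.
z, p = one_proportion_z_test(60, 100, 0.50)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With these made-up numbers the script prints `z = 2.00, p = 0.0455`, which would conventionally count as evidence against "no effect" at the 5% level.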
Application in AI
AI systems face the same limitation we do: they cannot observe the entire world before making a decision about a particular subject. Instead, they learn from a sample of data and make an educated guess, or inference, about everything else. If that sample is too small, incomplete, or biased, the AI may base its “assumptions” on a skewed picture of the world and draw incorrect conclusions.
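A small simulation can show how a biased sample misleads any inference built on it, whether by a person or a model. The population below is entirely hypothetical: 2,000 rural people, of whom 75% like riding horses, and 8,000 urban people, of whom 25% do, for 35% overall. A random sample recovers roughly the true share, while a sample drawn only from the rural subgroup overestimates it badly. This illustrates sampling bias in general and is not a description of how any particular AI system collects data.

```python
import random

# Hypothetical population of 10,000 (region, likes_horses) records:
# rural 1,500 of 2,000 like horses; urban 2,000 of 8,000 do; 35% overall.
population = (
    [("rural", True)] * 1500 + [("rural", False)] * 500
    + [("urban", True)] * 2000 + [("urban", False)] * 6000
)

def share_who_like_horses(people):
    """Fraction of records whose likes_horses flag is True."""
    return sum(likes for _, likes in people) / len(people)

rng = random.Random(42)  # fixed seed so the run is reproducible

random_sample = rng.sample(population, 500)            # every person equally likely
rural_only = [p for p in population if p[0] == "rural"]
biased_sample = rng.sample(rural_only, 500)            # only one subgroup is ever seen

print(f"true share:    {share_who_like_horses(population):.1%}")
print(f"random sample: {share_who_like_horses(random_sample):.1%}")
print(f"biased sample: {share_who_like_horses(biased_sample):.1%}")
```

Both samples have the same size; only the way they were drawn differs, which is why collecting more biased data does not fix a biased inference.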