Skewness

What is Skewness?

Skewness is a quantifiable measure of how much a data sample departs from the symmetry of the normal distribution. In a normal distribution, the data are represented graphically by a bell-shaped curve in which the mean (average), median, and mode (the most frequently occurring value in the data set) are all equal.
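
A common way to quantify this is the moment-based coefficient of skewness: the expected cubed deviation of the data from its mean, measured in units of the standard deviation:

    skewness = E[ ((X - mean) / standard deviation)^3 ]

A perfectly symmetric distribution, such as the normal, has a skewness of 0; negative values indicate a longer left tail and positive values a longer right tail.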

If the mean of the data distribution is less than the mode, the longer tail of the curve stretches to the left of the mode, which is called a “negative skew” (or left skew).

If the mean of the data distribution is greater than the mode, the longer tail of the curve stretches to the right of the mode, which is called a “positive skew” (or right skew).
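
As a minimal sketch of this sign convention (assuming NumPy and SciPy are available, with illustrative sample sizes and distributions), a sample whose long tail points right reports a positive skewness, and one whose long tail points left reports a negative skewness:

    import numpy as np
    from scipy.stats import skew

    rng = np.random.default_rng(0)

    # Exponential data has a long right tail, so its sample skewness is positive.
    right_skewed = rng.exponential(scale=1.0, size=10_000)

    # Negating the same kind of data puts the long tail on the left,
    # so its sample skewness is negative.
    left_skewed = -rng.exponential(scale=1.0, size=10_000)

    print(f"right-skewed sample: {skew(right_skewed):+.2f}")  # roughly +2
    print(f"left-skewed sample:  {skew(left_skewed):+.2f}")   # roughly -2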

How is Skewness Used in Machine Learning?

In most models, strong skewness is undesirable, since it can inflate the variance of estimates and let a handful of extreme values dominate the fit. Other models assume roughly Gaussian (normally distributed) data or errors to function accurately. In either case, the goal is to reduce skewness so the data is as close as possible to a normal distribution (normalizing the data), by applying “transformations” such as taking the reciprocal (inverse), logarithm, or square root of all the data points.
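
A minimal sketch of such a transformation, assuming NumPy and SciPy and an illustrative log-normal sample: the raw data is strongly right-skewed, a square-root transform reduces the skew, and a log transform brings it close to zero:

    import numpy as np
    from scipy.stats import skew

    rng = np.random.default_rng(0)

    # Log-normal data is strongly right-skewed and strictly positive,
    # so both the square-root and log transforms are valid here.
    sample = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)

    print(f"original:    {skew(sample):.2f}")           # strongly positive
    print(f"square root: {skew(np.sqrt(sample)):.2f}")  # reduced
    print(f"logarithm:   {skew(np.log(sample)):.2f}")   # close to 0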