Kruskal-Wallis Test

What Is The Kruskal-Wallis Test?

The Kruskal-Wallis test, named after statisticians William Kruskal and W. Allen Wallis, is a rank-based method for testing whether two or more independent samples originate from the same distribution. The samples may be of equal or different sizes, and when the groups have similarly shaped distributions the test is commonly interpreted as a comparison of their medians. The test is referred to as "non-parametric" or "distribution-free," meaning it does not assume that the data follow a particular distribution, such as the normal distribution. The parametric equivalent of the Kruskal-Wallis test is the one-way analysis of variance, or "ANOVA" for short. A significant result indicates that at least one sample stochastically dominates another (i.e., values drawn from one group tend to be larger than values drawn from another).

K = number of comparison groups
N = total sample size
nᵢ  = sample size in the ith group
Rᵢ  = sum of the ranks of the ith group
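
These symbols belong to the test statistic, usually written H. As a reference, its standard textbook form, assuming no tied observations, can be written as follows (ranks are assigned from 1 to N across all groups pooled together):

```latex
H = \frac{12}{N(N+1)} \sum_{i=1}^{K} \frac{R_i^{2}}{n_i} - 3(N+1)
```

When all groups come from the same distribution, H approximately follows a chi-square distribution with K - 1 degrees of freedom, so large values of H count as evidence against the null hypothesis.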

How Does Machine Learning Use the Kruskal-Wallis Test?

The Kruskal-Wallis test fits into the area of machine learning associated with understanding and analyzing data. For example, one may want to examine whether the test scores of high school students differ by socioeconomic status. The independent variable is socioeconomic status, with three groups (working class, middle class, and wealthy), and the dependent variable is the test score, ranging from 0 to 100%.

The test compares several independent samples to determine whether they plausibly come from the same distribution. The null hypothesis states that all group medians are equal, whereas the alternative hypothesis states that at least one of the groups differs from the others. In machine learning, the Kruskal-Wallis test is used to check whether there is a statistically significant difference between groups; however, the test does not tell you which groups are different. To find that out, one would need to run a post hoc test.
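
The following is a minimal sketch of the whole procedure for the socioeconomic example, using only the Python standard library. The scores are made up for illustration, the function names `average_ranks` and `kruskal_wallis_h` are hypothetical, and no tie correction is applied. The p-value relies on the chi-square approximation; with three groups there are 2 degrees of freedom, in which case the chi-square tail probability reduces to exp(-H/2).

```python
import math


def average_ranks(values):
    """Rank values from 1 to N; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # positions are 0-based, ranks are 1-based
        for j in range(pos, end + 1):
            ranks[order[j]] = avg
        pos = end + 1
    return ranks


def kruskal_wallis_h(groups):
    """H statistic without tie correction (a sketch, not a library API)."""
    pooled = [x for g in groups for x in g]
    n_total = len(pooled)                    # N
    ranks = average_ranks(pooled)
    total = 0.0
    start = 0
    for g in groups:
        n_i = len(g)                         # sample size of group i
        r_i = sum(ranks[start:start + n_i])  # rank sum of group i
        total += r_i ** 2 / n_i
        start += n_i
    return 12.0 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)


# Hypothetical test scores (percent), invented for this example only.
working = [62, 70, 58, 75, 66]
middle = [72, 81, 68, 85, 77]
wealthy = [88, 79, 92, 83, 95]

h = kruskal_wallis_h([working, middle, wealthy])

# Chi-square survival function for 2 degrees of freedom (K = 3 groups only).
p_value = math.exp(-h / 2)

print(f"H = {h:.2f}, p = {p_value:.4f}")  # H = 9.68, p = 0.0079
if p_value < 0.05:
    print("Reject the null hypothesis: at least one group differs.")
else:
    print("No significant difference detected between groups.")
```

With these invented scores the null hypothesis is rejected at the 0.05 level, but the output says nothing about which pair of groups differs; that question is left to a post hoc test. For a different number of groups, or for data with many ties, a statistics library with a general chi-square distribution and tie correction would be the more appropriate tool.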