Kruskal-Wallis Test

What is the Kruskal-Wallis Test?

The Kruskal-Wallis test is a non-parametric statistical test used to determine whether there are statistically significant differences between two or more independent groups on a continuous or ordinal dependent variable. It is the non-parametric alternative to the one-way ANOVA and extends the Mann-Whitney U test to more than two groups. Because it works on ranks rather than raw values, the test does not assume normally distributed residuals, which makes it a good choice when the data do not meet the assumptions of ANOVA.

When to Use the Kruskal-Wallis Test

The Kruskal-Wallis test is appropriate in the following circumstances:

You are comparing three or more independent groups (with exactly two groups, the test is equivalent to the Mann-Whitney U test).
The dependent variable is continuous or ordinal.
The distributions of the populations from which the samples are drawn are not assumed to be normal.
The samples are independent of one another.
The measurement scale of the data is at least ordinal.

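It is particularly useful when sample sizes are small or when the data are skewed and do not meet the normality assumption. With very small groups, however, the chi-squared approximation used to obtain the p-value can be rough, and exact or permutation-based p-values are preferable.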

How the Kruskal-Wallis Test Works

The Kruskal-Wallis test pools the observations from all groups and ranks them together. It then calculates the sum of the ranks within each group. The test statistic, denoted H, is calculated from these rank sums, the total number of observations, and the number of observations in each group. Under the null hypothesis, H approximately follows a chi-squared distribution with degrees of freedom equal to the number of groups minus one. This allows us to determine the p-value, the probability of observing a value of H at least as large as the one obtained if the null hypothesis were true.
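
As a sketch of the calculation described above, the commonly cited form of the statistic, ignoring any correction for ties, can be written as follows. The symbols are notation introduced here for illustration: k is the number of groups, n_i is the number of observations in group i, N is the total number of observations, and R_i is the sum of the ranks in group i.

```latex
H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^{2}}{n_i} - 3(N+1)
```

For instance, with three groups of three observations each and rank sums of 6, 15, and 24, this gives H = (12/90)(12 + 75 + 192) - 30 = 7.2.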

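The null hypothesis (H0) for the Kruskal-Wallis test is that the population medians of all of the groups are equal. The alternative hypothesis (Ha) is that at least one population median differs from the others. Strictly, this interpretation in terms of medians holds when the group distributions have the same shape; more generally, the test assesses whether all groups come from the same distribution.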

Calculating the Kruskal-Wallis Test

To perform the Kruskal-Wallis test, the following steps are taken:

Rank all the data from all groups together, assigning the average rank in case of ties.
Calculate the sum of the ranks for each group.
Use the rank sums to calculate the test statistic H using the Kruskal-Wallis formula.
Determine the degrees of freedom, which is the number of groups minus one.
Use the chi-squared distribution (a table or statistical software) to find the p-value associated with the calculated H statistic and the degrees of freedom.

If the p-value is less than the chosen significance level (typically 0.05), then the null hypothesis is rejected, indicating that there is a statistically significant difference between the groups.
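
The steps above can be sketched in Python using only the standard library. This is a minimal illustration rather than a reference implementation: the function names (average_ranks, chi2_sf, kruskal_wallis) and the sample data are hypothetical, the tie correction shown is the usual one that divides H by 1 - sum(t^3 - t) / (N^3 - N), and the p-value relies on the chi-squared approximation, which may be inaccurate for very small groups.

```python
import math
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable with integer df."""
    y, a = x / 2, df / 2
    if df % 2 == 0:
        q, a0 = math.exp(-y), 1.0              # closed form for df = 2
    else:
        q, a0 = math.erfc(math.sqrt(y)), 0.5   # closed form for df = 1
    # Recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    while a0 < a:
        q += y ** a0 * math.exp(-y) / math.gamma(a0 + 1)
        a0 += 1
    return q


def kruskal_wallis(groups):
    """Return (H, degrees of freedom, approximate p-value).

    Assumes at least two non-empty groups and that not all values are equal.
    """
    pooled = [v for g in groups for v in g]
    n_total = len(pooled)
    ranks = average_ranks(pooled)              # rank all data together

    total, start = 0.0, 0
    for g in groups:                           # rank sum per group
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)

    # Tie correction: t is the number of observations sharing each value.
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    h /= 1 - ties / (n_total ** 3 - n_total)

    df = len(groups) - 1                       # number of groups minus one
    return h, df, chi2_sf(h, df)


# Hypothetical sample data: three groups of three observations.
h, df, p = kruskal_wallis([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f"H = {h:.3f}, df = {df}, p = {p:.4f}")  # H = 7.200, df = 2, p = 0.0273
```

With this sample data the p-value falls below 0.05, so the null hypothesis would be rejected at that significance level.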

Limitations of the Kruskal-Wallis Test

While the Kruskal-Wallis test is useful when the assumptions of parametric tests are not met, it does have some limitations:

The test does not indicate which groups are different from each other, only that at least one group is different.
Because it uses only the ranks of the observations, it does not take into account the magnitude of the differences between them.
It can be less powerful than ANOVA when the data actually meets the assumptions of ANOVA.

When the Kruskal-Wallis test indicates a significant result, post-hoc tests such as Dunn's test, usually with a correction for multiple comparisons, may be used to determine which specific groups differ from each other.
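
To illustrate the kind of post-hoc comparison mentioned above, the following is a minimal sketch of Dunn's test in its commonly cited form, which compares mean ranks between each pair of groups using a normal approximation. Several things here are assumptions made for illustration: the function name dunn_test and the sample data are hypothetical, the Bonferroni adjustment is one possible correction among several, and the standard error omits the tie correction, so results may differ slightly from those of statistical packages when the data contain ties.

```python
import math
from itertools import combinations


def dunn_test(groups):
    """Return {(i, j): (z, Bonferroni-adjusted p)} for every pair of groups."""
    pooled = [v for g in groups for v in g]
    n_total = len(pooled)

    def rank(x):
        # Average rank of x among all pooled observations (handles ties).
        less = sum(v < x for v in pooled)
        equal = sum(v == x for v in pooled)
        return less + (equal + 1) / 2

    mean_ranks = [sum(rank(x) for x in g) / len(g) for g in groups]
    n_pairs = len(groups) * (len(groups) - 1) // 2

    results = {}
    for i, j in combinations(range(len(groups)), 2):
        # Standard error of the difference in mean ranks (no tie correction).
        se = math.sqrt(
            n_total * (n_total + 1) / 12
            * (1 / len(groups[i]) + 1 / len(groups[j]))
        )
        z = (mean_ranks[i] - mean_ranks[j]) / se
        p = math.erfc(abs(z) / math.sqrt(2))       # two-sided normal p-value
        results[(i, j)] = (z, min(1.0, p * n_pairs))  # Bonferroni adjustment
    return results


# Hypothetical sample data: three groups of three observations.
results = dunn_test([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
for pair, (z, p_adj) in results.items():
    print(pair, f"z = {z:.3f}, adjusted p = {p_adj:.4f}")
```

With this sample data, only the comparison between the first and third groups has an adjusted p-value below 0.05.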

Conclusion

The Kruskal-Wallis test is a valuable tool for non-parametric statistical analysis, allowing medians to be compared across multiple groups. It provides a method for determining whether there are significant differences when the data do not meet the assumptions necessary for parametric tests. However, researchers should be aware of its limitations and may need to perform additional tests to fully understand the nature of the differences between groups.