Chi-squared Distribution

Understanding the Chi-Squared Distribution

The chi-squared (χ²) distribution is a fundamental probability distribution in statistics that arises most often in hypothesis testing and in the construction of confidence intervals. It is a special case of the gamma distribution and one of the most widely used distributions in inferential statistics, notably in chi-squared tests.

Definition of the Chi-Squared Distribution

The chi-squared distribution is defined as the distribution of a sum of the squares of k independent standard normal random variables. A standard normal random variable has a mean of 0 and a variance of 1. The parameter k is known as the degrees of freedom of the chi-squared distribution.
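
The definition can be checked numerically. The following is a minimal sketch, not a reference implementation: it draws sums of k squared standard normal values using Python's standard library and compares an empirical probability with an exact one. The seed, the sample size, and the helper name draw_sum_of_squares are arbitrary choices made for illustration. The comparison relies on the known fact that for k = 2 the distribution coincides with an exponential distribution with mean 2, so the probability of a value at most x is 1 - e^(-x/2).

```python
import math
import random

def draw_sum_of_squares(k, rng):
    """One draw of the sum of k squared independent N(0, 1) values."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))

rng = random.Random(0)   # fixed seed (arbitrary) for reproducibility
k = 2                    # degrees of freedom
n_draws = 50_000         # sample size (arbitrary)
draws = [draw_sum_of_squares(k, rng) for _ in range(n_draws)]

# Empirical vs. exact probability that the sum is at most 2 (valid for k = 2 only).
empirical = sum(d <= 2.0 for d in draws) / n_draws
exact = 1.0 - math.exp(-1.0)
print(f"empirical: {empirical:.3f}, exact: {exact:.3f}")
```

The two printed values should agree to within sampling error, which shrinks as n_draws grows.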

Mathematical Formulation

If Z₁, Z₂, ..., Z_k are k independent, standard normal random variables, then the sum of their squares,

Q = Z₁² + Z₂² + ... + Z_k²

is distributed according to the chi-squared distribution with k degrees of freedom. The probability density function (pdf) of the chi-squared distribution is given by:

f(x; k) = x^(k/2 - 1) e^(-x/2) / (2^(k/2) Γ(k/2)) for x > 0,

where Γ denotes the gamma function, which extends the factorial function to real and complex numbers, so that Γ(n) = (n - 1)! for every positive integer n.
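
As a hedged illustration, the density can be evaluated directly with Python's math module. The function name chi2_pdf and the helper integrate_pdf are hypothetical, and the integration range and step count are arbitrary. The sketch works on the log scale, which is a design choice made here to avoid overflow in Γ(k/2) for large k, and it returns 0 for x ≤ 0 because the formula is stated only for x > 0.

```python
import math

def chi2_pdf(x, k):
    """Chi-squared density with k degrees of freedom (0 for x <= 0 in this sketch)."""
    if x <= 0:
        return 0.0
    # log f = (k/2 - 1) log x - x/2 - (k/2) log 2 - log Γ(k/2)
    log_pdf = ((k / 2 - 1) * math.log(x) - x / 2
               - (k / 2) * math.log(2) - math.lgamma(k / 2))
    return math.exp(log_pdf)

def integrate_pdf(k, upper=60.0, steps=6000):
    """Midpoint-rule approximation of the integral of the density over (0, upper)."""
    h = upper / steps
    return sum(chi2_pdf((i + 0.5) * h, k) for i in range(steps)) * h

# For k = 2 the density reduces to (1/2) e^(-x/2).
print(chi2_pdf(3.0, 2), 0.5 * math.exp(-1.5))
# The density should integrate to approximately 1.
print(integrate_pdf(4))
```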

Properties of the Chi-Squared Distribution

The chi-squared distribution has several important properties:

Shape: The shape of the chi-squared distribution depends on the degrees of freedom. For small k the distribution is strongly right-skewed; as k increases, it becomes more symmetric and is well approximated by a normal distribution with mean k and variance 2k.
Mean and Variance: The mean of the chi-squared distribution is equal to the degrees of freedom (k), and the variance is twice the degrees of freedom (2k).
Additivity: If two independent random variables are chi-squared distributed with degrees of freedom k₁ and k₂, their sum is also chi-squared distributed with degrees of freedom k₁ + k₂.
Non-Negativity: Since a chi-squared random variable is a sum of squared quantities, it takes only non-negative values.
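
The mean, variance, additivity, and non-negativity properties can be illustrated with a short simulation. This is a sketch under stated assumptions rather than a proof: the degrees of freedom (3 and 5), the seed, the sample size, and the helper name chi2_draw are all arbitrary choices. If the properties hold, the sum of the two independent variables should behave like a chi-squared variable with 8 degrees of freedom, with mean close to 8 and variance close to 16.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed (arbitrary) for reproducibility

def chi2_draw(k):
    """One chi-squared draw, built as a sum of k squared N(0, 1) values."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))

k1, k2, n = 3, 5, 50_000
# Each entry is the sum of two independent chi-squared draws.
sums = [chi2_draw(k1) + chi2_draw(k2) for _ in range(n)]

mean_est = statistics.fmean(sums)     # theory: k1 + k2 = 8
var_est = statistics.variance(sums)   # theory: 2 * (k1 + k2) = 16
min_value = min(sums)                 # theory: never negative

print(f"mean ~ {mean_est:.2f}, variance ~ {var_est:.2f}, minimum = {min_value:.4f}")
```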

Applications in Statistics

The chi-squared distribution is used extensively in hypothesis testing. Its most common applications are the chi-squared test of independence in contingency tables and the chi-squared goodness-of-fit test. These tests allow statisticians to determine, respectively, whether there is a significant association between two categorical variables and whether sample data are consistent with a population that follows a specified distribution.
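
For the test of independence, a minimal sketch of the computation might look as follows. It assumes the standard Pearson statistic, the sum over cells of (observed - expected)² / expected, with expected counts formed from the row and column totals and (r - 1)(c - 1) degrees of freedom for an r × c table. The function name independence_statistic and the counts in the table are made up for illustration.

```python
def independence_statistic(table):
    """Pearson chi-squared statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

table = [[30, 10],
         [20, 40]]  # hypothetical counts
stat, dof = independence_statistic(table)
print(f"statistic = {stat:.4f}, degrees of freedom = {dof}")
```

For this made-up table the expected counts are 20, 20, 30, and 30, so the statistic is 5 + 5 + 10/3 + 10/3 = 50/3 with 1 degree of freedom.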

In these tests, the chi-squared statistic is calculated from the data, and the p-value is the probability that a chi-squared random variable with the appropriate degrees of freedom is at least as large as the observed statistic. A small p-value (below the chosen significance level) indicates that data this extreme would be unlikely if the null hypothesis were true, which leads to its rejection.
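
The following sketch shows one way to turn a statistic into a p-value for a goodness-of-fit test, using only the standard library. The upper-tail probability is computed from a series expansion of the regularized lower incomplete gamma function; this is one possible method chosen for illustration, and it loses accuracy for very large statistics because of cancellation in 1 - cdf. The function name chi2_sf and the die-roll counts are hypothetical.

```python
import math

def chi2_sf(x, k):
    """Upper-tail probability P(Q >= x) for Q chi-squared with k degrees of freedom."""
    if x <= 0:
        return 1.0
    a, y = k / 2, x / 2
    # Series: P(a, y) = y^a e^(-y) / Γ(a + 1) * sum_n y^n / ((a + 1) ... (a + n))
    term = 1.0
    total = 1.0
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= y / (a + n)
        total += term
    cdf = total * math.exp(a * math.log(y) - y - math.lgamma(a + 1))
    return max(0.0, 1.0 - cdf)

# Hypothetical data: 60 rolls of a die, tested against a fair die.
observed = [5, 8, 9, 8, 10, 20]
expected = [sum(observed) / len(observed)] * len(observed)  # 10 per face
stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
dof = len(observed) - 1  # number of categories minus one
p_value = chi2_sf(stat, dof)
print(f"statistic = {stat:.2f}, dof = {dof}, p-value = {p_value:.4f}")
```

With these made-up counts the statistic is 13.4 on 5 degrees of freedom and the p-value comes out near 0.02, so the hypothesis of a fair die would be rejected at an assumed significance level of 0.05.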

Assumptions and Limitations

For chi-squared tests to be valid, certain assumptions must be met: the observations should be independent, and the expected frequency in each cell of a contingency table should be sufficiently large (a common rule of thumb is at least 5). Additionally, the chi-squared distribution is continuous, whereas a test statistic computed from counts is discrete; the chi-squared distribution is therefore only an approximation to the true distribution of the statistic, and the approximation improves as the sample size grows.
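
A simple way to screen a contingency table against the expected-frequency rule of thumb is sketched below. The function name small_expected_cells, the default threshold of 5, and the example counts are assumptions made for illustration; other thresholds are also used in practice.

```python
def small_expected_cells(table, threshold=5.0):
    """Return (row, column, expected) for cells whose expected count is below threshold."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    flagged = []
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            expected = r * c / n  # expected count under independence
            if expected < threshold:
                flagged.append((i, j, expected))
    return flagged

table = [[12, 3],
         [9, 1]]  # hypothetical counts
for i, j, expected in small_expected_cells(table):
    print(f"cell ({i}, {j}) has expected count {expected:.1f}")
```

Here both cells in the second column are flagged (expected counts 2.4 and 1.6), which suggests that the chi-squared approximation may be unreliable for this table.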

Conclusion

The chi-squared distribution plays a pivotal role in statistical inference, particularly in tests of significance. Its properties and applications make it a cornerstone of statistical methods used across various fields, from social sciences to biology, and from market research to engineering. Understanding the chi-squared distribution and its proper application is essential for interpreting the results of statistical tests and making informed decisions based on data.