Evaluating Independence and Conditional Independence Measures
Independence and Conditional Independence (CI) are two fundamental concepts in probability and statistics, and they can be applied to many central problems of statistical inference. Many independence and CI measures exist, defined from diverse principles and concepts. In this paper, 16 independence measures and 16 CI measures are reviewed and then evaluated with simulated and real data. For the independence measures, eight simulated datasets were generated from the normal distribution and from normal and Archimedean copula functions to compare the measures in bivariate or multivariate, linear or nonlinear settings. Two UCI datasets, the heart disease data and the wine quality data, were used to test the power of the independence measures under real conditions. For the CI measures, two simulated datasets (one with the normal distribution and one with the Gumbel copula) and one real dataset (the Beijing air data) were used to test the measures in prespecified linear or nonlinear settings and in a real scenario. The experimental results show that most of the measures work well on the simulated data, in the sense that they reproduce the monotonicity built into the simulations. On the more complex real data, however, both the independence measures and the CI measures diverged from one another, and only a few can be considered to work well when judged against domain knowledge. We also found that the measures tend to fall into groups according to the similarity of their behavior, both within each setting and overall. Based on the experiments, we recommend CE as a good choice of both independence measure and CI measure. This recommendation is also supported by its rigorous distribution-free definition and its consistent nonparametric estimator.
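The monotonicity check described for the simulated data can be illustrated with a minimal sketch. It assumes a bivariate normal simulation in which the correlation `rho` is increased step by step, and it uses distance correlation purely as a stand-in independence measure. The function names, sample size, seed, and grid of `rho` values are hypothetical choices made for illustration; this is not the paper's experimental code and not the CE estimator.

```python
import numpy as np


def distance_correlation(x, y):
    """Sample distance correlation (V-statistic form) between x and y."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    # Pairwise Euclidean distance matrices within each sample.
    a = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    b = np.linalg.norm(y[:, None, :] - y[None, :, :], axis=-1)
    # Double centering: subtract row and column means, add back the grand mean.
    A = a - a.mean(axis=0, keepdims=True) - a.mean(axis=1, keepdims=True) + a.mean()
    B = b - b.mean(axis=0, keepdims=True) - b.mean(axis=1, keepdims=True) + b.mean()
    dcov2 = (A * B).mean()    # squared distance covariance
    dvar_x2 = (A * A).mean()  # squared distance variance of x
    dvar_y2 = (B * B).mean()  # squared distance variance of y
    denom = np.sqrt(dvar_x2 * dvar_y2)
    if denom == 0:
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def simulate_bivariate_normal(rho, n, rng):
    """Draw n samples from a standard bivariate normal with correlation rho."""
    cov = [[1.0, rho], [rho, 1.0]]
    return rng.multivariate_normal([0.0, 0.0], cov, size=n)


# Hypothetical settings: seed, sample size, and correlation grid are assumptions.
rng = np.random.default_rng(0)
rhos = [0.0, 0.3, 0.6, 0.9]
scores = []
for rho in rhos:
    xy = simulate_bivariate_normal(rho, 500, rng)
    scores.append(distance_correlation(xy[:, 0], xy[:, 1]))

# A measure "presents the right monotonicity" here if its value rises with rho.
is_monotone = all(s1 < s2 for s1, s2 in zip(scores, scores[1:]))
for rho, score in zip(rhos, scores):
    print(f"rho={rho:.1f}  measure={score:.3f}")
print("monotone in rho:", is_monotone)
```

Any other measure can be swapped in for `distance_correlation` as long as it accepts two samples and returns a scalar, which makes it straightforward to compare several measures on the same simulated series.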