Detecting statistical dependence between random variables is a fundamental problem of statistics. The simplest scenario is detecting linear or monotonic univariate relationships, where Pearson's (1895) r, Spearman's ρ, or Kendall's τ can serve as test statistics. Often, however, researchers need to detect nonlinear relationships between multivariate variables. In recent years, many nonlinear statistical dependence indicators have been developed: distance-based methods such as distance or Brownian correlation (dCor) (Székely et al., 2007; Székely and Rizzo, 2009), mutual information (I)-based methods with different estimators (Kraskov et al., 2004; Pál et al., 2010; Steuer et al., 2002), and kernel-based methods such as the Hilbert-Schmidt Independence Criterion (HSIC) (Gretton et al., 2005, 2008) and the Finite Set Independence Criterion (FSIC) (Jitkrittum et al., 2016). Another dependence indicator is the Maximal Information Coefficient (MIC) (Reshef et al., 2011, 2013). It was also recently shown that the distance correlation (dCor) can be cast in the framework of reproducing-kernel-Hilbert-space (RKHS)-based statistics (Sejdinovic et al., 2013).
There is no free lunch: any indicator will outperform any other indicator given data whose dependence structure it is better suited to detect. However, it is desirable to develop indicators that adapt to the grain of the dependency structure and to the amount of data available, so as to maintain robust power across relationships found in real applications. Except for FSIC, the established methods are not adaptive. Some of them are sensitive to the setting of hyperparameters, or have low statistical power for detecting important nonlinear relationships (Simon and Tibshirani, 2014).
Here we propose a family of adaptive distance-based independence test statistics inspired by two ideas: (1) Representational geometries can be compared by correlating distance matrices (Kriegeskorte and Kievit, 2013). (2) We can relax the constraint of linear correlation of the distances by nonlinearly transforming the distance matrices, such that they capture primarily neighbor relationships. Such a transformed (e.g. thresholded) distance matrix captures the topology, rather than the geometry. Detecting matching topologies between the two spaces X and Y will indicate statistical dependency.
We show analytically that a family of such geo-topological relatedness indicators are 0 (in the limit of infinite data) if and only if the multivariate variables X and Y are statistically independent. The geo-topological indicators are based on the distance correlation, computed after a parametrized monotonic transformation of the distance matrices for the spaces X and Y. We use an adaptive search framework to automatically select the parameters of the monotonic transform so as to maximize the distance correlation. We show that monotonic nonlinear operators like the proposed geo-topological transformation belong to a separable space that can be understood as an RKHS-based kernel indicator of dependency.
The adaptive threshold search renders the dependence test robustly powerful across a wide spectrum of scenarios and across different noise amplitudes and sample sizes, while guaranteeing (via permutation test) that the specificity is controlled at a false positive rate of 5%.
We begin our presentation in section 2, with a short overview of the distance correlation. We then introduce the geo-topological transform, the adaptive threshold-search, as well as the algorithms and theoretical background. In section 3, we empirically demonstrate that our approach consistently outperforms competing dependence measures, including dCor and HSIC, across a range of simulated benchmark data sets (from detecting univariate correlation to high-dimensional nonlinear relationships).
2 Independence Criteria and Statistical Tests
We now describe a family of adaptive independence tests for two random variables, based on test statistics generated by an adaptive parameter-search framework with a geo-topological transformation. We begin with a brief introduction to the framework and terminology of statistical hypothesis testing. Given the independent and identically distributed (i.i.d.) sample, with each row corresponding to an observation of both variables, the statistical test is used to distinguish between the null hypothesis H0 (independence) and the alternative hypothesis H1. The test can be performed by locating the test statistic in its distribution under H0, estimated using a permutation procedure (Gretton et al., 2008). We first introduce the distance correlation (the core component of our adaptive procedure), and then the adaptive threshold-search framework and the procedure for the geo-topological transformation.
2.1 Distance correlation
Distance covariance was introduced by Székely et al. (2007) to test dependence between X and Y in terms of a weighted L2 distance between the joint characteristic function of (X, Y) and the product of the marginal characteristic functions. It can be computed in terms of certain expectations of pairwise Euclidean distances:

$$\mathrm{dCov}^2(X,Y) \;=\; \mathbb{E}\big[\|X-X'\|\,\|Y-Y'\|\big] \;+\; \mathbb{E}\big[\|X-X'\|\big]\,\mathbb{E}\big[\|Y-Y'\|\big] \;-\; 2\,\mathbb{E}\big[\|X-X'\|\,\|Y-Y''\|\big],$$

where $(X,Y)$, $(X',Y')$, and $(X'',Y'')$ are i.i.d. copies. Lyons (2013) generalized this result, showing that if the metrics on the two spaces are of strong negative type, the distance correlation in a metric space characterizes independence: dCor(X, Y) = 0 if and only if X and Y are independent.
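For concreteness, the empirical distance correlation can be computed from double-centered distance matrices. The following is a minimal sketch (the function name and implementation details are ours, not the paper's):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def dcor(x, y):
    """Empirical distance correlation of two paired samples."""
    x = np.asarray(x, float).reshape(len(x), -1)
    y = np.asarray(y, float).reshape(len(y), -1)

    def centered(d):
        # double-center: subtract row/column means, add back the grand mean
        return d - d.mean(0) - d.mean(1)[:, None] + d.mean()

    a = centered(squareform(pdist(x)))
    b = centered(squareform(pdist(y)))
    dcov2 = (a * b).mean()
    denom = np.sqrt((a * a).mean() * (b * b).mean())
    return np.sqrt(max(dcov2, 0.0) / denom) if denom > 0 else 0.0
```

For a perfectly linear univariate relationship the statistic equals 1, and for independent samples it approaches 0 as the sample grows.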
2.2 Adaptive Distance-Based Independence Criterion (ADIC)
We propose a novel family of algorithms which extend the standard distance correlation in the following ways:
Geo-topological transformation: The algorithm performs a transformation on the distance matrix before computing the distance covariance. The transformation is intended to focus sensitivity on neighborhood relationships (emphasizing topology relative to geometry), so as to more robustly capture the mutual information.
Adaptive parameter search: Instead of manually setting the parameters for the geo-topological transformation, the algorithm adaptively selects threshold parameters, using the maximum of the distance covariance over all parameter settings as the test statistic.
We now describe an algorithmic scheme (Algorithm 1) which gives rise to a family of algorithms aiming at computing the test statistic for independence testing.
Consider the inputs X and Y: the algorithm computes the shared-neighbor-pair inflation (snpInf) for each pair of thresholds within the combinatorial space, such that only small distances are counted as neighbors and form an edge within the topological graph (Algorithm S3). We expect a dependent multivariate association to give similar topologies in the two spaces and thus to yield a larger number of shared neighbor pairs (snp). The inflation is then computed as the observed shared neighbor pairs over the expected shared neighbor pairs under the null distribution. In the case of searching double thresholds (ADdsnpIC), only pairs with intermediate distance are counted as neighbors. The logic behind this setup is that the dependence between very proximal data points is more likely attributable to noise; by "discrediting" these edges, we obtain a more stable topology.
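A minimal sketch of the shared-neighbor-pair inflation for a single threshold pair (the function name and the permutation estimate of the null expectation are our assumptions; Algorithm S3 gives the paper's exact procedure):

```python
import numpy as np
from scipy.spatial.distance import pdist

def snp_inflation(x, y, tx, ty, n_perm=100, seed=0):
    """Observed shared neighbor pairs over their expected count under a
    permutation null, for neighbor thresholds tx (on X) and ty (on Y)."""
    x = np.asarray(x, float).reshape(len(x), -1)
    y = np.asarray(y, float).reshape(len(y), -1)
    ex = pdist(x) < tx                    # edges of the X neighbor graph
    ey = pdist(y) < ty                    # edges of the Y neighbor graph
    snp = np.sum(ex & ey)                 # shared neighbor pairs
    rng = np.random.default_rng(seed)
    null = [np.sum(ex & (pdist(y[rng.permutation(len(y))]) < ty))
            for _ in range(n_perm)]
    expected = max(np.mean(null), 1e-12)  # guard against an empty null graph
    return snp / expected
```

Inflation well above 1 indicates that the two neighbor graphs share far more edges than chance alignment would produce.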
Figure S4 offers a sketch of this process on a circular relationship, where the surface maps and their corresponding heat maps illustrate the search for maximal information inflation across the threshold space (the lower surface plot shows the search after noise normalization).
2.3 Adaptive Independence Tests with Geo-Topological Transformation (ADGTIC)
Inspired by geometric topology, we propose monotonic transformations of the distance matrices associated with the two variables, such that data points with small distances are treated as identical (one collapsed node within a topological graph) and data points with large distances are considered disconnected (no matter how distant they are from each other). Rather than simply thresholding the distance matrix, which would replace a geometrical summary with a topological summary, we explore transforms that can suppress variations among small distances (which tend to be dominated by noise) and among large distances (which may not reflect mutual information between the two variables), while preserving geometrical information (which may boost sensitivity to simple relationships). We refer to these transforms as geo-topological transforms, because they combine aspects of geometry and topology. Depending on the choice of the lower and upper bound, these transforms can threshold (lower bound = upper bound) at arbitrary levels, adapting to the granularity of the dependency present in a data set. They can also, optionally, preserve geometrical information (lower bound < upper bound). Figure 1 offers a sketch of three variants of such a geo-topological transform. We investigate the advantages of these variants. The transforms are formally defined in Algorithm 2.
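The three variants sketched in Figure 1 are defined formally in Algorithm 2; the piecewise shapes below are our assumptions, illustrating only the common structure (collapse below the lower bound, saturate above the upper bound, monotone in between):

```python
import numpy as np

def geo_topological_transform(d, lower, upper, variant=3):
    """Hedged sketch of a geo-topological transform on a distance matrix d.
    The exact piecewise forms of GT1/GT2/GT3 follow Algorithm 2; the shapes
    here are illustrative assumptions."""
    d = np.asarray(d, float)
    out = np.clip(d, lower, upper)            # saturate both tails
    if variant == 1:                          # pure threshold (topology only)
        out = np.where(d < (lower + upper) / 2, 0.0, 1.0)
    elif variant == 2:                        # rescale the middle segment to [0, 1]
        out = (out - lower) / max(upper - lower, 1e-12)
    else:                                     # variant 3: shift so small distances hit 0
        out = out - lower
    return out
```

Setting `lower == upper` recovers a hard threshold, while `lower < upper` preserves a monotone copy of the geometry in the middle segment, as described above.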
As the most important extension within the proposed family of adaptive independence tests, ADGTIC uses the same adaptive threshold-search framework as described for ADIC, but computes the test statistic from the maximal information inflation of the distance correlations (dCor) within the combinatorial parameter search space. Algorithm S4 describes the inflation computation as the ratio of dCor given certain thresholds over the average dCor statistic computed under the null distribution given the same thresholds. We expect the best threshold, with the highest sensitivity and statistical power, to be the one with the largest inflation. Unlike ADIC, which uses a single test statistic, ADGTIC generates three proposed test statistics: dCorMax is the supremum of dCor among all possible threshold pairs; dCorInfMax gives the statistic based on the most inflated parameters; nndCorInfMax is noise-normalized before computing the inflation.
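The dCorMax search can be sketched as follows (the quantile grid and the clip-and-shift transform are our simplifications of Algorithms 2 and S4):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def _dcor_from_dmats(dx, dy):
    # distance correlation from (already transformed) distance matrices
    c = lambda d: d - d.mean(0) - d.mean(1)[:, None] + d.mean()
    a, b = c(dx), c(dy)
    denom = np.sqrt((a * a).mean() * (b * b).mean())
    return np.sqrt(max((a * b).mean(), 0.0) / denom) if denom > 0 else 0.0

def dcor_max(x, y, n_thresholds=4):
    """Sketch of 'dCorMax': maximize dCor over a grid of (lower, upper)
    threshold pairs applied to both distance matrices."""
    dx = squareform(pdist(np.asarray(x, float).reshape(len(x), -1)))
    dy = squareform(pdist(np.asarray(y, float).reshape(len(y), -1)))
    qs = np.linspace(0.1, 0.9, n_thresholds)
    gx = np.quantile(dx[dx > 0], qs)
    gy = np.quantile(dy[dy > 0], qs)
    best = 0.0
    for lx in gx:
        for ux in gx[gx >= lx]:
            tx = np.clip(dx, lx, ux) - lx   # collapse below lx, saturate above ux
            for ly in gy:
                for uy in gy[gy >= ly]:
                    ty = np.clip(dy, ly, uy) - ly
                    best = max(best, _dcor_from_dmats(tx, ty))
    return best
```

In practice the test statistic is then referred to a permutation null, as in the rest of the framework.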
2.4 Adaptive Gaussian-Kernel Independence Criterion (AKIC)
AKIC is a kernel-based variant of ADIC adopting a kernel-based maximized-likelihood approach. Similar to the adaptive search over thresholds, the kernel widths are adaptively selected to maximize the log likelihood of the dependent association. For each kernel width, we compute the log likelihood of a Gaussian-kernel model using a leave-one-out (LOO) cross-validation procedure on the pairwise distances of X and Y, and the kernel width with the maximum log likelihood is selected. The logic behind this procedure is that independence (empirically defined by randomizations) increases the entropy of the joint distribution; therefore, the maximized likelihood of the joint distribution should be higher for the unrandomized data given dependence.
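A minimal sketch of the adaptive width-selection step, as we understand it (function names and the joint-sample formulation are our assumptions):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def loo_kde_loglik(z, width):
    """Leave-one-out Gaussian-kernel log likelihood of a joint sample z."""
    z = np.asarray(z, float)
    n, d = z.shape
    k = np.exp(-squareform(pdist(z)) ** 2 / (2 * width ** 2))
    np.fill_diagonal(k, 0.0)                        # leave the point itself out
    norm = (n - 1) * (2 * np.pi * width ** 2) ** (d / 2)
    dens = k.sum(1) / norm
    return np.log(np.maximum(dens, 1e-300)).sum()

def akic_select_width(x, y, widths):
    """Pick the kernel width maximizing the LOO joint log likelihood."""
    z = np.column_stack([np.asarray(x, float).reshape(len(x), -1),
                         np.asarray(y, float).reshape(len(y), -1)])
    lls = [loo_kde_loglik(z, w) for w in widths]
    return widths[int(np.argmax(lls))]
```

The selected width is then used to compare the likelihood of the observed joint sample against that of its randomized (independent) counterparts.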
2.5 Cumulative density difference of distances (CD3)
The distance-based inflation as an indirect measure of mutual information (introduced in ADIC) can be applied to other properties of the pairwise distance matrix of the multivariate data. We formulate the method as a short-distance inflation (SDI) test: the distribution of distances in the joint space of a dependent association pattern is expected to have a greater prevalence of short distances compared to the randomized independent pattern, implying a greater concentration of the joint distribution and smaller entropy given any dependence. SDI is computed as the integral of the cumulative density difference within the short-distance dominance range, whose bounds are empirically defined given the probability distribution of the pairwise distances.
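A sketch of the SDI computation; as an assumption, we take the short-distance dominance range to be wherever the observed CDF of joint-space distances exceeds the permutation-null CDF:

```python
import numpy as np
from scipy.spatial.distance import pdist

def short_distance_inflation(x, y, n_perm=50, n_bins=50, seed=0):
    """Integrated positive excess of the observed joint-distance CDF over
    its permutation-null average (a sketch of the SDI statistic)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, float).reshape(len(x), -1)
    y = np.asarray(y, float).reshape(len(y), -1)
    d = pdist(np.column_stack([x, y]))
    grid = np.linspace(0.0, d.max(), n_bins)

    def cdf(v):
        return np.searchsorted(np.sort(v), grid, side="right") / len(v)

    obs = cdf(d)
    null = np.mean([cdf(pdist(np.column_stack([x, y[rng.permutation(len(y))]])))
                    for _ in range(n_perm)], axis=0)
    excess = np.clip(obs - null, 0.0, None)
    return excess.sum() * (grid[1] - grid[0])   # integrate the positive excess
```

A dependent pattern concentrates short joint-space distances and therefore yields a larger integral than an independent pairing of the same marginals.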
2.6 Other variant methods for comparisons
Related to ADGTIC introduced in section 2.3, here we consider a variant in which the threshold selects not a cut-off at a specific distance value but a cut-off at a specific fraction of the ranked distances. Therefore, in order to preserve the topology, for inputs X and Y, we transform the distances below the lower percentile to zero and the distances above the upper percentile to the maximum. We term these variants "pADGTIC", where "p" stands for "percentile".
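A sketch of the percentile-based transform (the function name and the exact saturation value are our assumptions):

```python
import numpy as np

def percentile_transform(d, p_lo, p_hi):
    """pADGTIC-style transform: thresholds are percentiles of the
    off-diagonal distances rather than raw distance values."""
    d = np.asarray(d, float)
    off = d[~np.eye(len(d), dtype=bool)]
    lo, hi = np.percentile(off, [p_lo, p_hi])
    out = d.copy()
    out[d < lo] = 0.0          # collapse the closest pairs
    out[d > hi] = off.max()    # saturate the farthest pairs
    return out
```

Because the cut-offs track the empirical distance distribution, the same percentile pair adapts automatically across datasets with different scales.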
For a fair comparison beyond the nonparametric or rank-based methods, we also apply our inflation measurement to a popular dependence measure with a free parameter, the k-nearest-neighbour mutual information estimator, so that the hyperparameter k is adaptively selected. The statistic "I - kMax" selects the k which gives the maximal mutual information, and the statistic "I - kInfMax" selects the k which gives the maximal mutual information inflation (the estimated mutual information of the data over the null distribution of the mutual information).
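For reference, the underlying KSG estimator and the "I - kMax" selection rule can be sketched as follows (the k grid is our assumption):

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma

def ksg_mi(x, y, k=3):
    """KSG estimator 1 of mutual information (Kraskov et al., 2004)."""
    x = np.asarray(x, float).reshape(len(x), -1)
    y = np.asarray(y, float).reshape(len(y), -1)
    n = len(x)
    z = np.column_stack([x, y])
    # Chebyshev distance to the k-th nearest neighbour in the joint space
    eps = cKDTree(z).query(z, k + 1, p=np.inf)[0][:, -1]
    tx, ty = cKDTree(x), cKDTree(y)
    # marginal neighbour counts strictly within eps (point itself excluded)
    nx = np.array([len(tx.query_ball_point(x[i], eps[i] - 1e-12, p=np.inf)) - 1
                   for i in range(n)])
    ny = np.array([len(ty.query_ball_point(y[i], eps[i] - 1e-12, p=np.inf)) - 1
                   for i in range(n)])
    return digamma(k) + digamma(n) - np.mean(digamma(nx + 1) + digamma(ny + 1))

def mi_k_max(x, y, ks=(1, 2, 4, 8, 16)):
    """'I - kMax' (sketch): adaptively select the k giving the largest estimate."""
    return max(ksg_mi(x, y, k) for k in ks)
```

"I - kInfMax" additionally divides each estimate by its permutation-null average before taking the maximum.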
2.7 Properties of the Adaptive Independence Tests
2.7.1 Computational complexity
Here, we consider the most computationally demanding member of the family, ADGTIC, which involves a combinatorial threshold space and randomizations in both the inflation computation and the noise normalization. In the typical setup (large sample size, large threshold space, and a small number of randomizations), the computational complexity of ADGTIC is dominated by the threshold search with two rounds of randomizations. Hence, in terms of the sample size n, each dCor evaluation costs O(n²). In the special case of the distance covariance with univariate real-valued variables, Huo and Székely (2016) achieve an O(n log n) cost for the dCor computation, thus potentially reducing the per-evaluation complexity of ADGTIC to O(n log n).
2.7.2 Asymptotic behaviors for independence testing
dCor(X, Y) is defined for X and Y in arbitrary dimensions.
dCor(X, Y) = 0 if and only if X and Y are independent.
Here we wish to extend the proof of these two properties to our adaptive approach, which involves three piecewise linear transformations. For simplicity, we consider "GT3" for the theoretical proof ("GT1" and "GT2" are equivalent linear transformations of "GT3" in the region between the lower and upper bounds). Consider a Cauchy sequence in a normed vector space S: for any ε > 0 there exists an integer N such that, for all m, n > N, the norm of the difference of the m-th and n-th elements is less than ε. The normed vector space S is said to be complete (a Banach space) if every Cauchy sequence converges to a limit in S. Here we define the simplified geo-topological transformation as a continuous non-linear bounded functional, where the upper and lower thresholds defined in section 2.3 are replaced with a single parameter (the theoretical proof is equivalent in the two-threshold case via a linear transformation). From its graph, we see that it "converges" to a step function in the limit.
We can then calculate the expected value of the Euclidean distance (within the bounded range) after transformation; the result is a monotone nonlinear transformation. In other words, we wish to apply this monotone nonlinear operator in a Hilbert space (that of the original distance correlation). Here we define the transformed statistic as the dCor calculated from a distance matrix after the proposed geo-topological transformation. If X and Y are independent, then the transformed dCor is likewise 0 in the limit of infinite data.
Thus, the independence criterion still holds regardless of the applied functional. We next examine the threshold-search process to determine whether the independence criterion still holds for "dCorMax" (the proofs for "dCorInfMax" and "nndCorInfMax" are theoretically equivalent). We denote "dCorMax" as the supremum of the transformed dCor over all threshold pairs.
The sequence in equation 4 is Cauchy, but it does not converge in the supremum norm: the supremum-norm distance to the limiting step function does not converge to zero. Instead, the sequence converges to the limit in the L2 norm, where the distance does converge to zero, showing that the space, as the completion of all continuous functions (including "GT") in the L2 norm, contains the limit. Therefore, "dCorMax" still satisfies the independence criterion.
2.7.3 Relationship to the RKHS-based Statistics
For fixed n, the distance correlation is defined in a Hilbert space generated by Euclidean distance matrices of arbitrary sets (samples) of n points in a Euclidean space (Székely et al., 2014), such that for each pair of elements $A, B$ in the linear span of all distance matrices of such samples, the empirical inner product is defined as $\langle A, B \rangle = \frac{1}{n^2}\sum_{i,j} A_{ij} B_{ij}$.
Given a monotone nonlinear operator on a Hilbert space, the kernel of the transformed distances is still continuously defined and remains valid within the Hilbert space.
Minty (1962) further defined a (not necessarily linear) monotone operator as maximal if it cannot be extended to a properly larger domain while preserving its monotonicity, which in our case corresponds to the maximum-value cap in the geo-topological transformation. The Corollary to Theorem 4 of Minty (1962) states: if F is a continuous monotone operator, then F⁻¹ exists, is continuous on its domain, and is monotone; if in addition F is continuous and maximal and has open domain (in particular, if F is continuous and everywhere-defined), then F⁻¹ is everywhere-defined. This shows that, although the distance correlation after our proposed family of geo-topological transformations no longer lives in an inner product space, a mapping exists to transform back to the original Hilbert space such that the kernel operations are everywhere defined and valid. As an extension, this theorem applies to other possible monotone operators such as generalized logistic and sigmoid functions. As we showed in Equation 8, this transformation is complete in the L2 space, so it can still maintain the kernel properties of an inner product space which is complete (as a metric space), i.e., a Hilbert space. ∎
It was suggested that distance-based and RKHS-based statistics are fundamentally equivalent for testing dependence (Sejdinovic et al., 2013). Here we follow their logic to explore the relationship of the RKHS framework with our approach. According to Berlinet and Thomas-Agnan (2011), for every symmetric positive definite function (i.e. kernel) $k$, there exists an associated RKHS $\mathcal{H}_k$ of real-valued functions with reproducing kernel $k$. Given a probability measure $P$, the kernel embedding of $P$ into the RKHS is defined as $\mu_k(P) = \int k(\cdot, x)\, dP(x)$, such that $\mathbb{E}_P f = \langle f, \mu_k(P)\rangle_{\mathcal{H}_k}$ for all $f \in \mathcal{H}_k$ (Sejdinovic et al., 2013).
In order to define a distance-induced kernel from a semi-metric ρ, ρ should be of negative type (Sejdinovic et al., 2013).
Lyons (2013) showed that for testing independence based on distances, it is necessary and sufficient that the metric space be of strong negative type, which holds for separable Hilbert spaces.
If the geo-topological transformation is a continuous monotone operator on a separable Hilbert Space (distance metric), then it defines a separable space.
A topological space is called separable if it contains a countable, dense subset. In our case, given a countable dense set D in the original distance space S and the geo-topological transformation f, which is surjective onto its image f(S), the image f(D) is finite or countable. We then need to prove that the image of this dense subset (S being a dense subset of a Hausdorff space) under the geo-topological transformation is still a dense subset of the topological space: since D is dense, its closure is S; and since f is continuous, f(S) = f(cl(D)) ⊆ cl(f(D)); therefore f(D) is dense in f(S). ∎
Since the transformed distance is defined within a separable Hilbert space, it is a semi-metric of negative type, and can therefore define a distance-induced kernel.
In the next section, we present empirical results comparing the proposed family of adaptive independence tests outlined above, with other alternative state-of-the-art methods.
3 Empirical Evaluation
We performed experiments on synthetic data to validate the empirical performance of ADGTIC against the independence criteria listed in section 2. To simulate the most common nonlinear relationships, we draw samples from linear, parabolic, sinusoidal, circular, and checkerboard dependencies, as described in Table 1. In each experiment, the statistical power was defined and computed as the fraction of true datasets yielding a statistic value greater than 95% of the values yielded by the corresponding null datasets, with a theoretical guarantee that the false positive rate is controlled at 5%. We now describe some details of the experiments:
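The power computation described above can be sketched as follows (`statistic` and `make_xy` are placeholders for any measure and any simulated relationship from Table 1):

```python
import numpy as np

def permutation_power(statistic, make_xy, n_datasets=50, n_perm=200, seed=0):
    """Fraction of dependent datasets whose statistic exceeds the 95th
    percentile of its own permutation null (false positives stay at 5%)."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_datasets):
        x, y = make_xy(rng)
        t_obs = statistic(x, y)
        null = np.array([statistic(x, y[rng.permutation(len(y))])
                         for _ in range(n_perm)])
        hits += t_obs > np.quantile(null, 0.95)
    return hits / n_datasets
```

Because the null is rebuilt per dataset by permuting y, the marginals are preserved exactly, which is what guarantees the specificity level.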
Parameter selection: For the ADGTIC and ADIC families, the numbers of possible thresholds for the lower and upper bounds in the geo-topological transforms are fixed in advance, which determines the combinatorial search space of boundary pairs, since the threshold search is quadratic in the number of thresholds. Because the calculation of the statistics requires an inflation computation and a noise normalization, each from a separate permutation test, we limit the number of randomizations used for each. For AKIC, we fix the number of kernel widths to search and the maximum full-width-at-half-maximum factor. HSIC applies a bootstrap approximation to the test threshold with kernel sizes set to the median distances for X and Y (Gretton et al., 2008). For CD3, the numbers of randomizations and bins used to compute the joint-space SDI are fixed. For MIC, the user-specified value was set as advocated by Reshef et al. (2011). For the mutual information estimator, three different values of k (1, 6, and 20) were used, as in Kraskov et al. (2004).
3.1 Synthetic Data
In the spirit of no free lunch in statistics, Simon and Tibshirani (2014) stressed the importance of statistical power to detect bivariate association. In our context, the statistical power of a dependence measure is the fraction of data sets generated from a dependent joint distribution that yield a significant result (with the false-positive rate controlled at 5%). Simon and Tibshirani (2014) and Kinney and Atwal (2014) compared several independence measures and showed that dCor (Székely et al., 2007; Székely and Rizzo, 2009) and KNN mutual information estimates (Kraskov et al., 2004) have substantially more power than MIC (Reshef et al., 2011, 2013), but adaptive approaches like ADIC were neither proposed nor tested. To understand the behavior of these adaptive dependence measures, we investigated whether their statistical power can compete with dCor, MIC, and the KNN mutual information estimates.
3.1.1 Resistance to additive noise
In the bivariate association experiments, for each of the five relationship types (linear, parabolic, sinusoidal, circular, or checkerboard), 50 repetitions of 200 samples were generated, in which the input sample was uniformly distributed on the unit interval. Next, we regenerated the input sample randomly in order to generate i.i.d. versions as the null distribution with equal marginals. Figure 3 shows the assessment of statistical power for the competing nonlinear dependence measures as the variance of zero-mean Gaussian noise increases logarithmically over a 10-fold range; see Table 1 for simulation details, with the scatter plots above each heat map showing example datasets with noise of unit amplitude. The heat maps show power values computed for ADGTIC, pADGTIC, ADsnpIC, ADdsnpIC, AKIC, CD3, R, dCor (Székely et al., 2007), Hoeffding's D (Hoeffding, 1948), rdmCor (Kriegeskorte et al., 2008), KNN estimates of mutual information with k = 1, 6, or 20 (Kraskov et al., 2004), MIC (Reshef et al., 2011), HSIC (Gretton et al., 2008), and the adaptive versions of the KNN mutual information estimates. For each relationship, asterisks indicate statistics whose noise-at-50%-power lies within 25% of the maximum. Among all competing measures, our proposed ADIC family performs well on non-functional association patterns.
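The simulated relationships can be generated along the following lines (a sketch in the spirit of Table 1; the exact functional forms and noise scalings used in the paper may differ):

```python
import numpy as np

def make_relationship(name, n=200, noise=1.0, rng=None):
    """Generate one simulated bivariate dataset with additive Gaussian noise."""
    rng = np.random.default_rng() if rng is None else rng
    x = rng.uniform(0, 1, n)
    if name == "linear":
        y = x
    elif name == "parabolic":
        y = 4 * (x - 0.5) ** 2
    elif name == "sinusoidal":
        y = np.sin(4 * np.pi * x)
    elif name == "circular":
        theta = 2 * np.pi * x
        x, y = np.cos(theta), np.sin(theta)
    elif name == "checkerboard":
        xi = rng.integers(0, 4, n)
        yi = 2 * rng.integers(0, 2, n) + xi % 2   # occupy cells with even i+j
        x, y = xi + rng.uniform(0, 1, n), yi + rng.uniform(0, 1, n)
    else:
        raise ValueError(name)
    return (x + 0.1 * noise * rng.normal(size=n),
            y + 0.1 * noise * rng.normal(size=n))
```

Sweeping `noise` logarithmically over a 10-fold range reproduces the x-axis of the power heat maps.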
Table 2 shows the average statistical power across noise amplitudes for the different dependence measures on the five relationships. Our proposed family of adaptive independence tests ranked best on the parabolic, sinusoidal, and checkerboard relationships, and among the top 5 on all five association patterns. As expected, R has optimal power on the linear relationship, but it is worth noting that the ADGTIC and pADGTIC algorithms adapt to the linear pattern by choosing the most informative threshold pairs, reaching near-optimal performance, while R shows negligible power on the other, mirror-symmetric relationships, as expected. rdmCor, the correlation coefficient on the pairwise distances of the data, shows optimal power on the circular relationship but poor performance on all others. The behaviors of dCor and Hoeffding's D are very similar across all relationships, and both maintain substantial statistical power on all but the checkerboard relationship. On all but the sinusoidal relationship, MIC with the parameter setting suggested by Reshef et al. (2011) shows relatively low statistical power, consistent with the findings of Simon and Tibshirani (2014) and Kinney and Atwal (2014). The performance of the KNN mutual information estimator with k = 1, 6, and 20 differs from case to case: larger k performed better on complicated relationships like the checkerboard and circular patterns, but poorly compared to the adaptive approaches on the linear and parabolic relationships, the two relationships most representative of many real-world datasets. Compared to our adaptive selection of parameters, the KNN mutual information estimator also has the disadvantage of requiring the user to specify k without mathematical guidelines, and there is no guarantee that a larger k increases the statistical power (as in the sinusoidal case).
As shown here with three arbitrarily chosen values of k, the choice can significantly affect the power of mutual information estimates, supporting the finding of Kinney and Atwal (2014). The adaptive versions of this measure (I - kMax and I - kInfMax) performed slightly better than arbitrarily chosen k, but their overall performance is not optimal.
The results also demonstrate the noise resilience of the adaptive approach. As most real-world machine-learning problems (especially in unsupervised learning) deal with clustered patterns, the checkerboard relationship is an important benchmark for comparing adaptive approaches against nonadaptive ones such as dCor, MIC, HSIC, rdmCor, CD3, R, and Hoeffding's D. Of the two kernel statistics, AKIC's adaptive features enable it to capture the dependence in checkerboard relationships more powerfully than HSIC, while HSIC is more robust to structured data like the circular association but only fair on other patterns. With a similar setup, ADdsnpIC outperforms ADsnpIC on all relationships and ranks among the top 5 on three of the five (sinusoidal, circular, and checkerboard), demonstrating the usefulness of the additional lower threshold, which may help suppress noise among proximal data points.
The variety of performance levels of these fundamentally different methods across association patterns also illustrates the no-free-lunch theorem: each method performs well in some scenarios but not others. Table 3 shows results from the staircase analysis estimating the noise level at which the power is 80%. The adaptive geo-topological approach proves quite resilient to noise. It is also useful to evaluate all the methods under three scenarios: best case, average case, and worst case. In the best case, we have already demonstrated that the ADGTIC family performs in the top range. In the average case, it performs well among all the competing methods. In the worst-case scenario, we take the worst-case performance as the evaluation criterion: no matter which relationship (here, among the five association patterns) and no matter which noise level (here, across ten noise amplitudes uniform on the log scale), applying our family of methods guarantees at least a certain statistical power (also expressed as a ratio relative to the optimal method family in the same setting); in other words, this measures how robust the methods are. Here we cluster all ADIC methods together, all ADGTIC methods together, all pADGTIC methods together, and all mutual information estimators (I for arbitrary k and adaptive k) together. Ranked by worst-case statistical power: ADGTIC (0.318), MIC (0.310), ADIC (0.264), dCor (0.210), Hoeffding's D (0.200), HSIC (0.196), AKIC (0.186), I (0.170), CD3 (0.144), pADGTIC (0.054), rdmCor (0.028), R (0.028). Ranked by worst-case statistical power relative to the optimal performing power: ADGTIC (0.515), MIC (0.502), ADIC (0.427), dCor (0.340), Hoeffding's D (0.324), HSIC (0.317), AKIC (0.262), I (0.239), CD3 (0.233), pADGTIC (0.090), rdmCor (0.039), R (0.038).
3.1.2 Robustness across sample sizes
In this bivariate association experiment, for each of the five association patterns (linear, parabolic, sinusoidal, circular, or checkerboard), 100 repetitions of observations with sample sizes over a 20-fold range from 20 to 400 were generated, in which the input sample was uniformly distributed on the unit interval. Table 5 shows the average statistical power across sample sizes for the different dependence measures on the five relationships. Among all competing measures, the proposed family of adaptive independence tests demonstrated good robustness on non-functional association patterns (ranked first on all but the checkerboard relationship, and in the top 5 on all relationships). Comparing the three variants of ADGTIC, dCorMax appears more robust than the other two. The three geo-topological transforms each have their advantages for different relationship types.
3.1.3 Adaptive to combinatorial multi-dimensional dependence
In the multi-dimensional association experiments, 50 repetitions of samples were generated such that each of the two dimensions follows either one of the five association patterns (linear, parabolic, sinusoidal, circular, or checkerboard) or a random relationship (r), forming a combinatorial two-dimensional dependence. Table 4 shows the statistical power across the 20 combinatorial dependencies for the different statistics. Our methods rank first on all but the sinusoidal-random (s-r) and checkerboard-random (k-r) relationships, and among the top 5 on all relationships. As expected, the statistical power on pairs of identical patterns (l-l, p-p, s-s, c-c, k-k) is higher than on pairs of different patterns, implying some form of dependence interference.
3.1.4 Offers insights on granularity of dependence structure
Optimal thresholds were recorded during the bivariate association experiments with increasing noise amplitude. Table 6 and Figure 2 show the optimal thresholds identified for the five relationships across noise amplitudes. While the drifts of the optimal thresholds are not obvious, the differences between these thresholds across scenarios can offer useful insights about data quality and dependence structure.
| Method | Linear | Parabolic | Sinusoidal | Circular | Checkerboard | Average |
|---|---|---|---|---|---|---|
| pADGTIC1 - dCorMax | 0.594 ± 0.400 | 0.534 ± 0.431 | 0.712 ± 0.384 | 0.680 ± 0.414 | 0.618 ± 0.410 | 0.628 ± 0.396 |
| I (k=20) | 0.502 ± 0.404 | 0.534 ± 0.354 | 0.504 ± 0.409 | 0.702 ± 0.391 | 0.582 ± 0.413 | 0.565 ± 0.386 |
| ADGTIC1 - dCorMax | 0.668 ± 0.349 | 0.580 ± 0.428 | 0.552 ± 0.452 | 0.590 ± 0.464 | 0.420 ± 0.324 | 0.562 ± 0.399 |
| pADGTIC2 - dCorMax | 0.484 ± 0.399 | 0.366 ± 0.374 | 0.616 ± 0.415 | 0.688 ± 0.414 | 0.616 ± 0.435 | 0.554 ± 0.408 |
| ADGTIC3 - dCorMax | 0.484 ± 0.437 | 0.448 ± 0.433 | 0.594 ± 0.430 | 0.624 ± 0.423 | 0.606 ± 0.441 | 0.551 ± 0.421 |
| pADGTIC1 - nndCorInfMax | 0.578 ± 0.399 | 0.528 ± 0.401 | 0.624 ± 0.394 | 0.604 ± 0.389 | 0.396 ± 0.358 | 0.546 ± 0.381 |
| ADGTIC1 - dCorInfMax | 0.674 ± 0.351 | 0.582 ± 0.399 | 0.546 ± 0.422 | 0.502 ± 0.480 | 0.408 ± 0.321 | 0.542 ± 0.392 |
| ADGTIC3 - dCorInfMax | 0.632 ± 0.371 | 0.580 ± 0.431 | 0.542 ± 0.449 | 0.512 ± 0.468 | 0.432 ± 0.430 | 0.540 ± 0.419 |
| ADIC - dsnpInf | 0.374 ± 0.350 | 0.546 ± 0.393 | 0.578 ± 0.423 | 0.656 ± 0.433 | 0.532 ± 0.440 | 0.537 ± 0.403 |
| ADGTIC2 - dCorInfMax | 0.636 ± 0.382 | 0.578 ± 0.427 | 0.500 ± 0.432 | 0.540 ± 0.466 | 0.380 ± 0.364 | 0.527 ± 0.408 |
| pADGTIC1 - dCorInfMax | 0.636 ± 0.371 | 0.602 ± 0.373 | 0.600 ± 0.396 | 0.560 ± 0.449 | 0.230 ± 0.177 | 0.526 ± 0.381 |
| ADGTIC3 - nndCorInfMax | 0.578 ± 0.387 | 0.560 ± 0.432 | 0.526 ± 0.402 | 0.514 ± 0.451 | 0.448 ± 0.398 | 0.525 ± 0.400 |
| pADGTIC2 - nndCorInfMax | 0.562 ± 0.404 | 0.494 ± 0.419 | 0.538 ± 0.400 | 0.582 ± 0.422 | 0.440 ± 0.397 | 0.523 ± 0.395 |
| ADGTIC2 - dCorMax | 0.566 ± 0.418 | 0.546 ± 0.439 | 0.500 ± 0.447 | 0.614 ± 0.453 | 0.376 ± 0.333 | 0.520 ± 0.411 |
| I (k=6) | 0.380 ± 0.360 | 0.406 ± 0.375 | 0.586 ± 0.381 | 0.620 ± 0.413 | 0.602 ± 0.423 | 0.519 ± 0.389 |
| ADGTIC1 - nndCorInfMax | 0.656 ± 0.337 | 0.582 ± 0.408 | 0.494 ± 0.433 | 0.536 ± 0.461 | 0.318 ± 0.207 | 0.517 ± 0.382 |
| pADGTIC3 - nndCorInfMax | 0.536 ± 0.424 | 0.442 ± 0.400 | 0.570 ± 0.418 | 0.594 ± 0.417 | 0.442 ± 0.380 | 0.517 ± 0.396 |
| pADGTIC2 - dCorInfMax | 0.624 ± 0.375 | 0.504 ± 0.416 | 0.542 ± 0.391 | 0.530 ± 0.424 | 0.214 ± 0.168 | 0.483 ± 0.379 |
| pADGTIC3 - dCorInfMax | 0.604 ± 0.392 | 0.514 ± 0.416 | 0.498 ± 0.409 | 0.536 ± 0.450 | 0.250 ± 0.215 | 0.480 ± 0.389 |
| ADGTIC2 - nndCorInfMax | 0.536 ± 0.421 | 0.564 ± 0.415 | 0.410 ± 0.418 | 0.540 ± 0.463 | 0.340 ± 0.278 | 0.478 ± 0.397 |
| dCor | 0.676 ± 0.353 | 0.550 ± 0.425 | 0.452 ± 0.424 | 0.446 ± 0.448 | 0.210 ± 0.140 | 0.467 ± 0.392 |
| I - kInfMax | 0.468 ± 0.396 | 0.386 ± 0.355 | 0.354 ± 0.403 | 0.634 ± 0.437 | 0.428 ± 0.324 | 0.454 ± 0.382 |
| Hoeffding’s D | 0.650 ± 0.356 | 0.460 ± 0.414 | 0.460 ± 0.431 | 0.498 ± 0.475 | 0.200 ± 0.112 | 0.454 ± 0.393 |
| HSIC | 0.504 ± 0.416 | 0.556 ± 0.363 | 0.324 ± 0.369 | 0.670 ± 0.439 | 0.196 ± 0.082 | 0.450 ± 0.383 |
| AKIC | 0.186 ± 0.181 | 0.270 ± 0.242 | 0.502 ± 0.431 | 0.576 ± 0.435 | 0.538 ± 0.453 | 0.414 ± 0.385 |
| MIC | 0.344 ± 0.335 | 0.378 ± 0.299 | 0.586 ± 0.288 | 0.438 ± 0.366 | 0.310 ± 0.193 | 0.411 ± 0.305 |
| ADIC - snpInf | 0.368 ± 0.383 | 0.446 ± 0.367 | 0.392 ± 0.414 | 0.492 ± 0.504 | 0.264 ± 0.326 | 0.392 ± 0.394 |
| CD3 | 0.400 ± 0.394 | 0.536 ± 0.400 | 0.338 ± 0.438 | 0.462 ± 0.500 | 0.144 ± 0.182 | 0.376 ± 0.404 |
| I - kMax | 0.170 ± 0.206 | 0.262 ± 0.314 | 0.454 ± 0.394 | 0.494 ± 0.424 | 0.482 ± 0.449 | 0.372 ± 0.377 |
| rdmCor | 0.426 ± 0.406 | 0.534 ± 0.420 | 0.028 ± 0.023 | 0.728 ± 0.408 | 0.036 ± 0.042 | 0.350 ± 0.415 |
| I (k=1) | 0.174 ± 0.107 | 0.208 ± 0.228 | 0.402 ± 0.379 | 0.448 ± 0.415 | 0.396 ± 0.390 | 0.326 ± 0.332 |
| pADGTIC3 - dCorMax | 0.160 ± 0.263 | 0.054 ± 0.040 | 0.344 ± 0.401 | 0.310 ± 0.363 | 0.316 ± 0.330 | 0.237 ± 0.315 |
| R | 0.710 ± 0.325 | 0.136 ± 0.104 | 0.054 ± 0.053 | 0.028 ± 0.036 | 0.072 ± 0.043 | 0.200 ± 0.300 |
| Test statistic | Rel. 1 | Rel. 2 | Rel. 3 | Rel. 4 | Rel. 5 |
|---|---|---|---|---|---|
| pADGTIC1 - dCorMax | 2.657 | 2.411 | 3.975 | 3.856 | 3.081 |
| ADGTIC1 - dCorMax | 2.945 | 2.922 | 2.485 | 3.212 | 1.000 |
| pADGTIC2 - dCorMax | 1.972 | 1.507 | 3.087 | 4.013 | 3.282 |
| ADGTIC3 - dCorMax | 2.249 | 2.015 | 2.985 | 3.130 | 3.212 |
| pADGTIC1 - nndCorInfMax | 2.611 | 2.296 | 3.036 | 2.154 | 1.347 |
| ADGTIC1 - dCorInfMax | 3.032 | 2.573 | 2.305 | 2.720 | 1.000 |
| ADGTIC3 - dCorInfMax | 2.626 | 2.955 | 2.400 | 2.603 | 2.175 |
| ADIC - dsnpInf | 1.435 | 2.154 | 2.783 | 3.594 | 2.594 |
| ADGTIC2 - dCorInfMax | 2.713 | 2.895 | 2.182 | 2.907 | 1.417 |
| pADGTIC1 - dCorInfMax | 2.713 | 2.434 | 2.524 | 2.945 | 1.000 |
| ADGTIC3 - nndCorInfMax | 2.399 | 2.864 | 2.073 | 2.440 | 1.960 |
| pADGTIC2 - nndCorInfMax | 2.394 | 2.304 | 2.845 | 2.110 | 2.195 |
| ADGTIC2 - dCorMax | 2.626 | 2.852 | 2.364 | 3.362 | 1.000 |
| ADGTIC1 - nndCorInfMax | 2.668 | 2.812 | 2.307 | 2.837 | 1.000 |
| pADGTIC3 - nndCorInfMax | 2.554 | 1.911 | 2.860 | 2.994 | 1.801 |
| pADGTIC2 - dCorInfMax | 2.845 | 2.346 | 2.419 | 1.954 | 1.000 |
| pADGTIC3 - dCorInfMax | 2.837 | 2.337 | 2.272 | 2.783 | 1.000 |
| ADGTIC2 - nndCorInfMax | 2.434 | 2.854 | 1.830 | 2.864 | 1.080 |
| I - kInfMax | 1.960 | 1.243 | 1.377 | 3.774 | 1.044 |
| ADIC - snpInf | 1.531 | 1.448 | 1.770 | 2.822 | 1.204 |
| I - kMax | 1.000 | 1.146 | 1.668 | 2.224 | 2.355 |
| pADGTIC3 - dCorMax | 1.036 | 1.000 | 1.561 | 1.390 | 1.073 |
| Test statistic | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | Mean (SD) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| I (k=6) | 0.88 | 0.98 | 0.98 | 0.88 | 1.00 | 0.96 | 0.82 | 0.94 | 0.98 | 0.92 | 0.88 | 0.92 | 1.00 | 0.98 | 0.98 | 0.94 | 0.88 | 0.94 | 0.98 | 0.94 | 0.939 (0.050) |
| I (k=1) | 0.82 | 1.00 | 0.92 | 0.90 | 0.96 | 0.98 | 0.84 | 0.96 | 0.98 | 0.88 | 0.88 | 0.94 | 0.98 | 0.96 | 0.94 | 0.92 | 0.96 | 0.90 | 0.84 | 0.00 | 0.878 (0.213) |
| ADGTIC3 - dCorMax | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.00 | 0.00 | 1.00 | 0.00 | 1.00 | 1.00 | 0.00 | 0.800 (0.410) |
| Hoeffding’s D | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.00 | 0.00 | 1.00 | 0.00 | 0.00 | 1.00 | 0.00 | 0.750 (0.444) |
| I (k=20) | 0.86 | 1.00 | 0.00 | 0.92 | 0.96 | 0.98 | 0.86 | 0.98 | 1.00 | 0.88 | 0.86 | 1.00 | 0.02 | 0.02 | 0.02 | 0.92 | 0.00 | 0.94 | 0.14 | 0.00 | 0.618 (0.447) |
| ADIC - dsnpInf | 1.00 | 1.00 | 0.12 | 0.34 | 0.04 | 0.74 | 1.00 | 0.16 | 0.48 | 0.04 | 0.56 | 1.00 | 1.00 | 0.02 | 0.00 | 1.00 | 0.00 | 0.00 | 1.00 | 0.00 | 0.475 (0.444) |
| ADGTIC1 - dCorInfMax | 1.00 | 1.00 | 0.42 | 0.00 | 0.02 | 0.62 | 1.00 | 0.00 | 0.00 | 0.02 | 0.68 | 1.00 | 0.02 | 0.00 | 0.74 | 1.00 | 0.00 | 0.00 | 1.00 | 0.00 | 0.426 (0.456) |
| ADGTIC2 - dCorMax | 1.00 | 1.00 | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 | 0.00 | 0.00 | 0.00 | 1.00 | 0.00 | 0.00 | 1.00 | 0.00 | 0.400 (0.503) |
| ADGTIC1 - dCorMax | 1.00 | 1.00 | 0.48 | 0.00 | 0.00 | 0.00 | 1.00 | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 | 0.00 | 0.00 | 0.00 | 1.00 | 0.00 | 0.00 | 1.00 | 0.00 | 0.374 (0.483) |
| ADGTIC3 - nndCorInfMax | 1.00 | 1.00 | 0.20 | 0.00 | 0.14 | 0.04 | 1.00 | 0.36 | 0.04 | 0.06 | 0.00 | 1.00 | 0.08 | 0.00 | 0.06 | 1.00 | 0.06 | 0.02 | 0.78 | 0.12 | 0.348 (0.424) |
| ADGTIC1 - nndCorInfMax | 1.00 | 1.00 | 0.12 | 0.02 | 0.16 | 0.12 | 1.00 | 0.02 | 0.04 | 0.04 | 0.12 | 1.00 | 0.10 | 0.02 | 0.02 | 0.98 | 0.10 | 0.08 | 0.92 | 0.06 | 0.346 (0.430) |
| ADGTIC2 - nndCorInfMax | 1.00 | 1.00 | 0.36 | 0.04 | 0.10 | 0.22 | 1.00 | 0.00 | 0.08 | 0.04 | 0.18 | 1.00 | 0.02 | 0.00 | 0.02 | 0.98 | 0.08 | 0.00 | 0.68 | 0.12 | 0.346 (0.415) |
| ADGTIC3 - dCorInfMax | 1.00 | 1.00 | 0.70 | 0.00 | 0.02 | 0.04 | 1.00 | 0.00 | 0.00 | 0.00 | 0.06 | 1.00 | 0.00 | 0.00 | 0.00 | 1.00 | 0.00 | 0.00 | 1.00 | 0.00 | 0.341 (0.468) |
| ADGTIC2 - dCorInfMax | 1.00 | 1.00 | 0.52 | 0.00 | 0.00 | 0.04 | 1.00 | 0.00 | 0.00 | 0.00 | 0.04 | 1.00 | 0.00 | 0.00 | 0.02 | 1.00 | 0.02 | 0.00 | 1.00 | 0.00 | 0.332 (0.463) |
| ADIC - snpInf | 1.00 | 1.00 | 0.00 | 0.04 | 0.04 | 0.18 | 1.00 | 0.00 | 0.10 | 0.00 | 0.00 | 1.00 | 0.00 | 0.00 | 0.00 | 1.00 | 0.00 | 0.00 | 1.00 | 0.00 | 0.318 (0.460) |
| Test statistic | Rel. 1 | Rel. 2 | Rel. 3 | Rel. 4 | Rel. 5 | Mean |
|---|---|---|---|---|---|---|
| ADGTIC3 - dCorMax | 1.000 (0.000) | 1.000 (0.000) | 1.000 (0.000) | 1.000 (0.000) | 0.950 (0.218) | 0.990 (0.022) |
| I (k=1) | 0.995 (0.011) | 0.985 (0.019) | 0.991 (0.015) | 0.993 (0.015) | 0.983 (0.018) | 0.989 (0.005) |
| AKIC | 1.000 (0.000) | 1.000 (0.000) | 0.950 (0.218) | 1.000 (0.000) | 0.950 (0.218) | 0.980 (0.027) |
| ADIC - dsnpInf | 1.000 (0.000) | 1.000 (0.000) | 0.967 (0.144) | 1.000 (0.000) | 0.928 (0.230) | 0.979 (0.032) |
| ADIC - snpInf | 1.000 (0.000) | 1.000 (0.000) | 0.950 (0.218) | 1.000 (0.000) | 0.908 (0.279) | 0.972 (0.042) |
| I (k=6) | 0.992 (0.018) | 0.982 (0.024) | 0.939 (0.211) | 0.995 (0.010) | 0.942 (0.216) | 0.970 (0.027) |
| ADGTIC3 - nndCorInfMax | 0.998 (0.009) | 1.000 (0.002) | 0.969 (0.135) | 0.986 (0.041) | 0.889 (0.214) | 0.968 (0.046) |
| ADGTIC1 - dCorMax | 1.000 (0.000) | 1.000 (0.000) | 0.900 (0.300) | 1.000 (0.000) | 0.850 (0.357) | 0.950 (0.071) |
| ADGTIC3 - dCorInfMax | 1.000 (0.000) | 1.000 (0.000) | 0.950 (0.218) | 0.932 (0.215) | 0.825 (0.285) | 0.941 (0.072) |
| ADGTIC2 - dCorMax | 1.000 (0.000) | 1.000 (0.000) | 0.900 (0.300) | 1.000 (0.000) | 0.800 (0.400) | 0.940 (0.089) |
| CD3 | 1.000 (0.000) | 1.000 (0.000) | 0.959 (0.179) | 0.972 (0.122) | 0.667 (0.447) | 0.920 (0.143) |
| Hoeffding’s D | 1.000 (0.000) | 1.000 (0.000) | 1.000 (0.000) | 1.000 (0.000) | 0.550 (0.497) | 0.910 (0.201) |
| MIC | 0.984 (0.015) | 0.977 (0.022) | 0.956 (0.160) | 0.891 (0.292) | 0.733 (0.422) | 0.908 (0.105) |
| ADGTIC2 - dCorInfMax | 1.000 (0.000) | 1.000 (0.000) | 0.911 (0.269) | 0.905 (0.280) | 0.724 (0.341) | 0.908 (0.113) |
| ADGTIC2 - nndCorInfMax | 1.000 (0.000) | 0.997 (0.013) | 0.917 (0.235) | 0.950 (0.162) | 0.639 (0.338) | 0.900 (0.150) |
| ADGTIC1 - dCorInfMax | 1.000 (0.000) | 1.000 (0.000) | 0.900 (0.300) | 0.834 (0.336) | 0.766 (0.324) | 0.900 (0.103) |
| ADGTIC1 - nndCorInfMax | 1.000 (0.000) | 0.999 (0.003) | 0.901 (0.286) | 0.870 (0.273) | 0.697 (0.301) | 0.893 (0.124) |
| dCor | 1.000 (0.000) | 1.000 (0.000) | 0.900 (0.300) | 0.800 (0.400) | 0.600 (0.490) | 0.860 (0.167) |
| HSIC | 1.000 (0.000) | 1.000 (0.000) | 0.850 (0.357) | 0.950 (0.218) | 0.500 (0.500) | 0.860 (0.210) |
| I (k=20) | 0.945 (0.217) | 0.941 (0.216) | 0.795 (0.396) | 0.895 (0.299) | 0.692 (0.453) | 0.853 (0.108) |
| rdmCor | 1.000 (0.000) | 1.000 (0.000) | 0.300 (0.458) | 0.000 (0.000) | 0.200 (0.400) | 0.500 (0.469) |
| R | 1.000 (0.000) | 0.350 (0.477) | 0.000 (0.000) | 0.000 (0.000) | 0.150 (0.357) | 0.300 (0.417) |
| Test statistic | Rel. 1 | Rel. 2 | Rel. 3 | Rel. 4 | Rel. 5 |
|---|---|---|---|---|---|
| ADdsnpIC | l: 0.375, u: 0.649 | l: 0.323, u: 0.566 | l: 0.349, u: 0.626 | l: 0.332, u: 0.614 | l: 0.366, u: 0.652 |
| ADGTIC1 - dCorMax | l: 0.128, u: 0.624 | l: 0.092, u: 0.588 | l: 0.148, u: 0.624 | l: 0.120, u: 0.584 | l: 0.132, u: 0.556 |
| ADGTIC1 - dCorInfMax | l: 0.400, u: 0.804 | l: 0.372, u: 0.800 | l: 0.420, u: 0.796 | l: 0.360, u: 0.732 | l: 0.356, u: 0.740 |
| ADGTIC1 - nndCorInfMax | l: 0.356, u: 0.752 | l: 0.316, u: 0.708 | l: 0.296, u: 0.712 | l: 0.336, u: 0.728 | l: 0.344, u: 0.736 |
| ADGTIC2 - dCorMax | l: 0.040, u: 0.648 | l: 0.024, u: 0.680 | l: 0.100, u: 0.648 | l: 0.024, u: 0.620 | l: 0.128, u: 0.640 |
| ADGTIC2 - dCorInfMax | l: 0.356, u: 0.776 | l: 0.368, u: 0.780 | l: 0.344, u: 0.748 | l: 0.336, u: 0.760 | l: 0.368, u: 0.804 |
| ADGTIC2 - nndCorInfMax | l: 0.284, u: 0.704 | l: 0.252, u: 0.688 | l: 0.244, u: 0.684 | l: 0.264, u: 0.776 | l: 0.328, u: 0.756 |
| ADGTIC3 - dCorMax | l: 0.000, u: 0.228 | l: 0.024, u: 0.232 | l: 0.000, u: 0.228 | l: 0.000, u: 0.224 | l: 0.000, u: 0.204 |
| ADGTIC3 - dCorInfMax | l: 0.356, u: 0.836 | l: 0.364, u: 0.752 | l: 0.356, u: 0.812 | l: 0.368, u: 0.800 | l: 0.392, u: 0.812 |
| ADGTIC3 - nndCorInfMax | l: 0.312, u: 0.720 | l: 0.288, u: 0.692 | l: 0.260, u: 0.720 | l: 0.328, u: 0.760 | l: 0.332, u: 0.682 |
| pADGTIC1 - dCorMax | l: 0.000, u: 0.380 | l: 0.000, u: 0.424 | l: 0.020, u: 0.404 | l: 0.024, u: 0.348 | l: 0.012, u: 0.332 |
| pADGTIC1 - dCorInfMax | l: 0.388, u: 0.756 | l: 0.372, u: 0.788 | l: 0.448, u: 0.816 | l: 0.352, u: 0.760 | l: 0.364, u: 0.748 |
| pADGTIC1 - nndCorInfMax | l: 0.216, u: 0.668 | l: 0.280, u: 0.704 | l: 0.368, u: 0.728 | l: 0.232, u: 0.652 | l: 0.324, u: 0.728 |
| pADGTIC2 - dCorMax | l: 0.032, u: 0.632 | l: 0.000, u: 0.600 | l: 0.016, u: 0.576 | l: 0.004, u: 0.600 | l: 0.012, u: 0.620 |
| pADGTIC2 - dCorInfMax | l: 0.424, u: 0.820 | l: 0.408, u: 0.807 | l: 0.360, u: 0.796 | l: 0.332, u: 0.788 | l: 0.364, u: 0.840 |
| pADGTIC2 - nndCorInfMax | l: 0.280, u: 0.756 | l: 0.252, u: 0.704 | l: 0.304, u: 0.684 | l: 0.296, u: 0.724 | l: 0.276, u: 0.696 |
| pADGTIC3 - dCorMax | l: 0.000, u: 0.200 | l: 0.000, u: 0.200 | l: 0.000, u: 0.200 | l: 0.000, u: 0.200 | l: 0.000, u: 0.200 |
| pADGTIC3 - dCorInfMax | l: 0.464, u: 0.864 | l: 0.516, u: 0.888 | l: 0.428, u: 0.872 | l: 0.436, u: 0.816 | l: 0.412, u: 0.816 |
| pADGTIC3 - nndCorInfMax | l: 0.344, u: 0.764 | l: 0.356, u: 0.740 | l: 0.352, u: 0.716 | l: 0.256, u: 0.660 | l: 0.280, u: 0.672 |
Distance matrices capture the representational geometry and can be subjected to monotonic nonlinear transforms to capture the representational topology at different granularities. We introduced a novel family of independence tests that adapt the parameters of these geo-topological transforms so as to maximize sensitivity of the distance covariance to statistical dependency between two multivariate variables. The proposed test statistics are theoretically sound and perform well empirically, providing robust sensitivity across a wide range of univariate and multivariate relationships and across different noise levels and amounts of data. The adaptive geo-topological distance-covariance approach to detecting dependence deserves further theoretical and empirical attention in future studies. The present results suggest that it might prove useful for a wide range of practical applications.
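The geo-topological test family builds on the sample distance covariance. As a self-contained reference point, the sketch below implements the simple O(n²) sample distance correlation (Székely et al., 2007) and wraps it in a generic permutation test; the function names and permutation count are our own illustrative choices, and an adaptive geo-topological statistic would simply be passed in place of `dcor`.

```python
import numpy as np

def dcor(x, y):
    """Sample distance correlation (simple O(n^2) estimator)."""
    def dist(z):
        z = np.asarray(z, dtype=float).reshape(len(z), -1)
        return np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
    def cent(D):  # double-centering of the distance matrix
        return D - D.mean(0) - D.mean(1)[:, None] + D.mean()
    A, B = cent(dist(x)), cent(dist(y))
    dcov2 = (A * B).mean()  # squared sample distance covariance
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return np.sqrt(max(dcov2, 0.0) / denom) if denom > 0 else 0.0

def perm_test(x, y, stat=dcor, n_perm=99, seed=0):
    """Permutation p-value: shuffle y to sample the null distribution."""
    rng = np.random.default_rng(seed)
    obs = stat(x, y)
    null = [stat(x, rng.permutation(y)) for _ in range(n_perm)]
    return obs, (1 + sum(s >= obs for s in null)) / (1 + n_perm)

# Noiseless parabola: linear correlation misses it, dCor does not.
x = np.linspace(-1.0, 1.0, 50)
obs, p = perm_test(x, x ** 2)
```

The permutation scheme is the same regardless of which dependence statistic is plugged in, which is what makes a family of adaptive statistics directly comparable under a common testing procedure.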
- Berlinet and Thomas-Agnan (2011) Berlinet, A. and Thomas-Agnan, C. (2011). Reproducing kernel Hilbert spaces in probability and statistics. Springer Science & Business Media.
- Gretton et al. (2005) Gretton, A., Bousquet, O., Smola, A., and Schölkopf, B. (2005). Measuring statistical dependence with Hilbert-Schmidt norms. In International Conference on Algorithmic Learning Theory, pages 63–77. Springer.
- Gretton et al. (2008) Gretton, A., Fukumizu, K., Teo, C. H., Song, L., Schölkopf, B., and Smola, A. J. (2008). A kernel statistical test of independence. In Advances in Neural Information Processing Systems, pages 585–592.
- Hoeffding (1948) Hoeffding, W. (1948). A non-parametric test of independence. The Annals of Mathematical Statistics, pages 546–557.
- Huo and Székely (2016) Huo, X. and Székely, G. J. (2016). Fast computing for distance covariance. Technometrics, 58(4):435–447.
- Jitkrittum et al. (2016) Jitkrittum, W., Szabó, Z., and Gretton, A. (2016). An adaptive test of independence with analytic kernel embeddings. arXiv preprint arXiv:1610.04782.
- Kinney and Atwal (2014) Kinney, J. B. and Atwal, G. S. (2014). Equitability, mutual information, and the maximal information coefficient. Proceedings of the National Academy of Sciences, page 201309933.
- Kraskov et al. (2004) Kraskov, A., Stögbauer, H., and Grassberger, P. (2004). Estimating mutual information. Physical Review E, 69(6):066138.
- Kriegeskorte and Kievit (2013) Kriegeskorte, N. and Kievit, R. A. (2013). Representational geometry: integrating cognition, computation, and the brain. Trends in cognitive sciences, 17(8):401–412.
- Kriegeskorte et al. (2008) Kriegeskorte, N., Mur, M., and Bandettini, P. A. (2008). Representational similarity analysis-connecting the branches of systems neuroscience. Frontiers in systems neuroscience, 2:4.
- Lyons et al. (2013) Lyons, R. et al. (2013). Distance covariance in metric spaces. The Annals of Probability, 41(5):3284–3305.
- Minty et al. (1962) Minty, G. J. et al. (1962). Monotone (nonlinear) operators in Hilbert space. Duke Mathematical Journal, 29(3):341–346.
- Pál et al. (2010) Pál, D., Póczos, B., and Szepesvári, C. (2010). Estimation of Rényi entropy and mutual information based on generalized nearest-neighbor graphs. In Advances in Neural Information Processing Systems, pages 1849–1857.
- Pearson (1895) Pearson, K. (1895). Note on regression and inheritance in the case of two parents. Proceedings of the Royal Society of London, 58:240–242.
- Reshef et al. (2013) Reshef, D., Reshef, Y., Mitzenmacher, M., and Sabeti, P. (2013). Equitability analysis of the maximal information coefficient, with comparisons. arXiv preprint arXiv:1301.6314.
- Reshef et al. (2011) Reshef, D. N., Reshef, Y. A., Finucane, H. K., Grossman, S. R., McVean, G., Turnbaugh, P. J., Lander, E. S., Mitzenmacher, M., and Sabeti, P. C. (2011). Detecting novel associations in large data sets. Science, 334(6062):1518–1524.
- Sejdinovic et al. (2013) Sejdinovic, D., Sriperumbudur, B., Gretton, A., and Fukumizu, K. (2013). Equivalence of distance-based and RKHS-based statistics in hypothesis testing. The Annals of Statistics, pages 2263–2291.
- Simon and Tibshirani (2014) Simon, N. and Tibshirani, R. (2014). Comment on "Detecting novel associations in large data sets" by Reshef et al., Science Dec 16, 2011. arXiv preprint arXiv:1401.7645.
- Steuer et al. (2002) Steuer, R., Kurths, J., Daub, C. O., Weise, J., and Selbig, J. (2002). The mutual information: detecting and evaluating dependencies between variables. Bioinformatics, 18(suppl_2):S231–S240.
- Székely and Rizzo (2009) Székely, G. J. and Rizzo, M. L. (2009). Brownian distance covariance. The Annals of Applied Statistics, pages 1236–1265.
- Székely et al. (2007) Székely, G. J., Rizzo, M. L., Bakirov, N. K., et al. (2007). Measuring and testing dependence by correlation of distances. The Annals of Statistics, 35(6):2769–2794.
- Székely et al. (2014) Székely, G. J., Rizzo, M. L., et al. (2014). Partial distance correlation with methods for dissimilarities. The Annals of Statistics, 42(6):2382–2412.
Appendix A
A.1 Other properties
A.1.1 Applicability to arbitrary dimensions
In this toy example of multi-dimensional association, 10 repetitions of samples were generated such that X has a dimension of 2 and Y has a dimension of 5. We generated 5 clusters of multivariate Gaussian distributions with a constant noise level and varying variances and subpopulation sizes (the number of samples in each cluster). Among the previously compared methods, only dCor, ADGTIC, ADsnpIC, ADdsnpIC, AKIC, HSIC, CD3, rdmCor, and pADGTIC can be applied to arbitrary dimensions. All of them achieve good power (1.0), except pADGTIC3 - dCorMax (0.4). In future work, the approach can be extended to data of arbitrary dimensionality, for example to fMRI recordings across voxels or to behavioural measurements of different categories.
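A minimal sketch of this kind of toy setting (the cluster means, scales, and counts below are illustrative assumptions, not the paper's exact simulation): X (2-D) and Y (5-D) share a latent cluster label, and a distance-matrix-based statistic such as dCor applies unchanged, because it touches the data only through the two pairwise distance matrices.

```python
import numpy as np

def dcor(x, y):
    """Sample distance correlation; depends on x and y only through
    their pairwise distance matrices, so their dimensions may differ."""
    def dist(z):
        z = np.asarray(z, dtype=float).reshape(len(z), -1)
        return np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
    def cent(D):
        return D - D.mean(0) - D.mean(1)[:, None] + D.mean()
    A, B = cent(dist(x)), cent(dist(y))
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return np.sqrt(max((A * B).mean(), 0.0) / denom) if denom > 0 else 0.0

rng = np.random.default_rng(1)
labels = np.repeat(np.arange(5), 20)  # 5 clusters, 20 samples each
X = rng.normal(scale=4.0, size=(5, 2))[labels] + rng.normal(size=(100, 2))
Y = rng.normal(scale=4.0, size=(5, 5))[labels] + rng.normal(size=(100, 5))

dependent = dcor(X, Y)                  # X (2-D) and Y (5-D) share clusters
shuffled = dcor(X, rng.permutation(Y))  # broken pairing as a sanity check
```

Because the shared cluster label dominates both distance matrices, the paired statistic should clearly exceed the one computed after shuffling the row pairing.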