1 Introduction
The detection and quantification of dependencies is essential for almost every statistical analysis. Although this is an old topic, there have recently been several new contributions for the case of testing independence of multiple variables. We focus here specifically on two of them: distance multivariance [5, 2] and the multivariate Hilbert-Schmidt Independence Criterion (dHSIC) [14]. Multivariance includes Pearson’s correlation and the RV coefficient [19] as limiting cases and unifies the bivariate distance covariance [24] and the Hilbert-Schmidt Independence Criterion (HSIC) [11]. For the latter, dHSIC provides an alternative multivariate extension. As for Pearson’s correlation, the values of these measures are influenced both by the actual dependence structure and by the marginal distributions.
For standardized comparisons it is of interest to remove the influence of the marginal distributions: this enables a direct comparison based on the values of a measure rather than a comparison of the corresponding p-values. Attempts to define dependence measures based on copulas go back at least to [25]. A recent approach of [8] is based on a distance of distribution functions with (almost) uniform marginals, i.e., copulas. This structurally corresponds to approaches which are based on distances of characteristic functions. In fact, one estimator of [8] corresponds formally to the approach of multivariance, and another estimator of [8] corresponds formally to the approach of dHSIC [14] (see also Section 3.3 of [2]). Therefore we consider both dHSIC and distance multivariance in this paper. Our approach is related to [9], where a multivariate extension of Spearman’s rho was considered and the so-called distributional transform (see Section 2 below for details) was a key tool; see also [13] for the foundations.
A theoretical (population) approach to transform any dependence measure into a measure which depends only on the copula consists of two steps: 1. Transform the random variables such that their joint distribution function is exactly the copula. 2. Use the transformed random variables in the measure.
A slight difficulty with this approach is that for non-continuous random variables the corresponding copula is in general not unique. Note, however, that an appropriate choice of the transformation procedure makes the resulting copula unique.
Thus a natural approach to the corresponding sample version of the measure is: 1. Transform the samples into samples of the copula. 2. Use the new samples in the measure. For practical applications this shows that an explicit estimation of the copula is superfluous; one only requires a method to transform samples into samples of the copula.
A closely related approach transforms each margin by its distribution function. This is well-established standard practice, see e.g. [18, Section 1] for the setting of distance covariance, [6, Section 2.4] for joint distance covariance (which is closely related to total distance multivariance) and [16] for the setting of HSIC (and e.g. [21, 20] for more recent variants). For general marginals the method yields a ’rank-based’ measure. For continuous marginals its population version coincides with the method which we propose, but for marginals with discrete components and for the sample versions our method has the key advantage that the marginals (and their limits) are always uniformly distributed. In general, the ranks of the transformed samples in our approach coincide with ranks obtained with randomized tie breaking.
For discrete distributions (in particular if the distribution is concentrated on only a few values) a transformation to the uniform distribution makes the dependence less pronounced. Thus one cannot expect the above approach to always improve the detection of dependence. Nevertheless, the examples in Section 5 show several cases where an improvement occurs.
In the next section we recall the distributional transform and provide a general framework to transform any multivariate dependence measure into a measure which depends only on the copula. The framework extends the work of [13] and [22] by focusing on the distributional properties of the corresponding sample versions. Thereafter the copula versions of distance multivariance (Section 3) and dHSIC (Section 4) are defined and analysed. Examples are provided in Section 5, a short summary and outlook is given in Section 6, and technical proofs are collected in Section 7. Further simulations, extending the discussions and parameter settings of the main examples, are provided in a supplement (pages S ff. of this manuscript).
2 The distributional transform – removing dependence on marginal distributions
A tool to transform any random variable into a uniformly distributed random variable is the so-called distributional transform, see e.g. [22, 13] and the references given therein. For marginals with continuous distributions it reduces to the classical transformation by the marginal distribution function alone.
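For a univariate random variable $X$ define, for $x \in \mathbb{R}$ and $\lambda \in [0,1]$,
$$F_X(x,\lambda) := \mathbb{P}(X < x) + \lambda\,\mathbb{P}(X = x), \qquad (2.1)$$
and let $V$ be a uniformly distributed random variable on $[0,1]$ which is independent of $X$. Then the distributional transform of $X$ is the random variable $F_X(X,V)$, and it has the following properties.
Theorem 2.1.
With the above notations and $U := F_X(X,V)$:
1. $U$ is uniformly distributed on $[0,1]$.
2. $X = F_X^{-1}(U)$ almost surely, where $F_X^{-1}(u) := \inf\{x \in \mathbb{R} : \mathbb{P}(X \le x) \ge u\}$ denotes the generalized inverse of the distribution function.
3. The distributional transform is invariant with respect to strictly increasing transformations, in particular translations and scalings, i.e., for any $a > 0$ and $b \in \mathbb{R}$:
$$F_{aX+b}(aX+b,V) = F_X(X,V). \qquad (2.2)$$
4. Random variables $X_1,\dots,X_n$ and their distributional transforms $F_{X_1}(X_1,V_1),\dots,F_{X_n}(X_n,V_n)$ (with $V_1,\dots,V_n$ being independent, uniformly distributed and independent of $X_1,\dots,X_n$) have the same copula. (Here univariate random variables $X_1,\dots,X_n$ have the copula $C$ if $\mathbb{P}(X_1 \le x_1,\dots,X_n \le x_n) = C(\mathbb{P}(X_1 \le x_1),\dots,\mathbb{P}(X_n \le x_n))$ for all $x_1,\dots,x_n \in \mathbb{R}$; the independence copula is $\Pi(u_1,\dots,u_n) := u_1 \cdots u_n$.) In particular, this is the independence copula if and only if the random variables are independent. (To avoid confusion, note that if the distribution of some $X_i$ has a discrete component, the copula describing the dependence of $X_1,\dots,X_n$ is not unique, but one of these copulas is the unique copula given via the distributional transform.)
Proof.
The first two statements are classical, see e.g. [13, Lemma 3] and [22, Proposition 2.1]. The third statement is a direct consequence of the following elementary identity, valid for any strictly increasing transformation $T$ (see also [23]):
$$F_{T(X)}(T(x),\lambda) = F_X(x,\lambda) \quad \text{for all } x \in \mathbb{R},\ \lambda \in [0,1]. \qquad (2.3)$$
The last statement is the multivariate formulation of [13, Proposition 4], see also [22, Theorem 2.2] and its proof. ∎
Remark 2.2.
The copula corresponding to the random variables transformed by the distributional transform, i.e., the distribution function of $(F_{X_1}(X_1,V_1),\dots,F_{X_n}(X_n,V_n))$, is for discontinuous marginals also known as the checkerboard copula [8, Definition 1] or multilinear extension copula.
The last statement of Theorem 2.1 can be used to transform any dependence measure $\mathcal{M}$ into a dependence measure which depends only on the copula:
$$\mathcal{M}^C(X_1,\dots,X_n) := \mathcal{M}\big(F_{X_1}(X_1,V_1),\dots,F_{X_n}(X_n,V_n)\big). \qquad (2.4)$$
By property 3 of Theorem 2.1 this measure is always scale and translation invariant. Note that the setting implicitly also includes dependence measures for random vectors, since in that case some of the univariate variables are simply grouped together. Therefore $\mathcal{M}^C$ in (2.4) is also well defined for random vectors $X_i = (X_{i,1},\dots,X_{i,d_i})$ if we use the notation
$$F_{X_i}(X_i,V_i) := \big(F_{X_{i,1}}(X_{i,1},V_{i,1}),\dots,F_{X_{i,d_i}}(X_{i,d_i},V_{i,d_i})\big) \qquad (2.5)$$
with independent uniformly distributed $V_{i,1},\dots,V_{i,d_i}$.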
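The properties in Theorem 2.1 are easy to check by simulation. The following minimal Python sketch is our own illustration (not code from the paper or from the R package multivariance); it evaluates (2.1) for a finitely supported random variable and checks properties 1 to 3. The helper names `distributional_transform` and `quantile`, the example distribution and the seed are hypothetical choices.

```python
import numpy as np


def distributional_transform(x, lam, support, probs):
    """F_X(x, lam) = P(X < x) + lam * P(X = x) for a finitely supported X.

    support: strictly increasing atoms of X; probs: their probabilities.
    (Illustrative helper, not taken from any package.)
    """
    support = np.asarray(support, dtype=float)
    cdf = np.concatenate(([0.0], np.cumsum(probs)))  # cdf[j] = mass of the first j atoms
    x = np.asarray(x, dtype=float)
    p_less = cdf[np.searchsorted(support, x, side="left")]   # P(X < x)
    p_leq = cdf[np.searchsorted(support, x, side="right")]   # P(X <= x)
    return p_less + np.asarray(lam, dtype=float) * (p_leq - p_less)


def quantile(u, support, probs):
    """Generalized inverse F_X^{-1}(u) = inf{x : P(X <= x) >= u}."""
    cum = np.cumsum(probs)
    idx = np.searchsorted(cum, u, side="left")
    return np.asarray(support, dtype=float)[np.minimum(idx, len(cum) - 1)]


rng = np.random.default_rng(0)
support = np.array([0.0, 1.0, 2.0, 3.0])
probs = np.array([0.1, 0.4, 0.3, 0.2])
x = rng.choice(support, size=200_000, p=probs)
v = rng.uniform(size=x.size)                 # independent uniform randomisation V
u = distributional_transform(x, v, support, probs)

print(u.mean(), u.var())                     # close to 1/2 and 1/12 (property 1)
print(np.all(quantile(u, support, probs) == x))   # property 2: X = F^{-1}(U)
```

The transform spreads the mass $\mathbb{P}(X = x)$ of each atom uniformly over the interval $[\mathbb{P}(X<x), \mathbb{P}(X \le x)]$; these intervals tile $[0,1]$, which is why $U$ is exactly uniform even for a discrete $X$.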
To get related sample versions one could follow the approach of [9], where in particular the uniform random variables in the distributional transform are replaced by their expectations. In slight variation to this, we finally replace them by random samples, i.e., we use an empirical Monte Carlo version of the distributional transform. Under the hypothesis of independence (i.e., under the assumption that the random variables are independent) this yields limits of the estimators which do not depend on the marginal distributions. Nevertheless, for multivariate marginals they still depend on the marginal copulas (see Remark 2.7).
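For $x \in \mathbb{R}$ and $\lambda \in [0,1]$, the univariate empirical distributional transform based on the sample sequence $\mathbf{x} = (x^{(1)},\dots,x^{(N)})$ is
$$F_{N,\mathbf{x}}(x,\lambda) := \frac{1}{N}\sum_{k=1}^N \mathbb{1}_{\{x^{(k)} < x\}} + \lambda\,\frac{1}{N}\sum_{k=1}^N \mathbb{1}_{\{x^{(k)} = x\}}, \qquad (2.6)$$
and it has the following approximation property.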
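Theorem 2.3 (Fundamental theorem of the empirical distributional transform).
Let $X^{(1)}, X^{(2)}, \dots$ be independent copies of a random variable $X$. Then, with $F_X$ as in (2.1) and the empirical version as in (2.6),
$$\sup_{x \in \mathbb{R},\, \lambda \in [0,1]} \big| F_{N,(X^{(1)},\dots,X^{(N)})}(x,\lambda) - F_X(x,\lambda) \big| \xrightarrow[N\to\infty]{} 0 \quad \text{almost surely.} \qquad (2.7)$$
Proof.
The statement is a direct consequence of the fundamental theorem of statistics, the Glivenko-Cantelli theorem (e.g. [7, Thm. (7.4)]), which states the almost sure uniform convergence of the empirical distribution function to the distribution function. Hence also the empirical versions of $\mathbb{P}(X < x)$ and $\mathbb{P}(X = x)$ converge uniformly in $x \in \mathbb{R}$. Moreover, the extra factor $\lambda$ is restricted to the bounded set $[0,1]$, thus (2.7) holds. ∎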
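Theorem 2.3 can also be illustrated numerically. For a finitely supported $X$, the difference in (2.7) is affine in $\lambda$ and piecewise constant in $x$, so the supremum only needs to be evaluated at the atoms with $\lambda \in \{0, 1\}$. The sketch below relies on this reduction and assumes that every sample value is one of the known atoms. The helper `sup_deviation` and the example distribution are hypothetical.

```python
import numpy as np


def sup_deviation(sample, support, probs):
    """sup over x and lambda in [0, 1] of |F_N(x, lambda) - F_X(x, lambda)|.

    Assumes every sample value is one of the atoms in `support` (sorted).
    The difference is affine in lambda, so lambda in {0, 1} suffices, i.e. we
    compare P(X < x) and P(X <= x) with their empirical versions at the atoms.
    """
    support = np.asarray(support, dtype=float)
    cdf = np.concatenate(([0.0], np.cumsum(probs)))
    p_less, p_leq = cdf[:-1], cdf[1:]                  # P(X < x_j), P(X <= x_j)
    s = np.sort(np.asarray(sample, dtype=float))
    emp_less = np.searchsorted(s, support, side="left") / s.size
    emp_leq = np.searchsorted(s, support, side="right") / s.size
    return max(np.abs(emp_less - p_less).max(), np.abs(emp_leq - p_leq).max())


rng = np.random.default_rng(1)
support = np.array([0.0, 1.0, 2.0, 3.0])
probs = np.array([0.1, 0.4, 0.3, 0.2])
sizes = [100, 1_000, 10_000, 100_000]
deviations = [sup_deviation(rng.choice(support, size=n, p=probs), support, probs)
              for n in sizes]
for n, dev in zip(sizes, deviations):
    print(f"N = {n:>6}: sup deviation = {dev:.4f}")
```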
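As in (2.5), the notation of the empirical distributional transform is extended to vectors, i.e., for $x = (x_1,\dots,x_d) \in \mathbb{R}^d$, $\lambda = (\lambda_1,\dots,\lambda_d) \in [0,1]^d$ and a sequence $\mathbf{x} = (x^{(1)},\dots,x^{(N)})$ in $\mathbb{R}^d$
$$F_{N,\mathbf{x}}(x,\lambda) := \big(F_{N,\mathbf{x}_1}(x_1,\lambda_1),\dots,F_{N,\mathbf{x}_d}(x_d,\lambda_d)\big), \qquad (2.8)$$
where $\mathbf{x}_l := (x^{(1)}_l,\dots,x^{(N)}_l)$ denotes the sequence of the $l$-th components.
Let $x^{(1)},\dots,x^{(N)}$ with $x^{(k)} = (x_1^{(k)},\dots,x_n^{(k)})$ be samples of independent copies of $(X_1,\dots,X_n)$. Then define the (Monte Carlo) empirical distributional transform of $\mathbf{x} = (x^{(1)},\dots,x^{(N)})$ to be $\mathbf{u} = (u^{(1)},\dots,u^{(N)})$ with elements given by
$$u_i^{(k)} := F_{N,\mathbf{x}_i}\big(x_i^{(k)}, v_i^{(k)}\big), \qquad i = 1,\dots,n,\ k = 1,\dots,N, \qquad (2.9)$$
where $\mathbf{x}_i := (x_i^{(1)},\dots,x_i^{(N)})$ and all $v_i^{(k)}$ (or their components if $X_i$ is $\mathbb{R}^{d_i}$-valued) are samples of independent copies of a uniformly distributed random variable. By Theorem 2.3 the $u^{(k)}$ are approximately samples of $(F_{X_1}(X_1,V_1),\dots,F_{X_n}(X_n,V_n))$.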
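A minimal NumPy sketch of the Monte Carlo empirical distributional transform (2.6)/(2.9) is given below for data stored as an $N \times d$ array, with one univariate component per column. The function name `empirical_distributional_transform` is hypothetical. The R package multivariance [3] has its own implementation, whose interface is not reproduced here. The sketch also checks two statements made above: the transformed samples induce ranks with randomized tie breaking, and for the same uniform draws the transform is unchanged by strictly increasing transformations of the data.

```python
import numpy as np


def empirical_distributional_transform(x, rng):
    """Monte Carlo empirical distributional transform, column by column.

    x: array of shape (N, d); rng: numpy Generator supplying the uniforms.
    Returns (u, v) with u[k, j] = (#{l: x[l, j] < x[k, j]}
                                   + v[k, j] * #{l: x[l, j] == x[k, j]}) / N.
    """
    x = np.asarray(x, dtype=float)
    n_samples = x.shape[0]
    v = rng.uniform(size=x.shape)                  # independent uniforms v_i^(k)
    u = np.empty_like(x)
    for j in range(x.shape[1]):
        col = x[:, j]
        sorted_col = np.sort(col)
        less = np.searchsorted(sorted_col, col, side="left")    # strictly smaller
        leq = np.searchsorted(sorted_col, col, side="right")    # smaller or equal
        u[:, j] = (less + v[:, j] * (leq - less)) / n_samples
    return u, v


rng = np.random.default_rng(2)
base = rng.poisson(1.0, size=5_000)
x = np.column_stack([base, base + rng.poisson(1.0, size=5_000)])  # dependent, many ties
u, v = empirical_distributional_transform(x, rng)
print(u.mean(axis=0), u.var(axis=0))   # both columns close to 1/2 and 1/12
```

Within a group of tied values the transformed samples are ordered by their uniforms `v`, which is exactly randomized tie breaking.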
Now suppose that for the measure in (2.4) there exists an estimator; then the natural candidate for an estimator of the copula version is
(2.10)
The notation is introduced to distinguish this estimator from the theoretical estimator where one uses the true distributional transform, i.e., The latter is important since it fits directly into any distribution and limit theory for , the are just samples of independent copies of Using this relation, the next theorem states that inherits the convergence properties of if the latter is uniformly continuous. Note, however, that this does not hold in general for scaled versions of these estimators, as we discuss afterwards.
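The plug-in idea behind (2.10) can be written as a higher-order function: any sample estimator is applied to the empirically transformed samples instead of the raw data. In the hypothetical sketch below, the base estimator is the sample Pearson correlation of two columns. This is a deliberately simple stand-in and not one of the measures of this paper. Its copula version is then a Spearman-type rank correlation with randomized tie breaking, in the spirit of [9]. The transform is repeated compactly so that the block runs on its own. All function names are illustrative.

```python
import numpy as np


def empirical_distributional_transform(x, rng):
    """Column-wise Monte Carlo empirical distributional transform, cf. (2.9)."""
    x = np.asarray(x, dtype=float)
    v = rng.uniform(size=x.shape)
    u = np.empty_like(x)
    for j in range(x.shape[1]):
        s = np.sort(x[:, j])
        less = np.searchsorted(s, x[:, j], side="left")
        leq = np.searchsorted(s, x[:, j], side="right")
        u[:, j] = (less + v[:, j] * (leq - less)) / x.shape[0]
    return u


def copula_version(estimator):
    """Turn an estimator M_N(x) into M_N^C(x) := M_N(u), u the transformed sample."""
    def copula_estimator(x, rng):
        return estimator(empirical_distributional_transform(x, rng))
    return copula_estimator


def pearson(x):
    """Sample Pearson correlation of the two columns of x (illustrative base measure)."""
    return np.corrcoef(x[:, 0], x[:, 1])[0, 1]


copula_pearson = copula_version(pearson)

rng = np.random.default_rng(3)
z = rng.normal(size=(2_000, 2))
x = np.column_stack([z[:, 0], 0.6 * z[:, 0] + 0.8 * z[:, 1]])
y = np.exp(x)                        # strictly increasing in each component
print(pearson(x), pearson(y))       # differ in general: Pearson depends on the margins
print(copula_pearson(x, np.random.default_rng(5)),
      copula_pearson(y, np.random.default_rng(5)))   # identical for the same uniforms
```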
Theorem 2.4 (preservation of consistency).
Let be uniformly continuous on in the following sense: for all $\varepsilon > 0$ there exists $\delta > 0$ such that for all
(2.11) 
and let be independent copies of . Then and converge to the same limit in the same mode of convergence.
Proof.
A direct consequence of Theorem 2.4 is the convergence for estimators with representations of V-statistic type.
Corollary 2.5 (preservation of consistency for estimators of V-statistic type).
If
(2.14) 
for some which is continuous on , then and converge to the same limit in the same mode of convergence.
To test independence, the consistency of is usually not sufficient; one requires the convergence in distribution of some related statistic under the hypothesis of independence. In the setting of
being a V-statistic, the corresponding test statistic is of the form
for some . Theoretically, here again a uniformity of the convergence is required to transfer results from to . But, as the following counterexample shows, for the proof of such convergence it seems necessary to go into the details of the specific statistic.
Remark 2.6 (Counterexample: for scaled V-statistics the distributional limit is in general not preserved when replacing the true by the empirical distributional transform).
We give an elementary example (without explicit link to dependence measures). Let be independent copies of a continuous random variable . Due to the continuity we have: Define
then by the strong law of large numbers
since are uniformly distributed with mean 1/2 and variance 1/12. Moreover, by the central limit theorem
But in this case the distance to the V-statistic with replaced by does not vanish; in fact:
where denotes the empirical distribution function of . The above convergence is a consequence of the fact that by the central limit theorem.
This shows that in general the limits of and differ.
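A small simulation makes the phenomenon of Remark 2.6 visible. The displayed formulas of the remark are not reproduced here, so the sketch assumes the simplest instance of such a scaled V-statistic: the scaled sample mean of the transformed values. This choice is ours and is not necessarily the statistic used in the remark. With the true distribution function $F$, the quantity $\sqrt{N}\big(\frac{1}{N}\sum_k F(X^{(k)}) - \frac12\big)$ is asymptotically normal with variance $1/12$. With the empirical distribution function it is deterministic, because the ranks always sum to $N(N+1)/2$. Sample sizes and the seed are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 400, 4_000
x = rng.exponential(size=(reps, n))               # continuous X ~ Exp(1)
f_true = 1.0 - np.exp(-x)                         # F(X_k) with the true cdf: i.i.d. uniform
ranks = x.argsort(axis=1).argsort(axis=1) + 1     # ranks 1..n within each replication
f_emp = ranks / n                                 # F_N(X_k) = #{l : X_l <= X_k} / n (no ties a.s.)

t_true = np.sqrt(n) * (f_true.mean(axis=1) - 0.5)
t_emp = np.sqrt(n) * (f_emp.mean(axis=1) - 0.5)   # equals 1 / (2 sqrt(n)) up to rounding

print(t_true.var(), 1 / 12)       # approximately equal (central limit theorem)
print(t_emp.min(), t_emp.max())   # constant up to floating point rounding
print((t_emp - t_true).var())     # the distance between both versions does not vanish
```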
We aim to use the explicitly known limit behaviour of the underlying estimators without having to reprove it. In this case the limit distribution of the estimator depends only on the copula and on the uniform marginals. Besides the margin-free quantification of dependence, this is a further benefit of the presented approach for practical applications, since it can yield computationally faster test procedures.
Remark 2.7 (Speed advantage of methods based on the distributional transform).
The knowledge that the limit margins are uniformly distributed (by Theorem 2.1) allows one to precompute required quantities which otherwise (without the distributional transform) would require additional information prior to the test.
A basic example is a Monte Carlo p-value derivation: suppose one knows (as will be the case in our setting) that the distribution of the limit of the test statistic does not depend on the marginals, and one wants to perform many tests in the setting of univariate marginals with a sample size of . Then one can easily obtain an approximate distribution function of the given estimator under the hypothesis of independence (i.e., for independent marginals) based on Monte Carlo samples (with samples of size ) of independent uniformly distributed random variables. This distribution function can then be reused in every test. The quality of this approximation turns out to be good in our setting (see Figure S.2). Formally this means (in our case) that the distribution of the approximate Monte Carlo samples is close to the distribution of exact Monte Carlo samples , where are vectors with samples of independent uniformly distributed random variables and are samples of with independent components.
Note that in general this method is not applicable if the variables under consideration are multivariate, since in this case the marginal distributions under the hypothesis of independence are multivariate. Each margin then still consists of univariate uniformly distributed components, but these components can be dependent.
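The following sketch illustrates the precomputation idea of Remark 2.7 for univariate margins. The test statistic is $N$ times the sample distance multivariance with $\psi(y) = |y|$. We compute it as the mean of the elementwise product of doubly centred distance matrices, which is our reading of the distance-matrix representation in [5, 2]; the R package multivariance [3] provides the reference implementation. The null distribution is simulated once from independent uniform samples and is then reused for data with arbitrary univariate margins after the empirical distributional transform. Function names, the sample size and the number of Monte Carlo replications are hypothetical.

```python
import numpy as np


def edt(x, rng):
    """Column-wise Monte Carlo empirical distributional transform, cf. (2.9)."""
    x = np.asarray(x, dtype=float)
    v = rng.uniform(size=x.shape)
    u = np.empty_like(x)
    for j in range(x.shape[1]):
        s = np.sort(x[:, j])
        less = np.searchsorted(s, x[:, j], side="left")
        leq = np.searchsorted(s, x[:, j], side="right")
        u[:, j] = (less + v[:, j] * (leq - less)) / x.shape[0]
    return u


def multivariance_stat(u):
    """N * M_N^2 for univariate columns with psi(y) = |y| (Euclidean distance)."""
    n_samples = u.shape[0]
    prod = np.ones((n_samples, n_samples))
    for j in range(u.shape[1]):
        d = np.abs(u[:, j, None] - u[None, :, j])
        # doubly centred distance matrix A = -C D C with C = I - (1/N) 1 1^T
        prod *= -(d - d.mean(axis=0) - d.mean(axis=1, keepdims=True) + d.mean())
    return n_samples * prod.mean()


def null_distribution(n_samples, n_vars, n_mc, rng):
    """Statistic under independence, simulated once from independent uniforms."""
    return np.array([multivariance_stat(rng.uniform(size=(n_samples, n_vars)))
                     for _ in range(n_mc)])


def mc_p_value(stat, null):
    return (1 + np.sum(null >= stat)) / (1 + null.size)


rng = np.random.default_rng(5)
null = null_distribution(n_samples=50, n_vars=2, n_mc=500, rng=rng)  # computed once

# reuse the same null distribution for data with different univariate margins
z = rng.poisson(3.0, size=50)
dependent = np.column_stack([z, z + 0.1 * rng.standard_cauchy(50)])
independent = np.column_stack([rng.poisson(1.0, size=50), rng.standard_cauchy(50)])
p_dep = mc_p_value(multivariance_stat(edt(dependent, rng)), null)
p_ind = mc_p_value(multivariance_stat(edt(independent, rng)), null)
print(p_dep, p_ind)
```

The expensive step, `null_distribution`, does not look at the data at all. As the remark notes, this only works because the univariate margins are (approximately) uniform after the transform.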
This section presented a general approach for the construction of dependence measures based on the copula, i.e., the new measures are invariant with respect to changes of the marginal distributions. As stated before, the uniform random variables in the distributional transform introduce some further randomness, and this might or might not blur a given dependence, see the examples in Section 5.
3 Copula distance multivariance
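Distance multivariance is defined by (cf. [2])
$$M_\rho(X_1,\dots,X_n) := \Big\| \mathbb{E}\Big( \prod_{i=1}^n \big( e^{\mathrm{i} X_i \cdot t_i} - f_{X_i}(t_i) \big) \Big) \Big\|_{L^2(\rho)}, \qquad (3.1)$$
where $X_i$ are $\mathbb{R}^{d_i}$-valued random variables with characteristic functions $f_{X_i}$, and $\rho = \otimes_{i=1}^n \rho_i$ is based on symmetric measures $\rho_i$ with full support on $\mathbb{R}^{d_i}$ such that $\int 1 \wedge |t_i|^2\,\rho_i(dt_i) < \infty$. For the measures $\rho_i$ there are many choices (cf. [4, Table 1]) which unify several dependence measures in the case of $n = 2$ (see [2, Section 3]) and extend these to a multivariate setting. There is a one-to-one correspondence between each measure $\rho_i$ and the real-valued continuous negative definite function $\psi_i$ given by
$$\psi_i(y_i) := \int_{\mathbb{R}^{d_i}\setminus\{0\}} \big(1 - \cos(y_i \cdot t_i)\big)\,\rho_i(dt_i). \qquad (3.2)$$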
Based on the discussion in the previous section and using the notation of (2.5) we define the copula version of distance multivariance for the random variables by
(3.3) 
and analogously the copula version of total distance multivariance is given by
(3.4) 
In [2] further measures based on are discussed, e.g. the normalized multivariance and the multivariances and ; for these, too, the corresponding copula versions can be defined analogously.
The key observation for the new measures is that these inherit the following properties.
Theorem 3.1 (Characterization of independence).
(3.5) 
In particular, for random variables which are independent
(3.6) 
Proof.
The random variables are independent if and only if are independent and the same equivalence holds for any subfamily. Thus the results are a direct consequence of the corresponding properties of distance multivariance [2, Theorem 2.1]. ∎
For samples the sample version of distance multivariance is given by
(3.7) 
where each random variable is distributed according to the empirical distribution of . Sample distance multivariance also has an alternative representation, given in (7.1), and it can be turned into a numerically efficient estimator using distance matrices; for details we refer to [5, 2].
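To make the distance-matrix representation concrete, here is a NumPy sketch of sample distance multivariance for univariate components with $\psi_i(y) = |y|$, the Euclidean case. It follows our own reading of the representation in [5, 2]: $N \cdot M_N^2 = N \cdot \frac{1}{N^2}\sum_{j,k}\prod_i A_i(j,k)$ with doubly centred distance matrices $A_i = -C B_i C$. For two variables this reduces to $N$ times the squared sample distance covariance, and the sketch checks that against the classical three-term formula. The usage applies the statistic to Bernstein’s coins from Section 5, which are pairwise independent but jointly dependent. Function names and sizes are hypothetical.

```python
import numpy as np


def centred_distance_matrix(col):
    """A = -C B C with B = (|x_j - x_k|)_{j,k} and C = I - (1/N) 1 1^T."""
    d = np.abs(col[:, None] - col[None, :])
    return -(d - d.mean(axis=0) - d.mean(axis=1, keepdims=True) + d.mean())


def sample_multivariance(x, cols):
    """N * M_N^2 for the selected univariate columns of x (Euclidean psi)."""
    prod = np.ones((x.shape[0], x.shape[0]))
    for j in cols:
        prod *= centred_distance_matrix(np.asarray(x[:, j], dtype=float))
    return x.shape[0] * prod.mean()


# Bernstein's coins: two fair coins and the indicator that they show the same side
rng = np.random.default_rng(6)
n = 400
x1 = rng.integers(0, 2, size=n)
x2 = rng.integers(0, 2, size=n)
coins = np.column_stack([x1, x2, (x1 == x2).astype(int)])

pairs = [sample_multivariance(coins, c) for c in [(0, 1), (0, 2), (1, 2)]]
triple = sample_multivariance(coins, (0, 1, 2))
print(pairs)    # small: the coins are pairwise independent
print(triple)   # large, roughly n / 8 for this example

# sanity check: for two variables this is N times the squared distance covariance
w = rng.normal(size=(100, 2))
da = np.abs(w[:, 0, None] - w[None, :, 0])
db = np.abs(w[:, 1, None] - w[None, :, 1])
dcov2 = ((da * db).mean() + da.mean() * db.mean()
         - 2 * (da.mean(axis=1) * db.mean(axis=1)).mean())
```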
The following result provides everything required for the corresponding independence tests. The technical proof is postponed to Section 7.
Theorem 3.2 (Asymptotics of ).
and inherit the distributional properties of and , respectively.
In particular this yields for all random variables which are independent copies of (without any moment assumptions):
(3.9)  
(3.10)  
(3.11) 
where is a Gaussian quadratic form with
Technically, when using , the methods in [2] are only preceded by the empirical distributional transform. Therefore we omit here a further description of the tests and refer to the extended expositions in [2] and [1].
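The constants for uniform marginals quoted in Remark 3.3 (limit mean 1/3, limit variance 2/45, limit skewness 8/945) can be checked numerically. By our own short computation with the eigenfunctions $\cos(\pi j x)$, the doubly centred kernel $-|x-y|$ under the uniform distribution on $[0,1]$ has eigenvalues $\lambda_j = 2/(\pi j)^2$. The three numbers are then $\sum_j \lambda_j$, $\sum_j \lambda_j^2$ and $\sum_j \lambda_j^3$; we assume these power sums are the quantities meant in the remark. The sketch approximates the integral operator on a midpoint grid. The grid size is an arbitrary choice.

```python
import numpy as np

m = 800
grid = (np.arange(m) + 0.5) / m                       # midpoint grid on [0, 1]
dist = np.abs(grid[:, None] - grid[None, :])
# doubly centred kernel k(x, y) = -|x - y| + E|x - U| + E|U - y| - E|U - U'|
kernel = -(dist - dist.mean(axis=0) - dist.mean(axis=1, keepdims=True) + dist.mean())
op = kernel / m                                       # discretised integral operator

limit_mean = np.trace(op)                             # sum of eigenvalues
limit_var_term = np.sum(op * op)                      # sum of squared eigenvalues
limit_skew_term = np.trace(op @ op @ op)              # sum of cubed eigenvalues
top = np.sort(np.linalg.eigvalsh(op))[::-1][:3]       # three largest eigenvalues

print(limit_mean, 1 / 3)
print(limit_var_term, 2 / 45)
print(limit_skew_term, 8 / 945)
print(top, 2 / (np.pi * np.arange(1, 4)) ** 2)
```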
Remark 3.3 (Speed advantage: using precalculated parameters).
Due to the known uniform marginals, direct p-value estimates are possible in the case of univariate marginals using the methods described in [1, Example 5.6]. The required values are: limit mean 1/3, limit variance 2/45 and limit skewness 8/945. Moreover (not given in [1]), the parameters required for the finite sample estimates [1, Theorems 4.15, 4.17] can also be precomputed. These known values provide a considerable speed gain in comparison to the moment estimation methods of [1], see Figure 1. For multivariate marginals these parameters cannot be precomputed in general, since the required values depend on the (typically) unknown dependence of the components within the marginals (cf. Remark 2.7).
4 Copula dHSIC
With the notation of the previous section (see also [2, Section 3.3]), the multivariate Hilbert-Schmidt Independence Criterion of [14] for real-valued random vectors is given by
(4.1) 
where is an independent copy of and each is bounded with . Note that this implies (since itself is a continuous negative definite function, see (3.2)), that
(4.2) 
is a positive definite kernel, cf. [2, Section 3.2]. Then the copula version of is given by
(4.3) 
and analogously to Theorem 3.1 (using [14, Proposition 1]) the measure characterizes independence.
Theorem 4.1 (Characterization of independence).
(4.4) 
An empirical estimator for is given by
(4.5) 
Note that this might look slightly different from the estimator defined in [14, Definition 4], but an expansion of the products and a relabelling of the indices yields their representation, see (7.5). Using (4.5), the estimator can be defined for all , whereas was required in [14].
Analogous to (3.8) the estimator for is defined by
(4.6) 
with given via (2.9), and the following theorem holds; details of the proof are given in Section 7.2.
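Below is a NumPy sketch of the V-statistic estimator of dHSIC and of its copula version (4.6), for univariate components and a Gaussian kernel $k(x,y) = \exp(-(x-y)^2/(2\sigma^2))$. The estimator is written in a three-term form: the mean of the product of the Gram matrices, plus the product of their means, minus twice the mean of the products of their row means. This is the expanded representation of [14, Definition 4] as we understand it. The bandwidth $\sigma = 1$ and all function names are illustrative assumptions, not the settings used in Section 5. For two variables the estimator should coincide with the classical biased HSIC $\operatorname{tr}(KHLH)/N^2$, which the sketch checks.

```python
import numpy as np


def edt(x, rng):
    """Column-wise Monte Carlo empirical distributional transform, cf. (2.9)."""
    x = np.asarray(x, dtype=float)
    v = rng.uniform(size=x.shape)
    u = np.empty_like(x)
    for j in range(x.shape[1]):
        s = np.sort(x[:, j])
        less = np.searchsorted(s, x[:, j], side="left")
        leq = np.searchsorted(s, x[:, j], side="right")
        u[:, j] = (less + v[:, j] * (leq - less)) / x.shape[0]
    return u


def gaussian_gram(col, bandwidth):
    diff = col[:, None] - col[None, :]
    return np.exp(-diff ** 2 / (2.0 * bandwidth ** 2))


def dhsic(x, bandwidth=1.0):
    """V-statistic estimator of dHSIC for the univariate columns of x."""
    grams = [gaussian_gram(x[:, j], bandwidth) for j in range(x.shape[1])]
    term1 = np.prod(grams, axis=0).mean()                            # mean of prod_i K_i
    term2 = np.prod([g.mean() for g in grams])                       # prod_i mean(K_i)
    term3 = np.prod([g.mean(axis=1) for g in grams], axis=0).mean()  # mean_j prod_i rowmean_j
    return term1 + term2 - 2.0 * term3


def copula_dhsic(x, rng, bandwidth=1.0):
    """Copula version: dHSIC of the empirically transformed samples."""
    return dhsic(edt(x, rng), bandwidth)


rng = np.random.default_rng(7)
n = 300
z = rng.normal(size=n)
dependent = np.column_stack([z, z + 0.5 * rng.normal(size=n)])
independent = rng.normal(size=(n, 2))
print(copula_dhsic(dependent, rng), copula_dhsic(independent, rng))

# sanity check against the classical biased HSIC tr(K H L H) / N^2 for two variables
K = gaussian_gram(dependent[:, 0], 1.0)
L = gaussian_gram(dependent[:, 1], 1.0)
H = np.eye(n) - 1.0 / n
hsic_trace = np.trace(K @ H @ L @ H) / n ** 2
```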
Theorem 4.2 (Asymptotics of ).
and inherit the distributional properties of and , respectively.
Tests based on
use either a resampling method, a rough gamma approximation or an eigenvalue method (see in particular [14, Table 1]).
5 Empirical properties of copula distance multivariance and copula dHSIC
We show that the new measures can be more powerful than various other copula-based measures. Thereafter we evaluate computation methods for the p-values with respect to their (non)conservative behaviour and speed. Moreover, we indicate the limitations of the introduced measures by giving an example where the copula versions of the measures perform worse than the original measures. This section is complemented by several figures and tables in the supplement (pages S ff. of this manuscript), which mostly consider the same examples with parameter variations and in more detail.
All simulations are performed on a laptop with an i7-6500U CPU using the statistical computing environment R [17], in particular with the packages copula [12], dHSIC [15] and multivariance [3].
The tests are performed with significance level 0.05 using 1000 samples and, if applicable, with 300 resamples. The power is given in percent. For multivariance the standard measures corresponding to the Euclidean distance were used, and for dHSIC the kernel defined via with was used (for different values of see Table S.4). Note that both multivariance and dHSIC allow many other (possibly parametrized) variants. These could certainly improve the performance for particular examples, but they would also require some knowledge of the type of dependence prior to testing. The purpose of the examples in this section is to show that the copula-based measures can be competitive. Finding optimal measure selection procedures will be part of future research. Moreover, for both measures various p-value derivation methods exist, which are compared in Figure 1. If not stated otherwise, we use a Monte Carlo distribution based on 100000 samples as described in Remark 2.7 to determine the p-values.
For a comparison with other copula-based measures we consider a set of examples discussed in [8], which provide (using their numbers) a direct comparison to eight of their dependence measures, including those introduced in [9]. To this end, samples of dependent uniformly distributed random variables are obtained using the following copulas: Clayton copula, Student copula with 1 and 3 degrees of freedom, Normal copula, Frank copula and Gumbel copula. See Figure S.1 for a visualization of the induced dependencies. Each copula is parametrized such that the pairwise Kendall’s tau is equal to 0.1. The dependent samples were transformed to the following types of marginal distributions: Poisson with mean 1 and 20 (P1 and P20), rounded Pareto (RP) with survival function for (discrete, with infinite expectation), Cauchy (CA) (continuous, with no expectation), Student with 3 degrees of freedom altered with an atom at 0 of mass 0.05 (SA) (mixture, with infinite variance).
The power comparison in Table 1 shows that in certain cases the tests based on and are more powerful than those based on the measures considered in [8] (’min’ and ’max’ denote the minimal and maximal power of the tests considered therein (without the measure ’R’), see Table S.1 for details). Moreover, in all cases the new measures provide tests which are more powerful than the minimum of the competing measures. performs particularly well for the Student copula (with the exception of rounded Pareto marginals), while handles particularly well the cases of normal, Clayton and Frank copula (in all cases except P1 it is at least close to ’max’). It is interesting to note that this preference differs from the preferences of the measures and which (as stated in the introduction) correspond structurally to dHSIC and multivariance, respectively.
copula  type  min  max  

normal  CA  72.9  83.2  67.8  82.6 
P1  53.1  63.1  49.4  82.5  
P20  72.6  82.7  66.2  82.4  
RP  71.5  81.1  66.8  83.6  
SA  72.9  83.0  68.1  82.5  
t1  CA  100  83.1  78.6  99.2 
P1  90.2  65.3  50.3  93.4  
P20  100  82.7  78.5  99.2  
RP  100  81.0  80.2  98.6  
SA  100  82.8  78.7  99.2  
t3  CA  96.5  82.6  66.7  93.4 
P1  71.0  67.0  47.9  81.8  
P20  96.3  82.3  65.7  92.9  
RP  94.1  81.9  64.5  91.9  
SA  96.6  82.8  66.6  93.6 
copula  type  min  max  

clayton  CA  79.7  85.5  67.9  83.3 
P1  48.2  53.4  42.3  69.5  
P20  77.7  84.7  66.8  83.9  
RP  75.2  81.8  65.0  81.4  
SA  79.7  85.3  67.5  83.5  
frank  CA  81.8  85.7  68.6  85.8 
P1  65.3  65.6  54.1  79.9  
P20  82.3  84.9  68.2  86.1  
RP  80.8  84.9  67.8  85.7  
SA  81.8  85.9  68.4  85.7  
gumbel  CA  85.9  81.5  60.0  88.4 
P1  73.9  65.2  48.5  84.4  
P20  85.4  80.8  60.4  88.0  
RP  85.8  80.6  60.4  87.9  
SA  85.8  81.6  59.4  88.3 
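The data-generating step of these experiments can be sketched in Python for the Normal copula. The sketch draws from a Gaussian copula whose correlation parameter is chosen via the standard relation $\tau = \frac{2}{\pi}\arcsin\rho$, so that the pairwise Kendall’s tau equals 0.1, and then applies the quantile transform to Poisson(1) margins (the P1 case). The simulations in this paper use the R package copula [12]. This NumPy-only sketch, its dimension, sample size and function names are illustrative assumptions.

```python
import math
import numpy as np


def normal_copula_sample(n_samples, dim, kendall_tau, rng):
    """Gaussian copula samples with equal pairwise Kendall's tau."""
    rho = math.sin(math.pi * kendall_tau / 2.0)          # inverts tau = (2/pi) arcsin(rho)
    cov = np.full((dim, dim), rho) + (1.0 - rho) * np.eye(dim)
    z = rng.multivariate_normal(np.zeros(dim), cov, size=n_samples)
    std_normal_cdf = np.vectorize(lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0))))
    return std_normal_cdf(z)                             # uniform margins


def poisson_quantile(u, mean, k_max=100):
    """Generalized inverse of the Poisson cdf, inf{k : P(X <= k) >= u}."""
    k = np.arange(k_max + 1)
    log_pmf = -mean + k * math.log(mean) - np.array([math.lgamma(j + 1.0) for j in k])
    cdf = np.cumsum(np.exp(log_pmf))
    return np.minimum(np.searchsorted(cdf, u, side="left"), k_max)


def kendall_tau(a, b):
    """Sample Kendall's tau for continuous data (no ties), O(N^2) memory."""
    sa = np.sign(a[:, None] - a[None, :])
    sb = np.sign(b[:, None] - b[None, :])
    return (sa * sb).sum() / (a.size * (a.size - 1))


rng = np.random.default_rng(8)
u = normal_copula_sample(n_samples=20_000, dim=3, kendall_tau=0.1, rng=rng)
x = poisson_quantile(u, mean=1.0)                        # P1 margins, many ties
print(x.mean(axis=0))                                    # close to 1
print(kendall_tau(u[:2_000, 0], u[:2_000, 1]))           # close to 0.1
```

Note that the copula sample `u` is only an intermediate step. The tests then receive `x`, whose heavily tied Poisson margins are exactly the situation in which the distributional transform matters.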
If one compares tests based on the new measures with those using their base measures and , it also turns out that sometimes the base measures and sometimes the new measures yield more powerful tests (see e.g. Table S.1). Moreover, the behaviour also varies with the sample size and the dimension (see Tables S.2 and S.3).
Recall that using the empirical distribution of the test statistic with Monte Carlo samples of the null distribution (i.e., samples with independent components) provides (almost) exact p-values. Furthermore, note that if the marginals are from arbitrary distributions, the corresponding finite sample distribution under the hypothesis of independence does not coincide with the distribution based on uniformly distributed marginals. Nevertheless, the latter approximation performs reasonably well for all marginals of Table 1, see Figure S.2. In the setting of multiple tests this method becomes very efficient, cf. Remark 2.7. But if the marginal distributions are multidimensional, or if one is not in the setting of multiple tests (with fixed and ), this Monte Carlo approach becomes very slow or inapplicable. Therefore we now look at other p-value derivation methods. In particular, we illustrate their (non)conservative behaviour and speed in Figure 1. Here we only consider uniform marginals. The corresponding exact Monte Carlo p-values are used as benchmark and plotted against the p-values obtained via the various methods which are available for multivariance and dHSIC. Hence Figure 1 allows one to visually assess the empirical size for various significance levels at once. For multivariance the method described in Remark 3.3 (called ’pearson_uniform’ in the figure) is the fastest and Pearson’s approximation is the sharpest. For dHSIC the eigenvalue method turns out to be a good choice (in its current implementation it is about 10 times slower than ’pearson_uniform’ and slightly conservative). The gamma approximation, which was included in [14] as a fast unproven alternative, does not seem reliable in this setting (it shows very liberal behaviour).
In the examples of Table 1 all variables are pairwise dependent. An example of pairwise independent but (jointly) dependent random variables is constructed as follows (also known as Bernstein’s coins, cf. [5, Section 5] and [2, Example 9.2]): Let $X_1$ and $X_2$ be independent Bernoulli distributed random variables and set $X_3 := \mathbb{1}_{\{X_1 = X_2\}}$, which models the event that both ’coins’ show the same side. Then all three variables are Bernoulli distributed and feature the dependence structure shown in Figure 2 (see [2] for further details on the visualization of higher order dependence). For these random variables the detection power of tests based on and and of their classical counterparts (i.e., without the distributional transformation) is shown in the central plot of Figure 2. Here the copula versions clearly perform worse. If we perturb the by independent normally distributed random variables with variance and mean , the measures and perform similarly, but for a difference is still visible. Note that the comparison of and might be considered unfair, since it depends strongly on a bandwidth parameter which was fixed here (as stated at the beginning of this section; for cases with variable bandwidth see Table S.4).
6 Summary and outlook
A general scheme to remove the dependence on marginal distributions from dependence measures was discussed, with a focus on the distributional properties of the corresponding estimators. The scheme was then explicitly applied to dHSIC and distance multivariance, yielding new measures and corresponding independence tests which are competitive with other copula-based measures.
The current work provides an essential basis for future research in several directions, e.g.:

Quantification of dependence, e.g. how to interpret the values of the multicorrelations introduced in [2] for the corresponding copula versions. This enables margin-free comparisons directly based on the values of the dependence measures. Hence selection procedures, such as those suggested in [10], become valid.

Optimization of the introduced measures, e.g. by adaptive selection of the underlying . Note that [5, Section 5.2] provides an example of dependent random variables with uniform marginals where the performance of standard multivariance (using corresponding to the Euclidean distance) improved when the underlying measure was changed (within the framework of distance multivariance).

A detailed comparison of dHSIC and distance multivariance. A clear rule of thumb indicating the preference of one of the measures for a given situation is still missing. The alternating optimum in Tables 1, S.2, S.3 and S.4 indicates that a simple answer is not to be expected, see in this context also the discussion in [2, Section 3.3].
7 Technical details
7.1 Proof of Theorem 3.2 (Asymptotics of )
We use the notation of [2, Section 8.3]:
(7.1) 
with
(7.2) 
and given via (3.2). The are uniformly continuous in the sense of (2.11) since the are uniformly continuous on . Hence by Theorem 2.4 the empirical copula distance multivariance inherits the strong consistency of
We write if the in (7.2) are replaced by , and the notation is used for the case where are replaced by .
For the scaled version we will show
(7.3) 
then by Slutsky’s theorem the statement of the theorem follows. For (7.3) note that by the Markov inequality it suffices to show that the second moment of the left-hand side converges to 0.
The second moment of the finite sample version of distance multivariance has been analysed in detail in [1]. It is composed of various terms with the coefficients scrupulously collected in [1, Table 1]; this table also indicates the (overall) behaviour of the terms for . Based on this we get the following limit (dropping vanishing terms; using the symmetry and the identical distribution of the summands in the particular sums):
(7.4) 
Finally, consider the derived sum of two limits: in the first limit each expectation converges to and in the second limit each expectation converges to . Therefore both limits become 0.
7.2 Proof of Theorem 4.2 (Asymptotics of )
Expanding the product in the definition of , (4.5), yields
(7.5) 
The are uniformly continuous on and therefore the whole function is uniformly continuous in the sense of (2.11). Hence by Theorem 2.4 the estimator inherits the consistency of
For the scaled version it is sufficient (cf. (7.3)) to show
(7.6) 
Analogously to (7.3), this convergence could be shown using the Markov inequality. But the method of proof used for distance multivariance does not transfer directly, since in the case of individual sums (similar to those appearing in (7.4)) diverge in the limit. Only in a joint analysis do these diverging terms cancel explicitly. The remaining terms converge and cancel in the limit as in (7.4). We skip the details here, but provide a sketch of a closely related alternative approach (which is also tedious): in [14] the variance of was calculated. An analogous formula can also be derived for the covariance of the two estimators in (7.6). To this end, the in [14, Proposition 5] have to be replaced by