 # Symmetric Rank Covariances: a Generalised Framework for Nonparametric Measures of Dependence

The need to test whether two random vectors are independent has spawned a large number of competing measures of dependence. We are interested in nonparametric measures that are invariant under strictly increasing transformations, such as Kendall's tau, Hoeffding's D, and the more recently discovered Bergsma--Dassios sign covariance. Each of these measures exhibits symmetries that are not readily apparent from their definitions. Making these symmetries explicit, we define a new class of multivariate nonparametric measures of dependence that we refer to as Symmetric Rank Covariances. This new class generalises all of the above measures and leads naturally to multivariate extensions of the Bergsma--Dassios sign covariance. Symmetric Rank Covariances may be estimated unbiasedly using U-statistics for which we prove results on computational efficiency and large-sample behavior. The algorithms we develop for their computation include, to the best of our knowledge, the first efficient algorithms for the well-known Hoeffding's D statistic in the multivariate setting.


## 1. Introduction

Many applications, from gene expression analysis to feature selection in machine learning tasks, require quantifying the dependence between collections of random variables. Letting $X$ and $Y$ be random vectors, we are interested in measures of dependence $\mu$ which exhibit the following three properties:

1. I-consistency: if $X, Y$ are independent then $\mu(X,Y) = 0$,

2. D-consistency: if $X, Y$ are dependent then $\mu(X,Y) \neq 0$,

3. Monotonic invariance: if $f_1,\dots,f_r,g_1,\dots,g_s$ are strictly increasing functions then $\mu(X,Y) = \mu\big((f_1(X_1),\dots,f_r(X_r)),(g_1(Y_1),\dots,g_s(Y_s))\big)$. For simpler language, we also refer to this property as being nonparametric.

If $\mu$ is I-consistent then tests of independence can be based on the null hypothesis $H_0 : \mu(X,Y) = 0$. If $\mu$ is additionally D-consistent then tests based on consistent estimators of $\mu$ are guaranteed to asymptotically reject independence when it fails to hold. When $\mu$ is both I- and D-consistent we will simply call it consistent. On the other hand, monotonic invariance is the intuitive requirement that the level of dependence between two random vectors is invariant to monotonic transformations of any coordinate. Unfortunately, many popular measures of dependence fail to satisfy some subset of these properties. For instance, Kendall’s $\tau$ (Kendall, 1938) and Spearman’s $\rho$ (Spearman, 1904) are nonparametric and I-consistent but not D-consistent, while the distance correlation (Székely et al., 2007) is consistent but not nonparametric in the above sense.

For bivariate observations, Hoeffding (1948) introduced a nonparametric dependence measure that is consistent for a large class of continuous distributions. Let $(X,Y)$ be a random vector taking values in $\mathbb{R}^2$, with joint and marginal distribution functions $F_{XY}$, $F_X$, and $F_Y$. Then the statistic, now called Hoeffding’s $D$, is defined as

$$D = \int_{\mathbb{R}^2} \big(F_{XY}(x,y) - F_X(x)F_Y(y)\big)^2 \, dF_{XY}(x,y). \tag{1.1}$$

Bergsma and Dassios (2014) introduced a new bivariate dependence measure $\tau^*$ that is nonparametric and improves upon Hoeffding’s $D$ by guaranteeing consistency for all bivariate mixtures of continuous and discrete distributions. As its name suggests, $\tau^*$ generalises Kendall’s $\tau$; where $\tau$ counts concordant and discordant pairs of points, $\tau^*$ counts concordant and discordant quadruples of points. The proof of consistency of $\tau^*$ is considerably more involved than that for $D$.

Surprisingly, both $D$ and $\tau^*$ exhibit a number of identical symmetries that are obfuscated by their usual definitions. Indeed, as will be made precise, $D$ and $\tau^*$ can be represented as the covariance between signed sums of indicator functions acted on by the subgroup

$$H = \langle (1\ 4),\ (2\ 3) \rangle$$

of the symmetric group on four elements. We generalise the above observation to define a new class of dependence measures called Symmetric Rank Covariances. All such measures are I-consistent, nonparametric, and include $\tau$, $D$, $R$, and $\tau^*$ as special cases. Moreover, our new class of measures includes natural multivariate extensions of $\tau^*$ which themselves inspire new notions of concordance and discordance in higher dimensions, see Figure 1. While Symmetric Rank Covariances need not always be D-consistent, we identify a sub-collection of measures that are. These consistent measures can be interpreted as testing independence by applying, possibly infinitely many, independence tests to discretizations of $(X,Y)$. Symmetric Rank Covariances can be readily estimated using U-statistics and we show that the use of efficient data structures for orthogonal range queries can result in substantial savings. Moreover, we show that under independence many of the resulting U-statistics are degenerate of order 2, thus having non-Gaussian limiting distributions. For space, most proofs have been moved to Appendix B.

## 2. Preliminaries

### 2.1. Manipulating Random and Fixed Vectors

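We begin by establishing conventions and notation used throughout the paper. Let

$$(Z_1,\dots,Z_{r+s}) = Z = (X,Y) = \big((X_1,\dots,X_r),(Y_1,\dots,Y_s)\big)$$

be a random vector taking values in $\mathbb{R}^{r+s}$, and let $Z^i = (X^i, Y^i)$ for $i = 1, 2, \dots$ be a sequence of independent and identically distributed copies of $Z$. When $X$ and $Y$ are independent we write $X \perp\!\!\!\perp Y$, otherwise we write $X \not\perp\!\!\!\perp Y$. We let $F_{XY}$, $F_X$, and $F_Y$ denote the cumulative distribution functions for $Z = (X,Y)$, $X$, and $Y$, respectively.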

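We will require succinct notation to describe (permuted) tuples of vectors. For any $n \geq 1$, define $[n] = \{1,\dots,n\}$. Let $w^1,\dots,w^n \in \mathbb{R}^d$. Then for any $i_1,\dots,i_m,j_1,\dots,j_k \in [n]$, let

$$w^{i_1,\dots,i_m} = w^{(i_1,\dots,i_m)} = (w^{i_1},\dots,w^{i_m}) \quad\text{and}\quad (w^{i_1,\dots,i_m}, w^{j_1,\dots,j_k}) = (w^{i_1},\dots,w^{i_m},w^{j_1},\dots,w^{j_k}).$$

If $[n]$ appears in the superscript of a vector it should be interpreted as an ordered vector, that is, we let $w^{[n]} = w^{1,\dots,n}$.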

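Let $S_n$ be the symmetric group on $[n]$. For $\sigma \in S_n$ and $w^{[n]} \in \mathbb{R}^{d \times n}$, let

$$\sigma w^{[n]} = (w^{\sigma^{-1}(1)},\dots,w^{\sigma^{-1}(n)}).$$

This defines a (left) group action of $S_n$ on $\mathbb{R}^{d \times n}$ that we will encounter often. As our convention is that $[n]$ is a tuple when in a superscript, we have that $w^{\sigma[n]} = (w^{\sigma(1)},\dots,w^{\sigma(n)}) = \sigma^{-1} w^{[n]}$ for all $\sigma \in S_n$. We stress that $\sigma w^{[n]} \neq w^{\sigma[n]}$ in general.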

### 2.2. Hoeffding’s D

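The bivariate setting from (1.1) immediately extends to a multivariate version of Hoeffding’s $D$ for the random vectors $X$ and $Y$ by defining

$$D(X,Y) = \int_{\mathbb{R}^r \times \mathbb{R}^s} \big(F_{XY}(x,y) - F_X(x)F_Y(y)\big)^2 \, dF_{XY}(x,y).$$

Since $X \perp\!\!\!\perp Y$ if and only if $F_{XY}(x,y) = F_X(x)F_Y(y)$ for all $(x,y)$, it is clear that $X \perp\!\!\!\perp Y$ implies $D(X,Y) = 0$. The converse need not always be true, as the next example shows.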

###### Example 2.1.

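Let $(X,Y)$ be a bivariate distribution with $P(X=1,Y=0) = P(X=0,Y=1) = 1/2$. Then clearly $X$ and $Y$ are not independent but we have that

$$D(X,Y) = \tfrac12\big(F_{XY}(1,0) - F_X(1)F_Y(0)\big)^2 + \tfrac12\big(F_{XY}(0,1) - F_X(0)F_Y(1)\big)^2 = \tfrac12\big(\tfrac12 - 1\cdot\tfrac12\big)^2 + \tfrac12\big(\tfrac12 - \tfrac12\cdot 1\big)^2 = 0.$$

Thus, $D$ is I-consistent but not D-consistent in general. It is, however, consistent for a large class of continuous distributions.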

###### Theorem 2.2 (Multivariate version of Theorem 3.1 in Hoeffding, 1948).

Suppose $X$ and $Y$ have a continuous joint density $f_{XY}$ and continuous marginal densities $f_X$ and $f_Y$. Then $D(X,Y) = 0$ if and only if $X \perp\!\!\!\perp Y$.

###### Proof.

The bivariate case is treated in Theorem 3.1 in Hoeffding (1948). The proof of the multivariate case is analogous. ∎

Example 2.1 highlights that the failure of $D$ to detect all dependence structures can be attributed to the measure of integration $dF_{XY}$. This suggests the following modification of $D$, which we call Hoeffding’s $R$,

$$R(X,Y) = \int_{\mathbb{R}^{r+s}} \big(F_{XY}(x,y) - F_X(x)F_Y(y)\big)^2 \prod_{i=1}^r dF_{X_i}(x_i) \prod_{j=1}^s dF_{Y_j}(y_j).$$

We suspect that it is well known that $R$ is consistent but we could not find a compelling reference for this fact. For completeness we include a proof in the appendices.

###### Theorem 2.3.

Let $(X,Y)$ be drawn from a multivariate distribution on $\mathbb{R}^{r+s}$ as usual. Then $R(X,Y) \geq 0$ and $R(X,Y) = 0$ if and only if $X \perp\!\!\!\perp Y$.
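As a small numerical illustration of the contrast between $D$ and $R$, the following sketch evaluates both integrals exactly for the two-point distribution of Example 2.1. The function names (`hoeffding_d`, `hoeffding_r`, and the helpers) are hypothetical and only handle finite bivariate distributions given as a dictionary of point masses.

```python
from fractions import Fraction
from itertools import product

# Example 2.1: mass 1/2 at (1, 0) and at (0, 1).
pmf = {(1, 0): Fraction(1, 2), (0, 1): Fraction(1, 2)}

def marginal(pmf, axis):
    """Marginal pmf {value: mass} of coordinate `axis`."""
    out = {}
    for point, p in pmf.items():
        out[point[axis]] = out.get(point[axis], 0) + p
    return out

def cdf(masses, t):
    """CDF of a one-dimensional pmf given as {value: mass}."""
    return sum(p for v, p in masses.items() if v <= t)

def joint_cdf(pmf, x, y):
    return sum(p for (a, b), p in pmf.items() if a <= x and b <= y)

def gap_sq(pmf, px, py, x, y):
    """(F_XY(x, y) - F_X(x) F_Y(y))^2."""
    return (joint_cdf(pmf, x, y) - cdf(px, x) * cdf(py, y)) ** 2

def hoeffding_d(pmf):
    # Integrate against dF_XY: sum over the atoms of the joint distribution.
    px, py = marginal(pmf, 0), marginal(pmf, 1)
    return sum(p * gap_sq(pmf, px, py, x, y) for (x, y), p in pmf.items())

def hoeffding_r(pmf):
    # Integrate against the product of the marginals instead.
    px, py = marginal(pmf, 0), marginal(pmf, 1)
    return sum(px[x] * py[y] * gap_sq(pmf, px, py, x, y)
               for x, y in product(px, py))

print(hoeffding_d(pmf), hoeffding_r(pmf))  # 0 1/64
```

The product measure puts mass on the point $(0,0)$, which the joint distribution never visits, and that is where the gap between $F_{XY}$ and $F_XF_Y$ shows up.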

### 2.3. Bergsma–Dassios Sign-Covariance τ∗

Bergsma and Dassios (2014) defined $\tau^*$ only for bivariate distributions, so let $r = s = 1$ for this section. While $\tau^*$ has a natural definition in terms of concordant and discordant quadruples of points, we will present an alternative definition that will be more useful for our purposes. First, for any $w^{[4]} \in \mathbb{R}^4$ let $I_{\tau^*}(w^{[4]}) = \mathbf{1}[w^1,w^2 < w^3,w^4]$ where $w^1,w^2 < w^3,w^4$ if and only if $\max(w^1,w^2) < \min(w^3,w^4)$. Then, as is shown by Bergsma and Dassios (2014), we have that

$$\tau^*(X,Y) = E\Big[\big(I_{\tau^*}(X^{[4]}) + I_{\tau^*}(X^{4,3,2,1}) - I_{\tau^*}(X^{1,3,2,4}) - I_{\tau^*}(X^{4,2,3,1})\big) \cdot \big(I_{\tau^*}(Y^{[4]}) + I_{\tau^*}(Y^{4,3,2,1}) - I_{\tau^*}(Y^{1,3,2,4}) - I_{\tau^*}(Y^{4,2,3,1})\big)\Big]. \tag{2.1}$$

While Bergsma and Dassios (2014) conjecture that $\tau^*$ is consistent for all bivariate distributions, the proof of this statement remains elusive. The current understanding of the consistency of $\tau^*$ is summarised by the following theorem.

###### Theorem 2.4 (Theorem 1 of Bergsma and Dassios, 2014).

Suppose $X, Y$ are drawn from a bivariate continuous distribution, discrete distribution, or a mixture of a continuous and discrete distribution. Then $\tau^*(X,Y) \geq 0$ and $\tau^*(X,Y) = 0$ if and only if $X \perp\!\!\!\perp Y$.

Theorem 2.4 does not apply to any singular distributions; for instance, we are not guaranteed that $\tau^*(X,Y) > 0$ when $(X,Y)$ are generated uniformly on the unit circle in $\mathbb{R}^2$.
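To make the representation (2.1) concrete, here is a minimal brute-force sketch of a sample analogue: it averages the kernel inside the expectation over all ordered 4-tuples of distinct observations. The function names are hypothetical, and the $O(n^4)$ loop is meant only for illustration on tiny samples, not as an efficient algorithm.

```python
from itertools import permutations

def ind(w):
    """1[w1, w2 < w3, w4], i.e. max(w1, w2) < min(w3, w4)."""
    return int(max(w[0], w[1]) < min(w[2], w[3]))

def signed_sum(w):
    """I(w^{1,2,3,4}) + I(w^{4,3,2,1}) - I(w^{1,3,2,4}) - I(w^{4,2,3,1})."""
    w1, w2, w3, w4 = w
    return (ind((w1, w2, w3, w4)) + ind((w4, w3, w2, w1))
            - ind((w1, w3, w2, w4)) - ind((w4, w2, w3, w1)))

def tau_star(x, y):
    """Average of the kernel in (2.1) over ordered 4-tuples of distinct indices."""
    total, count = 0, 0
    for idx in permutations(range(len(x)), 4):
        total += signed_sum([x[i] for i in idx]) * signed_sum([y[i] for i in idx])
        count += 1
    return total / count

x = [1, 2, 3, 4, 5]
print(tau_star(x, x))        # perfectly increasing relationship
print(tau_star(x, x[::-1]))  # perfectly decreasing relationship
```

Under this normalisation both calls give 2/3 up to floating point: the statistic is sign-insensitive, so increasing and decreasing relationships are treated alike.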

## 3. Symmetric Rank Covariance

### 3.1. Definition and Examples

We now introduce a new class of nonparametric dependence measures that depend on $X$ and $Y$ only through their joint ranks.

###### Definition 3.1 (Matrix of Joint Ranks).

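Let $w^{[m]} \in \mathbb{R}^{d \times m}$. Then the joint rank matrix of $w^{[m]}$ is the $[m]$-valued $d \times m$ matrix $R(w^{[m]})$ with $(i,j)$ entry

$$R(w^{[m]})_{ij} = 1 + \sum_{k=1}^m \mathbf{1}[w^k_i < w^j_i],$$

that is, $R(w^{[m]})_{ij}$ is the rank of $w^j_i$ among $w^1_i,\dots,w^m_i$ for $i \in [d]$.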
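The joint rank matrix can be computed directly from its definition. The sketch below assumes the entry formula reads "one plus the number of vectors whose $i$-th coordinate is strictly smaller", stores the vectors as the columns of a nested list, and uses the hypothetical name `joint_rank_matrix`.

```python
import math

def joint_rank_matrix(w):
    """w[i][j] is coordinate i of the j-th vector; returns the d x m rank matrix."""
    d, m = len(w), len(w[0])
    return [[1 + sum(w[i][k] < w[i][j] for k in range(m)) for j in range(m)]
            for i in range(d)]

w = [[0.3, 2.0, -1.0],   # first coordinate of the three vectors
     [5.0, 5.0, 7.0]]    # second coordinate, with a tie
print(joint_rank_matrix(w))  # [[2, 3, 1], [1, 1, 3]]

# Ranks are unchanged by strictly increasing coordinate-wise transformations.
w_exp = [[math.exp(v) for v in row] for row in w]
print(joint_rank_matrix(w_exp) == joint_rank_matrix(w))  # True
```

Tied values share the smallest available rank under this convention, which is why the second row starts with two ones.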

###### Definition 3.2 (Rank Indicator Function).

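A rank indicator function of order $m$ and dimension $d$ is a function $I : \mathbb{R}^{d \times m} \to \{0,1\}$ such that $I(w^{[m]}) = I(R(w^{[m]}))$ for all $w^{[m]} \in \mathbb{R}^{d \times m}$. In other words, $I$ depends on its arguments only through their joint ranks.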

###### Definition 3.3 (Symmetric Rank Covariance).

Let $I_X$ and $I_Y$ be rank indicator functions that have equal order $m$ and are of dimensions $r$ and $s$, respectively. Let $H$ be a subgroup of the symmetric group $S_m$ with an equal number of even and odd permutations. Define

$$\mu_{I_X,I_Y,H}(X,Y) = E\Big[\Big(\sum_{\sigma \in H} \mathrm{sign}(\sigma)\, I_X(X^{\sigma[m]})\Big)\Big(\sum_{\sigma \in H} \mathrm{sign}(\sigma)\, I_Y(Y^{\sigma[m]})\Big)\Big]. \tag{3.1}$$

Then a measure of dependence $\mu$ is a Symmetric Rank Covariance if there is a scalar $k > 0$ and a triple $(I_X, I_Y, H)$ as specified above such that $\mu = k\,\mu_{I_X,I_Y,H}$. More generally, $\mu$ is a Summed Symmetric Rank Covariance if it is the sum of several Symmetric Rank Covariances.

Some of the Symmetric Rank Covariances we consider have the two rank indicator functions equal, so $I_X = I_Y = I$. In this case, we also use the abbreviation $\mu_{I,H} = \mu_{I,I,H}$.
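Equation (3.1) translates almost literally into a brute-force sample analogue. The sketch below is an illustration under stated assumptions rather than an efficient estimator: permutations are stored 0-indexed as tuples, the group is generated from a list of generators, and all names (`generate_group`, `src_estimate`, and so on) are hypothetical. As a sanity check it is applied to the group generated by a single transposition together with the indicator $\mathbf{1}[w^1 < w^2]$, a choice assumed here because it yields Kendall's $\tau$.

```python
from itertools import permutations

def compose(p, q):
    """Composition p after q of permutations of {0, ..., m-1}, stored as tuples."""
    return tuple(p[i] for i in q)

def generate_group(generators, m):
    """Closure of the generators under composition (a finite group)."""
    identity = tuple(range(m))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for h in generators:
            gh = compose(g, h)
            if gh not in group:
                group.add(gh)
                frontier.append(gh)
    return group

def sign(p):
    """Sign of a permutation via its inversion count."""
    inv = sum(p[i] > p[j] for i in range(len(p)) for j in range(i + 1, len(p)))
    return -1 if inv % 2 else 1

def signed_sum(indicator, w, group):
    """Sum over sigma in H of sign(sigma) * I(w^{sigma[m]})."""
    return sum(sign(s) * indicator([w[i] for i in s]) for s in group)

def src_estimate(ix, iy, group, xs, ys, m):
    """Average of the kernel in (3.1) over ordered m-tuples of distinct indices."""
    total, count = 0, 0
    for idx in permutations(range(len(xs)), m):
        total += (signed_sum(ix, [xs[i] for i in idx], group)
                  * signed_sum(iy, [ys[i] for i in idx], group))
        count += 1
    return total / count

# Assumed example: H generated by the transposition (1 2), I(w) = 1[w1 < w2].
h_tau = generate_group([(1, 0)], 2)
i_tau = lambda w: int(w[0] < w[1])
print(src_estimate(i_tau, i_tau, h_tau, [1, 2, 3, 4], [1, 3, 2, 4], 2))
```

With five concordant pairs and one discordant pair the printed value equals $(5-1)/6 = 2/3$ up to floating point, matching the usual sample version of Kendall's $\tau$.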

###### Remark 3.4.

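Recall from Section 2.3 that for any $w^{[4]} \in \mathbb{R}^4$ we write $w^1,w^2 < w^3,w^4$ to mean $\max(w^1,w^2) < \min(w^3,w^4)$. To simplify the definitions of rank indicator functions we generalise this notation as follows. Let $\lhd$ be any binary relation on $\mathbb{R}^d$. Then for $w^{[n]} \in \mathbb{R}^{d \times n}$ and $i_1,\dots,i_k,j_1,\dots,j_l \in [n]$ we write $w^{i_1},\dots,w^{i_k} \lhd w^{j_1},\dots,w^{j_l}$ to mean $w^{i_a} \lhd w^{j_b}$ for all $a \in [k]$, $b \in [l]$.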

It is easy to show that many existing nonparametric measures of dependence are Symmetric Rank Covariances.

###### Proposition 3.5.

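Let $X$ and $Y$ take values in $\mathbb{R}^r$ and $\mathbb{R}^s$, respectively. Consider the permutation groups $H_\tau = \langle (1\ 2) \rangle$ and $H_{\tau^*} = \langle (1\ 4), (2\ 3) \rangle$.

(i) Bivariate case ($r = s = 1$): Kendall’s $\tau$, its square $\tau^2$, and $\tau^*$ of Bergsma–Dassios are Symmetric Rank Covariances. Specifically,

$$\tau = \mu_{I_\tau,H_\tau}, \qquad \tau^2 = \mu_{I_{\tau^2},H_{\tau^*}}, \qquad\text{and}\qquad \tau^* = \mu_{I_{\tau^*},H_{\tau^*}},$$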

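where the one-dimensional rank indicator functions are defined as

$$I_{\tau^*}(w^{[4]}) = \mathbf{1}[w^1,w^2 < w^3,w^4], \qquad I_\tau(w^{[2]}) = \mathbf{1}[w^1 < w^2], \qquad I_{\tau^2}(w^{[4]}) = \mathbf{1}[w^1 < w^4,\ w^2 < w^3].$$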

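(ii) General case ($r, s \geq 1$): Both $D$ and $R$ are Symmetric Rank Covariances. Specifically,

$$D = \tfrac14\, \mu_{I_{D,r},I_{D,s},H_{\tau^*}} \qquad\text{and}\qquad R = \tfrac14\, \mu_{I_{R,r},I_{R,s},H_{\tau^*}}$$

where, for any dimension $d$, we define

$$I_{D,d}(w^{[5]}) = \mathbf{1}[w^1,w^2 \preceq w^5]\; \mathbf{1}[w^3,w^4 \not\preceq w^5] \quad (w^{[5]} \in \mathbb{R}^{d \times 5}), \qquad I_{R,d}(w^{[4+d]}) = \prod_{i=1}^d \mathbf{1}[w^1_i,w^2_i \leq w^{4+i}_i < w^3_i,w^4_i] \quad (w^{[4+d]} \in \mathbb{R}^{d \times (4+d)}),$$

with $v \preceq w$ if and only if $v_i \leq w_i$ for all $i \in [d]$.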

###### Remark 3.6.

The bivariate dependence measure Spearman’s $\rho$ can be written as

 ρ(X,Y) =6 E[I[X1

In light of Lemma 3.8 below, one might expect $\rho$ to be a Symmetric Rank Covariance. However, upon examining which of the above indicators are negated, one quickly notes that the permutations do not respect the sign operation of the permutation group $S_3$. For instance, two of the index orderings are related through a single transposition and yet the corresponding terms have the same sign above. While it seems difficult to prove conclusively that $\rho$ is not a Symmetric Rank Covariance, this suggests that it is not. Somewhat surprisingly, however, $\rho$ is a Summed Symmetric Rank Covariance, which can be seen by expressing $\rho$ as

 ρ(X,Y)=3 E(b(X)b(Y)+b(X)b(Y[1,3,2])+b(X)b(Y[2,1,3]))

where for all .

### 3.2. General Properties

While many interesting properties of Symmetric Rank Covariances depend on the choice of group $H$ and indicators $I_X, I_Y$, there are several properties which hold for all such choices.

###### Proposition 3.7.

Let $\mu$ be a Symmetric Rank Covariance. Then $\mu$ is nonparametric and I-consistent. If $\mu'$ is another Symmetric Rank Covariance, then so is the product $\mu\mu'$.

The property for products in particular justifies squaring Symmetric Rank Covariances, as was done for bivariate rank correlations in Leung and Drton (2016). Later, it will be useful to express a Symmetric Rank Covariance in an equivalent form.

###### Lemma 3.8.

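In reference to Equation (3.1), we have

$$\begin{aligned}
\mu_{I_X,I_Y,H}(X,Y) &= |H|\; E\Big[I_X(X^{[m]}) \Big(\sum_{\sigma \in H} \mathrm{sign}(\sigma)\, I_Y(Y^{\sigma[m]})\Big)\Big] && (3.2)\\
&= |H|\; E\Big[I_Y(Y^{[m]}) \Big(\sum_{\sigma \in H} \mathrm{sign}(\sigma)\, I_X(X^{\sigma[m]})\Big)\Big]. && (3.3)
\end{aligned}$$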
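A sample analogue of this identity can be checked by hand in the simplest case. The sketch below assumes the two-element group generated by one transposition and the indicator $\mathbf{1}[w^1 < w^2]$, and compares the kernel of (3.1) with the kernel of (3.2), each summed over all ordered pairs of distinct observations. Summing over all ordered pairs plays the role that exchangeability of the i.i.d. copies plays for the expectations; the function name `kernel_sums` is hypothetical.

```python
from itertools import permutations

def kernel_sums(xs, ys):
    """Return (sum of the (3.1) kernel, sum of the (3.2) kernel) over ordered pairs."""
    lt = lambda a, b: int(a < b)
    full = half = 0
    for i, j in permutations(range(len(xs)), 2):
        sx = lt(xs[i], xs[j]) - lt(xs[j], xs[i])  # signed sum for X over H
        sy = lt(ys[i], ys[j]) - lt(ys[j], ys[i])  # signed sum for Y over H
        full += sx * sy                           # kernel of (3.1)
        half += 2 * lt(xs[i], xs[j]) * sy         # kernel of (3.2), |H| = 2
    return full, half

full, half = kernel_sums([3, 1, 4, 1, 5], [2, 7, 1, 8, 2])
print(full == half)  # True
```

The one-sided form evaluates the $X$ indicator only once per tuple, which is what makes it convenient for computation.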

## 4. Generalizing Hoeffding’s D and τ∗

### 4.1. Discretization Perspective

In this section we introduce a collection of Summed Symmetric Rank Covariances that are consistent and can be regarded as natural generalizations of Hoeffding’s $D$ and $\tau^*$. We begin by showing that $D$ and $R$ are accumulations of, possibly infinitely many, independence measures between binarized versions of $X$ and $Y$.

###### Definition 4.1.

Let $Z$ be an $\mathbb{R}^d$-valued random vector, and let $z \in \mathbb{R}^d$. The binarization of $Z$ at $z$ is the random vector

$$B_Z(z) = \big(\mathbf{1}[Z_1 > z_1],\dots,\mathbf{1}[Z_d > z_d]\big).$$

We call $z$ the cutpoint of the binarization.

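For any $z = (x,y) \in \mathbb{R}^{r+s}$ we have $B_Z(z) = (B_X(x), B_Y(y))$. Clearly $B_Z(z)$, $B_X(x)$, and $B_Y(y)$ are discrete random variables taking values in $\{0,1\}^{r+s}$, $\{0,1\}^r$, and $\{0,1\}^s$ respectively. The cutpoint $z$ divides $\mathbb{R}^{r+s}$ into $2^{r+s}$ orthants corresponding to the states of $B_Z(z)$. We index these orthants by vectors $\ell \in \{0,1\}^{r+s}$, and define

$$p(z)_\ell = P(B_Z(z) = \ell) = P\big(Z_i \lesseqgtr_{\ell_i} z_i,\ i \in [r+s]\big) \quad\text{where}\quad \lesseqgtr_{\ell_i}\ \equiv\ \begin{cases} \leq & \text{if } \ell_i = 0,\\ > & \text{if } \ell_i = 1. \end{cases}$$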

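Let $p(z)$ be the $2 \times \dots \times 2$ tensor with coordinates $p(z)_\ell$. Independence between $B_X(x)$ and $B_Y(y)$ can be characterised in terms of the rank of a flattening, or matricization, of $p(z)$. Let $M(x,y)$ be the real $2^r \times 2^s$ matrix with entries

$$M(x,y)_{\ell_X \ell_Y} = p(z)_{\ell_X \ell_Y}$$

for indices $\ell_X \in \{0,1\}^r$, $\ell_Y \in \{0,1\}^s$ that are concatenated to form $\ell_X\ell_Y \in \{0,1\}^{r+s}$. It then holds that $B_X(x) \perp\!\!\!\perp B_Y(y)$ if and only if $M(x,y)$ has rank 1 (Drton et al., 2009, Chapter 3).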

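The matrix $M(x,y)$ has rank 1 if and only if all of its $2 \times 2$ minors vanish, that is, for all $\ell_X, \ell'_X \in \{0,1\}^r$ and $\ell_Y, \ell'_Y \in \{0,1\}^s$ we have

$$0 = M(x,y)_{\ell_X\ell_Y} M(x,y)_{\ell'_X\ell'_Y} - M(x,y)_{\ell'_X\ell_Y} M(x,y)_{\ell_X\ell'_Y} = p(z)_{\ell_X\ell_Y}\, p(z)_{\ell'_X\ell'_Y} - p(z)_{\ell'_X\ell_Y}\, p(z)_{\ell_X\ell'_Y}.$$

One may easily show that $X \perp\!\!\!\perp Y$ if and only if $B_X(x) \perp\!\!\!\perp B_Y(y)$ for all $(x,y) \in \mathbb{R}^{r+s}$. This suggests defining a measure of dependence equal to the integral of the sum of squared minors of the above form. To recover both $D$ and $R$, however, we will need to generalise slightly by considering block minors defined below. These block minors correspond to the fact that $B_X(x) \perp\!\!\!\perp B_Y(y)$ if and only if $P(B_X(x) \in L,\ B_Y(y) \in R) = P(B_X(x) \in L)\,P(B_Y(y) \in R)$ for all $L \subseteq \{0,1\}^r$, $R \subseteq \{0,1\}^s$.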

###### Definition 4.2.

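Let $L, L' \subseteq \{0,1\}^r$ and $R, R' \subseteq \{0,1\}^s$ be nonempty subsets with $L \cap L' = \emptyset$ and $R \cap R' = \emptyset$. Then the block minor of $M(x,y)$ along $(L, L', R, R')$ is the value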

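$$\begin{aligned}
&\Big(\sum_{\ell_X \in L,\ \ell_Y \in R} p(z)_{\ell_X\ell_Y}\Big)\Big(\sum_{\ell'_X \in L',\ \ell'_Y \in R'} p(z)_{\ell'_X\ell'_Y}\Big) - \Big(\sum_{\ell'_X \in L',\ \ell_Y \in R} p(z)_{\ell'_X\ell_Y}\Big)\Big(\sum_{\ell_X \in L,\ \ell'_Y \in R'} p(z)_{\ell_X\ell'_Y}\Big)\\
&\qquad = \sum_{\ell_X \in L,\ \ell_Y \in R}\ \sum_{\ell'_X \in L',\ \ell'_Y \in R'} \Big(p(z)_{\ell_X\ell_Y}\, p(z)_{\ell'_X\ell'_Y} - p(z)_{\ell'_X\ell_Y}\, p(z)_{\ell_X\ell'_Y}\Big).
\end{aligned}$$

###### Proposition 4.3.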

$B_X(x) \perp\!\!\!\perp B_Y(y)$ if and only if all block minors of $M(x,y)$ vanish.

If $L, L', R, R'$ are singletons the block minor reduces to a usual $2 \times 2$ minor.
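The discretization argument can be traced on a toy example. The sketch below builds the flattened orthant matrix $M(x,y)$ for a finite distribution, evaluates block minors as in Definition 4.2, and checks whether all ordinary $2 \times 2$ minors vanish. It assumes the distribution is given as a dictionary mapping pairs of tuples to exact probabilities; all function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def binarize(point, cut):
    """(1[Z_1 > z_1], ..., 1[Z_d > z_d]) for one realisation `point`."""
    return tuple(int(v > c) for v, c in zip(point, cut))

def orthant_matrix(pmf, x, y):
    """M(x, y) as a dict keyed by (l_X, l_Y); pmf maps (x_point, y_point) to mass."""
    M = {(lx, ly): Fraction(0)
         for lx in product((0, 1), repeat=len(x))
         for ly in product((0, 1), repeat=len(y))}
    for (xp, yp), p in pmf.items():
        M[binarize(xp, x), binarize(yp, y)] += p
    return M

def block_minor(M, L, Lp, R, Rp):
    """Block minor of M along (L, L', R, R')."""
    mass = lambda rows, cols: sum(M[lx, ly] for lx in rows for ly in cols)
    return mass(L, R) * mass(Lp, Rp) - mass(Lp, R) * mass(L, Rp)

def all_minors_vanish(M):
    """True when every 2 x 2 minor (singleton blocks) is zero, i.e. rank at most 1."""
    rows = sorted({lx for lx, _ in M})
    cols = sorted({ly for _, ly in M})
    return all(block_minor(M, [a], [b], [c], [d]) == 0
               for a in rows for b in rows for c in cols for d in cols)

half, quarter = Fraction(1, 2), Fraction(1, 4)
dependent = {((1,), (0,)): half, ((0,), (1,)): half}           # Example 2.1
independent = {((a,), (b,)): quarter for a in (0, 1) for b in (0, 1)}

M_dep = orthant_matrix(dependent, x=(0,), y=(0,))
M_ind = orthant_matrix(independent, x=(0,), y=(0,))
print(all_minors_vanish(M_ind), all_minors_vanish(M_dep))      # True False
print(block_minor(M_dep, [(0,)], [(1,)], [(0,)], [(1,)]))      # -1/4
```

At the cutpoint $(0,0)$ the dependent distribution concentrates on the two off-diagonal orthants, so the single $2 \times 2$ minor is nonzero and the binarized variables are detected as dependent.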

We now propose to assess dependence by integrating squared block minors. The integration measures we allow are derived from the variables’ joint distribution but may be taken to be products of marginals as encountered for the measure of dependence $R$.

###### Definition 4.4 (Integrated Squared Minor).

For any let be the vector of all zeros. A measure is called an integrated squared minor if there exists , , partitioning , and partitioning , such that

$$\mu(X,Y) = \int_{\mathbb{R}^{r+s}} A(x,y)^2 \, d\lambda_{XY}(x,y)$$

where is the block minor of along , and the cumulative distribution function can be written as

$$\lambda_{XY}(x,y) = \prod_{1 \leq i \leq t} F_{X_{E_i} Y_{F_i}}(x_{E_i}, y_{F_i}).$$

As the next proposition shows, all integrated squared minor measures are Symmetric Rank Covariances.

###### Proposition 4.5.

Let be an Integrated Squared Minor as in Definition 4.4, then is a Symmetric Rank Covariance. In particular, we have where and

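$$\begin{aligned}
I_X(w^{[4+t]}) &= \sum_{\ell_X \in L}\ \prod_{i=1}^t \mathbf{1}\big[w^1_{E_i}, w^2_{E_i} \preceq w^{4+i}_{E_i}\big] \prod_{j \in E_i} \mathbf{1}\big[w^3_j, w^4_j \lesseqgtr_{(\ell_X)_j} w^{4+i}_j\big] && (w^{[4+t]} \in \mathbb{R}^{r \times (4+t)}),\\
I_Y(w^{[4+t]}) &= \sum_{\ell_Y \in R}\ \prod_{i=1}^t \mathbf{1}\big[w^1_{F_i}, w^2_{F_i} \preceq w^{4+i}_{F_i}\big] \prod_{j \in F_i} \mathbf{1}\big[w^3_j, w^4_j \lesseqgtr_{(\ell_Y)_j} w^{4+i}_j\big] && (w^{[4+t]} \in \mathbb{R}^{s \times (4+t)}).
\end{aligned}$$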

Moreover, if and we have that when and when .

Finally we can identify a collection of D-consistent Summed Symmetric Rank Covariances.

###### Proposition 4.6.

Let and be two collections of nonempty sets. Suppose that the sets are pairwise disjoint and form a partition of . For all let

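$$\mu^{\mathrm{joint}}_i(X,Y) = \int_{\mathbb{R}^{r+s}} A_i(x,y)^2 \, dF_{XY}(x,y)$$

and

$$\mu^{\mathrm{prod}}_i(X,Y) = \int_{\mathbb{R}^{r+s}} A_i(x,y)^2 \prod_{k=1}^r dF_{X_k}(x_k) \prod_{j=1}^s dF_{Y_j}(y_j)$$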

where is the block minor along . Then the Summed Symmetric Rank Covariance is D-consistent in, at least, all cases that is; similarly is D-consistent in all cases.

### 4.2. Multivariate τ∗

Recall from Proposition 3.5 that $\tau^* = \mu_{I_{\tau^*},H_{\tau^*}}$. Multivariate extensions of $\tau^*$ should simultaneously capture the essential characteristics of $\tau^*$ while permitting enough flexibility to define interesting measures of high-order dependence. As a first step to distilling these essential characteristics, it seems natural that any multivariate extension of $\tau^*$ uses the same permutation subgroup $H_{\tau^*}$.

###### Remark 4.7.

There are 30 distinct subgroups of $S_4$, exactly 20 of which have an equal number of even and odd permutations and thus could be used in the definition of a Symmetric Rank Covariance. Given these many possible choices it may seem surprising that $H_{\tau^*}$ appears in the definition of so many existing measures of dependence, namely $\tau^2$, $D$, $R$, and $\tau^*$. Some intuition for the ubiquity of $H_{\tau^*}$ can be gleaned from the proof of Proposition 3.5 where we show that $H_{\tau^*}$ arises naturally from an expansion of $(F_{XY} - F_XF_Y)^2$.
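The subgroup counts quoted in this remark are small enough to confirm by exhaustive enumeration. The sketch below relies on the fact that every subgroup of the symmetric group on four elements is generated by at most two elements, so taking the closure of every pair of permutations finds them all; the helper names are hypothetical.

```python
from itertools import permutations, product

def compose(p, q):
    """Composition p after q of permutations of {0, 1, 2, 3}, stored as tuples."""
    return tuple(p[i] for i in q)

def closure(gens):
    """Subgroup generated by `gens`, returned as a frozenset of permutations."""
    identity = tuple(range(4))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for h in gens:
            gh = compose(g, h)
            if gh not in group:
                group.add(gh)
                frontier.append(gh)
    return frozenset(group)

def is_even(p):
    inv = sum(p[i] > p[j] for i in range(4) for j in range(i + 1, 4))
    return inv % 2 == 0

s4 = list(permutations(range(4)))
subgroups = {closure(pair) for pair in product(s4, repeat=2)}
balanced = [g for g in subgroups if 2 * sum(map(is_even, g)) == len(g)]
print(len(subgroups), len(balanced))  # 30 20

# The group generated by (1 4) and (2 3), written 0-indexed, is among the balanced ones.
h_tau_star = closure([(3, 1, 2, 0), (0, 2, 1, 3)])
print(h_tau_star in balanced)  # True
```

The ten subgroups that fail the balance condition are exactly those contained in the alternating group, since any subgroup containing an odd permutation has equally many even and odd elements.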

It now remains to find an appropriate generalization of $I_{\tau^*}$. To better characterise $I_{\tau^*}$ we require the following definition.

###### Definition 4.8 (Invariance Group of an Indicator).

Let be a rank indicator function of order and dimension . The permutations such that for all form a group that we refer to as the invariance group of . For any Symmetric Rank Covariance , let be the invariance groups of and respectively. We then call the invariance group of .

We now single out two properties of $I_{\tau^*}$.

###### Property 4.1.

$I_{\tau^*}$ is a rank indicator function of order 4.

###### Property 4.2.

The invariance group of $I_{\tau^*}$ is $\langle (1\ 2), (3\ 4) \rangle$.

This inspires the following definition.

###### Definition 4.9.

We say that a Symmetric Rank Covariance $\mu_{I_X,I_Y,H}$ is a $\tau^*$ extension if $I_X$ and $I_Y$ are rank indicators of order 4 with invariance group $\langle (1\ 2), (3\ 4) \rangle$ and $H = H_{\tau^*}$.

From the possible extensions we consider two notable candidates.

###### Definition 4.10.

For any let be the rank indicator where for any we have . We then call the multivariate partial and write .

The definition of is inspired by , see Proposition 3.5.

###### Definition 4.11.

For any let be the rank indicator where for any we have . We then call the multivariate joint and write .

Our definition of comes immediately from when replacing the total order with , although this might be the most intuitive multivariate extension of it is easily seen to not be D-consistent as the next example shows. In both of the above definitions, the extensions reduce to being when .

###### Example 4.12.

Let where is even and Bernoulli are independent. Now let , that is, let if is odd and otherwise. Now letting