# Extreme Nonlinear Correlation for Multiple Random Variables and Stochastic Processes with Applications to Additive Models

The maximum correlation of functions of a pair of random variables is an important measure of stochastic dependence. It is known that this maximum nonlinear correlation is identical to the absolute value of the Pearson correlation for a pair of Gaussian random variables or a pair of nested sums of iid square integrable random variables. This paper extends these results to pairwise Gaussian processes and vectors, nested sums of iid random variables, and permutation symmetric functions of sub-groups of iid random variables.


## 1 Introduction

The maximum correlation of functions of a pair of random variables is an important measure of their stochastic dependence. Formally, given random variables $X_1$ and $X_2$, the maximum correlation is defined as

$$R(X_1,X_2)=\sup_{f,g}\rho\big(f(X_1),g(X_2)\big) \qquad (1)$$

where $f$ and $g$ range over real functions with finite nonzero variances $\operatorname{Var}(f(X_1))$ and $\operatorname{Var}(g(X_2))$. If $X_1$ and $X_2$ are bivariate normal, it was established in Lancaster (1957) and Yu (2008) that

$$R(X_1,X_2)=|\rho(X_1,X_2)| \qquad (2)$$

where $\rho(X_1,X_2)$ denotes the Pearson correlation between $X_1$ and $X_2$. Dembo, Kagan and Shepp (2001) showed that the equality (2) holds with $X_1=S_m$ and $X_2=S_n$, where $S_m$ and $S_n$ are nested sums of $m$ and $n$ independent and identically distributed (iid) random variables with finite second moment. In a follow-up work, Bryc et al. (2005) proved the same equality for the nested sums without the second moment condition.

The current paper extends the above results to more than two random variables and to Gaussian processes. Let $\lambda_{\max}(\cdot)$ and $\lambda_{\min}(\cdot)$ denote the largest and smallest eigenvalues of matrices or linear operators, and let $\operatorname{Corr}_{\neq}(X_1,\ldots,X_p)$ denote the off-diagonal correlation matrix of random variables $X_1,\ldots,X_p$, with elements $\rho(X_j,X_k)I\{j\neq k\}$. As

$$|\rho(X_1,X_2)|=\lambda_{\max}\big(\operatorname{Corr}_{\neq}(X_1,X_2)\big)=-\lambda_{\min}\big(\operatorname{Corr}_{\neq}(X_1,X_2)\big),$$

a natural extension of the maximum nonlinear correlation to the multivariate setting is the extreme eigenvalue of the off-diagonal correlation matrix of marginal function transformations of $X_1,\ldots,X_p$,

$$\rho^{NL}_{\max}=\rho^{NL}_{\max}(X_1,\ldots,X_p)=\sup_{f_1,\ldots,f_p}\lambda_{\max}\big(\operatorname{Corr}_{\neq}(f_1(X_1),\ldots,f_p(X_p))\big), \qquad (3)$$

where the supremum is taken over all deterministic $f_1,\ldots,f_p$ with $0<\operatorname{Var}(f_j(X_j))<\infty$, and similarly

$$\rho^{NL}_{\min}=\rho^{NL}_{\min}(X_1,\ldots,X_p)=\inf_{f_1,\ldots,f_p}\lambda_{\min}\big(\operatorname{Corr}_{\neq}(f_1(X_1),\ldots,f_p(X_p))\big). \qquad (4)$$
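In the discrete case, the quantity inside the sup/inf of (3) and (4) is simply an extreme eigenvalue of an ordinary $p\times p$ matrix, which can be estimated from data for any fixed choice of transformations. The following sketch (our illustration, not part of the paper) does this for identity transformations of an equicorrelated Gaussian vector, whose population off-diagonal correlation matrix $\Sigma-I$ has eigenvalues $1.0$ and $-0.5$.

```python
import numpy as np

def offdiag_corr_extremes(samples):
    """Extreme eigenvalues of the off-diagonal correlation matrix
    Corr_neq of the columns of `samples` (n x p), i.e. the quantity
    inside the sup/inf of (3)-(4) for one fixed f_1, ..., f_p."""
    corr = np.corrcoef(samples, rowvar=False)
    np.fill_diagonal(corr, 0.0)      # keep only the off-diagonal part
    eig = np.linalg.eigvalsh(corr)   # ascending eigenvalues
    return eig[-1], eig[0]

# Equicorrelated Gaussian vector (rho = 0.5, p = 3) with the identity
# transformations f_j(x) = x; the population off-diagonal matrix is
# Sigma - I with eigenvalues 1.0 and -0.5 (twice).
rng = np.random.default_rng(0)
Sigma = 0.5 * np.ones((3, 3)) + 0.5 * np.eye(3)
X = rng.multivariate_normal(np.zeros(3), Sigma, size=200_000)
lam_max, lam_min = offdiag_corr_extremes(X)
```

Maximizing or minimizing these eigenvalues over the transformations $f_1,\ldots,f_p$ is exactly what (3) and (4) ask for.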

We note that for $p\geq 3$, $\lambda_{\min}$ is no longer determined by $\lambda_{\max}$, so that both quantities are needed to capture the extreme nonlinear correlation. Moreover, this extreme multivariate nonlinear correlation leads to the following further extension of the concept to stochastic processes: for a process $X_T=\{X_t: t\in T\}$ on an index set $T$ equipped with a measure $\nu$,

$$\rho^{NL}_{\max}=\rho^{NL}_{\max}(X_T,\nu)=\sup_{f_T\in\mathcal F_T}\ \sup_{\|h\|_{L_2(\nu)}=1}\int_{t\in T}\int_{s\in T}\rho\big(f_s(X_s),f_t(X_t)\big)I\{s\neq t\}\,h(s)h(t)\,\nu(ds)\,\nu(dt), \qquad (5)$$

where $I\{s\neq t\}$ is the indicator function for $s\neq t$ and $\mathcal F_T$ is the class of all deterministic $f_T=\{f_t: t\in T\}$ satisfying proper measurability and integrability conditions; correspondingly,

$$\rho^{NL}_{\min}=\rho^{NL}_{\min}(X_T,\nu)=\inf_{f_T\in\mathcal F_T}\ \inf_{\|h\|_{L_2(\nu)}=1}\int_{t\in T}\int_{s\in T}\rho\big(f_s(X_s),f_t(X_t)\big)I\{s\neq t\}\,h(s)h(t)\,\nu(ds)\,\nu(dt). \qquad (6)$$

Clearly, (3) and (4) are respectively special cases of (5) and (6) with $T=\{1,\ldots,p\}$ and $\nu$ being the counting measure.

The main assertion of this paper is that in a number of settings, the extreme nonlinear correlation is identical to its linear counterpart:

$$\rho^{NL}_{\max}=\rho^{L}_{\max}\quad\text{and}\quad\rho^{NL}_{\min}=\rho^{L}_{\min}, \qquad (7)$$

where $\rho^{L}_{\max}$ and $\rho^{L}_{\min}$ are defined by restricting the functions $f_j$ in (3) and (4) and $f_t$ in (5) and (6) to be the identity $f(x)=x$; e.g. in the more general stochastic process setting,

$$\rho^{L}_{\max}=\rho^{L}_{\max}(X_T,\nu)=\sup_{\|h\|_{L_2(\nu)}=1}\int_{t\in T}\int_{s\in T}\rho(X_s,X_t)I\{s\neq t\}\,h(s)h(t)\,\nu(ds)\,\nu(dt), \qquad (8)$$

and

$$\rho^{L}_{\min}=\rho^{L}_{\min}(X_T,\nu)=\inf_{\|h\|_{L_2(\nu)}=1}\int_{t\in T}\int_{s\in T}\rho(X_s,X_t)I\{s\neq t\}\,h(s)h(t)\,\nu(ds)\,\nu(dt). \qquad (9)$$

Thus, (7) asserts that the extreme nonlinear correlations match the boundary points of the spectrum of the off-diagonal correlation operator.

We will begin by proving (7) for Gaussian processes on an arbitrary index set $T$ equipped with a $\sigma$-finite measure $\nu$. Our analysis bears some resemblance to that of Lancaster (1957) through the use of the Hermite polynomial expansion, but the general functional nature of our problem requires additional elements involving the spectrum boundary of the Schur product of linear operators. In fact, we prove that only a pairwise bivariate Gaussian condition is required for (7) under proper measurability and integrability conditions.

We shall say that random variables $X_1,\ldots,X_p$ are hidden Gaussian if $X_j=g_j(Z_j)$ for a Gaussian vector $(Z_1,\ldots,Z_p)$ and some deterministic transformations $g_j$; they are hidden pairwise Gaussian if the Gaussian requirement on $(Z_1,\ldots,Z_p)$ is reduced to pairwise Gaussian. The equivalence of the nonlinear and linear extreme correlations (7) for the pairwise Gaussian process implies that for hidden pairwise Gaussian variables $X_j=g_j(Z_j)$,

$$\rho^{L}_{\min}(Z_1,\ldots,Z_p)\leq\rho^{NL}_{\min}(X_1,\ldots,X_p)\leq\rho^{NL}_{\max}(X_1,\ldots,X_p)\leq\rho^{L}_{\max}(Z_1,\ldots,Z_p).$$

That is to say, if the correlation structure among $X_1,\ldots,X_p$ is generated from a pairwise Gaussian distribution through marginal transformations (even in a hidden way), then their extreme nonlinear correlation is controlled within the spectrum of the off-diagonal correlation matrix of the underlying Gaussian distribution. When $(Z_1,\ldots,Z_p)$ are jointly Gaussian and the transformations $g_j$ are monotone, this is the Gaussian copula model widely used in financial risk assessment and other areas of application.

Our interest in the extreme multivariate nonlinear correlation arises from our study of the additive regression model, where the response variable $y$ can be written as

$$y=\sum_{j=1}^{p}f_j(X_j)+\epsilon.$$

As an important nonlinear relaxation of linear regression, this model dramatically mitigates the curse of dimensionality of the more complex multiple nonparametric regression (Buja et al., 1989; Hastie and Tibshirani, 1986; Wood, 2017). Let $\|\cdot\|_{L_2^{(0)}(P)}$ denote the centered $L_2(P)$ semi-norm given by $\|f(X)\|_{L_2^{(0)}(P)}^2=\operatorname{Var}(f(X))$. Our result on the minimum nonlinear correlation has two interesting implications in the analysis of high-dimensional additive models. Firstly, as established in the literature (Meier et al., 2009; Koltchinskii and Yuan, 2010; Raskutti et al., 2012; Suzuki and Sugiyama, 2013; Tan and Zhang, 2017), regularized estimation in the additive model typically yields an error bound on the prediction error under a certain restricted eigenvalue or compatibility condition on the design, which would require a strictly positive lower bound for $1+\rho^{NL}_{\min}$. The characterization of $\rho^{NL}_{\min}$ in the current paper verifies that the required theoretical restricted eigenvalue and compatibility conditions hold for a large class of non-trivial distributions. Secondly, when $1+\rho^{NL}_{\min}$ is bounded away from zero, the squared loss for the estimation of the individual $f_i$ can be derived from the prediction error bound via

$$\sum_{i=1}^{p}\|\hat f_i-f_i\|_{L_2^{(0)}(P)}^2\leq\frac{1}{1+\rho^{NL}_{\min}}\Big\|\sum_{i=1}^{p}\hat f_i-\sum_{i=1}^{p}f_i\Big\|_{L_2^{(0)}(P)}^2.$$

See Section 2.2 for more detailed discussions.

In addition to the extension of Lancaster (1957) to pairwise Gaussian processes and vectors, the current paper directly extends the results of Dembo, Kagan and Shepp (2001) and Bryc et al. (2005) by establishing (7) for nested sums of iid random variables $Y_1,Y_2,\ldots$, with $X_j=\sum_{i=1}^{m_j}Y_i$ for some positive integers $m_1\leq\cdots\leq m_p$. Moreover, as a natural generalization of the nested sums, we consider groups of the iid variables as random vectors $X_j=(Y_i,\,i\in G_j)$, where $G_1,\ldots,G_p$ are sets of positive integers. We extend the first part of (7) by proving that

$$\max_{f_1,\ldots,f_p}\rho^{L}_{\max}\big(f_1(X_1),\ldots,f_p(X_p)\big)=\rho^{L}_{\max}\big(S_{G_1},\ldots,S_{G_p}\big), \qquad (10)$$

where $S_{G_j}=\sum_{i\in G_j}g(Y_i)$ for any deterministic function $g$ satisfying $0<\operatorname{Var}(g(Y_1))<\infty$, and the maximum is taken over all deterministic functions $f_j$ symmetric in the permutation of their arguments. Throughout the paper, such $f_j$ are called permutation symmetric functions or simply symmetric functions. We also establish the corresponding lower bound

$$\min_{f_1,\ldots,f_p}\rho^{L}_{\min}\big(f_1(X_1),\ldots,f_p(X_p)\big)=\rho^{L}_{\min}\big(S_{G_1},\ldots,S_{G_p}\big) \qquad (11)$$

under a mild condition.
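When $g$ is square integrable with $\operatorname{Var}(g(Y_1))=1$, $\operatorname{Cov}(S_{G_j},S_{G_k})=|G_j\cap G_k|$, so the off-diagonal correlation matrix on the right-hand sides of (10) and (11) is fully explicit. A small numerical sketch (our illustration, not from the paper):

```python
import numpy as np

def group_sum_offdiag_corr(groups):
    """Off-diagonal correlation matrix of the group sums S_{G_j}:
    Corr(S_{G_j}, S_{G_k}) = |G_j ∩ G_k| / sqrt(|G_j| * |G_k|)."""
    p = len(groups)
    R = np.zeros((p, p))
    for j in range(p):
        for k in range(p):
            if j != k:
                R[j, k] = len(groups[j] & groups[k]) / (
                    len(groups[j]) * len(groups[k])) ** 0.5
    return R

# Overlapping (not necessarily nested) groups of iid variables.
groups = [{1, 2}, {2, 3, 4}, {4, 5, 6, 7}]
R = group_sum_offdiag_corr(groups)
extremes = np.linalg.eigvalsh(R)[[0, -1]]   # spectrum bounds in (10)-(11)
```

Here, for instance, $\operatorname{Corr}(S_{G_1},S_{G_2})=1/\sqrt{6}$ and groups $G_1$ and $G_3$ are disjoint, so the corresponding entry is zero.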

Paper Organization. The rest of the paper is organized as follows. In Section 2, we study the extreme nonlinear correlation for pairwise Gaussian random processes or vectors and discuss the implications for additive models. In Section 3, we study the extreme multivariate nonlinear correlation of nested sums and of the more general symmetric functions.

## 2 Pairwise Gaussian Processes

In Section 2.1, we characterize the extreme nonlinear correlations (5) and (6) for pairwise Gaussian processes, and discuss the implications of the result in the multivariate setting, including Gaussian copulas and the more general hidden pairwise Gaussian distributions. In Section 2.2, we discuss applications of the result to additive models, including justification of the theoretical restricted eigenvalue and compatibility conditions and derivation of convergence rates for the estimation of individual component functions from prediction error bounds.

### 2.1 Extreme Nonlinear Correlation for Pairwise Gaussian Processes

To start with, we shall explicitly specify the measurability and integrability conditions for the definition of the extreme linear and nonlinear correlations in (8), (9), (5) and (6).

Assumption A: (i) The measure $\nu$ is $\sigma$-finite on $T$.
(ii) The process $X_T$ is standardized to $\mathbb{E}X_t=0$ and $\mathbb{E}X_t^2=1$, the kernel $K(s,t)=\rho(X_s,X_t)I\{s\neq t\}$ is measurable as a function of $(s,t)$ in the product space $(T\times T,\nu\times\nu)$, and the extreme linear correlations in (8) and (9) are both finite.

We note that there is no loss of generality in assuming that $X_T$ is standardized, as (8) and (9) involve only the correlation between $X_s$ and $X_t$. We also note that the extreme linear correlations in (8) and (9) are both finite if and only if the linear operator $h\mapsto\int K(\cdot,t)h(t)\,\nu(dt)$ is bounded in $L_2(\nu)$.

Assumption B: In (5) and (6), $\mathcal F_T$ is the class of all function families $f_T=\{f_t: t\in T\}$ with $\mathbb{E}f_t(X_t)=0$ and $0<\int\mathbb{E}f_t^2(X_t)\,\nu(dt)<\infty$, and such that $\mathbb{E}[H_m(X_t)f_t(X_t)]$, with $H_m$ the normalized Hermite polynomials, are measurable functions of $t$ on $(T,\nu)$ for all integers $m\geq 1$, the kernel $\rho(f_s(X_s),f_t(X_t))I\{s\neq t\}$ is measurable as a function of $(s,t)$ on $(T\times T,\nu\times\nu)$, and the linear operator with this kernel is bounded in $L_2(\nu)$.

We note that in the discrete case where $\nu$ is the counting measure on $T=\{1,\ldots,p\}$, Assumption A always holds when $\mathbb{E}X_t=0$ and $\mathbb{E}X_t^2=1$, and Assumption B always holds when $\mathcal F_T$ contains all $f_T$ satisfying $\mathbb{E}f_t(X_t)=0$ and $\mathbb{E}f_t^2(X_t)<\infty$, $t=1,\ldots,p$.

We first establish some equivalent expressions for (5) and (6) in the following lemma.

###### Lemma 1.

Let $\rho^{NL}_{\max}$ and $\rho^{NL}_{\min}$ be as in (5) and (6) with the function class $\mathcal F_T$ specified in Assumption B. Then,

$$\rho^{NL}_{\max}=\sup_{f_T\in\mathcal F_T}\frac{\int_{t\in T}\int_{s\in T}\mathbb{E}\big[f_s(X_s)f_t(X_t)\big]I\{s\neq t\}\,\nu(ds)\,\nu(dt)}{\int\mathbb{E}\big[f_t^2(X_t)\big]\,\nu(dt)}, \qquad (12)$$

and

$$\rho^{NL}_{\min}=\inf_{f_T\in\mathcal F_T}\frac{\int_{t\in T}\int_{s\in T}\mathbb{E}\big[f_s(X_s)f_t(X_t)\big]I\{s\neq t\}\,\nu(ds)\,\nu(dt)}{\int\mathbb{E}\big[f_t^2(X_t)\big]\,\nu(dt)}. \qquad (13)$$

A proof of Lemma 1 can be found in the Appendix. The more explicit expressions established in the lemma would facilitate the Hermite expansion of the covariance in our analysis. Another ingredient of our analysis, stated in the following lemma, concerns the extreme eigenvalues of the Schur product of the off-diagonal correlation kernel.

###### Lemma 2.

Let $\rho^{L}_{\max}$ and $\rho^{L}_{\min}$ be as in (8) and (9) respectively. Under Assumption A,

$$\rho^{L}_{\min}\leq\int_{t\in T}\int_{s\in T}K^{m}(s,t)\,h(s)h(t)\,\nu(ds)\,\nu(dt)\leq\rho^{L}_{\max} \qquad (14)$$

for any integer $m\geq 1$ and function $h$ with $\|h\|_{L_2(\nu)}=1$, where $K^{m}(s,t)=\{\rho(X_s,X_t)\}^{m}I\{s\neq t\}$ is the $m$-th Schur power of the kernel $K$.

The above lemma establishes that the spectrum of the operator given by the Schur power kernel $K^m$ is controlled inside that of $K$. The proof of the lemma, given in the Appendix, utilizes an interesting construction of the Schur power kernel with iid copies of $X_T$. Such a proof technique is of independent interest.
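In the finite-dimensional case, Lemma 2 says that the Hadamard (Schur) powers of the off-diagonal correlation matrix $K=\Sigma-I$ keep their spectra inside $[\lambda_{\min}(K),\lambda_{\max}(K)]$. A quick numerical sketch (ours, not part of the paper) for an equicorrelated Gaussian correlation matrix:

```python
import numpy as np

# Matrix version of Lemma 2: for a correlation matrix Sigma, the Schur
# (entrywise) powers of the off-diagonal kernel K = Sigma - I should have
# all eigenvalues inside [lambda_min(K), lambda_max(K)].
rho, p = 0.5, 3
Sigma = rho * np.ones((p, p)) + (1 - rho) * np.eye(p)
K = Sigma - np.eye(p)
ev = np.linalg.eigvalsh(K)
lam_min, lam_max = ev[0], ev[-1]    # -0.5 and 1.0 here

contained = []
for m in range(1, 6):
    Km = Sigma ** m - np.eye(p)     # m-th Schur power of the kernel
    ev_m = np.linalg.eigvalsh(Km)
    contained.append(lam_min - 1e-9 <= ev_m[0] and ev_m[-1] <= lam_max + 1e-9)
```

For this matrix the Schur powers are $\rho^m(J-I)$ with eigenvalues $2\rho^m$ and $-\rho^m$, visibly inside $[-0.5, 1.0]$.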

We are now ready to state the equivalence between the extreme nonlinear correlation and the extreme linear correlation for pairwise Gaussian processes.

###### Theorem 1.

Let $X_T$ be a pairwise Gaussian process in the sense that $(X_s,X_t)$ are bivariate Gaussian vectors for all pairs $s\neq t$. Under Assumptions A and B,

$$\rho^{NL}_{\max}=\rho^{L}_{\max}\quad\text{and}\quad\rho^{NL}_{\min}=\rho^{L}_{\min},$$

where $\rho^{NL}_{\max}$ and $\rho^{NL}_{\min}$ are the extreme nonlinear correlations in (5) and (6) respectively, and $\rho^{L}_{\max}$ and $\rho^{L}_{\min}$ are their linear counterparts in (8) and (9) respectively.

###### Proof.

As the normalized Hermite polynomials

$$H_m(x)=(m!)^{-1/2}(-1)^{m}e^{x^2/2}(d/dx)^{m}e^{-x^2/2}$$

form an orthonormal system with $\mathbb{E}[H_m(Z)H_{m'}(Z)]=I\{m=m'\}$ for $Z\sim N(0,1)$ and $H_0=1$, by Assumptions A and B we may write $f_t(X_t)=\sum_{m=1}^{\infty}a_m(t)H_m(X_t)$ with $a_m(t)=\mathbb{E}[H_m(X_t)f_t(X_t)]$ in the sense of $L_2$ convergence. Let $K(s,t)=\rho(X_s,X_t)I\{s\neq t\}$ be as in Assumption A. As $(X_s,X_t)$ is bivariate normal for $s\neq t$, $\mathbb{E}[H_m(X_s)H_{m'}(X_t)]=\{\rho(X_s,X_t)\}^{m}I\{m=m'\}$ as in Lancaster (1957). It follows that $\mathbb{E}[f_s(X_s)f_t(X_t)]I\{s\neq t\}=\sum_{m=1}^{\infty}K^{m}(s,t)a_m(s)a_m(t)$ and that, by Lemma 2,

$$\begin{aligned}
\int_{s\in T}\int_{t\in T}\mathbb{E}\big[f_s(X_s)f_t(X_t)\big]I\{s\neq t\}\,\nu(ds)\,\nu(dt)
&=\int_{s\in T}\int_{t\in T}\Big\{\sum_{m=1}^{\infty}K^{m}(s,t)a_m(s)a_m(t)\Big\}\,\nu(ds)\,\nu(dt)\\
&\leq\rho^{L}_{\max}\sum_{m=1}^{\infty}\int a_m^2(t)\,\nu(dt)\\
&=\rho^{L}_{\max}\int\mathbb{E}\big[f_t^2(X_t)\big]\,\nu(dt).
\end{aligned}$$

Moreover, as the exchange of summation and integration is allowed as above,

$$\begin{aligned}
\int_{s\in T}\int_{t\in T}\mathbb{E}\big[f_s(X_s)f_t(X_t)\big]I\{s\neq t\}\,\nu(ds)\,\nu(dt)
&=\sum_{m=1}^{\infty}\int_{s\in T}\int_{t\in T}K^{m}(s,t)a_m(s)a_m(t)\,\nu(ds)\,\nu(dt)\\
&\geq\rho^{L}_{\min}\sum_{m=1}^{\infty}\int a_m^2(t)\,\nu(dt)\\
&=\rho^{L}_{\min}\int\mathbb{E}\big[f_t^2(X_t)\big]\,\nu(dt).
\end{aligned}$$

The proof is complete as the inequalities in the other direction are trivial: take each $f_t$ to be the identity. ∎
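The key identity in the proof, $\mathbb{E}[H_m(X_s)H_{m'}(X_t)]=\rho^m(X_s,X_t)I\{m=m'\}$ for a standardized bivariate normal pair, is easy to spot-check numerically. A Monte Carlo sketch (ours), using NumPy's probabilists' Hermite polynomials:

```python
import math
import numpy as np

def hermite_normalized(m, x):
    """Normalized probabilists' Hermite polynomial He_m(x) / sqrt(m!)."""
    c = np.zeros(m + 1)
    c[m] = 1.0
    return np.polynomial.hermite_e.hermeval(x, c) / math.sqrt(math.factorial(m))

# Standardized bivariate normal pair with correlation rho = 0.6.
rng = np.random.default_rng(1)
rho, n = 0.6, 400_000
x = rng.standard_normal(n)
y = rho * x + math.sqrt(1 - rho ** 2) * rng.standard_normal(n)

# E[H_m(X) H_m(Y)] should be close to rho**m (Lancaster, 1957).
est = {m: float(np.mean(hermite_normalized(m, x) * hermite_normalized(m, y)))
       for m in (1, 2, 3)}
```

The estimates approach $0.6$, $0.36$ and $0.216$ respectively as the sample size grows.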

We state in the rest of the subsection some corollaries as immediate consequences of Theorem 1 and Lemma 1.

###### Corollary 1.

Let $X_{[0,1]}=\{X_t: t\in[0,1]\}$ be a Gaussian process with Lebesgue measurable off-diagonal correlation kernel $K(s,t)=\rho(X_s,X_t)I\{s\neq t\}$ as a function in $[0,1]^2$. Let $K$ also denote the linear operator $h\mapsto\int_0^1 K(\cdot,t)h(t)\,dt$. Then, for all bounded continuous functions $f(x,t)$,

$$\lambda_{\min}(K)\int_0^1\operatorname{Var}\big(f(X_t,t)\big)\,dt\leq\operatorname{Var}\Big(\int_0^1 f(X_t,t)\,dt\Big)\leq\lambda_{\max}(K)\int_0^1\operatorname{Var}\big(f(X_t,t)\big)\,dt.$$

Equivalently, the extreme nonlinear correlations in (5) and (6) with $T=[0,1]$ and $\nu$ the Lebesgue measure are given by

$$\rho^{NL}_{\max}(X_{[0,1]})=\lambda_{\max}(K)\quad\text{and}\quad\rho^{NL}_{\min}(X_{[0,1]})=\lambda_{\min}(K).$$
###### Corollary 2.

Let $X_1,\ldots,X_p$ be pairwise Gaussian random variables with correlation matrix $\Sigma$. Then, for all functions $f_j$ satisfying $\mathbb{E}f_j(X_j)=0$ and $\mathbb{E}f_j^2(X_j)<\infty$,

$$\lambda_{\min}(\Sigma)\cdot\sum_{j=1}^{p}\mathbb{E}f_j^2(X_j)\leq\mathbb{E}\Big(\sum_{j=1}^{p}f_j(X_j)\Big)^2\leq\lambda_{\max}(\Sigma)\cdot\sum_{j=1}^{p}\mathbb{E}f_j^2(X_j). \qquad (15)$$

Equivalently, the extreme nonlinear correlations in (3) and (4) are given by

$$\rho^{NL}_{\max}(X_1,\ldots,X_p)=\lambda_{\max}(\Sigma)-1\quad\text{and}\quad\rho^{NL}_{\min}(X_1,\ldots,X_p)=\lambda_{\min}(\Sigma)-1.$$
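As a concrete instance of (15) with everything computable in closed form (our example, not from the paper): take $f_j(x)=x^2-1$, so that $\mathbb{E}f_j^2(X_j)=2$ and, for a standardized Gaussian pair, $\mathbb{E}[f_j(X_j)f_k(X_k)]=2\rho_{jk}^2$.

```python
import numpy as np

# Exact-moment check of (15) for f_j(x) = x^2 - 1 and an equicorrelated
# Gaussian vector: E f_j^2 = 2 and E[f_j(X_j) f_k(X_k)] = 2 rho_jk^2.
rho, p = 0.5, 3
Sigma = rho * np.ones((p, p)) + (1 - rho) * np.eye(p)
ev = np.linalg.eigvalsh(Sigma)
lam_min, lam_max = ev[0], ev[-1]

sum_Ef2 = 2.0 * p                                               # sum_j E f_j^2
middle = 2.0 * p + 2.0 * float((Sigma ** 2 - np.eye(p)).sum())  # E (sum f_j)^2
lower, upper = lam_min * sum_Ef2, lam_max * sum_Ef2
```

Here the middle term is $6+6\cdot 2\cdot 0.25=9$, safely between the bounds $\lambda_{\min}(\Sigma)\cdot 6=3$ and $\lambda_{\max}(\Sigma)\cdot 6=12$.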

Finally, we state in the following corollary the implication of Theorem 1 for Gaussian copulas and other hidden pairwise Gaussian variables: the extreme nonlinear correlations of such random variables are controlled by the spectrum limits of the off-diagonal correlation matrix of the underlying Gaussian distribution.

###### Corollary 3.

Suppose $(X_1,\ldots,X_p)$ follows a hidden Gaussian distribution in the sense of $X_j=g_j(Z_j)$ for a Gaussian vector $(Z_1,\ldots,Z_p)$ with correlation matrix $\Sigma_z$ and some deterministic functions $g_j$ with $0<\operatorname{Var}(g_j(Z_j))<\infty$. Then,

$$\lambda_{\min}(\Sigma_z)-1\leq\rho^{NL}_{\min}(X_1,\ldots,X_p)\leq\rho^{NL}_{\max}(X_1,\ldots,X_p)\leq\lambda_{\max}(\Sigma_z)-1.$$

Moreover, the Gaussian assumption on $(Z_1,\ldots,Z_p)$ can be weakened to pairwise Gaussian.

### 2.2 Implications in Additive Models

In high-dimensional additive regression models, the restricted eigenvalue and compatibility conditions are crucial elements of the theory of regularized estimation. These conditions are closely related to the extreme nonlinear correlation as we discuss here.

In the additive regression model, the relationship between the response variable $Y$ and the design variables $X_1,\ldots,X_p$ is given by

$$Y=\sum_{j=1}^{p}f_j(X_j)+\varepsilon,$$

where $\varepsilon$ is a noise variable independent of $(X_1,\ldots,X_p)$. Let $I\subset\{1,\ldots,p\}$ be the unknown index set of real signals and $\xi_0$ and $\kappa_0$ be positive constants. The theoretical restricted eigenvalue and compatibility conditions can be defined as

$$\inf\left\{\frac{|I|^{1-q/2}\,\big\|\sum_{j=1}^{p}f_j(X_j)\big\|_{L_2^{(0)}(P)}^{2}}{\big(\sum_{j\in I}\|f_j(X_j)\|_{L_2^{(0)}(P)}^{q}\big)^{2/q}}:\ \frac{\sum_{j\in I}\|f_j(X_j)\|_{L_2^{(0)}(P)}}{\sum_{j\in I^{c}}\|f_j(X_j)\|_{L_2^{(0)}(P)}}>\xi_0\right\}\geq\kappa_0 \qquad (16)$$

with the convention that the constraint is satisfied when $\sum_{j\in I^{c}}\|f_j(X_j)\|_{L_2^{(0)}(P)}=0$, and with the left-hand side being the restricted eigenvalue for $q=2$ and the compatibility coefficient for $q=1$. The above definition generalizes both the restricted eigenvalue condition (Bickel et al., 2009) and the compatibility condition (van de Geer and Bühlmann, 2009) introduced in high-dimensional linear regression.

Regarding the analysis of high-dimensional additive models, the condition (16) with $q=2$ has been used in Koltchinskii and Yuan (2010) and Suzuki and Sugiyama (2013) as a key assumption. The condition (16) with $q=1$ has been used in Tan and Zhang (2017) to establish the prediction accuracy of high-dimensional sparse additive models. Despite the importance of (16), it has typically been imposed as a condition without verifying its validity, other than in some very special cases such as the class of densities uniformly bounded away from 0 and $\infty$. The result of the current paper on the extreme multivariate nonlinear correlation sheds light on the restricted eigenvalue and theoretical compatibility conditions for additive models, in the sense that the condition (16) is satisfied with $\kappa_0$ being the minimum eigenvalue of the correlation matrix of the underlying Gaussian distribution. Such a result is stated in the following corollary, as a consequence of combining Corollary 2 and Lemma 1.

###### Corollary 4.

Suppose $(X_1,\ldots,X_p)$ follows a hidden Gaussian distribution with $X_j=g_j(Z_j)$ for a pairwise Gaussian vector $(Z_1,\ldots,Z_p)$ with correlation matrix $\Sigma_z$ and some deterministic functions $g_j$ with $0<\operatorname{Var}(g_j(Z_j))<\infty$. Then, the condition (16) holds with $\kappa_0=\lambda_{\min}(\Sigma_z)$. In particular, if $\lambda_{\min}(\Sigma_z)$ is a positive constant, then the theoretical restricted eigenvalue condition and compatibility condition hold.

The above corollary implies that the condition (16) holds for the Gaussian copula model, where each variable $X_j$ is generated from the underlying pairwise Gaussian variable $Z_j$ through a monotone transformation involving its cumulative distribution function. To the best of the authors' knowledge, this is a new connection between the theoretical restricted eigenvalue and compatibility conditions and the minimum eigenvalue of the correlation matrix.

In addition to verifying the important condition (16), we can also apply the minimum multivariate nonlinear correlation to connect the rate of convergence for estimating the individual components to the prediction error bounds established in the literature (Meier et al., 2009; Koltchinskii and Yuan, 2010; Raskutti et al., 2012; Suzuki and Sugiyama, 2013; Tan and Zhang, 2017).

###### Corollary 5.

Under the same assumptions as Corollary 4,

$$\sum_{i=1}^{p}\|\hat f_i-f_i\|_{L_2^{(0)}(P)}^{2}\leq\frac{1}{\lambda_{\min}(\Sigma_z)}\Big\|\sum_{i=1}^{p}\hat f_i-\sum_{i=1}^{p}f_i\Big\|_{L_2^{(0)}(P)}^{2}. \qquad (17)$$
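In the special case where the estimation errors are linear, $\hat f_i-f_i=\beta_i X_i$ with standardized $X_i$, the left side of (17) is $\sum_i\beta_i^2$ and the right side is $\beta^\top\Sigma_z\beta/\lambda_{\min}(\Sigma_z)$, so (17) reduces to the usual minimum-eigenvalue bound. A numerical sketch (ours, not from the paper):

```python
import numpy as np

# Linear special case of (17): lhs = sum_i beta_i^2, rhs = (prediction
# error) / lambda_min(Sigma_z) with prediction error beta' Sigma_z beta.
rng = np.random.default_rng(2)
p = 4
A = rng.standard_normal((p, p))
S = A @ A.T                                   # random PSD matrix
d = np.sqrt(np.diag(S))
Sigma_z = S / np.outer(d, d)                  # rescale to a correlation matrix
lam_min = np.linalg.eigvalsh(Sigma_z)[0]

beta = rng.standard_normal(p)
lhs = float(beta @ beta)                      # sum of component-wise errors
rhs = float(beta @ Sigma_z @ beta) / lam_min  # bound from (17)
```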

## 3 Extreme Nonlinear Correlation for Symmetric Functions of iid Random Variables

In this section, we move beyond pairwise Gaussianity and consider the extreme nonlinear correlation for symmetric functions of iid random variables. We first consider multiple nested sums of iid random variables, directly generalizing the results for a pair of nested sums established in Dembo, Kagan and Shepp (2001) and Bryc et al. (2005). In Section 3.2, we consider a class of symmetric functions defined on groups of iid random variables and establish the extreme nonlinear correlation in this much broader setting.

### 3.1 Extreme Nonlinear Correlation for Partial Sums

In this section, we consider the extreme nonlinear correlation for multiple nested sums of iid random variables. Specifically, given positive integers $m_1\leq\cdots\leq m_p$ and iid non-degenerate random variables $Y_1,Y_2,\ldots$, we consider

$$X_j=S_{m_j}=\sum_{i=1}^{m_j}Y_i\quad\text{for }j=1,\ldots,p. \qquad (18)$$

Here, non-degenerate means that the distribution of the random variable is not concentrated at a point. In the case of $p=2$, Dembo, Kagan and Shepp (2001) proved that the maximum correlation of $S_m$ and $S_n$, $m\leq n$, is equal to $\sqrt{m/n}$ if $Y_1$ has a finite second moment, and Bryc et al. (2005) proved the same result even without assuming the finite second moment by investigating the characteristic functions of sums of the $Y_i$. The following theorem extends their results from $p=2$ to general $p$. Further extensions to general symmetric functions of arbitrary groups of the $Y_i$ are given in the next subsection.
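Under a finite second moment, $\operatorname{Cov}(S_{m_j},S_{m_k})=(m_j\wedge m_k)\operatorname{Var}(Y_1)$, so the off-diagonal correlation matrix of the nested sums has entries $\sqrt{(m_j\wedge m_k)/(m_j\vee m_k)}$; this is the matrix $R$ in the theorem below. A short numerical sketch (ours), including a Monte Carlo check with exponential $Y_i$:

```python
import numpy as np

# Off-diagonal correlation matrix R of nested sums: under a finite second
# moment, Corr(S_{m_j}, S_{m_k}) = sqrt((m_j ^ m_k) / (m_j v m_k)).
m = np.array([1, 2, 4, 8])
R = np.sqrt(np.minimum.outer(m, m) / np.maximum.outer(m, m)) - np.eye(len(m))
extremes = np.linalg.eigvalsh(R)[[0, -1]]     # candidate values in (19)

# Monte Carlo check of one entry with exponential increments:
# Corr(S_1, S_4) should be close to sqrt(1/4) = 0.5.
rng = np.random.default_rng(3)
S = rng.exponential(size=(200_000, 8)).cumsum(axis=1)
emp = float(np.corrcoef(S[:, 0], S[:, 3])[0, 1])
```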

###### Theorem 2.

Let $Y_1,Y_2,\ldots$ be iid non-degenerate random variables and $X_1,\ldots,X_p$ be nested sums of the $Y_i$ with sample sizes $m_1\leq\cdots\leq m_p$ as defined in (18). Then,

$$\rho^{NL}_{\max}(X_1,\ldots,X_p)=\lambda_{\max}(R),\quad\rho^{NL}_{\min}(X_1,\ldots,X_p)=\lambda_{\min}(R), \qquad (19)$$

where $R$ is the matrix with elements $R_{j,k}=\sqrt{(m_j\wedge m_k)/(m_j\vee m_k)}\,I\{j\neq k\}$. If $Y_1$ has a finite second moment, then $R$ is the off-diagonal correlation matrix of the nested sums $S_{m_j}$, $j=1,\ldots,p$, so that (7) holds with $T=\{1,\ldots,p\}$ and $\nu$ the counting measure,

$$\rho^{NL}_{\max}(X_1,\ldots,X_p)=\rho^{L}_{\max}(X_1,\ldots,X_p),\quad\rho^{NL}_{\min}(X_1,\ldots,X_p)=\rho^{L}_{\min}(X_1,\ldots,X_p).$$
###### Proof.

As $X_j=S_{m_j}$, $j=1,\ldots,p$, are symmetric functions of the nested variable groups $G_j=\{1,\ldots,m_j\}$ with $|G_j|=m_j$ and $|G_j\cap G_k|=m_j\wedge m_k$, it follows from Theorem 3 in the next subsection that

$$\rho^{NL}_{\max}(X_1,\ldots,X_p)\leq\lambda_{\max}(R),\quad\rho^{NL}_{\min}(X_1,\ldots,X_p)\geq\lambda_{\min}(R).$$

It remains to prove that $\lambda_{\max}(R)$ and $\lambda_{\min}(R)$ are attainable by functions $f_1,\ldots,f_p$. This would be simple under the second moment condition on $Y_1$, as we may simply set $f_j(X_j)=X_j$. In the general case, we prove that $R$ is in the closure of the set of off-diagonal correlation matrices generated by $f_1(X_1),\ldots,f_p(X_p)$. This will be done below by proving

$$\lim_{t\to 0+}\rho\big(\sin(tX_j-m_jc_t),\,\sin(tX_k-m_kc_t)\big)=R_{j,k},\quad 1\leq j<k\leq p, \qquad (20)$$

where $c_t$ is the solution of

$$\mathbb{E}[\sin(tY-c_t)]=0,\quad\text{or equivalently}\quad\frac{\mathbb{E}[\sin(tY)]}{\mathbb{E}[\cos(tY)]}=\tan(c_t).$$

Note that in our proof below, we need to take the limit in (20) along a subsequence of $t\to 0+$ to avoid $\mathbb{E}[\cos(tY)]=0$ if the situation arises. This would always be feasible, as $\mathbb{E}[\cos(tY)]=0$ can occur only for $t$ in a countable set.
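The construction can be spot-checked in a case with no finite moments at all (our example, not from the paper): for standard Cauchy $Y$, $\mathbb{E}[\sin(tY)]=0$ by symmetry and $\mathbb{E}[\cos(tS_k)]=e^{-kt}$ for $t>0$, so $c_t=0$ and the correlation in (20) has a closed form that visibly tends to $\sqrt{m/n}$.

```python
import math

def corr_sin_nested_cauchy(m, n, t):
    """Exact correlation of sin(t S_m) and sin(t S_n), m <= n, t > 0, for
    standard Cauchy increments (c_t = 0 by symmetry).  Writing
    S_n = S_m + (S_n - S_m) and using independence and symmetry:
      E[sin(tS_m) sin(tS_n)] = E[sin^2(tS_m)] * exp(-(n - m) t),
      E[sin^2(tS_k)] = (1 - exp(-2 k t)) / 2.
    """
    v_m = (1 - math.exp(-2 * m * t)) / 2
    v_n = (1 - math.exp(-2 * n * t)) / 2
    return v_m * math.exp(-(n - m) * t) / math.sqrt(v_m * v_n)

# As t -> 0+, the correlation approaches sqrt(m/n) = R_{j,k}.
vals = [corr_sin_nested_cauchy(1, 4, t) for t in (0.1, 0.01, 0.001)]
```

For $m=1$, $n=4$ the values increase toward $\sqrt{1/4}=0.5$ as $t$ shrinks.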

As $c_t\to 0$ and $\mathbb{E}[\cos(tY)]\to 1$ as $t\to 0+$, it suffices to consider small $t>0$ with $\mathbb{E}[\cos(tY)]>0$. Let $Y'=tY-c_t$. As $\mathbb{E}[\sin(Y')]=0$, we have

$$\big|\mathbb{E}[\sin(Y')\cos(Y')]\big|=\big|\mathbb{E}[\sin(Y')(1-\cos(Y'))]\big| \qquad (21)$$
$$\leq\mathbb{E}[\sin^2(Y')]+\sqrt{\mathbb{E}[\sin^2(Y')]\,P\{|Y|>1/t\}}. \qquad (22)$$

Let $Y'_i=tY_i-c_t$ and $S'_{a:m}=\sum_{i=a}^{m}Y'_i$. We shall prove that for $a\leq b\leq m\leq n$,

$$\lim_{t\to 0+}\rho\big(\sin(S'_{a:m}),\,\sin(S'_{b:n})\big)=\frac{m-b+1}{(m-a+1)^{1/2}(n-b+1)^{1/2}}. \qquad (23)$$

This implies (20) with $a=b=1$, $m=m_j$ and $n=m_k$, but the more general $a$ and $b$ will provide an extension to sums over arbitrary subgroups of the $Y_i$ later in Corollary 6.

Let $f_{a,m}=\sin(S'_{a:m})$. By repeated application of the sine addition formula, we write

$$f_{a,m}=\sum_{u=a}^{m}f_{a,m,u}\quad\text{where}\quad f_{a,m,u}=\Big(\prod_{i=a}^{u-1}\cos(Y'_i)\Big)\sin(Y'_u)\cos\big(S'_{(u+1):m}\big).$$

As $\mathbb{E}[\sin(Y'_i)]=0$ and the $Y'_i$ are independent, we have $\mathbb{E}[f_{a,m,u}]=0$, and $\mathbb{E}[f_{a,m,u}f_{b,n,v}]=0$ for $u<b$ or for $v<a$. For the remaining indices,

$$f_{a,m,u}f_{b,n,v}=\Big(\prod_{i=a}^{u-1}\cos(Y'_i)\Big)\sin(Y'_u)\cos\big(S'_{(u+1):m}\big)\Big(\prod_{i=b}^{v-1}\cos(Y'_i)\Big)\sin(Y'_v)\cos\big(S'_{(v+1):n}\big)$$