On infinite covariance expansions

In this paper we provide a probabilistic representation of Lagrange's identity which we use to obtain Papathanasiou-type variance expansions of arbitrary order. Our expansions lead to generalized sequences of weights which depend on an arbitrarily chosen sequence of (non-decreasing) test functions. The expansions hold for arbitrary univariate target distributions under weak assumptions; in particular they hold for continuous and discrete distributions alike. The weights are studied under different sets of assumptions either on the test functions or on the underlying distributions. Many concrete illustrations for standard probability distributions are provided (including Pearson, Ord, Laplace, Rayleigh, Cauchy, and Lévy distributions).

1 Introduction

The starting point of this paper is the famous Gaussian expansion which states that if $N \sim \mathcal{N}(0,1)$, then

\[ \mathrm{Var}[g(N)] = \sum_{k=1}^{\infty} \frac{(-1)^{k+1}}{k!}\, E\big[(g^{(k)}(N))^2\big] \tag{1.1} \]

for all smooth functions $g$ such that all the expectations exist. Expansion (1.1), whose first order term yields an upper variance bound generalizing Chernoff’s famous Gaussian bound from [10], has been obtained in a number of different (and often non-equivalent) ways. It is proved in [18] via orthogonality properties of Hermite polynomials, and extensions to multivariate and infinite dimensional settings are given in [19, 20]. Chen uses martingales and stochastic integrals to obtain a general version of (1.1) (also valid on certain manifolds) in [9]. The expansion is contextualized in [25] through properties of the Ornstein-Uhlenbeck operator, and it is also shown in that paper that the semi-group arguments carry through to non-Gaussian target distributions under general assumptions. A very general approach to this line of research can be found in [20], where similar expansions are obtained by means of an iteration of an interpolation formula for infinitely divisible distributions. The main difference between the univariate standard Gaussian and the general non-Gaussian target is that the explicit weight sequence and simple iterated derivatives appearing in (1.1) need to be replaced by some well-chosen iterated gradients with weight sequences which can be quite difficult to obtain explicitly (for instance Ledoux’ sequence from [25] is an iteration of the “carré du champ” operator).
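As a quick sanity check of the Gaussian expansion (1.1), the following sketch evaluates both sides exactly when the function is a polynomial, in which case the series terminates after finitely many terms. The helper names (`gaussian_moment`, `variance_expansion`, and so on) and the coefficient-list representation are illustrative choices, not part of the paper.

```python
from fractions import Fraction
from math import factorial

def gaussian_moment(m):
    """E[N^m] for N ~ N(0,1): 0 for odd m, (m-1)!! for even m."""
    if m % 2:
        return 0
    out = 1
    for j in range(m - 1, 0, -2):
        out *= j
    return out

def derivative(c):
    """Derivative of the polynomial sum_i c[i] * x**i."""
    return [i * c[i] for i in range(1, len(c))]

def square(c):
    """Coefficients of the squared polynomial."""
    out = [0] * (2 * len(c) - 1)
    for i, a in enumerate(c):
        for j, b in enumerate(c):
            out[i + j] += a * b
    return out

def expect(c):
    """E[poly(N)] computed from exact Gaussian moments."""
    return sum(a * gaussian_moment(i) for i, a in enumerate(c))

def variance_direct(c):
    """Var[g(N)] = E[g(N)^2] - E[g(N)]^2."""
    return expect(square(c)) - expect(c) ** 2

def variance_expansion(c):
    """Right-hand side of (1.1); the sum stops once g^(k) vanishes."""
    total, k, d = Fraction(0), 1, derivative(c)
    while d:
        total += Fraction((-1) ** (k + 1), factorial(k)) * expect(square(d))
        d = derivative(d)
        k += 1
    return total
```

For instance, with g(x) = x^3 (coefficients `[0, 0, 0, 1]`) the three non-zero terms are 27, -18 and 6, which add up to Var[N^3] = 15.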

The above references are predated by [31] wherein a general version of (1.1) (valid for arbitrary continuous target distributions) is obtained through elementary arguments relying on an iteration of the exact Cauchy-Schwarz equality (via the so-called Mohr and Noll identity from [29]) combined with the Lagrange identity for integrals due to [7]. Papathanasiou’s method of proof is extended in [4] to encompass discrete distributions. Both the continuous and discrete expansions are of the same form as (1.1), although the weight sequence is replaced with a target-specific explicit sequence of weights (see equations (1.4) and (1.5) below). To set the scene, we borrow notation from [14] which allows us to unify the presentation of the results from [31] and [4] and shall be used throughout this paper.

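Notation: For a function $f:\mathbb{R}\to\mathbb{R}$ let $\Delta^{\ell} f(x) = (f(x+\ell)-f(x))/\ell$ for all $\ell \in \{-1, 0, 1\}$, with the convention that $\Delta^{0} f(x) = f'(x)$, with $f'$ the weak derivative defined Lebesgue almost everywhere. The case $\ell = 0$ is referred to as the continuous case and $\ell \in \{-1, 1\}$ is referred to as the discrete case. For a real-valued function $f$, in the continuous case $f^{(k)}$ denotes its $k$th derivative; discrete higher order derivatives are obtained by iterating the forward derivative $\Delta^{+1}$. We use the rising and falling factorial notation

\[ f^{[k]}(x) = \prod_{j=0}^{k-1} f(x+j) \quad\text{and}\quad f_{[k]}(x) = \prod_{j=0}^{k-1} f(x-j), \tag{1.2} \]

with the convention that $f^{[0]}(x) = f_{[0]}(x) = 1$.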

Expansion (1.1) can then be seen as a particular instance of the following result (see [31, Theorem 1 and Corollary 1] and [4, Theorem 3.1]).

Theorem 1.1 (Papathanasiou’s expansion).

Let $X$ be a random variable with finite moments. Let $g$ be a real-valued function with finite variance with respect to $X$. Then

\[ \mathrm{Var}[g(X)] = \sum_{k=1}^{n} (-1)^{k-1} E\big[(g^{(k)}(X))^2\, \Gamma_k(X)\big] + (-1)^{n} R_n \tag{1.3} \]

where $R_n$ is a non-negative remainder term and the weights $\Gamma_k$ depend on the type of distribution, as follows.

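1. If $X$ is a real random variable with continuous probability density function (pdf) $p$, then the weights are

\[ \Gamma_k(t) = \frac{(-1)^{k-1}}{k!\,(k-1)!\,p(t)} \left( E\big[(X-t)^{k}\big] \int_{-\infty}^{t} (x-t)^{k-1} p(x)\,dx - E\big[(X-t)^{k-1}\big] \int_{-\infty}^{t} (x-t)^{k} p(x)\,dx \right), \tag{1.4} \]

defined for all $t$ such that $p(t) > 0$.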

2. If $X$ is an integer-valued r.v. with probability mass function (pmf) $p$, then the weights are

 Γk(t) =(−1)k−1k!(k−1)!p(t)(E[(X−t)[k]]∑x

defined for all $t$ such that $p(t) > 0$.

It is not hard to show that when $X \sim \mathcal{N}(0,1)$, the weight sequence (1.4) simplifies to $\Gamma_k(t) = 1/k!$ so that (1.3) indeed contains (1.1). More generally, it is shown in [21] that if $X$ belongs to the Integrated Pearson (IP) system of distributions (see Definition 3.6) then the weights take on a particularly agreeable form, namely and (which is constant if $X$ is Integrated Pearson); many familiar univariate distributions belong to the IP system, such as the normal, beta, gamma, and Student distributions. As in the continuous case, it is shown by [4, Corollary 4.1] that if $X$ belongs to the cumulative Ord family with parameter defined in Definition 3.10, then the weights in (1.5) are . Like its continuous counterpart, the discrete IP system also contains many familiar univariate distributions such as the binomial, Poisson and geometric distributions.

The list of references presented so far is anything but exhaustive and expansions inspired by (1.1) have attracted a lot of attention over the years, e.g. with extensions to matrix inequalities as in [30, 36, 2], to stable distributions [23], to Bernoulli random vectors [6]; more references shall be provided in the text. Aside from their intrinsic interest, they have many applications and are closely connected to a wide variety of profound mathematical questions. For statistical inference purposes, they can be used in the study of the variance of classes of estimators (see e.g. [4, section 5]), of copulas ([12]), for problems related to superconcentration ([8] and [35]) or for the study of correlation inequalities [20] and [5]. These expansions can also be interpreted as refined log-Sobolev, Poincaré or isoperimetric inequalities, see [33]. The weights appearing in the first order bounds are crucial quantities in Stein’s method [16, 26] and their higher order extensions are closely connected to eigenvalues and eigenfunctions of certain differential operators [9].

In the present paper, we combine the method from [31, 4] with intuition from [22] (and our recent work [14]) to unify and extend the results from Theorem 1.1 to arbitrary targets under very weak assumptions. The result is given in Theorem 2.5 and can be briefly sketched in a simplified form as follows. Fix a sequence either in or and let be such that for all . Starting with some functions , we recursively define the sequence (resp., ) by (resp., ) and (resp., ) for all . Then, for all , it holds that if the expectations below are finite then

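\[ \mathrm{Cov}[f(X),g(X)] = \sum_{k=1}^{n} (-1)^{k-1} E\left[ \Delta^{-\ell_k} f_{k-1}(X)\, \Delta^{-\ell_k} g_{k-1}(X)\, \frac{\Gamma^{\ell}_k(h)(X)}{\Delta^{-\ell_k} h(X)} \right] + (-1)^{n} R^{\ell}_n(h) \tag{1.6} \]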

where the weight sequences as well as the non-negative remainder term are given explicitly (see Theorem 2.5) and in many cases have a simple form (see Section 3). The expansions from Theorem 1.1 are recovered by setting , and (the identity function) and, in the discrete case, . Far from obscuring the message, expansion (1.6), and its more general form provided in Theorem 2.5, shed new light on the expansion (1.3) and its available extensions by bringing a new interpretation to the weight sequences in terms of explicit iterated integrals and sums. This is the topic of Section 3. Our results also inscribe the topic within a context which is familiar to practitioners of the famous Stein’s method. This last connection nevertheless remains slightly mysterious and will be studied in detail in future contributions.

The paper is organised as follows. In Section 2 we provide the main results in their most abstract form. After setting up the notation (inherited mainly from [14]), Section 2.1 contains the crucial Lagrange identity (Lemma 2.4) and Section 2.2 contains the Papathanasiou-type expansion (Theorem 2.5). In Section 3 we provide illustrations by rewriting the weights appearing in Theorem 2.5 under different sets of assumptions. First, in Section 3.1 we consider a general weighting function $h$; next, in Section 3.2 we choose certain specific intuitively attractive $h$-functions (namely the identity, the cdf and the score); finally in Section 3.3 we obtain explicit expressions for various illustrative distributions (here in particular the connection with existing literature on the topic is also made). For the sake of readability, all proofs are relegated to an Appendix.

2 Infinite matrix-covariance expansions

We begin this paper by recalling some elements of the setup from our paper [14]. Let and equip it with some $\sigma$-algebra and $\sigma$-finite measure . Let $X$ be a random variable on , with probability measure which is absolutely continuous with respect to ; we denote by $p$ the corresponding probability density, and its support by . As usual, is the collection of all real valued functions such that . Although we could in principle keep the discussion to come very general, in order to make the paper more concrete and readable in the sequel we shall restrict our attention to distributions satisfying the following assumption.

Assumption A. The measure is either the counting measure on or the Lebesgue measure on . If is the counting measure then there exist such that . If is the Lebesgue measure then there exist such that .

We denote the collection of functions such that exists and is finite -almost surely on . If , this corresponds to all absolutely continuous functions; if the domain is the collection of all functions on . Let . Still following [14] we also define

\[ a_{\ell} = I[\ell = 1] \quad\text{and}\quad b_{\ell} = a_{-\ell} = I[\ell = -1] \tag{2.1} \]

as well as the generalized indicator function

\[ \chi^{\ell}(x,y) = I[x \le y - a_{\ell}] \tag{2.2} \]

which is defined with the obvious strict inequalities also for and , and

\[ \Phi^{\ell}_p(u,x,v) = \frac{\chi^{\ell}(u,x)\,\chi^{-\ell}(x,v)}{p(x)} \tag{2.3} \]

for all (note that for ). The following result is immediate but useful:

Lemma 2.1.

For all , it holds that Moreover,

\[ \chi^{\ell}(u,y)\,\chi^{\ell}(v,y) = \chi^{\ell}(\max(u,v),y) \quad\text{and}\quad \chi^{\ell}(x,u)\,\chi^{\ell}(x,v) = \chi^{\ell}(x,\min(u,v)). \tag{2.4} \]
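The two product rules in (2.4) follow directly from the definitions (2.1) and (2.2). A brute-force check on a small integer grid, written as a minimal sketch (the function names `a`, `chi` and `check_product_rules` are illustrative), reads:

```python
from itertools import product

def a(l):
    """a_l = I[l = 1], as in (2.1)."""
    return 1 if l == 1 else 0

def chi(l, x, y):
    """Generalized indicator chi^l(x, y) = I[x <= y - a_l], as in (2.2)."""
    return 1 if x <= y - a(l) else 0

def check_product_rules(grid=range(-3, 4)):
    """Verify both identities in (2.4) for l in {-1, 0, 1} on an integer grid."""
    for l, s, t, y in product((-1, 0, 1), grid, grid, grid):
        # first identity: common second argument y
        if chi(l, s, y) * chi(l, t, y) != chi(l, max(s, t), y):
            return False
        # second identity: common first argument y
        if chi(l, y, s) * chi(l, y, t) != chi(l, y, min(s, t)):
            return False
    return True
```

The check only exercises integer arguments; the same argument goes through verbatim for real arguments when l = 0.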

We conclude with another result from [14]; this result motivates the covariance expansion in Theorem 2.5.

Lemma 2.2.

If is such that is integrable on then,

\[ f(x_2) - f(x_1) = E\big[\Phi^{\ell}_p(x_1,X,x_2)\, \Delta^{-\ell} f(X)\big]. \tag{2.5} \]

If, furthermore, then

 E[(f(X2)−f(X1))I[X1
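In the discrete case, the representation (2.5) is a telescoping sum written as an expectation. The sketch below checks it numerically for a small pmf, under the assumption that the difference operator is $\Delta^{\ell} f(x) = (f(x+\ell)-f(x))/\ell$ for $\ell = \pm 1$ and that $x_1 \le x_2$; the pmf, the test function and all helper names are hypothetical choices made for illustration.

```python
def a(l):
    """a_l = I[l = 1], as in (2.1)."""
    return 1 if l == 1 else 0

def chi(l, x, y):
    """chi^l(x, y) = I[x <= y - a_l], as in (2.2)."""
    return 1 if x <= y - a(l) else 0

def delta(l, f, x):
    """Assumed convention: Delta^l f(x) = (f(x + l) - f(x)) / l, l in {-1, +1}."""
    return (f(x + l) - f(x)) / l

def rhs_2_5(l, f, pmf, x1, x2):
    """E[Phi^l_p(x1, X, x2) * Delta^{-l} f(X)] for a pmf given as {x: p(x)}."""
    total = 0.0
    for x, px in pmf.items():
        phi = chi(l, x1, x) * chi(-l, x, x2) / px  # Phi^l_p(x1, x, x2), see (2.3)
        total += px * phi * delta(-l, f, x)
    return total

# Illustrative pmf on {0, ..., 6} and an arbitrary test function.
weights = [1, 4, 2, 5, 3, 2, 3]
pmf = {x: w / sum(weights) for x, w in enumerate(weights)}
f = lambda x: x ** 3 - 2 * x
```

For l = 1 the expectation reduces to the sum of backward differences over x1 < x <= x2, and for l = -1 to the sum of forward differences over x1 <= x < x2; both telescope to f(x2) - f(x1).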

2.1 A probabilistic Lagrange identity

The first ingredient for our results is the following covariance representation (recall that all proofs are in the Appendix).

Lemma 2.3.

Let with support . If $X_1, X_2$ are independent copies of $X$ then

 Cov[f(X),g(X)] =E[(f(X2)−f(X1))(g(X2)−g(X1))I[X1

for all .

A simple representation such as (2.6) is obviously not new, per se; see e.g. the variance expression in [28, page 122]. In fact, treating the discrete and continuous cases separately, one could also obtain identity (2.6) as a direct application of Lagrange’s identity (a.k.a. the Cauchy-Schwarz inequality with remainder) which reads, in the finite discrete case, as

 (2.8)

Using and for , identity (2.6) follows in the finite case. Identity (2.8) and its continuous counterpart will play a crucial role in the sequel. As it turns out, they are more suited to our cause under the following form.
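To make the link between Lagrange's identity and the covariance representation concrete, the sketch below checks two things in exact arithmetic. First, it assumes the classical finite form of Lagrange's identity, $(\sum_k a_k^2)(\sum_k b_k^2) - (\sum_k a_k b_k)^2 = \sum_{i<j} (a_i b_j - a_j b_i)^2$. Second, it assumes that the covariance representation takes the familiar form $\mathrm{Cov}[f(X),g(X)] = E[(f(X_2)-f(X_1))(g(X_2)-g(X_1))\, I[X_1 < X_2]]$ for independent copies $X_1, X_2$. The function names and the example pmf are illustrative.

```python
from fractions import Fraction
from itertools import combinations

def lagrange_sides(a, b):
    """Return both sides of the classical Lagrange identity for sequences a, b."""
    lhs = (sum(x * x for x in a) * sum(y * y for y in b)
           - sum(x * y for x, y in zip(a, b)) ** 2)
    rhs = sum((a[i] * b[j] - a[j] * b[i]) ** 2
              for i, j in combinations(range(len(a)), 2))
    return lhs, rhs

def cov_direct(f, g, pmf):
    """Cov[f(X), g(X)] = E[fg] - E[f]E[g] for a pmf given as {x: p(x)}."""
    mean = lambda fn: sum(p * fn(x) for x, p in pmf.items())
    return mean(lambda x: f(x) * g(x)) - mean(f) * mean(g)

def cov_pairs(f, g, pmf):
    """E[(f(X2)-f(X1))(g(X2)-g(X1)) I[X1 < X2]] for independent copies X1, X2."""
    return sum(pmf[x1] * pmf[x2] * (f(x2) - f(x1)) * (g(x2) - g(x1))
               for x1 in pmf for x2 in pmf if x1 < x2)

# Illustrative pmf with exact rational weights.
pmf = {0: Fraction(1, 10), 1: Fraction(3, 10), 2: Fraction(2, 5), 5: Fraction(1, 5)}
```

With a = (1, -2, 3, 5) and b = (4, 0, -1, 2), both sides of Lagrange's identity equal 698.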

Lemma 2.4 (A probabilistic Lagrange identity).

Fix some integer and introduce the (column) vector . Also let $g$ be any function such that for all . Then

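\[ E\big[\mathbf{v}(X) g(X) \Phi^{\ell}_p(u,X,v)\big]\, E\big[\mathbf{v}'(X) g(X) \Phi^{\ell}_p(u,X,v)\big] = E\big[\mathbf{v}(X)\mathbf{v}'(X) \Phi^{\ell}_p(u,X,v)\big]\, E\big[g^2(X) \Phi^{\ell}_p(u,X,v)\big] - R^{\ell}(u,v;\mathbf{v},g), \tag{2.9} \]

where $R^{\ell}(u,v;\mathbf{v},g)$ is the matrix given by

\[ R^{\ell}(u,v;\mathbf{v},g) = E\big[(\mathbf{v}_3 g_4 - \mathbf{v}_4 g_3)(\mathbf{v}_3 g_4 - \mathbf{v}_4 g_3)'\, \Phi^{\ell}_p(u,X_3,X_4,v)\big] \tag{2.10} \]

with

\[ \Phi^{\ell}_p(u,x_1,x_2,v) = \frac{\chi^{\ell}(u,x_1)\,\chi^{\ell^2}(x_1,x_2)\,\chi^{-\ell}(x_2,v)}{p(x_1)\,p(x_2)}. \tag{2.11} \]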

Here denote two independent copies of and so that , and , . When the context is clear, we abbreviate .

2.2 Papathanasiou-type expansion

Now the necessary ingredients are available to give the main result of this paper. We use the notation that for a vector of functions, the operator operates on each component, so that .

Theorem 2.5.

Fix and let be a sequence such that for all if , otherwise arbitrarily chosen. Let be a sequence of real valued functions such that for all . Starting with some function , we recursively define the sequence by and for all . For any sequence we let and

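\[ \Phi^{\ell}_n(x_1,x_3,\ldots,x_{2n-1},x_{2n+1},x_{2n+2},x_{2n},\ldots,x_2) = \frac{1}{\prod_{i=3}^{2n+2} p(x_i)}\, \chi^{\ell^2}(x_{2n+1},x_{2n+2}) \prod_{i=1}^{n} \chi^{\ell_i}(x_{2i-1},x_{2i+1})\, \chi^{-\ell_i}(x_{2i+2},x_{2i}). \tag{2.12} \]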

Then, for all vectors of functions such that the expectations below exist, and all , we have

 (2.13)

where the derivatives are taken component-wise, and the weight sequences are

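\[ \Gamma^{\ell}_k h(x) = E\Big[ (h_k(X_{2k}) - h_k(X_{2k-1}))\, \Phi^{\ell_k}_p(X_{2k-1},x,X_{2k})\, \Phi^{\ell}_{k-1}(X_1,\ldots,X_{2k-1},X_{2k},\ldots,X_2) \prod_{i=1}^{k-1} \Delta^{-\ell_i} h_i(X_{2i+1},X_{2i+2}) \Big] \tag{2.14} \]

and

\[ R^{\ell}_n(h) = E\Big[ (\mathbf{f}_n(X_{2n+2}) - \mathbf{f}_n(X_{2n+1}))(\mathbf{f}_n(X_{2n+2}) - \mathbf{f}_n(X_{2n+1}))'\, \Phi^{\ell}_n(X_1,\ldots,X_{2n+1},X_{2n+2},\ldots,X_2) \prod_{i=1}^{n} \Delta^{-\ell_i} h_i(X_{2i+1},X_{2i+2}) \Big] \tag{2.15} \]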

where and an empty product is set to 1.

Remark 2.6.

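If $R^{\ell}_n(h) \to 0$ as $n \to \infty$ then, under the conditions of Theorem 2.5,

\[ \mathrm{Cov}[\mathbf{f}(X)] = \sum_{k=1}^{\infty} (-1)^{k-1} E\left[ \Delta^{-\ell_k}\mathbf{f}_{k-1}(X)\, \Delta^{-\ell_k}\mathbf{f}'_{k-1}(X)\, \frac{\Gamma^{\ell}_k h(X)}{\Delta^{-\ell_k} h_k(X)} \right]. \tag{2.16} \]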

In particular when is a th-degree polynomial, then vanishes for and (2.13) is an exact expansion of the variance in (2.13) with respect to the functions ().

Remark 2.7.

A stronger sufficient condition on the functions is that they be strictly increasing throughout , in which case the condition is guaranteed. Under this assumption, the matrix defined in (2.15) is non-negative definite so that, in particular, taking for all and fixing we recover the expansion (1.6) as stated in the Introduction.

Remark 2.8.

When then the condition that is itself also too restrictive because, as will have been made clear in the proof (see the Appendix), the recurrence only implies that needs to be positive on some interval where and are positive integers (they will be properly defined in (3.7)). In particular when the sequence necessarily stops if is bounded, since after a certain number of iterations the indicator functions defining will be 0 everywhere.

Suppose that the assumption of Remark 2.7 applies, so that the remainder is non-negative definite. Then, taking $n = 1$ in (2.13) gives an upper bound, and taking $n = 2$ gives a lower bound, on the covariance, and the following holds (stated again in the case , for the sake of clarity).

Corollary 2.9.

Let all the conditions in Theorem 2.5 prevail for . Then

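\[ E\left[ \Delta^{-\ell_1} f(X)\, \Delta^{-\ell_1} g(X)\, \frac{\Gamma^{\ell_1}_1 h_1(X)}{\Delta^{-\ell_1} h_1(X)} \right] - E\left[ \Delta^{-\ell_2}\!\left( \frac{\Delta^{-\ell_1} f(X)}{\Delta^{-\ell_1} h_1(X)} \right) \Delta^{-\ell_2}\!\left( \frac{\Delta^{-\ell_1} g(X)}{\Delta^{-\ell_1} h_1(X)} \right) \frac{\Gamma^{\ell_1,\ell_2}_2 (h_1,h_2)(X)}{\Delta^{-\ell_2} h_2(X)} \right] \le \mathrm{Cov}[f(X),g(X)] \le E\left[ \Delta^{-\ell_1} f(X)\, \Delta^{-\ell_1} g(X)\, \frac{\Gamma^{\ell_1}_1 h_1(X)}{\Delta^{-\ell_1} h_1(X)} \right]. \]
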
Remark 2.10.

When , the upper bound for is a weighted Poincaré inequality of the same essence as the upper bound provided in [22] (as revisited in [14]), whereas the lower bound obtained with is of a different flavour.

Of course such identities and expansions are only useful if the weights are of a manageable form. This is exactly the topic of the next section.

3 About the weights in Theorem 2.5

The crucial quantities in Theorem 2.5 are the sequences of weights defined in (2.14). For $k = 1$, the expressions are straightforward to obtain (see equations (3.4) for the continuous case and (3.8) for the discrete case). For larger $k$ the situation is not so straightforward. Relevance of the higher order terms in the covariance expansions (2.13) then hinges on the tractability of these weights, which itself depends on the choice of functions . In this section we restrict attention to the (natural) choice $h_k = h$ for all $k$. Then, writing instead of we can express the sequence of weights as where, for all , we set

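\[ \gamma^{\ell}_k h(x_1,x,x_2) = E\Big[ (h(X_{2k}) - h(X_{2k-1}))\, \Phi^{\ell_k}_p(X_{2k-1},x,X_{2k})\, \Phi^{\ell}_{k-1}(x_1,X_3,\ldots,X_{2k-1},X_{2k},\ldots,x_2) \prod_{i=1}^{k-1} \Delta^{-\ell_i} h(X_{2i+1},X_{2i+2}) \Big]. \tag{3.1} \]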

We now study (3.1) and the resulting expressions for the weights under different sets of assumptions.

3.1 General considerations

When no specific assumptions are made on or , we find it easier to separate the continuous case (i.e. ) from the discrete one (i.e. ).

3.1.1 The continuous case

The continuous case is quite easy as (2.12) simplifies when all the test functions are equal and the expressions follow directly from the structure of the weight sequence, whose terms turn out to be straightforward iterated integrals. We note that such iterated integrals have a structure which may be of independent interest; all details are provided in the Appendix.

Lemma 3.1.

Fix and let be non-decreasing. Then for all ,

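\[ \gamma^{0}_k h(x_1,x,x_2) = \frac{(h(x)-h(x_1))^{k-1}\,(h(x_2)-h(x))^{k-1}\,(h(x_2)-h(x_1))}{k!\,(k-1)!}\, \frac{I[x_1 \le x \le x_2]}{p(x)} \tag{3.2} \]

and

\[ \Gamma^{0}_k h(x) = \frac{1}{k!\,(k-1)!}\, \frac{1}{p(x)}\, E\Big[ (h(x)-h(X_1))^{k-1}\, (h(X_2)-h(x))^{k-1}\, (h(X_2)-h(X_1))\, I[X_1 \le x \le X_2] \Big]. \tag{3.3} \]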

Specific instantiations for different explicit distributions are given in Section 3.3. We nevertheless note that, letting $\nu(h) = E[h(X)]$ denote the mean, we get

\[ \Gamma^{0}_1 h(x) = \frac{1}{p(x)} E\big[(h(X_2)-h(X_1))\, I[X_1 \le x \le X_2]\big] = \frac{1}{p(x)} E\big[(\nu(h)-h(X))\, I[X \le x]\big] \tag{3.4} \]

which one may recognize as the inverse of the canonical Stein operator (see (3.10)); in particular taking $h$ the identity function, (3.4) yields the Stein kernel. For more information on the connection with Stein’s operators, see Section 3.1.3.
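The closed form (3.3) is easy to evaluate numerically. Writing $h(X_2)-h(X_1) = (h(X_2)-h(x)) + (h(x)-h(X_1))$ and using independence, the expectation in (3.3) splits into products of one-dimensional integrals over $\{X \le x\}$ and $\{X \ge x\}$. The sketch below uses a plain trapezoid rule; the function names, the integration bounds and the two example targets (standard normal and unit exponential) are illustrative assumptions.

```python
import math

def normal_pdf(t):
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def trapezoid(fn, a, b, n=8000):
    """Composite trapezoid rule on [a, b] with n sub-intervals."""
    step = (b - a) / n
    total = 0.5 * (fn(a) + fn(b))
    for i in range(1, n):
        total += fn(a + i * step)
    return total * step

def gamma_weight(k, x, p=normal_pdf, h=lambda t: t, lo=-12.0, hi=12.0):
    """Numerical value of Gamma^0_k h(x) from (3.3); [lo, hi] truncates the support."""
    def below(j):
        # E[(h(x) - h(X))^j ; X <= x]
        return trapezoid(lambda t: (h(x) - h(t)) ** j * p(t), lo, x)

    def above(j):
        # E[(h(X) - h(x))^j ; X >= x]
        return trapezoid(lambda t: (h(t) - h(x)) ** j * p(t), x, hi)

    expectation = below(k - 1) * above(k) + below(k) * above(k - 1)
    return expectation / (math.factorial(k) * math.factorial(k - 1) * p(x))
```

With the identity test function and a standard normal target, the computed weights agree with $1/k!$ at every point tried, in line with the Gaussian expansion (1.1). For a unit exponential target, `gamma_weight(1, x, p=lambda t: math.exp(-t), lo=0.0, hi=60.0)` returns approximately `x`.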

3.1.2 The discrete case

In the discrete case, simplifications of are more difficult as (2.12) depends strongly on the chosen sequence . Let . Recall the notations in (2.1) and set , for . Applying the definitions leads to

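\[ \gamma^{\ell_1}_1 h(x_1,x,x_2) = (h(x_2)-h(x_1))\, \frac{I[x_1 + a_1 \le x \le x_2 - b_1]}{p(x)} \tag{3.5} \]

\[ \gamma^{\ell_1,\ell_2}_2 h(x_1,x,x_2) = \sum_{x_3 = x_1 + a_1}^{x - a_2}\ \sum_{x_4 = x + b_2}^{x_2 - b_1} (h(x_4)-h(x_3))\, \Delta^{-\ell_1} h(x_3,x_4)\, \frac{I[x_1 + a_1 + a_2 \le x \le x_2 - b_1 - b_2]}{p(x)}. \tag{3.6} \]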

In order to generalize to arbitrary , we introduce

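\[ \mathbf{a}_k = \sum_{i=1}^{k} a_i \quad\text{and}\quad \mathbf{b}_k = \sum_{i=1}^{k} b_i. \tag{3.7} \]

Note that $\mathbf{a}_k$ counts the number of “$+1$” in the first $k$ components of $\ell$ and $\mathbf{b}_k$ counts the corresponding number of “$-1$”, so that $\mathbf{a}_k + \mathbf{b}_k = k$. Then for we have (sums over empty sets are set to 1):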

 γℓkh(x1,x,x2) =⎛⎝x−ak∑x3=x1+ak−1x2−bk−1∑x4=x+bk(h(x4)−h(x3))Δ−ℓk−1h(x3,x4)x3−ak−1∑x5=x1+ak−2x2−bk−2∑x6=x4+bk−1Δ−ℓk−2h(x5,x6) ⋯x2k−3−a2∑x2k−1=x1+a1x2−b1∑x2k+1=x2k−2+b2Δ−ℓ1h(x2k−1,x2k)⎞⎠I[x1+ak≤x≤x2−bk]p(x)

for all and all . This is a proof of the next result.

Proposition 3.2.

Instate all previous notations. For all ,

 γℓkh(x1,x,x2) =⎛⎝x−ak∑x3=x1+ak−1x2−bk−1∑x4=x+bk(h(x4)−h(x3))ψℓk−1h(x1,x3,x4,x2)⎞⎠I[x1+ak≤x≤x2−bk]p(x)

where and, for , and

 ψℓk−1,1h(x1,x3)=Δ−ℓk−1h(x3)x3−ak−1∑x5=x1+ak−2⎛⎝Δ−ℓk−2h(x5)x5−ak−2∑x7=x1+ak−4⎛⎝⋯x2k−3−a2∑x2k−1=x1+a1Δ−ℓ1h(x2k−1)⎞⎠⎞⎠ ψℓk−1,2h(x4,x2)=Δ−ℓk−1h(x4)x2−bk−2∑x6=x4+bk−1⎛⎝Δ−ℓk−2h(x6)x2−bk−3∑x8=x6+bk−2⎛⎝⋯x2−b1∑x2k=x2k−2+b2Δ−ℓ1h(x2k)⎞⎠⎞⎠

for all .

Taking expectations in (3.5) and (3.6) we obtain

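\[ \Gamma^{\ell_1}_1 h(x) = \frac{1}{p(x)} E\big[(h(X_2)-h(X_1))\, I[X_1 + a_1 \le x \le X_2 - b_1]\big] \tag{3.8} \]

\[ \Gamma^{\ell_1,\ell_2}_2 h(x) = \frac{1}{p(x)} E\left[ \sum_{x_3 = X_1 + a_1}^{x - a_2}\ \sum_{x_4 = x + b_2}^{X_2 - b_1} (h(x_4)-h(x_3))\, \Delta^{-\ell_1} h(x_3,x_4)\, I[X_1 + \mathbf{a}_2 \le x \le X_2 - \mathbf{b}_2] \right]. \tag{3.9} \]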

The expressions for higher orders are easy to infer, but this seems to be the best we can do because the expressions in Proposition 3.2 are obscure and, unfortunately, we have not been able to devise a formula as transparent as (3.2) for general in the discrete case. Nevertheless, simple manageable expressions are obtainable for certain specific choices of , particularly the case as we shall see in Section 3.2.
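The first order discrete weight (3.8) can be evaluated by a direct double sum. The sketch below does this for a Poisson target, which is an illustrative choice; the truncation level `n_max` and the function names are assumptions made for the example, and $a_1 = I[\ell_1 = 1]$, $b_1 = I[\ell_1 = -1]$ as in (2.1).

```python
from math import exp, factorial

def poisson_pmf(lam, n_max):
    """Poisson(lam) probabilities on {0, ..., n_max} (tail mass beyond n_max ignored)."""
    return [exp(-lam) * lam ** x / factorial(x) for x in range(n_max + 1)]

def gamma1(l, x, pmf, h=lambda t: t):
    """Gamma^l_1 h(x) from (3.8): (1/p(x)) E[(h(X2)-h(X1)) I[X1 + a <= x <= X2 - b]]."""
    a = 1 if l == 1 else 0
    b = 1 if l == -1 else 0
    total = 0.0
    for x1 in range(0, x - a + 1):          # X1 + a <= x
        for x2 in range(x + b, len(pmf)):   # x <= X2 - b
            total += pmf[x1] * pmf[x2] * (h(x2) - h(x1))
    return total / pmf[x]

lam = 2.5
pmf = poisson_pmf(lam, 80)
```

With the identity test function, the choice l = -1 gives the constant weight `lam` at every point, while l = +1 gives the weight `x`; the two choices of l therefore lead to different first order bounds for the same target.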

3.1.3 Connection with Stein operators

In [14] we introduced the canonical inverse Stein operator

\[ \mathcal{L}^{\ell}_p h(x) = E\big[(h(X_1)-h(X_2))\, \Phi^{\ell}_p(X_1,x,X_2)\big] \tag{3.10} \]

for and $X_1, X_2$ independent copies of $X$. This operator has the property of yielding solutions to so-called Stein equations, in both the discrete and continuous settings; it has many important properties within the context of Stein’s method. In particular it provides generalized covariance identities and, when $h$ is the identity function, it provides

\[ \tau^{\ell}_p(x) = -\mathcal{L}^{\ell}_p \mathrm{Id}(x) \tag{3.11} \]

the all-important Stein kernel of . This function, first introduced in [34], has long been known to provide a crucial handle on the properties of and is now studied as an object of intrinsic interest, see e.g. [11, 16].

From (3.4) and (3.8), we immediately recognize that $\Gamma^{\ell}_1 h(x) = -\mathcal{L}^{\ell}_p h(x)$; in other words, the first order weight in our expansion is given by a Stein operator. There is also a connection between and “higher order” Stein kernels. To see this, restrict to the continuous case and introduce . Then (3.2) becomes

\[ \Gamma^{0}_k h(x) = (-1)^{k} \Big( E\big[H^{k-1}_x(X)\big]\, \mathcal{L}^{0}_p H^{k}_x(x) - E\big[H^{k}_x(X)\big]\, \mathcal{L}^{0}_p H^{k-1}_x(x) \Big) \tag{3.12} \]

(see the Appendix for a proof). In the case $h = \mathrm{Id}$ the expression (3.12) simplifies to Papathanasiou’s weights from (1.4). This makes it possible to connect considerations related to Stein’s method with the weights appearing in the expansions, as has already been observed (see e.g. [4]). We do not pursue this line of research here, except to point out that our result provides a framework for the important works [31, 24, 21, 4, 1], which focus on particular families of distributions, see Sections 3.3.1 and 3.3.2. Further study of this connection, in line e.g. with [15], is outside the scope of this paper and deferred to a future publication.

3.2 Handpicking the test functions

We now focus on particular choices of $h$. To begin with, we consider the most intuitive choice (and the only one studied in the literature): $h = \mathrm{Id}$. In this case we abbreviate