1 Introduction
The starting point of this paper is the famous Gaussian expansion which states that if , then
(1.1) 
for all smooth functions such that all the expectations exist. Expansion (1.1), whose first order term yields an upper variance bound generalizing Chernoff’s famous Gaussian bound from [10], has been obtained in a number of different (and often non-equivalent) ways. It is proved in [18] via orthogonality properties of Hermite polynomials, and extensions to multivariate and infinite-dimensional settings are given in [19, 20]. Chen uses martingales and stochastic integrals to obtain a general version of (1.1) (also valid on certain manifolds) in [9]. The expansion is contextualized in [25] through properties of the Ornstein-Uhlenbeck operator, and it is also shown in that paper that the semigroup arguments carry through to non-Gaussian target distributions under general assumptions. A very general approach to this line of research can be found in [20], where similar expansions are obtained by means of an iteration of an interpolation formula for infinitely divisible distributions. The main difference between the univariate standard Gaussian and the general non-Gaussian target is that the explicit weight sequence and simple iterated derivatives appearing in (1.1) need to be replaced by some well-chosen iterated gradients with weight sequences which can be quite difficult to obtain explicitly (for instance Ledoux’ sequence from [25] is an iteration of the “carré du champ” operator).
The above references are predated by [31] wherein a general version of (1.1) (valid for arbitrary continuous target distributions) is obtained through elementary arguments relying on an iteration of the exact Cauchy-Schwarz equality (via the so-called Mohr and Noll identity from [29]) combined with the Lagrange identity for integrals due to [7]. Papathanasiou’s method of proof is extended in [4] to encompass discrete distributions. Both the continuous and discrete expansions are of the same form as (1.1), although the weight sequence is replaced with a target-specific explicit sequence of weights (see equations (1.4) and (1.5) below). To set the scene, we borrow notation from [14] which allows us to unify the presentation of the results from [31] and [4] and shall be used throughout this paper.
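As a sanity check on the Gaussian expansion, the following minimal sketch verifies it exactly for polynomial test functions. It assumes that (1.1) takes its classical form, namely that for a standard normal variable N the variance of g(N) equals the sum over k ≥ 1 of (-1)^(k-1)/k! times E[(g^(k)(N))^2]; for a polynomial g the sum is finite. All function names below are illustrative and do not appear in the paper.

```python
from fractions import Fraction
from math import factorial

# Polynomials are coefficient lists: a[i] is the coefficient of x**i.

def gaussian_moment(n):
    """E[N**n] for N ~ N(0,1): 0 for odd n, (n-1)!! for even n."""
    if n % 2 == 1:
        return 0
    result = 1
    for j in range(1, n, 2):
        result *= j
    return result

def poly_mul(a, b):
    """Product of two polynomials."""
    out = [0] * max(len(a) + len(b) - 1, 0)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def poly_diff(a):
    """Derivative of a polynomial (empty list once the polynomial is exhausted)."""
    return [i * a[i] for i in range(1, len(a))]

def expect(a):
    """E[a(N)] for N ~ N(0,1)."""
    return sum(c * gaussian_moment(i) for i, c in enumerate(a))

def variance(a):
    """Var[a(N)] computed directly from moments."""
    return expect(poly_mul(a, a)) - expect(a) ** 2

def expansion(a):
    """Assumed form of (1.1): sum_k (-1)**(k-1)/k! * E[(a^(k)(N))**2]."""
    total, k, d = Fraction(0), 1, poly_diff(a)
    while d:
        total += Fraction((-1) ** (k - 1), factorial(k)) * expect(poly_mul(d, d))
        d = poly_diff(d)
        k += 1
    return total

if __name__ == "__main__":
    cubic = [0, 0, 0, 1]  # g(x) = x**3
    print(variance(cubic), expansion(cubic))
```

For g(x) = x^3 the three nonzero terms are 27, -18 and 6, which sum to Var[N^3] = E[N^6] = 15. Exact rational arithmetic is used so that the two sides can be compared without rounding.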
Notation: For a function let for all , with the convention that , with the weak derivative defined Lebesgue almost everywhere. The case is referred to as the continuous case and is referred to as the discrete case. For a real-valued function , in the continuous case denotes its derivative; discrete higher order derivatives are obtained by iterating the forward derivative . We use the rising and falling factorial notation
(1.2) 
with the convention that .
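The discrete notation can be illustrated with a short sketch. It assumes the standard definitions: the rising factorial of order k is a(a+1)...(a+k-1), the falling factorial is a(a-1)...(a-k+1), both equal 1 at order 0 (the empty product), and the forward derivative maps f to x ↦ f(x+1) - f(x). The function names are illustrative only.

```python
def rising(a, k):
    """Rising factorial a (a+1) ... (a+k-1); equals 1 when k == 0."""
    out = 1
    for j in range(k):
        out *= a + j
    return out

def falling(a, k):
    """Falling factorial a (a-1) ... (a-k+1); equals 1 when k == 0."""
    out = 1
    for j in range(k):
        out *= a - j
    return out

def forward_diff(f, k=1):
    """k-fold iterate of the forward difference f -> (x -> f(x+1) - f(x))."""
    if k == 0:
        return f
    g = forward_diff(f, k - 1)
    return lambda x: g(x + 1) - g(x)

if __name__ == "__main__":
    cube = lambda x: falling(x, 3)
    # The forward difference acts on falling factorials like d/dx acts on powers.
    print(forward_diff(cube)(5), 3 * falling(5, 2))
```

Falling factorials play the role of monomials for the forward difference: one application lowers the order by one and multiplies by the order, which is why they appear naturally in discrete analogues of (1.1).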
Expansion (1.1) can then be seen as a particular instance of the following result (see [31, Theorem 1 and Corollary 1] and [4, Theorem 3.1]).
Theorem 1.1 (Papathanasiou’s expansion).
Let be a random variable with finite moments. Let be a real-valued function with finite variance with respect to . Then
(1.3) 
where is a nonnegative remainder term and depend on the type of distribution, as follows.
If is a real random variable with continuous probability density function (pdf) , then the weights are
(1.4) 
defined for all such that .
If is an integer-valued r.v. with probability mass function (pmf) , then the weights are
(1.5) 
defined for all such that .
It is not hard to show that when , the weight sequence (1.4) simplifies to so that (1.3) indeed contains (1.1). More generally, it is shown in [21] that if belongs to the Integrated Pearson (IP) system of distributions (see Definition 3.6) then the weights take on a particularly agreeable form, namely and (which is constant if is Integrated Pearson); many familiar univariate distributions belong to the IP system, such as the normal, beta, gamma, and Student distributions. Similarly to the continuous case, it is shown by [4, Corollary 4.1] that if belongs to the cumulative Ord family with parameter defined in Definition 3.10, then the weights in (1.5) are . Like its continuous counterpart, the discrete IP system also contains many familiar univariate distributions such as the binomial, Poisson and geometric distributions.
The list of references presented so far is anything but exhaustive and expansions inspired by (1.1) have attracted a lot of attention over the years, e.g. with extensions to matrix inequalities as in [30, 36, 2], to stable distributions [23], to Bernoulli random vectors [6]; more references shall be provided in the text. Aside from their intrinsic interest, they have many applications and are closely connected to a wide variety of profound mathematical questions. For statistical inference purposes, they can be used in the study of the variance of classes of estimators (see e.g. [4, section 5]), of copulas ([12]), for problems related to superconcentration ([8] and [35]) or for the study of correlation inequalities ([20] and [5]). These expansions can also be interpreted as refined log-Sobolev, Poincaré or isoperimetric inequalities, see [33]. The weights appearing in the first order () bounds are crucial quantities in Stein’s method [16, 26] and their higher order extensions are closely connected to eigenvalues and eigenfunctions of certain differential operators [9].
In the present paper, we combine the method from [31, 4] with intuition from [22] (and our recent work [14]) to unify and extend the results from Theorem 1.1 to arbitrary targets under very weak assumptions. The result is given in Theorem 2.5 and can be briefly sketched in a simplified form as follows. Fix a sequence either in or and let be such that for all . Starting with some functions , we recursively define the sequence (resp., ) by (resp., ) and (resp., ) for all . Then, for all , it holds that if the expectations below are finite then
(1.6) 
where the weight sequences as well as the nonnegative remainder term are given explicitly (see Theorem 2.5) and in many cases have a simple form (see Section 3). The expansions from Theorem 1.1 are recovered by setting , and (the identity function) and, in the discrete case, . Far from obscuring the message, expansion (1.6), and its more general form provided in Theorem 2.5, shed new light on the expansion (1.3) and its available extensions by bringing a new interpretation to the weight sequences in terms of explicit iterated integrals and sums. This is the topic of Section 3. Our results also inscribe the topic within a context which is familiar to practitioners of the famous Stein’s method. This last connection nevertheless remains slightly mysterious and will be studied in detail in future contributions.
The paper is organised as follows. In Section 2 we provide the main results in their most abstract form. After setting up the notations (inherited mainly from [14]), Section 2.1 contains the crucial Lagrange identity (Lemma 2.4) and Section 2.2 contains the Papathanasiou-type expansion (Theorem 2.5). In Section 3 we provide illustrations by rewriting the weights appearing in Theorem 2.5 under different sets of assumptions. First, in Section 3.1 we consider a general weighting function ; next, in Section 3.2 we choose certain specific intuitively attractive functions (namely the identity, the cdf and the score); finally in Section 3.3 we obtain explicit expressions for various illustrative distributions (here in particular the connection with existing literature on the topic is also made). For the sake of readability, all proofs are relegated to an Appendix.
2 Infinite matrix-covariance expansions
We begin this paper by recalling some elements of the setup from our paper [14]. Let and equip it with some σ-algebra and σ-finite measure . Let be a random variable on , with probability measure which is absolutely continuous with respect to ; we denote the corresponding probability density, and its support by . As usual, is the collection of all real-valued functions such that . Although we could in principle keep the discussion to come very general, in order to make the paper more concrete and readable in the sequel we shall restrict our attention to distributions satisfying the following Assumption.
Assumption A. The measure is either the counting measure on or the Lebesgue measure on . If is the counting measure then there exist such that . If is the Lebesgue measure then there exist such that .
We denote the collection of functions such that exists and is finite almost surely on . If , this corresponds to all absolutely continuous functions; if the domain is the collection of all functions on . Let . Still following [14] we also define
(2.1) 
as well as the generalized indicator function
(2.2) 
which is defined with the obvious strict inequalities also for and , and
(2.3) 
for all (note that for ). The following result is immediate but useful:
Lemma 2.1.
For all , it holds that Moreover,
(2.4) 
We conclude with another result from [14]; this result motivates the covariance expansion in Theorem 2.5.
Lemma 2.2.
If is such that is integrable on then,
(2.5) 
If, furthermore, then
2.1 A probabilistic Lagrange identity
The first ingredient for our results is the following covariance representation (recall that all proofs are in the Appendix).
Lemma 2.3.
Let with support . If are independent copies of then
(2.6)  
(2.7) 
for all .
A simple representation such as (2.6) is obviously not new, per se; see e.g. the variance expression in [28, page 122]. In fact, treating the discrete and continuous cases separately, one could also obtain identity (2.6) as a direct application of Lagrange’s identity (a.k.a. the Cauchy-Schwarz inequality with remainder) which reads, in the finite discrete case, as
(2.8) 
Using and for , identity (2.6) follows in the finite case. Identity (2.8) and its continuous counterpart will play a crucial role in the sequel. As it turns out, they are more suited to our cause under the following form.
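The two identities discussed here can be checked numerically on a small example. The sketch below assumes that (2.8) is Lagrange's identity in its usual finite form, (Σ a_k^2)(Σ b_k^2) - (Σ a_k b_k)^2 = Σ_{i<j} (a_i b_j - a_j b_i)^2, and that the covariance representation takes the familiar form Cov[f(X), g(X)] = E[(f(X_2) - f(X_1))(g(X_2) - g(X_1)) I[X_1 < X_2]] for independent copies X_1, X_2 of X. The pmf and test functions are arbitrary illustrative choices.

```python
from fractions import Fraction
from itertools import combinations

def lagrange_sides(a, b):
    """Both sides of Lagrange's identity for equal-length sequences a and b."""
    lhs = (sum(x * x for x in a) * sum(y * y for y in b)
           - sum(x * y for x, y in zip(a, b)) ** 2)
    rhs = sum((a[i] * b[j] - a[j] * b[i]) ** 2
              for i, j in combinations(range(len(a)), 2))
    return lhs, rhs

def cov_direct(p, f, g):
    """Cov[f(X), g(X)] = E[fg] - E[f] E[g] for a pmf p on a finite support."""
    mean_f = sum(pi * fi for pi, fi in zip(p, f))
    mean_g = sum(pi * gi for pi, gi in zip(p, g))
    return sum(pi * fi * gi for pi, fi, gi in zip(p, f, g)) - mean_f * mean_g

def cov_pairs(p, f, g):
    """Same covariance written as a sum over ordered pairs i < j of support points."""
    return sum(p[i] * p[j] * (f[j] - f[i]) * (g[j] - g[i])
               for i, j in combinations(range(len(p)), 2))

if __name__ == "__main__":
    p = [Fraction(k, 10) for k in (1, 2, 3, 4)]  # pmf on {0, 1, 2, 3}
    f = [x ** 2 for x in range(4)]               # f(x) = x**2
    g = [2 ** x for x in range(4)]               # g(x) = 2**x
    print(cov_direct(p, f, g), cov_pairs(p, f, g))
    print(lagrange_sides([1, 2, 3], [4, -1, 2]))
```

The pairwise form makes the sign of the covariance transparent: when f and g are both nondecreasing, every summand is nonnegative.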
Lemma 2.4 (A probabilistic Lagrange identity).
Fix some integer and introduce the (column) vector . Also let be any function such that for all . Then
(2.9) 
where is the matrix given by
(2.10) 
with
(2.11) 
Here denote two independent copies of and so that , and , . When the context is clear, we abbreviate .
2.2 Papathanasiou-type expansion
Now the necessary ingredients are available to give the main result of this paper. We use the notation that for a vector of functions, the operator operates on each component, so that .
Theorem 2.5.
Fix and let be a sequence such that for all if , otherwise arbitrarily chosen. Let be a sequence of real valued functions such that for all . Starting with some function , we recursively define the sequence by and for all . For any sequence we let and
(2.12) 
Then, for all vectors of functions such that the expectations below exist, and all , we have
(2.13) 
where the derivatives are taken componentwise, and the weight sequences are
(2.14) 
and
(2.15) 
where and an empty product is set to 1.
Remark 2.6.
Remark 2.7.
A stronger sufficient condition on the functions is that they be strictly increasing throughout , in which case the condition is guaranteed. Under this assumption, the matrix defined in (2.15) is nonnegative definite so that, in particular, taking for all and fixing we recover the expansion (1.6) as stated in the Introduction.
Remark 2.8.
When then the condition that is itself also too restrictive because, as will be made clear in the proof (see the Appendix), the recurrence only implies that needs to be positive on some interval where and are positive integers (they will be properly defined in (3.7)). In particular when the sequence necessarily stops if is bounded, since after a certain number of iterations the indicator functions defining will be 0 everywhere.
Suppose that the assumption of Remark 2.7 applies, so that the remainder is nonnegative definite. Then, taking in (2.13) gives an upper bound, and taking gives a lower bound, on the covariance, and the following holds (stated again in the case , for the sake of clarity).
Corollary 2.9.
Let all the conditions in Theorem 2.5 prevail for . Then
Remark 2.10.
Of course such identities and expansions are only useful if the weights are of a manageable form. This is exactly the topic of the next section.
3 About the weights in Theorem 2.5
The crucial quantities in Theorem 2.5 are the sequences of weights defined in (2.14). For , the expressions are straightforward to obtain (see equations (3.4) for the continuous case and (3.8) for the discrete case). For larger the situation is not so straightforward. Relevance of the higher order terms in the covariance expansions (2.13) then hinges on the tractability of these weights, which itself depends on the choice of functions . In this section we restrict attention to the (natural) choice for all . Then, writing instead of we can express the sequence of weights as where, for all , we set
(3.1) 
We now study (3.1) and the resulting expressions for the weights under different sets of assumptions.
3.1 General considerations
When no specific assumptions are made on or , we find it easier to separate the continuous case (i.e. ) from the discrete one (i.e. ).
3.1.1 The continuous case
The continuous case is quite easy as (2.12) simplifies when all the test functions are equal and the expressions follow directly from the structure of the weight sequence, whose terms turn out to be straightforward iterated integrals. We note that such iterated integrals have a structure which may be of independent interest; all details are provided in the Appendix.
Lemma 3.1.
Fix and let be nondecreasing. Then for all ,
(3.2) 
and
(3.3) 
Specific instantiations for different explicit distributions are given in Section 3.3. We nevertheless note that, letting denote the mean we get
(3.4) 
which one may recognize as the inverse of the canonical Stein operator (see (3.10)); in particular taking the identity function, (3.4) yields the Stein kernel. For more information on the connection with Stein’s operators, see Section 3.1.3.
3.1.2 The discrete case
In the discrete case, simplifications of are more difficult as (2.12) depends strongly on the chosen sequence . Let . Recall the notations in (2.1) and set , for . Applying the definitions leads to
(3.5)  
(3.6) 
In order to generalize to arbitrary , we introduce
(3.7) 
Note that counts the number of “” in the first components of and counts the corresponding number of “”, so that . Then for we have (sums over empty sets are set to 1):
for all and all . This proves the next result.
Proposition 3.2.
Instate all previous notations. For all ,
where and, for , and
for all .
Taking expectations in (3.5) and (3.6) we obtain
(3.8)  
(3.9) 
The expressions for higher orders are easy to infer, but this seems to be the best we can do because the expressions in Proposition 3.2 are obscure and, unfortunately, we have not been able to devise a formula as transparent as (3.2) for general in the discrete case. Nevertheless, simple manageable expressions are obtainable for certain specific choices of , particularly the case as we shall see in Section 3.2.
3.1.3 Connection with Stein operators
In [14] we introduced the canonical inverse Stein operator
(3.10) 
for and independent copies of . This operator has the property of yielding solutions to so-called Stein equations, in both the discrete and continuous settings; it has many important properties within the context of Stein’s method. In particular it provides generalized covariance identities and, when is the identity function, it provides
(3.11) 
the all-important Stein kernel of . This function, first introduced in [34], has long been known to provide a crucial handle on the properties of and is now studied as an object of intrinsic interest, see e.g. [11, 16].
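For concreteness, the Stein kernel can be evaluated numerically in the continuous case. The sketch below assumes the usual integral form for a density p with mean μ, namely τ(x) = (1/p(x)) ∫_x^∞ (y - μ) p(y) dy, and approximates the integral by a composite Simpson rule on a truncated range. The helper names and the truncation length are illustrative assumptions, not part of the paper.

```python
from math import exp, pi, sqrt

def simpson(func, a, b, n=4000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

def stein_kernel(pdf, mean, x, tail=40.0):
    """Approximate (1/pdf(x)) * integral over [x, x + tail] of (y - mean) pdf(y) dy.

    The upper limit is truncated at x + tail, which is harmless for densities
    with fast-decaying tails such as the two examples below.
    """
    integral = simpson(lambda y: (y - mean) * pdf(y), x, x + tail)
    return integral / pdf(x)

def normal_pdf(y):
    """Standard normal density."""
    return exp(-y * y / 2) / sqrt(2 * pi)

def exponential_pdf(y):
    """Density of the exponential distribution with rate 1."""
    return exp(-y) if y >= 0 else 0.0

if __name__ == "__main__":
    print(stein_kernel(normal_pdf, 0.0, 0.7))       # close to 1
    print(stein_kernel(exponential_pdf, 1.0, 2.5))  # close to 2.5
```

Under the assumed definition, both values can be confirmed by hand: the kernel is constant equal to 1 for the standard normal, and equals x for the rate 1 exponential distribution.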
From (3.4) and (3.8), we immediately recognize that , in other words the first order weight in our expansion is given by a Stein operator. There is also a connection between and “higher order” Stein kernels. To see this, restrict to the continuous case and introduce . Then (3.2) becomes
(3.12) 
(see the Appendix for a proof). In the case the expression (3.12) simplifies to Papathanasiou’s weights from (1.4). This allows us to make the connection between considerations related to Stein’s method and the weights appearing in the expansions, as has already been observed (see e.g. [4]). We do not pursue this line of research here, except to point out that our result provides a framework for the important works [31, 24, 21, 4, 1], which focus on particular families of distributions, see Sections 3.3.1 and 3.3.2. Further study of this connection, in line e.g. with [15], is outside the scope of this paper and deferred to a future publication.
3.2 Handpicking the test functions
We now focus on particular choices of . To begin with, we consider the most intuitive choice (and the only one studied in the literature): . In this case we abbreviate