# On a formula for moments of the multivariate normal distribution generalizing Stein's lemma and Isserlis theorem

We prove a formula for the evaluation of averages containing a scalar function of a Gaussian random vector multiplied by a product of the random vector components, each one raised at a power. Some powers could be of zeroth-order, and, for averages containing only one vector component to the first power, the formula reduces to Stein's lemma for the multivariate normal distribution. Also, by setting the said function inside average equal to one, we easily derive Isserlis theorem and its generalizations, regarding higher order moments of a Gaussian random vector. We provide two proofs of the formula, with the first being a rigorous proof via mathematical induction. The second is a formal, constructive derivation based on treating the average not as an integral, but as the action of pseudodifferential operators defined via the moment-generating function of the Gaussian random vector.

03/10/2021

08/01/2021

### The restrictiveness of the hazard rate order and the moments of the maximal coordinate of a random vector uniformly distributed on the probability n-simplex

Continuing the work of [9] who defined the restrictiveness of stochastic...
05/13/2019

### Moment Identifiability of Homoscedastic Gaussian Mixtures

We consider the problem of identifying a mixture of Gaussian distributio...
10/24/2017

### On the Conditional Distribution of a Multivariate Normal given a Transformation - the Linear Case

We show that the orthogonal projection operator onto the range of the ad...
05/27/2019

### Gaussian Approximations for Maxima of Random Vectors under (2+ι)-th Moments

We derive a Gaussian approximation result for the maximum of a sum of ra...
02/10/2022

### Lebesgue Induction and Tonelli's Theorem in Coq

Lebesgue integration is a well-known mathematical tool, used for instanc...
02/23/2021

### Transfer function interpolation remainder formula of rational Krylov subspace methods

Rational Krylov subspace projection methods are one of successful method...

## 1 Introduction and main results

Stein’s lemma, [23], [11], is a well-known identity of the multivariate normal distribution, with applications in statistics, see e.g. [25], [16]. Stein’s lemma is stated as follows.

###### Theorem 1 (Multivariate Stein’s lemma).

For an -dimensional Gaussian random vector , with mean value vector and autocovariance matrix , it holds true that

 E[g(X)Xm]=μmE[g(X)]+N∑j=1CmjE[∂g(X)∂Xj],  m=1,…,N, (1)

where denotes the average operator, and is the partial derivative of with respect to component.

###### Proof.

In Appendix A, we prove Stein’s lemma using integration by parts. ∎

By Stein’s lemma (1), the presence of th component inside the average is eliminated, with being expressed in terms of the averages of , and the averages of its first-order partial derivatives with respect to each of the components of . The topic of the present work is to generalize Stein’s lemma, providing a formula of expressing the average in terms of the averages of partial derivatives of . In the recent work [12], we generalized the counterpart of Eq. (1) for the univariate normal distribution, expressing the average , where

is a scalar Gaussian random variable, in terms of the averages of the derivatives of scalar function

, up to the th order.

Furthermore, the formula generalizing Stein’s lemma for has to be compatible with identities expressing higher order moments of a Gaussian random vector in terms of its first two moments (mean value vector and autocovariance matrix). The central result in this direction is Isserlis theorem [8], also known in physics literature as Wick’s theorem [26], on the moments of a zero-mean value Gaussian random vector. For a review of the related literature see also [24]. Recently, Song and Lee in [22] derived a formula in closed, compact form for the product Gaussian moments . Subsequently, in corollaries 4, 5, Isserlis and Song & Lee formulas for Gaussian moments are easily rederived from our generalizing formula (2), by setting .

###### Theorem 2 (Generalizing formula).

Let be an -dimensional Gaussian random vector with mean value vector and autocovariance matrix . The diagonal elements of matrix (the autocovariances of each component) are denoted as . For a smooth enough function , for a given with , , and under the assumption that all averages involved exist, it holds true that:

 E[g(X)(N∏i=1Xnii)]=∑ri+∑Nj=1ℓij=nii=1,…,N[N∏i=1(niri,ℓi1,…,ℓiN)](N∏i=1μrii)× ×⎛⎝N∏i=1⌊ℓii/2⌋∑kii=0Hℓii,kiiσ2(ℓii−kii)i⎞⎠⎛⎝N∏i=1N∏j>imin{ℓij,ℓji}∑kij=0Gℓij,ℓji,kijCℓij+ℓji−kijij⎞⎠E[N∏i=1∂aiig(X)], (2)

where , is the floor function, and is the multinomial coefficient with factors. In Eq. (2), the orders of partial derivatives are defined as

 ai=N∑j=1[ℓji−(1+δij)kij],  % with  kij=kji, (3)

and being Kronecker’s delta; , for . Coefficients , are

 Hℓ,k=ℓ!2kk!(ℓ−2k)!,  k=0,…,⌊ℓ/2⌋, (4)

and

 Gℓ1,ℓ2,k=(ℓ1k)(ℓ2k)k!,  k=0,…,min{ℓ1,ℓ2}, (5)

where is the binomial coefficient. Sum is over all combinations of nonnegative integers with , .

###### Proof.

In Sec. 2, we prove Eq. (2) rigorously, by multidimensional mathematical induction on . In addition to this proof, and in order to provide the reader with more insight, we present, in Sec. 3, a constructive formal proof of theorem 2. This constructive proof is based on treating the mean value operator not as an integral, but as the sequential action of a number of pseudodifferential operators that are introduced in definition 1 via the moment-generating function of the Gaussian random vector. The action of these pseudodifferential operators is determined by their Taylor series expansions, under the formal assumption that all infinite series involved are summable. As we see in the proofs of lemmata 4, 5, coefficients and arise naturally in the constructive proof, when evaluating the terms appearing in the series expansions of the pseudodifferential operators. ∎

###### Remark 1 (On the orders ai of partial derivatives).

By Eq. (3), we observe that the order of the partial derivative with respect to in the right-hand side of Eq. (2) depends on: i) , that belongs to the partition of power of the same in the left-hand side of Eq. (2), and ii) all , , that each one of them belongs to the partition of power of one of the rest , . The second dependence of each is a consequence of the correlation between the components of the Gaussian random vector ; in corollary 2 for the uncorrelated case, we shall see that each depends on only.

###### Remark 2 (On the coefficients Hℓ,k, Gℓ1,ℓ2,k).

By the combinatorial interpretation of binomial coefficients, see e.g. [4, p. 110], coefficients , defined by Eq. (5), are identified as the number of ways to put together elements from a set of elements, with elements from another set of elements. In [10, p. 62], , defined by Eq. (4), are identified as the number of partitions of a set of elements into unordered pairs and singletons. Also, by virtue of [1, expression 22.3.11], are identified as the absolute values of the coefficients appearing in the th order probabilist’s Hermite polynomial; . Last, in the On-Line Encyclopedia of Integer Sequences [17], are referred to as the Bessel numbers.

We shall now review simplifications of formula (2) for some specific cases:

###### Corollary 1 (Formula (2) for zero-mean valued X).

 E[g(X)(N∏i=1Xnii)]=∑∑Nj=1ℓij=nii=1,…,N[N∏i=1(niℓi1,…,ℓiN)]⎛⎝N∏i=1⌊ℓii/2⌋∑kii=0Hℓii,kiiσ2(ℓii−kii)i⎞⎠× ×⎛⎝N∏i=1N∏j>imin{ℓij,ℓji}∑kij=0Gℓij,ℓji,kijCℓij+ℓji−kijij⎞⎠E[N∏i=1∂aiig(X)]. (6)

We show how to apply the generalizing formula by deriving the following example.

###### Example 1.

For , and , Eq. (1) results in

 E[g(X1,X2)X1X22]=(σ21σ22+2C212)E[∂g(X1,X2)∂X1]+3σ22C12E[∂g(X1,X2)∂X2]+ +σ21C212[∂3g(X1,X2)∂X31]+σ42C12[∂3g(X1,X2)∂X32]. (7)
###### Proof.

The derivation of Eq. (1) from Eq. (1) is performed in Appendix B. ∎

###### Corollary 2 (Formula (2) for uncorrelated Xi).

Consider the case where the components of are uncorrelated: for . Thus, in Eq. (2), for , and so the first summation in its right-hand side is simplified into . Now, by defining , and expressing , the said summation is simplified into , which can be denoted in a contracted way by using multi-index notation, see e.g. [20, p. 319]. By introducing the -dimensional multi-indices , , , and denoting , Eq. (2) for the uncorrelated case reads

 E[g(X)Xn]=∑ℓ≤n(nℓ)μn−ℓ∑k≤⌊ℓ/2⌋Hℓ,kσ2(ℓ−k)E[∂ℓ−2kg(X)]. (8)

In Eq. (8), the partial ordering of multi-indices implies that , for . Furthermore , , , , , and . For , Eq. (8) coincides with the extension of Stein’s lemma for the scalar case, that we derived recently in [12].

In the following corollary, we see that Stein’s lemma 1 is a special case of Eq. (2).

###### Corollary 3 (Rederivation of Stein’s lemma).

For the average , the first summation in the right-hand side of Eq. (2) is over all sets , where for all and for all subsets with one integer being equal to 1 and all the rest being equal to 0. Since the multinomial coefficient corresponding to each of these integer combinations equals to 1, , and by using the symmetry property of autocovariance matrix , Eq. (2) for reads

 E[g(X)Xm]=μmE[g(X)]+σ2mE[∂mg(X)]+N∑j=1j≠mCmjE[∂jg(X)]. (9)

Under the notation , Eq. (9) results into Eq. (1).

We shall now derive Isserlis theorem [8] and Song & Lee formula [22] from formula (2), by setting in it, and identify the nonzero terms in its right-hand side. For , the nonzero terms in the right-hand side of Eq.  (2), are only those that contain the zeroth order derivatives of ; all for .

###### Corollary 4 (Isserlis theorem).

The higher order moments of an -dimensional Gaussian random vector with zero mean value are expressed in terms of its autocovariance matrix as

 E[N∏i=1Xi]={0% for N,∑P∈℘2N∏{i,j}∈PCijfor N even, (10)

with being the set of all partitions of into unordered pairs.

###### Proof.

Eq. (10) is proven in Isserlis work [8]. In Appendix C, we derive Eq. (10) from formula (2), by setting , for , and . ∎

###### Corollary 5 (Product moment formula for Gaussian vectors).

As a generalization of Isserlis theorem 4, and for a Gaussian random vector whose mean value is non-zero in general, the following formula for its product moments holds true:

 E[N∏i=1Xnii]=∑m∈Sndn,m(N∏i=1N∏j=iCmijij)(N∏i=1μrii), (11)

where

 ri=ni−N∑j=1(1+δij)mij, (12)

and with coefficients defined as

 dn,m=(∏Ni=1ni!)2∑Ni=1mii(∏Ni=1∏Nj=imij!)(∏Ni=1ri!). (13)

In Eq. (11), summation is over the set of all with , for which , are nonnegative.

###### Proof.

Eq. (11) has been proven by Song and Lee in [22], using results from the work of Price [19]. In Appendix D, we derive Eq. (11) by setting in formula (2), as we have done for the derivation of Isserlis theorem 4. ∎

## 2 Proof of theorem 2 by mathematical induction

Theorem 2 for can be proven by multidimensional mathematical induction on the multi-index of exponents . As the base case, we choose . As we have seen in corollary 3, Eq. (2) for results in Stein’s lemma, Eq. (1), which is proven by theorem 1. Our inductive hypothesis is that Eq. (2) holds true for . Then, we have to prove Eq. (2) for . Thus, we have to prove Eq. (2) for augmented by 1, for every . In this section, we prove Eq. (2) for being augmented by 1. The proof for the rest is similar.

 E[g(X)(Xn1+11N∏i=2Xnii)]=E[(g(X)X1)(N∏i=1Xnii)],

and by using the inductive hypothesis:

 E[g(X)(Xn1+11N∏i=2Xnii)]=∑ri+∑Nj=1ℓij=nii=1,…,N[N∏i=1(niri,ℓi1,…,ℓiN)](N∏i=1μrii)× (14)

By the general Leibniz rule [1, expression 3.3.8], we calculate the derivative inside the average in the right-hand side of Eq. (2) as

 ∂a11(g(X)X1)=a1∑p=0(a1p)(∂a1−p1g(X))(∂p1X1). (15)

Since , , and for , Eq. (15) is simplified into

 ∂a11(g(X)X1) =(∂a11g(X))X1+((ℓ11−2k11)+N∑m=2(ℓm1−k1m))∂a1−11g(X). (16)

By using Eq. (2), we rewrite Eq. (2) as

 E[g(X)(Xn1+11N∏i=2Xnii)]=A+N∑m=1Bm, (17)

where

 A=∑ri+∑Nj=1ℓij=nii=1,…,N[N∏i=1(niri,ℓi1,…,ℓiN)](N∏i=1μrii)⎛⎝N∏i=1⌊ℓii/2⌋∑kii=0Hℓii,kiiσ2(ℓii−kii)i⎞⎠× ×⎛⎝N∏i=1N∏j>imin{ℓij,ℓji}∑kij=0Gℓij,ℓji,kijCℓij+ℓji−kijij⎞⎠E[(N∏i=1∂aiig(X))X1], (18)
 B1=∑ri+∑Nj=1ℓij=nii=1,…,N[N∏i=1(niri,ℓi1,…,ℓiN)](N∏i=1μrii)⎛⎝⌊(ℓ11−1)/2⌋∑k11=0(ℓ11−2k11)Hℓ11,k11σ2(ℓ11−k11)1⎞⎠× ×⎛⎝N∏i=2⌊ℓii/2⌋∑kii=0Hℓii,kiiσ2(ℓii−kii)i⎞⎠⎛⎝N∏i=1N∏j>imin{ℓij,ℓji}∑kij=0Gℓij,ℓji,kijCℓij+ℓji−kijij⎞⎠E[∂a1−11N∏i=2∂aiig(X)], (19)
 Bm=∑ri+∑Nj=1ℓij=nii=1,…,N[N∏i=1(niri,ℓi1,…,ℓiN)](N∏i=1μrii)⎛⎝N∏i=1⌊ℓii/2⌋∑kii=0Hℓii,kiiσ2(ℓii−kii)i⎞⎠× ×⎛⎝min{ℓ1m,ℓm1−1}∑k1m=0(ℓm1−k1m)Gℓ1m,ℓm1,k1mCℓ1m+ℓm1−k1m1m⎞⎠× ×⎛⎜ ⎜ ⎜⎝N∏j=2j≠mmin{ℓ1j,ℓj1}∑k1j=0Gℓ1j,ℓj1,k1jCℓ1j+ℓj1−k1j1j⎞⎟ ⎟ ⎟⎠⎛⎝N∏i=2N∏j>imin{ℓij,ℓji}∑kij=0Gℓij,ℓji,kijCℓij+ℓji−kijij⎞⎠× ×E[∂a1−11N∏i=2∂aiig(X)],  m=2,…,N. (20)
###### Remark 3 (Upper limit of k1m-sum in each Bm).

In , the term for in the -sum is equal to zero. The only such term is for even and for . In order to exclude this term, the upper limit of -sum is for odd , and changed to for even . These two values are expressed in a unified way as . In every for , the term for in the -sum is equal to zero. This term is present in the sum for . In order to exclude this term, the upper limit is changed to .

By applying Stein’s lemma (1) at the average appearing in the right-hand side of Eq. (2), we obtain

 A=A0+N∑m=1Am, (21)

where

 A0=∑ri+∑Nj=1ℓij=nii=1,…,N[N∏i=1(niri,ℓi1,…,ℓiN)](μr1+11N∏i=2μrii)⎛⎝N∏i=1⌊ℓii/2⌋∑kii=0Hℓii,kiiσ2(ℓii−kii)i⎞⎠× ×⎛⎝N∏i=1N∏j>imin{ℓij,ℓji}∑kij=0Gℓij,ℓji,kijCℓij+ℓji−kijij⎞⎠E[N∏i=1∂aiig(X)], (22)
 A1=∑ri+∑Nj=1ℓij=nii=1,…,N[N∏i=1(niri,ℓi1,…,ℓiN)](N∏i=1μrii)⎛⎝⌊ℓ11/2⌋∑k11=0Hℓ11,k11σ2(ℓ11+1−k11)1⎞⎠× ×⎛⎝N∏i=2⌊ℓii/2⌋∑kii=0Hℓii,kiiσ2(ℓii−kii)i⎞⎠⎛⎝N∏i=1N∏j>imin{ℓij,ℓji}∑kij=0Gℓij,ℓji,kijCℓij+ℓji−kijij⎞⎠E[∂a1+11N∏i=2∂aiig(X)], (23)
 Am=∑ri+∑Nj=1ℓij=nii=1,…,N[N∏i=1(niri,ℓi1,…,ℓiN)](N∏i=1μrii)⎛⎝N∏i=1⌊ℓii/2⌋∑kii=0Hℓii,kiiσ2(ℓii−kii)i⎞⎠× ×⎛⎝min{ℓ1m,ℓm1}∑k1m=0Gℓ1m,ℓm1,k1mCℓ1m+1+ℓm1−k1m1m⎞⎠⎛⎜ ⎜ ⎜⎝N∏j=2j≠mmin{ℓ1j,ℓj1}∑k1j=0Gℓ1j,ℓj1,k1jCℓ1j+ℓj1−k1j1j⎞⎟ ⎟ ⎟⎠× ×⎛⎝N∏i=2N∏j>imin{ℓij,ℓji}∑kij=0Gℓij,ℓji,kijCℓij+ℓji−kijij⎞⎠E⎡⎢ ⎢⎣∂am+1mN∏i=1i≠m∂aiig(X)⎤⎥ ⎥⎦,  m=2,…,N. (24)

By performing the change of index in each , we obtain:

 B1=∑ri+∑Nj=1ℓij=nii=1,…,N[N∏i=1(niri,ℓi1,…,ℓiN)](N∏i=1μrii)× ×⎛⎝⌊(ℓ11+1)/2⌋∑k11=1(ℓ11−2k11+2)Hℓ11,k11−1σ2(ℓ11+1−k11)1⎞⎠⎛⎝N∏i=2⌊ℓii/2⌋∑kii=0Hℓii,kiiσ2(ℓii−kii)i⎞⎠× ×⎛⎝N∏i=1N∏j>imin{ℓij,ℓji}∑ki