## 1 Introduction and main results

Stein’s lemma, [23], [11], is a well-known identity of the multivariate normal distribution, with applications in statistics, see e.g. [25], [16]. Stein’s lemma is stated as follows.

###### Theorem 1 (Multivariate Stein’s lemma).

For an -dimensional Gaussian random vector , with mean value vector and autocovariance matrix , it holds true that

(1) |

where denotes the average operator, and is the partial derivative of with respect to component.

###### Proof.

In Appendix A, we prove Stein’s lemma using integration by parts. ∎

By Stein’s lemma (1), the presence of th component inside the average is eliminated, with being expressed in terms of the averages of , and the averages of its first-order partial derivatives with respect to each of the components of . The topic of the present work is to generalize Stein’s lemma, providing a formula of expressing the average in terms of the averages of partial derivatives of . In the recent work [12], we generalized the counterpart of Eq. (1) for the univariate normal distribution, expressing the average , where

is a scalar Gaussian random variable, in terms of the averages of the derivatives of scalar function

, up to the th order.Furthermore, the formula generalizing Stein’s lemma for has to be compatible with identities expressing higher order moments of a Gaussian random vector in terms of its first two moments (mean value vector and autocovariance matrix). The central result in this direction is Isserlis theorem [8], also known in physics literature as Wick’s theorem [26], on the moments of a zero-mean value Gaussian random vector. For a review of the related literature see also [24]. Recently, Song and Lee in [22] derived a formula in closed, compact form for the product Gaussian moments . Subsequently, in corollaries 4, 5, Isserlis and Song & Lee formulas for Gaussian moments are easily rederived from our generalizing formula (2), by setting .

###### Theorem 2 (Generalizing formula).

Let be an -dimensional Gaussian random vector with mean value vector and autocovariance matrix . The diagonal elements of matrix (the autocovariances of each component) are denoted as . For a smooth enough function , for a given with , , and under the assumption that all averages involved exist, it holds true that:

(2) |

where , is the floor function, and is the multinomial coefficient with factors. In Eq. (2), the orders of partial derivatives are defined as

(3) |

and being Kronecker’s delta; , for . Coefficients , are

(4) |

and

(5) |

where is the binomial coefficient. Sum is over all combinations of nonnegative integers with , .

###### Proof.

In Sec. 2, we prove Eq. (2) rigorously, by multidimensional mathematical induction on . In addition to this proof, and in order to provide the reader with more insight, we present, in Sec. 3, a constructive formal proof of theorem 2. This constructive proof is based on treating the mean value operator not as an integral, but as the sequential action of a number of pseudodifferential operators that are introduced in definition 1 via the moment-generating function of the Gaussian random vector. The action of these pseudodifferential operators is determined by their Taylor series expansions, under the formal assumption that all infinite series involved are summable. As we see in the proofs of lemmata 4, 5, coefficients and arise naturally in the constructive proof, when evaluating the terms appearing in the series expansions of the pseudodifferential operators. ∎

###### Remark 1 (On the orders of partial derivatives).

By Eq. (3), we observe that the order of the partial derivative with respect to in the right-hand side of Eq. (2) depends on: i) , that belongs to the partition of power of the same in the left-hand side of Eq. (2), and ii) all , , that each one of them belongs to the partition of power of one of the rest , . The second dependence of each is a consequence of the correlation between the components of the Gaussian random vector ; in corollary 2 for the uncorrelated case, we shall see that each depends on only.

###### Remark 2 (On the coefficients , ).

By the combinatorial interpretation of binomial coefficients, see e.g. [4, p. 110], coefficients , defined by Eq. (5), are identified as the number of ways to put together elements from a set of elements, with elements from another set of elements. In [10, p. 62], , defined by Eq. (4), are identified as the number of partitions of a set of elements into unordered pairs and singletons. Also, by virtue of [1, expression 22.3.11], are identified as the absolute values of the coefficients appearing in the th order probabilist’s Hermite polynomial; . Last, in the On-Line Encyclopedia of Integer Sequences [17], are referred to as the Bessel numbers.

We shall now review simplifications of formula (2) for some specific cases:

We show how to apply the generalizing formula by deriving the following example.

###### Example 1.

For , and , Eq. (1) results in

(7) |

###### Corollary 2 (Formula (2) for uncorrelated ).

Consider the case where the components of are uncorrelated: for . Thus, in Eq. (2), for , and so the first summation in its right-hand side is simplified into . Now, by defining , and expressing , the said summation is simplified into , which can be denoted in a contracted way by using multi-index notation, see e.g. [20, p. 319]. By introducing the -dimensional multi-indices , , , and denoting , Eq. (2) for the uncorrelated case reads

(8) |

In Eq. (8), the partial ordering of multi-indices implies that , for . Furthermore , , , , , and . For , Eq. (8) coincides with the extension of Stein’s lemma for the scalar case, that we derived recently in [12].

###### Corollary 3 (Rederivation of Stein’s lemma).

For the average , the first summation in the right-hand side of Eq. (2) is over all sets , where for all and for all subsets with one integer being equal to 1 and all the rest being equal to 0. Since the multinomial coefficient corresponding to each of these integer combinations equals to 1, , and by using the symmetry property of autocovariance matrix , Eq. (2) for reads

(9) |

We shall now derive Isserlis theorem [8] and Song & Lee formula [22] from formula (2), by setting in it, and identify the nonzero terms in its right-hand side. For , the nonzero terms in the right-hand side of Eq. (2), are only those that contain the zeroth order derivatives of ; all for .

###### Corollary 4 (Isserlis theorem).

The higher order moments of an -dimensional Gaussian random vector with zero mean value are expressed in terms of its autocovariance matrix as

(10) |

with being the set of all partitions of into unordered pairs.

###### Proof.

###### Corollary 5 (Product moment formula for Gaussian vectors).

As a generalization of Isserlis theorem 4, and for a Gaussian random vector whose mean value is non-zero in general, the following formula for its product moments holds true:

(11) |

where

(12) |

and with coefficients defined as

(13) |

In Eq. (11), summation is over the set of all with , for which , are nonnegative.

## 2 Proof of theorem 2 by mathematical induction

Theorem 2 for can be proven by multidimensional mathematical induction on the multi-index of exponents . As the base case, we choose . As we have seen in corollary 3, Eq. (2) for results in Stein’s lemma, Eq. (1), which is proven by theorem 1. Our inductive hypothesis is that Eq. (2) holds true for . Then, we have to prove Eq. (2) for . Thus, we have to prove Eq. (2) for augmented by 1, for every . In this section, we prove Eq. (2) for being augmented by 1. The proof for the rest is similar.

and by using the inductive hypothesis:

(14) |

By the general Leibniz rule [1, expression 3.3.8], we calculate the derivative inside the average in the right-hand side of Eq. (2) as

(15) |

Since , , and for , Eq. (15) is simplified into

(16) |

By using Eq. (2), we rewrite Eq. (2) as

(17) |

where

(18) |

(19) |

(20) |

###### Remark 3 (Upper limit of -sum in each ).

In , the term for in the -sum is equal to zero. The only such term is for even and for . In order to exclude this term, the upper limit of -sum is for odd , and changed to for even . These two values are expressed in a unified way as . In every for , the term for in the -sum is equal to zero. This term is present in the sum for . In order to exclude this term, the upper limit is changed to .