    # A new test of multivariate normality by a double estimation in a characterizing PDE

This paper deals with testing for nondegenerate normality of a d-variate random vector X based on a random sample X_1,...,X_n of X. The rationale of the test is that the characteristic function ψ(t) = (-t^2/2) of the standard normal distribution in R^d is the only solution of the partial differential equation Δ f(t) = (t^2-d)f(t), t ∈R^d, subject to the condition f(0) = 1. By contrast with a recent approach that bases a test for multivariate normality on the difference Δψ_n(t)-(t^2-d)ψ(t), where ψ_n(t) is the empirical characteristic function of suitably scaled residuals of X_1,...,X_n, we consider a weighted L^2-statistic that employs Δψ_n(t)-(t^2-d)ψ_n(t). We derive asymptotic properties of the test under the null hypothesis and alternatives. The test is affine invariant and consistent against general alternatives, and it exhibits high power when compared with prominent competitors.

## Authors

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

A useful tool for assessing the fit of data to a family of distributions are empirical counterparts of distributional characterizations. Such characterizations often emerge as solutions of an equation of the type . Here,

may be the moment generating function, the Laplace transform, or the characteristic function, and D denotes a differential operator, i.e., this operator can be regarded as ordinary differentiation if

is a function of only one variable or, for instance, the Laplace operator in the multivariate case. Such (partial) differential equations have been used to test for multivariate normality, see [DEH:2019, HV:2019], exponentiality, see [BH:1991]

, the gamma distribution, see

[HME:2012]

, the inverse Gaussian distribution, see

[HK:2002]

, the beta distribution, see

[RM:2018]

, the univariate and multivariate skew-normal distribution, see

[M:2010] and [MH:2010], and the Rayleigh distribution, see [MI:2003]. In all these references, the authors propose a goodness-of-fit test by plugging in an empirical counterpart for into , and by measuring the deviation from the zero function in a suitable function space. If, under the hypothesis to be tested, the function has a closed form and is known, there are two options for obtaining an empirical counterpart to the characterizing equation, namely , or . To the best of our knowledge, the effect of considering both options for the same testing problem

and to study the consequences on the performance of the resulting test statistics has not yet been considered, neither from a theoretical point of view, nor in a simulation study. In this spirit, the purpose of this paper is to investigate the effect on the power of a recent test for multivariate normality based on a characterization of the multivariate normal law in connection with the harmonic oscillator, see

[DEH:2019].

In what follows, let be a fixed integer, and let be independent and identically distributed (i.i.d.)

-dimensional random (column) vectors, that are defined on a common probability space

. We write for the distribution of , and we denote the -variate normal law with expectation and nonsingular covariance matrix by N. Moreover, stands for the class of all nondegenerate -variate normal distributions. To check the assumption of multivariate normality means to test the hypothesis

 H0:PX∈Nd, (1)

against general alternatives. The starting point of this paper is Theorem 1.1 of [DEH:2019]. To state this result, let denote the Laplace operator, the Euclidean norm in , and I

the identity matrix of size

. Then Theorem 1.1 of [DEH:2019] states that the characteristic function , of the -variate standard normal distribution N is the unique solution of the partial differential equation

 {Δf(x)−(∥x∥2−d)f(x)=0,x∈Rd,f(0)=1. (2)

Writing for the sample mean and for the sample covariance matrix of , respectively, where the superscript means transposition, the standing tacit assumptions that is absolutely continuous with respect to Lebesgue measure and guarantee that is invertible almost surely, see [EP:1973]. The test statistic is based on the so-called scaled residuals

 Yn,j=S−1/2n(Xj−¯¯¯¯¯Xn),j=1,…,n.

Here, is the unique symmetric positive definite square root of . Letting , , denote the empirical characteristic function (ecf) of , the test statistic proposed in [DEH:2019] is

 Tn,a=n∫Rd|Δψn(t)−Δψ(t)|2wa(t)dt, (3)

where

 wa(t)=exp(−a∥t∥2),t∈Rd, (4)

and is a fixed constant. The statistic has a nice closed-form expression as a function of , (see display (10)-(12) of [DEH:2019]) and is thus invariant with respect to full-rank affine transformations of . Theorems 2.2 and 2.3 of [DEH:2019] show that, elementwise on the underlying probability space, suitably rescaled versions of have limits as and , respectively. In the former case, the limit is a measure of multivariate skewness, introduced in [MRS:1993]

, whereas Mardia’s time-honored measure of multivariate kurtosis (see

[MAR:1970]) shows up as . As , the statistic has a nondegenerate limit null distribution (Theorem 4.1 of [DEH:2019]), and a test of (1) that rejects for large values of is able to detect alternatives that approach at the rate , irrespective of the dimension (Corollary 5.2 of [DEH:2019]). Under an alternative distribution satisfying , converges almost surely to a measure of distance between and the class (Theorem 6.1 of [DEH:2019]). As a consequence, the test for multinormality based on is consistent against any such alternative. By Theorem 6.5 of [DEH:2019], the sequence

converges in distribution to a centered normal law. Since the variance of this limit distribution can be estimated consistently from

(Theorem 6.7 of [DEH:2019]

), we have an asymptotic confidence interval for

.

The novel approach taken in this paper is to replace both of the functions occurring in (2) by the ecf . Since, under , and should be close to each other for large , it is tempting to see what happens if, instead of defined in (3), we base a test of on the weighted -statistic

 Un,a=n∫Rd∣∣Δψn(t)−(∥t∥2−d)ψn(t)∣∣2wa(t)dt (5)

and reject for large values of .

Since , the relation

 ∫Rd(∥t∥2−d)2cos(t⊤c)exp(−a∥t∥2)dt = (πa)d/216d2a3(a−1)+4d(d+2)a2+(8da2−4(d+2)a)∥c∥2+∥c∥416a4exp(−∥c∥24a),

valid for and , and tedious but straightforward calculations yield the representation

which is amenable to computational purposes. Moreover, turns out to be affine invariant.

The rest of the paper is organized as follows. In Section 2, we derive the elementwise limits of , after suitable transformations, as and . Section 3 deals with the limit null distribution of as . In Section 4, we show that, under the condition , has an almost sure limit as under a fixed alternative to normality. As a consequence, the test based on is consistent against any such alternative. Moreover, we prove that the asymptotic distribution of , after a suitable transformation, is a centered normal distribution. In Section 5, we present the results of a simulation study that compares the power of the test for normality based on with that of prominent competitors. Section 6 shows a real data example, and Section 7 contains some conclusions and gives an outlook on potential further work.

## 2 The limits a→0 and a→∞

This section considers the (elementwise) limits of as and . The results shed some light on the role of the parameter that figures in the weight function in (4). Notice that, from the definition of given in (5), we have and , since . Suitable transformations of , however, yield well-known limit statistics as and .

###### Theorem 2.1.

Elementwise on the underlying probability space, we have

 lima→0[(aπ)d/2Un,a−d(d+2)4a2]=1nn∑j=1∥Yn,j∥4−d2. (8)
###### Proof.

Starting with (7), is, apart from the factor , a double sum over and . Since each summand for which vanishes asymptotically as , we have

 (aπ)d/2Un,a=1nn∑j=1[∥Yn,j∥4−d(2a−1)a∥Yn,j∥2+d2(a−1)a+d(d+2)4a2]+o(1)

as , and the result follows from the fact that . ∎

Theorem 2.1 means that a suitable affine transformation of has a limit as , and that this limit is – apart from the additive constant – the time-honored measure of multivariate kurtosis in the sense of Mardia, see [MAR:1970]. The same measure – without the subtrahend – shows up as a limit of as , see Theorem 2.3 of [DEH:2019]. The next result shows that and , after multiplication with the same scaling factor, converge to the same limit as , cf. Theorem 2.1 of [DEH:2019].

###### Theorem 2.2.

Elementwise on the underlying probability space, we have

###### Proof.

The proof follows the lines of the proof of Theorem 2.1 of [DEH:2019] and is thus omitted. ∎

The limit figuring on the right hand side of (9) is a measure of multivariate skewness, introduced by Móri, Rohatgi and Székely, see [MRS:1993]. Theorem 2.1 and Theorem 2.2 show that the class of tests for are in a certain sense "closed at the boundaries" and . However, in contrast to the test for multivariate normality based on for fixed , tests for based on measures of multivariate skewness and kurtosis lack consistency against general alternatives, see, e.g., [BHE:1991, BH:1992, HEN:1994a].

## 3 The limit null distribution of Un,a

In this section, we assume that the distribution of is some nondegenerate -variate normal law. In view of affine invariance, we may further assume that and . By symmetry, it is readily seen that defined in (5) takes the form

 Un,a=∫RdS2n(t)wa(t)dt, (10)

where

 Sn(t)=1√nn∑j=1(∥Yn,j∥2+∥t∥2−d)(cos(t⊤Yn,j)+sin(t⊤Yn,j)),t∈Rd. (11)

In view of (10), our setting for asymptotics will be the separable Hilbert space of (equivalence classes of) measurable functions that satisfy . Here and in the sequel, each unspecified integral will be over . The scalar product and the norm in are given by and , respectively. Notice that, in this notation, (10) takes the form , where is given in (11).

Putting as before, and writing for convergence in distribution, the main result of this section is as follows.

###### Theorem 3.1.

If has some nondegenerate normal distribution, we have the following:

1. There is a centered Gaussian random element of having covariance kernel

 K(s,t) = ψ(s−t){2d+∥s∥2∥t∥2−2s⊤t∥s−t∥2−4∥s−t∥2} +2ψ(s)ψ(t){2∥s∥2+2∥t∥2−d−2s⊤t−4(s⊤t)2},s,t∈Rd,

such that, with defined in (11), as .

2. We have

###### Proof.

Since the proof is analogous to the proof of Proposition 3.2 of [DEH:2019], it will only be sketched. If stands for the modification of that results if we replace with

, then a Hilbert space central limit theorem holds for

, since the summands of are square-integrable centered random elements of . The idea is thus to find a random element of such that and . Putting in (11) and using the fact that , , where depend on and and satisfy , , some algebra and Proposition A.1 of [DEH:2019] show that a choice of is given by

 ˜Sn(t)=1√nn∑j=1h(Xj,t),

where

 h(x,t) = (∥x∥2+∥t∥2−d)(cos(t⊤x)+sin(t⊤x)) −ψ(t){2∥t∥2+∥x∥2−d+2t⊤x−2(t⊤x)2}.

Tedious calculations then show that the covariance kernel of , which is , is equal to given above. ∎

Let

denote a random variable having the limit distribution of

given in (12). Since the distribution of is that of , where is the Gaussian random element of figuring in Theorem 3.1, it is the distribution of , where is a sequence of i.i.d. standard normal random variables, and

are the positive eigenvalues corresponding to normalized eigenfunctions of the integral operator

on , where . It seems to be hopeless to obtain closed-form expressions of these eigenvalues. However, in view of Fubini’s theorem, we have

 E[U∞,a]=∫E[S2(t)]wa(t)dt=∫K(t,t)wa(t)dt,

and thus straightforward manipulations of integrals yield the following result.

###### Theorem 3.2.

Putting , we have

 E[U∞,a]=2d(πa)d/2{1−γ+γa+1−(d+2)γ(a+1)2+d+28a2}.

From this result, one readily obtains

 lima→0[(aπ)d/2E[U∞,a]−d(d+2)4a2]=2d. (13)

It is interesting to compare this limit relation with (8). If the underlying distribution is standard normal, i.e., if , we have . Now, writing and using Proposition A.1 of [DEH:2019], the right hand side of (8) turns out to converge in probability to as , and this expectation is the right hand side of (13). Regarding the case , the representation of easily yields

This result corresponds to (9), since, by Theorem 2.2 of [H:1997], the right hand side of (9), after multiplication with , converges in distribution to as if . Here,

is a random variable having a chi square distribution with

degrees of freedom.

## 4 Limits of Un,a under alternatives

In this section we assume that are i.i.d., and that . Moreover, let and in view of affine invariance, and recall the Laplace operator from Section 1. The characteristic function of will be denoted by , . Letting

 ψ±(t)=E[cos(t⊤X)]±E[sin(t⊤X)],t∈Rd,

we first present an almost sure limit for .

###### Theorem 4.1.

We have

 Un,ana.s.⟶Γa:=∫Rdz2(t)wa(t)dt=∥z∥2H,

where

 z(t)=−Δψ+(t)+(∥t∥2−d)ψ+(t). (14)
###### Proof.

In what follows, we write , and we put , for the sake of brevity. From (10) and (11), we have , where

 Vn(t)=1nn∑j=1∥Yj∥2CS+(t⊤Yj),Wn(t)=(∥t∥2−d)1nn∑j=1CS+(t⊤Yj).

Putting

 V0n(t)=1nn∑j=1∥Xj∥2CS+(t⊤Xj),W0n(t)=(∥t∥2−d)1nn∑j=1CS+(t⊤Xj),

the strong law of large numbers in Hilbert spaces (see, e.g., Theorem 7.7.2 of

[HEU:2015]) yields as , and thus it suffices to prove . From

the Cauchy–Schwarz inequality, the fact that , , and Minkowski’s inequality, it suffices to prove and as . As for , the inequalities , and the Cauchy–Schwarz inequality yield . In view of Proposition A.1 b) of [DEH:2019], we have . Regarding , we decompose this difference according to

 Vn(t)−V0n(t)=1nn∑j=1(∥Yj∥2−∥Xj∥2)CS+(t⊤Yj)+1nn∑j=1∥Xj∥2(CS+(t⊤Yj)−CS+(t⊤Xj)).

The squared norm in of the second summand on the right hand side converges to zero almost surely, see the treatment of in the proof of Theorem 6.1 of [DEH:2019]. The same holds for the first summand, since its modulus is bounded from above by , and the inequality , together with Proposition A.1 b) of [DEH:2019], yield the assertion. ∎

Since, under the conditions of Theorem 4.1, is strictly positive if the underlying distribution does not belong to , converges almost surely to under such an alternative, and we have the following result.

###### Corollary 4.2.

The test which reject the hypothesis for large values of is consistent against each fixed alternative satisfying .

The next result, which corresponds to Theorem 6.4 of [DEH:2019], shows that the (population) measure of multivariate skewness in the sense of Móri, Rohatgi and Székely emerges as the limit of , after a suitable scaling, as .

###### Theorem 4.3.

Under the condition , we have

 lima→∞2a(aπ)d/2Γa=∥∥E(∥X∥2X)∥∥2.
###### Proof.

By definition,

 Γa = ∫(∥t∥2−d)2ψ+(t)2wa(t)dt−2∫(∥t∥2−d)ψ+(t)Δψ+(t)wa(t)dt+∫(Δψ+(t))2wa(t)dt = Γa,1+Γa,2+Γa,3 (say).

In what follows, let be independent copies of . Since , the addition theorems for the cosine and the sine function and symmetry yield

 Γa,1=E[∫(∥t∥2−d)2cos(t⊤(Y−Z))wa(t)dt].

Putting , display (1) then gives

 Γa,1 =(πa)d/2116a4E[(16d2a3(a−1)+4d(d+2)a2+∥Y−Z∥4

Likewise, it follows that , whence

 Γa,2 = 2E[∥Y∥2∫(∥t∥2−d)cos(t⊤(Y−Z))wa(t)dt] = −2(πa)d/2E[∥Y∥2(∥Y−Z∥24a2+d−d2a)exp(−∥Y−Z∥24a)].

Finally,

 Γa,3=(πa)d/2E[∥Y∥2∥Z∥2exp(−∥Y−Z∥24a)],

and it follows that

 2a(aπ)d/2Γa =2aE[∥Y∥2∥Z∥2exp(−∥Y−Z∥24a)] −4aE[∥Y∥2(∥Y−Z∥24a2+d−d2a)exp(−∥Y−Z∥24a)] +18a3E[(16d2a3(a−1)+4d(d+2)a2+∥Y−Z∥4 +(8da2−4(d+2)a)∥Y−Z∥2)exp(−∥Y−Z∥24a)].

Now, dominated convergence yields

as . Since and , we have

 E[∥Y∥2∥Z∥2∥Y−Z∥2]=2dE∥Y∥4−2E∥∥∥X∥2X∥∥2,E[∥Y∥2∥Y−Z∥2]=E∥Y∥4+d2,

and the assertion follows. ∎

We close this section with a result on the asymptotic normality of under fixed alternatives. That such a result holds in principle follows from Theorem 1 of [BEH:2017]. To state the main idea, write again and notice that, by (10), , where is given in (11). Putting

 S∗n(t)=Sn(t)√n=1nn∑j=1(∥Yn,j∥2+∥t∥2−d)CS+(t⊤Yn,j),t∈Rd,

Theorem 4.1 and (14) show that

 √n(Un,an−Γa) = √n(∥S∗n∥2H−∥z∥2)=√n⟨S∗n−z,S∗n+z⟩H (15) = √n⟨S∗n−z,2z+S∗n−z⟩H = 2⟨V∗n,z⟩H+1√n∥V∗n∥2H,

where , . In the sequel, let denote the gradient of a differentiable function , evaluated at , and write for the Hessian matrix of at if is twice continuously differentiable. By proceeding as in the proof of Theorem 3.3 of [DEH:2019], there is a centered Gaussian element of having covariance kernel

 K∗(s,t)=E[h∗(X,s)h∗(X,t)],s,t∈Rd,

where

 h∗(x,t) = (∥x∥2+∥t∥2−d)CS+(x,t)+(12∇Δψ+(t)⊤−12(∥t∥2−d)∇ψ+(t)⊤)(xx⊤−Id)t +2∇ψ−(t)⊤x+(Δψ−(t)−(∥t∥2−d)ψ−(t))t