 # Gaussian approximation of Gaussian scale mixture

For a given positive random variable $V>0$ and a given $Z\sim N(0,1)$ independent of $V$, we compute the scalar $t_0$ such that the distance between $Z\sqrt{V}$ and $Z\sqrt{t_0}$, in the $L^2(\mathbb{R})$ sense, is minimal. We also consider the same problem in several dimensions.

Keywords: Normal approximation, Gaussian scale mixture, Plancherel theorem.


## 1 Introduction

Let $Z$ be standard Gaussian in $\mathbb{R}^n$ and consider an independent random positive definite matrix $V$ of order $n$ with distribution $\mu$. We call the distribution of $V^{1/2}Z$ a Gaussian scaled mixture. Denote by $f$ the density in $\mathbb{R}^n$ of $V^{1/2}Z$. For $n>1$, several $\mu$ can yield the same density $f$.

In many practical circumstances, $\mu$ is not very well known, and $f$ is complicated. On the other hand, for $n=1$, histograms of the symmetric density

$$f(x)=\int_0^\infty e^{-\frac{x^2}{2v}}\,\frac{\mu(dv)}{\sqrt{2\pi v}} \tag{1}$$

look like the histogram of a normal distribution since $x\mapsto \log f(\sqrt{x})$ is convex. The aim of the present note is to say something of the best normal approximation $N(0,t_0)$ of $f$ in the sense of $L^2(\mathbb{R}^n)$.

In Section 2, we recall some known facts and examples about the pair $(f,\mu)$ when $n=1$. In Section 3, our main result, for $n=1$, is Proposition 3.1, which shows the existence and the uniqueness of $t_0$ and the fact that $t_0\le E(V)$. This proposition also gives the equation, see (6), that has to be solved to obtain $t_0$ when $\mu$ is known. In Section 4, we consider the more difficult case when $n>1$. In that case, $t_0$ is a positive definite matrix, and Proposition 4.2 shows the existence of $t_0$. A basic tool we use in this note is the Plancherel identity.

## 2 Review of Gaussian scaled mixtures in the uni-dimensional case

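A probability density $f$ on $\mathbb{R}$ is called a discrete Gaussian scale mixture if there exist numbers $0<v_1<\dots<v_n$ and $p_1,\dots,p_n>0$ such that $p_1+\dots+p_n=1$ and

$$f(x)=\sum_{i=1}^n p_i\frac{1}{\sqrt{2\pi v_i}}e^{-\frac{x^2}{2v_i}}.$$

It is easy to see that if $V\sim\sum_{i=1}^n p_i\delta_{v_i}$ is independent of $Z\sim N(0,1)$, then the density of $Z\sqrt{V}$ is $f$. A way to see this is to observe that for all $s\in\mathbb{R}$ we have

$$\int_{-\infty}^{\infty}e^{sx}f(x)\,dx=\sum_{i=1}^n p_i e^{\frac{s^2}{2}v_i}=E\left(E\left(e^{sZ\sqrt{V}}\mid V\right)\right)=E\left(e^{sZ\sqrt{V}}\right).$$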

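More generally, we will say that the density $f$ is a Gaussian scale mixture if there exists a probability distribution $\mu(dv)$ on $(0,\infty)$ such that (1) holds. As in the finite mixture case, if $V\sim\mu$ is independent of $Z\sim N(0,1)$, the density of $Z\sqrt{V}$ is $f$. To see this, denote

$$L_V(u)=\int_0^\infty e^{-uv}\mu(dv). \tag{2}$$

Then

$$\int_{-\infty}^{\infty}e^{sx}f(x)\,dx=L_V(-s^2/2)=E\left(e^{sZ\sqrt{V}}\right). \tag{3}$$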

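For instance, if $a>0$ and if

$$f(x)=\frac{a}{2}e^{-a|x|} \tag{4}$$

is the double exponential density, then for $|s|<a$ we have

$$\int_{-\infty}^{\infty}e^{sx}f(x)\,dx=\frac{a^2}{a^2-s^2}=L_V(-s^2/2)$$

where

$$L_V(u)=\frac{a^2}{a^2+2u}=\frac{a^2}{2}\int_0^\infty e^{-vu-\frac{a^2}{2}v}\,dv.$$

This means that the mixing measure $\mu(dv)$ is an exponential distribution with mean $2/a^2$.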
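The identification of the mixing measure can be checked numerically. The sketch below, in plain Python, integrates the $N(0,v)$ density against an exponential mixing density with mean $2/a^2$ and compares the result with the double exponential density (4). The function names, the substitution $v=w^2$ and the Simpson rule are illustrative choices and are not taken from the source.

```python
import math


def laplace_density(x, a):
    """Double exponential density (a/2) * exp(-a|x|), as in (4)."""
    return 0.5 * a * math.exp(-a * abs(x))


def mixture_density(x, a, n_steps=20000):
    """Gaussian scale mixture with exponential mixing law of mean 2/a^2.

    The integral over v is computed after the substitution v = w**2,
    which removes the 1/sqrt(v) factor, with a composite Simpson rule.
    Intended for x != 0.
    """
    lam = 0.5 * a * a          # rate of the exponential mixing law
    upper = 12.0 / a           # exp(-lam * upper**2) = exp(-72) is negligible
    h = upper / n_steps

    def integrand(w):
        if w == 0.0:
            return 0.0         # limit of the integrand when x != 0
        return (2.0 * lam / math.sqrt(2.0 * math.pi)
                * math.exp(-x * x / (2.0 * w * w) - lam * w * w))

    total = integrand(0.0) + integrand(upper)
    for k in range(1, n_steps):
        total += (4.0 if k % 2 else 2.0) * integrand(k * h)
    return total * h / 3.0


if __name__ == "__main__":
    for x in (0.3, 1.0, 2.5):
        print(x, mixture_density(x, 2.0), laplace_density(x, 2.0))
```

The two printed columns should agree up to quadrature error for any $x\neq 0$.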

There are other examples of pairs $(f,\mu)$ in the literature. For instance, Palmer, Kreutz-Delgado and Makeig (2011) offer an interesting catalog containing also some examples for $n>1$. Note that if $f$ is known, then the distribution of $\log Z^2+\log V$ is known, and finding the distribution $\mu$ of $V$, or the distribution of $\log V$, is a problem of deconvolution. If its solution exists, it is unique, as shown for instance by (3).

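An example of such a deconvolution is given by West (1987), who extends (4) to $f(x)=Ce^{-a|x|^{2\alpha}}$ where $0<\alpha<1$ as follows: he observes that for $A>0$ and $0<\alpha<1$, there exists a probability density $g$, called a positive stable law, such that, for $\theta>0$,

$$\int_0^\infty e^{-t\theta}g(t)\,dt=e^{-A\theta^{\alpha}}.$$

If we define $\mu(dv)=C\sqrt{2\pi}\,v^{-3/2}g(1/v)\,dv$, where $C$ is such that $\mu$ is a probability, and replace $\theta$ by $x^2/2$, we get, for $a=A2^{-\alpha}$,

$$\int_0^\infty e^{-\frac{x^2}{2v}}\frac{1}{\sqrt{2\pi v}}\mu(dv)=Ce^{-a|x|^{2\alpha}}.$$

For $\alpha\neq 1/2$, the Laplace transform $L_V$ is not elementary anymore.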

Another elegant example of deconvolution is given by Stefanski (1991) and Monahan and Stefanski (1992) with the logistic distribution

$$f(x)=\frac{e^x}{(1+e^x)^2}=\sum_{n=1}^\infty(-1)^{n+1}ne^{-n|x|}. \tag{5}$$

Using (4) to represent $e^{-n|x|}$, one can deduce that, if $\mu$ exists here, it must be

$$\mu(dv)=\sum_{n=1}^\infty(-1)^{n+1}n^2e^{-\frac{n^2}{2}v}\,dv$$

which indeed exists since this is the Kolmogorov-Smirnov distribution.
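The series representation (5) is easy to test numerically. The following Python sketch compares a truncated alternating series with the logistic density at a few points $x\neq 0$; the function names and the truncation level are illustrative assumptions.

```python
import math


def logistic_density(x):
    """Logistic density e^x / (1 + e^x)^2, written in a symmetric, overflow-safe form."""
    e = math.exp(-abs(x))
    return e / (1.0 + e) ** 2


def alternating_series(x, n_terms=200):
    """Partial sum of sum_{n>=1} (-1)^(n+1) * n * exp(-n|x|), valid for x != 0."""
    return sum((-1) ** (n + 1) * n * math.exp(-n * abs(x))
               for n in range(1, n_terms + 1))


if __name__ == "__main__":
    for x in (0.5, -1.0, 3.0):
        print(x, alternating_series(x), logistic_density(x))
```

The series converges geometrically away from the origin, so a few hundred terms are more than enough for the points used here.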

## 3 Normal approximation

Such a mixing keeps some characteristics of the normal distribution: it is a symmetric density, $f(x)=e^{-\kappa(x^2/2)}$, where $-\kappa$ is convex since

$$e^{-\kappa(u)}=\int_0^\infty e^{-u/v}\frac{\mu(dv)}{\sqrt{2\pi v}}=\int_0^\infty e^{-uw}\nu(dw)$$

is the Laplace transform of the positive measure $\nu(dw)$ defined as the image of $\frac{\mu(dv)}{\sqrt{2\pi v}}$ by the map $v\mapsto w=1/v$.

As said in the introduction, in some practical applications, the distribution of $V$ is not very well known, and it is interesting to replace $f$ by the density of an ordinary normal distribution $N(0,t_0)$. The $L^2(\mathbb{R})$ distance is well adapted to this problem. We are going to prove the following result.

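Proposition 3.1. If $f$ is defined by (1), then

1. $f\in L^2(\mathbb{R})$ if and only if

$$E\left(\frac{1}{\sqrt{V+V_1}}\right)<\infty$$

when $V$ and $V_1$ are independent with the same distribution $\mu$.

2. If $f\in L^2(\mathbb{R})$, there exists a unique $t_0=t_0(V)>0$ which minimizes

$$t\mapsto I_V(t)=\int_{-\infty}^{\infty}\left[f(x)-\frac{1}{\sqrt{2\pi t}}e^{-\frac{x^2}{2t}}\right]^2dx.$$

3. The scalar $y_0=1/t_0$ is the unique positive solution of the equation

$$\int_0^\infty\frac{\mu(dv)}{(1+vy)^{3/2}}=\frac{1}{2^{3/2}}. \tag{6}$$

In particular, if $\lambda>0$ and $t'$ denotes the minimizer associated with the distribution of $\lambda V$, then $t'=\lambda t_0$.

4. The value of $I_V(t_0)$ is

$$I_V(t_0)=\frac{1}{\sqrt{2\pi}}\left(E\left(\frac{1}{\sqrt{V+V_1}}\right)-2E\left(\frac{1}{\sqrt{V+t_0}}\right)+\frac{1}{\sqrt{2t_0}}\right).$$

In particular

$$I_{\lambda V}(t')=\frac{1}{\sqrt{\lambda}}I_V(t_0). \tag{7}$$

5. Finally, $t_0\le E(V)$.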

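Proof. Recall that if $g\in L^2(\mathbb{R})$ and if $\hat g(s)=\int_{-\infty}^{\infty}e^{isx}g(x)\,dx$, then the Plancherel theorem says that

$$\frac{1}{2\pi}\int_{-\infty}^{\infty}|\hat g(s)|^2ds=\int_{-\infty}^{\infty}|g(x)|^2dx. \tag{8}$$

Furthermore, if $g\in L^1(\mathbb{R})$, then $g\in L^2(\mathbb{R})$ if and only if $\hat g\in L^2(\mathbb{R})$.

Let us apply (8) first to $g=f$. From (1), we have $\hat f(s)=L_V(s^2/2)$. Then

$$\int_{-\infty}^{\infty}\hat f^2(s)\,ds=\int_{-\infty}^{\infty}L_V^2(s^2/2)\,ds=\sqrt{2}\int_0^\infty L_V(u)^2\frac{du}{\sqrt{u}}=\sqrt{2}\int_0^\infty E\left(e^{-u(V+V_1)}\right)\frac{du}{\sqrt{u}}=\sqrt{2\pi}\,E\left(\frac{1}{\sqrt{V+V_1}}\right).$$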

This proves statement 1. of the proposition.
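The last chain of equalities can be verified numerically for a discrete mixing law. The Python sketch below assumes a two-point distribution for $V$ (an illustrative choice) and compares a Simpson approximation of $\int\hat f^2(s)\,ds$ with $\sqrt{2\pi}\,E(1/\sqrt{V+V_1})$. All names are hypothetical.

```python
import math

# Two-point mixing law chosen for illustration (an assumption, not from the proof).
VALUES = (1.0, 2.0)
WEIGHTS = (0.5, 0.5)


def f_hat(s):
    """Fourier transform of f, equal to L_V(s^2 / 2) for a discrete V."""
    return sum(p * math.exp(-0.5 * s * s * v) for p, v in zip(WEIGHTS, VALUES))


def integral_of_f_hat_squared(upper=12.0, n_steps=24000):
    """Composite Simpson rule for the integral of f_hat(s)^2 over [-upper, upper]."""
    h = 2.0 * upper / n_steps
    total = f_hat(-upper) ** 2 + f_hat(upper) ** 2
    for k in range(1, n_steps):
        coeff = 4.0 if k % 2 else 2.0
        total += coeff * f_hat(-upper + k * h) ** 2
    return total * h / 3.0


def closed_form():
    """sqrt(2*pi) * E(1 / sqrt(V + V1)) with V, V1 independent copies."""
    expectation = sum(p * q / math.sqrt(v + w)
                      for p, v in zip(WEIGHTS, VALUES)
                      for q, w in zip(WEIGHTS, VALUES))
    return math.sqrt(2.0 * math.pi) * expectation


if __name__ == "__main__":
    print(integral_of_f_hat_squared(), closed_form())
```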

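To prove 2., 3. and 4., we apply (8) to $g(x)=f(x)-\frac{1}{\sqrt{2\pi t}}e^{-\frac{x^2}{2t}}$, for which $\hat g(s)=L_V(s^2/2)-e^{-ts^2/2}$. As a consequence

$$I_V(t)=\frac{1}{2\pi}\int_{-\infty}^{\infty}\left[L_V(s^2/2)-e^{-ts^2/2}\right]^2ds=\frac{1}{\pi}\int_0^\infty\left[L_V(u)-e^{-tu}\right]^2\frac{du}{\sqrt{2u}}$$

and

$$I_V'(t)=\frac{\sqrt{2}}{\pi}\int_0^\infty\left[L_V(u)-e^{-tu}\right]e^{-tu}\sqrt{u}\,du.$$

Since $\int_0^\infty e^{-2tu}\sqrt{u}\,du=\Gamma(3/2)/(2t)^{3/2}$ and since

$$\int_0^\infty L_V(u)e^{-tu}\sqrt{u}\,du=\int_0^\infty\int_0^\infty e^{-u(v+t)}\sqrt{u}\,du\,\mu(dv)=\Gamma(3/2)\int_0^\infty\frac{\mu(dv)}{(t+v)^{3/2}},$$

then $I_V'(t)=0$ if and only if

$$\int_0^\infty\frac{\mu(dv)}{(t+v)^{3/2}}=\frac{1}{(2t)^{3/2}}.$$

We can rewrite this equation in $t$ as $F(1/t)=2^{-3/2}$ where

$$F(y)=\int_0^\infty\frac{\mu(dv)}{(1+vy)^{3/2}}.$$

Since $F(0)=1$, $\lim_{y\to\infty}F(y)=0$ and

$$F'(y)=-\frac{3}{2}\int_0^\infty\frac{v\,\mu(dv)}{(1+vy)^{5/2}}<0,$$

it follows that $y\mapsto F(y)-2^{-3/2}$ has only one zero $y_0$ on $(0,\infty)$, and it is easy to see from the sign of $I_V'$ that $I_V$ reaches its minimum at $t_0=1/y_0$.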

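To show 5, we will apply the Jensen inequality to the convex function $x\mapsto(1+y_0x)^{-3/2}$ and the random variable $V$. From

$$\frac{1}{(1+y_0E(V))^{3/2}}\le E\left(\frac{1}{(1+y_0V)^{3/2}}\right)=\frac{1}{2^{3/2}}$$

it follows that $1+y_0E(V)\ge 2$ and $t_0=1/y_0\le E(V)$.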
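Since $F$ is continuous and strictly decreasing, equation (6) lends itself to bisection. The Python sketch below solves it for a discrete mixing law and then checks the Jensen bound and the scaling property of the minimizer. The three-point distribution and all function names are hypothetical and serve only as an illustration.

```python
import math

TARGET = 2.0 ** -1.5  # right-hand side of equation (6)


def F(y, values, weights):
    """F(y) = E[(1 + V y)^(-3/2)] for a discrete V."""
    return sum(p / (1.0 + v * y) ** 1.5 for p, v in zip(weights, values))


def solve_t0(values, weights, n_iter=200):
    """Return t0 = 1/y0, where y0 is the root of F(y) = 2^(-3/2), by bisection."""
    lo, hi = 0.0, 1.0
    while F(hi, values, weights) > TARGET:   # F decreases from 1 to 0
        hi *= 2.0
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if F(mid, values, weights) > TARGET:
            lo = mid
        else:
            hi = mid
    return 1.0 / (0.5 * (lo + hi))


# Hypothetical three-point law for V.
values = (0.5, 1.0, 4.0)
weights = (0.2, 0.5, 0.3)

t0 = solve_t0(values, weights)
mean_v = sum(p * v for p, v in zip(weights, values))
scaled_t0 = solve_t0(tuple(3.0 * v for v in values), weights)

if __name__ == "__main__":
    print("t0 =", t0, " E(V) =", mean_v, " t0(3V) =", scaled_t0)
```

In this sketch `t0` should not exceed `mean_v`, and `scaled_t0` should equal `3 * t0` up to rounding, in line with parts 3 and 5 of the proposition.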

Example 1. Suppose that $\Pr(V=1)=\Pr(V=2)=1/2$. Let us compute $t_0$ and $I_V(t_0)$. With the help of Mathematica, we see that the solution of

$$\frac{1}{2(1+t)^{3/2}}+\frac{1}{2(2+t)^{3/2}}=\frac{1}{(2t)^{3/2}}$$

is . Finally

 IV(t0)=√2π(14√2+12√3+18−1√1+t0−1√2+t0+1√2t0)=0.00019,

a satisfying result.
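Example 1 can be explored without a computer algebra system. The sketch below assumes that $V$ takes the values 1 and 2 with probability 1/2 each, finds the root of the stationarity equation by bisection, and evaluates the squared $L^2$ distance directly from its definition with a Simpson rule. Function names, integration bounds and step counts are illustrative assumptions.

```python
import math

VALUES = (1.0, 2.0)    # assumed support of V
WEIGHTS = (0.5, 0.5)   # assumed probabilities


def normal_pdf(x, variance):
    return math.exp(-0.5 * x * x / variance) / math.sqrt(2.0 * math.pi * variance)


def mixture_pdf(x):
    """Density of Z * sqrt(V) for the discrete V above."""
    return sum(p * normal_pdf(x, v) for p, v in zip(WEIGHTS, VALUES))


def stationarity_gap(t):
    """E[(t + V)^(-3/2)] - (2t)^(-3/2); its unique zero is the minimizer."""
    lhs = sum(p / (t + v) ** 1.5 for p, v in zip(WEIGHTS, VALUES))
    return lhs - (2.0 * t) ** -1.5


def solve_t0(lo=0.1, hi=2.0, n_iter=200):
    """Bisection; the gap is negative at lo and positive at hi."""
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if stationarity_gap(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def squared_l2_distance(t, upper=15.0, n_steps=6000):
    """Simpson approximation of the integral of (f - N(0, t) density)^2."""
    h = 2.0 * upper / n_steps

    def g(x):
        return (mixture_pdf(x) - normal_pdf(x, t)) ** 2

    total = g(-upper) + g(upper)
    for k in range(1, n_steps):
        total += (4.0 if k % 2 else 2.0) * g(-upper + k * h)
    return total * h / 3.0


if __name__ == "__main__":
    t0 = solve_t0()
    print("t0 =", t0, " squared distance =", squared_l2_distance(t0))
```

Evaluating `squared_l2_distance` slightly to the left and right of the computed root gives a quick check that the stationary point is indeed a minimum.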

Example 2. Suppose that $V$ is uniform on $(0,1)$. Then

$$t_0=0.36678,\qquad I_V(t_0)=0.0182.$$
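For a uniform mixing law the integral in the stationarity equation has an elementary antiderivative, so the minimizer is again a one-dimensional root. The sketch below assumes that $V$ is uniform on $(0,1)$, an assumption under which bisection returns the value of $t_0$ quoted in Example 2. The function names are hypothetical.

```python
def stationarity_gap(t):
    """Integral of (t + v)^(-3/2) over (0, 1), minus (2t)^(-3/2).

    The integral equals 2 * (t^(-1/2) - (1 + t)^(-1/2)).
    """
    integral = 2.0 * (t ** -0.5 - (1.0 + t) ** -0.5)
    return integral - (2.0 * t) ** -1.5


def solve_t0(lo=0.05, hi=2.0, n_iter=200):
    """Bisection; the gap is negative at lo and positive at hi."""
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if stationarity_gap(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    print("t0 =", solve_t0())
```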

If is uniform on , then from Part 4 of Proposition 3.1, we have

Example 3. Suppose that Then

$$t_0=0.524,\qquad I_V(t_0)=0.0207.$$

## 4 Extension to the Euclidean space

Denote by $\mathcal{P}$ the convex cone of real positive definite matrices of order $n$. A scaled Gaussian mixture $f$ on $\mathbb{R}^n$ is the density of a random variable $X$ on $\mathbb{R}^n$ of the form $X=V^{1/2}Z$, where $V$ is a random matrix in $\mathcal{P}$ independent of the standard random Gaussian variable $Z\sim N(0,I_n)$. In this section, we study the conditions that the distribution $\mu$ of $V$ must satisfy for $f$ to be in $L^2(\mathbb{R}^n)$, and we find a Gaussian law $N(0,t_0)$ which is the closest to $f$ in the $L^2(\mathbb{R}^n)$ sense.

### 4.1 Non identifiability

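An important remark is in order: for $n>1$, the measure $\mu$ which generates a given $f$ is not unique.

Example 4. Let $p>(n-1)/2$ and consider the Wishart distribution $\mu_p(dv)$ with shape parameter $p$ and expectation $pI_n$. Then since

$$\int_{\mathcal{P}}e^{-\operatorname{trace}(vt)}\mu_p(dv)=\frac{1}{(\det(I_n+t))^p}$$

we can claim that

$$\int_{\mathcal{P}}e^{-\frac{1}{2}s^*vs}\mu_p(dv)=\frac{1}{(\det(I_n+\frac{1}{2}ss^*))^p}=\frac{1}{(1+\frac{1}{2}\|s\|^2)^p}.$$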

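Similarly, consider $X$ following a gamma distribution with shape parameter $p$ and mean $p/2$. Consider also the distribution of $V=XI_n$. Then

$$E(e^{-\lambda X})=\frac{1}{(1+\frac{1}{2}\lambda)^p},\qquad E\left(e^{-\operatorname{trace}(s^*Vs)}\right)=\frac{1}{(1+\frac{1}{2}\|s\|^2)^p}.$$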

This example shows that $\mu_p$ and the distribution of $XI_n$ generate the same scaled Gaussian mixture distribution.
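The step from the determinant to the norm in Example 4 uses the identity $\det(I_n+\frac{1}{2}ss^*)=1+\frac{1}{2}\|s\|^2$. A minimal Python check for $n=3$ follows; the helper names and the test vector are arbitrary illustrative choices.

```python
def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def identity_plus_half_outer(s):
    """Matrix I_3 + (1/2) s s^* for a vector s of length 3."""
    return [[(1.0 if i == j else 0.0) + 0.5 * s[i] * s[j] for j in range(3)]
            for i in range(3)]


s = (1.0, -2.0, 0.5)
lhs = det3(identity_plus_half_outer(s))
rhs = 1.0 + 0.5 * sum(x * x for x in s)

if __name__ == "__main__":
    print(lhs, rhs)
```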

### 4.2 Some integrals for the standard Gaussian distribution

We recall here two simple formulas. We use the convention that if $s\in\mathbb{R}^n$, it is written as a column matrix, and $s^*$ is its transposed matrix and is a row vector.

Lemma 4.1. Let $A\in\mathcal{P}$. Then

$$\int_{\mathbb{R}^n}e^{-\frac{1}{2}s^*As}\,ds=\frac{(2\pi)^{n/2}}{\sqrt{\det A}},\qquad\int_{\mathbb{R}^n}e^{-\frac{1}{2}s^*As}\,ss^*\,ds=\frac{(2\pi)^{n/2}}{\sqrt{\det A}}A^{-1}.$$

Proof. Without loss of generality, we may assume that $A$ is diagonal, and the proof is obvious in this particular case.
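Both formulas of Lemma 4.1 can be checked numerically in dimension $n=2$. The Python sketch below uses a midpoint rule on a square grid for one non-diagonal positive definite matrix $A$. The matrix, the grid size and the function name are illustrative assumptions.

```python
import math


def gaussian_integrals(a, half_width=10.0, n=400):
    """Midpoint rule on [-half_width, half_width]^2.

    Returns the integral of exp(-s^* A s / 2) and the matrix-valued
    integral of exp(-s^* A s / 2) s s^* for a symmetric 2x2 matrix a.
    """
    h = 2.0 * half_width / n
    total = m11 = m12 = m22 = 0.0
    for i in range(n):
        x = -half_width + (i + 0.5) * h
        for j in range(n):
            y = -half_width + (j + 0.5) * h
            q = a[0][0] * x * x + 2.0 * a[0][1] * x * y + a[1][1] * y * y
            w = math.exp(-0.5 * q) * h * h
            total += w
            m11 += w * x * x
            m12 += w * x * y
            m22 += w * y * y
    return total, ((m11, m12), (m12, m22))


A = ((2.0, 0.5), (0.5, 1.0))                       # hypothetical test matrix
det_a = A[0][0] * A[1][1] - A[0][1] * A[1][0]      # equals 1.75
inv_a = ((A[1][1] / det_a, -A[0][1] / det_a),
         (-A[1][0] / det_a, A[0][0] / det_a))
constant = 2.0 * math.pi / math.sqrt(det_a)        # (2 pi)^(n/2) / sqrt(det A), n = 2

if __name__ == "__main__":
    total, second = gaussian_integrals(A)
    print(total, constant)
    print(second, [[constant * x for x in row] for row in inv_a])
```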

### 4.3 Existence of the best normal approximation

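Proposition 4.2. Let $\mu$ be a probability distribution on the convex cone $\mathcal{P}$ of positive definite matrices of order $n$. Let $f$ denote the density of the random variable $X=V^{1/2}Z$ of $\mathbb{R}^n$, where $V\sim\mu$ is independent of $Z\sim N(0,I_n)$. Then

1. $f\in L^2(\mathbb{R}^n)$ if and only if $E\left(\frac{1}{\sqrt{\det(V+V_1)}}\right)<\infty$, where $V$ and $V_1$ are independent with the same distribution $\mu$.

2. For $f\in L^2(\mathbb{R}^n)$, consider the function $I$ defined on $\mathcal{P}$ by

$$t\mapsto I(t)=\int_{\mathbb{R}^n}\left[f(x)-\frac{1}{\sqrt{(2\pi)^n\det t}}e^{-\frac{1}{2}x^*t^{-1}x}\right]^2dx.$$

Then $I$ reaches its minimum at some $t_0$, and this $t_0$ is a solution in $\mathcal{P}$ of the following equation in $t$:

$$\int_{\mathcal{P}}\frac{(v+t)^{-1}}{\sqrt{\det(v+t)}}\mu(dv)=\frac{1}{2^{1+\frac{1}{2}n}}\frac{t^{-1}}{\sqrt{\det t}}. \tag{9}$$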

Proof. We have

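$$\hat f(s)=\int_{\mathbb{R}^n}e^{i\langle s,x\rangle}f(x)\,dx=E\left(e^{i\langle V^{1/2}Z,s\rangle}\right)=E\left(e^{-\frac{1}{2}s^*Vs}\right). \tag{10}$$

Now using the Plancherel theorem and Lemma 4.1, we prove part 1. as follows:

$$\int_{\mathbb{R}^n}f^2(x)\,dx=\frac{1}{(2\pi)^n}\int_{\mathbb{R}^n}\hat f(s)^2ds=\frac{1}{(2\pi)^n}\int_{\mathbb{R}^n}E\left(e^{-\frac{1}{2}s^*(V+V_1)s}\right)ds=\frac{1}{(2\pi)^{n/2}}E\left(\frac{1}{\sqrt{\det(V+V_1)}}\right).$$

To prove part 2, we use the Plancherel theorem again and obtain

$$I(t)=\frac{1}{(2\pi)^n}\int_{\mathbb{R}^n}\left[\hat f(s)-e^{-\frac{1}{2}s^*ts}\right]^2ds.$$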

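We then want to show that the minimum of $I$ is reached at some $t_0\in\mathcal{P}$. Let

$$I_1(t)=(2\pi)^{n}I(t)-\|\hat f\|^2.$$

Then from Lemma 4.1,

$$I_1(t)=-2\langle\hat f,g_t\rangle+\|g_t\|^2=-2\langle\hat f,g_t\rangle+\frac{\pi^{n/2}}{\sqrt{\det t}}$$

where $g_t(s)=e^{-\frac{1}{2}s^*ts}$. We show that

$$K_1=\{y\in\mathcal{P};\ I_1(y^{-1})\le 0\}$$

is compact. Writing

$$I_2(y)=\langle\hat f,g_{y^{-1}}\rangle\frac{1}{(2\pi)^{n/2}\sqrt{\det y}},$$

we see that $I_1(y^{-1})\le 0$, i.e. $y\in K_1$, if and only if $I_2(y)\ge 2^{-1-n/2}$. From (10), the definition of $g_t$ and Lemma 4.1, we have that

$$I_2(y)=\frac{1}{(2\pi)^{n/2}\sqrt{\det y}}\int_{\mathbb{R}^n}E\left(e^{-\frac{s^*Vs}{2}}\right)e^{-\frac{s^*y^{-1}s}{2}}ds=\frac{1}{(2\pi)^{n/2}\sqrt{\det y}}E\left(\int_{\mathbb{R}^n}e^{-\frac{s^*(V+y^{-1})s}{2}}ds\right)=\frac{1}{\sqrt{\det y}}E\left(\frac{1}{\sqrt{\det(V+y^{-1})}}\right)=\int_{\mathcal{P}}\frac{\mu(dv)}{\sqrt{\det(I_n+vy)}}.$$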

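For $0<C<1$ let us show that

$$K_2=\{y\in\mathcal{P};\ I_2(y)\ge C\}$$

is compact. Note that $K_1=K_2$ for $C=2^{-1-n/2}$. Since $I_2$ is continuous, $K_2$ is closed. Let us prove that $K_2$ is bounded. Denote $\|y\|=(\operatorname{trace}y^2)^{1/2}$. Suppose that $y^{(k)}\in K_2$ is such that $\|y^{(k)}\|\to\infty$ as $k\to\infty$, and let us show that for such a sequence $I_2(y^{(k)})\to 0$, which is a contradiction.

Indeed, if $\|y^{(k)}\|\to\infty$, then $\operatorname{trace}(vy^{(k)})\to\infty$ for every $v\in\mathcal{P}$. To see this, assume without loss of generality that $v=\operatorname{diag}(v_1,\dots,v_n)$. Then

$$\operatorname{trace}(vy^{(k)})=v_1y^{(k)}_{11}+\dots+v_ny^{(k)}_{nn}\ge\operatorname{trace}(y^{(k)})\times\min_iv_i\ge\|y^{(k)}\|\times\min_iv_i\to_{k\to\infty}\infty.$$

Moreover, if $\lambda_1,\dots,\lambda_n$ are the eigenvalues of $vy^{(k)}$,

$$\det(I_n+vy^{(k)})=(1+\lambda_1)\dots(1+\lambda_n)\ge 1+\lambda_1+\dots+\lambda_n=1+\operatorname{trace}(vy^{(k)})\to_{k\to\infty}\infty.$$

By dominated convergence, it follows that $I_2(y^{(k)})\to 0$, and this proves that $K_2$ is bounded. We have therefore shown that $K_2$ is compact. This proves that the minimum of $I_1$ and $I$ is reached at some point $t_0$ of $\mathcal{P}$.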

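The last task is to show that $t_0$ is a solution of equation (9). Since $I$ is differentiable and reaches its minimum on the open set $\mathcal{P}$, the differential of $I$ must cancel at $t_0$.

Denote by $\mathcal{S}$ the linear space of symmetric real matrices of order $n$ equipped with the scalar product $\langle a,b\rangle=\operatorname{trace}(ab)$. The differential of $I$ is the following linear form on $\mathcal{S}$:

$$h\mapsto I'(t)(h)=\frac{1}{(2\pi)^{n}}\int_{\mathbb{R}^n}\left[\hat f(s)-e^{-\frac{1}{2}s^*ts}\right]e^{-\frac{1}{2}s^*ts}\,s^*hs\,ds.$$

Since $s^*hs=\operatorname{trace}(h\,ss^*)$, the equality $I'(t)=0$ is equivalent to

$$\int_{\mathbb{R}^n}\hat f(s)e^{-\frac{1}{2}s^*ts}\,ss^*\,ds=\int_{\mathbb{R}^n}e^{-s^*ts}\,ss^*\,ds.$$

Using the second formula in Lemma 4.1 and the fact that $\hat f(s)=E\left(e^{-\frac{1}{2}s^*Vs}\right)$, we obtain

$$\int_{\mathcal{P}}\frac{(v+t)^{-1}}{\sqrt{\det(v+t)}}\mu(dv)=\frac{(2t)^{-1}}{\sqrt{\det(2t)}}=\frac{1}{2^{1+\frac{1}{2}n}}\frac{t^{-1}}{\sqrt{\det t}}.$$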
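As a consistency check, assume $n=1$, so that $v$ and $t$ are positive scalars and $\det t=t$. Both sides of the matrix equation then collapse to the scalar stationarity condition obtained in the proof of Proposition 3.1:

```latex
\[
\int_0^\infty \frac{(v+t)^{-1}}{\sqrt{v+t}}\,\mu(dv)
  = \int_0^\infty \frac{\mu(dv)}{(t+v)^{3/2}},
\qquad
\frac{1}{2^{1+\frac{1}{2}}}\,\frac{t^{-1}}{\sqrt{t}}
  = \frac{1}{(2t)^{3/2}} .
\]
```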

Comment. While it is highly probable that the value $t_0$ at which $I$ reaches its minimum is unique, it is difficult to show for $n>1$ that the complicated equation (9) has a unique solution: there is no reason to think that the function $I$ is convex. This is not the case for $n=1$.

## 5 References

Monahan, J. F. and Stefanski, L. A. (1992). Normal Scale Mixture Approximations to and Computation of the Logistic-Normal Integral, in Handbook of the Logistic Distribution, N. Balakrishnan, Ed., Marcel Dekker, New York.

Palmer, J. A., Kreutz-Delgado, K. and Makeig, S. (2011). Dependency models based on generalized Gaussian scale mixtures. DRAFT UCSD-SCCN v1.0, Sept 7.

Stefanski, L. A. (1991). A Normal Scale Mixture Representation of the Logistic Distribution, Statistics & Probability Letters 11, 69–70.

West, M. (1987). On scale mixtures of normal distributions, Biometrika 74(3), 646–648.