Let $Z$ be standard Gaussian in $\mathbb{R}^n$ and consider an independent random positive definite matrix $\Sigma$ of order $n$ with distribution $\mu$. We call the distribution of $\Sigma^{1/2}Z$ a Gaussian scaled mixture. Denote by $f_\mu$ the density in $\mathbb{R}^n$ of $\Sigma^{1/2}Z$. For $n\geq 2$, several $\mu$ can yield the same density $f_\mu$.
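In many practical circumstances, $\mu$ is not very well known, and $f_\mu$ is complicated. On the other hand, for $n=1$, histograms of the symmetric density
$$f_\mu(x)=\int_0^\infty e^{-\frac{x^2}{2v}}\,\frac{\mu(dv)}{\sqrt{2\pi v}}\tag{1}$$
look like the histogram of a normal distribution, since $u\mapsto f_\mu(\sqrt{u})$ is convex. The aim of the present note is to say something about the best normal approximation $N(0,\sigma^2)$ of $f_\mu$ in the sense of $L^2(\mathbb{R})$.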
In Section 2, we recall some known facts and examples about the pair $(f_\mu,\mu)$ when $n=1$. In Section 3, our main result for $n=1$ is Proposition 3.1. It shows that the variance $\sigma^2$ of the best normal approximation exists and is unique, and that $\sigma^2\le\mathbb{E}(V)$ when $V\sim\mu$. This proposition also gives the equation, see (6), that has to be solved to obtain $\sigma^2$ when $\mu$ is known. In Section 4, we consider the more difficult case $n>1$. In that case, the variance of the best normal approximation is a positive definite matrix, and Proposition 4.2 shows its existence. A basic tool we use in this note is the Plancherel identity.
2 Review of Gaussian scaled mixtures in the uni-dimensional case
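A probability density $f$ on $\mathbb{R}$ is called a discrete Gaussian scale mixture if there exist numbers $p_1,\dots,p_k>0$ and $v_1,\dots,v_k>0$ such that $p_1+\dots+p_k=1$ and
$$f(x)=\sum_{i=1}^k p_i\,\frac{e^{-x^2/(2v_i)}}{\sqrt{2\pi v_i}}.$$
It is easy to see that if $V$, with $\Pr(V=v_i)=p_i$, is independent of $Z\sim N(0,1)$, then the density of $\sqrt{V}Z$ is $f$. One way to see this is to observe that for all $t\in\mathbb{R}$ we have
$$\mathbb{E}\big(e^{it\sqrt{V}Z}\big)=\mathbb{E}\big(e^{-t^2V/2}\big)=\sum_{i=1}^k p_i e^{-t^2v_i/2}=\int_{\mathbb{R}}e^{itx}f(x)\,dx.$$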
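The characteristic-function argument is easy to check numerically. The following Python sketch is only an illustration. It parameterizes the mixture by the variances $v_i$ and uses hypothetical helper names (`mixture_density`, `char_fn_numeric`, `char_fn_closed_form`) with arbitrary illustrative weights. It compares a midpoint-rule value of $\int\cos(tx)f(x)\,dx$ with $\sum_i p_ie^{-t^2v_i/2}$; the sine part vanishes by symmetry.

```python
import math


def mixture_density(x, weights, variances):
    """Discrete Gaussian scale mixture: sum_i p_i * N(0, v_i) density at x."""
    return sum(
        p * math.exp(-x * x / (2 * v)) / math.sqrt(2 * math.pi * v)
        for p, v in zip(weights, variances)
    )


def char_fn_numeric(t, weights, variances, half_width=40.0, steps=20000):
    """Midpoint-rule approximation of the integral of cos(t x) f(x) over R.

    f is symmetric, so the imaginary (sine) part of the Fourier integral is 0.
    """
    h = 2 * half_width / steps
    total = 0.0
    for k in range(steps):
        x = -half_width + (k + 0.5) * h
        total += math.cos(t * x) * mixture_density(x, weights, variances)
    return total * h


def char_fn_closed_form(t, weights, variances):
    """E[exp(-t^2 V / 2)] where P(V = v_i) = p_i."""
    return sum(p * math.exp(-t * t * v / 2) for p, v in zip(weights, variances))


# Illustrative mixture (hypothetical values): variances 0.5 and 4.
weights, variances = [0.3, 0.7], [0.5, 4.0]
for t in (0.0, 0.5, 1.0, 2.0):
    print(t, char_fn_numeric(t, weights, variances),
          char_fn_closed_form(t, weights, variances))
```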
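More generally, we will say that the density $f$ on $\mathbb{R}$ is a Gaussian scale mixture
if there exists a probability distribution $\mu$ on $(0,\infty)$ such that (1) holds. As in the finite mixture case, if $V\sim\mu$ is independent of $Z\sim N(0,1)$, then the density of $\sqrt{V}Z$ is $f_\mu$. To see this, denote by $L_\mu(s)=\int_0^\infty e^{-sv}\mu(dv)$ the Laplace transform of $\mu$. Then for all $t\in\mathbb{R}$,
$$\int_{\mathbb{R}}e^{itx}f_\mu(x)\,dx=\int_0^\infty e^{-t^2v/2}\mu(dv)=L_\mu(t^2/2)=\mathbb{E}\big(e^{it\sqrt{V}Z}\big).\tag{3}$$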
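For instance, if $f(x)=\frac12e^{-|x|}$ is the double exponential density, then for $x\in\mathbb{R}$ we have
$$e^{-|x|}=\int_0^\infty e^{-\frac{x^2}{2v}}\,e^{-v/2}\,\frac{dv}{\sqrt{2\pi v}}.\tag{4}$$
This means that the mixing measure
$$\mu(dv)=\tfrac12e^{-v/2}\,\mathbf{1}_{(0,\infty)}(v)\,dv$$
is an exponential distribution with mean $2$.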
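Here is a quick numerical check of the double exponential representation. It is a sketch under our own conventions: the mixing is on the variance $v$, and $\mu$ is exponential with mean 2. The function name `laplace_mixture_density` is hypothetical. The substitution $v=u^2$ removes the $v^{-1/2}$ singularity at the origin before a midpoint rule is applied.

```python
import math


def laplace_mixture_density(x, upper=12.0, steps=20000):
    """Evaluate the integral over v > 0 of the N(0, v) density at x times (1/2) exp(-v/2).

    With v = u^2 the integrand becomes (2*pi)^(-1/2) * exp(-x^2/(2u^2) - u^2/2),
    which is smooth on (0, upper]; the neglected tail beyond u = 12 is below exp(-72).
    """
    h = upper / steps
    total = 0.0
    for k in range(steps):
        u = (k + 0.5) * h
        total += math.exp(-x * x / (2 * u * u) - u * u / 2)
    return total * h / math.sqrt(2 * math.pi)


for x in (0.0, 0.5, 1.0, 3.0):
    # The two columns should agree: the mixture equals (1/2) exp(-|x|).
    print(x, laplace_mixture_density(x), 0.5 * math.exp(-abs(x)))
```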
There are other examples of pairs $(f_\mu,\mu)$ in the literature. For instance, Palmer, Kreutz-Delgado and Makeig (2011) offer an interesting catalog, which also contains some examples for $n>1$. Note that if $f_\mu$ is known, then the distribution of $X=\sqrt{V}Z$ is known. Finding the distribution $\mu$ of $V$, or equivalently the distribution of $\log V$, is then a problem of deconvolution, since $\log X^2=\log V+\log Z^2$ with independent summands. If its solution exists, it is unique, as shown for instance by (3).
An example of such a deconvolution is given by West (1987), who extends (4) to where as follows: he observes that for and , there exists a probability density , called a positive stable law, such that, for
If we define where is such that is a probability and replace by we get, for ,
For , the Laplace transform is not elementary anymore.
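Another elegant example of deconvolution is given by Stefanski (1991) and Monahan and Stefanski (1992) with the logistic distribution
$$f(x)=\frac{e^{-x}}{(1+e^{-x})^2}=\sum_{k=1}^\infty(-1)^{k+1}k\,e^{-k|x|},\qquad x\neq0.$$
Using (4) to represent each $e^{-k|x|}$, one can deduce that, if $\mu$ exists here, it must be
$$\mu(dv)=\sum_{k=1}^\infty(-1)^{k+1}k^2e^{-k^2v/2}\,\mathbf{1}_{(0,\infty)}(v)\,dv.$$
This measure does exist, since it is the distribution of $4K^2$ when $K$ follows the Kolmogorov-Smirnov distribution.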
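Both series in the logistic example can be evaluated directly. This sketch (function names hypothetical) checks the alternating expansion of the logistic density for $x\neq0$. It also evaluates the distribution function $\Pr(4K^2\le v)=1-2\sum_{k\ge1}(-1)^{k+1}e^{-k^2v/2}$, whose derivative in $v$ is the series for the mixing density. At $v=4$ it should reproduce the classical Kolmogorov-Smirnov value $\Pr(K\le1)\approx0.7300$.

```python
import math


def logistic_density(x):
    """Standard logistic density exp(-x) / (1 + exp(-x))^2, symmetric in x."""
    y = math.exp(-abs(x))
    return y / (1 + y) ** 2


def alternating_series(x, terms=200):
    """Partial sum of sum_{k>=1} (-1)^(k+1) k exp(-k|x|); converges for x != 0."""
    return sum((-1) ** (k + 1) * k * math.exp(-k * abs(x)) for k in range(1, terms + 1))


def mixing_cdf(v, terms=200):
    """P(4 K^2 <= v) = 1 - 2 sum_{k>=1} (-1)^(k+1) exp(-k^2 v / 2), K Kolmogorov-Smirnov.

    Differentiating term by term in v gives sum_k (-1)^(k+1) k^2 exp(-k^2 v / 2).
    The series converges slowly for very small v, so keep v away from 0 here.
    """
    return 1 - 2 * sum((-1) ** (k + 1) * math.exp(-k * k * v / 2) for k in range(1, terms + 1))


for x in (0.5, 1.0, 2.0):
    print(x, logistic_density(x), alternating_series(x))
print(mixing_cdf(4.0))  # P(K <= 1) for the Kolmogorov-Smirnov law
```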
3 Normal approximation
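Such a mixing keeps some characteristics of the normal distribution. $f_\mu$ is a symmetric density, and $u\mapsto f_\mu(\sqrt{u})$ is convex on $(0,\infty)$, since
$$f_\mu(\sqrt{u})=\int_0^\infty e^{-\frac{u}{2v}}\,\frac{\mu(dv)}{\sqrt{2\pi v}}$$
is the Laplace transform of the positive measure defined as the image of $(2\pi v)^{-1/2}\mu(dv)$ by the map $v\mapsto 1/(2v)$.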
As said in the introduction, in some practical applications the distribution $\mu$ of $V$ is not very well known. It is then interesting to replace $f_\mu$ by the density of an ordinary normal distribution $N(0,\sigma^2)$. The $L^2$ distance is well adapted to this problem. We are going to prove the following result.
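Proposition 3.1. If $f_\mu$ is defined by (1) and $V\sim\mu$, then

1. $\int_{\mathbb{R}}f_\mu^2(x)\,dx<\infty$ if and only if $\mathbb{E}\big((V+V')^{-1/2}\big)<\infty$, when $V$ and $V'$ are independent with the same distribution $\mu$.
2. If $f_\mu\in L^2(\mathbb{R})$, there exists a unique $\sigma^2>0$ which minimizes $$\int_{\mathbb{R}}\Big(f_\mu(x)-\frac{e^{-x^2/(2\sigma^2)}}{\sigma\sqrt{2\pi}}\Big)^2dx.$$
3. The scalar $\sigma^2$ is the unique positive solution of the equation $$\mathbb{E}\big((V+\sigma^2)^{-3/2}\big)=(2\sigma^2)^{-3/2}.\tag{6}$$
4. In particular, if $\mu_a$ is the distribution of $aV$ for some $a>0$, then the scalar associated with $\mu_a$ is $a\sigma^2$.
5. The value of $\sigma^2$ satisfies $\sigma^2\le\mathbb{E}(V)$.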
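Proof. Recall that if $f\in L^1(\mathbb{R})$ and if $\hat f(t)=\int_{\mathbb{R}}e^{itx}f(x)\,dx$, then the Plancherel theorem says that
$$\int_{\mathbb{R}}f^2(x)\,dx=\frac{1}{2\pi}\int_{\mathbb{R}}|\hat f(t)|^2\,dt,\tag{8}$$
both sides being possibly infinite. Furthermore, if $f=f_\mu$, then $\hat f_\mu(t)=\mathbb{E}(e^{-t^2V/2})$. Hence $|\hat f_\mu(t)|^2=\mathbb{E}(e^{-t^2(V+V')/2})$, where $V'$ is an independent copy of $V$, and by Fubini
$$\int_{\mathbb{R}}f_\mu^2(x)\,dx=\frac{1}{2\pi}\,\mathbb{E}\Big(\int_{\mathbb{R}}e^{-t^2(V+V')/2}\,dt\Big)=\mathbb{E}\big((2\pi(V+V'))^{-1/2}\big).$$
Therefore $f_\mu\in L^2(\mathbb{R})$ if and only if $\mathbb{E}\big((V+V')^{-1/2}\big)<\infty$. This proves statement 1. of the proposition.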
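Statement 1 rests on the identity $\int f_\mu^2=\mathbb{E}\big((2\pi(V+V'))^{-1/2}\big)$. For a discrete $\mu$ the right-hand side is a finite double sum. This sketch (hypothetical names, illustrative weights and variances) compares that sum with a direct midpoint-rule integral of $f_\mu^2$.

```python
import math


def mixture_density(x, weights, variances):
    """Discrete Gaussian scale mixture density at x (variances v_i, weights p_i)."""
    return sum(
        p * math.exp(-x * x / (2 * v)) / math.sqrt(2 * math.pi * v)
        for p, v in zip(weights, variances)
    )


def l2_norm_sq_numeric(weights, variances, half_width=40.0, steps=20000):
    """Midpoint-rule value of the integral of f_mu(x)^2 over R."""
    h = 2 * half_width / steps
    return h * sum(
        mixture_density(-half_width + (k + 0.5) * h, weights, variances) ** 2
        for k in range(steps)
    )


def l2_norm_sq_plancherel(weights, variances):
    """E[(2*pi*(V + V'))^(-1/2)] for independent V, V' with P(V = v_i) = p_i."""
    return sum(
        p * q / math.sqrt(2 * math.pi * (v + w))
        for p, v in zip(weights, variances)
        for q, w in zip(weights, variances)
    )


weights, variances = [0.3, 0.7], [0.5, 4.0]  # illustrative values only
print(l2_norm_sq_numeric(weights, variances), l2_norm_sq_plancherel(weights, variances))
```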
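To prove 2., 3. and 4., we apply (8) to $f_\mu-g_s$, where $g_s$ is the density of $N(0,s)$, for which $\hat g_s(t)=e^{-st^2/2}$. As a consequence
$$\sqrt{2\pi}\Big(\int_{\mathbb{R}}(f_\mu-g_s)^2\,dx-\int_{\mathbb{R}}f_\mu^2\,dx\Big)=h(s):=-2\,\mathbb{E}\big((V+s)^{-1/2}\big)+(2s)^{-1/2}.$$
Since $h'(s)=\mathbb{E}\big((V+s)^{-3/2}\big)-(2s)^{-3/2}$, we have $h'(s)=0$ if and only if $\mathbb{E}\big((V+s)^{-3/2}\big)=(2s)^{-3/2}$. We can rewrite this equation in $s$ as $\varphi(s)=1$, where
$$\varphi(s)=\mathbb{E}\Big(\Big(\frac{2s}{V+s}\Big)^{3/2}\Big).$$
The function $\varphi$ is continuous and strictly increasing on $(0,\infty)$, with $\varphi(0+)=0$ and $\varphi(+\infty)=2^{3/2}>1$. It follows that $\varphi-1$ has only one zero $\sigma^2$ on $(0,\infty)$. It is easy to see from the sign of $h'(s)=(2s)^{-3/2}(\varphi(s)-1)$ that $h$ reaches its minimum at $\sigma^2$.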
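To show 5., we apply Jensen's inequality to the convex function $x\mapsto x^{-3/2}$ on $(0,\infty)$ and the random variable $V+\sigma^2$. From
$$(2\sigma^2)^{-3/2}=\mathbb{E}\big((V+\sigma^2)^{-3/2}\big)\ge\big(\mathbb{E}(V)+\sigma^2\big)^{-3/2}$$
it follows that $2\sigma^2\le\mathbb{E}(V)+\sigma^2$, and hence $\sigma^2\le\mathbb{E}(V)$.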
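The stationarity condition $\mathbb{E}((V+s)^{-3/2})=(2s)^{-3/2}$ is equivalent to $\varphi(s)=\mathbb{E}\big((2s/(V+s))^{3/2}\big)=1$ with $\varphi$ increasing, so it lends itself to bisection. The sketch below assumes a discrete $\mu$ given by weights and variances. The names `phi`, `best_normal_variance` and `scaled_distance` are hypothetical. `scaled_distance` is $\sqrt{2\pi}$ times the squared $L^2$ distance minus the constant $\int f_\mu^2$, so it has the same minimizer.

```python
import math


def phi(s, weights, variances):
    """phi(s) = E[(2s / (V + s))^(3/2)]; the stationarity equation is phi(s) = 1."""
    return sum(p * (2 * s / (v + s)) ** 1.5 for p, v in zip(weights, variances))


def best_normal_variance(weights, variances, rel_tol=1e-12):
    """Bisection for the unique root of phi(s) = 1 on (0, inf).

    phi is continuous and increasing, with phi(0+) = 0 and phi(inf) = 2^(3/2),
    so the invariant phi(lo) < 1 <= phi(hi) brackets the root
    (assuming all variances are much larger than the starting lo = 1e-12).
    """
    lo, hi = 1e-12, 1.0
    while phi(hi, weights, variances) < 1:
        hi *= 2
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if phi(mid, weights, variances) < 1:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def scaled_distance(s, weights, variances):
    """sqrt(2*pi) * (||f_mu - N(0, s)||^2 - ||f_mu||^2) = -2 E[(V+s)^(-1/2)] + (2s)^(-1/2)."""
    return (-2 * sum(p / math.sqrt(v + s) for p, v in zip(weights, variances))
            + 1 / math.sqrt(2 * s))


weights, variances = [0.5, 0.5], [1.0, 3.0]  # illustrative two-point mixing law
s_star = best_normal_variance(weights, variances)
mean_v = sum(p * v for p, v in zip(weights, variances))
print(s_star, mean_v)  # by statement 5, the first value should not exceed the second
```

Bisection is used rather than Newton's method because the monotonicity of $\varphi$ guarantees a bracket. The same run can be used to check numerically that the root does not exceed $\mathbb{E}(V)$ and that it scales linearly when every variance is multiplied by a constant.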
Example 1. Suppose that Let us compute and With the help of Mathematica, we see that the solution of
is . Finally
a satisfying result.
Example 2. Suppose that is uniform on Then
If is uniform on , then from Part 4 of Proposition 3.1, we have
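For a uniform mixing law, the expectation in the stationarity equation has a closed form. If $V$ is uniform on $[a,b]$, then $\mathbb{E}((V+s)^{-3/2})=\frac{2}{b-a}\big((a+s)^{-1/2}-(b+s)^{-1/2}\big)$. The interval used in Example 2 is not legible in this copy, so the sketch below keeps $0\le a<b$ as free parameters. The function names are hypothetical.

```python
def phi_uniform(s, a, b):
    """(2s)^(3/2) * E[(V + s)^(-3/2)] for V uniform on [a, b], 0 <= a < b."""
    expectation = 2 * ((a + s) ** -0.5 - (b + s) ** -0.5) / (b - a)
    return (2 * s) ** 1.5 * expectation


def best_variance_uniform(a, b, rel_tol=1e-12):
    """Bisection for the root of phi_uniform(s) = 1 (phi_uniform is increasing in s)."""
    lo, hi = 1e-12, max(b, 1.0)
    while phi_uniform(hi, a, b) < 1:
        hi *= 2
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if phi_uniform(mid, a, b) < 1:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


s_unit = best_variance_uniform(0.0, 1.0)
# The ratio should equal 4 up to rounding, by homogeneity of the equation.
print(s_unit, best_variance_uniform(0.0, 4.0) / s_unit)
```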
Example 3. Suppose that Then
4 Extension to the Euclidean space
Denote by $P_n$ the convex cone of real positive definite matrices of order $n$. A scaled Gaussian mixture on $\mathbb{R}^n$ is the density $f_\mu$ of a random variable $X$ on $\mathbb{R}^n$ of the form $X=\Sigma^{1/2}Z$, where
$\Sigma$ is a random matrix in $P_n$ with distribution $\mu$, independent of the standard Gaussian random variable $Z\sim N(0,I_n)$. In this section, we study the conditions that the distribution $\mu$ must satisfy for $f_\mu$ to be in $L^2(\mathbb{R}^n)$. We also find a Gaussian law $N(0,S)$ which is the closest to $f_\mu$ in the $L^2$ sense.
4.1 Non identifiability
An important remark is in order: for $n>1$, the measure $\mu$ which generates a given $f_\mu$ is not unique.
Example 4. Let and consider the Wishart distribution with shape parameter and expectation Then since
we can claim that
following a gamma distribution with shape parameterand mean Consider also the distribution of Then
This example shows that and generate the same scaled Gaussian mixture distribution.
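The mechanism behind non-identifiability can be seen in a simpler setting than Example 4, whose parameters we do not reproduce here. Assume, for illustration only, $n=2$ and $W=\sum_{i=1}^kY_iY_i^T$ with $Y_1,\dots,Y_k$ i.i.d. $N(0,I_2)$, a Wishart matrix with expectation $kI_2$. For every $t$, $t^TWt$ has the law of $|t|^2\gamma$ with $\gamma\sim\chi^2_k$. So $W$ and $\gamma I_2$ give the same characteristic function $\mathbb{E}(e^{-t^T\Sigma t/2})=(1+|t|^2)^{-k/2}$ of $\Sigma^{1/2}Z$. The Monte Carlo sketch below (hypothetical function names, fixed seed) estimates both sides.

```python
import math
import random


def laplace_wishart(t, k, n_samples, rng):
    """Monte Carlo estimate of E[exp(-t^T W t / 2)], W = sum_{i<k} Y_i Y_i^T, Y_i ~ N(0, I_2)."""
    total = 0.0
    for _ in range(n_samples):
        quad = 0.0
        for _i in range(k):
            y1, y2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            quad += (t[0] * y1 + t[1] * y2) ** 2  # (t . Y_i)^2
        total += math.exp(-quad / 2)
    return total / n_samples


def laplace_scalar(t, k, n_samples, rng):
    """Monte Carlo estimate of E[exp(-gamma |t|^2 / 2)], gamma ~ chi^2_k = Gamma(k/2, scale 2)."""
    norm2 = t[0] ** 2 + t[1] ** 2
    return sum(
        math.exp(-rng.gammavariate(k / 2, 2.0) * norm2 / 2) for _ in range(n_samples)
    ) / n_samples


k, t = 3, (0.6, -0.8)
exact = (1 + t[0] ** 2 + t[1] ** 2) ** (-k / 2)
rng = random.Random(0)
print(laplace_wishart(t, k, 50000, rng), laplace_scalar(t, k, 50000, rng), exact)
```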
4.2 Some integrals for the standard Gaussian distribution
We recall here two simple formulas. We use the convention that if $x\in\mathbb{R}^n$, it is written as a column matrix,
and $x^T$ is its transposed matrix, which is a row vector.
Lemma 4.1. Let Then
Proof. Without loss of generality, we may assume that is diagonal, and the proof is obvious in this particular case.
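Gaussian integrals of the kind recalled in Lemma 4.1 are easy to check numerically. As an illustrative assumption, we test two standard identities for $Z\sim N(0,I_2)$ and a fixed non-diagonal $A$. They are $\mathbb{E}(e^{-Z^TAZ/2})=\det(I_n+A)^{-1/2}$ and $\mathbb{E}(ZZ^Te^{-Z^TAZ/2})=\det(I_n+A)^{-1/2}(I_n+A)^{-1}$. The precise statement of Lemma 4.1 may be normalized differently, and the helper names are hypothetical.

```python
import math

# An arbitrary positive definite 2x2 matrix, chosen only for illustration.
A = ((1.0, 0.4), (0.4, 0.5))


def gaussian_moments(a, half_width=8.0, steps=320):
    """Midpoint-rule values of E[exp(-Z^T A Z / 2)] and E[Z Z^T exp(-Z^T A Z / 2)], Z ~ N(0, I_2)."""
    h = 2 * half_width / steps
    m0 = 0.0
    m2 = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(steps):
        z1 = -half_width + (i + 0.5) * h
        for j in range(steps):
            z2 = -half_width + (j + 0.5) * h
            quad = a[0][0] * z1 * z1 + 2 * a[0][1] * z1 * z2 + a[1][1] * z2 * z2
            # standard normal density times exp(-quad / 2) times the cell area h^2
            w = math.exp(-(z1 * z1 + z2 * z2 + quad) / 2) / (2 * math.pi) * h * h
            m0 += w
            m2[0][0] += z1 * z1 * w
            m2[0][1] += z1 * z2 * w
            m2[1][1] += z2 * z2 * w
    m2[1][0] = m2[0][1]
    return m0, m2


def closed_form(a):
    """det(I + A)^(-1/2) and det(I + A)^(-1/2) (I + A)^(-1) for a symmetric 2x2 A."""
    b00, b01, b11 = 1 + a[0][0], a[0][1], 1 + a[1][1]
    det = b00 * b11 - b01 * b01
    c = det ** -0.5
    inv = [[b11 / det, -b01 / det], [-b01 / det, b00 / det]]
    return c, [[c * inv[r][q] for q in range(2)] for r in range(2)]


print(gaussian_moments(A))
print(closed_form(A))
```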
4.3 Existence of the best normal approximation
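Proposition 4.2. Let $\mu$ be a probability distribution on the convex cone $P_n$ of positive definite matrices of order $n$. Let $f_\mu$ denote the density of the random variable $X=\Sigma^{1/2}Z$, where $\Sigma\sim\mu$ is independent of $Z\sim N(0,I_n)$. Then

1. $f_\mu\in L^2(\mathbb{R}^n)$ if and only if $\mathbb{E}\big(\det(\Sigma+\Sigma')^{-1/2}\big)<\infty$, where $\Sigma$ and $\Sigma'$ are independent with the same distribution $\mu$.
2. For $f_\mu\in L^2(\mathbb{R}^n)$, consider the function $g$ defined on $P_n$ by $$g(S)=\int_{\mathbb{R}^n}\big(f_\mu(x)-N_S(x)\big)^2\,dx,$$ where $N_S$ is the density of $N(0,S)$. Then $g$ reaches its minimum at some $S_0\in P_n$, and $S_0$ is a solution in $P_n$ of the following equation in $S$: $$\mathbb{E}\Big(\det(\Sigma+S)^{-1/2}(\Sigma+S)^{-1}\Big)=\det(2S)^{-1/2}(2S)^{-1}.\tag{9}$$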
Proof. We have
Now using Plancherel Theorem and Lemma 4.1, we prove part 1. as follows:
To prove part 2, we use Plancherel theorem again and obtain
We then want to show that the minimum of is reached at some Let
Then from Lemma 4.1,
where We show that
is compact. Writing
we see that , i.e. if and only if From (10), the definition of and Lemma 4.1, we have that
For let us show that
is compact. Note that for Since is continuous, is closed. Let us prove that is bounded. Denote Suppose that is such that and let us show that for such a which is a contradiction.
Indeed, if . To see this, assume that . Then
are the eigenvalues of,
By dominated convergence, it follows that and this proves that is bounded. We have therefore shown that is compact. This proves that the minimum of and is reached at some point of
The last task is to show that the minimizer is a solution of equation (9). The function to be minimized is differentiable and reaches its minimum on the open cone of positive definite matrices, so its differential must vanish at the minimizer.
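Denote by $\mathcal{S}_n$ the linear space of symmetric real matrices of order $n$, equipped with the scalar product $\langle A,B\rangle=\operatorname{tr}(AB)$. The differential of the function to be minimized is the following linear form on $\mathcal{S}_n$: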
The equality is equivalent to
Using the second formula in Lemma 4.1 and the fact that , we obtain
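A finite-difference check is a cheap safeguard for the differential computed above. Under the normalization used in this sketch, $(2\pi)^{n/2}\big(g(S)-\int f_\mu^2\big)=-2\,\mathbb{E}(\det(\Sigma+S)^{-1/2})+\det(2S)^{-1/2}$. Its gradient for the trace scalar product is $\mathbb{E}\big(\det(\Sigma+S)^{-1/2}(\Sigma+S)^{-1}\big)-\det(2S)^{-1/2}(2S)^{-1}$, which for $n=1$ reduces to $\mathbb{E}((V+s)^{-3/2})-(2s)^{-3/2}$. The code assumes $n=2$ and a discrete $\mu$, and all helper names are hypothetical.

```python
def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def inv2(m):
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]


def add2(m, n, c=1.0):
    """Return m + c * n for 2x2 matrices."""
    return [[m[r][q] + c * n[r][q] for q in range(2)] for r in range(2)]


def objective(S, weights, sigmas):
    """-2 E[det(Sigma + S)^(-1/2)] + det(2S)^(-1/2) for a discrete mixing law on 2x2 matrices."""
    two_s = add2(S, S)
    expected = sum(p * det2(add2(sig, S)) ** -0.5 for p, sig in zip(weights, sigmas))
    return -2 * expected + det2(two_s) ** -0.5


def gradient(S, weights, sigmas):
    """E[det(Sigma+S)^(-1/2) (Sigma+S)^(-1)] - det(2S)^(-1/2) (2S)^(-1)."""
    grad = [[0.0, 0.0], [0.0, 0.0]]
    for p, sig in zip(weights, sigmas):
        m = add2(sig, S)
        grad = add2(grad, inv2(m), p * det2(m) ** -0.5)
    two_s = add2(S, S)
    return add2(grad, inv2(two_s), -det2(two_s) ** -0.5)


# Illustrative two-point mixing law and evaluation point (hypothetical values).
weights = [0.4, 0.6]
sigmas = [[[1.0, 0.2], [0.2, 0.5]], [[2.0, -0.3], [-0.3, 1.5]]]
S = [[1.1, 0.1], [0.1, 0.9]]
D = [[0.3, 0.2], [0.2, -0.1]]  # a symmetric direction
eps = 1e-5
finite_diff = (objective(add2(S, D, eps), weights, sigmas)
               - objective(add2(S, D, -eps), weights, sigmas)) / (2 * eps)
analytic = sum(gradient(S, weights, sigmas)[r][q] * D[r][q] for r in range(2) for q in range(2))
print(finite_diff, analytic)
```

When $\mu$ is a point mass at $C$, the gradient vanishes at $S=C$, as expected, since both terms then equal $\det(2C)^{-1/2}(2C)^{-1}$.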
Comment. It is highly probable that the point at which the function to be minimized reaches its minimum is unique. However, it is difficult to show for $n>1$ that the complicated equation (9) has a unique solution, since there is no reason to think that this function is convex. The difficulty does not arise for $n=1$, where uniqueness follows from Proposition 3.1.
Monahan, J. F. and Stefanski, L. A. (1992). Normal Scale Mixture Approximations to and Computation of the Logistic-Normal Integral, in Handbook of the Logistic Distribution, N. Balakrishnan, Ed., Marcel Dekker, New York.
Palmer, J. A., Kreutz-Delgado, K. and Makeig, S. (2011). Dependency Models Based on Generalized Gaussian Scale Mixtures, DRAFT UCSD-SCCN v1.0, Sept 7.
Stefanski, L. A. (1991). A Normal Scale Mixture Representation of the Logistic Distribution, Statistics & Probability Letters 11, 69–70.
West, M. (1987). On Scale Mixtures of Normal Distributions, Biometrika 74, 3, 646–648.