# A formulation for continuous mixtures of multivariate normal distributions

Several formulations have long existed in the literature in the form of continuous mixtures of normal variables where a mixing variable operates on the mean or on the variance or on both the mean and the variance of a multivariate normal variable, by changing the nature of these basic constituents from constants to random quantities. More recently, other mixture-type constructions have been introduced, where the core random component, on which the mixing operation operates, is not necessarily normal. The main aim of the present work is to show that many existing constructions can be encompassed by a formulation where normal variables are mixed using two univariate random variables. For this formulation, we derive various general properties. Within the proposed framework, it is also simpler to formulate new proposals of parametric families and we provide a few such instances. At the same time, the exposition provides a review of the theme of normal mixtures.


## 1 Continuous mixtures of normal distributions

In the last few decades, a number of formulations have been put forward, in the context of distribution theory, where a multivariate normal variable represents the basic constituent but with the superposition of another random component, either in the sense that the normal mean value or the variance matrix or both these components are subject to the effect of another random variable of continuous type. We shall refer to these constructions as ‘mixtures of normal variables’; the matching phrase ‘mixtures of normal distributions’ will also be used.

To better focus ideas, recall a few classical instances of the delineated scheme. Presumably, the best-known such formulation is represented by scale mixtures of normal variables, which can be expressed as

 Y = ξ + V^{1/2} X    (1)

where X ∼ N_d(0, Σ), V is an independent random variable on ℝ⁺, and ξ ∈ ℝ^d is a vector of constants. Scale mixtures provide a stochastic representation of a wide subset of the class of elliptically contoured distributions, often called briefly elliptical distributions. For a standard account of elliptical distributions, see for instance Fang et al. (1990); specifically, their Section 2.6 examines the connection with scale mixtures of normal variables. A very important instance occurs when V⁻¹ ∼ χ²_ν/ν, which leads to the multivariate Student's t distribution.
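As a numerical aside, representation (1) is straightforward to simulate. The sketch below (with illustrative choices of dimension, degrees of freedom, and sample size) draws V as ν/χ²_ν, so that Y is multivariate Student's t with ν degrees of freedom, and checks the implied variance ν/(ν−2)·Σ by Monte Carlo:

```python
import numpy as np

# Simulate the scale mixture Y = xi + V^{1/2} X of equation (1).
# With V = nu / chi^2_nu (an inverse-gamma mixing variable), Y is
# multivariate Student's t with nu degrees of freedom.
rng = np.random.default_rng(0)
d, nu, n = 3, 5.0, 200_000
xi = np.zeros(d)
Sigma = np.eye(d)

X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)  # X ~ N_d(0, Sigma)
V = nu / rng.chisquare(nu, size=n)                       # mixing variable on R+
Y = xi + np.sqrt(V)[:, None] * X

# For the t distribution, var(Y) = nu/(nu-2) * Sigma when nu > 2.
emp_var = np.cov(Y, rowvar=False)
print(np.round(emp_var[0, 0], 2))  # close to nu/(nu-2) = 5/3
```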

Another very important construction is the normal variance-mean mixture proposed by Barndorff-Nielsen (1977, 1978) and extensively developed by subsequent literature, namely

 Y = ξ + V γ + V^{1/2} X    (2)

where γ ∈ ℝ^d is a vector of constants and V is assumed to have a generalized inverse Gaussian (GIG) distribution. In this case Y turns out to have a generalized hyperbolic (GH) distribution, which will recur later in the paper.
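The mixture (2) can be simulated analogously once draws from the GIG distribution are available. In the sketch below, the mapping of (λ, χ, ψ) onto scipy's `geninvgauss(p, b, scale)` parameterization is our own term-by-term matching of density (5) and should be treated as an assumption; the parameter values are illustrative:

```python
import numpy as np
from scipy.stats import geninvgauss

# Sketch of the variance-mean mixture (2): Y = xi + V*gamma + V^{1/2} X with
# V ~ GIG(lam, chi, psi), so that Y is generalized hyperbolic.
# Assumed mapping to scipy: p = lam, b = sqrt(chi*psi), scale = sqrt(chi/psi).
rng = np.random.default_rng(1)
d, n = 2, 100_000
lam, chi, psi = 1.0, 1.0, 2.0
xi = np.zeros(d)
gam = np.array([0.5, -0.3])
Sigma = np.eye(d)

V = geninvgauss.rvs(p=lam, b=np.sqrt(chi * psi), scale=np.sqrt(chi / psi),
                    size=n, random_state=rng)
X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
Y = xi + V[:, None] * gam + np.sqrt(V)[:, None] * X

# Check E{Y} = xi + E{V} gamma, a special case of (10) with r(U,V) = V.
print(np.round(Y.mean(axis=0) - (xi + V.mean() * gam), 3))
```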

Besides (1) and (2), there exists a multitude of other constructions which belong to the idea of normal mixtures delineated in the opening paragraph. Many of these formulations will be recalled in the subsequent pages, to illustrate the main target of the present contribution, which is to present a general formulation for normal mixtures. Our proposal involves an additional random component, denoted U, and the effect of U and V is regulated by two functions, non-linear in general. As we shall see, this construction encompasses a large number of existing constructions in a unifying scheme, for which we develop various general properties.

The role of this activity is to highlight the relative connections of the individual constructions, with an improved understanding of their nature. As a side-effect, the presentation of the individual formulations also plays the role of a review of this stream of literature. Finally, the proposed formulation can facilitate the conception of additional proposals with specific aims. The emphasis is primarily on the multivariate context.

Since it moves a step towards generality, we mention beforehand the formulation of Tjetjep & Seneta (2006), where V and V^{1/2} in (2) are replaced by two linear functions of them, which allows one to incorporate a number of existing families. Their construction is, however, entirely within the univariate domain. A number of multivariate constructions aiming at some level of generality do exist, and will be examined in the course of the discussion.

In the next section, our proposed general scheme is introduced, followed by the derivation of a number of general properties. The subsequent sections show how to frame a large number of existing constructions within the proposed scheme. In the final section, we indicate some directions for even more general constructions.

## 2 Generalized mixtures of normal distributions

### 2.1 Notation and other formal preliminaries

As already effectively employed, the notation X ∼ N_d(μ, Σ) indicates that X is a d-dimensional normal random variable with mean vector μ and variance matrix Σ. The density function and the distribution function of X at x ∈ ℝ^d are denoted by φ_d(x; μ, Σ) and Φ_d(x; μ, Σ). Hence, specifically, we have

 φ_d(x; μ, Σ) = det(2πΣ)^{−1/2} exp{−(1/2)(x−μ)^⊤Σ^{−1}(x−μ)}

if Σ > 0. When d = 1, we drop the subscript d. When d = 1 and, in addition, μ = 0 and Σ = 1, we use the simplified notation φ(x) and Φ(x) for the density function and the distribution function.

A quantity arising in connection with the multivariate normal distribution, but not only there, is the Mahalanobis distance, defined (in the non-singular case) as

 ∥x∥_Σ = (x^⊤Σ^{−1}x)^{1/2}    (3)

which is written in the simplified form ∥x∥ when Σ is the identity matrix.

A function which will appear in various expressions is the inverse Mills ratio

 ζ(t) = φ(t)/Φ(t),  t ∈ ℝ.    (4)

A positive continuous random variable V has a GIG distribution if its density function can be written as

 g(v; λ, χ, ψ) = (ψ/χ)^{λ/2} / (2 K_λ(√(χψ))) · v^{λ−1} exp(−(1/2)(χv^{−1} + ψv)),  v > 0,    (5)

where λ ∈ ℝ, the parameters χ and ψ are non-negative (with admissible ranges depending on the sign of λ), and K_λ denotes the modified Bessel function of the third kind. In this case, we write V ∼ GIG(λ, χ, ψ). The numerous properties of the GIG distribution and interconnections with other parametric families are reviewed by Jørgensen (1982). We recall two basic properties: both the distribution of V^{−1} and that of cV, for c > 0, are still of GIG type. A fact to be used later is that the Gamma distribution is obtained when χ = 0 and λ > 0.

A result in matrix theory which will be used repeatedly is the Sherman-Morrison formula for matrix inversion, which states

 (A + bd^⊤)^{−1} = A^{−1} − (1 + d^⊤A^{−1}b)^{−1} A^{−1}bd^⊤A^{−1}    (6)

provided that the square matrix A and the vectors b and d have conformable dimensions, and the indicated inverses exist.
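A quick numerical check of identity (6) on a random, well-conditioned instance:

```python
import numpy as np

# Verify the Sherman-Morrison identity (6): the inverse of a rank-one
# update A + b d^T equals the right-hand side built from A^{-1} alone.
rng = np.random.default_rng(2)
A = rng.normal(size=(4, 4)) + 4 * np.eye(4)   # well-conditioned square matrix
b = rng.normal(size=(4, 1))
d = rng.normal(size=(4, 1))

Ainv = np.linalg.inv(A)
lhs = np.linalg.inv(A + b @ d.T)
rhs = Ainv - (Ainv @ b @ d.T @ Ainv) / (1.0 + float(d.T @ Ainv @ b))

print(np.max(np.abs(lhs - rhs)))  # of the order of machine precision
```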

### 2.2 Definition and basic facts

Consider a d-dimensional random variable X ∼ N_d(0, Σ) and univariate random variables U and V with joint distribution function G, such that X, U, V are mutually independent; hence G can be factorized as G = G_U × G_V. We assume Σ > 0 to avoid technical complications and concentrate on the constructive process. These definitions and assumptions will be retained for the rest of the paper.

Given any real-valued function r(·,·), a positive-valued function s(·,·), and vectors ξ and γ in ℝ^d, we shall refer to

 Y = ξ + r(U,V) γ + s(U,V) X    (7)
   = ξ + R γ + S X    (8)

as a generalized mixture of normal (GMN) variables; we have written R = r(U,V) and S = s(U,V), with independence of (R,S) from X. Denote by H the joint distribution function of (R,S) implied by G. The distribution of Y is identified by the notation Y ∼ GMN_d(ξ, Σ, γ, H).

For certain purposes, it is useful to think of Y as generated by the hierarchical construction

 (Y | U = u, V = v) ∼ N_d(ξ + r(u,v)γ, s(u,v)²Σ),  (U, V) ∼ G_U × G_V.    (9)

For instance, this representation is convenient for computing the mean vector and the variance matrix as

 E{Y} = E{E{Y|U,V}} = E{ξ + r(U,V)γ} = ξ + E{R}γ    (10)

provided E{R} exists, and

 var{Y} = var{E{Y|U,V}} + E{var{Y|U,V}} = var{ξ + r(U,V)γ} + E{s(U,V)²Σ} = var{R} γγ^⊤ + E{S²} Σ    (11)

provided var{R} and E{S²} exist. Another use of representation (9) is to facilitate the development of EM-type algorithms for parameter estimation.
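The hierarchical representation (9) and the moment formulas (10)-(11) can be verified by simulation. The functions r(u, v) = uv and s(u, v) = v^{1/2} and the mixing distributions below are hypothetical choices, made only for concreteness:

```python
import numpy as np

# Draw from the hierarchical construction (9) with illustrative choices
# r(u, v) = u*v, s(u, v) = sqrt(v), then check (10)-(11) by Monte Carlo.
rng = np.random.default_rng(3)
d, n = 2, 400_000
xi = np.array([1.0, -1.0])
gam = np.array([0.8, 0.2])
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])

U = rng.uniform(0, 1, size=n)          # G_U: uniform, an arbitrary choice
V = rng.gamma(2.0, 1.0, size=n)        # G_V: gamma, an arbitrary choice
R = U * V                              # R = r(U, V)
S = np.sqrt(V)                         # S = s(U, V)
X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
Y = xi + R[:, None] * gam + S[:, None] * X

# (10): E{Y} = xi + E{R} gamma;  (11): var{Y} = var{R} gg^T + E{S^2} Sigma
mean_formula = xi + R.mean() * gam
var_formula = np.var(R) * np.outer(gam, gam) + (S**2).mean() * Sigma
print(np.round(Y.mean(axis=0) - mean_formula, 3))
print(np.round(np.cov(Y, rowvar=False) - var_formula, 2))
```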

Similarly, by a conditioning argument, it is simple to see that the characteristic function of Y is

 c(t) = exp(i t^⊤ξ) E{c_N(t; r(U,V)γ, s(U,V)²Σ)},  t ∈ ℝ^d,

where c_N(t; μ, Σ) denotes the characteristic function of a N_d(μ, Σ) variable. Also, the distribution function of Y is

 F(y) = E{Φ_d(y; ξ + r(U,V)γ, s(U,V)²Σ)}.    (12)

Consider now the density function of Y, f say. From (9) it follows that

 f(y) = E_G{φ_d(y; ξ + r(U,V)γ, s(U,V)²Σ)} = E_H{φ_d(y; ξ + Rγ, S²Σ)},

where the first expected value is taken with respect to the distribution G, the second one with respect to H. Assume further that the distribution of (R, S) is absolutely continuous with density function h, and note that the transformation from (R, S, X) to (R, S, Y) is invertible, so that a standard computation for densities of transformed variables yields, in an obvious notation,

 f_{R,S,Y}(r, s, y) = s^{−d} f_{R,S,X}(r, s, s^{−1}(y − ξ − rγ)) = h(r,s) s^{−d} φ_d(s^{−1}(y − ξ − rγ); 0, Σ) = h(r,s) φ_d(y; ξ + rγ, s²Σ),

taking into account the independence of (R, S) and X. Hence we arrive at

 f(y) = ∫_{ℝ×ℝ⁺} φ_d(y; ξ + rγ, s²Σ) dH(r,s).    (13)

An alternative route to obtain this expression would be via differentiation of the distribution function (12), with exchange of the integration and differentiation signs.
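Expression (13) also suggests a direct Monte Carlo evaluation of the density: average φ_d over draws of the mixing variables. The sketch below does this in the univariate Student's t case (R = 0, S = V^{1/2}, V ∼ ν/χ²_ν), where the exact density is available as a benchmark:

```python
import numpy as np
from scipy.stats import norm, t

# Evaluate the mixture density (13) by Monte Carlo: f(y) = E{ phi(y; 0, V) },
# estimated by averaging normal densities over draws of the mixing variable V.
rng = np.random.default_rng(4)
nu, n = 4.0, 500_000
V = nu / rng.chisquare(nu, size=n)     # inverse-gamma mixing -> Student's t

y = np.linspace(-3, 3, 7)
f_mc = np.array([norm.pdf(yi, scale=np.sqrt(V)).mean() for yi in y])
f_exact = t.pdf(y, df=nu)              # exact t density as a benchmark
print(np.round(np.abs(f_mc - f_exact).max(), 3))
```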

For statistical work, it is often useful to consider constructions of type (7) where the distributions of U and V belong to some parametric family. In these cases, care must be taken to avoid overparameterization. Given the enormous variety of specific instances embraced by (7), it seems difficult to establish generally suitable conditions, and we shall then discuss this issue within specific families or classes of distributions.

In the above passage, as well as in the rest of the paper, the term ‘family’ refers to the set of distributions obtained by a given specification of the distributions of U and V when their parameters vary in some admissible space, while keeping the other ingredients fixed. Broader sets, generated for instance when the distributions of U and V vary across various parametric families, constitute ‘classes’.

A clarification is due about the use of the notation GMN_d(ξ, Σ, γ, H) in (7)-(8) and some derived expressions to be presented later on. When we examine a certain family belonging to the general construction, that notation will translate into a certain parameterization, which often is not the most appropriate for inferential or for interpretative purposes, and its use here must not be intended as a recommendation for general usage. This scheme is adopted merely for uniformity and simplicity of treatment in the present investigation.

### 2.3 Affine transformations and other distributional properties

For the random variable Y introduced by (7)-(8), consider an affine transformation W = b + B^⊤Y, for a q-dimensional vector b and a full-rank matrix B of dimension d × q, with q ≤ d; denote these assumptions as ‘the b-B conditions’. It is immediate that

 W = b + B^⊤Y = b + B^⊤ξ + r(U,V) B^⊤γ + s(U,V) B^⊤X

is still of type (7)-(8) with the same mixing variables and modified numerical parameters. We have then reached the following conclusion.

###### Proposition 1

If b and B satisfy the b-B conditions introduced above, it follows that

 b + B^⊤Y ∼ GMN_q(b + B^⊤ξ, B^⊤ΣB, B^⊤γ, H)    (14)

is still a member of the GMN class, with the same mixing distribution H of (R, S).

Partition now Y into two sub-vectors of sizes d₁ and d₂, such that d₁ + d₂ = d, with corresponding partitions of the parameters in blocks of matching sizes, as follows:

 Y = (Y₁, Y₂),  ξ = (ξ₁, ξ₂),  γ = (γ₁, γ₂),  Σ = (Σ₁₁ Σ₁₂; Σ₂₁ Σ₂₂).    (15)

To establish the marginal distribution of Y₁, we use Proposition 1 with b = 0 and B equal to a matrix formed by I_{d₁} in the top d₁ rows and a block of 0s in the bottom d₂ rows. For Y₂, we proceed similarly, but setting the bottom d₂ rows of B equal to I_{d₂}. We then arrive at the following conclusion.

###### Proposition 2

If Y ∼ GMN_d(ξ, Σ, γ, H) is partitioned as indicated in (15), then

 Y₁ ∼ GMN_{d₁}(ξ₁, Σ₁₁, γ₁, H),  Y₂ ∼ GMN_{d₂}(ξ₂, Σ₂₂, γ₂, H).    (16)

We now want to examine conditions which ensure independence of Y₁ and Y₂. From (9) it is clear that, if Σ₁₂ = 0, then Y₁ and Y₂ are conditionally independent given (U, V), with conditional distributions

 (Y_j | U = u, V = v) ∼ N_{d_j}(ξ_j + r(u,v)γ_j, s(u,v)²Σ_{jj}),  j = 1, 2.    (17)

Moreover, if S = 1 (constant) and one of the marginal distributions is symmetric, i.e., γ₁ = 0 or γ₂ = 0, then Y₁ and Y₂ are independent. The notation S = 1 and similar ones later on must be intended ‘with probability 1’; we shall not replicate this specification subsequently.

A more detailed argument is as follows, where we take ξ = 0 for mere simplicity of notation, without affecting the generality of the argument. From (9), the conditional joint characteristic function of (Y₁, Y₂), given U and V (or, equivalently, given R and S), is

 exp{i R(t₁^⊤γ₁ + t₂^⊤γ₂) − (1/2) S²(t₁^⊤Σ₁₁t₁ + 2t₁^⊤Σ₁₂t₂ + t₂^⊤Σ₂₂t₂)},

so that the joint characteristic function of (Y₁, Y₂) is

 c(t₁, t₂) = E{e^{i t₁^⊤Y₁ + i t₂^⊤Y₂}}    (18)
  = E{E{e^{i t₁^⊤Y₁ + i t₂^⊤Y₂} | U, V}}
  = E{e^{i r(U,V)(t₁^⊤γ₁ + t₂^⊤γ₂) − (1/2) s(U,V)²(t₁^⊤Σ₁₁t₁ + 2t₁^⊤Σ₁₂t₂ + t₂^⊤Σ₂₂t₂)}}
  = E{e^{i r(U,V)t₁^⊤γ₁ − (1/2)s(U,V)²t₁^⊤Σ₁₁t₁} · e^{i r(U,V)t₂^⊤γ₂ − (1/2)s(U,V)²t₂^⊤Σ₂₂t₂} · e^{−s(U,V)²t₁^⊤Σ₁₂t₂}}.

In an analogous way, by (17) the marginal characteristic functions are

 c_j(t_j) = E{e^{i t_j^⊤Y_j}} = E{E{e^{i t_j^⊤Y_j} | U, V}} = E{e^{i r(U,V)t_j^⊤γ_j − (1/2)s(U,V)²t_j^⊤Σ_{jj}t_j}},  j = 1, 2.    (19)

Note that, if γ_j = 0 and S = 1, then c_j(t_j) given by (19) reduces to the centred normal characteristic function exp(−(1/2)t_j^⊤Σ_{jj}t_j) for Y_j. We have then reached the following conclusion.

###### Proposition 3

Given partition (15), the components Y₁ and Y₂ are independent provided Σ₁₂ = 0, S = 1, and at least one of γ₁ and γ₂ is 0, with the following implications:

• if γ₁ = 0, the joint characteristic function (18) reduces to exp(−(1/2)t₁^⊤Σ₁₁t₁) c₂(t₂),

• if γ₂ = 0, the joint characteristic function (18) reduces to c₁(t₁) exp(−(1/2)t₂^⊤Σ₂₂t₂).

If both γ₁ and γ₂ are 0, the distribution reduces to the case of independent normal variables.

In essence, under the conditions of Proposition 3, one of Y₁ and Y₂ has a plain normal distribution and the other one falls under the construction discussed later in Section 3.

Outside the conditions of Proposition 3, the structure of (18) does not appear to be suitable for factorization as the product of two legitimate characteristic functions, and we conjecture that, in general, independence between Y₁ and Y₂ cannot be achieved.

Examine now the conditional distributions associated to partition (15). Factorize the joint density of Y as f(y) = f_{1|2}(y₁|y₂) f₂(y₂), where f_{1|2} is the conditional density of Y₁ given Y₂ = y₂ and f₂ is the marginal density of Y₂. For simplicity of treatment, suppose that (R, S) is absolutely continuous, with density h. Then, by (13) and the properties of the multivariate normal density, write

 f_{1|2}(y₁|y₂) f₂(y₂) = ∫_{ℝ×ℝ⁺} φ_{d₁}(y₁; ξ_{1|2} + γ_{1|2}r, s²Σ_{11|2}) φ_{d₂}(y₂; ξ₂ + γ₂r, s²Σ₂₂) h(r,s) dr ds

where

 ξ_{1|2} = ξ₁ + Σ₁₂Σ₂₂^{−1}(y₂ − ξ₂),  γ_{1|2} = γ₁ − Σ₁₂Σ₂₂^{−1}γ₂,  Σ_{11|2} = Σ₁₁ − Σ₁₂Σ₂₂^{−1}Σ₂₁,

having assumed that the conditioning operation and integration can be exchanged. Hence, for the conditional density of Y₁ given Y₂ = y₂ we have

 f_{1|2}(y₁|y₂) = (1/f₂(y₂)) ∫_{ℝ×ℝ⁺} φ_{d₁}(y₁; ξ_{1|2} + γ_{1|2}r, s²Σ_{11|2}) φ_{d₂}(y₂; ξ₂ + γ₂r, s²Σ₂₂) h(r,s) dr ds.

Now, from Bayes' rule, we obtain that the conditional density of (R, S) given Y₂ = y₂ is

 h_c(r, s | y₂) = φ_{d₂}(y₂; ξ₂ + γ₂r, s²Σ₂₂) h(r, s) / f₂(y₂).    (20)

Using this fact in the last integral, we can re-write

 f_{1|2}(y₁|y₂) = ∫_{ℝ×ℝ⁺} φ_{d₁}(y₁; ξ_{1|2} + γ_{1|2}r, s²Σ_{11|2}) h_c(r, s | y₂) dr ds,    (21)

which exhibits the same structure of (13). Therefore we can conclude that

 (Y₁ | Y₂ = y₂) ∼ GMN_{d₁}(ξ_{1|2}, Σ_{11|2}, γ_{1|2}, H_c(· | y₂))

where H_c(· | y₂) denotes the distribution function associated to the conditional density (20).

For many GMN constructions (7)-(8), the density function of Y is likely to be known in explicit form; in these cases, the same holds true for f₂, recalling (16). Then, a convenient aspect of expression (20) is that it indicates how to compute the conditional density once the joint unconditional distribution is available explicitly. Clearly, this is especially amenable in those constructions where (R, S) is really a univariate variable, as in Sections 3 and 4 below.

In one of the appendices, we illustrate the use of (20)-(21) in the case of a multivariate distribution.

For use in the next result, but also in the rest of the paper, define the quantities

 Ω = Σ + γγ^⊤,  η = (1 + γ^⊤Σ^{−1}γ)^{−1/2} Σ^{−1}γ,  α² = ∥γ∥²_Σ = γ^⊤Σ^{−1}γ,  δ² = α²/(1 + α²),    (22)

such that Ω^{−1} = Σ^{−1} − ηη^⊤ and γ^⊤Ω^{−1}γ = δ². For notational convenience, we introduce the notation

 μ_{hk} = E{R^h S^k},  h, k = 0, 1, …    (23)

when the named expectation exists.

###### Proposition 4

For a random variable Y₀ having distribution of type (7)-(8) with ξ = 0, the following facts hold:

 S^{−2}(Y₀ − Rγ)^⊤Σ^{−1}(Y₀ − Rγ) ∼ χ²_d,    (24)
 E{Y₀^⊤Σ^{−1}Y₀} = d E{S²} + α² E{R²} = d μ₀₂ + α² μ₂₀,    (25)
 E{Y₀^⊤Ω^{−1}Y₀} = d E{S²} + δ²(E{R²} − E{S²}) = d μ₀₂ + δ²(μ₂₀ − μ₀₂),    (26)

provided E{R²} and E{S²} exist, using the quantities defined in (22) and (23).

Proof: From (8), write Y₀ = Rγ + SX, where X ∼ N_d(0, Σ) is independent of (R, S); this yields result (24). For equality (25), expand the initial identity of this proof as

 Y₀^⊤Σ^{−1}Y₀ − 2Rγ^⊤Σ^{−1}Y₀ + R²γ^⊤Σ^{−1}γ = S²X^⊤Σ^{−1}X

and take expectations on both sides of this equality. We obtain

 E{RY₀} = E{R E{Y₀|U,V}} = E{R E{Rγ + SX | U,V}} = E{R(Rγ + S E{X|U,V})} = E{R²}γ,

bearing in mind that E{X|U,V} = 0, by the independence assumption between X and (U, V). This leads to (25).

For (26), write Q = Y₀^⊤Ω^{−1}Y₀ and note that E{Q} = tr(Ω^{−1} E{Y₀Y₀^⊤}). Using (10) and (11), we obtain E{Y₀Y₀^⊤} = E{R²}γγ^⊤ + E{S²}Σ, so that

 E{Q} = E{R²} γ^⊤Ω^{−1}γ + E{S²} tr(Ω^{−1}Σ).

By using the Sherman-Morrison equality (6), γ^⊤Ω^{−1}γ = δ² and tr(Ω^{−1}Σ) = d − δ², which concludes the proof. QED
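Identity (25) can likewise be checked by simulation; the mixing choices R = UV, S = V^{1/2} below are hypothetical, used only to generate a non-trivial instance:

```python
import numpy as np

# Monte Carlo check of (25): E{Y0' Sigma^{-1} Y0} = d*mu_02 + alpha^2*mu_20,
# with illustrative mixing R = U*V, S = sqrt(V) and xi = 0.
rng = np.random.default_rng(5)
d, n = 3, 500_000
gam = np.array([1.0, 0.5, -0.5])
Sigma = np.eye(d) + 0.2                 # positive definite variance matrix
Sinv = np.linalg.inv(Sigma)
alpha2 = gam @ Sinv @ gam               # alpha^2 = gamma' Sigma^{-1} gamma

U = rng.uniform(0, 1, size=n)
V = rng.gamma(2.0, 1.0, size=n)
R, S = U * V, np.sqrt(V)
X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
Y0 = R[:, None] * gam + S[:, None] * X  # Y0 = R gamma + S X, as in (8)

Q = np.einsum('ni,ij,nj->n', Y0, Sinv, Y0)
lhs = Q.mean()
rhs = d * (S**2).mean() + alpha2 * (R**2).mean()  # d*mu_02 + alpha^2*mu_20
print(np.round(lhs - rhs, 2))
```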

In the subsequent pages, the matrix Ω defined in (22) and the associated quadratic form Q = Y₀^⊤Ω^{−1}Y₀ will appear repeatedly. A connected relevant question is: under which conditions is (26) free of γ? Equivalently, under which conditions does the following hold:

 E{Q} = E{Y₀^⊤Ω^{−1}Y₀} = d E{S²}?    (27)

This equality represents a form of invariance which is known to hold in some cases to be recalled later on, but we want to examine it more generally. One setting where equality (27) holds is given by R = US, where U is independent of S and E{U²} = 1. It is then immediate to see that μ₂₀ = E{U²}E{S²} = μ₀₂, so that the final term of (26) is zero.

The conditions R = US and independence of U and S are in turn achieved when r(u, v) = u s(v) and s(u, v) = s(v) for some function s(·). In this case Y₀ = SZ, where Z = Uγ + X, which is independent of S. Hence E{Q} = E{S²} E{Q₀}, where Q₀ = Z^⊤Ω^{−1}Z and

 E{Q₀} = tr(Ω^{−1} E{ZZ^⊤}) = E{U²} γ^⊤Ω^{−1}γ + tr(Ω^{−1}Σ)

since E{X} = 0 and U is independent of X, so that E{ZZ^⊤} = E{U²}γγ^⊤ + Σ. Thus, if E{U²} = 1, then by using (6) it clearly follows that E{Q₀} = δ² + (d − δ²) = d. We shall return to this issue later on.

### 2.5 Mardia’s measures of multivariate asymmetry and kurtosis

For a multivariate random variable Z such that μ_Z = E{Z} and Σ_Z = var{Z} exist, Mardia (1970, 1974) has introduced measures of multivariate skewness and kurtosis, defined as

 β_{1,d} = E{[(Z − μ_Z)^⊤Σ_Z^{−1}(Z′ − μ_Z)]³},  β_{2,d} = E{[(Z − μ_Z)^⊤Σ_Z^{−1}(Z − μ_Z)]²},    (28)

where Z′ is an independent copy of Z, provided these expected values exist. These measures represent extensions of corresponding familiar quantities for the univariate case:

 β₁ = E{(Z − μ_Z)³}² / var{Z}³ = γ₁²,  β₂ = E{(Z − μ_Z)⁴} / var{Z}² = γ₂ + 3,    (29)

in the sense that β_{1,1} = β₁ and β_{2,1} = β₂.
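For the plain multivariate normal distribution, β_{1,d} = 0 and β_{2,d} = d(d+2); the following sketch estimates both measures from simulated data (the pairwise-product estimator of β_{1,d} replaces the independent copy Z′ by all sample pairs):

```python
import numpy as np

# Empirical check of Mardia's measures (28) for the multivariate normal,
# where beta_{1,d} = 0 and beta_{2,d} = d(d+2).
rng = np.random.default_rng(6)
d, n = 3, 2000
Z = rng.multivariate_normal(np.zeros(d), np.eye(d), size=n)

# Standardize: W has sample mean 0 and sample covariance I_d.
Zc = Z - Z.mean(axis=0)
L = np.linalg.cholesky(np.linalg.inv(np.cov(Zc, rowvar=False)))
W = Zc @ L                         # W^T W / (n-1) = I_d by construction

G = W @ W.T                        # all pairwise inner products w_i' w_j
b1 = (G**3).mean()                 # estimate of beta_{1,d} over all pairs
b2 = (np.diag(G)**2).mean()        # estimate of beta_{2,d}
print(round(b1, 2), round(b2, 2), d * (d + 2))
```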

We want to find expressions for (28) in the case of a random variable Y of type (7)-(8). Recall the expressions for E{Y} and var{Y} given in (10) and (11), and the notation μ_{hk} defined in (23) for the moments of (R, S), and write

 R₀ = R − μ₁₀,  Y − μ_Y = R₀γ + SX,

assuming that the involved mean values exist. Taking into account the invariance of β_{1,d} and β_{2,d} with respect to non-singular affine transformations, it is convenient to work with the transformed quantities

 X₀ = Σ^{−1/2}X ∼ N_d(0, I_d),  γ₀ = Σ^{−1/2}γ,  Y₀ = Σ^{−1/2}(Y − μ_Y) = R₀γ₀ + SX₀,

where any form of the square root matrix Σ^{1/2} can be adopted.

The subsequent development involves extensive algebra of which we report here only the summary elements; detailed computations are provided in an appendix. Recall α introduced in (22) and define

 μ̄₂₀ = μ₂₀ − μ₁₀² = var{R},  ρ = μ̄₂₀/μ₀₂,  ρ̄ = ρα²/(1 + ρα²) = α²μ̄₂₀/(μ₀₂ + α²μ̄₂₀).

Introduce the auxiliary random variables T₀ = γ₀^⊤X₀/α ∼ N(0, 1), which is independent of (R, S), and Z₀ = αR₀ + ST₀. We need to compute the following expectations:

 E{S²Z₀} = α(μ₁₂ − μ₁₀μ₀₂),
 E{S²Z₀²} = α²(μ₂₂ − 2μ₁₂μ₁₀ + μ₁₀²μ₀₂) + μ₀₄,
 E{Z₀³} = α³(μ₃₀ − 3μ₂₀μ₁₀ + 2μ₁₀³) + 3α(μ₁₂ − μ₁₀μ₀₂),
 E{Z₀⁴} = α⁴(μ₄₀ − 4μ₃₀μ₁₀ + 6μ₂₀μ₁₀² − 3μ₁₀⁴) + 6α²(μ₂₂ − 2μ₁₂μ₁₀ + μ₁₀²μ₀₂) + 3μ₀₄,

assuming the existence of moments of (R, S) up to the fourth order. With these ingredients, Mardia's measures for the GMN construction can be expressed as

 β_{1,d} =     (30)
 β_{2,d} = μ₀₂^{−2}((d+1)(d−1)μ₀₄ + 2(d−1)(1−ρ̄) E{S²Z₀²} + (1−ρ̄)² E{Z₀⁴}).    (31)

Considering the complexity typically involved in the explicit specification of (28) outside the normal family, the above expressions appear practically manageable. They are further simplified when one specializes them to a given family or to a certain subclass of the GMN construction. For a given choice of the distribution H, we need to work out the following ingredients: (i) the marginal moments μ_{h0} of R, up to order 4; (ii) the marginal moments μ₀₂ and μ₀₄ of S; (iii) the cross moments μ₁₂ and μ₂₂. The working is illustrated next for the GH family; additional illustrations will appear later.

#### Mardia’s measures for the GH family

For the GH family with representation (2), there is a single mixing variable V with density (5), and R = V, S = V^{1/2}, so that μ_{hk} = E{V^{h+k/2}}. General expressions for the moments E{V^m} are given in Section 2.1 of Jørgensen (1982), among others. These expressions also provide μ₀₂ = E{V} and μ₀₄ = E{V²}. The two other required quantities are μ₁₂ = E{V²} and μ₂₂ = E{V³}, which are still ordinary moments of V. We can now compute

 E{S²Z₀} = α(E{V²} − (E{V})²) = ασ_V², say,
 E{S²Z₀²} = α²(E{V³} − 2E{V²}E{V} + (E{V})³) + E{V²},
 E{Z₀³} = α³(E{V³} − 3E{V²}E{V} + 2(E{V})³) + 3α var{V} = (ασ_V)³ γ₁(V) + 3ασ_V²,
 E{Z₀⁴} = α⁴(E{V⁴} − 4E{V³}E{V} + 6E{V²}(E{V})² − 3(E{V})⁴) + 6α²(E{V³} − 2E{V²}E{V} + (E{V})³) + 3E{V²}
  = (ασ_V)⁴ β₂(V) + 6α²σ_V²(σ_V γ₁(V) + E{V}) + 3E{V²},

where γ₁(V) = β₁(V)^{1/2} and β₂(V) are the univariate measures of skewness and kurtosis in (29), evaluated for V. Plugging the above quantities in (30) and (31) completes the computation.

#### Remark

There exists an interesting way of re-writing (30) and (31) which will turn out useful later on. Since Z₀ = αR₀ + ST₀ has zero mean and

 var{Z₀} = α²μ̄₂₀ + μ₀₂ = (1 − ρ̄)^{−1} μ₀₂,

we can introduce a univariate standardized GMN-type variable

 Z̃₀ = (αR + ST₀ − αμ₁₀)/(α²μ̄₂₀ + μ₀₂)^{1/2} ∼ GMN₁(−αμ₁₀(α²μ̄₂₀ + μ₀₂)^{−1/2}, (α²μ̄₂₀ + μ₀₂)^{−1}, α(α²μ̄₂₀ + μ₀₂)^{−1/2}, H).