# A formulation for continuous mixtures of multivariate normal distributions

Several formulations have long existed in the literature in the form of continuous mixtures of normal variables where a mixing variable operates on the mean or on the variance or on both the mean and the variance of a multivariate normal variable, by changing the nature of these basic constituents from constants to random quantities. More recently, other mixture-type constructions have been introduced, where the core random component, on which the mixing operation operates, is not necessarily normal. The main aim of the present work is to show that many existing constructions can be encompassed by a formulation where normal variables are mixed using two univariate random variables. For this formulation, we derive various general properties. Within the proposed framework, it is also simpler to formulate new proposals of parametric families and we provide a few such instances. At the same time, the exposition provides a review of the theme of normal mixtures.


## 1 Continuous mixtures of normal distributions

In the last few decades, a number of formulations have been put forward, in the context of distribution theory, where a multivariate normal variable represents the basic constituent but with the superposition of another random component, either in the sense that the normal mean value or the variance matrix or both these components are subject to the effect of another random variable of continuous type. We shall refer to these constructions as ‘mixtures of normal variables’; the matching phrase ‘mixtures of normal distributions’ will also be used.

To better focus ideas, recall a few classical instances of the delineated scheme. Presumably, the best-known such formulation is represented by scale mixtures of normal variables, which can be expressed as

 Y = ξ + V^{1/2} X    (1)

where X ∼ N_d(0, Σ), V is an independent random variable on ℝ⁺, and ξ ∈ ℝ^d is a vector of constants. Scale mixtures provide a stochastic representation of a wide subset of the class of elliptically contoured distributions, often called briefly elliptical distributions. For a standard account of elliptical distributions, see for instance Fang et al. (1990); specifically, their Section 2.6 examines the connection with scale mixtures of normal variables. A very important instance occurs when V⁻¹ ∼ χ²_ν/ν, which leads to the multivariate Student's t distribution.
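As a numerical aside, representation (1) is straightforward to simulate. The sketch below (with illustrative choices of dimension, degrees of freedom, and sample size) draws V as ν/χ²_ν, so that Y is multivariate Student's t with ν degrees of freedom, and checks the implied variance ν/(ν−2)·Σ by Monte Carlo:

```python
import numpy as np

# Simulate the scale mixture Y = xi + V^{1/2} X of equation (1).
# With V = nu / chi^2_nu (an inverse-gamma mixing variable), Y is
# multivariate Student's t with nu degrees of freedom.
rng = np.random.default_rng(0)
d, nu, n = 3, 5.0, 200_000
xi = np.zeros(d)
Sigma = np.eye(d)

X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)  # X ~ N_d(0, Sigma)
V = nu / rng.chisquare(nu, size=n)                       # mixing variable on R+
Y = xi + np.sqrt(V)[:, None] * X

# For the t distribution, var(Y) = nu/(nu-2) * Sigma when nu > 2.
emp_var = np.cov(Y, rowvar=False)
print(np.round(emp_var[0, 0], 2))  # close to nu/(nu-2) = 5/3
```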

Another very important construction is the normal variance-mean mixture proposed by Barndorff-Nielsen (1977, 1978) and extensively developed by subsequent literature, namely

 Y = ξ + V γ + V^{1/2} X    (2)

where γ ∈ ℝ^d is a vector of constants and V is assumed to have a generalized inverse Gaussian (GIG) distribution. In this case Y turns out to have a generalized hyperbolic (GH) distribution, which will recur later in the paper.
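The mixture (2) can be simulated analogously once draws from the GIG distribution are available. In the sketch below, the mapping of (λ, χ, ψ) onto scipy's `geninvgauss(p, b, scale)` parameterization is our own term-by-term matching of density (5) and should be treated as an assumption; the parameter values are illustrative:

```python
import numpy as np
from scipy.stats import geninvgauss

# Sketch of the variance-mean mixture (2): Y = xi + V*gamma + V^{1/2} X with
# V ~ GIG(lam, chi, psi), so that Y is generalized hyperbolic.
# Assumed mapping to scipy: p = lam, b = sqrt(chi*psi), scale = sqrt(chi/psi).
rng = np.random.default_rng(1)
d, n = 2, 100_000
lam, chi, psi = 1.0, 1.0, 2.0
xi = np.zeros(d)
gam = np.array([0.5, -0.3])
Sigma = np.eye(d)

V = geninvgauss.rvs(p=lam, b=np.sqrt(chi * psi), scale=np.sqrt(chi / psi),
                    size=n, random_state=rng)
X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
Y = xi + V[:, None] * gam + np.sqrt(V)[:, None] * X

# Check E{Y} = xi + E{V} gamma, a special case of (10) with r(U,V) = V.
print(np.round(Y.mean(axis=0) - (xi + V.mean() * gam), 3))
```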

Besides (1) and (2), there exists a multitude of other constructions which belong to the idea of normal mixtures delineated in the opening paragraph. Many of these formulations will be recalled in the subsequent pages, to illustrate the main target of the present contribution, which is to present a general formulation for normal mixtures. Our proposal involves an additional random component, denoted U, and the effect of U and V is regulated by two functions, non-linear in general. As we shall see, this construction encompasses a large number of existing constructions in a unifying scheme, for which we develop various general properties.

The role of this activity is to highlight the relative connections of the individual constructions, with an improved understanding of their nature. As a side-effect, the presentation of the individual formulations also plays the role of a review of this stream of literature. Finally, the proposed formulation can facilitate the conception of additional proposals with specific aims. The emphasis is primarily on the multivariate context.

Since it moves a step towards generality, we mention beforehand the formulation of Tjetjep & Seneta (2006), where V and V^{1/2} in (2) are replaced by two linear functions of them, which allows one to incorporate a number of existing families. Their construction is, however, entirely within the univariate domain. A number of multivariate constructions aiming at some level of generality do exist, and will be examined in the course of the discussion.

In the next section, our proposed general scheme is introduced, followed by the derivation of a number of general properties. The subsequent sections show how to frame a large number of existing constructions within the proposed scheme. In the final section, we indicate some directions for even more general constructions.

## 2 Generalized mixtures of normal distributions

### 2.1 Notation and other formal preliminaries

As already effectively employed, the notation X ∼ N_d(μ, Σ) indicates that X is a d-dimensional normal random variable with mean vector μ and variance matrix Σ. The density function and the distribution function of X at x ∈ ℝ^d are denoted by φ_d(x; μ, Σ) and Φ_d(x; μ, Σ). Hence, specifically, we have

 φ_d(x; μ, Σ) = det(2πΣ)^{−1/2} exp{−(1/2)(x−μ)^⊤Σ^{−1}(x−μ)}

if Σ > 0. When d = 1, we drop the subscript d. When d = 1 and, in addition, μ = 0 and Σ = 1, we use the simplified notation φ(x) and Φ(x) for the density function and the distribution function.

A quantity arising in connection with the multivariate normal distribution, but not only there, is the Mahalanobis distance, defined (in the non-singular case) as

 ∥x∥_Σ = (x^⊤Σ^{−1}x)^{1/2}    (3)

which is written in the simplified form ∥x∥ when Σ is the identity matrix.

A function which will appear in various expressions is the inverse Mills ratio

 ζ(t) = φ(t)/Φ(t),  t ∈ ℝ.    (4)

A positive continuous random variable V has a GIG distribution if its density function can be written as

 g(v; λ, χ, ψ) = (ψ/χ)^{λ/2} / (2 K_λ(√(χψ))) · v^{λ−1} exp(−(1/2)(χv^{−1} + ψv)),  v > 0,    (5)

where λ ∈ ℝ, the parameters χ and ψ are non-negative (with admissible ranges depending on the sign of λ), and K_λ denotes the modified Bessel function of the third kind. In this case, we write V ∼ GIG(λ, χ, ψ). The numerous properties of the GIG distribution and interconnections with other parametric families are reviewed by Jørgensen (1982). We recall two basic properties: both the distribution of V^{−1} and that of cV, for c > 0, are still of GIG type. A fact to be used later is that the Gamma distribution is obtained when χ = 0 and λ > 0.

A result in matrix theory which will be used repeatedly is the Sherman-Morrison formula for matrix inversion, which states

 (A + bd^⊤)^{−1} = A^{−1} − (1 + d^⊤A^{−1}b)^{−1} A^{−1}bd^⊤A^{−1}    (6)

provided that the square matrix A and the vectors b and d have conformable dimensions, and the indicated inverses exist.
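A quick numerical check of identity (6) on a random, well-conditioned instance:

```python
import numpy as np

# Verify the Sherman-Morrison identity (6): the inverse of a rank-one
# update A + b d^T equals the right-hand side built from A^{-1} alone.
rng = np.random.default_rng(2)
A = rng.normal(size=(4, 4)) + 4 * np.eye(4)   # well-conditioned square matrix
b = rng.normal(size=(4, 1))
d = rng.normal(size=(4, 1))

Ainv = np.linalg.inv(A)
lhs = np.linalg.inv(A + b @ d.T)
rhs = Ainv - (Ainv @ b @ d.T @ Ainv) / (1.0 + float(d.T @ Ainv @ b))

print(np.max(np.abs(lhs - rhs)))  # of the order of machine precision
```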

### 2.2 Definition and basic facts

Consider a d-dimensional random variable X ∼ N_d(0, Σ) and univariate random variables U and V with joint distribution function G, such that X, U, V are mutually independent; hence G can be factorized as G = G_U × G_V. We assume Σ > 0 to avoid technical complications and concentrate on the constructive process. These definitions and assumptions will be retained for the rest of the paper.

Given any real-valued function r(·,·), a positive-valued function s(·,·), and vectors ξ and γ in ℝ^d, we shall refer to

 Y = ξ + r(U,V) γ + s(U,V) X    (7)
   = ξ + R γ + S X    (8)

as a generalized mixture of normal (GMN) variables; we have written R = r(U,V) and S = s(U,V), with independence of (R,S) from X. Denote by H the joint distribution function of (R,S) implied by G. The distribution of Y is identified by the notation Y ∼ GMN_d(ξ, Σ, γ, H).

For certain purposes, it is useful to think of Y as generated by the hierarchical construction

 (Y | U = u, V = v) ∼ N_d(ξ + r(u,v)γ, s(u,v)²Σ),  (U, V) ∼ G_U × G_V.    (9)

For instance, this representation is convenient for computing the mean vector and the variance matrix as

 E{Y} = E{E{Y|U,V}} = E{ξ + r(U,V)γ} = ξ + E{R}γ    (10)

provided E{R} exists, and

 var{Y} = var{E{Y|U,V}} + E{var{Y|U,V}} = var{ξ + r(U,V)γ} + E{s(U,V)²Σ} = var{R} γγ^⊤ + E{S²} Σ    (11)

provided var{R} and E{S²} exist. Another use of representation (9) is to facilitate the development of EM-type algorithms for parameter estimation.
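The hierarchical representation (9) and the moment formulas (10)-(11) can be verified by simulation. The functions r(u, v) = uv and s(u, v) = v^{1/2} and the mixing distributions below are hypothetical choices, made only for concreteness:

```python
import numpy as np

# Draw from the hierarchical construction (9) with illustrative choices
# r(u, v) = u*v, s(u, v) = sqrt(v), then check (10)-(11) by Monte Carlo.
rng = np.random.default_rng(3)
d, n = 2, 400_000
xi = np.array([1.0, -1.0])
gam = np.array([0.8, 0.2])
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])

U = rng.uniform(0, 1, size=n)          # G_U: uniform, an arbitrary choice
V = rng.gamma(2.0, 1.0, size=n)        # G_V: gamma, an arbitrary choice
R = U * V                              # R = r(U, V)
S = np.sqrt(V)                         # S = s(U, V)
X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
Y = xi + R[:, None] * gam + S[:, None] * X

# (10): E{Y} = xi + E{R} gamma;  (11): var{Y} = var{R} gg^T + E{S^2} Sigma
mean_formula = xi + R.mean() * gam
var_formula = np.var(R) * np.outer(gam, gam) + (S**2).mean() * Sigma
print(np.round(Y.mean(axis=0) - mean_formula, 3))
print(np.round(np.cov(Y, rowvar=False) - var_formula, 2))
```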

Similarly, by a conditioning argument, it is simple to see that the characteristic function of Y is

 c(t) = exp(i t^⊤ξ) E{c_N(t; r(U,V)γ, s(U,V)²Σ)},  t ∈ ℝ^d,

where c_N(t; μ, Σ) denotes the characteristic function of a N_d(μ, Σ) variable. Also, the distribution function of Y is

 F(y) = E{Φ_d(y; ξ + r(U,V)γ, s(U,V)²Σ)}.    (12)

Consider now the density function of Y, f say. From (9) it follows that

 f(y) = E_G{φ_d(y; ξ + r(U,V)γ, s(U,V)²Σ)} = E_H{φ_d(y; ξ + Rγ, S²Σ)},

where the first expected value is taken with respect to the distribution G, the second one with respect to H. Assume further that the distribution of (R, S) is absolutely continuous with density function h, and note that the transformation from (R, S, X) to (R, S, Y) is invertible, so that a standard computation for densities of transformed variables yields, in an obvious notation,

 f_{R,S,Y}(r, s, y) = s^{−d} f_{R,S,X}(r, s, s^{−1}(y − ξ − rγ)) = h(r,s) s^{−d} φ_d(s^{−1}(y − ξ − rγ); 0, Σ) = h(r,s) φ_d(y; ξ + rγ, s²Σ),

taking into account the independence of (R, S) and X. Hence we arrive at

 f(y) = ∫_{ℝ×ℝ⁺} φ_d(y; ξ + rγ, s²Σ) dH(r,s).    (13)

An alternative route to obtain this expression would be via differentiation of the distribution function (12), with exchange of the integration and differentiation signs.
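Expression (13) also suggests a direct Monte Carlo evaluation of the density: average φ_d over draws of the mixing variables. The sketch below does this in the univariate Student's t case (R = 0, S = V^{1/2}, V ∼ ν/χ²_ν), where the exact density is available as a benchmark:

```python
import numpy as np
from scipy.stats import norm, t

# Evaluate the mixture density (13) by Monte Carlo: f(y) = E{ phi(y; 0, V) },
# estimated by averaging normal densities over draws of the mixing variable V.
rng = np.random.default_rng(4)
nu, n = 4.0, 500_000
V = nu / rng.chisquare(nu, size=n)     # inverse-gamma mixing -> Student's t

y = np.linspace(-3, 3, 7)
f_mc = np.array([norm.pdf(yi, scale=np.sqrt(V)).mean() for yi in y])
f_exact = t.pdf(y, df=nu)              # exact t density as a benchmark
print(np.round(np.abs(f_mc - f_exact).max(), 3))
```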

For statistical work, it is often useful to consider constructions of type (7) where the distributions of U and V belong to some parametric family. In these cases, care must be taken to avoid overparameterization. Given the enormous variety of specific instances embraced by (7), it seems difficult to establish generally suitable conditions, and we shall then discuss this issue within specific families or classes of distributions.

In the above passage, as well as in the rest of the paper, the term ‘family’ refers to the set of distributions obtained by a given specification of the distributions of U and V when their parameters vary in some admissible space, while keeping the other ingredients fixed. Broader sets, generated for instance when the distributions of U and V vary across various parametric families, constitute ‘classes’.

A clarification is due about the use of the notation GMN_d(ξ, Σ, γ, H) in (7)-(8) and some derived expressions to be presented later on. When we examine a certain family belonging to the general construction, that notation will translate into a certain parameterization, which often is not the most appropriate for inferential or for interpretative purposes, and its use here must not be intended as a recommendation for general usage. This scheme is adopted merely for uniformity and simplicity of treatment in the present investigation.

### 2.3 Affine transformations and other distributional properties

For the random variable Y introduced by (7)-(8), consider an affine transformation W = b + B^⊤Y, for a q-dimensional vector b and a full-rank matrix B of dimension d × q, with q ≤ d; denote these assumptions as ‘the b-B conditions’. It is immediate that

 W = b + B^⊤Y = b + B^⊤ξ + r(U,V) B^⊤γ + s(U,V) B^⊤X

is still of type (7)-(8) with the same mixing variables and modified numerical parameters. We have then reached the following conclusion.

###### Proposition 1

If b and B satisfy the b-B conditions introduced above, it follows that

 b + B^⊤Y ∼ GMN_q(b + B^⊤ξ, B^⊤ΣB, B^⊤γ, H)    (14)

is still a member of the GMN class, with the same mixing distribution H of (R, S).

Partition now Y into two sub-vectors of sizes d₁ and d₂, such that d₁ + d₂ = d, with corresponding partitions of the parameters in blocks of matching sizes, as follows:

 Y = (Y₁, Y₂),  ξ = (ξ₁, ξ₂),  γ = (γ₁, γ₂),  Σ = (Σ₁₁ Σ₁₂; Σ₂₁ Σ₂₂).    (15)

To establish the marginal distribution of Y₁, we use Proposition 1 with b = 0 and B equal to a matrix formed by I_{d₁} in the top d₁ rows and a block of 0s in the bottom d₂ rows. For Y₂, we proceed similarly, but setting the bottom d₂ rows of B equal to I_{d₂}. We then arrive at the following conclusion.

###### Proposition 2

If Y ∼ GMN_d(ξ, Σ, γ, H) is partitioned as indicated in (15), then

 Y₁ ∼ GMN_{d₁}(ξ₁, Σ₁₁, γ₁, H),  Y₂ ∼ GMN_{d₂}(ξ₂, Σ₂₂, γ₂, H).    (16)

We now want to examine conditions which ensure independence of Y₁ and Y₂. From (9) it is clear that, if Σ₁₂ = 0, then Y₁ and Y₂ are conditionally independent given (U, V), with conditional distributions

 (Y_j | U = u, V = v) ∼ N_{d_j}(ξ_j + r(u,v)γ_j, s(u,v)²Σ_{jj}),  j = 1, 2.    (17)

Moreover, if S = 1 (constant) and one of the marginal distributions is symmetric, i.e., γ₁ = 0 or γ₂ = 0, then Y₁ and Y₂ are independent. The notation S = 1 and similar ones later on must be intended ‘with probability 1’; we shall not replicate this specification subsequently.

A more detailed argument is as follows, where we take ξ = 0 for mere simplicity of notation, without affecting the generality of the argument. From (9), the conditional joint characteristic function of (Y₁, Y₂), given U and V (or, equivalently, given R and S), is

 exp{i R(t₁^⊤γ₁ + t₂^⊤γ₂) − (1/2) S²(t₁^⊤Σ₁₁t₁ + 2t₁^⊤Σ₁₂t₂ + t₂^⊤Σ₂₂t₂)},

so that the joint characteristic function of (Y₁, Y₂) is

 c(t₁, t₂) = E{e^{i t₁^⊤Y₁ + i t₂^⊤Y₂}}    (18)
  = E{E{e^{i t₁^⊤Y₁ + i t₂^⊤Y₂} | U, V}}
  = E{e^{i r(U,V)(t₁^⊤γ₁ + t₂^⊤γ₂) − (1/2) s(U,V)²(t₁^⊤Σ₁₁t₁ + 2t₁^⊤Σ₁₂t₂ + t₂^⊤Σ₂₂t₂)}}
  = E{e^{i r(U,V)t₁^⊤γ₁ − (1/2)s(U,V)²t₁^⊤Σ₁₁t₁} · e^{i r(U,V)t₂^⊤γ₂ − (1/2)s(U,V)²t₂^⊤Σ₂₂t₂} · e^{−s(U,V)²t₁^⊤Σ₁₂t₂}}.

In an analogous way, by (17) the marginal characteristic functions are

 c_j(t_j) = E{e^{i t_j^⊤Y_j}} = E{E{e^{i t_j^⊤Y_j} | U, V}} = E{e^{i r(U,V)t_j^⊤γ_j − (1/2)s(U,V)²t_j^⊤Σ_{jj}t_j}},  j = 1, 2.    (19)

Note that, if γ_j = 0 and S = 1, then c_j(t_j) given by (19) reduces to the centred normal characteristic function exp(−(1/2)t_j^⊤Σ_{jj}t_j) for Y_j. We have then reached the following conclusion.

###### Proposition 3

Given partition (15), the components Y₁ and Y₂ are independent provided Σ₁₂ = 0, S = 1, and at least one of γ₁ and γ₂ is 0, with the following implications:

• if γ₁ = 0, the joint characteristic function (18) reduces to exp(−(1/2)t₁^⊤Σ₁₁t₁) c₂(t₂),

• if γ₂ = 0, the joint characteristic function (18) reduces to c₁(t₁) exp(−(1/2)t₂^⊤Σ₂₂t₂).

If both γ₁ and γ₂ are 0, the distribution reduces to the case of independent normal variables.

In essence, under the conditions of Proposition 3, one of Y₁ and Y₂ has a plain normal distribution and the other one falls under the construction discussed later in Section 3.

Outside the conditions of Proposition 3, the structure of (18) does not appear to be suitable for factorization as the product of two legitimate characteristic functions, and we conjecture that, in general, independence between Y₁ and Y₂ cannot be achieved.

Examine now the conditional distributions associated to partition (15). Factorize the joint density of Y as f(y) = f_{1|2}(y₁|y₂) f₂(y₂), where f_{1|2} is the conditional density of Y₁ given Y₂ = y₂ and f₂ is the marginal density of Y₂. For simplicity of treatment, suppose that (R, S) is absolutely continuous, with density h. Then, by (13) and the properties of the multivariate normal density, write

 f_{1|2}(y₁|y₂) f₂(y₂) = ∫_{ℝ×ℝ⁺} φ_{d₁}(y₁; ξ_{1|2} + γ_{1|2}r, s²Σ_{11|2}) φ_{d₂}(y₂; ξ₂ + γ₂r, s²Σ₂₂) h(r,s) dr ds

where

 ξ_{1|2} = ξ₁ + Σ₁₂Σ₂₂^{−1}(y₂ − ξ₂),  γ_{1|2} = γ₁ − Σ₁₂Σ₂₂^{−1}γ₂,  Σ_{11|2} = Σ₁₁ − Σ₁₂Σ₂₂^{−1}Σ₂₁,

having assumed that the conditioning operation and integration can be exchanged. Hence, for the conditional density of Y₁ given Y₂ = y₂ we have

 f_{1|2}(y₁|y₂) = (1/f₂(y₂)) ∫_{ℝ×ℝ⁺} φ_{d₁}(y₁; ξ_{1|2} + γ_{1|2}r, s²Σ_{11|2}) φ_{d₂}(y₂; ξ₂ + γ₂r, s²Σ₂₂) h(r,s) dr ds.

Now, from Bayes' rule, we obtain that the conditional density of (R, S) given Y₂ = y₂ is

 h_c(r, s | y₂) = φ_{d₂}(y₂; ξ₂ + γ₂r, s²Σ₂₂) h(r, s) / f₂(y₂).    (20)

Using this fact in the last integral, we can re-write

 f_{1|2}(y₁|y₂) = ∫_{ℝ×ℝ⁺} φ_{d₁}(y₁; ξ_{1|2} + γ_{1|2}r, s²Σ_{11|2}) h_c(r, s | y₂) dr ds,    (21)

which exhibits the same structure of (13). Therefore we can conclude that

 (Y₁ | Y₂ = y₂) ∼ GMN_{d₁}(ξ_{1|2}, Σ_{11|2}, γ_{1|2}, H_c(· | y₂))

where H_c(· | y₂) denotes the distribution function associated to the conditional density (20).

For many GMN constructions (7)-(8), the density function of Y is likely to be known in explicit form; in these cases, the same holds true for f₂, recalling (16). Then, a convenient aspect of expression (20) is that it indicates how to compute the conditional density once the joint unconditional distribution is available explicitly. Clearly, this is especially amenable in those constructions where (R, S) is really a univariate variable, as in Sections 3 and 4 below.

In one of the appendices, we illustrate the use of (20)-(21) in the case of a multivariate distribution.

For use in the next result, but also in the rest of the paper, define the quantities

 Ω = Σ + γγ^⊤,  η = (1 + γ^⊤Σ^{−1}γ)^{−1/2} Σ^{−1}γ,  α² = ∥γ∥²_Σ = γ^⊤Σ^{−1}γ,  δ² = α²/(1 + α²),    (22)

such that Ω^{−1} = Σ^{−1} − ηη^⊤ and γ^⊤Ω^{−1}γ = δ². For notational convenience, we introduce the notation

 μ_{hk} = E{R^h S^k},  h, k = 0, 1, …    (23)

when the named expectation exists.

###### Proposition 4

For a random variable Y₀ having distribution of type (7)-(8) with ξ = 0, the following facts hold:

 S^{−2}(Y₀ − Rγ)^⊤Σ^{−1}(Y₀ − Rγ) ∼ χ²_d,    (24)
 E{Y₀^⊤Σ^{−1}Y₀} = d E{S²} + α² E{R²} = d μ₀₂ + α² μ₂₀,    (25)
 E{Y₀^⊤Ω^{−1}Y₀} = d E{S²} + δ²(E{R²} − E{S²}) = d μ₀₂ + δ²(μ₂₀ − μ₀₂),    (26)

provided E{R²} and E{S²} exist, using the quantities defined in (22) and (23).

Proof: From (8), write Y₀ = Rγ + SX, where X ∼ N_d(0, Σ) is independent of (R, S); this yields result (24). For equality (25), expand the initial identity of this proof as

 Y₀^⊤Σ^{−1}Y₀ − 2Rγ^⊤Σ^{−1}Y₀ + R²γ^⊤Σ^{−1}γ = S²X^⊤Σ^{−1}X

and take expectations on both sides of this equality. We obtain

 E{RY₀} = E{R E{Y₀|U,V}} = E{R E{Rγ + SX | U,V}} = E{R(Rγ + S E{X|U,V})} = E{R²}γ,

bearing in mind that E{X|U,V} = 0, by the independence assumption between X and (U, V). This leads to (25).

For (26), write Q = Y₀^⊤Ω^{−1}Y₀ and note that E{Q} = tr(Ω^{−1} E{Y₀Y₀^⊤}). Using (10) and (11), we obtain E{Y₀Y₀^⊤} = E{R²}γγ^⊤ + E{S²}Σ, so that

 E{Q} = E{R²} γ^⊤Ω^{−1}γ + E{S²} tr(Ω^{−1}Σ).

By using the Sherman-Morrison equality (6), γ^⊤Ω^{−1}γ = δ² and tr(Ω^{−1}Σ) = d − δ², which concludes the proof. QED
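Identity (25) can likewise be checked by simulation; the mixing choices R = UV, S = V^{1/2} below are hypothetical, used only to generate a non-trivial instance:

```python
import numpy as np

# Monte Carlo check of (25): E{Y0' Sigma^{-1} Y0} = d*mu_02 + alpha^2*mu_20,
# with illustrative mixing R = U*V, S = sqrt(V) and xi = 0.
rng = np.random.default_rng(5)
d, n = 3, 500_000
gam = np.array([1.0, 0.5, -0.5])
Sigma = np.eye(d) + 0.2                 # positive definite variance matrix
Sinv = np.linalg.inv(Sigma)
alpha2 = gam @ Sinv @ gam               # alpha^2 = gamma' Sigma^{-1} gamma

U = rng.uniform(0, 1, size=n)
V = rng.gamma(2.0, 1.0, size=n)
R, S = U * V, np.sqrt(V)
X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
Y0 = R[:, None] * gam + S[:, None] * X  # Y0 = R gamma + S X, as in (8)

Q = np.einsum('ni,ij,nj->n', Y0, Sinv, Y0)
lhs = Q.mean()
rhs = d * (S**2).mean() + alpha2 * (R**2).mean()  # d*mu_02 + alpha^2*mu_20
print(np.round(lhs - rhs, 2))
```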

In the subsequent pages, the matrix Ω defined in (22) and the associated quadratic form Q = Y₀^⊤Ω^{−1}Y₀ will appear repeatedly. A connected relevant question is: under which conditions is (26) free of γ? Equivalently, under which conditions does the following hold:

 E{Q} = E{Y₀^⊤Ω^{−1}Y₀} = d E{S²}?    (27)

This equality represents a form of invariance which is known to hold in some cases to be recalled later on, but we want to examine it more generally. One setting where equality (27) holds is given by R = US, where U is independent of S and E{U²} = 1. It is then immediate to see that μ₂₀ = E{U²}E{S²} = μ₀₂, so that the final term of (26) is zero.

The conditions R = US and independence of U and S are in turn achieved when r(u, v) = u s(v) and s(u, v) = s(v) for some function s(·). In this case Y₀ = SZ, where Z = Uγ + X, which is independent of S. Hence E{Q} = E{S²} E{Q₀}, where Q₀ = Z^⊤Ω^{−1}Z and

 E{Q₀} = tr(Ω^{−1} E{ZZ^⊤}) = E{U²} γ^⊤Ω^{−1}γ + tr(Ω^{−1}Σ)

since E{X} = 0 and U is independent of X, so that E{ZZ^⊤} = E{U²}γγ^⊤ + Σ. Thus, if E{U²} = 1, then by using (6) it clearly follows that E{Q₀} = δ² + (d − δ²) = d. We shall return to this issue later on.

### 2.5 Mardia’s measures of multivariate asymmetry and kurtosis

For a multivariate random variable Z such that μ_Z = E{Z} and Σ_Z = var{Z} exist, Mardia (1970, 1974) has introduced measures of multivariate skewness and kurtosis, defined as

 β_{1,d} = E{[(Z − μ_Z)^⊤Σ_Z^{−1}(Z′ − μ_Z)]³},  β_{2,d} = E{[(Z − μ_Z)^⊤Σ_Z^{−1}(Z − μ_Z)]²},    (28)

where Z′ is an independent copy of Z, provided these expected values exist. These measures represent extensions of corresponding familiar quantities for the univariate case:

 β₁ = E{(Z − μ_Z)³}² / var{Z}³ = γ₁²,  β₂ = E{(Z − μ_Z)⁴} / var{Z}² = γ₂ + 3,    (29)

in the sense that β_{1,1} = β₁ and β_{2,1} = β₂.
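For the plain multivariate normal distribution, β_{1,d} = 0 and β_{2,d} = d(d+2); the following sketch estimates both measures from simulated data (the pairwise-product estimator of β_{1,d} replaces the independent copy Z′ by all sample pairs):

```python
import numpy as np

# Empirical check of Mardia's measures (28) for the multivariate normal,
# where beta_{1,d} = 0 and beta_{2,d} = d(d+2).
rng = np.random.default_rng(6)
d, n = 3, 2000
Z = rng.multivariate_normal(np.zeros(d), np.eye(d), size=n)

# Standardize: W has sample mean 0 and sample covariance I_d.
Zc = Z - Z.mean(axis=0)
L = np.linalg.cholesky(np.linalg.inv(np.cov(Zc, rowvar=False)))
W = Zc @ L                         # W^T W / (n-1) = I_d by construction

G = W @ W.T                        # all pairwise inner products w_i' w_j
b1 = (G**3).mean()                 # estimate of beta_{1,d} over all pairs
b2 = (np.diag(G)**2).mean()        # estimate of beta_{2,d}
print(round(b1, 2), round(b2, 2), d * (d + 2))
```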

We want to find expressions for (28) in the case of a random variable Y of type (7)-(8). Recall the expressions for E{Y} and var{Y} given in (10) and (11), and the notation μ_{hk} defined in (23) for the moments of (R, S), and write

 R₀ = R − μ₁₀,  Y − μ_Y = R₀γ + SX,

assuming that the involved mean values exist. Taking into account the invariance of β_{1,d} and β_{2,d} with respect to non-singular affine transformations, it is convenient to work with the transformed quantities

 X₀ = Σ^{−1/2}X ∼ N_d(0, I_d),  γ₀ = Σ^{−1/2}γ,  Y₀ = Σ^{−1/2}(Y − μ_Y) = R₀γ₀ + SX₀,

where any form of the square root matrix Σ^{1/2} can be adopted.

The subsequent development involves extensive algebra of which we report here only the summary elements; detailed computations are provided in an appendix. Recall α introduced in (22) and define

 μ̄₂₀ = μ₂₀ − μ₁₀² = var{R},  ρ = μ̄₂₀/μ₀₂,  ρ̄ = ρα²/(1 + ρα²) = α²μ̄₂₀/(μ₀₂ + α²μ̄₂₀).

Introduce the auxiliary random variables T₀ = γ₀^⊤X₀/α ∼ N(0, 1), which is independent of (R, S), and Z₀ = αR₀ + ST₀. We need to compute the following expectations:

 E{S²Z₀} = α(μ₁₂ − μ₁₀μ₀₂),
 E{S²Z₀²} = α²(μ₂₂ − 2μ₁₂μ₁₀ + μ₁₀²μ₀₂) + μ₀₄,
 E{Z₀³} = α³(μ₃₀ − 3μ₂₀μ₁₀ + 2μ₁₀³) + 3α(μ₁₂ − μ₁₀μ₀₂),
 E{Z₀⁴} = α⁴(μ₄₀ − 4μ₃₀μ₁₀ + 6μ₂₀μ₁₀² − 3μ₁₀⁴) + 6α²(μ₂₂ − 2μ₁₂μ₁₀ + μ₁₀²μ₀₂) + 3μ₀₄,

assuming the existence of moments of (R, S) up to the fourth order. With these ingredients, Mardia's measures for the GMN construction can be expressed as

 β_{1,d} =     (30)
 β_{2,d} = μ₀₂^{−2}((d+1)(d−1)μ₀₄ + 2(d−1)(1−ρ̄) E{S²Z₀²} + (1−ρ̄)² E{Z₀⁴}).    (31)

Considering the complexity typically involved in the explicit specification of (28) outside the normal family, the above expressions appear practically manageable. They are further simplified when one specializes them to a given family or to a certain subclass of the GMN construction. For a given choice of the distribution H, we need to work out the following ingredients: (i) the marginal moments μ_{h0} of R, up to order 4; (ii) the marginal moments μ₀₂ and μ₀₄ of S; (iii) the cross moments μ₁₂ and μ₂₂. The working is illustrated next for the GH family; additional illustrations will appear later.

#### Mardia’s measures for the GH family

For the GH family with representation (2), there is a single mixing variable V with density (5), and R = V, S = V^{1/2}, so that μ_{hk} = E{V^{h+k/2}}. General expressions for the moments E{V^m} are given in Section 2.1 of Jørgensen (1982), among others. These expressions also provide μ₀₂ = E{V} and μ₀₄ = E{V²}. The two other required quantities are μ₁₂ = E{V²} and μ₂₂ = E{V³}, which are still ordinary moments of V. We can now compute

 E{S²Z₀} = α(E{V²} − (E{V})²) = ασ_V², say,
 E{S²Z₀²} = α²(E{V³} − 2E{V²}E{V} + (E{V})³) + E{V²},
 E{Z₀³} = α³(E{V³} − 3E{V²}E{V} + 2(E{V})³) + 3α var{V} = (ασ_V)³ γ₁(V) + 3ασ_V²,
 E{Z₀⁴} = α⁴(E{V⁴} − 4E{V³}E{V} + 6E{V²}(E{V})² − 3(E{V})⁴) + 6α²(E{V³} − 2E{V²}E{V} + (E{V})³) + 3E{V²}
  = (ασ_V)⁴ β₂(V) + 6α²σ_V²(σ_V γ₁(V) + E{V}) + 3E{V²},

where γ₁(V) = β₁(V)^{1/2} and β₂(V) are the univariate measures of skewness and kurtosis in (29), evaluated for V. Plugging the above quantities in (30) and (31) completes the computation.

#### Remark

There exists an interesting way of re-writing (30) and (31) which will turn out useful later on. Since Z₀ = αR₀ + ST₀ has zero mean and

 var{Z₀} = α²μ̄₂₀ + μ₀₂ = (1 − ρ̄)^{−1} μ₀₂,

we can introduce a univariate standardized GMN-type variable

 Z̃₀ = (αR + ST₀ − αμ₁₀)/(α²μ̄₂₀ + μ₀₂)^{1/2} ∼ GMN₁(−αμ₁₀(α²μ̄₂₀ + μ₀₂)^{−1/2}, (α²μ̄₂₀ + μ₀₂)^{−1}, α(α²μ̄₂₀ + μ₀₂)^{−1/2}, H).