Multivariate Log-Skewed Distributions with Normal Kernel and Their Applications

We introduce two classes of multivariate log-skewed distributions with normal kernel: the log canonical fundamental skew-normal (log-CFUSN) and the log unified skew-normal (log-SUN). We also discuss some properties of the log-CFUSN family of distributions. These new classes of log-skewed distributions include the log-normal and multivariate log-skew-normal families as particular cases. We discuss some issues related to Bayesian inference in the log-CFUSN family of distributions, focusing mainly on how to model the prior uncertainty about the skewing parameter. Based on the stochastic representation of the log-CFUSN family, we propose a data augmentation strategy for sampling from the posterior distributions. The proposed family is used to analyze the US national monthly precipitation data. We conclude that a high-dimensional skewing function leads to a better model fit.

1 Introduction

The construction of new parametric distributions has received considerable attention in recent years. This growing interest is motivated by datasets that often present strong skewness, heavy tails, bimodality and other characteristics that are not well fitted by the usual distributions, such as the normal, Student-t, log-normal, exponential and many others. The main goal is to build more flexible parametric distributions with additional parameters that control such characteristics. Compared with finite mixtures of distributions (see Lin et al., 2007b; Cabral et al., 2008, for instance) or nonparametric methods (for recent surveys on Bayesian nonparametrics see Müller and Quintana, 2004; Walker, 2005; Dey et al., 1998), one advantage of this approach is that, in general, more parsimonious models are obtained and, as a consequence, the inference process tends to become simpler.

It is not feasible to mention all developments in this area in recent years. Arnold and Beaver (2002), Genton (2004) and Azzalini (2005) review several recent works in the area and are important sources for a detailed discussion of the properties of such distributions. Further advances in the area can be found in Genton and Loperfido (2005), Arellano-Valle and Azzalini (2006), Arellano-Valle et al. (2006), Arnold et al. (2009), Elal-Olivero et al. (2009), Arellano-Valle et al. (2010), Marchenko and Genton (2010), Gómez et al. (2011), Bolfarine et al. (2011), Rocha et al. (2013) and many others.

The seminal paper by Azzalini (1985) is one of the main references on this topic and has inspired many other works. Azzalini (1985) introduced the so-called skew-normal (SN) family of distributions, whose probability density function (pdf) is

$$f(x \mid \mu, \sigma, \lambda) = \frac{2}{\sigma}\,\phi\!\left(\frac{x-\mu}{\sigma}\right)\Phi\!\left(\lambda\,\frac{x-\mu}{\sigma}\right), \qquad (1)$$

where $\mu$ and $\sigma$ are the location and scale parameters, respectively, $\lambda$ is the skewness parameter and $\phi$ and $\Phi$ denote, respectively, the pdf and the cumulative distribution function (cdf) of the $N(0,1)$ distribution. The family in (1) extends the normal one by introducing an extra parameter to control the asymmetry of the distribution and has the normal family as a particular subclass whenever $\lambda$ equals zero. It also preserves some nice properties of the normal family. Another extension of the univariate distribution in (1) recently appeared in Martinez-Flores et al. (2014), who introduced the so-called skew-normal alpha-power distribution. The multivariate analog of the SN distribution was introduced by Azzalini and Dalla Valle (1996).
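As a small numerical illustration of the SN density, the sketch below takes (1) to be the standard Azzalini form $(2/\sigma)\,\phi(z)\,\Phi(\lambda z)$ with $z = (x-\mu)/\sigma$. The function names (`skew_normal_pdf`, `trapezoid`) and argument names (`mu`, `sigma`, `lam`) are illustrative choices, not notation taken from the paper. It checks that the density integrates to one and that it reduces to the normal density when the skewness parameter is zero.

```python
import math


def norm_pdf(z):
    """Standard normal pdf."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal cdf, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def skew_normal_pdf(x, mu=0.0, sigma=1.0, lam=0.0):
    """SN density: (2 / sigma) * phi(z) * Phi(lam * z), z = (x - mu) / sigma."""
    z = (x - mu) / sigma
    return 2.0 / sigma * norm_pdf(z) * norm_cdf(lam * z)


def trapezoid(f, a, b, steps=20000):
    """Composite trapezoidal rule on [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h


if __name__ == "__main__":
    # Total probability mass for an arbitrary (illustrative) parameter choice.
    mass = trapezoid(lambda x: skew_normal_pdf(x, 1.0, 2.0, 3.0), -30.0, 30.0)
    print("total mass:", mass)
    # With lam = 0 the SN density coincides with the N(0, 1) density.
    print(skew_normal_pdf(0.3), norm_pdf(0.3))
```

Only the standard library is used, so the sketch can be run as is; a production analysis would more likely rely on an existing statistical library.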

In a more general setting, Genton and Loperfido (2005) introduced the class of generalized multivariate skew elliptical (GSE) distributions, whose pdf is

$$2\, f_n(z)\, \pi(z), \qquad z \in \mathbb{R}^n, \qquad (2)$$

where $f_n$ is the pdf of an $n$-dimensional elliptical distribution and $\pi$ is a skewing function satisfying $0 \le \pi(z) \le 1$ and $\pi(-z) = 1 - \pi(z)$, for all $z \in \mathbb{R}^n$. Many of the properties of the SN distribution also hold for any distribution in this class. In particular, Genton and Loperfido (2005) prove that distributions of quadratic forms in the GSE family do not depend on the skewing function $\pi$. Some other properties of the GSE family, such as the joint moment generating functions of linear transformations and quadratic forms of a GSE random vector and the conditions for their independence, can be found in Huang et al. (2013). It should also be mentioned that the multivariate SN families of distributions defined by Azzalini and Dalla Valle (1996) and Azzalini and Capitanio (1999) and the family of skew-spherical (elliptical) distributions defined in Branco and Dey (2001) are subclasses of (2).

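The family of distributions of Azzalini and Dalla Valle (1996) is also a subclass of the fundamental SN (FUSN) class of distributions defined by Arellano-Valle and Genton (2005). An $n \times 1$ random vector $Z$ has an $n$-variate canonical fundamental skew-normal (CFUSN) distribution with an $n \times m$ skewness matrix $\Delta$, which will be denoted by $Z \sim CFUSN_{n,m}(\Delta)$, if its density is given by

$$f_Z(z) = 2^m\, \phi_n(z)\, \Phi_m\!\left(\Delta^\top z \mid I_m - \Delta^\top \Delta\right), \qquad z \in \mathbb{R}^n, \qquad (3)$$

where $\Delta$ is such that $\|\Delta a\| < 1$, for all unitary vectors $a \in \mathbb{R}^m$, and $\|\cdot\|$ denotes the Euclidean norm. Throughout this paper, we denote by $\phi_n(\cdot \mid \mu, \Sigma)$ the pdf associated with the multivariate $N_n(\mu, \Sigma)$ distribution, and by $\Phi_n(\cdot \mid \mu, \Sigma)$ the corresponding cumulative distribution function (cdf). If $\mu = 0$ (respectively $\mu = 0$ and $\Sigma = I_n$) these functions will be denoted by $\phi_n(\cdot \mid \Sigma)$ and $\Phi_n(\cdot \mid \Sigma)$ (respectively $\phi_n(\cdot)$ and $\Phi_n(\cdot)$). For simplicity, $\phi$ and $\Phi$ will be used in the univariate case.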

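Several classes of SN distributions have been defined in the literature. A unification of these families was proposed by Arellano-Valle and Azzalini (2006), who define the unified skew-normal family of distributions, the so-called SUN family. A random vector $Y \sim SUN_{n,m}(\xi, \gamma, \bar{\omega}, \Omega^*)$ if its pdf is

$$f_Y(y) = \phi_n(y - \xi \mid \Omega)\, \frac{\Phi_m\!\left(\gamma + \Delta^\top \bar{\Omega}^{-1} \omega^{-1}(y - \xi) \mid \Gamma - \Delta^\top \bar{\Omega}^{-1} \Delta\right)}{\Phi_m(\gamma \mid \Gamma)}, \qquad (4)$$

where the vectors $\xi \in \mathbb{R}^n$ and $\gamma \in \mathbb{R}^m$, $\bar{\omega}$ is the vector of the diagonal elements of $\omega$, $\omega$ is a diagonal matrix formed by the standard deviations of $\Omega = \omega \bar{\Omega} \omega$, and $\Gamma$, $\bar{\Omega}$ and $\Delta$ are, respectively, $m \times m$, $n \times n$ and $n \times m$ matrices such that

$$\Omega^* = \begin{pmatrix} \Gamma & \Delta^\top \\ \Delta & \bar{\Omega} \end{pmatrix}$$

is a correlation matrix. For another unification of multivariate skewed distributions see Abtahi and Towhidi (2013).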

In limiting cases, some of these distributions concentrate their probability mass on positive (or negative) values. The half-normal distribution, for instance, is obtained from (1) by letting the skewness parameter tend to infinity. Because of this, such families of distributions have also been considered to model data with positive support, such as income, precipitation, pollutant concentrations and so on. However, such limit distributions are not flexible enough to accommodate the diversity of shapes of positive (or negative) data. In the univariate context, gamma, exponential and log-normal distributions are commonly used to model non-negative random variables. Less conventional analyses can be done using the log-SN and log-skew-t distributions introduced by Azzalini et al. (2003) or the log-power-normal distribution introduced by Martinez-Flores et al. (2012).

In the multivariate context, however, distributions with positive support are usually intractable, with the exception of the multivariate log-normal distribution. With the above problem in mind, Marchenko and Genton (2010) built the multivariate log-skew elliptical family of distributions as follows. Denote by the family of -dimensional elliptical distributions (with existing pdf) with generating function , defining a -dimensional spherical density, a location column vector , and a x positive definite dispersion matrix . If , then its pdf is , where , (Fang et al., 1990). Consider the class of skew elliptical distributions with pdf given by

(5)

where is a shape parameter, is a x scale matrix, is the pdf of a -dimensional random vector of and is the cdf of the with generating function . The distribution in (5) is denoted by . Consider the transformation , where . Then, has log-skew elliptical distribution denoted by with pdf

(6)

It is immediate that the multivariate skew-normal (Azzalini and Dalla Valle, 1996) and skew-t (Azzalini and Capitanio, 2003) distributions are special cases of (5). Consequently, the log-skewed class of distributions in (6) introduced by Marchenko and Genton (2010) also defines particular classes of multivariate log-SN and log-skew-t distributions and has, as a special case, the multivariate log-normal family of distributions.

Our main motivation to introduce new classes of multivariate log-skewed distributions comes from results that recently appeared in Santos et al. (2013). That paper focused on parameter interpretation in mixed logistic regression models, which is done through the so-called odds ratio, as in the usual logistic regression model. However, once the random effects are considered, the odds ratio comparing two individuals in two different clusters becomes a random variable that depends on the random effects related to the two clusters under comparison (Larsen et al., 2000). Because of this, Larsen et al. (2000) propose to interpret the odds ratio in terms of the median of its distribution in order to quantify appropriately the heterogeneity among the different clusters. If the random effects are independent and identically distributed (iid) with then Santos et al. (2013) prove that the odds ratio has distribution with pdf given by

(7)

where , and . Similar distributions were also obtained under independent skew-normally distributed random effects. The univariate log-skewed distribution in (7) does not belong to the class of distributions defined by Marchenko and Genton (2010), nor to that introduced by Azzalini et al. (2003). Moreover, only its median was obtained by Santos et al. (2013) but no other property of it was studied.

In this paper, we introduce the multivariate log-CFUSN and log-SUN families of distributions. We explore their relationship and study some properties of the log-CFUSN family of distributions. These classes of distributions have as subclasses the multivariate log-skew-normal family introduced by Marchenko and Genton (2010), the log-SN family of Azzalini et al. (2003) and the family of distributions given in (7). We also discuss some issues related to Bayesian inference in this family. To illustrate its use we analyze the USA monthly precipitation data recorded from 1895 to 2007, which are available from the National Climatic Data Center (NCDC).

This paper is organized as follows. In Section 2 we define the log-CFUSN and the log-SUN families of distributions and establish some of the probabilistic properties of the log-CFUSN family of distributions. Bayesian inference in the log-CFUSN family is discussed in Section 3. In Section 4 we present some data analysis using the proposed log-CFUSN family of distributions. Finally, Section 5 finishes the paper with a discussion and our main conclusions.

2 Log-SUN and Log-CFUSN families of distributions

Under normal theory, the log-normal family of distributions is obtained through the logarithmic transformation: if a random variable is log-normally distributed, then its logarithm has a normal distribution. Following this idea, in this section, we formally define the log-canonical-fundamental-skew-normal (log-CFUSN) and the log-unified-skew-normal (log-SUN) families of distributions and explore some properties of the log-CFUSN family, such as conditional and marginal distributions, mixed moments and stochastic representations.

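Let $Y = (Y_1, \ldots, Y_n)^\top$ be an $n \times 1$ random vector and consider the transformations $\ln Y = (\ln Y_1, \ldots, \ln Y_n)^\top$ and $\exp(Y) = (\exp(Y_1), \ldots, \exp(Y_n))^\top$.

Definition 1.

(Log-CFUSN family of distributions) Let $Y$ and $Z$ be $n \times 1$ random vectors such that $Z = \ln Y$. We say that $Y$ has a log-canonical-fundamental-skew-normal distribution with $n \times m$ skewness matrix $\Delta$, denoted by $Y \sim LCFUSN_{n,m}(\Delta)$, if $Z \sim CFUSN_{n,m}(\Delta)$ with pdf given in (3).

Thus, from Definition 1, we have that $Y = \exp(Z)$ and, by the change-of-variables formula, the pdf of the log-CFUSN family of distributions with skewness matrix $\Delta$ is

$$f_Y(y) = 2^m \left(\prod_{i=1}^{n} y_i\right)^{-1} \phi_n(\ln y)\, \Phi_m\!\left(\Delta^\top \ln y \mid I_m - \Delta^\top \Delta\right), \qquad y \in \mathbb{R}_+^n, \qquad (8)$$

where $\Delta$ is an $n \times m$ matrix such that $\|\Delta a\| < 1$, for all unitary vectors $a \in \mathbb{R}^m$.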
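To make the log-CFUSN density concrete, the sketch below evaluates it in the special case of a single skewing dimension ($m = 1$), where the skewness matrix is a column vector $\delta$ with $\delta^\top\delta < 1$. It assumes that the density is obtained from the CFUSN density $2^m \phi_n(z)\Phi_m(\Delta^\top z \mid I_m - \Delta^\top\Delta)$ by the change of variables $z = \ln y$, so that for $m = 1$ it becomes $2 \prod_i y_i^{-1}\, \phi_n(\ln y)\, \Phi(\delta^\top \ln y / \sqrt{1 - \delta^\top\delta})$. The function name `log_cfusn_pdf_m1` is hypothetical, and the general case $m > 1$ would need a multivariate normal cdf routine, which is not covered here.

```python
import math


def norm_pdf(z):
    """Standard normal pdf."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal cdf, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def log_cfusn_pdf_m1(y, delta):
    """Log-CFUSN density for m = 1 (assumed form, see the lead-in).

    y     : sequence of n positive numbers
    delta : sequence of n skewness entries with sum(d * d) < 1
    """
    norm2 = sum(d * d for d in delta)
    if not norm2 < 1.0:
        raise ValueError("delta must satisfy delta' delta < 1")
    if any(v <= 0.0 for v in y):
        return 0.0  # the support is the positive orthant
    z = [math.log(v) for v in y]
    jacobian = 1.0
    kernel = 1.0
    for v, zi in zip(y, z):
        jacobian /= v           # product of 1 / y_i
        kernel *= norm_pdf(zi)  # phi_n(ln y) with identity covariance
    arg = sum(d * zi for d, zi in zip(delta, z)) / math.sqrt(1.0 - norm2)
    return 2.0 * jacobian * kernel * norm_cdf(arg)


if __name__ == "__main__":
    # delta = 0 gives the (independent) log-normal density.
    print(log_cfusn_pdf_m1([2.0], [0.0]), norm_pdf(math.log(2.0)) / 2.0)
    # A bivariate evaluation with an illustrative skewness vector.
    print(log_cfusn_pdf_m1([0.8, 1.5], [0.3, 0.4]))
```

Setting every entry of `delta` to zero recovers the log-normal density, in line with the statement that the log-normal family is a particular case.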

This distribution generalizes the multivariate log-SN distribution defined by Marchenko and Genton (2010) by assuming a -variate skewing function. If in (8) we take and assume we obtain the family defined by Marchenko and Genton (2010) which general expression is given in . If is a matrix with all entries equal to zero we have the multivariate log-normal distribution. Another reason to study this distribution comes from results in Santos et al. (2013) summarized in the introduction. As it can be noticed, the distribution for the odds ratio given in (7) also belongs to the log-CFUSN family of distributions whenever the individuals under comparison have the same characteristics, that is, equal vector of covariates (), and the scale parameter for the distribution of the random effects is . In that case, where .

Figure 1 depicts the densities of for the case and some values of and . To simplify the presentation let be the matrix of ones of order and denote by the column vector of ones of order . Clearly the distribution allocates more mass to the tails when increases. Moreover, the shape of the densities becomes more flexible compared with (6).

Figure 1: Log-CFUSN densities for different values of and (left) and (right).

In order to show the effect of on the asymmetry of the distribution, Figures 2 and 3 show the contour plots for the log-CFUSN densities whenever and , respectively. In both cases we assume bivariate () log-CFUSN densities. In Figure 2 the following skewness matrices are assumed: , and . In Figure 3 the skewness matrices are , , and .

It is clear that the curves in Figures 2 and 3 move away from the origin when the entries of are positive and are more concentrated around the origin when these entries are negative. Similar behavior is noted in the contour curves of the distribution in Arellano-Valle and Genton (2005).

Figure 2: Contour plots for the log-CFUSN densities with and (top left), (top middle), (top right), (bottom left), (bottom middle), (bottom right).
Figure 3: Contour plots for the log-CFUSN densities with and and (top left), (top middle), (top right), (bottom left), (bottom middle), (bottom right).

It should also be noticed that the log-CFUSN family of distributions is a subclass of an extended class of log-skewed distributions with normal kernel, which can be built similarly from the family defined by Arellano-Valle and Azzalini (2006). If we consider the SUN family of distributions in (4), we can define the log-SUN family of distributions as follows.

Definition 2.

(Log-SUN family of distributions) Let and be random vectors such that . We say that has a log-unified-skew-normal distribution with parameters , , and as defined in (4) denoted by , if with pdf given in (4).

It follows, as a consequence of Definition 2, that the pdf of is given by

(9)

for

Particularly, if , where is the column vector of ones of order and it follows that with pdf given in (8).

2.1 Some properties of the Log-CFUSN family of distributions

We now present several properties of the log-CFUSN family of distributions, among them the mixed moments, the cdf, and the marginal and conditional distributions. We also establish conditions for independence in the log-CFUSN family of distributions. Proposition 1 provides the cdf for this family.

Proposition 1.

If , then its cdf is given by

(10)

where

The proof of Proposition 1 follows from Proposition 2.1 in Arellano-Valle and Genton (2005) by noticing that .

The mixed moments of a random vector can be expressed in terms of the moment generating function of a distribution. This can be seen in the following proposition.

Proposition 2.

If and , , then the mixed moments of are given by

(11)

The proof of Proposition 2 follows by noticing that . As , we have . The result follows from Proposition 2.3 in Arellano-Valle and Genton (2005).
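The link between mixed moments and the moment generating function can be checked numerically in the simplest case $n = m = 1$. The sketch below assumes the standard CFUSN moment generating function $M_Z(t) = 2^m \exp(t^\top t/2)\,\Phi_m(\Delta^\top t)$, so that for a scalar skewness $\delta$ the moment $E(Y^t) = E(e^{tZ})$ equals $2\exp(t^2/2)\Phi(\delta t)$. It compares this closed form with direct numerical integration against the univariate density $2\phi(z)\Phi(\lambda z)$, $\lambda = \delta/\sqrt{1-\delta^2}$. Function names and integration limits are illustrative assumptions.

```python
import math


def norm_pdf(z):
    """Standard normal pdf."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal cdf, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def moment_closed_form(t, delta):
    """E[Y^t] for n = m = 1 under the assumed mgf 2 exp(t^2 / 2) Phi(delta t)."""
    return 2.0 * math.exp(0.5 * t * t) * norm_cdf(delta * t)


def moment_numeric(t, delta, lo=-15.0, hi=15.0, steps=60000):
    """E[exp(t Z)] by the trapezoidal rule, Z with density 2 phi(z) Phi(lam z)."""
    lam = delta / math.sqrt(1.0 - delta * delta)

    def integrand(z):
        return math.exp(t * z) * 2.0 * norm_pdf(z) * norm_cdf(lam * z)

    h = (hi - lo) / steps
    total = 0.5 * (integrand(lo) + integrand(hi))
    for i in range(1, steps):
        total += integrand(lo + i * h)
    return total * h


if __name__ == "__main__":
    for t in (1.0, 2.0):
        for delta in (-0.5, 0.0, 0.7):
            print(t, delta, moment_closed_form(t, delta), moment_numeric(t, delta))
```

With `delta = 0` the closed form reduces to $\exp(t^2/2)$, the familiar moment of the standard log-normal distribution.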

Considering the result in , we can calculate the moments of a random vector with distribution . For example, if we consider , we have that

Considering these results it can be proved that the coefficient of asymmetry and kurtosis of

are given, respectively, by

(12)

and

(13)

Consequently, if and is a matrix with all entries equal to zero, that is, if then and .

Figure 4 depicts the asymmetry coefficient and kurtosis for the distribution. Observe that corresponds to the log normal case. It is clear, at least in the case , that asymmetry and kurtosis can change significantly depending on the choice of .


Figure 4: Asymmetry (left) and Kurtosis (right) for the distribution.

Table 1 displays the asymmetry and kurtosis coefficients of the as a function of and it suggests a monotonic decreasing behavior of these quantities as increases. Although the behavior of these coefficients depends on , particularly, for and the asymmetry and kurtosis coefficients of the are both smaller than those obtained for the for all considered in the study.

Table 1: Kurtosis and asymmetry for the . (Column headers: Kurtosis, Asymmetry, Kurtosis, Asymmetry; rows indexed 1 to 5; numerical entries missing.)

Similar to what is observed for the CFUSN family of distributions, the log-CFUSN is closed under marginalization but not under conditioning. The next result establishes that the distribution is closed under marginalization. The proof of this result will be omitted. It follows immediately from Proposition 2.6 in Arellano-Valle and Genton (2005) and Definition 1.

Proposition 3.

Let and consider the partitions and , where and has dimensions and , respectively, and . Then, for , with pdf given by

(14)

It is also possible to derive conditions for independence under the log-CFUSN family of distributions by assuming some constraints on the partitions defined in Proposition 3.

Proposition 4.

Let and consider the partitions and , where and has dimensions and , respectively, and . Let , where has dimension , , and , . Then, under each of the conditions below on the shape matrix , the random vectors and are independent

  • and, in this case, ;

  • and, in this case, and .

The proof of Proposition 4 is straightforward from Proposition 2.7 in Arellano-Valle and Genton (2005) and thus is omitted. We now obtain the conditional distributions under the family.

Proposition 5.

Let and consider the partitions and , where and has dimensions and , respectively, and . Then, the conditional pdf of given , is given by

(15)

The proof follows from results of probability calculus and by noticing that, given , we have that .

Notice that the log-CFUSN family of distributions per se is not closed under conditioning. However, if considered as a particular subclass of the log-SUN family of distributions, we notice from (15) and (9) that , where

2.2 A location-scale extension of the log-CFUSN distribution

A more flexible class of distributions is obtained by including location and scale parameters. Usually, this is done by considering a linear transformation of a variable with the standard distribution. Following this principle, we introduce the location-scale extension of the distribution as follows.

Assume that and define the linear transformation , where is an vector and is an positive definite matrix. As shown by Arellano-Valle and Genton (2005), the pdf of is

(16)

Let us consider the transformation . By definition, has a location-scale log-CFUSN distribution denoted by and its pdf is

(17)

It is important to note that if , that is, if we are skewing an independent -variate normal distribution, the distribution in (17) can be obtained from the log-SUN distribution given in (9) by assuming , , and that is, we have that .

Marginal and conditional distributions in the location-scale log-CFUSN class of distributions are not easily obtainable. However, under some particular structures for we can derive such results. Let , as defined in Expression 2.11 in Arellano-Valle and Genton (2005), and consider the partitions

where , and have dimensions , and , , respectively, and . Suppose also that is a diagonal matrix such that

where has dimension . Under these conditions, it follows that , that is, the location-scale log-CFUSN family of distributions preserves closure under marginalization.

It also follows that the conditional distribution of is given by

and .

2.3 Stochastic representation

Stochastic representations of skewed distributions are useful, for instance, to generate samples from those distributions more easily. They also play a very important role in inference if we are interested in applying MCMC or EM methods.

A stochastic representation of the log-CFUSN family is straightforward from the marginal stochastic representation of the CFUSN family given in Arellano-Valle and Genton (2005).

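Assume that $Z \sim CFUSN_{n,m}(\Delta)$, where $\|\Delta a\| < 1$ for any unitary vector $a \in \mathbb{R}^m$. Let $U \sim N_m(0, I_m)$ and $V \sim N_n(0, I_n)$, where $U$ and $V$ are independent column random vectors of order $m$ and $n$, respectively. Denote by $|U|$ the vector $(|U_1|, \ldots, |U_m|)^\top$. Arellano-Valle and Genton (2005) prove that the marginal representation of $Z$ is

$$Z \stackrel{d}{=} \Delta |U| + (I_n - \Delta \Delta^\top)^{1/2}\, V. \qquad (19)$$

If $Y \sim LCFUSN_{n,m}(\Delta)$ then its marginal representation follows as a consequence of (19) by noticing that $Y \stackrel{d}{=} \exp(\Delta |U|) \odot W$, where $\odot$ denotes the componentwise product and $W = \exp\{(I_n - \Delta \Delta^\top)^{1/2} V\}$ has a multivariate log-normal distribution with a null location parameter and scale matrix equal to $I_n - \Delta \Delta^\top$.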
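The stochastic representation suggests a direct sampler. The sketch below assumes the representation $Z \stackrel{d}{=} \Delta|U| + (I_n - \Delta\Delta^\top)^{1/2} V$ with independent standard normal vectors $U$ (order $m$) and $V$ (order $n$), and returns $Y = \exp(Z)$ componentwise. The function name `sample_log_cfusn` and the example skewness matrix are hypothetical. A Cholesky factor is used in place of the symmetric square root; since $V$ is spherical and independent of $U$, any factor $L$ with $LL^\top = I_n - \Delta\Delta^\top$ yields the same distribution.

```python
import numpy as np


def sample_log_cfusn(delta, size, rng):
    """Draw `size` vectors Y = exp(Z), with Z = Delta |U| + L V.

    delta : (n, m) array-like skewness matrix with ||Delta a|| < 1 for unit a
    size  : number of draws
    rng   : numpy.random.Generator
    Returns an array of shape (size, n) with strictly positive entries.
    """
    delta = np.asarray(delta, dtype=float)
    n, m = delta.shape
    cov = np.eye(n) - delta @ delta.T
    # Positive definite under the norm condition on Delta; raises otherwise.
    chol = np.linalg.cholesky(cov)
    u_abs = np.abs(rng.standard_normal((size, m)))  # half-normal components
    v = rng.standard_normal((size, n))
    z = u_abs @ delta.T + v @ chol.T
    return np.exp(z)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    delta = np.array([[0.5, 0.3], [-0.2, 0.4]])  # illustrative values only
    y = sample_log_cfusn(delta, 200_000, rng)
    # E[Z] = sqrt(2 / pi) * Delta 1_m, since E|U_j| = sqrt(2 / pi).
    print(np.log(y).mean(axis=0), np.sqrt(2.0 / np.pi) * delta.sum(axis=1))
```

The comparison of the sample mean of $\ln Y$ with $\sqrt{2/\pi}\,\Delta 1_m$ is a quick sanity check of the sampler; the same draws of $|U|$ are the latent variables a data augmentation scheme would condition on.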

3 Some aspects of Bayesian Inference in the LCFUSN Family

Let with pdf given in (17). Define the matrices and . Therefore, it follows that the likelihood function is given by