The construction of new parametric distributions has received considerable attention in recent years. This growing interest is motivated by datasets that often exhibit strong skewness, heavy tails, bimodality and other characteristics that are not well fitted by the usual distributions, such as the normal, Student-t, log-normal, exponential and many others. The main goal is to build more flexible parametric distributions with additional parameters that control such characteristics. Compared with finite mixtures of distributions (see Lin et al., 2007b; Cabral et al., 2008, for instance) or nonparametric methods (for recent surveys on Bayesian nonparametrics see Müller and Quintana, 2004; Walker, 2005; Dey et al., 1998), one advantage of this approach is that, in general, more parsimonious models are obtained and, as a consequence, the inference process tends to become simpler.
It is not feasible to mention all developments in this area in recent years. Arnold and Beaver (2002), Genton (2004) and Azzalini (2005) review several recent works in the area and are important sources for a detailed discussion of the properties of such distributions. Further advances can be found in Genton and Loperfido (2005), Arellano-Valle and Azzalini (2006), Arellano-Valle et al. (2006), Arnold et al. (2009), Elal-Olivero et al. (2009), Arellano-Valle et al. (2010), Marchenko and Genton (2010), Gómez et al. (2011), Bolfarine et al. (2011), Rocha et al. (2013) and many others.
Azzalini (1985) introduced the so-called skew-normal (SN) family of distributions, whose probability density function (pdf) is
$$ f(z \mid \mu, \sigma, \lambda) = \frac{2}{\sigma}\,\phi\!\left(\frac{z-\mu}{\sigma}\right)\Phi\!\left(\lambda\,\frac{z-\mu}{\sigma}\right), \quad z \in \mathbb{R}, \tag{1} $$
where $\mu \in \mathbb{R}$ and $\sigma > 0$ are the location and scale parameters, respectively, $\lambda \in \mathbb{R}$ is the skewness parameter, and $\phi$ and $\Phi$ denote, respectively, the pdf and the cumulative distribution function (cdf) of the standard normal distribution. The family in (1) extends the normal one by introducing an extra parameter to control the asymmetry of the distribution and has the normal family as a particular subclass whenever $\lambda$ equals zero. It also preserves some nice properties of the normal family. Another extension of the univariate distribution in (1) recently appeared in Martinez-Flores et al. (2014), who introduced the so-called skew-normal alpha-power distribution. The multivariate analog of the SN distribution was introduced by Azzalini and Dalla Valle (1996).
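The role of the skewness parameter can be checked numerically. The following is a minimal sketch, assuming the standard Azzalini form of the SN density, $(2/\sigma)\,\phi((z-\mu)/\sigma)\,\Phi(\lambda(z-\mu)/\sigma)$; the function names (`norm_pdf`, `norm_cdf`, `sn_pdf`, `integrate`) are illustrative and not part of the paper.

```python
import math


def norm_pdf(x):
    """Standard normal pdf."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    """Standard normal cdf, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def sn_pdf(z, mu=0.0, sigma=1.0, lam=0.0):
    """Skew-normal density (assumed form): (2/sigma) * phi(u) * Phi(lam * u)."""
    u = (z - mu) / sigma
    return 2.0 / sigma * norm_pdf(u) * norm_cdf(lam * u)


def integrate(f, a, b, steps=20000):
    """Plain midpoint rule; adequate for smooth, rapidly decaying densities."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


# The density integrates to one for any skewness value.
total = integrate(lambda z: sn_pdf(z, mu=1.0, sigma=2.0, lam=3.0), -30.0, 30.0)

# With zero skewness the SN density reduces to the normal density.
same_as_normal = abs(sn_pdf(0.3, lam=0.0) - norm_pdf(0.3)) < 1e-12
```

With `lam=0.0` the factor `2 * Phi(0)` equals one, which is why the normal family appears as a subclass.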
In a more general setting, Genton and Loperfido (2005) introduced the class of generalized multivariate skew elliptical (GSE) distributions, whose pdf is
$$ f(z) = 2\, f_n(z - \xi)\, \pi(z - \xi), \quad z \in \mathbb{R}^n, \tag{2} $$
where $f_n$ is the pdf of an $n$-dimensional elliptical distribution and $\pi$ is a skewing function satisfying $0 \le \pi(z) \le 1$ and $\pi(-z) = 1 - \pi(z)$, for all $z \in \mathbb{R}^n$. Many properties of the SN distribution carry over to every distribution in this class. In particular, Genton and Loperfido (2005) prove that the distributions of quadratic forms in the GSE family do not depend on the skewing function, and conditions for their independence can be found in Huang et al. (2013). It should also be mentioned that the multivariate SN families of distributions defined by Azzalini and Dalla Valle (1996) and Azzalini and Capitanio (1999) and the family of skew-spherical (elliptical) distributions defined in Branco and Dey (2001) are subclasses of (2).
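Following Arellano-Valle and Genton (2005), a random vector $\mathbf{Z}$ has an $n$-variate canonical fundamental skew-normal (CFUSN) distribution with an $n \times m$ skewness matrix $\Delta$, which will be denoted by $\mathbf{Z} \sim CFUSN_{n,m}(\Delta)$, if its density is given by
$$ f(z \mid \Delta) = 2^m\, \phi_n(z)\, \Phi_m(\Delta' z \mid I_m - \Delta'\Delta), \quad z \in \mathbb{R}^n, \tag{3} $$
where $\Delta$ is such that $\|\Delta a\| < 1$, for all unit vectors $a \in \mathbb{R}^m$, and $\|\cdot\|$ denotes the Euclidean norm. Throughout this paper, we denote by $\phi_n(\cdot \mid \mu, \Sigma)$ the pdf associated with the multivariate $N_n(\mu, \Sigma)$ distribution, and by $\Phi_n(\cdot \mid \mu, \Sigma)$ the corresponding cdf. If $\mu = 0$ (respectively $\mu = 0$ and $\Sigma = I_n$) these functions will be denoted by $\phi_n(\cdot \mid \Sigma)$ and $\Phi_n(\cdot \mid \Sigma)$ (respectively $\phi_n(\cdot)$ and $\Phi_n(\cdot)$). For simplicity, $\phi$ and $\Phi$ will be used in the univariate case.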
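Several classes of SN distributions have been defined in the literature. A unification of these families was proposed by Arellano-Valle and Azzalini (2006), who define the unified skew-normal (SUN) family of distributions. A random vector $\mathbf{Y} \sim SUN_{n,m}(\xi, \gamma, \bar{\omega}, \Omega^*)$ if its pdf is
$$ f(y) = \phi_n(y - \xi \mid \Omega)\, \frac{\Phi_m\!\left(\gamma + \Delta' \bar{\Omega}^{-1} \omega^{-1} (y - \xi) \mid \Gamma - \Delta' \bar{\Omega}^{-1} \Delta\right)}{\Phi_m(\gamma \mid \Gamma)}, \quad y \in \mathbb{R}^n, \tag{4} $$
where the vectors $\xi \in \mathbb{R}^n$ and $\gamma \in \mathbb{R}^m$, $\bar{\omega}$ is the vector of the diagonal elements of $\omega$, $\omega$ is a diagonal matrix formed by the standard deviations of $\Omega = \omega \bar{\Omega} \omega$, and $\bar{\Omega}$, $\Gamma$ and $\Delta$ are, respectively, $n \times n$, $m \times m$ and $n \times m$ matrices such that
$$ \Omega^* = \begin{pmatrix} \Gamma & \Delta' \\ \Delta & \bar{\Omega} \end{pmatrix} $$
is a correlation matrix. For another unification of multivariate skewed distributions see Abtahi and Towhidi (2013).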
In limit cases, some of these distributions concentrate their probability mass on positive (or negative) values. The half-normal distribution, for instance, is obtained from (1) by letting the skewness parameter tend to infinity. Because of this, such families of distributions have also been considered to model data with positive support, such as income, precipitation, pollutant concentrations and so on. However, such limit distributions are not flexible enough to accommodate the diversity of shapes of positive (or negative) data. In the univariate context, gamma, exponential and log-normal distributions are commonly used to model non-negative random variables. Less conventional analyses can be done using the log-SN and log-skew-t distributions introduced by Azzalini et al. (2003) or the log-power-normal distribution introduced by Martinez-Flores et al. (2012).
In the multivariate context, however, distributions with positive support are usually intractable, with the exception of the multivariate log-normal distribution. With the above problem in mind, Marchenko and Genton (2010) built the multivariate log-skew elliptical family of distributions as follows. Denote by the family of -dimensional elliptical distributions (with existing pdf) with generating function , defining a -dimensional spherical density, a location column vector , and a x positive definite dispersion matrix . If , then its pdf is , where , (Fang et al., 1990). Consider the class of skew elliptical distributions with pdf given by
where is a shape parameter, is a x scale matrix, is the pdf of a -dimensional random vector of and is the cdf of the with generating function . The distribution in (5) is denoted by . Consider the transformation , where . Then, has log-skew elliptical distribution denoted by with pdf
It is immediate that the multivariate skew-normal (Azzalini and Dalla Valle, 1996) and skew-t (Azzalini and Capitanio, 2003) distributions are special cases of (5). Consequently, the log-skewed class of distributions in (6) introduced by Marchenko and Genton (2010) also defines particular classes of multivariate log-SN and log-skew-t distributions and has, as a special case, the multivariate log-normal family of distributions.
Our main motivation for introducing new classes of multivariate log-skewed distributions comes from some results that recently appeared in Santos et al. (2013). That paper focused on parameter interpretation in mixed logistic regression models, which is done through the so-called odds ratio, as in the usual logistic regression model. However, once the random effects are considered, the odds ratio comparing two individuals in two different clusters becomes a random variable that depends on the random effects related to the two clusters under comparison (Larsen et al., 2000). Because of this, Larsen et al. (2000) propose to interpret the odds ratio in terms of the median of its distribution in order to quantify appropriately the heterogeneity among the different clusters. If the random effects are independent and identically distributed (iid) with then Santos et al. (2013) prove that the odds ratio has distribution with pdf given by
where , and . Similar distributions were also obtained under independent skew-normally distributed random effects. The univariate log-skewed distribution in (7) belongs neither to the class of distributions defined by Marchenko and Genton (2010) nor to that introduced by Azzalini et al. (2003). Moreover, Santos et al. (2013) obtained only its median and did not study any of its other properties.
In this paper, we introduce the multivariate log-CFUSN and log-SUN families of distributions. We explore their relationship and study some properties of the log-CFUSN family of distributions. These classes of distributions have as subclasses the multivariate log-skew-normal family introduced by Marchenko and Genton (2010), the log-SN family of Azzalini et al. (2003) and the family of distributions given in (7). We also discuss some issues related to Bayesian inference in this family. To illustrate its use, we analyze the USA monthly precipitation data recorded from 1895 to 2007, which are available at the National Climatic Data Center (NCDC).
This paper is organized as follows. In Section 2 we define the log-CFUSN and the log-SUN families of distributions and establish some of the probabilistic properties of the log-CFUSN family of distributions. Bayesian inference in the log-CFUSN family is discussed in Section 3. In Section 4 we present some data analysis using the proposed log-CFUSN family of distributions. Finally, Section 5 finishes the paper with a discussion and our main conclusions.
2 Log-SUN and Log-CFUSN families of distributions
Under normal theory, the log-normal family of distributions is obtained through the logarithmic transformation: if a random variable is log-normally distributed, then its logarithm has a normal distribution. Following this idea, in this section we formally define the log-canonical-fundamental-skew-normal (log-CFUSN) and the log-unified-skew-normal (log-SUN) families of distributions and explore some properties of the log-CFUSN family, such as conditional and marginal distributions, mixed moments and stochastic representations.
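Let $\mathbf{Y} = (Y_1, \ldots, Y_n)'$ be an $n \times 1$ random vector and consider the transformations $\exp(\mathbf{Y}) = (\exp(Y_1), \ldots, \exp(Y_n))'$ and $\log(\mathbf{Y}) = (\log Y_1, \ldots, \log Y_n)'$.
Definition 1 (Log-CFUSN family of distributions). Let $\mathbf{Z}$ and $\mathbf{Y}$ be $n \times 1$ random vectors such that $\mathbf{Z} = \log(\mathbf{Y})$. We say that $\mathbf{Y}$ has a log-canonical-fundamental-skew-normal distribution with skewness matrix $\Delta$, denoted by $\mathbf{Y} \sim LCFUSN_{n,m}(\Delta)$, if $\mathbf{Z} \sim CFUSN_{n,m}(\Delta)$ with pdf given in (3).
Thus, from Definition 1, we have that $\mathbf{Y} = \exp(\mathbf{Z})$ and, using the change-of-variables formula, we can prove that the pdf of the log-CFUSN family of distributions with skewness matrix $\Delta$ is
$$ f(y \mid \Delta) = 2^m\, \phi_n(\log y)\, \Phi_m(\Delta' \log y \mid I_m - \Delta'\Delta) \left( \prod_{i=1}^{n} y_i \right)^{-1}, \quad y \in \mathbb{R}_+^n, \tag{8} $$
where $\Delta$ is an $n \times m$ matrix such that $\|\Delta a\| < 1$, for all unit vectors $a \in \mathbb{R}^m$.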
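The change of variables behind the log-CFUSN density can be illustrated in the simplest case. The sketch below assumes $n = m = 1$, so that the skewness matrix is a scalar `delta` with $|\delta| < 1$, the CFUSN density is taken to be $2\,\phi(z)\,\Phi(\delta z / \sqrt{1-\delta^2})$, and the log-CFUSN density follows by dividing by the Jacobian $y$. The function names are hypothetical, and the general multivariate case would additionally need a multivariate normal cdf.

```python
import math


def norm_pdf(x):
    """Standard normal pdf."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def cfusn_pdf(z, delta):
    """Assumed CFUSN density for n = m = 1: 2 * phi(z) * Phi(delta * z | 1 - delta^2)."""
    if not -1.0 < delta < 1.0:
        raise ValueError("delta must satisfy |delta| < 1")
    return 2.0 * norm_pdf(z) * norm_cdf(delta * z / math.sqrt(1.0 - delta * delta))


def lcfusn_pdf(y, delta):
    """Density of Y = exp(Z): evaluate the CFUSN density at log(y), divide by y."""
    if y <= 0.0:
        return 0.0
    return cfusn_pdf(math.log(y), delta) / y


def total_mass(delta, lo=-12.0, hi=12.0, steps=24000):
    """Integrate the log-CFUSN density over y > 0 on a grid in t = log(y), dy = y dt."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        y = math.exp(lo + (i + 0.5) * h)
        total += lcfusn_pdf(y, delta) * y * h
    return total
```

Setting `delta = 0.0` recovers the standard log-normal density $\phi(\log y)/y$, in line with the zero-skewness special case.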
This distribution generalizes the multivariate log-SN distribution defined by Marchenko and Genton (2010) by assuming a -variate skewing function. If in (8) we take and assume we obtain the family defined by Marchenko and Genton (2010), whose general expression is given in . If is a matrix with all entries equal to zero, we have the multivariate log-normal distribution. Another reason to study this distribution comes from the results in Santos et al. (2013) summarized in the introduction. As can be seen, the distribution of the odds ratio given in (7) also belongs to the log-CFUSN family of distributions whenever the individuals under comparison have the same characteristics, that is, an equal vector of covariates (), and the scale parameter for the distribution of the random effects is . In that case, where .
Figure 1 depicts the densities of for the case and some values of and . To simplify the presentation let be the matrix of ones of order and denote by the column vector of ones of order . Clearly, the distribution allocates more mass to the tails when increases. Moreover, the shape of the densities is more flexible than that of (6).
In order to show the effect of on the asymmetry of the distribution, Figures 2 and 3 show the contour plots for the log-CFUSN densities whenever and , respectively. In both cases we assume bivariate () log-CFUSN densities. In Figure 2 the following skewness matrices are assumed: , and . In Figure 3 the skewness matrices are , , and .
It is clear that the curves in Figures 2 and 3 move away from the origin when the entries of are positive and are more concentrated around the origin when these entries are negative. Similar behavior is noted in the contour curves of the distribution in Arellano-Valle and Genton (2005).
It should also be noted that the log-CFUSN family of distributions is a subclass of an extended class of log-skewed distributions with normal kernel, which can be built similarly from the family defined by Arellano-Valle and Azzalini (2006). If we consider the SUN family of distributions in (4), we can define the log-SUN family of distributions as follows.
It follows, as a consequence of Definition 2, that the pdf of is given by
Particularly, if , where is the column vector of ones of order and it follows that with pdf given in (8).
2.1 Some properties of the Log-CFUSN family of distributions
We now present several properties of the log-CFUSN family of distributions, among them the mixed moments, the cdf, and the marginal and conditional distributions. We also establish conditions for independence in the log-CFUSN family of distributions. Proposition 1 provides the cdf for this family.
If , then its cdf is given by
The mixed moments of a random vector can be expressed in terms of the moment generating function of a distribution. This can be seen in the following proposition.
If and , , then the mixed moments of are given by
Considering the result in , we can calculate the moments of a random vector with distribution . For example, if we consider , we have that
Considering these results, it can be proved that the asymmetry and kurtosis coefficients of are given, respectively, by
Consequently, if and is a matrix with all entries equal to zero, that is, if then and .
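The way moments of the log-transformed variable follow from a moment generating function can be sketched for the univariate case. The example assumes $n = 1$ with skewness entries $\delta_1, \ldots, \delta_m$ satisfying $\sum_j \delta_j^2 < 1$, and assumes the CFUSN moment generating function $2^m \exp(t^2/2) \prod_j \Phi(t\,\delta_j)$, which is our reading of the result used in the text, so that $E[Y^k]$ is that function evaluated at $t = k$. The helper names are illustrative, and the kurtosis returned is the non-excess version.

```python
import math


def norm_cdf(x):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def raw_moment(k, deltas):
    """E[Y^k] for n = 1 (assumed form): 2^m * exp(k^2 / 2) * prod_j Phi(k * delta_j)."""
    if sum(d * d for d in deltas) >= 1.0:
        raise ValueError("the squared skewness entries must sum to less than 1")
    value = math.exp(0.5 * k * k)
    for d in deltas:
        value *= 2.0 * norm_cdf(k * d)
    return value


def skewness_kurtosis(deltas):
    """Asymmetry and (non-excess) kurtosis coefficients from the first four raw moments."""
    m1, m2, m3, m4 = (raw_moment(k, deltas) for k in (1, 2, 3, 4))
    var = m2 - m1 ** 2
    mu3 = m3 - 3.0 * m1 * m2 + 2.0 * m1 ** 3
    mu4 = m4 - 4.0 * m1 * m3 + 6.0 * m1 ** 2 * m2 - 3.0 * m1 ** 4
    return mu3 / var ** 1.5, mu4 / var ** 2


# An empty list of skewness entries corresponds to the standard log-normal case.
lognormal_skew, lognormal_kurt = skewness_kurtosis([])

# One positive skewness entry (m = 1), for comparison.
skew_pos, kurt_pos = skewness_kurtosis([0.8])
```

With no skewness entries the output can be compared against the closed-form log-normal coefficients $(e + 2)\sqrt{e - 1}$ and $e^4 + 2e^3 + 3e^2 - 3$.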
Figure 4 depicts the asymmetry coefficient and kurtosis for the distribution. Observe that corresponds to the log-normal case. It is clear, at least in the case , that asymmetry and kurtosis can change significantly depending on the choice of .
Table 1 displays the asymmetry and kurtosis coefficients of the as a function of and it suggests a monotonic decreasing behavior of these quantities as increases. Although the behavior of these coefficients depends on , particularly, for and the asymmetry and kurtosis coefficients of the are both smaller than those obtained for the for all considered in the study.
Similar to what is observed for the CFUSN family of distributions, the log-CFUSN is closed under marginalization but not under conditioning. The next result establishes that the distribution is closed under marginalization. The proof of this result will be omitted. It follows immediately from Proposition 2.6 in Arellano-Valle and Genton (2005) and Definition 1.
Let and consider the partitions and , where and have dimensions and , respectively, and . Then, for , with pdf given by
It is also possible to derive conditions for independence under the log-CFUSN family of distributions by assuming some constraints on the partitions defined in Proposition 3.
Let and consider the partitions and , where and have dimensions and , respectively, and . Let , where has dimension , , and , . Then, under each of the conditions below on the shape matrix , the random vectors and are independent:
and, in this case, ;
and, in this case, and .
Let and consider the partitions and , where and have dimensions and , respectively, and . Then, the conditional pdf of given is given by
The proof follows from results of probability calculus and by noticing that, given , we have that .
2.2 A location-scale extension of the log-CFUSN distribution
A more flexible class of distributions is obtained by including location and scale parameters. Usually, this is done by considering a linear transformation of a variable with the standard distribution. Following this principle, we introduce the location-scale extension of the distribution as follows.
Assume that and define the linear transformation , where is an vector and is an positive definite matrix. As shown by Arellano-Valle and Genton (2005), the pdf of is
Let us consider the transformation . By definition, has a location-scale log-CFUSN distribution denoted by and its pdf is
It is important to note that if , that is, if we are skewing an independent -variate normal distribution, the distribution in (17) can be obtained from the log-SUN distribution given in (9) by assuming , , and that is, we have that .
Marginal and conditional distributions in the location-scale log-CFUSN class of distributions are not easily obtainable. However, under some particular structures for we can derive such results. Let , as defined in Expression 2.11 in Arellano-Valle and Genton (2005), and consider the partitions
where , and have dimensions , and , , respectively, and . Suppose also that is a diagonal matrix such that
where has dimension . Under these conditions, it follows that , that is, the location-scale log-CFUSN family of distributions is closed under marginalization.
It also follows that the conditional distribution of is given by
2.3 Stochastic representation
Stochastic representations of skewed distributions are useful, for instance, to generate samples from those distributions more easily. They also play a very important role in inference when MCMC or EM methods are applied.
A stochastic representation of the log-CFUSN family is straightforward from the marginal stochastic representation of the CFUSN family given in Arellano-Valle and Genton (2005).
Assume that , where for any unitary vector . Let , where and are independent column random vectors of order and , respectively. Denote by the vector . Arellano-Valle and Genton (2005) prove that the marginal representation of is
If then its marginal representation follows as a consequence of (19) by noticing that , where has a multivariate log-normal distribution with a null location parameter and scale matrix equal to .
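A stochastic representation of this kind translates directly into a sampler. The sketch below assumes the marginal representation $\mathbf{Z} = \Delta |\mathbf{U}| + (I_n - \Delta\Delta')^{1/2} \mathbf{V}$, with $\mathbf{U} \sim N_m(0, I_m)$ and $\mathbf{V} \sim N_n(0, I_n)$ independent, which is our reading of the result of Arellano-Valle and Genton (2005), and then sets $\mathbf{Y} = \exp(\mathbf{Z})$. A Cholesky factor is used in place of the symmetric square root, which leaves the distribution unchanged; `cholesky` and `sample_lcfusn` are hypothetical helper names.

```python
import math
import random


def cholesky(a):
    """Lower-triangular factor L with L L' = a; fails if a is not positive definite."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)  # ValueError if not positive definite
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sample_lcfusn(delta, size, rng):
    """Draw `size` vectors Y = exp(Delta |U| + B V), where B B' = I_n - Delta Delta'."""
    n, m = len(delta), len(delta[0])
    cov = [[(1.0 if i == k else 0.0) - sum(delta[i][j] * delta[k][j] for j in range(m))
            for k in range(n)] for i in range(n)]
    low = cholesky(cov)  # factor once, reuse for every draw
    draws = []
    for _ in range(size):
        u = [abs(rng.gauss(0.0, 1.0)) for _ in range(m)]  # half-normal components
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = [sum(delta[i][j] * u[j] for j in range(m))
             + sum(low[i][k] * v[k] for k in range(i + 1)) for i in range(n)]
        draws.append([math.exp(zi) for zi in z])
    return draws


rng = random.Random(0)
delta = [[0.5, 0.2], [-0.3, 0.4]]  # illustrative 2 x 2 skewness matrix
sample = sample_lcfusn(delta, 20000, rng)

# Sample means of log(Y_i); under the assumed representation these should be
# close to sqrt(2 / pi) times the row sums of delta.
mean_log = [sum(math.log(y[i]) for y in sample) / len(sample) for i in range(2)]
```

Factoring the matrix once outside the loop keeps the cost per draw linear in the number of matrix entries, which matters when such draws are embedded in MCMC or EM iterations.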
3 Some aspects of Bayesian Inference in the LCFUSN Family
Let with pdf given in (17). Define the matrices and . Therefore, it follows that the likelihood function is given by