1 Introduction
Compositional data are positive multivariate data which sum to the same constant, usually . In this case, their sample space is the standard simplex
(1) 
where denotes the number of variables (better known as components).
Compositional data are met in many different scientific fields. In sedimentology, for example, samples were taken from an Arctic lake and their composition of water, clay and sand were the quantities of interest. Data from oceanography studies involving Foraminiferal (a marine plankton species) compositions at different sea depths from oceanography were analyzed in Aitchison (2003, pg 399). Schnute & Haigh (2007) also analyzed marine compositional data through catch curve models for a quillback rockfish (Sebastes maliger
) population. In hydrochemistry, Otero et al. (2005) used regression analysis to draw conclusions about anthropogenic and geological pollution sources of rivers in Spain. Stewart & Field (2011) modeled compositional diet estimates with an abundance of zero values obtained through quantitative fatty acid signature analysis. In another biological setting, Ghosh & Chakrabarti (2009) were interested was in the classification of immunohistochemical data. Other applications areas of compositional data analysis include archaeometry (Baxter et al., 2005), where the composition of ancient glasses, for instance, is of interest, and in economics (Fry, Fry, & McLaren, 2000), where the focus is on the percentage of the household expenditure allocated to different products. Compositional data are also met in political science (Katz, & King, 1999) for modeling electoral data and in forensic science where the compositions of forensic glasses are compared and classified (Neocleous, Aitken, & Zadora, 2011). In demography, compositional data are met in multipledecrement life tables and the mortality rates amongst age groups are modeled (Oeppen, 2008). In a study of the brain, Prados et al. (2010) evaluated the diffusion anisotropy from diffusion tensor imaging using new measures derived from compositional data distances. Some recent areas of application include bioinformatics and specifically microbiome data analysis (Xia et al., 2013; Chen & Li, 2016; Shi, Zhang & Li, 2016). These examples illustrate the breadth of compositional data analysis applications and consequently the need for parametric models defined on the simplex.
The Dirichlet distribution is a natural distribution for such data due to its support being the simplex space. However, it has long been recognized that this distribution is not statistically rich and flexible enough to capture many types of variabilities (especially curvature) of compositional data and, for this reason, a variety of transformations have been proposed that map the data outside of the simplex. In Aitchison (1982) the logratio transformation approach was developed, and later the so called isometric logratio transformation methodology which was first proposed in Aitchison (2003, pg 90) and examined in detail by Egozcue et al. (2003). More recently, Tsagris, Preston & Wood (2011) suggested the transformation which includes the isometric transformation as a special case. The transformation is a BoxCox type transformation and has been successfully applied in regression analysis (Tsagris, 2015) and classification settings (Tsagris, Preston & Wood, 2016).
Regardless of the transformation chosen, the usual approach for modeling compositional data is to assume that the transformed data is multivariate normally distributed. While the transformation offers flexibility, a disadvantage of this transformation is that it maps the compositional data from the simplex () to a subset of , and not itself on which the multivariate normal density is defined. An improvement to this method can be obtained by using a folded model procedure, similar to the approach used in Scealy & Welsh (2014) and to the folded normal distribution in employed in Leone, Nelson & Nottingham (1961) and Johnson (1962). The folded normal distribution in
, for example, corresponds to taking the absolute value of a normal random variable and essentially “folding” the negative values to the positive side of the distribution. The model we propose here works in a similar fashion where values outside the simplex are mapped inside of it via a folding type transformation. An advantage of this approach over the aforementioned logratio methodology is that it allows one to fit any suitable multivariate distribution on
through the parameter .The paper is structured as follows. In Section 2 we describe the folding approach and in Section 3 we introduce the folded multivariate normal model for compositional data, along with the EM algorithm for maximum likelihood estimation (MLE), an algorithm for generating data from the proposed folded distribution, inference for
and an approach for principal component analysis. Simulation studies are conducted in Section
4 to assess the estimation accuracy of the parameters and two data sets are analyzed to further assess the performance of our model. Finally, concluding remarks may be found in Section 5.2 The folding technique
2.1 The transformation
For a composition , the centered logratio transformation is defined in [] as
(2) 
The sample space of Equation (2) is the set
(3) 
which is a subset of . Note that the zero sum constraint in Equation (2) is an obvious drawback of this transformation as it can lead to singularity issues. In order to remove the redundant dimension imposed by this constraint, one can apply the isometric logratio transformation
(4) 
where is the Helmert matrix (Lancaster, 1965) (an orthonormal matrix) after deletion of the first row. The left multiplication by the Helmert submatrix maps the data onto
thus, in effect, removing the zero sum constraint. The Helmert submatrix is a standard orthogonal matrix in shape analysis used to overcome singularity problems (Dryden & Mardia, 1998;, Le & Small, 1999).
Tsagris, Preston & Wood (2011) developed the transformation as a more general transformation than that in Equation (4). Let
(5) 
denote the power transformation for compositional data as defined by Aitchison (2003). In a manner analogous to Equations (24), first define
(6) 
The sample space of Equation (6) is then the set
(7) 
Note that the inverse of Equation (6) is as follows
(8) 
for . Finally, the transformation is defined as
(9) 
The transformation in Equation (9) is a onetoone transformation which maps data inside the simplex onto a subset of and vice versa for . The corresponding sample space of Equation (9) is
(10) 
For , the inverse transformation from to is given by where is given in Equation (8).
For convenience purposes we allow to lie within . From Equations (5) and (6), when , the simplex is linearly expanded as the values of the components are simply multiplied by a scalar and then centered. When , the inverse of the values of the components are multiplied by a scalar and then centered.
If we assume that the transformed data (for any value of ) follow a multivariate normal distribution, then a way to choose the value of is via maximum likelihood estimation. However, given that the multivariate normal distribution is defined on and that
, this approach might neglect an important amount of volume (probability) of the multivariate normal. The same problem arose in Scealy & Welsh (2011b), who then developed the folded Kent distribution (Scealy & Welsh, 2014) and we propose a similar solution here.
2.2 The folding transformation
Let in Equation (9) for and some value of , then the inverse transformation provides a transformation from to . Now suppose we have a point (that is, outside of ) and we want to map it inside the simplex . It can be shown (Tsagris, 2013) that the following transformation maps to
(11) 
where is defined in Equation (8) and .
Note that the inverse of Equation (11) is the dimensional vector
(12) 
where and is defined in Equation (6).
In summary, we propose the following folding transformation from to
(13) 
where and .
3 The folded multivariate normal distribution
3.1 Definition
The distribution of in Equation (13) can be derived as a mixture distribution if we assume that is multivariate normally distributed with parameters and , that is if . Motivated by the folded normal distribution in Leone (1961), we define the distribution of on and refer to it as the folded multivariate normal distribution. The term folded is a combination of the transformation and the folding type approach that is applied.
The distribution of can be written as
(14) 
where and refer to the densities of and respectively when , and is the probability that .
To specify the density , we require the relevant Jacobians of the transformations and . These are given in the following two Lemmas with corresponding proofs in the Appendix.
Lemma 3.1
The Jacobian of is
(15) 
Lemma 3.2
The Jacobian of is
From the two lemmas, it follows that
where , , and is given in Equation (9). We will refer to the density in Equation 3.1 as the folded multivariate normal distribution.
Although we have indicated that , it is important to note that boundaries on the simplex (that is, zero values) are not allowed due to the log of being undefined in such cases. This potential limitation also occurs with conventional transformations for compositional data analysis as well, including the isometric transformation in (Equation 4). Recent approaches to handle zero values include mapping the data onto the hypersphere (Scealy and Welsh, 2011b, Scealy and Welsh), assuming latent variables (Butler & Glasbey, 2008) or conditioning on the zero values (Stewart & Field 2011; Tsagris & Stewart, 2018).
A few special cases are worthy of mention. First, when , the second term in Equation (3.1) vanishes and we end up with the normal distribution []. When , . Finally, when , Equation (3.1) reduces to the multivariate logistic normal on ([])
where is defined in Equation (4). The density function in Equation (3.1) is that of the logistic normal on . This leads us to the following Definition which generalizes Aitchison’s (2003) Definition 6.2.
A Dpart composition is said to follow the folded multivariate normal distribution if follows a multivariate normal distribution in where
(18) 
and is defined in Equation (9).
The generic power transformation Equation (9) includes the isometric logratio transformation in Equation (4) and thus the same concepts can be used to build the model. This means that once the data are transformed via Equation 18, a multivariate normality test can be applied. However, numerical optimization is required for obtaining the maximum likelihood estimates of the mean vector and covariance matrix, as described in Section 3.3.
The folded multivariate normal distribution resolves the problem associated with the assumption of multivariate normality of the transformed data (Tsagris, Preston & Wood, 2011); the ignored probability left outside the simplex due to the sample space of the transformation being a subset of . With this distribution, any ignored probability is folded back onto the simplex, and hence the density has the form of Equation (3.1).
3.2 Contour plots of the folded bivariate normal distribution
We now consider a visual illustration of the bivariate normal distribution with and without folding. In particular, we examine contours of the normal distribution in and compare these to the contours of the folded model in Equation 3.1 plotted on the simplex, with . We consider two settings in which in both cases, but use two different covariances matrices as follows
(19) 
The mean vector was selected by applying the transformation in Equation (9) with to a subcomposition formed by the first three components of the Hongite data (artificial dataset used by Aitchison, 2003). The elements of the first covariance matrix () were chosen so that the correlation was positive, whereas the second covariance matrix is such that .
For each combination of parameters we calculated the density of the normal and the folded normal for a grid of twodimensional vectors and then plotted their contours. While for the unconstrained normal case, the density was simply calculated for all grid points, for the folded model, the transformation in Equation 13 was first applied (with ) and then the density in Equation 3.1 was calculated. The contour plots are presented in Figure 1.
(a)  (b) 
(c)  (d) 
Figures (a) and (c) depict the contours of the multivariate normal distribution (defined in ) while Figures (b) and (d) show the contours of the folded normal (defined on the simplex). The first row corresponds to in Equation (19) while the second row is derived from . The triangles in the second column (Figures (b) and (d)) display the simplex while the corresponding triangles in Figures (a) and (c) were obtained through the transformation in Equation 9.
What is perhaps most evident from the contour plots is that points falling outside the triangle in Figures (a) and (c) result in modes inside the simplex in Figures (b) and (d) respectively. When there is a high probability of being left outside of two or more sides of the triangle (or faces of the pyramid or hyperpyramid in higher dimensions), as in Figures 1(c) and 1(d), the contours of the folding model will have a somewhat peculiar shape due to a multimodal distribution arising on the simplex. The multimodality depends upon the allocation of the probability left outside the simplex along the components. If only one side of the simplex has probability left outside (as in Figure 1(a)) then the resulting distribution will be unimodal (see Figure 1(b)).
An estimate of the probability left outside each side of the triangle (or the simplex) may be obtained through simulation in a straightforward manner. To accomplish this for our example, we generated data from the multivariate normal distribution with the parameters previously specified and applied the inverse of the transformation (that is, ) in Equation 13. For the points left outside of the simplex, we simply calculated how many are outside of each edge of the triangle and divided by the total number of the simulated data points.
If we partition the missed probability into three parts, where each part refers to one of the three components, then we obtain the values for the case when (see Figures 1(a) and 1(b)). In this case, most of the probability is left outside the third component and the total probability left outside of the simplex is therefore . The total probability left outside of the simplex when (see Figures 1(c) and 1(d)) is and the allocation to the three components is different than in the previous example, namely (). Since, in this case, all of the estimates are relatively high, multimodality appears in Figure 1(d).
3.3 Maximum likelihood estimation
The estimation of the parameters of the folded model on (that is, ) is not too complicated mainly because there are not too many parameters ( for the mean vector, for the covariance matrix and two more including the value of and the probability ) involved in the maximization of the loglikelihood. We can use the “simplex” algorithm of Nelder & Mead (1965) to maximize the logarithm of Equation (3.1), available via the command optim in R (R Core Team, 2015). This algorithm is generally robust and is derivative free (Venables & Ripley, 2002). However when moving to higher dimensions (roughly = 5 or larger), the maximization is not straightforward.
For this reason, we use the EM algorithm (McLachlan & Krishnan, 2007) to maximize the loglikelihood corresponding to Equation 3.1. Let denote the set of compositional vectors in . Following Jung, Foskey & Marron (2011) who applied the EM algorithm in the context of a univariate folded normal we propose the algorithm below.
EM algorithm

For a fixed value of , apply the transformation without the Helmert submatrix multiplication (that is, Equation 6) to the compositional data to obtain the matrix .

Calculate for each vector , for .

Left multiply each by the Helmert submatrix to obtain . Then multiply each by to obtain (see Equation 18). The and are the transformed compositional data onto and , for respectively.

Choose initial values for the estimates of the parameters, for example
and
where is the estimated conditional expectation of the indicator function that indicates whether the th observation belongs to or .

Update all the parameters each time, for

Repeat Step 5 until the change between two successive loglikelihood
(20) values is less than a tolerance value.
The above described procedure should be repeated for a grid of values of , ranging from to , choosing the value of which maximizes the loglikelihood. A more efficient search for the best alpha is via Brent’s algorithm (Brent, 2013). When , the MLE estimates of the transformed data are obtained directly; no EM algorithm is necessary as all the probability is retained with the simplex. The fact that (9) tends to (4) as ensures the continuity of the loglikelihood at .
3.4 Generating data from the folded multivariate normal distribution
The algorithm below describes how to simulate a random vector from the folded model in Equation (3.1) when . The case when , is considered subsequently.

Choose and , where .

Generate a by 1 vector from a .

As per Equation 13, determine whether . To do this, compute . If for all components of and , then and let . Otherwise, and let where .
When , a the following simplified algorithm is used:

Choose and .

Generate a by 1 vector from a .

Compute .

(21)
3.5 Inference for
In the previous two subsections, simplifications arose if and it may, consequently, be worthwhile to test whether the simpler multivariate normal in Equation (3.1) (corresponding to ) is appropriate for the data at hand.
Consider the hypothesis test: versus . While one option is to use a loglikelihood ratio test, depending on the alternative hypothesis (a. & , b. & , c. & or d. ,
), the degrees of freedom will vary. We have not encountered case d. so far in our data analyses but, in this case, would recommend using a Dirichlet model. In fact, if we generate data from a Dirichlet distribution, case d. is expected to arise from the MLE of Equation (
3.1). An alternative to the log likelihood ratio test is to use a parametric bootstrap such as the hypothesis testing procedure described below.
For a given dataset, estimate the value of
obtained via the EM algorithm. This is the observed test statistic denoted by
. 
Apply the inverse of the isometric transformation with in Equation (4) to the data in Step 2. to form a new sample of compositions acquired with
. That is, the data has been transformed under the null hypothesis.

Resample times from this new composition and each time estimate the value of for .
The value is then given by (Davison & Hinkley, 1997)
(22) 
where is the indicator function.
One might argue that the value of
itself is not a pivotal statistic, in fact it is not even standardized, so a second bootstrap should be performed to obtain the standard error of the estimate for each bootstrap sample. In order to avoid this extra computational burden, the parametric bootstrap hypothesis testing could alternatively be carried out using the loglikelihood ratio test statistic in Steps 1 and 4 above.
Inference for
could also be achieved via the construction of bootstrap confidence intervals. For this approach, simply resample the observations (compositional vectors) from the compositional dataset and find the value of
for which the loglikelihood derived from Equation (3.1) is maximized for each bootstrap sample. By repeating this procedure many times, we can empirically estimate the distribution of , including the standard error of . A variety of confidence intervals may be formed based on this distribution (see Davison & Hinkley, 1997 and R package boot). The percentile method, for example, simply uses thelower and upper quantiles of the bootstrap distribution as confidence limits.
A less computationally intensive approach to obtain confidence intervals is based upon the second derivative of the profile loglikelihood of , that is, the observed Fisher’s information measure. Assuming asymptotic normality of the estimator, the inverse of the observed information serves as an estimate of the standard error of the maximum likelihood estimator (Cox & Hinkley, 1979).
3.6 Principal component analysis
Principal component analysis for compositional data has been described by Aitchison (1983). The centred logratio transformation (2
) is applied to the compositional data and standard eigen analysis is applied to the covariance matrix (which has at least one zero eigenvalue).
If , we suggest an analogous approach in which the estimated covariance matrix is mapped to space (Equation 7), using the Helmert submatrix
If , the covariance is mapped to (Equation (3)).
Step 3 of the algorithm presented in 3.4 is used to backtransform the principal components onto the simplex.
4 Simulation study and data analysis examples
4.1 Simulation study
Recovery of the mean, covariance and probability left outside the simplex
For this part of our simulation study, two values of were chosen, namely and , and 4dimensional data () from a multivariate normal distribution with two different mean vectors and a variety of different covariance parameters were generated. Specifically, we used mean vectors
and covariance matrices
(23) 
where . Note that the value of changes the probability that a point is left outside of the simplex. The “true” values of for each value of were computed through MonteCarlo simulations and (denoted as ”Probability”) for each value is presented in Table 1.
For each combination of , and , seven sample sizes, namely , were considered. Results are based on simulated data sets (for each ) and, for each simulated sample, estimates of , and were calculated, and their distance from the true parameters were estimated. All computations took place in R 3.2.3 R (R Core Team, 2015) using a desktop computer with Intel Core i5 at 3.5 GHz processor and 32GB RAM.
For the probability left outside of the simplex (that is, 1), the absolute difference between the estimated probability and the true probability is computed. For the mean vector, the Euclidean distance was calculated to measure the discrepancy between the estimated vector and true vector whereas for the covariance matrix, the following metric [] was calculated
where denotes the th eigenvalue of the matrix .
Note that while we could have used the KullbackLeibler divergence of the fitted multivariate normal from the true multivariate normal to evaluate the overall performance of our estimation method, we would then have had no individual information regarding the accuracy of our procedure in terms of estimating the probability left outside the simplex, the mean vector and the covariance matrix. The results of the simulation studies are presented in Table
1 and Figure 2.0.5  1  2  3  5  7  10  

Estimated  0.0281  0.0925  0.1997  0.2800  0.3900  0.4630  0.5377  
Probability  0.0223  0.1048  0.1849  0.3060  0.4217  0.4750  0.5356 
Simulation study when  
Simulation study when  
From Figure (2) we observe that when the probability left outside the simplex grows larger ( is larger), a larger sample size is required in order to get better estimates, for both the probability and the mean vector. The covariance matrix seems to be unaffected by the probability left outside the simplex.
Estimation of the computational time required by the maximum likelihood estimation
Using only , we generated data data as before with increasing sample sizes and, for each sample size, recorded the time (in seconds) required to estimate the true value of . As expected, the computational cost is mostly affected by . For large sample sizes the computational burden is similar regardless of the probability of being outside of the simplex.
Sample size  0.5  1  2  3  5  7  10 

50  0.052  0.058  0.065  0.075  0.095  0.124  0.148 
100  0.073  0.079  0.092  0.106  0.135  0.174  0.212 
200  0.090  0.089  0.096  0.105  0.130  0.162  0.195 
300  0.095  0.095  0.096  0.108  0.128  0.160  0.194 
500  0.084  0.083  0.091  0.100  0.123  0.156  0.188 
750  0.078  0.074  0.081  0.089  0.109  0.135  0.163 
1000  0.082  0.078  0.086  0.095  0.117  0.140  0.168 
2000  0.267  0.390  0.357  0.352  0.339  0.276  0.302 
5000  0.638  0.933  0.838  0.841  0.816  0.675  0.722 
10000  1.435  2.113  1.905  1.915  1.858  1.544  1.723 
Recovery of the true value of
In the previous simulations, recall that the value of was fixed. In order to have a better picture of our estimation procedure, we also assessed the accuracy of the true value of for large sample sizes for insight into the asymptotic case. The mean vector was set to
and the covariance matrices were the same as in 23.
For a range of values of ranging from up to with a step of we estimated these values for the different values of using 4 sample sizes . For each combination of , and we used repetitions.
Figure 3 shows the average bias of the estimates in boxplots for each sample size. Each box corresponds to a value of and is the average bias aggregated for all values of . For example, Figure 3(a) refers to a sample size equal to and the first box contains information about the average biases of the values of
. The range in the variances increases with the value of
as expected, since higher values of indicate higher probability of being left outside of the simplex. Table 3 presents for many combinations of values of and calculated using Monte Carlo simulation with repetitions.(a) n = 1000  (b) n = 5000 
(c) n = 10000  (d) n = 20000 
0.5  1  2  3  5  7  10  

0.0  0.000  0.000  0.000  0.000  0.000  0.000  0.000 
0.1  0.000  0.000  0.000  0.000  0.000  0.000  0.002 
0.2  0.000  0.000  0.001  0.007  0.034  0.071  0.128 
0.3  0.000  0.004  0.040  0.091  0.187  0.316  0.348 
0.4  0.006  0.047  0.156  0.245  0.367  0.445  0.522 
0.5  0.043  0.149  0.306  0.402  0.516  0.583  0.648 
0.6  0.132  0.284  0.448  0.536  0.632  0.687  0.741 
0.7  0.258  0.423  0.571  0.644  0.722  0.768  0.812 
0.8  0.398  0.551  0.673  0.731  0.794  0.830  0.866 
0.9  0.535  0.661  0.757  0.802  0.851  0.880  0.907 
1.0  0.66  0.756  0.827  0.861  0.898  0.918  0.937 
4.2 Example 1: Sharp’s artificial data
We chose to analyze Sharp’s artificial data (Sharp, 2006) because they are curved data and according to Aitchison (2003) the logistic normal (3.1) should produce a very good fit for curved data. Clearly a Dirichlet distribution would fail to capture the variability of such data and we would not expect a value of to do better. This dataset consists of components and Sharp constructed them from Aitchison’s Hongite data (Aitchison, 2003). []
showed that the normalized geometric mean of the components (assuming a logistic normal distribution) fails to lie within the corpus of the data, whereas the spatial graph median does.
Figure 4(a) shows the profile loglikelihood of and the maximum of the loglikelihood which occurs at . The loglikelihood values at and are equal to and respectively. The loglikelihood ratio test of at degrees of freedom clearly rejects the logistic normal on the simplex (that is the optimal transformation) and this conclusion is in line with the confidence interval limits. Figure 4(b) shows the ternary diagram of the data and we can clearly see that both the simple and the normalized geometric mean fail to lie within the main bulk of the data. However, the Fréchet mean (or mean) (calculated at ) achieves this goal. The three means in are
and the estimated parameters assuming a multivariate normal in with and are
Note that in order to transform the mean (that is, in our example) inside the simplex, we apply Equation 13. The resulting compositional mean vector (mean) was termed Fréchet mean by (Tsagris, Preston & Wood, 2011).
In this example the probability left outside the simplex was calculated via Monte Carlo simulations (with 20,000,000 iterations) and was equal to . The value of the mixing proportion in (3.1) was and . However, there were 2 out of 25 estimated values of the less than
, so without these (possible outliers) a proportion equal to
would have been obtained and this is closer to the MonteCarlo estimated probability left outside the simplex.The contour plots of the folded model with and appear in Figures 5(a) and 5(b) respectively. We generated observations from each model and these are plotted in Figures 5(c) and 5(d). When the simulated data look more like the observed data, in contrast to the simulated data with .
(a)  (b) 
(a)  (b) 
(c)  (d) 
4.3 Example 2: Arctic lake data
Samples from an Arctic lake were collected and the composition of three minerals, sand silt and clay, was measured at different depths (Aitchison, 2003, pg 359). The estimated optimal value of for this dataset was equal to and Figure 6(b) shows the data along with the three means and the scores of the first principal component, evaluated at (geometric mean normalized to sum to 1) and (arithmetic mean).
A closer look at Figure 6(b) reveals that the mean evaluated at lies within the bulk of the data in contrast to the mean evaluated at as well as the arithmetic mean (). The improved fit with (over ) can also be seen from the centers of the contour plots in Figures 7(a) and 7(b). There are two observations which seem to be highly influential.
We removed the two seemingly influential observations and reimplemented the analysis. The optimal value of is now and the contour plots of the folded model at and are presented in Figure2 7(c) and 7(d). We again observe that a value of other than appears to provide a better fit to the data, leading to the same conclusion as before. That is, even though the data seem to be curved in the simplex, a logratio transformation is not the most suitable transformation.
(a)  (b) 
(a)  (b) 
(c)  (d) 
5 Conclusions
In this paper we developed a novel parametric model, with nice properties, for compositional data analysis. Tsagris, Preston & Wood (2011) introduced the transformation and corresponding multivariate normal distribution. A drawback of that model is that it does not take into account the probability left outside the simplex. This is similar to the BoxCox transformation, where the support of the transformed data is not the whole of as we have already mentioned. Another possible solution would be to use truncation []. However, in the examples presented the folded model appeared to fit the data adequately and, interestingly, an improved fit was obtained when . For more examples where the logratio transformation fails to capture the variability of the data see Tsagris, Preston & Wood (2011), baxter (2006) and Sharp (2006).
The use of another multivariate model, such as the multivariate skew normal distribution (Azzalini & valle, 1996) has been suggested. The difficulty, however, with this distribution is that more parameters need to be estimated, thus making the estimation procedure more difficult because the loglikelihood has many local maxima. This of course does not exclude the possibility of using this model. Another, perhaps simpler, alternative model is the multivariate
distribution. Bayesian analysis and regression modeling are two suggested research directions.Similar to the BoxCox transformation, the logarithm of the Jacobian of the transformation contains the term . Thus, zero values are not allowed. Note that this issue also arises with the logistic normal. However, it is possible to generalize most of the analyzes suggested for the logistic normal distribution using our proposed folded model, including extensions that allow zeros.
As is standard practice in logratio transformation analysis, if one is willing to exclude from the sample space the boundary of the simplex, which corresponds to observations that have one or more components equal to zero, then the transformation (9) and its inverse are well defined for all , and the folded model provides a new approach for the analysis of compositional data with the potential to provide an improved fit over traditional models.
Proof of Lemma 3.1
Let us begin by deriving the Jacobian determinant of (5) at first. The map (5) is degenerate due to the constraints and . In order to make (9) nondegenerate we consider the version of (5) as follows
(A.1) 
The (A.1) is presented to highlight that in fact we have and not variables.
Let us start by proving the Jacobian of (5) or (A.1). We denote , where . The diagonal and the nondiagonal elements of the Jacobian matrix are as follows.
The Jacobian takes the following form []:
where is a diagonal matrix with elements and and are defined as
Then
Then the multiplication is
So we end up with
Finally the Jacobian of (5) takes the following form
Proof of Lemma 3.2
The proof presented below is about the case of for convenience purposes. The way to map a point from inside the simplex to a point outside the simplex, is given in Equation 12 and the component wise transformation can be expressed as
(A.2) 
where is defines in (6) and since , we have exclude the superscript .
There are two cases to consider when calculating the Jacobian determinant of the transformation.

The first case is when and the transformation is
There are two subcases to be specified.
