Linear mixed models (LMM) are commonly used to model data that present a natural hierarchical structure because they flexibly model the within-subject correlation (Pinheiro and Bates, 2000). This kind of structure appears when a variable of interest is repeatedly measured for several subjects (or cluster units, in general), which is frequently the case in clinical trials.
For mathematical convenience, it is usually assumed that both the random effects and the errors follow Gaussian distributions. Nevertheless, these restrictive assumptions may result in a lack of robustness against departures from the normal distribution and invalid statistical inferences, especially when the data show heavy tails and skewness (see, e.g., Drikvandi et al., 2017; Drikvandi, 2019). Several approaches have been considered in the literature to replace the normal assumptions of the LMM with more flexible distributions. For example, Pinheiro et al. (2001) proposed a multivariate t linear mixed model (T-LMM), and Rosa et al. (2003) considered the thick-tailed class of normal/independent (NI) distributions from a Bayesian framework.
Accounting for skewness, Arellano-Valle et al. (2005) proposed a skew-normal linear mixed model (SN-LMM) based on the classic skew-normal (SN) distribution introduced by Azzalini and Dalla Valle (1996), and Ho and Lin (2010) proposed a skew-t linear mixed model (ST-LMM) based on the classic skew-t (ST) distribution introduced by Azzalini and Capitanio (2003). More generally, Lachos et al. (2010a) proposed robust parametric modeling of the LMM based on skew-normal/independent (SNI) distributions, in which the random effects follow an SNI distribution and the within-subject errors follow a normal/independent (NI) distribution, so that the observed responses follow an SNI distribution; they called the resulting model the skew-normal/independent linear mixed model (SNI-LMM). Schumacher et al. (2021) extended the SNI-LMM by considering within-subject serial dependence and developing additional tools for model evaluation.
In general, SN and ST distributions extend the normal and Student-t distributions by introducing additional parameters regulating skewness. Some extensions and unifications of these distributions are carefully surveyed in works such as Azzalini (2005) and Arellano-Valle and Azzalini (2006); for further information, we refer to the book edited by Genton (2004) and the more recent one by Azzalini and Capitanio (2014). The classic formulation of the SN and ST distributions was used successfully in many other works, such as Pyne et al. (2009), Lachos et al. (2010a), Lachos et al. (2010b), Cabral et al. (2012a), Cabral et al. (2012b), and Cabral et al. (2014). Furthermore, another popular version of these two skewed distributions was defined by Sahu et al. (2003), which will be called the SDB-SN and SDB-ST in this work; this proposal was applied in works such as Lin (2009), Lin (2010), and Jara et al. (2008).
Recently, Lee and McLachlan (2016) proposed finite mixtures of a generalization of the classic and SDB-ST distributions, called canonical fundamental skew-t (CFUST) distributions, which are special cases of the fundamental skew distributions defined by Arellano-Valle and Genton (2005). Using simulation studies, Lee and McLachlan (2016) showed that the CFUST distribution outperforms the classic and SDB-ST distributions in mixture models. In this regard, this work aims to extend the classic ST-LMM by considering the CFUST distribution used in Lee and McLachlan (2016) and by developing ML estimation based on an EM-type algorithm. This formulation, hereafter simply called the ST distribution, enables a more flexible skewness structure at the cost of a larger number of parameters to be estimated.
The rest of this manuscript is organized as follows. Section 2 introduces the formulation of the skewed distributions that are considered in this work. Section 3 defines the ST-LMM and discusses its ML estimation via an EM-type algorithm. Section 4 presents some simulation studies conducted to evaluate the empirical performance of the ST-LMM and the effect of initial values, as well as an illustrative study to exemplify the flexibility of the model. In Section 5, the methodology is applied to a schizophrenia data set. Finally, Section 6 discusses some final remarks.
2 Skewed distributions
Let $Y \sim N_p(\mu, \Sigma)$ denote a $p$-dimensional random vector following a normal distribution with mean vector $\mu$ and covariance matrix $\Sigma$, and let $\phi_p(\cdot\,; \mu, \Sigma)$ and $\Phi_p(\cdot\,; \mu, \Sigma)$ be the respective density and cumulative distribution function. When $\mu = 0$ and $\Sigma = I_p$ (the null $p$-dimensional vector and the $p \times p$ identity matrix, respectively), we simplify the notation to $\phi_p(\cdot)$ and $\Phi_p(\cdot)$, and when $p = 1$, we use the notation $\phi(\cdot)$ and $\Phi(\cdot)$.
Suppose $X \sim N_p(\mu, \Sigma)$. Then, for a given Borel set $\mathbb{A} \subseteq \mathbb{R}^p$, we say that the distribution of $X$ given $X \in \mathbb{A}$ is a truncated normal distribution on $\mathbb{A}$, denoted by $TN_p(\mu, \Sigma; \mathbb{A})$, whose density is given by
\[
f(x \mid X \in \mathbb{A}) = \frac{\phi_p(x; \mu, \Sigma)}{P(X \in \mathbb{A})}\,\mathbb{I}_{\mathbb{A}}(x),
\]
where $\mathbb{I}_{\mathbb{A}}(\cdot)$ is the indicator function of $\mathbb{A}$. As a particular case, consider $\mu = 0$, a diagonal matrix $\Sigma$, and $\mathbb{A} = \{x \in \mathbb{R}^p : x_1 > 0, \ldots, x_p > 0\} = \mathbb{R}_+^p$; then all elements of the vector $X \mid (X \in \mathbb{A})$ are independent random variables. In this case, we say that the distribution of $X \mid (X \in \mathbb{A})$ is a $p$-dimensional half-normal with scale matrix $\Sigma$, denoted by $HN_p(0, \Sigma)$.
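To fix ideas, the univariate half-normal is easy to simulate: conditioning a standard normal on positivity is equivalent to taking absolute values, and the resulting mean is $\sqrt{2/\pi}$, a fact used again below for the SN moments. A minimal numerical sketch in Python (all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Conditioning N(0, 1) on positivity has the same law as |X|,
# so the half-normal can be simulated simply via abs().
half = np.abs(rng.standard_normal(200_000))

# The univariate half-normal with scale 1 has mean sqrt(2/pi) ~ 0.7979.
print(abs(half.mean() - np.sqrt(2.0 / np.pi)) < 0.01)
```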
First, we define the version of the skew-normal distribution used in this work. It is a special case of the fundamental skew-normal distribution defined by Arellano-Valle and Genton (2005). The presentation below is based on that work, where all the proofs can be found; some are reproduced here, and others are omitted.
Definition 1. Let $X \sim N_p(0, \Sigma)$ and $W \sim HN_q(0, I_q)$ be independent, where $\Sigma$ is positive definite, and let $\Delta$ be a $p \times q$ matrix. We say that the distribution of
\[
Y = \mu + \Delta W + X
\]
is skew-normal with location vector $\mu \in \mathbb{R}^p$, shape matrix $\Delta$, and scale matrix $\Sigma$. We use the notation $Y \sim SN_{p,q}(\mu, \Sigma, \Delta)$.
Observe that when $\Delta = 0$, then $Y \sim N_p(\mu, \Sigma)$. A trivial but relevant consequence of this definition is that affine transformations of SN distributions are still SN, as stated in Proposition 1.
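The stochastic representation in Definition 1 makes simulation immediate; the following Python sketch draws from it and numerically checks the mean and covariance given later in Proposition 3 (the parameter values are hypothetical, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
p, q, n = 2, 2, 400_000

# Hypothetical parameter values for illustration.
mu = np.array([1.0, -1.0])
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])
Delta = np.array([[0.8, 0.0], [0.4, 0.6]])

# Stochastic representation: Y = mu + Delta W + X, with
# W ~ HN_q(0, I_q) (componentwise |N(0,1)|) independent of X ~ N_p(0, Sigma).
W = np.abs(rng.standard_normal((n, q)))
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
Y = mu + W @ Delta.T + X

# Moment formulas: E[Y] = mu + sqrt(2/pi) Delta 1_q,
#                 Cov(Y) = Sigma + (1 - 2/pi) Delta Delta^T.
mean_theory = mu + np.sqrt(2.0 / np.pi) * Delta @ np.ones(q)
cov_theory = Sigma + (1.0 - 2.0 / np.pi) * Delta @ Delta.T

print(np.allclose(Y.mean(axis=0), mean_theory, atol=0.01))
print(np.allclose(np.cov(Y.T), cov_theory, atol=0.02))
```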
Proposition 1. Let $A$ be an $m \times p$ matrix with rank $m \leq p$, $b$ be a vector of length $m$, and $Y \sim SN_{p,q}(\mu, \Sigma, \Delta)$. Then, $AY + b \sim SN_{m,q}(A\mu + b,\, A\Sigma A^\top,\, A\Delta)$.
In particular, marginal distributions are also SN. Thus, if $Y \sim SN_{p,q}(\mu, \Sigma, \Delta)$ and considering the partition
\[
Y = \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix}, \qquad \dim(Y_i) = p_i, \qquad p_1 + p_2 = p, \tag{1}
\]
then, for $i = 1, 2$, we have $Y_i \sim SN_{p_i,q}(\mu_i, \Sigma_{ii}, \Delta_i)$. The partition of $Y$ induces similar partitions on $\mu$, $\Sigma$, and $\Delta$, given by $\mu = (\mu_1^\top, \mu_2^\top)^\top$, $\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}$, and $\Delta = (\Delta_1^\top, \Delta_2^\top)^\top$, where $\dim(\mu_i) = p_i$, $\Sigma_{ij}$ is $p_i \times p_j$, and $\Delta_i$ is $p_i \times q$. By Proposition 1, taking $A = (I_{p_1}, 0)$ and $b = 0$, we have $Y_1 \sim SN_{p_1,q}(\mu_1, \Sigma_{11}, \Delta_1)$. An analogous result is true for $Y_2$.
Proposition 2. Let $Y \sim SN_{p,q}(\mu, \Sigma, \Delta)$. Then, the density of $Y$ is given by
\[
f(y) = 2^q\, \phi_p(y; \mu, \Omega)\, \Phi_q\!\left(\Delta^\top \Omega^{-1}(y - \mu);\, 0, \Lambda\right), \tag{2}
\]
where
\[
\Omega = \Sigma + \Delta\Delta^\top \qquad \text{and} \qquad \Lambda = I_q - \Delta^\top \Omega^{-1} \Delta. \tag{3}
\]

Proof. Define the random vector
\[
Z = \mu + \Delta W + X,
\]
where $W$ and $X$ are given in Definition 1. Then, $Z$ has the same distribution of $Y$. Thus, it is enough to find the distribution of $Z$, which, conditionally on $W = w$, is $N_p(\mu + \Delta w, \Sigma)$. It can be shown that
\[
\phi_p(z; \mu + \Delta w, \Sigma)\,\phi_q(w) = \phi_p(z; \mu, \Omega)\,\phi_q\!\left(w;\, \Delta^\top \Omega^{-1}(z - \mu),\, \Lambda\right),
\]
which implies
\[
f_Z(z) = \int_{\mathbb{R}_+^q} 2^q\, \phi_p(z; \mu + \Delta w, \Sigma)\,\phi_q(w)\, \mathrm{d}w = 2^q\, \phi_p(z; \mu, \Omega)\, \Phi_q\!\left(\Delta^\top \Omega^{-1}(z - \mu);\, 0, \Lambda\right).
\]
The result follows immediately. ∎
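As a sanity check, in the univariate case this density integrates to one and coincides with Azzalini's classic skew-normal after reparameterization. A Python sketch with hypothetical values ($p = q = 1$, slant $d/\sqrt{\sigma^2}$ and scale $\sqrt{\omega^2}$ are the implied Azzalini parameters):

```python
import numpy as np
from scipy.stats import norm, skewnorm
from scipy.integrate import quad

# Hypothetical univariate parameters (p = q = 1).
mu, sigma2, d = 0.5, 1.0, 0.8
omega2 = sigma2 + d * d        # Omega = Sigma + Delta Delta^T
lam = 1.0 - d * d / omega2     # Lambda = I_q - Delta^T Omega^{-1} Delta

def sn_pdf(y):
    # 2^q phi_p(y; mu, Omega) Phi_q(Delta^T Omega^{-1} (y - mu); 0, Lambda), q = 1.
    return (2.0 * norm.pdf(y, loc=mu, scale=np.sqrt(omega2))
            * norm.cdf((d / omega2) * (y - mu) / np.sqrt(lam)))

# The density integrates to one ...
total, _ = quad(sn_pdf, -np.inf, np.inf)
print(abs(total - 1.0) < 1e-8)

# ... and, for p = q = 1, coincides with Azzalini's skew-normal
# with slant d / sqrt(sigma2) and scale sqrt(omega2).
y = np.array([-1.0, 0.0, 1.3])
print(np.allclose(sn_pdf(y), skewnorm.pdf(y, a=d / np.sqrt(sigma2),
                                          loc=mu, scale=np.sqrt(omega2))))
```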
Observe that the relation $\Omega = \Sigma + \Delta\Delta^\top$ induces a one-to-one reparameterization; that is, we could parameterize the distribution in terms of $(\mu, \Omega, \Delta)$ and use the notation $SN^{\Omega}_{p,q}(\mu, \Omega, \Delta)$ with density given in (2). Besides this, if we define $\delta = \Omega^{-1/2}\Delta$, where $\Omega^{1/2}$ is a square root of $\Omega$, the pdf of $Y$ can be written as
\[
f(y) = 2^q\, \phi_p(y; \mu, \Omega)\, \Phi_q\!\left(\delta^\top \Omega^{-1/2}(y - \mu);\, 0,\, I_q - \delta^\top\delta\right). \tag{4}
\]
This parameterization is used in Arellano-Valle and Genton (2005) – see equation (2.11) and page 109 for the pdf and moment generating function. Unless explicitly stated otherwise, we use the $(\mu, \Sigma, \Delta)$ parameterization.
The mean vector and covariance matrix of a random vector with SN distribution are given in Proposition 3. The proof is a direct consequence of Definition 1 and the fact that if a random variable has a univariate half-normal distribution with scale parameter 1, then its mean value is $\sqrt{2/\pi}$. We define the $q$-dimensional vector of ones by $1_q$.

Proposition 3. Let $Y \sim SN_{p,q}(\mu, \Sigma, \Delta)$. Then,
\[
\mathrm{E}[Y] = \mu + \sqrt{2/\pi}\,\Delta 1_q \qquad \text{and} \qquad \mathrm{Cov}(Y) = \Sigma + \left(1 - \tfrac{2}{\pi}\right)\Delta\Delta^\top.
\]
Before defining the skew-t distribution, we state a result regarding marginal and conditional distributions of a random vector with Student-t distribution, which will be helpful to obtain results for the skew-t distribution analogous to the ones presented for the SN distribution. The proof can be found in Arellano-Valle and Bolfarine (1995) (see also Fang et al. (1990, Theorem 3.8)).
Proposition 4. Let $Y \sim t_p(\mu, \Sigma, \nu)$ and consider the partition given in (1) and the induced partitions of $\mu$ and $\Sigma$. It can be shown that:
\[
Y_1 \sim t_{p_1}(\mu_1, \Sigma_{11}, \nu) \qquad \text{and} \qquad Y_2 \mid Y_1 = y_1 \sim t_{p_2}\!\left(\mu_{2.1},\, \frac{\nu + d_1}{\nu + p_1}\,\Sigma_{22.1},\, \nu + p_1\right),
\]
where $\mu_{2.1} = \mu_2 + \Sigma_{21}\Sigma_{11}^{-1}(y_1 - \mu_1)$ and $\Sigma_{22.1} = \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$, with $d_1 = (y_1 - \mu_1)^\top\Sigma_{11}^{-1}(y_1 - \mu_1)$.
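This marginal/conditional result is exact and can be verified numerically: the conditional density $f(y_2 \mid y_1)$ computed as a ratio of joint and marginal densities must equal the stated Student-t density. A Python sketch with hypothetical bivariate values:

```python
import numpy as np
from scipy.stats import multivariate_t, t

# Hypothetical parameter values for illustration.
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
nu = 4.0
y1, y2 = 1.7, -1.1

joint = multivariate_t(loc=mu, shape=Sigma, df=nu)
marg1 = multivariate_t(loc=mu[:1], shape=Sigma[:1, :1], df=nu)

# Conditional parameters from the proposition (p1 = p2 = 1):
d1 = (y1 - mu[0]) ** 2 / Sigma[0, 0]
mu_cond = mu[1] + Sigma[1, 0] / Sigma[0, 0] * (y1 - mu[0])
s22_1 = Sigma[1, 1] - Sigma[1, 0] ** 2 / Sigma[0, 0]
scale_cond = (nu + d1) / (nu + 1.0) * s22_1

# f(y2 | y1) computed two ways must agree.
lhs = joint.pdf(np.array([y1, y2])) / marg1.pdf(np.array([y1]))
rhs = t.pdf(y2, df=nu + 1.0, loc=mu_cond, scale=np.sqrt(scale_cond))
print(np.isclose(lhs, rhs))
```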
In what follows, $t_p(\cdot\,; \mu, \Sigma, \nu)$ and $T_p(\cdot\,; \mu, \Sigma, \nu)$ respectively denote the density and the cumulative distribution function of the $p$-variate Student-t distribution with location vector $\mu$, scale matrix $\Sigma$, and $\nu$ degrees of freedom.
We say that $Y$ follows a skew-t distribution with location vector $\mu$, scale matrix $\Sigma$, shape matrix $\Delta$, and $\nu$ degrees of freedom, denoted by $Y \sim ST_{p,q}(\mu, \Sigma, \Delta, \nu)$, if $Y$ has density function
\[
f(y) = 2^q\, t_p(y; \mu, \Omega, \nu)\, T_q\!\left(\Delta^\top \Omega^{-1}(y - \mu)\sqrt{\frac{\nu + p}{\nu + d}};\, 0,\, \Lambda,\, \nu + p\right), \tag{5}
\]
where $\Omega$ and $\Lambda$ are given in (3) and $d = (y - \mu)^\top \Omega^{-1}(y - \mu)$.
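For $p = q = 1$ this density reduces to Azzalini's classic skew-t, so a quick numerical check is that it integrates to one. A Python sketch with hypothetical values:

```python
import numpy as np
from scipy.stats import t
from scipy.integrate import quad

# Hypothetical univariate parameters (p = q = 1).
mu, sigma2, d, nu = 0.5, 1.0, 0.8, 4.0
omega2 = sigma2 + d * d        # Omega = Sigma + Delta Delta^T
lam = 1.0 - d * d / omega2     # Lambda = I_q - Delta^T Omega^{-1} Delta

def st_pdf(y):
    dist = (y - mu) ** 2 / omega2          # Mahalanobis-type distance
    arg = (d / omega2) * (y - mu) * np.sqrt((nu + 1.0) / (nu + dist) / lam)
    return (2.0 * t.pdf(y, df=nu, loc=mu, scale=np.sqrt(omega2))
            * t.cdf(arg, df=nu + 1.0))

total, _ = quad(st_pdf, -np.inf, np.inf)
print(abs(total - 1.0) < 1e-6)
```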
Proposition 5. Let $A$ be an $m \times p$ matrix with rank $m \leq p$, $b$ be a vector of length $m$, and $Y \sim ST_{p,q}(\mu, \Sigma, \Delta, \nu)$. Then, $AY + b \sim ST_{m,q}(A\mu + b,\, A\Sigma A^\top,\, A\Delta,\, \nu)$.

If $Y$ is partitioned as in (1), we have $Y_1 \sim ST_{p_1,q}(\mu_1, \Sigma_{11}, \Delta_1, \nu)$. An analogous result can be shown for $Y_2$.
Equation (6) also implies the following result:
Proposition 6. Let $Y \sim ST_{p,q}(\mu, \Sigma, \Delta, \nu)$ and $U \sim \mathrm{Gamma}(\nu/2, \nu/2)$. Then, $Y$ admits the following hierarchical representation:
\[
Y \mid W = w,\, U = u \sim N_p(\mu + \Delta w,\, u^{-1}\Sigma), \qquad W \mid U = u \sim HN_q(0,\, u^{-1} I_q), \qquad U \sim \mathrm{Gamma}(\nu/2, \nu/2),
\]
where $HN_q(\mu, \Sigma)$ denotes the $q$-dimensional half-normal distribution with location parameter $\mu$ and scale matrix $\Sigma$.
It is straightforward to show, by integrating out $W$ and $U$, that the density of $Y$ in Proposition 6 coincides with the ST density given above.
Proposition 7. Let $Y \sim ST_{p,q}(\mu, \Sigma, \Delta, \nu)$. Then,
\[
\mathrm{E}[Y] = \mu + \kappa_1 \Delta 1_q, \quad \nu > 1, \qquad \mathrm{Cov}(Y) = \frac{\nu}{\nu - 2}\left[\Sigma + \Delta\Delta^\top + \frac{2}{\pi}\,\Delta\left(J_q - I_q\right)\Delta^\top\right] - \kappa_1^2\, \Delta J_q \Delta^\top, \quad \nu > 2,
\]
where $\kappa_1 = \sqrt{\nu/\pi}\;\Gamma\!\left(\tfrac{\nu-1}{2}\right)\big/\Gamma\!\left(\tfrac{\nu}{2}\right)$, and $J_q = 1_q 1_q^\top$ is a $q \times q$ matrix of ones.
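The hierarchical representation above is convenient both for simulation and for the EM-type algorithm of Section 3. A quick Monte Carlo sketch (univariate, all parameter values hypothetical) that draws from it and checks the mean and variance formulas:

```python
import numpy as np
from scipy.special import gammaln

rng = np.random.default_rng(2)
n = 1_000_000
# Hypothetical univariate parameters (p = q = 1).
mu, sigma2, d, nu = 0.0, 1.0, 0.7, 5.0

# Hierarchical representation: U ~ Gamma(nu/2, rate nu/2),
# W | U = u ~ HN(0, 1/u), Y | W = w, U = u ~ N(mu + d*w, sigma2/u).
U = rng.gamma(nu / 2.0, 2.0 / nu, size=n)       # numpy uses shape/scale
W = np.abs(rng.standard_normal(n)) / np.sqrt(U)
Y = mu + d * W + rng.standard_normal(n) * np.sqrt(sigma2 / U)

# kappa_1 = sqrt(nu/pi) Gamma((nu-1)/2) / Gamma(nu/2); for q = 1, J_q = I_q.
k1 = np.sqrt(nu / np.pi) * np.exp(gammaln((nu - 1.0) / 2.0) - gammaln(nu / 2.0))
mean_theory = mu + k1 * d                                           # nu > 1
var_theory = nu / (nu - 2.0) * (sigma2 + d * d) - k1 ** 2 * d * d   # nu > 2

print(abs(Y.mean() - mean_theory) < 0.01)
print(abs(Y.var() - var_theory) < 0.05)
```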
3 The model
The standard linear mixed model introduced by Laird and Ware (1982) has been a widely used tool to model the within-subject correlation often present in longitudinal data. In this work, we present a flexible extension of this model.
Suppose that there are $n$ subjects, with the $i$th subject having $n_i$ observations; then, a linear mixed model can be written as
\[
Y_i = X_i\beta + Z_i b_i + \epsilon_i, \qquad i = 1, \ldots, n, \tag{7}
\]
where $Y_i$ is the $n_i \times 1$ vector of responses for subject $i$, $\beta$ is the vector of fixed effects with $n_i \times \ell$ design matrix $X_i$, $b_i$ is the $q \times 1$ vector of random effects with $n_i \times q$ design matrix $Z_i$, and $\epsilon_i$ is the $n_i \times 1$ vector of within-subject errors.
We consider that
\[
\begin{pmatrix} b_i \\ \epsilon_i \end{pmatrix} \overset{\text{ind.}}{\sim} ST_{q+n_i,\,q}\!\left(\begin{pmatrix} c\Delta 1_q \\ 0 \end{pmatrix},\, \begin{pmatrix} D & 0 \\ 0 & \Omega_i \end{pmatrix},\, \begin{pmatrix} \Delta \\ 0 \end{pmatrix},\, \nu\right), \qquad i = 1, \ldots, n, \tag{8}
\]
where $c = -\sqrt{\nu/\pi}\,\Gamma\!\left(\tfrac{\nu-1}{2}\right)/\Gamma\!\left(\tfrac{\nu}{2}\right)$, so that $\mathrm{E}[b_i] = 0$ when $\nu > 1$, $D$ is a $q \times q$ positive definite scale matrix, $\Omega_i$ is an $n_i \times n_i$ positive definite scale matrix, and $\overset{\text{ind.}}{\sim}$ denotes independent random vectors. This setup implies
\[
b_i \overset{\text{iid}}{\sim} ST_{q,q}(c\Delta 1_q, D, \Delta, \nu) \qquad \text{and} \qquad \epsilon_i \overset{\text{ind.}}{\sim} t_{n_i}(0, \Omega_i, \nu), \tag{9}
\]
where $\overset{\text{iid}}{\sim}$ denotes independent and identically distributed random vectors. Thus, we have that $Y_i \overset{\text{ind.}}{\sim} ST_{n_i,q}(X_i\beta + cZ_i\Delta 1_q,\, Z_i D Z_i^\top + \Omega_i,\, Z_i\Delta,\, \nu)$, according to (7). Moreover, from Proposition 6, we have the following stochastic representation of the complete data model:
\[
Y_i \mid b_i,\, U_i = u_i \sim N_{n_i}(X_i\beta + Z_i b_i,\, u_i^{-1}\Omega_i), \tag{10}
\]
\[
b_i \mid W_i = w_i,\, U_i = u_i \sim N_q(c\Delta 1_q + \Delta w_i,\, u_i^{-1}D), \tag{11}
\]
\[
W_i \mid U_i = u_i \sim HN_q(0,\, u_i^{-1} I_q), \tag{12}
\]
\[
U_i \sim \mathrm{Gamma}(\nu/2, \nu/2). \tag{13}
\]
Observe that $Y_i = X_i\beta + A_i\,(b_i^\top, \epsilon_i^\top)^\top$, with $A_i = (Z_i,\; I_{n_i})$. Using Proposition 5, we have
\[
Y_i \overset{\text{ind.}}{\sim} ST_{n_i,q}\!\left(\mu_i,\, \Psi_i,\, \bar\Delta_i,\, \nu\right),
\]
with $\mu_i = X_i\beta + cZ_i\Delta 1_q$, $\Psi_i = Z_i D Z_i^\top + \Omega_i$, and $\bar\Delta_i = Z_i\Delta$. Hence, the marginal pdf of $Y_i$ is
\[
f(y_i; \theta) = 2^q\, t_{n_i}(y_i; \mu_i, \tilde\Omega_i, \nu)\, T_q\!\left(\bar\Delta_i^\top \tilde\Omega_i^{-1}(y_i - \mu_i)\sqrt{\frac{\nu + n_i}{\nu + d_i}};\, 0,\, \Lambda_i,\, \nu + n_i\right), \tag{14}
\]
where $\tilde\Omega_i = \Psi_i + \bar\Delta_i\bar\Delta_i^\top$, $\Lambda_i = I_q - \bar\Delta_i^\top \tilde\Omega_i^{-1}\bar\Delta_i$, and $d_i = (y_i - \mu_i)^\top \tilde\Omega_i^{-1}(y_i - \mu_i)$. Therefore, assuming that $D$, $\Omega_i$, and $\Delta$ depend on unknown and reduced parameter vectors $\alpha$, $\gamma$, and $\delta$, respectively, the log-likelihood function for $\theta = (\beta^\top, \alpha^\top, \gamma^\top, \delta^\top, \nu)^\top$ based on the observed sample $y = (y_1^\top, \ldots, y_n^\top)^\top$ is given by
\[
\ell(\theta; y) = \sum_{i=1}^{n} \log f(y_i; \theta),
\]
where $f(y_i; \theta)$ is the marginal pdf of $Y_i$. Since the observed log-likelihood function involves complex expressions, such as the cumulative distribution function of the multivariate Student-t distribution, it is computationally very expensive to work directly with it to find the ML estimates of $\theta$. Hence, in the following subsection, we discuss the development of an EM-type algorithm (Dempster et al., 1977) for ML estimation.
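When the observed log-likelihood is needed directly (e.g., for monitoring convergence), each term is a CFUST log-density summed over subjects. The following Python sketch treats the special case $q = 1$ and parameterizes directly by the combined scale matrix $\Omega = \Sigma + \Delta\Delta^\top$; the function names and argument layout are illustrative, not the paper's implementation:

```python
import numpy as np
from scipy.stats import multivariate_t, t

def st_logpdf(y, mu, Omega, delta, nu):
    # log of 2^q t_p(y; mu, Omega, nu) T_q(...; nu + p) for q = 1,
    # with Omega the combined scale matrix Sigma + delta delta^T.
    y, mu, delta = map(np.asarray, (y, mu, delta))
    p = y.size
    Oinv = np.linalg.inv(Omega)
    lam = 1.0 - delta @ Oinv @ delta          # Lambda for q = 1 (scalar)
    dist = (y - mu) @ Oinv @ (y - mu)         # Mahalanobis-type distance
    arg = (delta @ Oinv @ (y - mu)) * np.sqrt((nu + p) / (nu + dist) / lam)
    return (np.log(2.0)
            + multivariate_t(loc=mu, shape=Omega, df=nu).logpdf(y)
            + t.logcdf(arg, df=nu + p))

def loglik(y_list, mu_list, Omega_list, delta_list, nu):
    # l(theta) = sum_i log f(y_i; theta): one skew-t log-density per subject.
    return sum(st_logpdf(y, m, O, dl, nu)
               for y, m, O, dl in zip(y_list, mu_list, Omega_list, delta_list))
```

When the shape vector is zero, the skewing factor equals $1/2$ and the $2^q$ constant cancels it, recovering the symmetric multivariate-t log-density.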
3.2 Maximum likelihood estimation
From the hierarchical representation given in (10)–(13) and treating $b = (b_1^\top, \ldots, b_n^\top)^\top$, $w = (w_1^\top, \ldots, w_n^\top)^\top$, and $u = (u_1, \ldots, u_n)^\top$ as hypothetical missing data, we propose to use the ECME algorithm (Liu and Rubin, 1994) for parameter estimation. Let the augmented data set be $y_c = (y^\top, b^\top, w^\top, u^\top)^\top$, where $y = (y_1^\top, \ldots, y_n^\top)^\top$; then, the complete-data log-likelihood function is given by
\[
\ell_c(\theta; y_c) = \sum_{i=1}^{n}\left[\log f(y_i \mid b_i, u_i;\, \beta, \gamma) + \log f(b_i \mid w_i, u_i;\, \alpha, \delta)\right] + K(y_c; \nu),
\]
where $K(y_c; \nu)$ is a function that depends on the parameter vector $\theta$ only through $\nu$.
Given the current value $\hat\theta^{(k)}$, the E-step of an EM-type algorithm evaluates $Q(\theta \mid \hat\theta^{(k)}) = \mathrm{E}\big[\ell_c(\theta; y_c) \mid y, \hat\theta^{(k)}\big]$, where the expectation is taken with respect to the joint conditional distribution of $b$, $w$, and $u$, given $y$ and $\hat\theta^{(k)}$. Therefore, we can write $Q(\theta \mid \hat\theta^{(k)}) = \sum_{i=1}^{n} Q_i(\theta \mid \hat\theta^{(k)})$, where computing $Q_i(\theta \mid \hat\theta^{(k)})$ requires conditional expectations of the form
\[
\hat{u}_i^{(k)} = \mathrm{E}\big[U_i \mid y_i, \hat\theta^{(k)}\big], \qquad \widehat{uw}_i^{(k)} = \mathrm{E}\big[U_i W_i \mid y_i, \hat\theta^{(k)}\big], \qquad \widehat{uww}_i^{(k)} = \mathrm{E}\big[U_i W_i W_i^\top \mid y_i, \hat\theta^{(k)}\big], \tag{15}
\]
together with the analogous expectations involving $b_i$; here and in what follows, $\mathrm{tr}(A)$ indicates the trace of matrix $A$.
where $\hat{u}_i^{(k)}$, $\widehat{uw}_i^{(k)}$, and $\widehat{uww}_i^{(k)}$ are as given in (15). Moreover, the conditional distribution of $W_i$ given $Y_i = y_i$ and $\hat\theta^{(k)}$ is that of a $q$-variate Student-t random vector with $\hat\nu^{(k)} + n_i$ degrees of freedom truncated on $\mathbb{R}_+^q$, and its moments can be computed using the R package MomTrunc (Galarza et al., 2021).
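To build intuition for the E-step quantities, note that in the special case with no skewness the weight $\mathrm{E}[U_i \mid y_i, \theta]$ reduces to the classical multivariate-t expression $(\nu + n_i)/(\nu + d_i)$, which down-weights outlying subjects. A minimal illustrative sketch (not the general formula of the paper, which additionally involves truncated-t moments via MomTrunc):

```python
import numpy as np

def t_weight(y, mu, Psi, nu):
    # E[U | Y = y] when Y ~ t_p(mu, Psi, nu): (nu + p) / (nu + Mahalanobis distance).
    r = np.asarray(y, dtype=float) - np.asarray(mu, dtype=float)
    d = r @ np.linalg.solve(Psi, r)
    return (nu + len(r)) / (nu + d)

# Outlying subjects receive smaller weights, robustifying the CM steps.
print(t_weight([0.1, 0.0], [0.0, 0.0], np.eye(2), 4.0) >
      t_weight([5.0, 5.0], [0.0, 0.0], np.eye(2), 4.0))
```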
To maximize $Q(\theta \mid \hat\theta^{(k)})$ with respect to $\theta$, the ECME algorithm performs a conditional maximization (CM) step that conditionally maximizes $Q(\theta \mid \hat\theta^{(k)})$, obtaining a new estimate $\hat\theta^{(k+1)}$, as follows:
1. $\hat\beta^{(k+1)}$, $\hat\alpha^{(k+1)}$, $\hat\gamma^{(k+1)}$, and $\hat\delta^{(k+1)}$ are updated using the following expressions: