# Scale mixture of skew-normal linear mixed models with within-subject serial dependence

In longitudinal studies, repeated measures are collected over time and hence they tend to be serially correlated. In this paper we consider an extension of the skew-normal/independent linear mixed models introduced by Lachos et al. (2010), in which the error term has a dependence structure, such as damped exponential correlation or autoregressive correlation of order p. The proposed model provides flexibility to capture the effects of skewness and heavy tails simultaneously when continuous repeated measures are serially correlated. For this robust model, we present an efficient EM-type algorithm for computing maximum likelihood estimates of the parameters, and the observed information matrix is derived analytically to obtain standard errors. The methodology is illustrated through an application to schizophrenia data and several simulation studies. The proposed algorithm and methods are implemented in the new R package skewlmm.

## 1 Introduction

Linear mixed models (LMM) are frequently used to analyze repeated measures data, because they flexibly model the within-subject correlation often present in this type of data. Usually, for mathematical convenience, it is assumed that both the random effects and the error terms follow normal distributions (N-LMM). These restrictive assumptions, however, may result in a lack of robustness against departures from the normal distribution and invalid statistical inferences, especially when the data show heavy tails and skewness. For instance, substantial bias in the maximum likelihood estimates of regression parameters can result when the random-effects distribution is misspecified (Drikvandi et al., 2017).

To deal with this problem, some proposals have been made in the literature that replace the assumption of normality by a more flexible class of distributions. For instance, Pinheiro et al. (2001) proposed a multivariate t linear mixed model (T-LMM) and showed that it performed well in the presence of outliers.

Rosa et al. (2003) adopted a Bayesian framework to carry out posterior analysis in LMM with the thick-tailed class of normal/independent (NI-LMM) distributions. Arellano-Valle et al. (2005) proposed a skew-normal linear mixed model (SN-LMM) based on the skew-normal (SN) distribution introduced by Azzalini and Valle (1996). Ho and Lin (2010) proposed a skew-t linear mixed model (ST-LMM) based on the skew-t (ST) distribution introduced by Azzalini and Capitanio (2003).

More recently, Lachos et al. (2010) proposed a parametric robust modeling of LMM based on skew-normal/independent (SNI) distributions, where the random effects follow an SNI distribution and the within-subject errors follow an NI distribution, so that observed responses follow an SNI distribution, defining what they call the skew-normal/independent linear mixed model (SNI-LMM). They presented an efficient EM-type algorithm for the computation of maximum likelihood (ML) estimates of parameters on the basis of the hierarchical formulation of the SNI class. It is important to note that the SNI class is a subclass of the scale mixture of skew-normal (SMSN) class introduced by Branco and Dey (2001), which will be considered in this paper.

A common feature of these classes of LMMs is that the error terms are conditionally independent. However, in longitudinal studies, repeated measures are collected over time and hence the error term tends to be serially correlated. Even though there are some proposals in the literature to deal with the problem of skewness and heavy tails in LMMs, to the best of our knowledge there are no studies of the SMSN-LMM with serially correlated error structures, such as damped exponential correlation (DEC, Muñoz et al., 1992) or autoregressive correlation of order $p$ (AR($p$), Box et al., 2015). Therefore, the aim of this paper is to develop a full likelihood approach to the SMSN-LMM with serially correlated errors, considering some useful correlation structures. Our proposal intends to develop additional tools not considered in Lachos et al. (2010) and to apply these techniques for making robust inferences in practical longitudinal data analysis.

The rest of the paper is organized as follows. Section 2 gives a brief introduction to the SMSN class; further, we define the SMSN-LMM and present some important dependence structures. A likelihood approach for parameter estimation is given in Section 3, including the estimation of random effects and standard errors. Section 4 presents some simulation studies that were conducted to evaluate the empirical performance of the proposed model under several scenarios, and in Section 5 we fit the SMSN-LMM to a schizophrenia data set. Finally, Section 6 presents some concluding remarks.

## 2 Model formulation

### 2.1 Scale mixture of skew-normal distributions

Let $\mathbf{Y}$ be a $p$-dimensional random vector, $\boldsymbol\mu$ a $p\times 1$ location vector, $\boldsymbol\Sigma$ a $p\times p$ positive definite dispersion matrix, $\boldsymbol\lambda$ a $p\times 1$ skewness parameter vector, and let $U$ be a positive random variable with cdf $H(u;\boldsymbol\nu)$, where $\boldsymbol\nu$ is a scalar or parameter vector indexing the distribution of $U$. The multivariate SMSN class of distributions, denoted by $\mathrm{SMSN}_p(\boldsymbol\mu,\boldsymbol\Sigma,\boldsymbol\lambda;H)$, can be defined through the following probability density function (pdf):

$$f(\mathbf{y})=2\int_0^\infty \phi_p\big(\mathbf{y};\boldsymbol\mu,\kappa(u)\boldsymbol\Sigma\big)\,\Phi\big(\kappa(u)^{-1/2}\boldsymbol\lambda^\top\boldsymbol\Sigma^{-1/2}(\mathbf{y}-\boldsymbol\mu)\big)\,dH(u;\boldsymbol\nu),\quad \mathbf{y}\in\mathbb{R}^p, \tag{1}$$

for some positive weight function $\kappa(u)$, where $\phi_p(\cdot;\boldsymbol\mu,\boldsymbol\Sigma)$ denotes the pdf of the $p$-variate normal distribution with mean vector $\boldsymbol\mu$ and covariance matrix $\boldsymbol\Sigma$, $\boldsymbol\Sigma^{1/2}$ is such that $\boldsymbol\Sigma^{1/2}\boldsymbol\Sigma^{1/2}=\boldsymbol\Sigma$, and $\Phi(\cdot)$ denotes the cumulative distribution function (cdf) of the standard normal distribution.

A special case of the SMSN class is the skew–normal distribution (Azzalini and Valle, 1996), denoted by $\mathrm{SN}_p(\boldsymbol\mu,\boldsymbol\Sigma,\boldsymbol\lambda)$, for which the mixing distribution is degenerate, with $\kappa(U)=1$ with probability $1$, leading to the usual pdf

$$f(\mathbf{y})=2\phi_p(\mathbf{y};\boldsymbol\mu,\boldsymbol\Sigma)\,\Phi(A),\quad \mathbf{y}\in\mathbb{R}^p,$$

where $A=\boldsymbol\lambda^\top\boldsymbol\Sigma^{-1/2}(\mathbf{y}-\boldsymbol\mu)$. Another special case is obtained when the skewness parameter $\boldsymbol\lambda=\mathbf{0}$; then the SMSN distribution in (1) reduces to the scale mixture of normal (SMN) distribution, denoted by $\mathrm{SMN}_p(\boldsymbol\mu,\boldsymbol\Sigma;H)$ and discussed earlier by Lange and Sinsheimer (1993).
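As a quick numerical sanity check of the skew–normal density above, the following Python sketch evaluates it directly from the formula (the helper names `sym_inv_sqrt` and `sn_pdf` are ours for illustration, not part of the skewlmm package):

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

def sym_inv_sqrt(Sigma):
    """Inverse symmetric square root Sigma^{-1/2} via eigendecomposition."""
    w, V = np.linalg.eigh(Sigma)
    return V @ np.diag(w ** -0.5) @ V.T

def sn_pdf(y, mu, Sigma, lam):
    """Skew-normal pdf: f(y) = 2 * phi_p(y; mu, Sigma) * Phi(lam' Sigma^{-1/2} (y - mu))."""
    A = lam @ sym_inv_sqrt(Sigma) @ (y - mu)
    return 2.0 * multivariate_normal.pdf(y, mean=mu, cov=Sigma) * norm.cdf(A)
```

With $\boldsymbol\lambda=\mathbf{0}$ this returns exactly the normal pdf, reflecting the SMN reduction noted above.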

An important feature of this class, which can be used to derive many of its properties, is its stochastic representation. Let $\mathbf{Y}$ be a $p$-dimensional random vector with pdf as in (1); then $\mathbf{Y}$ can be represented in a stochastic way as follows:

$$\mathbf{Y}\stackrel{d}{=}\boldsymbol\mu+\kappa(U)^{1/2}\boldsymbol\Sigma^{1/2}\Big(\boldsymbol\delta|T_0|+(\mathbf{I}_p-\boldsymbol\delta\boldsymbol\delta^\top)^{1/2}\mathbf{T}_1\Big),\quad\text{with }\boldsymbol\delta=\frac{\boldsymbol\lambda}{\sqrt{1+\boldsymbol\lambda^\top\boldsymbol\lambda}}, \tag{2}$$

where $\stackrel{d}{=}$ means "equal in distribution", $|T_0|$ denotes the absolute value of $T_0$, $T_0\sim N(0,1)$, $\mathbf{T}_1\sim N_p(\mathbf{0},\mathbf{I}_p)$ and $U\sim H(\cdot;\boldsymbol\nu)$ are all independent variables, with $\mathbf{I}_p$ being the $p\times p$ identity matrix. The representation in (2) facilitates the implementation of EM-type algorithms. In this representation, it is straightforward that $E(\mathbf{Y})=\boldsymbol\mu+\sqrt{2/\pi}\,E\big(\kappa(U)^{1/2}\big)\boldsymbol\Sigma^{1/2}\boldsymbol\delta$.
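The representation in (2) also gives a direct simulation recipe. Below is an illustrative Python sketch for the skew-t member of the class, taking $\kappa(u)=1/u$ and $U\sim\mathrm{Gamma}(\nu/2,\nu/2)$; the function name is ours and unrelated to skewlmm:

```python
import numpy as np

def rsmsn_st(n, mu, Sigma, lam, nu, rng):
    """Draw n samples from the skew-t member ST_p(mu, Sigma, lam; nu) using the
    stochastic representation (2), with kappa(u) = 1/u and U ~ Gamma(nu/2, nu/2)."""
    p = len(mu)
    w, V = np.linalg.eigh(Sigma)
    S_sqrt = V @ np.diag(np.sqrt(w)) @ V.T              # Sigma^{1/2}
    delta = lam / np.sqrt(1.0 + lam @ lam)
    wA, VA = np.linalg.eigh(np.eye(p) - np.outer(delta, delta))
    A_sqrt = VA @ np.diag(np.sqrt(np.clip(wA, 0.0, None))) @ VA.T  # (I - dd')^{1/2}
    out = np.empty((n, p))
    for i in range(n):
        u = rng.gamma(nu / 2.0, 2.0 / nu)               # numpy uses scale = 1/rate
        t0 = abs(rng.normal())                          # |T0|, T0 ~ N(0, 1)
        t1 = rng.normal(size=p)                         # T1 ~ N_p(0, I_p)
        out[i] = mu + u ** -0.5 * (S_sqrt @ (delta * t0 + A_sqrt @ t1))
    return out
```

For example, with $\mu=0$, $\Sigma=1$, $\lambda=2$, $\nu=5$, the sample mean should approach $\sqrt{2/\pi}\,E(U^{-1/2})\,\delta\approx 0.85$, matching the mean expression given above.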

For simplicity, and following Lachos et al. (2010), in the remainder of this work we restrict attention to the case where $\kappa(u)=1/u$. The asymmetrical class of SMSN distributions includes many distributions as special cases, and we consider explicitly the following distributions:

• The multivariate skew–t distribution with $\nu$ degrees of freedom, $\mathrm{ST}_p(\boldsymbol\mu,\boldsymbol\Sigma,\boldsymbol\lambda;\nu)$ (Branco and Dey, 2001; Azzalini and Genton, 2008), which can be derived from the mixture model (1) by taking $U\sim\mathrm{Gamma}(\nu/2,\nu/2)$ with $\nu>0$, and whose pdf can be written as

$$f(\mathbf{y})=2\,t_p(\mathbf{y};\boldsymbol\mu,\boldsymbol\Sigma,\nu)\,T\left(\sqrt{\frac{\nu+p}{\nu+d}}\,A;\ \nu+p\right),\quad \mathbf{y}\in\mathbb{R}^p,$$

where $t_p(\cdot;\boldsymbol\mu,\boldsymbol\Sigma,\nu)$ and $T(\cdot;\nu)$ denote, respectively, the pdf of the $p$-variate Student-$t$ distribution and the cdf of the standard univariate Student-$t$ distribution with $\nu$ degrees of freedom, and $d=(\mathbf{y}-\boldsymbol\mu)^\top\boldsymbol\Sigma^{-1}(\mathbf{y}-\boldsymbol\mu)$ is the Mahalanobis distance.

• The multivariate skew–slash distribution, $\mathrm{SSL}_p(\boldsymbol\mu,\boldsymbol\Sigma,\boldsymbol\lambda;\nu)$, which arises by taking $U\sim\mathrm{Beta}(\nu,1)$, with $0<u<1$ and $\nu>0$, and whose pdf takes the form

$$f(\mathbf{y})=2\nu\int_0^1 u^{\nu-1}\phi_p(\mathbf{y};\boldsymbol\mu,u^{-1}\boldsymbol\Sigma)\,\Phi(u^{1/2}A)\,du,\quad \mathbf{y}\in\mathbb{R}^p.$$
• The multivariate skew–contaminated normal distribution, $\mathrm{SCN}_p(\boldsymbol\mu,\boldsymbol\Sigma,\boldsymbol\lambda;\nu_1,\nu_2)$, which arises when the mixing scale factor $U$ is a discrete random variable taking one of two values, $\nu_2$ or $1$, with probability function given by

$$h(u;\boldsymbol\nu)=\nu_1\,\mathbb{I}_{\{\nu_2\}}(u)+(1-\nu_1)\,\mathbb{I}_{\{1\}}(u),\quad 0<\nu_1<1,\ 0<\nu_2<1,$$

where $\boldsymbol\nu=(\nu_1,\nu_2)^\top$ and $\mathbb{I}_A(\cdot)$ is the indicator function of the set $A$, whose value equals one if its argument belongs to $A$ and zero elsewhere. In this case, the pdf becomes

$$f(\mathbf{y})=2\left\{\nu_1\phi_p(\mathbf{y};\boldsymbol\mu,\nu_2^{-1}\boldsymbol\Sigma)\,\Phi(\nu_2^{1/2}A)+(1-\nu_1)\phi_p(\mathbf{y};\boldsymbol\mu,\boldsymbol\Sigma)\,\Phi(A)\right\},\quad \mathbf{y}\in\mathbb{R}^p.$$

We refer to Lachos et al. (2010) and Lachos and Labra (2014) for details and additional properties related to this class of distributions.

### 2.2 The SMSN-LMM

Suppose that a variable of interest together with several covariates are repeatedly measured for each of $n$ subjects at certain occasions over a period of time. For the $i$th subject, $i=1,\ldots,n$, let $\mathbf{Y}_i$ denote the $n_i\times 1$ vector of observed continuous responses. In general, a normal linear mixed effects model is defined as

$$\mathbf{Y}_i=\mathbf{X}_i\boldsymbol\beta+\mathbf{Z}_i\mathbf{b}_i+\boldsymbol\epsilon_i,\quad i=1,\ldots,n, \tag{3}$$

where $\mathbf{X}_i$, of dimension $n_i\times \ell$, is the design matrix corresponding to the fixed effects; $\boldsymbol\beta$, of dimension $\ell\times 1$, is a vector of population-averaged regression coefficients called fixed effects; $\mathbf{Z}_i$, of dimension $n_i\times q$, is the design matrix corresponding to the $q\times 1$ random effects vector $\mathbf{b}_i$; and $\boldsymbol\epsilon_i$, of dimension $n_i\times 1$, is the vector of random errors. It is assumed that the random effects and the residual components are independent, with $\mathbf{b}_i\stackrel{iid.}{\sim}N_q(\mathbf{0},\mathbf{D})$ and $\boldsymbol\epsilon_i\stackrel{ind.}{\sim}N_{n_i}(\mathbf{0},\boldsymbol\Sigma_i)$. The random effects covariance matrix $\mathbf{D}$ may be unstructured or structured, and the error covariance matrix is commonly written as $\boldsymbol\Sigma_i=\sigma_e^2\mathbf{R}_i$, where $\mathbf{R}_i$ can be a known matrix or a structured matrix depending on a vector of parameters, say $\boldsymbol\phi$.

Likewise, the SMSN-LMM can be defined by considering

$$\begin{pmatrix}\mathbf{b}_i\\ \boldsymbol\epsilon_i\end{pmatrix}\stackrel{ind.}{\sim}\mathrm{SMSN}_{q+n_i}\left(\begin{pmatrix}c\boldsymbol\Delta\\ \mathbf{0}\end{pmatrix},\begin{pmatrix}\mathbf{D}&\mathbf{0}\\ \mathbf{0}&\boldsymbol\Sigma_i\end{pmatrix},\begin{pmatrix}\boldsymbol\lambda\\ \mathbf{0}\end{pmatrix};H\right),\quad i=1,\ldots,n, \tag{4}$$

where $c=c(\boldsymbol\nu)=-\sqrt{2/\pi}\,k_1$, with $k_1=E\{\kappa(U)^{1/2}\}$, $\boldsymbol\Delta=\mathbf{D}^{1/2}\boldsymbol\delta$, $\boldsymbol\delta=\boldsymbol\lambda/\sqrt{1+\boldsymbol\lambda^\top\boldsymbol\lambda}$, $\mathbf{D}=\mathbf{D}(\boldsymbol\alpha)$ depends on an unknown and reduced parameter vector $\boldsymbol\alpha$, and we consider $\boldsymbol\Sigma_i=\sigma_e^2\mathbf{R}_i$, with $\mathbf{R}_i=\mathbf{R}_i(\boldsymbol\phi)$ being a structured matrix. Calculating $k_1$ is straightforward, and the results for the distributions discussed in Subsection 2.1 are presented in Table 1, where $h(\cdot;\boldsymbol\nu)$ is the pdf of $U$.

Some remarks about the model formulated in (3) and (4) are worth noting:

• From (Lachos et al., 2010, Lemma 1 in Appendix A) it follows that, marginally,

$$\mathbf{b}_i\stackrel{iid.}{\sim}\mathrm{SMSN}_q(c\boldsymbol\Delta,\mathbf{D},\boldsymbol\lambda;H)\quad\text{and}\quad \boldsymbol\epsilon_i\stackrel{ind.}{\sim}\mathrm{SMN}_{n_i}(\mathbf{0},\sigma_e^2\mathbf{R}_i;H),\quad i=1,\ldots,n. \tag{5}$$

Thus the skewness parameter $\boldsymbol\lambda$ incorporates asymmetry only in the distribution of the random effects (and consequently in the marginal distribution of $\mathbf{Y}_i$, which is given below). In addition, the chosen location parameter $c\boldsymbol\Delta$ ensures that $E(\mathbf{b}_i)=\mathbf{0}$, so that $E(\mathbf{Y}_i)=\mathbf{X}_i\boldsymbol\beta$ for each $i$ and the regression parameters $\boldsymbol\beta$ are all comparable. This is important since centering $\mathbf{b}_i$'s distribution at $\mathbf{0}$ (Lachos et al., 2010), so that $E(\mathbf{b}_i)\neq\mathbf{0}$, might lead to biased estimates of $\boldsymbol\beta$, as illustrated in Appendix C.

• Even though, for each $i$, $\mathbf{b}_i$ and $\boldsymbol\epsilon_i$ are indexed by the same scale mixing factor $U_i$ – and hence they are not independent in general – conditional on $U_i$, $\mathbf{b}_i$ and $\boldsymbol\epsilon_i$ are independent, which can be written as $\mathbf{b}_i\perp\boldsymbol\epsilon_i\mid U_i$. Since $\mathrm{Cov}(\mathbf{b}_i,\boldsymbol\epsilon_i)=\mathbf{0}$, $\mathbf{b}_i$ and $\boldsymbol\epsilon_i$ are uncorrelated.

• Under the SMSN-LMM defined by (3) and (4), for $i=1,\ldots,n$, we have marginally

$$\mathbf{Y}_i\stackrel{ind.}{\sim}\mathrm{SMSN}_{n_i}(\mathbf{X}_i\boldsymbol\beta+c\mathbf{Z}_i\boldsymbol\Delta,\boldsymbol\Psi_i,\bar{\boldsymbol\lambda}_i;H), \tag{6}$$

where $\boldsymbol\Psi_i=\boldsymbol\Sigma_i+\mathbf{Z}_i\mathbf{D}\mathbf{Z}_i^\top$ and $\bar{\boldsymbol\lambda}_i=\dfrac{\boldsymbol\Psi_i^{-1/2}\mathbf{Z}_i\mathbf{D}\boldsymbol\zeta}{\sqrt{1+\boldsymbol\zeta^\top\boldsymbol\Lambda_i\boldsymbol\zeta}}$, with $\boldsymbol\zeta=\mathbf{D}^{-1/2}\boldsymbol\lambda$ and $\boldsymbol\Lambda_i=(\mathbf{D}^{-1}+\mathbf{Z}_i^\top\boldsymbol\Sigma_i^{-1}\mathbf{Z}_i)^{-1}$. Hence, the marginal pdf of $\mathbf{Y}_i$ is

$$f(\mathbf{y}_i;\boldsymbol\theta)=2\int_0^\infty\phi_{n_i}\big(\mathbf{y}_i;\mathbf{X}_i\boldsymbol\beta+c\mathbf{Z}_i\boldsymbol\Delta,u_i^{-1}\boldsymbol\Psi_i\big)\,\Phi\big(u_i^{1/2}\bar{\boldsymbol\lambda}_i^\top\boldsymbol\Psi_i^{-1/2}(\mathbf{y}_i-\mathbf{X}_i\boldsymbol\beta-c\mathbf{Z}_i\boldsymbol\Delta)\big)\,dH(u_i;\boldsymbol\nu). \tag{7}$$

This result can be shown using arguments from conditional probability and the moment generating function of the multivariate skew-normal distribution, which is given in Appendix B.

• The SMSN-LMM can be written hierarchically as follows:

$$\mathbf{Y}_i\mid\mathbf{b}_i,U_i=u_i\ \stackrel{ind.}{\sim}\ N_{n_i}(\mathbf{X}_i\boldsymbol\beta+\mathbf{Z}_i\mathbf{b}_i,\,u_i^{-1}\sigma_e^2\mathbf{R}_i), \tag{8}$$
$$\mathbf{b}_i\mid T_i=t_i,U_i=u_i\ \stackrel{ind.}{\sim}\ N_q(\boldsymbol\Delta t_i,\,u_i^{-1}\boldsymbol\Gamma), \tag{9}$$
$$T_i\mid U_i=u_i\ \stackrel{ind.}{\sim}\ \mathrm{TN}(c,\,u_i^{-1};(c,\infty)),\ \text{and} \tag{10}$$
$$U_i\ \stackrel{ind.}{\sim}\ H(\cdot;\boldsymbol\nu),\quad i=1,\ldots,n, \tag{11}$$

which are all independent, where $\boldsymbol\Gamma=\mathbf{D}-\boldsymbol\Delta\boldsymbol\Delta^\top$, $\boldsymbol\Delta=\mathbf{D}^{1/2}\boldsymbol\delta$, with $\boldsymbol\delta=\boldsymbol\lambda/\sqrt{1+\boldsymbol\lambda^\top\boldsymbol\lambda}$ and $\mathbf{D}^{1/2}$ the square root of $\mathbf{D}$ such that $\mathbf{D}^{1/2}\mathbf{D}^{1/2}=\mathbf{D}$, containing $q(q+1)/2$ distinct elements, and $\mathrm{TN}(\mu,\sigma^2;(a,b))$ denotes the univariate normal distribution $N(\mu,\sigma^2)$ truncated on the interval $(a,b)$. The hierarchical representation given in (8)–(11) is useful for the implementation of the EM algorithm, as will be seen in the next section.
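The hierarchy (8)–(11) translates directly into a simulation algorithm: draw $u_i$, then $t_i$, then $\mathbf{b}_i$, then the response. A minimal Python sketch for the skew-t case follows (with $\kappa(u)=1/u$ and $U_i\sim\mathrm{Gamma}(\nu/2,\nu/2)$, $\nu>1$; note that $\mathrm{TN}(c,u_i^{-1};(c,\infty))$ can be drawn as $c+|Z|/\sqrt{u_i}$, $Z\sim N(0,1)$, since the truncation point equals the location). Function and argument names are ours, not skewlmm's:

```python
import math
import numpy as np

def simulate_st_lmm(X_list, Z_list, beta, D, sigma2, R_list, lam, nu, rng):
    """Simulate responses from the hierarchy (8)-(11), skew-t case. Sketch only."""
    w, V = np.linalg.eigh(D)
    D_sqrt = V @ np.diag(np.sqrt(w)) @ V.T                   # D^{1/2}
    delta = lam / math.sqrt(1.0 + float(lam @ lam))
    Delta = D_sqrt @ delta                                   # Delta = D^{1/2} delta
    Gam = D - np.outer(Delta, Delta)                         # Gamma = D - Delta Delta'
    # c = -sqrt(2/pi) * E[U^{-1/2}] centres the random effects at zero (needs nu > 1)
    k1 = math.sqrt(nu / 2.0) * math.gamma((nu - 1.0) / 2.0) / math.gamma(nu / 2.0)
    c = -math.sqrt(2.0 / math.pi) * k1
    ys = []
    for X, Z, R in zip(X_list, Z_list, R_list):
        u = rng.gamma(nu / 2.0, 2.0 / nu)                    # (11), numpy scale = 1/rate
        t = c + abs(rng.normal()) / math.sqrt(u)             # (10)
        b = rng.multivariate_normal(Delta * t, Gam / u)      # (9)
        e = rng.multivariate_normal(np.zeros(R.shape[0]), sigma2 * R / u)  # errors, (8)
        ys.append(X @ beta + Z @ b + e)
    return ys
```

Because of the centering constant $c$, the simulated responses have mean $\mathbf{X}_i\boldsymbol\beta$, consistent with the first remark above.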

#### 2.2.1 Within-subject dependence structures

In order to enable some flexibility when modeling the error covariance, we consider essentially three dependence structures: conditional independence, AR($p$), and DEC, which are discussed next.

Conditional independence
The most common and simplest approach is to assume that the error terms are conditionally independent (CI). Under this assumption, for each $i$, we have $\mathbf{R}_i=\mathbf{I}_{n_i}$. This situation was considered by Lachos et al. (2010) in their applications.

In longitudinal studies, however, repeated measures are collected over time and hence the error term tends to be serially correlated. In order to account for the within-subject serial correlation, we consider two other general structures.

Autoregressive dependence of order $p$
Consider first the case where, for each subject, a variable of interest is observed regularly over discrete time, $n_i$ times. Then, we propose to model $\mathbf{R}_i$ as a structured AR($p$) dependence matrix. Specifically,

$$\mathbf{R}_i=\mathbf{R}_i(\boldsymbol\phi)=\frac{1}{1-\phi_1\rho_1-\ldots-\phi_p\rho_p}\big[\rho_{|r-s|}\big],\quad r,s=1,\ldots,n_i, \tag{12}$$

where $\rho_0=1$ and $\rho_1,\rho_2,\ldots$ are the theoretical autocorrelations of the process; thereby they are functions of the autoregressive parameters $\boldsymbol\phi=(\phi_1,\ldots,\phi_p)^\top$ and satisfy the Yule–Walker equations (Box et al., 2015), i.e.,

$$\rho_k=\phi_1\rho_{k-1}+\ldots+\phi_p\rho_{k-p},\quad \rho_0=1,\quad k=1,\ldots,p.$$

In addition, the roots of $1-\phi_1 B-\ldots-\phi_p B^p=0$ must lie outside the unit circle to ensure stationarity of the AR($p$) model. Following Barndorff-Nielsen and Schou (1973), the autoregressive process can be reparameterized using a one-to-one, continuous and differentiable transformation in order to simplify the conditions for stationarity. For details on the estimation of the autoregressive coefficients, we refer to Schumacher et al. (2017).
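A minimal sketch of the Barndorff-Nielsen and Schou (1973) idea, assuming the standard Durbin-Levinson recursion that maps partial autocorrelations in $(-1,1)^p$ onto the stationarity region (the function name is ours):

```python
import numpy as np

def pacf_to_ar(r):
    """Map partial autocorrelations r_1..r_p, each in (-1, 1), to AR coefficients
    phi_1..phi_p via the Durbin-Levinson recursion; the image of (-1, 1)^p is
    exactly the stationarity region, which simplifies constrained estimation."""
    phi = []
    for k, rk in enumerate(r, start=1):
        # phi_j^{(k)} = phi_j^{(k-1)} - r_k * phi_{k-j}^{(k-1)},  phi_k^{(k)} = r_k
        phi = [pj - rk * phi[k - 2 - j] for j, pj in enumerate(phi)] + [rk]
    return np.array(phi)
```

One can then optimize freely over the (transformed) partial autocorrelations instead of imposing the root condition directly.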

The model formulated in (3) and (4) with error covariance $\boldsymbol\Sigma_i=\sigma_e^2\mathbf{R}_i$, where $\mathbf{R}_i$ is given by (12), $i=1,\ldots,n$, will be denoted AR($p$)-SMSN-LMM.
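For illustration, the Yule–Walker equations and the matrix in (12) can be computed numerically as follows (a Python sketch with hypothetical helper names, assuming $n_i>p$):

```python
import numpy as np

def ar_acf(phi):
    """Solve the first p Yule-Walker equations rho_k = phi_1 rho_{k-1} + ... + phi_p rho_{k-p}
    (with rho_0 = 1 and rho_{-m} = rho_m) for the autocorrelations rho_1..rho_p."""
    phi = np.asarray(phi, dtype=float)
    p = len(phi)
    M = np.eye(p)
    rhs = np.zeros(p)
    for k in range(1, p + 1):
        for j in range(1, p + 1):
            if j == k:
                rhs[k - 1] += phi[j - 1]            # phi_k * rho_0 moved to the RHS
            else:
                M[k - 1, abs(k - j) - 1] -= phi[j - 1]
    return np.linalg.solve(M, rhs)

def ar_cov_matrix(phi, n):
    """R_i of eq. (12): [rho_{|r-s|}] / (1 - phi_1 rho_1 - ... - phi_p rho_p), n > p."""
    phi = np.asarray(phi, dtype=float)
    p = len(phi)
    rho = np.ones(n)
    rho[1:p + 1] = ar_acf(phi)
    for k in range(p + 1, n):                        # extend recursively beyond lag p
        rho[k] = sum(phi[j] * rho[k - 1 - j] for j in range(p))
    scale = 1.0 / (1.0 - float(phi @ rho[1:p + 1]))
    idx = np.arange(n)
    return scale * rho[np.abs(idx[:, None] - idx[None, :])]
```

For AR(1) with $\phi_1=0.5$ this reproduces the familiar $\rho_k=0.5^k$ pattern scaled by $1/(1-\phi_1\rho_1)$.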

Damped exponential correlation
More generally, consider now that for each subject a variable of interest is observed at times $t_{i1},\ldots,t_{in_i}$, which may be unequally spaced. Following Muñoz et al. (1992), we propose to structure $\mathbf{R}_i$ as a damped exponential correlation (DEC) matrix, as follows:

$$\mathbf{R}_i=\mathbf{R}_i(\boldsymbol\phi,\mathbf{t}_i)=\Big[\phi_1^{|t_{ij}-t_{ik}|^{\phi_2}}\Big],\quad 0<\phi_1<1,\ \phi_2>0, \tag{13}$$

where $\boldsymbol\phi=(\phi_1,\phi_2)^\top$ and $\mathbf{t}_i=(t_{i1},\ldots,t_{in_i})^\top$, for $j,k=1,\ldots,n_i$. Note that for $\phi_2=1$, $\mathbf{R}_i$ reduces to the correlation matrix of a continuous-time autoregressive process of order 1 (CAR(1)); hence $\phi_2$ enables attenuation or acceleration of the exponential decay of the CAR(1) autocorrelation function, depending on its value. Moreover, $\phi_1$ describes the autocorrelation between observations one unit of time apart, i.e., such that $|t_{ij}-t_{ik}|=1$. More details on the DEC structure can be found in Muñoz et al. (1992).

The DEC structure is rather flexible, and some particular cases are worth pointing out:

1. if $\phi_2=0$, then $\mathbf{R}_i$ reduces to the compound symmetry correlation structure (CS);

2. if $\phi_2=1$, then $\mathbf{R}_i$ reduces to the CAR(1) correlation structure;

3. if $0<\phi_2<1$, then $\mathbf{R}_i$ generates a decay rate slower than the CAR(1) structure;

4. if $\phi_2>1$, then $\mathbf{R}_i$ generates a decay rate faster than the CAR(1) structure; and

5. if $\phi_2\rightarrow\infty$, then $\mathbf{R}_i$ converges to the correlation matrix of a moving-average model of order 1 (MA(1)).

The model formulated in (3) and (4) with error covariance $\boldsymbol\Sigma_i=\sigma_e^2\mathbf{R}_i$, where $\mathbf{R}_i$ is given by (13), $i=1,\ldots,n$, will be denoted DEC-SMSN-LMM.
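Equation (13) and its special cases can be checked numerically with a few lines of Python (the helper name is ours):

```python
import numpy as np

def dec_matrix(phi1, phi2, times):
    """DEC matrix of eq. (13): R[j, k] = phi1 ** (|t_j - t_k| ** phi2), unit diagonal."""
    t = np.asarray(times, dtype=float)
    lags = np.abs(t[:, None] - t[None, :])
    R = phi1 ** (lags ** phi2)
    np.fill_diagonal(R, 1.0)   # guard the 0**0 corner in the phi2 -> 0 (CS) limit
    return R
```

With $\phi_2=1$ this gives the CAR(1) pattern $\phi_1^{|t_j-t_k|}$, while $\phi_2\to 0$ gives a constant off-diagonal correlation $\phi_1$ (compound symmetry), as in the list above.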

### 2.3 The likelihood function

The marginal pdf of $\mathbf{Y}_i$, $i=1,\ldots,n$, is given in (7), with $\mathbf{R}_i$ depending on the chosen correlation structure, as described in Subsection 2.2.1. Hence, the log-likelihood function for $\boldsymbol\theta=(\boldsymbol\beta^\top,\sigma_e^2,\boldsymbol\phi^\top,\boldsymbol\alpha^\top,\boldsymbol\lambda^\top,\boldsymbol\nu^\top)^\top$ based on the observed sample $\mathbf{y}=(\mathbf{y}_1^\top,\ldots,\mathbf{y}_n^\top)^\top$ is given by

$$\ell(\boldsymbol\theta\mid\mathbf{y})=\sum_{i=1}^n\ell_i(\boldsymbol\theta\mid\mathbf{y})=\sum_{i=1}^n\log f(\mathbf{y}_i\mid\boldsymbol\theta),$$

where $f(\mathbf{y}_i\mid\boldsymbol\theta)$ is as in (7). As the observed log-likelihood function involves complex expressions, it is very difficult to work directly with it to find the ML estimates of $\boldsymbol\theta$. Thus, in this work we propose an EM-type algorithm (Dempster et al., 1977) for parameter estimation via ML.
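In the normal special case ($\boldsymbol\lambda=\mathbf{0}$ and $U_i\equiv 1$), the marginal density (7) collapses to $\mathbf{Y}_i\sim N_{n_i}(\mathbf{X}_i\boldsymbol\beta,\mathbf{Z}_i\mathbf{D}\mathbf{Z}_i^\top+\sigma_e^2\mathbf{R}_i)$ and the log-likelihood can be evaluated in closed form; the sketch below illustrates this baseline (names are ours, not skewlmm's):

```python
import numpy as np
from scipy.stats import multivariate_normal

def loglik_normal_lmm(y_list, X_list, Z_list, beta, D, sigma2, R_list):
    """Observed log-likelihood l(theta) = sum_i log f(y_i) in the normal special
    case (lambda = 0, U_i = 1): marginally Y_i ~ N(X_i beta, Z_i D Z_i' + sigma2 R_i)."""
    ll = 0.0
    for y, X, Z, R in zip(y_list, X_list, Z_list, R_list):
        Psi = Z @ D @ Z.T + sigma2 * R   # marginal covariance Psi_i
        ll += multivariate_normal.logpdf(y, mean=X @ beta, cov=Psi)
    return ll
```

For the general SMSN case the integral over $u_i$ in (7) does not simplify this way, which is what motivates the EM-type algorithm below.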

## 3 Maximum likelihood estimation

### 3.1 The EM algorithm

A convenient feature of the SMSN-LMM is its hierarchical representation, as given in (8)–(11). Following Lachos et al. (2010), the latent quantities $u_i$, $\mathbf{b}_i$, and $t_i$, $i=1,\ldots,n$, can be treated as hypothetical missing data, and therefore we propose to use the ECME algorithm (Liu and Rubin, 1994) for parameter estimation.

Let the augmented data set be $\mathbf{y}_c=(\mathbf{y}^\top,\mathbf{b}^\top,\mathbf{t}^\top,\mathbf{u}^\top)^\top$, with $\mathbf{b}=(\mathbf{b}_1^\top,\ldots,\mathbf{b}_n^\top)^\top$, $\mathbf{t}=(t_1,\ldots,t_n)^\top$, and $\mathbf{u}=(u_1,\ldots,u_n)^\top$. Hence, an EM-type algorithm can be applied to the complete-data log-likelihood function $\ell_c(\boldsymbol\theta\mid\mathbf{y}_c)$, given by

$$\ell_c(\boldsymbol\theta\mid\mathbf{y}_c)=\sum_{i=1}^n\Big[-\frac12\log|\sigma_e^2\mathbf{R}_i|-\frac{u_i}{2\sigma_e^2}(\mathbf{y}_i-\mathbf{X}_i\boldsymbol\beta-\mathbf{Z}_i\mathbf{b}_i)^\top\mathbf{R}_i^{-1}(\mathbf{y}_i-\mathbf{X}_i\boldsymbol\beta-\mathbf{Z}_i\mathbf{b}_i)-\frac12\log|\boldsymbol\Gamma|-\frac{u_i}{2}(\mathbf{b}_i-\boldsymbol\Delta t_i)^\top\boldsymbol\Gamma^{-1}(\mathbf{b}_i-\boldsymbol\Delta t_i)\Big]+K(\boldsymbol\nu)+C,$$

where $C$ is a constant that is independent of the parameter vector $\boldsymbol\theta$ and $K(\boldsymbol\nu)$ is a function that depends on $\boldsymbol\theta$ only through $\boldsymbol\nu$.

For the current value $\boldsymbol\theta=\widehat{\boldsymbol\theta}^{(k)}$, the E-step of the EM-type algorithm requires the evaluation of $\widehat{Q}^{(k)}(\boldsymbol\theta)=E\big\{\ell_c(\boldsymbol\theta\mid\mathbf{y}_c)\mid\mathbf{y},\widehat{\boldsymbol\theta}^{(k)}\big\}=\sum_{i=1}^n\widehat{Q}_i^{(k)}(\boldsymbol\theta)$, where the expectation is taken with respect to the joint conditional distribution of $\mathbf{b}_i$, $T_i$, and $U_i$, given $\mathbf{y}_i$ and $\widehat{\boldsymbol\theta}^{(k)}$. Thus, we have

$$\widehat{Q}_i^{(k)}(\boldsymbol\theta)=\widehat{Q}_{1i}^{(k)}(\boldsymbol\beta,\sigma_e^2,\boldsymbol\phi)+\widehat{Q}_{2i}^{(k)}(\boldsymbol\alpha,\boldsymbol\lambda)+\widehat{Q}_{3i}^{(k)}(\boldsymbol\nu),$$

where

$$\begin{aligned}\widehat{Q}_{1i}^{(k)}(\boldsymbol\beta,\sigma_e^2,\boldsymbol\phi)={}&-\frac{n_i}{2}\log(\sigma_e^2)-\frac12\log|\mathbf{R}_i|-\frac{\widehat{u}_i^{(k)}}{2\sigma_e^2}(\mathbf{y}_i-\mathbf{X}_i\boldsymbol\beta)^\top\mathbf{R}_i^{-1}(\mathbf{y}_i-\mathbf{X}_i\boldsymbol\beta)\\&+\frac{1}{\sigma_e^2}(\mathbf{y}_i-\mathbf{X}_i\boldsymbol\beta)^\top\mathbf{R}_i^{-1}\mathbf{Z}_i\widehat{u\mathbf{b}}_i^{(k)}-\frac{1}{2\sigma_e^2}\mathrm{tr}\Big(\mathbf{R}_i^{-1}\mathbf{Z}_i\widehat{u\mathbf{b}^2}_i^{(k)}\mathbf{Z}_i^\top\Big),\\ \widehat{Q}_{2i}^{(k)}(\boldsymbol\alpha,\boldsymbol\lambda)={}&-\frac12\log|\boldsymbol\Gamma|-\frac12\mathrm{tr}\Big(\boldsymbol\Gamma^{-1}\widehat{u\mathbf{b}^2}_i^{(k)}\Big)+\boldsymbol\Delta^\top\boldsymbol\Gamma^{-1}\widehat{ut\mathbf{b}}_i^{(k)}-\frac{\widehat{ut^2}_i^{(k)}}{2}\boldsymbol\Delta^\top\boldsymbol\Gamma^{-1}\boldsymbol\Delta,\end{aligned}$$

with $\mathrm{tr}(\mathbf{A})$ and $|\mathbf{A}|$ indicating the trace and the determinant of matrix $\mathbf{A}$, respectively, and $\widehat{u}_i^{(k)}=E\{U_i\mid\mathbf{y}_i,\widehat{\boldsymbol\theta}^{(k)}\}$, $\widehat{ut}_i^{(k)}=E\{U_iT_i\mid\mathbf{y}_i,\widehat{\boldsymbol\theta}^{(k)}\}$, $\widehat{ut^2}_i^{(k)}=E\{U_iT_i^2\mid\mathbf{y}_i,\widehat{\boldsymbol\theta}^{(k)}\}$, $\widehat{u\mathbf{b}}_i^{(k)}=E\{U_i\mathbf{b}_i\mid\mathbf{y}_i,\widehat{\boldsymbol\theta}^{(k)}\}$, $\widehat{ut\mathbf{b}}_i^{(k)}=E\{U_iT_i\mathbf{b}_i\mid\mathbf{y}_i,\widehat{\boldsymbol\theta}^{(k)}\}$, and $\widehat{u\mathbf{b}^2}_i^{(k)}=E\{U_i\mathbf{b}_i\mathbf{b}_i^\top\mid\mathbf{y}_i,\widehat{\boldsymbol\theta}^{(k)}\}$, $i=1,\ldots,n$. These expressions can be readily evaluated once we have the following conditional distributions, which can be derived using arguments from conditional probability:

$$\begin{aligned}\mathbf{b}_i\mid t_i,u_i,\mathbf{y}_i,\boldsymbol\theta&\ \sim\ N_q\big(\mathbf{s}_it_i+\mathbf{r}_i,\ u_i^{-1}\mathbf{B}_i\big),\\ T_i\mid u_i,\mathbf{y}_i,\boldsymbol\theta&\ \sim\ \mathrm{TN}\big(c+\mu_i,\ u_i^{-1}M_i^2;(c,\infty)\big),\qquad\qquad(14)\\ \mathbf{Y}_i\mid\boldsymbol\theta&\ \sim\ \mathrm{SMSN}_{n_i}\big(\mathbf{X}_i\boldsymbol\beta+c\mathbf{Z}_i\boldsymbol\Delta,\ \boldsymbol\Psi_i,\ \bar{\boldsymbol\lambda}_i;H\big),\end{aligned}$$

where $\mathbf{B}_i=\big(\boldsymbol\Gamma^{-1}+\sigma_e^{-2}\mathbf{Z}_i^\top\mathbf{R}_i^{-1}\mathbf{Z}_i\big)^{-1}$, $\mathbf{s}_i=\mathbf{B}_i\boldsymbol\Gamma^{-1}\boldsymbol\Delta$, $\mathbf{r}_i=\sigma_e^{-2}\mathbf{B}_i\mathbf{Z}_i^\top\mathbf{R}_i^{-1}(\mathbf{y}_i-\mathbf{X}_i\boldsymbol\beta)$, $M_i^2=\big(1+\boldsymbol\Delta^\top\mathbf{Z}_i^\top\widetilde{\boldsymbol\Lambda}_i^{-1}\mathbf{Z}_i\boldsymbol\Delta\big)^{-1}$, and $\mu_i=M_i^2\,\boldsymbol\Delta^\top\mathbf{Z}_i^\top\widetilde{\boldsymbol\Lambda}_i^{-1}(\mathbf{y}_i-\mathbf{X}_i\boldsymbol\beta-c\mathbf{Z}_i\boldsymbol\Delta)$, with $\widetilde{\boldsymbol\Lambda}_i=\sigma_e^2\mathbf{R}_i+\mathbf{Z}_i\boldsymbol\Gamma\mathbf{Z}_i^\top$, for $i=1,\ldots,n$.

Then, after some algebra and omitting the superscript $(k)$, we get the following expressions:

$$\begin{aligned}\widehat{ut}_i&=(\widehat{\mu}_i+\widehat{c})\widehat{u}_i+\widehat{M}_i\widehat{\tau}_{1i},\\ \widehat{ut^2}_i&=\widehat{M}_i^2+(\widehat{\mu}_i+\widehat{c})^2\widehat{u}_i+\widehat{M}_i(\widehat{\mu}_i+2\widehat{c})\widehat{\tau}_{1i},\\ \widehat{u\mathbf{b}}_i&=\widehat{\mathbf{r}}_i\widehat{u}_i+\widehat{\mathbf{s}}_i\widehat{ut}_i,\qquad \widehat{ut\mathbf{b}}_i=\widehat{\mathbf{r}}_i\widehat{ut}_i+\widehat{\mathbf{s}}_i\widehat{ut^2}_i,\\ \widehat{u\mathbf{b}^2}_i&=\widehat{\mathbf{B}}_i+\widehat{u}_i\widehat{\mathbf{r}}_i\widehat{\mathbf{r}}_i^\top+\widehat{ut}_i\big(\widehat{\mathbf{s}}_i\widehat{\mathbf{r}}_i^\top+\widehat{\mathbf{r}}_i\widehat{\mathbf{s}}_i^\top\big)+\widehat{ut^2}_i\widehat{\mathbf{s}}_i\widehat{\mathbf{s}}_i^\top,\qquad(15)\end{aligned}$$

where the expressions for $\widehat{u}_i$ and $\widehat{\tau}_{1i}$ can be found in Section 2 of Lachos et al. (2010); they can be easily implemented for the ST and SCN distributions, but involve numerical integration in the SS case.

The M-step requires the maximization of $\widehat{Q}^{(k)}(\boldsymbol\theta)$ with respect to $\boldsymbol\theta$. A motivation for employing an EM-type algorithm is that it yields closed-form equations for most of the M-step. The conditional maximization (CM) steps conditionally maximize $\widehat{Q}^{(k)}(\boldsymbol\theta)$ with respect to $\boldsymbol\theta$, obtaining a new estimate $\widehat{\boldsymbol\theta}^{(k+1)}$, as follows:

1. Update $\widehat{\boldsymbol\beta}^{(k+1)}$, $\widehat{\sigma}_e^{2(k+1)}$, and $\widehat{\boldsymbol\phi}^{(k+1)}$ using the following expressions:

$$\widehat{\boldsymbol\beta}^{(k+1)}=\left(\sum_{i=1}^n\widehat{u}_i^{(k)}\mathbf{X}_i^\top\widehat{\boldsymbol\Sigma}_i^{-1(k)}\mathbf{X}_i\right)^{-1}\sum_{i=1}^n\mathbf{X}_i^\top\widehat{\boldsymbol\Sigma}_i^{-1(k)}\left(\widehat{u}_i^{(k)}\mathbf{y}_i-\mathbf{Z}_i\widehat{u\mathbf{b}}_i^{(k)}\right),$$

$$\widehat{\sigma}_e^{2(k+1)}=\frac{1}{N}\sum_{i=1}^n\Big[\widehat{u}_i^{(k)}\big(\mathbf{y}_i-\mathbf{X}_i\widehat{\boldsymbol\beta}^{(k+1)}\big)^\top\mathbf{R}_i^{-1}(\widehat{\boldsymbol\phi}^{(k)})\big(\mathbf{y}_i-\mathbf{X}_i\widehat{\boldsymbol\beta}^{(k+1)}\big)-2\big(\mathbf{y}_i-\mathbf{X}_i\widehat{\boldsymbol\beta}^{(k+1)}\big)^\top\mathbf{R}_i^{-1}(\widehat{\boldsymbol\phi}^{(k)})\mathbf{Z}_i\widehat{u\mathbf{b}}_i^{(k)}+\mathrm{tr}\Big(\mathbf{R}_i^{-1}(\widehat{\boldsymbol\phi}^{(k)})\mathbf{Z}_i\widehat{u\mathbf{b}^2}_i^{(k)}\mathbf{Z}_i^\top\Big)\Big],\quad N=\sum_{i=1}^n n_i,$$

$$\widehat{\boldsymbol\phi}^{(k+1)}=\underset{\boldsymbol\phi}{\arg\max}\sum_{i=1}^n\bigg(-\frac12\log|\mathbf{R}_i(\boldsymbol\phi)|-\frac{\widehat{u}_i^{(k)}}{2\widehat{\sigma}_e^{2(k+1)}}\big(\mathbf{y}_i-\mathbf{X}_i\widehat{\boldsymbol\beta}^{(k+1)}\big)^\top\mathbf{R}_i^{-1}(\boldsymbol\phi)\big(\mathbf{y}_i-\mathbf{X}_i\widehat{\boldsymbol\beta}^{(k+1)}\big)+\frac{1}{\widehat{\sigma}_e^{2(k+1)}}\big(\mathbf{y}_i-\mathbf{X}_i\widehat{\boldsymbol\beta}^{(k+1)}\big)^\top\mathbf{R}_i^{-1}(\boldsymbol\phi)\mathbf{Z}_i\widehat{u\mathbf{b}}_i^{(k)}-\frac{1}{2\widehat{\sigma}_e^{2(k+1)}}\mathrm{tr}\Big(\mathbf{R}_i^{-1}(\boldsymbol\phi)\mathbf{Z}_i\widehat{u\mathbf{b}^2}_i^{(k)}\mathbf{Z}_i^\top\Big)\bigg).$$