Canonical fundamental skew-t linear mixed models

In clinical trials, studies often present longitudinal or clustered data. Such studies are commonly analyzed using linear mixed models (LMMs), usually under Gaussian assumptions for the random effect and error terms. Recently, several proposals have extended the restrictive assumptions of the traditional LMM with more flexible ones that can accommodate skewness and heavy tails and are consequently more robust to outliers. This work proposes a canonical fundamental skew-t linear mixed model (ST-LMM) that allows for asymmetric and heavy-tailed random effects and errors and includes several important models as special cases, which are presented and considered for model selection. For this robust and flexible model, we present an efficient EM-type algorithm for parameter estimation via maximum likelihood, implemented in closed form by exploiting the hierarchical representation of the ST-LMM. In addition, the estimation of standard errors and random effects is discussed. The methodology is illustrated through an application to schizophrenia data and some simulation studies.


1 Introduction

Linear mixed models (LMM) are commonly used to model data that present a natural hierarchical structure because they flexibly model the within-subject correlation (Pinheiro and Bates, 2000). This kind of structure appears when a variable of interest is repeatedly measured for several subjects (or cluster units, in general), which is frequently the case in clinical trials.

For mathematical convenience, it is usually assumed that both random effects and errors follow Gaussian distributions. Nevertheless, these restrictive assumptions may result in a lack of robustness against departures from normality and in invalid statistical inferences, especially when the data show heavy tails and skewness (see, e.g., Drikvandi et al., 2017; Drikvandi, 2019). Several approaches have been considered in the literature to replace the normal assumptions of the LMM with more flexible distributions. For example, Pinheiro et al. (2001) proposed a multivariate t linear mixed model (T-LMM), and Rosa et al. (2003) considered the thick-tailed class of normal/independent (NI) distributions from a Bayesian framework.

Accounting for skewness, Arellano-Valle et al. (2005) proposed a skew-normal linear mixed model (SN-LMM) based on the classic skew-normal (SN) distribution introduced by Azzalini and Dalla Valle (1996), and Ho and Lin (2010) proposed a skew-t linear mixed model (ST-LMM) based on the classic skew-t (ST) distribution introduced by Azzalini and Capitanio (2003). More generally, Lachos et al. (2010a) proposed robust parametric modeling of the LMM based on skew-normal/independent (SNI) distributions, in which the random effects follow an SNI distribution and the within-subject errors follow a normal/independent (NI) distribution, so that the observed responses are SNI-distributed; they call the resulting model the skew-normal/independent linear mixed model. Schumacher et al. (2021) extended the SNI-LMM by considering within-subject serial dependence and developing additional tools for model evaluation.

In general, SN and ST distributions extend the normal and Student-t distributions by introducing additional parameters regulating skewness. Some extensions and unifications of these distributions are carefully surveyed in works such as Azzalini (2005) and Arellano-Valle and Azzalini (2006). For further information, we refer to the book edited by Genton (2004) and the more recent one by Azzalini and Capitanio (2014). The classic formulations of the SN and ST distributions have been used successfully in many other works, such as Pyne et al. (2009), Lachos et al. (2010a), Lachos et al. (2010b), Cabral et al. (2012a), Cabral et al. (2012b), and Cabral et al. (2014). Furthermore, another popular version of these two skewed distributions was defined by Sahu et al. (2003), which will be referred to as SDB-SN and SDB-ST in this work. This proposal was applied in works such as Lin (2009), Lin (2010), and Jara et al. (2008).

Recently, Lee and McLachlan (2016) proposed finite mixtures of a generalization of the classic and SDB-ST distributions, called canonical fundamental skew-t distributions (CFUST), which are special cases of the fundamental skew distributions defined by Arellano-Valle and Genton (2005). Using simulation studies, Lee and McLachlan (2016) showed that the CFUST distribution outperforms the classic and SDB-ST distributions in mixture models. In this regard, this work aims to extend the classic ST-LMM by considering the CFUST distribution used in Lee and McLachlan (2016) and developing ML estimation based on an EM-type algorithm. This formulation, which will hereafter simply be called the ST distribution, enables a more flexible skewness structure at the cost of a higher number of parameters to be estimated.

The rest of this manuscript is organized as follows. Section 2 introduces the formulation of the skewed distributions that are considered in this work. Section 3 defines the ST-LMM and discusses its ML estimation via an EM-type algorithm. Section 4 presents some simulation studies conducted to evaluate the empirical performance of the ST-LMM and the effect of initial values, as well as an illustrative study to exemplify the flexibility of the model. In Section 5, the methodology is applied to a schizophrenia data set. Finally, Section 6 discusses some final remarks.

2 Skewed distributions

Let $X \sim N_p(\mu, \Sigma)$ denote a $p$-dimensional random vector following a normal distribution with mean vector $\mu$ and covariance matrix $\Sigma$, and let $\phi_p(\cdot \mid \mu, \Sigma)$ and $\Phi_p(\cdot \mid \mu, \Sigma)$ be the respective density and cumulative distribution function. When $\mu = 0_p$ and $\Sigma = I_p$ (the null $p$-dimensional vector and the $p \times p$ identity matrix, respectively), we simplify the notation to $\phi_p(\cdot)$ and $\Phi_p(\cdot)$, and when $p = 1$, we use the notation $\phi(\cdot)$ and $\Phi(\cdot)$.

Suppose $X \sim N_p(\mu, \Sigma)$; then, for a given Borel set $A \subseteq \mathbb{R}^p$, we say that the distribution of $Y = (X \mid X \in A)$ is a truncated normal distribution on $A$, denoted by $Y \sim TN_p(\mu, \Sigma; A)$, whose density is given by

$$f_Y(y) = \frac{1}{P(X \in A)}\, \phi_p(y \mid \mu, \Sigma)\, \mathbb{1}_A(y),$$

where $\mathbb{1}_A(\cdot)$ is the indicator function of $A$. As a particular case, consider $\mu = 0_p$ and $\Sigma$ diagonal; then all elements of the vector $Y$ are independent random variables. Now, considering $Y = |X|$ with $X \sim N_p(0_p, \Sigma)$, we say that the distribution of $Y$ is a $p$-dimensional half-normal distribution with scale matrix $\Sigma$, denoted by $Y \sim HN_p(0, \Sigma)$.

First, we define the version of the skew-normal distribution that will be used in this work. It is a special case of the fundamental skew-normal distribution defined by Arellano-Valle and Genton (2005). The presentation below is based on that work, where all the proofs can be found; some are reproduced here, and others are omitted.

Definition 1.

Let $X_0 \sim N_q(0, I_q)$ and $X_1 \sim N_p(0, \Omega)$ be independent, where $\Omega$ is positive definite, and let $\Delta$ be a $p \times q$ matrix. We say that the distribution of

$$Y = \mu + \Delta |X_0| + X_1$$

is skew-normal with location vector $\mu$, shape matrix $\Delta$, and scale matrix $\Omega$. We use the notation $Y \sim SN_{p,q}(\mu, \Omega, \Delta)$.

Observe that when $\Delta = 0$, then $Y \sim N_p(\mu, \Omega)$. A trivial but relevant consequence of this definition is that affine transformations of SN distributions are still SN, as stated in Proposition 1.

Proposition 1.

Let $A$ be an $m \times p$ matrix with rank $m \leq p$, $b$ be a vector of length $m$, and $Y \sim SN_{p,q}(\mu, \Omega, \Delta)$. Then, $AY + b \sim SN_{m,q}(A\mu + b, A\Omega A^\top, A\Delta)$.

In particular, marginal distributions are also SN. Thus, if $Y \sim SN_{p,q}(\mu, \Omega, \Delta)$ and we consider the partition

$$Y = (Y_1^\top, Y_2^\top)^\top, \quad \text{where } Y_1: p_1 \times 1 \text{ and } Y_2: p_2 \times 1, \text{ with } p_1 + p_2 = p, \tag{1}$$

then this partition induces similar partitions on $\mu$, $\Omega$, and $\Delta$, given by $\mu = (\mu_1^\top, \mu_2^\top)^\top$, $\Omega = \begin{pmatrix} \Omega_{11} & \Omega_{12} \\ \Omega_{21} & \Omega_{22} \end{pmatrix}$, and $\Delta = (\Delta_1^\top, \Delta_2^\top)^\top$, where $\mu_j: p_j \times 1$, $\Omega_{jk}: p_j \times p_k$, and $\Delta_j: p_j \times q$, for $j, k = 1, 2$. By Proposition 1, we have $Y_1 \sim SN_{p_1,q}(\mu_1, \Omega_{11}, \Delta_1)$. An analogous result is true for $Y_2$.

Proposition 2.

Let $Y \sim SN_{p,q}(\mu, \Omega, \Delta)$. Then, the density of $Y$ is given by

$$f_Y(y) = 2^q\, \phi_p(y \mid \mu, \Sigma)\, \Phi_q\!\left(\Delta^\top \Sigma^{-1}(y - \mu) \mid 0, \Lambda\right), \tag{2}$$

where

$$\Sigma = \Omega + \Delta\Delta^\top \quad \text{and} \quad \Lambda = I_q - \Delta^\top \Sigma^{-1} \Delta. \tag{3}$$
Proof.

Define the random vector

$$T = \mu + \Delta X_0 + X_1, \tag{4}$$

where $X_0$ and $X_1$ are given in Definition 1. Then, $Y$ has the same distribution as $(T \mid X_0 > 0)$. Thus, it is enough to find the distribution of $(T \mid X_0 > 0)$, whose density is $f_T(t)\, P(X_0 > 0 \mid T = t)/P(X_0 > 0)$. It can be shown that

$$\begin{pmatrix} X_0 \\ T \end{pmatrix} \sim N_{q+p}\!\left[\begin{pmatrix} 0 \\ \mu \end{pmatrix}, \begin{pmatrix} I_q & \Delta^\top \\ \Delta & \Omega + \Delta\Delta^\top \end{pmatrix}\right], \tag{5}$$

which implies $X_0 \mid T = t \sim N_q\!\left(\Delta^\top \Sigma^{-1}(t - \mu), \Lambda\right)$. The result follows immediately, since $P(X_0 > 0) = 2^{-q}$. ∎
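As a quick numerical sanity check of Proposition 2 (not part of the paper; the parameter values are arbitrary illustrative choices), the density in (2) can be verified to integrate to one in the scalar case $p = q = 1$:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def sn_pdf(y, mu=1.0, omega=2.0, delta=1.5):
    """Density (2) for p = q = 1 (illustrative parameter values)."""
    sigma2 = omega + delta**2          # Sigma = Omega + Delta Delta^T, see (3)
    lam = 1.0 - delta**2 / sigma2      # Lambda = I_q - Delta^T Sigma^{-1} Delta
    return (2.0 * norm.pdf(y, loc=mu, scale=np.sqrt(sigma2))
            * norm.cdf(delta * (y - mu) / sigma2, loc=0.0, scale=np.sqrt(lam)))

total, _ = quad(sn_pdf, -np.inf, np.inf)
print(total)  # ≈ 1
```

In this scalar case the density reduces to the classic skew-normal with shape parameter $\delta/\sqrt{\omega}$, so the integral equals one exactly.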

Remarks

1. Observe that the skew-normal distribution of Sahu et al. (2003) is a particular case of the definition above when $q = p$ and $\Delta$ is a diagonal matrix. Moreover, when $X_0$ is univariate ($q = 1$), we have the classic skew-normal distribution used, for example, in Lachos et al. (2010a) and Cabral et al. (2012b).

2. Observe that the relation $\Sigma = \Omega + \Delta\Delta^\top$ induces a one-to-one parameterization, that is, we could parameterize the distribution in terms of $(\mu, \Sigma, \Delta)$ and use the notation $Y \sim SN_{p,q}(\mu, \Sigma, \Delta)$ with density given in (2). Besides this, if we define the parameterization $\Delta^* = \Sigma^{-1/2}\Delta$, where $\Sigma^{1/2}$ is a square root of $\Sigma$, the pdf of $Y$ is

$$f_Y(y) = 2^q\, \phi_p(y \mid \mu, \Sigma)\, \Phi_q\!\left(\Delta^{*\top} \Sigma^{-1/2}(y - \mu) \mid 0, \Lambda\right),$$

where $\Lambda = I_q - \Delta^{*\top}\Delta^*$. This parameterization is used in Arellano-Valle and Genton (2005) – see equation (2.11) and page 109 for the pdf and moment generating function. Unless stated explicitly, we use the parameterization $(\mu, \Omega, \Delta)$.

The mean vector and covariance matrix of a random vector with SN distribution are given in Proposition 3. The proof is a direct consequence of Definition 1 and the fact that if a random variable has a univariate half-normal distribution with scale parameter 1, then its mean value is $\sqrt{2/\pi}$. We define the $q$-dimensional vector of ones by $1_q$.

Proposition 3.

Let $Y \sim SN_{p,q}(\mu, \Omega, \Delta)$. Then,

$$\mathrm{E}(Y) = \mu + \sqrt{2/\pi}\, \Delta 1_q \quad \text{and} \quad \mathrm{Var}(Y) = \Sigma - \frac{2}{\pi}\, \Delta\Delta^\top.$$
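Proposition 3 can be checked empirically by Monte Carlo simulation from the stochastic representation in Definition 1. The following sketch (with illustrative parameter values, not from the paper) compares sample moments with the theoretical ones:

```python
import numpy as np

rng = np.random.default_rng(0)
p, q, n = 2, 3, 200_000

# Illustrative parameter values (not from the paper's application)
mu = np.array([1.0, -0.5])
Delta = np.array([[0.8, 0.3, 0.1],
                  [0.2, 0.6, 0.4]])    # p x q shape matrix
Omega = np.array([[1.0, 0.3],
                  [0.3, 1.5]])         # p x p positive definite scale matrix

# Definition 1: Y = mu + Delta |X0| + X1, X0 ~ N_q(0, I_q), X1 ~ N_p(0, Omega)
X0 = rng.standard_normal((n, q))
X1 = rng.multivariate_normal(np.zeros(p), Omega, size=n)
Y = mu + np.abs(X0) @ Delta.T + X1

# Proposition 3: E(Y) = mu + sqrt(2/pi) Delta 1_q and
# Var(Y) = Sigma - (2/pi) Delta Delta^T = Omega + (1 - 2/pi) Delta Delta^T
mean_theory = mu + np.sqrt(2.0 / np.pi) * Delta.sum(axis=1)
cov_theory = Omega + (1.0 - 2.0 / np.pi) * Delta @ Delta.T

mean_err = np.abs(Y.mean(axis=0) - mean_theory).max()
cov_err = np.abs(np.cov(Y.T) - cov_theory).max()
print(mean_err, cov_err)  # both small (Monte Carlo error)
```

With 200,000 draws the sample mean and covariance agree with the closed-form expressions up to Monte Carlo error of order $10^{-2}$.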

Before defining the skew-t distribution, we state a result on the marginal and conditional distributions of a random vector with Student-t distribution that will be helpful to obtain results for the skew-t distribution similar to the ones presented for the SN distribution. The proof can be found in Arellano-Valle and Bolfarine (1995) (see also Fang et al. (1990, Theorem 3.8)).

Lemma 1.

Let $Y \sim t_p(\mu, \Sigma, \nu)$ and consider the partition given in (1) and the induced partitions of $\mu$ and $\Sigma$. It can be shown that:

• $Y_1 \sim t_{p_1}(\mu_1, \Sigma_{11}, \nu)$;

• $Y_2 \mid Y_1 = y_1 \sim t_{p_2}\!\left(\mu_{2.1}, \widetilde{\Sigma}_{22.1}, \nu + p_1\right)$,

where $\mu_{2.1} = \mu_2 + \Sigma_{21}\Sigma_{11}^{-1}(y_1 - \mu_1)$ and $\widetilde{\Sigma}_{22.1} = \dfrac{\nu + d_1}{\nu + p_1}\, \Sigma_{22.1}$, with $d_1 = (y_1 - \mu_1)^\top \Sigma_{11}^{-1}(y_1 - \mu_1)$ and $\Sigma_{22.1} = \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$.

Definition 2.

Let $X \sim SN_{p,q}(0, \Omega, \Delta)$ and $U \sim \mathrm{Gamma}(\nu/2, \nu/2)$ be independent, where $\mathrm{Gamma}(a, b)$ denotes a gamma distribution with mean $a/b$ and variance $a/b^2$, with $\nu > 0$. Let $\mu$ be a vector of constants of length $p$. We say that the distribution of $Y = \mu + U^{-1/2} X$ is skew-t (ST) with location vector $\mu$, scale matrix $\Omega$, shape matrix $\Delta$, and $\nu$ degrees of freedom. We use the notation $Y \sim ST_{p,q}(\mu, \Omega, \Delta, \nu)$.

In what follows, $t_p(\cdot \mid \mu, \Sigma, \nu)$ and $T_p(\cdot \mid \mu, \Sigma, \nu)$ respectively denote the density and the cumulative distribution function of the $p$-variate Student-t distribution with location vector $\mu$, scale matrix $\Sigma$, and $\nu$ degrees of freedom.

Proposition 4.

If $Y \sim ST_{p,q}(\mu, \Omega, \Delta, \nu)$, then $Y$ has density function

$$f_Y(y) = 2^q\, t_p(y \mid \mu, \Sigma, \nu)\, T_q\!\left(\Delta^\top \Sigma^{-1}(y - \mu)\sqrt{\frac{\nu + p}{\nu + d(y)}} \,\Big|\, 0, \Lambda, \nu + p\right),$$

where $\Sigma$ and $\Lambda$ are given in (3) and $d(y) = (y - \mu)^\top \Sigma^{-1}(y - \mu)$.

Proof.

By Definitions 1 and 2, we have that

$$Y = \mu + U^{-1/2}\left(\Delta |X_0| + X_1\right), \tag{6}$$

where $X_0$, $X_1$, and $U$ are independent. Let $V = U^{-1/2} X_0$ and $W = U^{-1/2} X_1$. Then,

$$\begin{pmatrix} V \\ W \end{pmatrix} \sim t_{q+p}\!\left[\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} I_q & 0^\top \\ 0 & \Omega \end{pmatrix}, \nu\right].$$

Observe that the distribution of $Y$ is the same as that of $(T \mid V > 0)$, where $T = \mu + \Delta V + W$, and we have

$$\begin{pmatrix} V \\ T \end{pmatrix} \sim t_{q+p}\!\left[\begin{pmatrix} 0 \\ \mu \end{pmatrix}, \begin{pmatrix} I_q & \Delta^\top \\ \Delta & \Sigma \end{pmatrix}, \nu\right].$$

Thus, the density of $Y$ is $f_T(y)\, P(V > 0 \mid T = y)/P(V > 0)$. Since $T \sim t_p(\mu, \Sigma, \nu)$, $P(V > 0) = 2^{-q}$, and, by Lemma 1, we know that $V \mid T = y \sim t_q\!\left(\Delta^\top \Sigma^{-1}(y - \mu), \frac{\nu + d(y)}{\nu + p}\Lambda, \nu + p\right)$, which concludes the proof. ∎

Similar to the SN case, by Definition 2 and Proposition 1, it can be shown that affine transformations of ST random vectors are still ST distributed, as stated in the following proposition.

Proposition 5.

Let $A$ be an $m \times p$ matrix with rank $m \leq p$, $b$ be a vector of length $m$, and $Y \sim ST_{p,q}(\mu, \Omega, \Delta, \nu)$. Then, $AY + b \sim ST_{m,q}(A\mu + b, A\Omega A^\top, A\Delta, \nu)$.

If $Y \sim ST_{p,q}(\mu, \Omega, \Delta, \nu)$ and we consider the partition given in (1), we have $Y_1 \sim ST_{p_1,q}(\mu_1, \Omega_{11}, \Delta_1, \nu)$. An analogous result can be shown for $Y_2$.

Equation (6) also implies the following result:

Proposition 6.

Let $Y \sim ST_{p,q}(\mu, \Omega, \Delta, \nu)$ and $S = U^{-1/2}|X_0|$. Then, $Y$ admits the following hierarchical representation:

$$Y \mid S = s, U = u \sim N_p\!\left(\mu + \Delta s, u^{-1}\Omega\right); \quad S \mid U = u \sim HN_q\!\left(0, u^{-1} I_q\right); \quad U \sim \mathrm{Gamma}(\nu/2, \nu/2),$$

where $HN_q(\mu, \Sigma)$ denotes the $q$-dimensional half-normal distribution with location parameter $\mu$ and scale matrix $\Sigma$.

It is straightforward to show that the density of $S \mid U = u$ in Proposition 6 is

$$f_{S \mid U}(s \mid u) = 2^q (2\pi)^{-q/2} u^{q/2} \exp\!\left(-\frac{u}{2}\, s^\top s\right), \quad s > 0.$$
Proposition 7.

Let $Y \sim ST_{p,q}(\mu, \Omega, \Delta, \nu)$, with $\nu > 2$. Then,

$$\mathrm{E}(Y) = \mu + \sqrt{2/\pi}\, \kappa_1 \Delta 1_q \quad \text{and} \quad \mathrm{Var}(Y) = \frac{\nu}{\nu - 2}\left(\Sigma - \frac{2}{\pi}\, \Delta\Delta^\top\right) + a(\nu)\, \Delta J_q \Delta^\top, \tag{7}$$

where $\kappa_1 = (\nu/2)^{1/2}\, \Gamma\!\left((\nu - 1)/2\right)/\Gamma(\nu/2)$, $a(\nu) = \frac{2}{\pi}\left(\frac{\nu}{\nu - 2} - \kappa_1^2\right)$, and $J_q = 1_q 1_q^\top$ is a $q \times q$ matrix of ones.

To prove this result, notice that, by (6), the distribution of $Y$ given $U = u$ is $SN_{p,q}(\mu, u^{-1}\Omega, u^{-1/2}\Delta)$. Thus, by Proposition 3, we have that $\mathrm{E}(Y \mid U = u) = \mu + \sqrt{2/\pi}\, u^{-1/2}\Delta 1_q$, which implies $\mathrm{E}(Y) = \mu + \sqrt{2/\pi}\, \mathrm{E}(U^{-1/2})\Delta 1_q$. We have that $\mathrm{E}(U^{-1/2}) = \kappa_1$, which can be easily obtained since $U$ has a gamma distribution, and the first part of the result follows. Similarly, the variance can be easily derived from $\mathrm{Var}(Y) = \mathrm{E}[\mathrm{Var}(Y \mid U)] + \mathrm{Var}[\mathrm{E}(Y \mid U)]$.
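The representation in (6) gives a direct way to simulate from the ST distribution, and the mean formula in (7) can be checked by Monte Carlo. A minimal sketch, with illustrative parameter values (not from the paper):

```python
import numpy as np
from scipy.special import gammaln

rng = np.random.default_rng(1)
p, q, n, nu = 2, 2, 400_000, 5.0

# Illustrative parameter values
mu = np.array([0.5, -1.0])
Delta = np.array([[0.7, 0.2],
                  [0.1, 0.9]])
Omega = np.eye(p)

# Representation (6): Y = mu + U^{-1/2} (Delta |X0| + X1), U ~ Gamma(nu/2, nu/2)
U = rng.gamma(shape=nu / 2, scale=2 / nu, size=(n, 1))  # rate nu/2 -> scale 2/nu
X0 = np.abs(rng.standard_normal((n, q)))
X1 = rng.multivariate_normal(np.zeros(p), Omega, size=n)
Y = mu + (X0 @ Delta.T + X1) / np.sqrt(U)

# Proposition 7: E(Y) = mu + sqrt(2/pi) kappa1 Delta 1_q,
# with kappa1 = (nu/2)^{1/2} Gamma((nu-1)/2) / Gamma(nu/2)
kappa1 = np.sqrt(nu / 2) * np.exp(gammaln((nu - 1) / 2) - gammaln(nu / 2))
mean_theory = mu + np.sqrt(2.0 / np.pi) * kappa1 * Delta.sum(axis=1)
mean_err = np.abs(Y.mean(axis=0) - mean_theory).max()
print(mean_err)  # small (Monte Carlo error)
```

Computing $\kappa_1$ through `gammaln` avoids overflow in the gamma function for larger $\nu$.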

3 The model

3.1 Definition

The standard linear mixed model introduced by Laird and Ware (1982) has been a widely used tool for modeling the within-subject correlation often present in longitudinal data. In this work, we present a flexible extension of this model.

Suppose that there are $N$ subjects, with the $i$th subject having $n_i$ observations. Then, a linear mixed model can be written as

$$Y_i = X_i\beta + Z_i b_i + \epsilon_i, \quad i = 1, \ldots, N, \tag{8}$$

where $Y_i$ is the $n_i \times 1$ vector of observed responses for subject $i$, $X_i$ is the $n_i \times p$ design matrix associated with the $p \times 1$ vector of fixed effects $\beta$, $Z_i$ is the $n_i \times q$ design matrix associated with the $q \times 1$ vector of random effects $b_i$, and $\epsilon_i$ is the $n_i \times 1$ vector of random errors.

We consider that

$$\begin{pmatrix} b_i \\ \epsilon_i \end{pmatrix} \overset{\text{ind}}{\sim} ST_{q+n_i,r}\!\left[\begin{pmatrix} b\Delta 1_r \\ 0_{n_i \times 1} \end{pmatrix}, \begin{pmatrix} D & 0_{q \times n_i} \\ 0_{n_i \times q} & \Omega_i \end{pmatrix}, \begin{pmatrix} \Delta \\ 0_{n_i \times r} \end{pmatrix}, \nu\right], \tag{9}$$

where $b = -\sqrt{2/\pi}\,\kappa_1$ and "ind" denotes independent random vectors. This setup implies

$$b_i \overset{\text{iid}}{\sim} ST_{q,r}(b\Delta 1_r, D, \Delta, \nu), \quad \epsilon_i \overset{\text{ind}}{\sim} t_{n_i}(0, \Omega_i, \nu), \quad i = 1, \ldots, N,$$

where "iid" denotes independent and identically distributed random vectors. Thus, we have that $\mathrm{E}(b_i) = 0$, according to (7). Moreover, from Proposition 6, we have the following stochastic representation of the complete-data model:

$$\begin{aligned} Y_i \mid b_i, U_i = u_i &\sim N_{n_i}\!\left(X_i\beta + Z_i b_i, u_i^{-1}\Omega_i\right); &(10)\\ b_i \mid S_i = s_i, U_i = u_i &\sim N_q\!\left(\Delta(b 1_r + s_i), u_i^{-1} D\right); &(11)\\ S_i \mid U_i = u_i &\sim HN_r\!\left(0, u_i^{-1} I_r\right); &(12)\\ U_i &\sim \mathrm{Gamma}(\nu/2, \nu/2), \quad i = 1, \ldots, N. &(13)\end{aligned}$$

Observe that $Y_i = X_i\beta + A_i (b_i^\top, \epsilon_i^\top)^\top$, with $A_i = [Z_i \;\; I_{n_i}]$. Using Proposition 5, we have

$$Y_i \sim ST_{n_i,r}\!\left(X_i\beta + b Z_i \Delta 1_r, \Psi_i, Z_i\Delta, \nu\right), \tag{14}$$

with $\Psi_i = Z_i D Z_i^\top + \Omega_i$. Hence, the marginal pdf of $Y_i$ is

$$f(y_i) = 2^r\, t_{n_i}(y_i \mid \mu_i, \Sigma_i, \nu)\, T_r\!\left(\Delta^\top Z_i^\top \Sigma_i^{-1}(y_i - \mu_i)\sqrt{\frac{\nu + n_i}{\nu + d_i(y_i)}} \,\Big|\, 0, \Lambda_i, \nu + n_i\right), \quad i = 1, \ldots, N, \tag{15}$$

where $\mu_i = X_i\beta + b Z_i \Delta 1_r$, $\Sigma_i = \Psi_i + Z_i\Delta\Delta^\top Z_i^\top$, $\Lambda_i = I_r - \Delta^\top Z_i^\top \Sigma_i^{-1} Z_i \Delta$, and $d_i(y_i) = (y_i - \mu_i)^\top \Sigma_i^{-1}(y_i - \mu_i)$. Therefore, assuming that $\Omega_i$, $D$, and $\Delta$ depend on unknown and reduced parameter vectors $\phi$, $\alpha$, and $\delta$, respectively, the log-likelihood function for $\theta = (\beta^\top, \phi^\top, \alpha^\top, \delta^\top, \nu)^\top$ based on the observed sample $y = (y_1^\top, \ldots, y_N^\top)^\top$ is given by

$$\ell(\theta \mid y) = \sum_{i=1}^N \ell_i(\theta \mid y) = \sum_{i=1}^N \log f(y_i \mid \theta), \tag{16}$$

where $f(y_i \mid \theta)$ is as in (15). Since the observed log-likelihood function involves complex expressions, it is computationally expensive to work directly with (16) to find the ML estimates of $\theta$. Hence, in the following subsection, we discuss the development of an EM-type algorithm (Dempster et al., 1977) for ML estimation.
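The hierarchical representation (10)–(13) also provides a convenient recipe for simulating data from the ST-LMM. A minimal sketch (the design matrices and parameter values are illustrative assumptions, not the paper's schizophrenia application):

```python
import numpy as np
from scipy.special import gammaln

rng = np.random.default_rng(2)
N, ni, q, r, nu = 100, 5, 1, 1, 6.0

# Illustrative parameters
beta = np.array([1.0, 2.0])   # fixed effects
D = np.array([[1.0]])         # q x q scale matrix of the random effects
Delta = np.array([[0.8]])     # q x r shape matrix
sigma2 = 0.5                  # Omega_i = sigma2 * I_{ni}

# b = -sqrt(2/pi) kappa1, so that E(b_i) = 0 according to (7)
kappa1 = np.sqrt(nu / 2) * np.exp(gammaln((nu - 1) / 2) - gammaln(nu / 2))
b_loc = -np.sqrt(2.0 / np.pi) * kappa1

Y = []
for i in range(N):
    t = np.arange(ni)                        # simple time covariate
    Xi = np.column_stack([np.ones(ni), t])   # fixed-effects design (ni x 2)
    Zi = np.ones((ni, q))                    # random-intercept design
    ui = rng.gamma(nu / 2, 2 / nu)                                # (13)
    si = np.abs(rng.standard_normal(r)) / np.sqrt(ui)             # (12)
    bi = rng.multivariate_normal(Delta @ (b_loc + si), D / ui)    # (11)
    ei = rng.multivariate_normal(np.zeros(ni), sigma2 / ui * np.eye(ni))
    yi = Xi @ beta + Zi @ bi + ei                                 # (10)
    Y.append(yi)

Y = np.array(Y)
print(Y.shape)  # (100, 5)
```

Note that the same mixing variable $u_i$ scales the random effect, the latent half-normal vector, and the error for subject $i$, which is what makes $(b_i^\top, \epsilon_i^\top)^\top$ jointly ST as in (9).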

3.2 Maximum likelihood estimation

From the hierarchical representation given in (10)–(13), and treating $b = (b_1^\top, \ldots, b_N^\top)^\top$, $u = (u_1, \ldots, u_N)^\top$, and $s = (s_1^\top, \ldots, s_N^\top)^\top$ as hypothetical missing data, we propose to use the ECME algorithm (Liu and Rubin, 1994) for parameter estimation. Let the augmented data set be $y_c = (y^\top, b^\top, u^\top, s^\top)^\top$, where $y = (y_1^\top, \ldots, y_N^\top)^\top$; then the complete-data log-likelihood function is given by

$$\ell_c(\theta \mid y_c) = \sum_{i=1}^N \left[-\frac{1}{2}\log|\Omega_i| - \frac{u_i}{2}(y_i - X_i\beta - Z_i b_i)^\top \Omega_i^{-1}(y_i - X_i\beta - Z_i b_i) - \frac{1}{2}\log|D| - \frac{u_i}{2}\left(b_i - \Delta(b 1_r + s_i)\right)^\top D^{-1}\left(b_i - \Delta(b 1_r + s_i)\right)\right] + K(\nu \mid u, s),$$

where $K(\nu \mid u, s)$ is a function that depends on the parameter vector $\theta$ only through $\nu$.

Given the current value $\theta = \hat{\theta}^{(k)}$, the E-step of an EM-type algorithm evaluates $\hat{Q}^{(k)}(\theta) = \mathrm{E}\!\left[\ell_c(\theta \mid y_c) \mid \hat{\theta}^{(k)}, y\right] = \sum_{i=1}^N \hat{Q}_i^{(k)}(\theta)$, where the expectation is taken with respect to the joint conditional distribution of $b$, $u$, and $s$, given $\hat{\theta}^{(k)}$ and $y$. Therefore, we can write

$$\hat{Q}_i^{(k)}(\theta) = \hat{Q}_{1i}^{(k)}(\beta, \phi) + \hat{Q}_{2i}^{(k)}(\alpha, \delta) + \hat{Q}_{3i}^{(k)}(\nu),$$

where

$$\begin{aligned} \hat{Q}_{1i}^{(k)}(\beta, \phi) ={}& -\frac{1}{2}\log|\Omega_i| - \frac{\hat{u}_i^{(k)}}{2}(y_i - X_i\beta)^\top \Omega_i^{-1}(y_i - X_i\beta) + (y_i - X_i\beta)^\top \Omega_i^{-1} Z_i\, \widehat{ub}_i^{(k)} - \frac{1}{2}\mathrm{tr}\!\left(\Omega_i^{-1} Z_i\, \widehat{ub^2}_i^{(k)} Z_i^\top\right),\\ \hat{Q}_{2i}^{(k)}(\alpha, \delta) ={}& -\frac{1}{2}\log|D| - \frac{1}{2}\mathrm{tr}\!\left(D^{-1}\widehat{ub^2}_i^{(k)}\right) + b\, \widehat{ub}_i^{(k)\top} D^{-1}\Delta 1_r + \mathrm{tr}\!\left(D^{-1}\Delta\, \widehat{ubs}_i^{(k)\top}\right) - b\, \widehat{us}_i^{(k)\top}\Delta^\top D^{-1}\Delta 1_r - \frac{1}{2}\mathrm{tr}\!\left(\Delta^\top D^{-1}\Delta\, \widehat{us^2}_i^{(k)}\right) - \frac{\hat{u}_i^{(k)}}{2}\, b^2\, 1_r^\top \Delta^\top D^{-1}\Delta 1_r,\end{aligned}$$

and $\hat{Q}_{3i}^{(k)}(\nu) = \mathrm{E}\!\left[K(\nu \mid u, s) \mid \hat{\theta}^{(k)}, y_i\right]$, with $\mathrm{tr}(A)$ indicating the trace of matrix $A$, and $\hat{u}_i^{(k)} = \mathrm{E}\!\left[U_i \mid \hat{\theta}^{(k)}, y_i\right]$, $\widehat{us}_i^{(k)} = \mathrm{E}\!\left[U_i S_i \mid \hat{\theta}^{(k)}, y_i\right]$, $\widehat{us^2}_i^{(k)} = \mathrm{E}\!\left[U_i S_i S_i^\top \mid \hat{\theta}^{(k)}, y_i\right]$, $\widehat{ub}_i^{(k)} = \mathrm{E}\!\left[U_i b_i \mid \hat{\theta}^{(k)}, y_i\right]$, $\widehat{ubs}_i^{(k)} = \mathrm{E}\!\left[U_i b_i S_i^\top \mid \hat{\theta}^{(k)}, y_i\right]$, and $\widehat{ub^2}_i^{(k)} = \mathrm{E}\!\left[U_i b_i b_i^\top \mid \hat{\theta}^{(k)}, y_i\right]$, $i = 1, \ldots, N$.

From the representation given in (10)–(13), using properties of conditional expectation and after some algebra, and omitting the superscript $(k)$, the expressions above can be written as

$$\hat{u}_i = \frac{\hat{\nu} + n_i}{\hat{\nu} + \hat{d}_i(y_i)}\, \frac{T_r\!\left(\hat{q}_i(y_i)\sqrt{(\hat{\nu} + n_i + 2)/(\hat{\nu} + \hat{d}_i(y_i))} \,\big|\, 0, \hat{\Lambda}_i, \hat{\nu} + n_i + 2\right)}{T_r\!\left(\hat{q}_i(y_i)\sqrt{(\hat{\nu} + n_i)/(\hat{\nu} + \hat{d}_i(y_i))} \,\big|\, 0, \hat{\Lambda}_i, \hat{\nu} + n_i\right)},$$

$$\widehat{us}_i = \hat{u}_i\, \mathrm{E}\{W_i \mid \hat{\theta}, y_i\}, \quad \widehat{us^2}_i = \hat{u}_i\, \mathrm{E}\{W_i W_i^\top \mid \hat{\theta}, y_i\},$$

$$\widehat{ub}_i = \hat{r}_i \hat{u}_i + \hat{M}_i \hat{D}^{-1}\hat{\Delta}\, \widehat{us}_i, \quad \widehat{ubs}_i = \hat{r}_i\, \widehat{us}_i^\top + \hat{M}_i \hat{D}^{-1}\hat{\Delta}\, \widehat{us^2}_i, \quad \widehat{ub^2}_i = \hat{M}_i + \widehat{ub}_i\, \hat{r}_i^\top + \widehat{ubs}_i\, \hat{\Delta}^\top \hat{D}^{-1} \hat{M}_i,$$

where $\hat{q}_i(y_i) = \hat{\Delta}^\top Z_i^\top \hat{\Sigma}_i^{-1}(y_i - \hat{\mu}_i)$, $\hat{M}_i = \left(Z_i^\top \hat{\Omega}_i^{-1} Z_i + \hat{D}^{-1}\right)^{-1}$, $\hat{r}_i = \hat{M}_i\left(Z_i^\top \hat{\Omega}_i^{-1}(y_i - X_i\hat{\beta}) + \hat{b}\hat{D}^{-1}\hat{\Delta} 1_r\right)$, and $\hat{\mu}_i$, $\hat{\Sigma}_i$, $\hat{\Lambda}_i$, and $\hat{d}_i(y_i)$ are as given in (15). Moreover,

$$W_i \mid \theta, y_i \sim TT_r\!\left(q_i(y_i),\, \frac{\nu + d_i(y_i)}{\nu + n_i + 2}\, \Lambda_i,\, \nu + n_i + 2;\, \mathbb{R}_+^r\right), \tag{17}$$

with $TT_r(\mu, \Sigma, \nu; A)$ denoting an $r$-variate Student-t distribution with location $\mu$, scale matrix $\Sigma$, and $\nu$ degrees of freedom, truncated on $A$, and its moments can be computed using the R package MomTrunc (Galarza et al., 2021).

To maximize $\hat{Q}^{(k)}(\theta)$ with respect to $\theta$, the ECME algorithm performs conditional maximization (CM) steps that conditionally maximize $\hat{Q}^{(k)}(\theta)$, obtaining a new estimate $\hat{\theta}^{(k+1)}$, as follows:

1. $\hat{\beta}^{(k+1)}$, $\hat{\phi}^{(k+1)}$, $\hat{\alpha}^{(k+1)}$, and $\hat{\delta}^{(k+1)}$ are updated using the following expressions:

$$\hat{\beta}^{(k+1)} = \left(\sum_{i=1}^N \hat{u}_i^{(k)} X_i^\top \hat{\Omega}_i^{-1}(\hat{\phi}^{(k)}) X_i\right)^{-1} \sum_{i=1}^N X_i^\top \hat{\Omega}_i^{-1}(\hat{\phi}^{(k)}) \left(\hat{u}_i^{(k)} y_i - Z_i\, \widehat{ub}_i^{(k)}\right),$$
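This β update is a weighted least-squares solve given the E-step quantities, which follows from setting the gradient of $\hat{Q}_{1i}$ with respect to $\beta$ to zero. A minimal sketch (the function name and argument layout are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def update_beta(X_list, Z_list, y_list, Omega_list, u_hat, ub_hat):
    """CM-step update of beta given the E-step quantities u_hat[i] = E(U_i | .)
    and ub_hat[i] = E(U_i b_i | .) (sketch; names are illustrative)."""
    # Left-hand side: sum_i u_hat_i X_i' Omega_i^{-1} X_i
    A = sum(u * Xi.T @ np.linalg.solve(Oi, Xi)
            for Xi, Oi, u in zip(X_list, Omega_list, u_hat))
    # Right-hand side: sum_i X_i' Omega_i^{-1} (u_hat_i y_i - Z_i ub_hat_i)
    c = sum(Xi.T @ np.linalg.solve(Oi, u * yi - Zi @ ubi)
            for Xi, Zi, yi, Oi, u, ubi
            in zip(X_list, Z_list, y_list, Omega_list, u_hat, ub_hat))
    return np.linalg.solve(A, c)

# With u_hat = 1 and ub_hat = 0 this reduces to ordinary generalized least squares:
beta_hat = update_beta([np.ones((2, 1))], [np.ones((2, 1))],
                       [np.array([2.0, 4.0])], [np.eye(2)],
                       [1.0], [np.zeros(1)])
print(beta_hat)  # [3.]
```

Using `np.linalg.solve` instead of forming $\hat{\Omega}_i^{-1}$ explicitly is numerically preferable when $n_i$ is large.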