varTestnlme: Variance Components Testing in Linear and Nonlinear Mixed-effects Models

07/09/2020
by   Charlotte Baey, et al.
University of Lille

The issue of variance components testing arises naturally when building mixed-effects models, to decide which effects should be modeled as fixed or random. While tests for fixed effects are available in R for models fitted with lme4, tools are missing when it comes to random effects. The varTestnlme package for R aims at filling this gap. It allows testing whether any subset of the variances and covariances is equal to zero, using likelihood ratio tests. It also offers the possibility to test simultaneously for fixed effects and variance components. It can be used for linear, generalized linear or nonlinear mixed-effects models fitted via lme4, nlme or saemix. The theoretical properties of the likelihood ratio test used are recalled, and examples based on different real datasets and different mixed models are provided.


1 Introduction

Mixed-effects models are widely used in many fields of application, ranging from agronomy and biology to the social and medical sciences. One can distinguish three main types of mixed models: linear mixed models (LMMs), generalized linear mixed models (GLMMs) and nonlinear mixed models (NLMMs). Several packages and software tools are available to fit these types of models, for example the SAS procedures MIXED and NLMIXED for linear, generalized linear and nonlinear mixed models, or the dedicated software NONMEM and Monolix for nonlinear mixed models, both with an emphasis on pharmacokinetic and pharmacodynamic models. In R, three main packages are available to deal with these models: lme4 (Bates et al., 2015), nlme (Pinheiro et al., 2019) and saemix (Comets et al., 2017). While they can all deal with the three types of mixed models, the way they handle nonlinear mixed models differs. Both lme4 and nlme use an approximation of the likelihood function based on a linearization of the model, while saemix uses an EM-type algorithm to derive the maximum likelihood estimate, without resorting to any approximation or linearization of the model.

When building a mixed-effects model, one faces the delicate choice of which effects should be modeled as random. This issue has been studied from both theoretical and practical points of view.

On the theoretical side, two main approaches have been explored. The first one follows the idea of variable selection. Chen and Dunson (2003) focused on linear mixed models and proposed a Bayesian formulation using mixture priors with a point mass at zero for the random effects. More recently, Ibrahim et al. (2011); Groll and Tutz (2014) used a penalized likelihood approach to simultaneously select fixed and random effects. Selection criteria adapted to the context of mixed-effects models were also suggested by Vaida and Blanchard (2005); Gurka (2006) and Delattre et al. (2014). The second approach concerns hypothesis testing based on the variance components, with the seminal work of Stram and Lee (1994, 1995) in linear mixed models (see also the review by Molenberghs and Verbeke (2007)). Recently, Baey et al. (2019) have exhibited the exact limiting distribution of the likelihood ratio test (LRT) statistic, for testing that the variances of any subset of the random effects are equal to zero.

On the practical side, several of the aforementioned methods have been implemented in R. Among others, we can cite the glmmLasso package (Groll, 2017), which computes an L1-penalized maximum likelihood estimator in the context of generalized linear mixed models, or the cAIC4 package (Saefken and Ruegamer, 2018), which implements the conditional AIC criterion. As far as hypothesis testing is concerned, most of the available tools are designed for linear or generalized linear mixed-effects models, and are mostly adapted to outputs from the lme4 package. The lmerTest package (Kuznetsova et al., 2017) provides enhanced versions of the anova() and summary() functions of the lme4 package by computing p-values for the corresponding tests for fixed effects. It also implements both the Satterthwaite and the Kenward-Roger approximations to correct p-values in unbalanced or small sample size situations. It also provides a step() function, which simplifies the random effects structure of the model using a step-down approach. Each step is based on the computation of p-values from the LRT for testing that the variance of each random effect is equal to zero. However, the asymptotic distribution used to compute these p-values is a chi-square distribution, with the number of degrees of freedom computed as the difference between the number of parameters under the alternative hypothesis and the number of parameters under the null hypothesis. In the context of mean components testing in linear and generalized linear models, the pbkrtest package

(Halekoh and Højsgaard, 2014) provides two alternatives to the LRT in small sample size situations, based either on the Kenward-Roger approximation or on parametric bootstrap. These two packages can only be used with models fitted with the lme4 package. To the best of our knowledge, there is no package implementing the LRT for variance components in all types of mixed-effects models.

In this paper, we present the varTestnlme package, dedicated to testing that the variances of any subset of the random effects are equal to zero in LMMs, GLMMs and NLMMs, for models fitted with either lme4, nlme or saemix. Moreover, it is possible to test simultaneously for mean and variance components. The varTestnlme package takes as inputs the two competing models, fitted with the same package, and returns the p-value based on the asymptotic distribution of the LRT statistic associated with the two nested models corresponding to the null and alternative hypotheses.

Mixed-effects models and the LRT procedure are recalled in Section 2. The implementation of the method is described in Section 3. Section 4 is devoted to illustrations and examples of how to use the package. Finally, current limitations and possible extensions are discussed in Section 5.

2 Mixed-effects models and likelihood ratio test procedure

2.1 Description of the mixed-effects models

We consider the following nonlinear mixed-effects model (Davidian and Giltinan (1995) p. 98, Pinheiro and Bates (2000) p. 306, Lavielle (2014) p. 24):

$$y_i = g(\varphi_i, x_i) + \varepsilon_i, \quad 1 \leq i \leq N, \qquad (1)$$

where $y_i$ denotes the vector of observations of the $i$-th individual, of size $n_i$, $g$ a nonlinear function, $\varphi_i$ the vector of individual parameters of individual $i$, $x_i$ the vector of covariates, and $\varepsilon_i$ the random error term. The vectors of individual parameters $\varphi_i$ are modeled as:

$$\varphi_i = X_i \beta + Z_i b_i, \qquad (2)$$

where $\beta \in \mathbb{R}^p$ is the vector of fixed effects, $X_i$ and $Z_i$ are covariate matrices of individual $i$ of appropriate dimensions, and $b_i$ is the centered vector of Gaussian random effects of size $q$, with covariance matrix $\Gamma$ of size $q \times q$. The random vectors $b_1, \dots, b_N$ are assumed to be independent. The vectors $\varepsilon_1, \dots, \varepsilon_N$ are assumed to be independent and identically distributed centered Gaussian vectors with covariance matrix $\Sigma$. Finally, the sequences $(b_i)_{1 \leq i \leq N}$ and $(\varepsilon_i)_{1 \leq i \leq N}$ are assumed mutually independent.

Let $\theta = (\beta, \Gamma, \Sigma)$ denote the parameter vector of the model, taking values in the parameter space $\Theta$, which involves the set of positive definite matrices of the appropriate sizes for the covariance components.

2.2 Testing variance components

We consider general test hypotheses of the following form, to test for the nullity of $r$ variances among the $q$ variances of the random effects:

$$H_0: \theta \in \Theta_0 \quad \text{vs.} \quad H_1: \theta \in \Theta_1, \qquad (3)$$

where $\Theta_0 \subset \Theta_1 \subseteq \Theta$. Up to permutations of the rows and columns of the covariance matrix $\Gamma$, we can assume that we are testing the nullity of the last $r$ variances. We write $\Gamma$ in blocks as follows:

$$\Gamma = \begin{pmatrix} \Gamma_1 & \Gamma_{12} \\ \Gamma_{12}^t & \Gamma_2 \end{pmatrix},$$

with $\Gamma_1$ a $(q-r) \times (q-r)$ matrix, $\Gamma_2$ an $r \times r$ matrix, $\Gamma_{12}$ a $(q-r) \times r$ matrix, and where $A^t$ denotes the transpose of matrix $A$, for any matrix $A$.

Thus, under $H_0$, we have:

$$\Gamma = \begin{pmatrix} \Gamma_1 & 0 \\ 0 & 0 \end{pmatrix}.$$

The likelihood ratio test (LRT) statistic is then defined by:

$$\mathrm{LRT} = 2\left(\sup_{\theta \in \Theta_1} \log L(\theta) - \sup_{\theta \in \Theta_0} \log L(\theta)\right),$$

where $L$ is the likelihood function.
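As a simple illustration, the LRT statistic can be computed by hand from two nested maximum-likelihood fits; the sketch below is not a varTestnlme function, but uses the lme4 fits of the Orthodont data introduced in Section 4.2.1, and its result should be close to the value later reported by varTest() in Section 4.3.1.

# Hand computation of the LRT statistic from two nested ML fits (same models
# as lm1.h1.lme4 and lm1.h0.lme4 in Section 4.2.1).
library(lme4)
data("Orthodont", package = "nlme")
m1 <- lmer(distance ~ 1 + Sex + age + age*Sex + (1 + age | Subject),
           data = Orthodont, REML = FALSE)
m0 <- lmer(distance ~ 1 + Sex + age + age*Sex + (1 | Subject),
           data = Orthodont, REML = FALSE)
2 * (as.numeric(logLik(m1)) - as.numeric(logLik(m0)))   # about 0.83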

As shown in Baey et al. (2019), the limiting distribution of the LRT statistic associated with the two hypotheses $H_0$ and $H_1$ defined in (3) is:

$$\mathrm{LRT} \xrightarrow[n \to \infty]{d} \bar{\chi}^2\!\left(I(\theta_0)^{-1}, C\right), \qquad (4)$$

where $I(\theta_0)$ is the Fisher Information Matrix, $C$ is the tangent cone to the parameter space at the point $\theta_0$, and where $\bar{\chi}^2(V, C)$ denotes the chi-bar-square distribution parametrized by the positive definite matrix $V$ and by the cone $C$. The chi-bar-square distribution is a mixture of chi-square distributions, where the degrees of freedom and the weights involved in the mixture depend on the matrix $V$ and on the cone $C$. More details about the computation of the parameters of the chi-bar-square distribution are given in Section 3.

2.3 Testing simultaneously fixed effects and variance components

The results of the previous section can easily be extended to the case where one is testing simultaneously that a subset of the mean parameters and a subset of the covariance parameters are null. In this case, the null hypothesis, and consequently the parameter space $\Theta_0$, are slightly modified: the fixed effects which are tested equal to 0 are additionally constrained to be null in the definition of $\Theta_0$ (see (7)).

3 Implementation

3.1 Parameters of the chi-bar-square distribution

In Baey et al. (2019), it is highlighted that the limiting distribution of the likelihood ratio test statistic depends on the structure of the covariance matrix $\Gamma$, and its expression is exhibited in two common cases, namely when $\Gamma$ is either diagonal or full. In the general case, $\Gamma$ can always be written in the following block-diagonal form:

$$\Gamma = \begin{pmatrix} \Gamma_1 & & \\ & \ddots & \\ & & \Gamma_K \end{pmatrix}, \qquad (5)$$

where $K \geq 1$ and, for all $1 \leq k \leq K$, $\Gamma_k$ is a positive semi-definite matrix of size $q_k \times q_k$, with $\sum_{k=1}^K q_k = q$. In the sequel, we assume that the blocks $\Gamma_k$ are full, i.e. all the covariances inside these blocks are non-null. This is how the covariance matrix of the random effects is implemented in the lme4 package (Bates et al., 2015, p. 7, section 2.2). In the appendix of this reference, it is mentioned that more general structures for the covariance matrix could be implemented using the modularized functions of the lme4 package; nevertheless, these functionalities are not handled by the varTestnlme package at the moment. In nlme, more sophisticated covariance structures can be used, and in saemix, no specific restriction is imposed on the covariance matrix structure. However, in varTestnlme we restrict ourselves to the block-diagonal case with full blocks. This restriction allows for a simple block-diagonal structure of the covariance matrix and eases the computation.
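To make this restriction concrete, the following minimal sketch fits two alternative models on the Orthodont data introduced later in Section 4.1.1, whose random-effects covariance matrices correspond respectively to one full 2 x 2 block and to two 1 x 1 blocks; these same fits reappear as the alternative models of Section 4.2.1.

# Illustrative fits of the two block-diagonal structures discussed above.
library(nlme)
data("Orthodont", package = "nlme")

# one full 2x2 block: correlated random intercept and slope
m_full <- lme(distance ~ 1 + Sex + age + age*Sex,
              random = ~ 1 + age | Subject, data = Orthodont, method = "ML")
# two 1x1 blocks: independent random intercept and slope (pdDiag)
m_diag <- lme(distance ~ 1 + Sex + age + age*Sex,
              random = list(Subject = pdDiag(~ 1 + age)),
              data = Orthodont, method = "ML")

getVarCov(m_full)   # full 2x2 covariance matrix
getVarCov(m_diag)   # diagonal covariance matrix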

In this context of a block-diagonal covariance matrix $\Gamma$, three cases can arise when testing that a subset of the variances are null: i) we are testing that some blocks among the $K$ blocks are entirely null, ii) we are testing that sub-blocks of some of the $K$ blocks are null, or iii) a mixture of i) and ii). Let us distinguish the number of blocks in which covariances are tested equal to 0 without testing that the corresponding variances are equal to 0 (only off-diagonal elements are tested), the number of blocks which are tested entirely equal to 0, and the number of blocks in which sub-blocks are tested equal to 0. By sub-block, we mean that we are testing a sub-matrix which is strictly smaller than the block from which it is extracted. These three numbers are non-negative and their sum is at most $K$.

Without loss of generality, and up to a permutation of the rows and columns of $\Gamma$, we can assume that the blocks are in the following order: first, the blocks which are not tested, then the blocks in which only covariances are tested, next the blocks which are tested entirely equal to zero, and finally the blocks in which diagonal sub-blocks are tested equal to zero. Then, the null and alternative hypotheses can be written as:

(6)

where

(7)

From the general expressions of $\Theta_0$ and $\Theta_1$ in (7), we can derive the expressions of their tangent cones using the results of Hiriart-Urruty and Malick (2012), and consequently the expression of the closed convex cone $C$ involved in the chi-bar-square distribution in (4).

We obtain:

(8)

where

(9)

To identify the components of the chi-square mixture, i.e. the degrees of freedom and the weights of all the chi-square distributions involved in the mixture, we can use the properties stated by Shapiro (1985): if the cone $C$ contains a linear space of dimension $k_{\min}$, the weights associated with the chi-square distributions with fewer than $k_{\min}$ degrees of freedom are null, and if $C$ is included in a linear space of dimension $k_{\max}$, the weights associated with more than $k_{\max}$ degrees of freedom are null. The chi-bar-square distribution is then a mixture of chi-square distributions with degrees of freedom varying between $k_{\min}$ and $k_{\max}$.

According to the general formulation of the cone given in (8), we have:

(10)

In particular, if only entire blocks of variances are tested (i.e. if no isolated covariances and no strict sub-blocks are tested), then $k_{\min} = 0$ and there is a Dirac mass at 0 in the mixture.

From equations (9) and (10), we can see that the number of elements in the chi-bar-square mixture only depends on the number of variances being tested and on the structure of the covariance matrix. When fixed effects are tested simultaneously with a set of variance components, the number of elements in the mixture is the same as in the case where only the set of variance components is tested, but the degrees of freedom of each chi-square component of the mixture are shifted upward by the number of fixed effects being tested.

As an example, let us consider a model with 3 random effects, whose covariance matrix $\Gamma$ is a full positive definite matrix of size $3 \times 3$, and let us test whether the variances of the last two random effects are null:

$$H_0: \Gamma = \begin{pmatrix} \gamma_1^2 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \quad \text{vs.} \quad H_1: \Gamma = \begin{pmatrix} \gamma_1^2 & \gamma_{12} & \gamma_{13} \\ \gamma_{12} & \gamma_2^2 & \gamma_{23} \\ \gamma_{13} & \gamma_{23} & \gamma_3^2 \end{pmatrix}. \qquad (11)$$

Since no fixed effect is tested, the degrees of freedom are not shifted. The matrix $\Gamma$ can be considered as a block-diagonal matrix with only $K = 1$ block. In this block of size 3, we are testing that a diagonal sub-block of size 2 is equal to 0, together with the associated covariances. With the terminology introduced above, we are therefore in case ii), with one block in which a sub-block is tested. We thus have $k_{\min} = 2$ (the two covariances between the tested and the untested random effects contribute a linear part to the cone) and $k_{\max} = 5$ (the total number of tested variance-covariance parameters). The limiting distribution of the LRT associated with the two above hypotheses is then a mixture of chi-square distributions with degrees of freedom 2, 3, 4 and 5.

3.2 Computation of the chi-bar-square weights

The chi-bar-square weights are in general not explicit and have to be estimated via Monte Carlo techniques. The general idea is to simulate i.i.d. realizations $W_1, \dots, W_M$ from the limiting chi-bar-square distribution, using its definition as the squared norm of the projection of a multivariate Gaussian random variable onto a closed convex cone. More precisely, let $C$ be a closed convex cone of $\mathbb{R}^d$, $V$ a positive definite matrix of size $d \times d$ and $Z \sim \mathcal{N}(0, V)$. Then, the random variable $W$ defined below follows a chi-bar-square distribution with parameters $V$ and $C$, as detailed in Silvapulle and Sen (2011):

$$W = Z^t V^{-1} Z - \min_{b \in C}\, (Z - b)^t V^{-1} (Z - b). \qquad (12)$$

In varTestnlme, when the weights of the chi-bar-square distribution are not available in a closed form, they are estimated according to the procedure proposed by Silvapulle and Sen (2011). More details are given in Algorithm 1.

1: Define $V$, an estimate of the inverse of the Fisher Information Matrix under $H_1$, and let $t_1 < \dots < t_J$ be a sequence of non-negative increasing numbers.
2: for $m = 1, \dots, M$ do

  1. simulate $Z_m \sim \mathcal{N}(0, V)$

  2. compute $W_m = Z_m^t V^{-1} Z_m - \min_{b \in C} (Z_m - b)^t V^{-1} (Z_m - b)$ as in (12),

    using quadratic programming when $C$ is the non-negative orthant $\mathbb{R}_+^d$, for some $d$, and using general nonlinear optimization tools otherwise. In varTestnlme, we use respectively the quadprog and the alabama packages.

3:end for
4: Compute the matrix $A$: its entries are based on $F_j(t_i)$, where $F_j$ is the cumulative distribution function of a chi-square distribution with $j$ degrees of freedom, together with the constraints that all the weights sum to 1 and that the weights associated with odd degrees of freedom sum to 1/2. N.B. The expression of $A$ given above corresponds to the case where the largest degree of freedom of the mixture is even. In the case where it is odd, the last column has a ‘1’ in the second row, instead of a ‘0’.
5: Compute the vector $c$, based on the empirical distribution of the simulated sample $(W_m)_{1 \leq m \leq M}$ evaluated at $t_1, \dots, t_J$ and on the values of the constraints.
6: Estimate the weights of the chi-bar-square distribution by solving the system $A w = c$.
7: Estimate the covariance matrix of the estimated weights from the Monte Carlo sample.
Algorithm 1 Estimation of the weights of the chi-bar-square distribution using Monte Carlo simulations
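Algorithm 1 is implemented inside varTestnlme, but the underlying Monte Carlo idea can be illustrated with a few lines of R. The sketch below is not the package's internal code: it only treats the special case where the cone $C$ is the non-negative orthant, uses an arbitrary matrix V standing in for the inverse Fisher Information Matrix, and estimates the weights by counting the positive coordinates of each projection rather than by solving the system of step 6.

# Minimal Monte Carlo sketch of chi-bar-square weight estimation when C is
# the non-negative orthant; V is an assumed, illustrative matrix.
library(quadprog)

set.seed(1)
d <- 3
V <- matrix(c(1.0, 0.3, 0.1,
              0.3, 1.0, 0.2,
              0.1, 0.2, 1.0), nrow = d)
Vinv <- solve(V)
M <- 5000

W        <- numeric(M)   # simulated chi-bar-square realizations
n_active <- integer(M)   # number of positive coordinates of the projection

for (m in 1:M) {
  Z <- drop(t(chol(V)) %*% rnorm(d))                 # Z ~ N(0, V)
  # projection of Z onto {b >= 0} in the V^{-1} metric:
  # minimise (Z - b)' V^{-1} (Z - b)  subject to  b >= 0
  qp <- solve.QP(Dmat = 2 * Vinv, dvec = drop(2 * Vinv %*% Z),
                 Amat = diag(d), bvec = rep(0, d))
  b  <- qp$solution
  W[m]        <- drop(t(Z) %*% Vinv %*% Z - t(Z - b) %*% Vinv %*% (Z - b))
  n_active[m] <- sum(b > 1e-8)
}

# For the orthant cone, the weight of the chi-square component with i degrees
# of freedom can be estimated by the proportion of projections having exactly
# i positive coordinates.
weights <- tabulate(n_active + 1L, nbins = d + 1L) / M
names(weights) <- paste0("df=", 0:d)
round(weights, 3)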

3.3 Computation of the p-value

Let $F_k$ be the cumulative distribution function (cdf) of the chi-square distribution with $k$ degrees of freedom. The p-value of the test can then be estimated in two different ways, both of them being computed in the varTestnlme package:

$$\hat{p} = \sum_{i = k_{\min}}^{k_{\max}} \hat{w}_i \left(1 - F_i(T_{\mathrm{obs}})\right), \qquad (13)$$
$$\tilde{p} = \frac{1}{M} \sum_{m=1}^{M} \mathbb{1}\{W_m \geq T_{\mathrm{obs}}\}, \qquad (14)$$

where $T_{\mathrm{obs}}$ is the observed value of the LRT statistic, $\hat{w}_i$ is the estimated weight associated with the chi-square distribution with $i$ degrees of freedom, and the $W_m$ are simulated according to the limiting chi-bar-square distribution (see Algorithm 1 for more details on the notations).

Note that for very small values of the true p-value, a very large Monte Carlo sample size $M$ would be needed in order to get a non-zero estimate $\tilde{p}$ from (14).
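As an illustration of (13) and (14), the short sketch below computes both estimates for a hypothetical observed LRT value; the weights are arbitrary placeholders, and W denotes a simulated chi-bar-square sample such as the one produced by the sketch following Algorithm 1.

# Illustrative computation of the two p-value estimates (13) and (14).
lrt_obs <- 2.52                     # hypothetical observed LRT statistic
w_hat   <- c(0.25, 0.50, 0.25)      # assumed weights for df = 0, 1, 2
dfs     <- seq_along(w_hat) - 1

# (13): weighted tail probabilities of the chi-square components
p_hat <- sum(w_hat * pchisq(lrt_obs, df = dfs, lower.tail = FALSE))
p_hat

# (14): empirical proportion of simulated draws exceeding lrt_obs
# p_tilde <- mean(W >= lrt_obs)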

3.4 Bounds on p-values

Since the simulation of the $W_m$ can be time consuming, and since $P(\chi^2_k \geq x)$ is non-decreasing in $k$ for any fixed $x \geq 0$, it is possible to compute bounds on the p-value of the test without estimating the weights. Indeed, since the sum of all the weights is equal to 1, and the sum of the even (resp. odd) weights is equal to 1/2, natural bounds for the p-value are given by Silvapulle and Sen (2011):

$$\frac{1}{2}\left[P(\chi^2_{k_{\min}} \geq T_{\mathrm{obs}}) + P(\chi^2_{k_{\min}+1} \geq T_{\mathrm{obs}})\right] \;\leq\; p \;\leq\; \frac{1}{2}\left[P(\chi^2_{k_{\max}-1} \geq T_{\mathrm{obs}}) + P(\chi^2_{k_{\max}} \geq T_{\mathrm{obs}})\right], \qquad (15)$$

where $k_{\min}$ and $k_{\max}$ are the smallest and largest degrees of freedom of the mixture.

Even if those bounds can be crude in some cases, they can turn out to be quite useful in practice, and might bring enough information to support or reject the null hypothesis.

By default, package varTestnlme returns these bounds, and if more precision is wanted by the user, it is possible to re-run the analysis in order to estimate the weights of the limiting distribution.
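In base R, the bounds in (15) take only a couple of lines; the sketch below is a plain reimplementation (not a varTestnlme function) and, applied to the Case 3 output of Section 4.3.1, it should reproduce the reported bounds up to rounding.

# Plain R sketch of the p-value bounds (15), given the observed LRT statistic
# and the smallest and largest degrees of freedom of the mixture.
pval_bounds <- function(lrt, df_min, df_max) {
  lower <- 0.5 * (pchisq(lrt, df_min,     lower.tail = FALSE) +
                  pchisq(lrt, df_min + 1, lower.tail = FALSE))
  upper <- 0.5 * (pchisq(lrt, df_max - 1, lower.tail = FALSE) +
                  pchisq(lrt, df_max,     lower.tail = FALSE))
  c(lower = lower, upper = upper)
}
# Case 3 of Section 4.3.1: LRT = 50.133 with degrees of freedom 0, 1, 2
pval_bounds(50.133, df_min = 0, df_max = 2)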

4 Illustrations

The varTestnlme package provides a unified framework for likelihood ratio tests on the fixed and random parts of linear, generalized linear and nonlinear mixed-effects models fitted with the nlme, lme4 or saemix packages. The main function varTest() takes, in its simplest form, two arguments: m1, the model fitted under the alternative hypothesis $H_1$, and m0, the model fitted under the null hypothesis $H_0$.

4.1 Data

For illustrative purposes, three datasets will be used in the paper, each of them covering one of the three types of models that can be handled by the varTestnlme package. These datasets are available in base R or in the nlme and lme4 packages.

4.1.1 Orthodontal data

The first dataset comes from a study on dental growth (Potthoff and Roy, 1964), where the distance between the pituitary and the pterygomaxillary fissure was recorded every two years from the age of 8 to the age of 14, on 27 children, 16 boys and 11 girls (see Figure 1).

Figure 1: Data from the orthodontal study: distance between the pituitary and the pterygomaxillary fissure as a function of age, by subject.

R> data("Orthodont", package = "nlme")

These data can be fitted using a linear mixed-effects model, with a random intercept and a random slope. Let us denote by $y_{ij}$ the dental measurement of child $i$, of sex $s_i$, at age $t_j$. Then the model can be written as:

$$y_{ij} = \beta_0 + \beta_1 s_i + \beta_2 t_j + \beta_3 s_i t_j + \xi_{0i} + \xi_{1i} t_j + \varepsilon_{ij}, \qquad (16)$$
$$(\xi_{0i}, \xi_{1i})^t \sim \mathcal{N}(0, \Gamma), \quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2), \qquad (17)$$

where the random effects $(\xi_{0i}, \xi_{1i})$ and the residual errors $\varepsilon_{ij}$ are mutually independent. We denote by $\beta = (\beta_0, \beta_1, \beta_2, \beta_3)$ the vector of fixed effects and by $\Gamma = \begin{pmatrix} \gamma_0^2 & \gamma_{01} \\ \gamma_{01} & \gamma_1^2 \end{pmatrix}$ the covariance matrix of the random effects.

4.1.2 Bovine pleuropneumonia

The second dataset comes from a study on contagious bovine pleuropneumonia (CBPP) (Lesnoff et al., 2004), where the number of new serological cases occurring during a given time period was recorded on 4 occasions for 15 herds (see Figure 2).

Figure 2: Data from the cbpp study: number of new cases per period, by herd.

R> data("cbpp", package = "nlme")

These data can be fitted using a generalized linear mixed-effects model, here a mixed-effects logistic regression. Let us denote by $y_{ij}$ the number of new serological cases and by $n_{ij}$ the herd size in herd $i$ at time period $j$. If we denote by $\mathrm{logit}(x) = \log(x/(1-x))$ the logit function, then the model is given by:

$$y_{ij} \sim \mathcal{B}(n_{ij}, p_{ij}), \quad \mathrm{logit}(p_{ij}) = \beta_j + u_i, \qquad (18)$$
$$u_i \sim \mathcal{N}(0, \gamma^2), \qquad (19)$$

where $\beta_j$ is the fixed effect of period $j$, $u_i$ the random effect of herd $i$, and $\mathcal{B}(n, p)$ denotes the binomial distribution with size $n$ and probability $p$.

4.1.3 Loblolly pine trees

The third dataset comes from a study on Loblolly pine trees (Kung, 1986), where the growth of 14 trees was recorded between 3 and 25 years of age (see Figure 3).

Figure 3: Data from the loblolly study: height as a function of age, by tree.

R> data("Loblolly", package = "datasets")

These data can be fitted using a nonlinear mixed-effects model. Let us denote by $h_{ij}$ the height of the $i$-th tree at age $t_j$, by $A_i$ the asymptotic height for the $i$-th tree, by $\ell_i$ the logarithm of the growth rate for the $i$-th tree, and by $R_i$ the height of tree $i$ at age 0. We consider the following asymptotic regression model (corresponding to the SSasymp parametrization):

$$h_{ij} = A_i + (R_i - A_i)\exp\!\left(-e^{\ell_i} t_j\right) + \varepsilon_{ij}, \qquad (20)$$
$$(A_i, R_i, \ell_i)^t = \beta + b_i, \quad b_i \sim \mathcal{N}(0, \Gamma), \quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2). \qquad (21)$$

4.2 Preliminary step: fitting the models under $H_0$ and $H_1$

The first step of the analysis consists in specifying $H_0$ and $H_1$, the two hypotheses defining the test. Then, one needs to fit the model under both hypotheses, using one of the three packages nlme, lme4 or saemix. The varTestnlme package automatically detects the package that was used to fit the models, as well as the structure of the models under $H_0$ and $H_1$.

Note that both models m1 and m0 should be fitted with the same package, except when there are no random effects in m0 (see details below).

4.2.1 Linear model

We will consider three hypothesis testing configurations for the linear model presented in (16). We detail below the null and alternative hypotheses in each case, and the code to fit the associated models.

  1. Case 1: testing that there is a random slope, in a model where the slope and the intercept are correlated:

     $$H_0: \Gamma = \begin{pmatrix} \gamma_0^2 & 0 \\ 0 & 0 \end{pmatrix} \quad \text{vs.} \quad H_1: \Gamma = \begin{pmatrix} \gamma_0^2 & \gamma_{01} \\ \gamma_{01} & \gamma_1^2 \end{pmatrix}, \quad \text{with } \gamma_0^2 \geq 0,\ \gamma_1^2 \geq 0.$$

     The syntax using the lme4 and nlme packages is:

     R> lm1.h1.lme4 <- lmer(distance ~ 1 + Sex + age + age*Sex + (1 + age | Subject),
     +    data = Orthodont, REML = FALSE)
     R> lm1.h0.lme4 <- lmer(distance ~ 1 + Sex + age + age*Sex + (1 | Subject),
     +    data = Orthodont, REML = FALSE)

     R> lm1.h1.nlme <- lme(distance ~ 1 + Sex + age + age*Sex,
     +    random = ~ 1 + age | Subject, data = Orthodont, method = "ML")
     R> lm1.h0.nlme <- lme(distance ~ 1 + Sex + age + age*Sex,
     +    random = ~ 1 | Subject, data = Orthodont, method = "ML")

  2. Case 2: testing that there is a random slope, in a model where the slope and the intercept are independent:

     $$H_0: \Gamma = \begin{pmatrix} \gamma_0^2 & 0 \\ 0 & 0 \end{pmatrix} \quad \text{vs.} \quad H_1: \Gamma = \begin{pmatrix} \gamma_0^2 & 0 \\ 0 & \gamma_1^2 \end{pmatrix}, \quad \text{with } \gamma_0^2 \geq 0,\ \gamma_1^2 \geq 0.$$

     R> lm2.h1.lme4 <- lmer(distance ~ 1 + Sex + age + age*Sex + (1 + age || Subject),
     +    data = Orthodont, REML = FALSE)
     R> lm2.h0.lme4 <- lm1.h0.lme4

     R> lm2.h1.nlme <- lme(distance ~ 1 + Sex + age + age*Sex,
     +    random = list(Subject = pdDiag(~ 1 + age)), data = Orthodont, method = "ML")
     R> lm2.h0.nlme <- lm1.h0.nlme

  3. Case 3: testing for the presence of any random effect, in a model where the slope and the intercept are independent:

     $$H_0: \Gamma = 0 \quad \text{vs.} \quad H_1: \Gamma = \begin{pmatrix} \gamma_0^2 & 0 \\ 0 & \gamma_1^2 \end{pmatrix}, \quad \text{with } \gamma_0^2 \geq 0,\ \gamma_1^2 \geq 0.$$

     R> lm3.h1.lme4 <- lm2.h1.lme4
     R> lm3.h1.nlme <- lm2.h1.nlme
     R> lm3.h0 <- lm(distance ~ 1 + Sex + age + age*Sex, data = Orthodont)

4.2.2 Generalized linear model

In the model considered in (18), there is only one random effect. We will then consider the following test:

$$H_0: \gamma^2 = 0 \quad \text{vs.} \quad H_1: \gamma^2 > 0.$$

The corresponding code using lme4 is:

R> glm1 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
+    family = binomial, data = cbpp)
R> glm0 <- glm(cbind(incidence, size - incidence) ~ period,
+    family = binomial, data = cbpp)

4.2.3 Nonlinear model

Let us consider the nonlinear model defined in (20). Here we will carry out the following test:

$$H_0: \Gamma = \begin{pmatrix} \gamma_A^2 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \quad \text{vs.} \quad H_1: \Gamma = \begin{pmatrix} \gamma_A^2 & 0 & 0 \\ 0 & \gamma_R^2 & 0 \\ 0 & 0 & \gamma_\ell^2 \end{pmatrix},$$

i.e. that, under the null hypothesis, only the asymptote is random.

The corresponding code using nlme and lme4 is:

R> start <- c(Asym = 103, R0 = -8.5, lrc = -3.2)
R> nlm1.nlme <- nlme(height ~ SSasymp(age, Asym, R0, lrc),
+    fixed = Asym + R0 + lrc ~ 1, random = pdDiag(Asym + R0 + lrc ~ 1),
+    start = start, data = Loblolly)
R> nlm0.nlme <- nlme(height ~ SSasymp(age, Asym, R0, lrc),
+    fixed = Asym + R0 + lrc ~ 1, random = pdDiag(Asym ~ 1),
+    start = start, data = Loblolly)

R> nlm1.lme4 <- nlmer(height ~ SSasymp(age, Asym, R0, lrc) ~ (Asym + R0 + lrc || Seed),
+    start = start, data = Loblolly)
R> nlm0.lme4 <- nlmer(height ~ SSasymp(age, Asym, R0, lrc) ~ (Asym | Seed),
+    start = start, data = Loblolly)
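A possible way to obtain starting values such as the start vector used above is to first fit the corresponding nonlinear model without random effects, relying on the self-starting routine provided by SSasymp; this is only a suggestion, not a step required by varTestnlme:

R> fit0 <- nls(height ~ SSasymp(age, Asym, R0, lrc), data = Loblolly)
R> round(coef(fit0), 1)   # values close to c(Asym = 103, R0 = -8.5, lrc = -3.2)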

4.3 Variance component testing

Once the models have been fitted using one of the three packages lme4, nlme or saemix, we can use the varTest() function of the varTestnlme package to test the chosen null hypothesis. The syntax of the function is the following: varTest(m1,m0,control = list(M=5000, parallel=F, nbcores=1), fim = "extract", pval.comp = "bounds") where:

  • m1 is the model fitted under $H_1$, i.e. an object of class lmerMod, glmerMod or nlmerMod for models fitted via lme4, of class lme or nlme for models fitted via nlme, or of class SaemixObject for models fitted via saemix,

  • m0 is the model fitted under $H_0$, using the same package as for m1, except when no random effects are present under $H_0$ (i.e. when testing for the absence of any random effect). In this case, m0 should be fitted using lm() for linear models, glm() for generalized linear models, and nls() for nonlinear models (these functions are available in the stats package),

  • control is a list with: M, the size of the Monte Carlo sample for the computation of the weights of the chi-bar-square distribution (5000 by default, see Algorithm 1); parallel a boolean to specify whether the Monte Carlo sampling should be done in parallel (FALSE by default); nbcores the number of cores to be used with parallel computing (1 by default); and B the size of the bootstrap sample used to estimate the Fisher Information Matrix,

  • fim could be either "extract" (the default) if the Fisher Information Matrix should be extracted from the fitted model m1; "compute" if it should be computed using parametric bootstrap (note that this functionality does not work for models fitted with the saemix package); or a user-defined matrix to be used as the Fisher Information Matrix,

  • pval.comp specifies the way to compute the p-value, and could be either "bounds" (the default), in which case only bounds on the true p-value are computed, "approx", in which case a Monte Carlo estimation of the exact p-value is provided, or "both" for a combination of both approaches. In the case where the weights are known explicitly, no approximation is made and the exact weights are returned.

Note that the approximation can be time consuming, especially when control$M is large, or when the number of components in the chi-bar-square mixture increases. It is recommended to first run the function using the default setting, i.e. with pval.comp = "bounds", and to re-run it with pval.comp = "approx" if more precision is needed.

4.3.1 Linear model

We illustrate the function on the three different tests defined in the previous section for the linear model on the orthodontal data.

Case 1: testing that the variance of age is null, in a model with correlated random effects. We first run the function varTest() with the default arguments. In this case, only bounds on the p-value are computed.

R> varTest(lm1.h1.lme4, lm1.h0.lme4)

Variance components testing in mixed-effects models
(models fitted using the lme4 package)

Testing that the variance of age is null

model under H1: distance ~ 1 + Sex + age + age * Sex + (1 + age | Subject)
model under H0: distance ~ 1 + Sex + age + age * Sex + (1 | Subject)

Likelihood ratio test statistic: LRT = 0.83264

Limiting distribution: mixture of 2 chi-bar-square distributions with degrees of freedom 1, 2

lower-bound for p-value: 0.51049 upper bound for p-value: 0.51049

The package that was used to fit the models is recognized, and the name of the random effect being tested is identified. The formulas of the two models m1 and m0 are recalled. The LRT statistic is computed, and the asymptotic distribution is identified as a mixture of two chi-square distributions with degrees of freedom 1 and 2. In this case, due to the properties of the chi-bar-square distribution (namely that the sum of the weights associated with odd degrees of freedom is equal to 1/2, and similarly that the sum of the weights associated with even degrees of freedom is also equal to 1/2), the exact p-value can be computed. This is why the lower and upper bounds are equal.
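This exact p-value can be checked directly in base R, since the two weights are both equal to 1/2 here:

R> 0.5 * (pchisq(0.83264, df = 1, lower.tail = FALSE) +
+         pchisq(0.83264, df = 2, lower.tail = FALSE))
# about 0.5105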

If we re-run the function with the option pval.comp = "both", we get the same results, but with additional information about the weights and the p-value. This time, we run the function with the fits from nlme to show that the results are similar, if not identical (differences can appear in the LRT statistic due to the way the log-likelihood is computed within each package). The additional information includes the weights of each chi-square distribution in the mixture and their associated standard deviations. Here, since the weights have simple analytical expressions, the associated standard deviations are null.

R> varTest(lm1.h1.nlme, lm1.h0.nlme, pval.comp="both") Variance components testing in mixed-effects models (models fitted using the nlme package)

Testing that the variance of age is null

model under H1: distance ~ 1 + Sex + age + age * Sex (fixed effects), pdSymm(Subject ~ 1 + age) (random effects)
model under H0: distance ~ 1 + Sex + age + age * Sex (fixed effects), ~ 1 | Subject (random effects)

Likelihood ratio test statistic: LRT = 0.83311

Limiting distribution: mixture of 2 chi-bar-square distributions with degrees of freedom 1, 2

associated weights and sd: 0.5 (0), 0.5 (0)

p-value (from estimated weights): 0.51035 lower-bound for p-value: 0.51035 upper bound for p-value: 0.51035

Case 2: testing that the variance of age is null, in a model with independent random effects. The number of components in the mixture is the same as in Case 1, but the degrees of freedom have been shifted downward. Note that the weights also have simple analytical expressions in this case, as does the p-value. Results are the same with the nlme package.

R> varTest(lm2.h1.lme4, lm2.h0.lme4, pval.comp="both") Variance components testing in mixed-effects models (models fitted using the lme4 package)

Testing that the variance of age is null

model under H1: distance ~ 1 + Sex + age + age * Sex + ((1 | Subject) + (0 + age | Subject))
model under H0: distance ~ 1 + Sex + age + age * Sex + (1 | Subject)

Likelihood ratio test statistic: LRT = 0.53041

Limiting distribution: mixture of 2 chi-bar-square distributions with degrees of freedom 0, 1

lower-bound for p-value: 0.23322 upper bound for p-value: 0.23322

Case 3: testing the presence of randomness in the model, i.e. testing that the variances of age and of the intercept are null, in a model with independent random effects.

R> vt3 <- varTest(lm3.h1.nlme, lm3.h0)
R> summary(vt3)

Testing that the variances of Intercept and age are null

Likelihood ratio test statistic: LRT = 50.133

Limiting distribution: mixture of 3 chi-bar-square distributions with degrees of freedom 0, 1, 2

lower-bound for p-value: 7.1831e-13 upper bound for p-value: 7.2152e-12

In this case, using the provided upper bound, we see that the exact p-value is smaller than $10^{-11}$, which is enough in practice to reject the null hypothesis without computing the chi-bar-square weights and the associated p-value. It is also possible to compute the weights using the option pval.comp = "both". In this case, the weights are explicit but depend on the Fisher Information Matrix (FIM). Since we are dealing with a linear mixed-effects model, the FIM can be extracted from the nlme or lme4 packages, using the option fim = "extract". This is the default behaviour of the varTest() function. Note that results can differ between packages since the methods used to compute the FIM are not the same in nlme and lme4.

Using nlme package:

R> vt3ext <- varTest(lm3.h1.nlme, pval.comp="both") R> summary(vt3ext) Testing that the variances of Intercept and age are null

Likelihood ratio test statistic: LRT = 50.133

Limiting distribution: mixture of 3 chi-bar-square distributions with degrees of freedom 0, 1, 2
associated weights and sd: 0.377 (0), 0.5 (0), 0.123 (0)

p-value (from estimated weights): 2.3226e-12

Using lme4 package:

R> vt3ext.lme4 <- varTest(lm3.h1.lme4, lm3.h0, pval.comp = "both")
R> summary(vt3ext.lme4)

Testing that the variances of Intercept and age are null

Likelihood ratio test statistic: LRT = 50.133

Limiting distribution: mixture of 3 chi-bar-square distributions with degrees of freedom 0, 1, 2
associated weights and sd: 0.25 (0), 0.5 (0), 0.25 (0)

p-value (from estimated weights): 3.9655e-12

To compute the FIM using parametric bootstrap, one should use the option fim = "compute". The default bootstrap sample size is B = 1000, but it can be changed using the control argument. When the FIM is computed via bootstrap, results are more consistent between nlme and lme4.
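For instance, a hypothetical call reducing the bootstrap sample size (500 is an arbitrary value) could be:

R> varTest(lm3.h1.lme4, lm3.h0, pval.comp = "both", fim = "compute",
+    control = list(B = 500))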

Using the nlme package:

R> varTest(lm3.h1.nlme, lm3.h0, pval.comp = "both", fim = "compute")

Testing that the variances of Intercept and age are null

Likelihood ratio test statistic: LRT = 50.133

Limiting distribution: mixture of 3 chi-bar-square distributions with degrees of freedom 0, 1, 2
associated weights and sd: 0.357 (0), 0.5 (0), 0.143 (0)

p-value (from estimated weights): 2.5715e-12

Using lme4 package:

R> vt3comp <- varTest(lm3.h1.lme4, lm3.h0, pval.comp = "both", fim = "compute")
R> summary(vt3comp)

Testing that the variances of Intercept and age are null

Likelihood ratio test statistic: LRT = 50.133

Limiting distribution: mixture of 3 chi-bar-square distributions with degrees of freedom 0, 1, 2
associated weights and sd: 0.363 (0), 0.5 (0), 0.137 (0)

p-value (from estimated weights): 2.4987e-12

4.3.2 Generalized linear model

Results from the test of the null hypothesis that there is no random effect associated with the herd are given by:

R> varTest(glm1, glm0)

Variance components testing in mixed-effects models
(models fitted using the lme4 package)

Testing that the variance of (Intercept) is null

model under H1: cbind(incidence, size - incidence) ~ period + (1 | herd)
model under H0: cbind(incidence, size - incidence) ~ period

Likelihood ratio test statistic: LRT = 14.005

Limiting distribution: mixture of 2 chi-bar-square distributions with degrees of freedom 0, 1

lower-bound for p-value: 9.115e-05 upper bound for p-value: 9.115e-05

4.3.3 Nonlinear model

As detailed in the example at the end of Section 3.1, for the test of the null hypothesis that only the asymptote is random against the alternative hypothesis of a full covariance matrix between the random effects, the limiting distribution of the LRT is a mixture of 4 chi-square distributions with degrees of freedom varying between 2 and 5. The models fitted in Section 4.2.3, however, assume independent random effects under the alternative (a diagonal covariance matrix), so that only the two variances are tested and the limiting distribution reduces to a mixture of 3 chi-square distributions with degrees of freedom 0, 1 and 2. When run with the default options, the varTest function gives:

R> varTest(nlm1.nlme, nlm0.nlme)

Variance components testing in mixed-effects models
(models fitted using the nlme package)

Testing that the variances of R0 and lrc are null

model under H1: height ~ SSasymp(age, Asym, R0, lrc) (nonlinear model), pdDiag(Asym + R0 + lrc ~ 1) (random effects)
model under H0: height ~ SSasymp(age, Asym, R0, lrc) (nonlinear model), pdDiag(Asym ~ 1) (random effects)

Likelihood ratio test statistic: LRT = 2.5199

Limiting distribution: mixture of 3 chi-bar-square distributions with degrees of freedom 0, 1, 2

lower-bound for p-value: 0.05621 upper bound for p-value: 0.19805

R> varTest(nlm1.lme4, nlm0.lme4)

Variance components testing in mixed-effects models
(models fitted using the lme4 package)

Testing that the variances of R0 and lrc are null

model under H1: height ~ SSasymp(age, Asym, R0, lrc) ~ (0 + Asym + R0 + lrc || Seed)
model under H0: height ~ SSasymp(age, Asym, R0, lrc) ~ (0 + Asym | Seed)

Likelihood ratio test statistic: LRT = 2.4567

Limiting distribution: mixture of 3 chi-bar-square distributions with degrees of freedom 0, 1, 2

lower-bound for p-value: 0.058514 upper bound for p-value: 0.2049

Depending on the chosen level of the test, more precision may be needed on the p-value. For models fitted with nlme, this can be obtained using the options pval.comp = "both" or pval.comp = "approx", combined with fim = "extract" to use the FIM computed by nlme, or fim = "compute" to approximate the FIM using parametric bootstrap. In the latter case, one can specify the bootstrap sample size in the control argument.

R> varTest(nlm1.nlme, nlm0.nlme, pval.comp = "both", fim = "compute")

Variance components testing in mixed-effects models
(models fitted using the nlme package)

Testing that the variances of R0 and lrc are null

model under H1: height ~ SSasymp(age, Asym, R0, lrc) (nonlinear model), pdDiag(Asym + R0 + lrc ~ 1) (random effects)
model under H0: height ~ SSasymp(age, Asym, R0, lrc) (nonlinear model), pdDiag(Asym ~ 1) (random effects)

Computing Fisher Information Matrix by bootstrap…

…generating the B=1000 bootstrap samples …


Likelihood ratio test statistic: LRT = 2.5199

Limiting distribution: mixture of 3 chi-bar-square distributions with degrees of freedom 0, 1, 2

associated weights and sd: 0.253 (0), 0.5 (0), 0.247 (0)

p-value (from estimated weights): 0.12614 lower-bound for p-value: 0.05621 upper bound for p-value: 0.19805

Here again, the weights are exact once the FIM is known, so that the standard deviations of the weights are 0. For information purposes, the above code took approximately 176 seconds on a laptop with an 8-core Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz processor.
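If this is too slow, the Monte Carlo sampling step can be run in parallel via the control list described in Section 4.3; a hypothetical call (the number of cores is arbitrary) could be:

R> varTest(nlm1.nlme, nlm0.nlme, pval.comp = "both", fim = "compute",
+    control = list(parallel = TRUE, nbcores = 4))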

For nonlinear models fitted with lme4, to the best of our knowledge no method is available to extract the FIM (although merDeriv provides the FIM for linear and generalized linear models fitted via lme4). Only the option fim = "compute" can then be used. We get the following results, which are very similar to the ones obtained with the nlme package. However, the code took longer to run, approximately 400 seconds.

R> varTest(nlm1.lme4, nlm0.lme4, pval.comp = "both", fim = "compute")

Variance components testing in mixed-effects models
(models fitted using the lme4 package)

Testing that the variances of R0 and lrc are null

model under H1: height ~ SSasymp(age, Asym, R0, lrc) ~ (0 + Asym + R0 + lrc || Seed)
model under H0: height ~ SSasymp(age, Asym, R0, lrc) ~ (0 + Asym | Seed)

Computing Fisher Information Matrix by bootstrap…

…generating the B=1000 bootstrap samples …

Likelihood ratio test statistic: LRT = 2.4567

Limiting distribution: mixture of 3 chi-bar-square distributions with degrees of freedom 0, 1, 2

associated weights and sd: 0.262 (0), 0.5 (0), 0.238 (0)

p-value (from estimated weights): 0.12833 lower-bound for p-value: 0.058514 upper bound for p-value: 0.2049

5 Summary and discussion

In this paper, we present the R package varTestnlme, which provides a unified framework for variance components testing in linear, generalized linear and nonlinear mixed-effects models fitted with nlme, lme4 or saemix. The main function, varTest(), performs the comparison of two nested models m1 and m0. It is thus possible to test that the variances of any subset of random effects are null, taking into account possible correlations between them. It is also possible to test that one or several correlations are null, and to test simultaneously whether some fixed effects are null. The tests are performed using likelihood ratio tests. Additional tools are provided to approximate the Fisher Information Matrix when it is needed to compute the limiting distribution of the likelihood ratio test statistic.

Possible future developments of the package include in particular the handling of more than one level of random effects, to account for multiple nested random effects. Another perspective would be to allow for more general structures for the covariance matrix of the random effects. This would encompass, for example, parametric structures such as compound symmetry, auto-regressive structures or structures with spatial correlation, as proposed in the nlme package.

References

  • C. Baey, P. Cournède, and E. Kuhn (2019) Asymptotic distribution of likelihood ratio test statistics for variance components in nonlinear mixed effects models. Computational Statistics and Data Analysis 135, pp. 107–122. Cited by: §1, §2.2, §3.1.
  • D. Bates, M. Mächler, B. Bolker, and S. Walker (2015) Fitting linear mixed-effects models using lme4. Journal of Statistical Software 67 (1), pp. 1–48. External Links: Document, ISSN 1548-7660 Cited by: §1, §3.1.
  • Z. Chen and D. B. Dunson (2003) Random effects selection in linear mixed models. Biometrics 59 (4), pp. 762–769. Cited by: §1.
  • E. Comets, A. Lavenu, and M. Lavielle (2017) Parameter estimation in nonlinear mixed effect models using saemix, an R implementation of the SAEM algorithm. Journal of Statistical Software 80 (3), pp. 1–41. External Links: Document Cited by: §1.
  • M. Davidian and D. M. Giltinan (1995) Nonlinear models for repeated measurement data. Chapman Hall. Cited by: §2.1.
  • M. Delattre, M. Lavielle, and M. Poursat (2014) A note on BIC in mixed-effects models. Electronic Journal of Statistics 8 (1), pp. 456–475. Cited by: §1.
  • A. Groll and G. Tutz (2014) Variable selection for generalized linear mixed models by L1-penalized estimation. Statistics and Computing 24, pp. 137–154. Cited by: §1.
  • A. Groll (2017) glmmLasso: variable selection for generalized linear mixed models by L1-penalized estimation. Note: R package version 1.5.1 External Links: Link Cited by: §1.
  • M. J. Gurka (2006) Selecting the best linear mixed model under REML. The American Statistician 60 (1), pp. 19–26. Cited by: §1.
  • U. Halekoh and S. Højsgaard (2014) A Kenward-Roger approximation and parametric bootstrap methods for tests in linear mixed models – the R package pbkrtest. Journal of Statistical Software 59 (9), pp. 1–30. External Links: Link Cited by: §1.
  • J. B. Hiriart-Urruty and J. Malick (2012) A Fresh Variational-Analysis Look at the Positive Semidefinite Matrices World. Journal of Optimization Theory and Applications 153 (3), pp. 551–577. Cited by: §3.1.
  • J. G. Ibrahim, H. Zhu, R. I. Garcia, and R. Guo (2011) Fixed and random effects selection in mixed effects models. Biometrics 67 (2), pp. 495–503. Cited by: §1.
  • F. Kung (1986) Fitting logistic growth curve with predetermined carrying capacity. In ASA Proceedings of the Statistical Computing Section, pp. 340–343. Cited by: §4.1.3.
  • A. Kuznetsova, P. Brockhoff, and R. Christensen (2017) lmerTest package: tests in linear mixed effects models. Journal of Statistical Software 82 (13), pp. 1–26. Cited by: §1.
  • M. Lavielle (2014) Mixed effects models for the population approach: models, tasks, methods and tools. CRC Press. Cited by: §2.1.
  • M. Lesnoff, G. Laval, P. Bonnet, S. Abdicho, A. Workalemahu, D. Kifle, A. Peyraud, R. Lancelot, and F. Thiaucourt (2004) Within-herd spread of contagious bovine pleuropneumonia in Ethiopian highlands. Preventive Veterinary Medicine 64 (1), pp. 27–40. Cited by: §4.1.2.
  • G. Molenberghs and G. Verbeke (2007) Likelihood Ratio, Score, and Wald Tests in a Constrained Parameter Space. The American Statistician 61 (1), pp. 22–27. Cited by: §1.
  • J.C. Pinheiro and D.M. Bates (2000) Mixed-Effects Models in S and S-PLUS. Springer. Cited by: §2.1.
  • J. Pinheiro, D. Bates, S. DebRoy, D. Sarkar, and R Core Team (2019) nlme: linear and nonlinear mixed effects models. Note: R package version 3.1-139 External Links: Link Cited by: §1.
  • R. F. Potthoff and S. Roy (1964) A generalized multivariate analysis of variance model useful especially for growth curve problems. Biometrika 51 (3-4), pp. 313–326. Cited by: §4.1.1.
  • B. Saefken and D. Ruegamer (2018) cAIC4: conditional Akaike information criterion for lme4. Note: R package version 0.3 Cited by: §1.
  • A. Shapiro (1985) Asymptotic distribution of test statistics in the analysis of moment structures under inequality constraints. Biometrika 72, pp. 133–144. Cited by: §3.1.
  • M. J. Silvapulle and P. K. Sen (2011) Constrained statistical inference: order, inequality, and shape constraints. Vol. 912, John Wiley & Sons. Cited by: §3.2, §3.2, §3.4.
  • D. O. Stram and J. W. Lee (1994) Variance components testing in the longitudinal mixed effects model.. Biometrics 50 (4), pp. 1171–1177. Cited by: §1.
  • D. O. Stram and J. W. Lee (1995) Corrections to Variance components testing in the longitudinal mixed effects model.. Biometrics 51 (3), pp. 1196. Cited by: §1.
  • F. Vaida and S. Blanchard (2005) Conditional Akaike information for mixed-effects models. Biometrika 92 (2), pp. 351–370. External Links: Document Cited by: §1.