Additive quantile regression for clustered data with an application to children's physical activity

by   Marco Geraci, et al.
University of South Carolina

Additive models are flexible regression tools that handle linear as well as nonlinear terms. The latter are typically modelled via smoothing splines. Additive mixed models extend additive models to include random terms when the data are sampled according to cluster designs (e.g., longitudinal). These models find applications in the study of phenomena like growth, certain disease mechanisms and energy consumption in humans, when repeated measurements are available. In this paper, we propose a novel additive mixed model for quantile regression. Our methods are motivated by an application to physical activity based on a dataset with more than half million accelerometer measurements in children of the UK Millennium Cohort Study. In a simulation study, we assess the proposed methods against existing alternatives.



There are no comments yet.


page 1

page 2

page 3

page 4


Nonlinear quantile mixed models

In regression applications, the presence of nonlinearity and correlation...

Bayesian quantile additive regression trees

Ensemble of regression trees have become popular statistical tools for t...

Poisson-Tweedie mixed-effects model: a flexible approach for the analysis of longitudinal RNA-seq data

We present a new modelling approach for longitudinal count data that is ...

Scalable visualisation methods for modern Generalized Additive Models

In the last two decades the growth of computational resources has made i...

Detecting Faltering Growth in Children via Minimum Random Slopes

A child is considered to have faltered growth when increases in their he...

Longitudinal modeling of age-dependent latent traits with generalized additive latent and mixed models

We present generalized additive latent and mixed models (GALAMMs) for an...

Additive Density-on-Scalar Regression in Bayes Hilbert Spaces with an Application to Gender Economics

Motivated by research on gender identity norms and the distribution of t...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

The goal of regression analysis is to model the distribution of an outcome as a function of one or more covariates. Mean regression is used to assess how the outcome changes on average when the covariates change, and often implies that the direction and strength of the statistical associations are the same for all individuals in a population. However, conditionally on their observed characteristics, subjects who rank below or above the average of the outcome distribution may respond differently to the same treatment or exposure. Evidence of heterogeneous effects across the outcome distribution have been found in countless research including the effect of smoking on weight in lighter or heavier infants

(Abrevaya, 2001; Koenker and Hallock, 2001; Geraci, 2016a); the effect of succimer chelation on different levels of cadmium concentrations in children’s blood (Cao et al., 2013); or the effect of sedentary behavior and food prices on different centiles of children’s anthropometric variables (España Romero et al., 2013; Sturm and Datar, 2005). These children may be at higher risks of morbidity and mortality than those who are at the center of the distribution.

By definition, mean effects average out stronger and weaker effects. The averaging may even cancel out symmetric effects of same magnitudes but opposite signs on the tails of the distribution. Quantile regression (QR) (Koenker and Bassett, 1978) is a flexible statistical tool with a vast number of applications that complements mean regression. QR has become a successful analytic method in many fields of science because of its ability to draw inferences about individuals that rank below or above the population conditional mean. The ranking within the conditional distribution of the outcome can be considered as a natural index of individual latent characteristics which cause heterogeneity at the population level (Koenker and Geling, 2001). There is an increasingly wider acknowledgement of the importance of investigating sources of heterogeneity to quantify more accurately costs, benefits, and effectiveness of interventions or medical treatments, whether it be an after-school physical activity program, a health care reform, or a thrombolytic therapy (see, for example, Austin et al., 2005; Beets et al., 2016; Beyerlein, 2014; Ding et al., 2010; Rehkopf, 2012; Wei and Terry, 2015; Winkelmann, 2006). QR is particularly suitable for this purpose as it yields inferences that are valid regardless of the true underlying distribution. Also, quantiles enjoy a number of properties (Gilchrist, 2000)

, including equivariance to monotone transformations and robustness to outliers.

In this paper we are concerned specifically with nonparametric quantile regression functions of continuous response variables when data arise from cluster designs. Our research is motivated by a study on daily and weekly physical activity patterns in school-aged children using high-frequency accelerometer data. In general, temporal (diurnal) trajectories of physical activity are characterized by strongly nonlinear patterns that require some degree of smoothing

(Butte, Ekelund and Westerterp, 2012; Fan et al., 2015; Morris et al., 2006; Sera et al., 2011, 2017). On the other hand, some predictors of interest may simply have linear effects. If, in addition, data are collected longitudinally to examine weekly patterns, then the correlation at the individual level must be taken into account.

We propose novel additive quantile models that include linear terms, nonlinear terms, as well as random-effects terms which account for the clustering. Further, nonlinear terms are modelled nonparametrically using penalized splines and fitted via automatic scatterplot smoothing within a mixed model framework (Ruppert, Wand and Carroll, 2003). In the next section, we briefly describe the data. In Section 3, we describe the state of the art in nonlinear quantile regression and highlight the differences between our proposal and existing, potentially competing approaches. In Section 4, we describe the methods and, briefly, their implementation in the R language (R Core Team, 2014), with further technical details provided in Appendix A. In Section 5, we carry out a simulation study to assess the performance of the proposed methods (with tables reported in Appendix B). The real data analysis is presented in Section 6 while concluding remarks are given in Section 7.

2 Sedentariness and physical activity in UK children

The benefits of regular physical activity on well-being and life expectancy as well as the detrimental health effects of sedentary behavior have been amply documented (e.g., see Ekelund et al., 2016; Katzmarzyk and Pate, 2017; Warburton, Nicol and Bredin, 2006)

. Physical inactivity in England is estimated to cost more than eight billion pounds a year. This includes both the direct costs of treating major, lifestyle-related diseases and the indirect costs of sickness absence

(National Institute for Health and Clinical Excellence, 2008). It is also estimated that 54,000 premature deaths a year are linked to a sedentary lifestyle (Department for Culture, Media and Sport, 2002).

Since establishing an active lifestyle at an early age is an important form of prevention against morbidity and premature mortality, promotion of physical activity in children and young people has gained a strategically central place in the public health agenda (National Institute for Health and Clinical Excellence, 2009). A nationally representative study (Griffiths et al., 2013a) showed that only half of UK seven-year-olds achieve recommended levels of physical activity, with girls far less active than boys. It is therefore important to identify the predictors of physical activity, not only at the average intensity of activity but also (and perhaps especially) at lower and upper intensities.

Accelerometer data collected in seven-year-old children of the Millennium Cohort Study (MCS), a UK-wide longitudinal multi-purpose survey, represent a major, large-scale epidemiological resource to study physical activity determinants (Griffiths et al., 2013b). Accelerometers are devices capable of providing an objective measure of the intensity and duration of movement. They produce an output known as ‘acceleration counts’ which is dimensionless and thus requires calibration in order to be converted into physiologically more relevant units, usually based on energy expended per unit of time (e.g., metabolic equivalent of task). The plot in Figure 1

shows accelerometer counts by time of the day for a subset of MCS children who provided reliable data for 7 days of the week. Since temporal trajectories of activity were similar between Monday and Friday, and during Saturday and Sunday, data in the plot are shown for weekdays (or workdays) and weekend days, respectively. Each dot represents the accelerometer counts measured on any of the weekdays or weekend days for a child at a given time of the day, while the solid lines are piecewise linear curves connecting sample quantiles estimated cross-sectionally (conditional on time). During weekdays there are periods of higher activity levels that mirror traveling times to and from school, and lunch and break times

(Sera et al., 2017)

. However, temporal trajectories at different quantile levels of the conditional distribution are not simply vertical shifts of one another. This suggests that the scale and possibly the shape of the counts distribution change with time of the day. For example, the skewness of the distribution in the weekend is small early in the morning, and steadily increases during the day.

Figure 1: Accelerometer counts (rounded to the nearest 100) observed between 7:00 and 20:00 and aggregated over 10-minute intervals in 1154 English children of the UK Millennium Cohort Study, by days of the week (Monday through Friday, weekdays; Saturday and Sunday, weekend). Solid lines connect sample quantiles that are estimated conditionally on time for 5 quantile levels .

3 Approaches to nonlinear quantile regression

Nonlinear associations occur in many research studies, including bioassays and pharmacokinetic experiments (Lindsey, 2001), as well studies related to growth processes in biology and agriculture (Davidian and Giltinan, 2003). In the presence of nonlinearity, there are different modelling strategies one may consider. Nonparametric models, smoothing splines (including polynomial models), and transformation models are the most commonly adopted. In addition to the specific strategy, one must also take into account the particular analytical framework: for example, is the sampling design cross-sectional or longitudinal? Is the outcome discrete or continuous? Should distributional assumptions be parametric or nonparametric? And so on. The effort needed for the analysis may have, at times, considerable weight on the final decision regarding which approach to follow. Lack of theory or even computer software can move the needle towards one choice over another.

The literature on parametric and nonparametric nonlinear regression models is vast. However, most of the work has been done in relation to nonlinear mean regression. In comparison, substantially less methods have been developed for nonlinear quantile regression and even less for nonlinear quantile regression with clustered data. A brief account of strategies to nonlinear quantile regression modelling is given by Geraci (2016b)

and these consist in: (i) smoothing (nonparametric regression); (ii) nonlinear parametric models; and (iii) transformation models. In the next sections, we discuss methods that fall in the first group and that are of direct relevance to the present study, and briefly survey those in the other two groups for completeness.

3.1 Nonparametric models

Nonparametric regression and, more in general, additive models occupy an important place in statistical modelling. Popular approaches include locally weighted scatterplot smoothing (Cleveland, 1979; Cleveland and Devlin, 1988) and smoothing splines (Wahba, 1990; de Boor, 2001; Hastie and Tibshirani, 1990; Ruppert, Wand and Carroll, 2003). If we consider the literature on nonparametric quantile functions for data with no clustering, there is a considerable variety of proposals. For example, Koenker, Ng and Portnoy (1994), He, Ng and Portnoy (1998), Koenker and Mizera (2004), and Koenker (2011) considered total variation roughness penalties for univariate, bivariate, and additive quantile smoothing splines. For smooth functions, the total variation penalty corresponds to the counterpart of the

smoothing spline penalty and leads to an elegant, computationally convenient optimization problem which can be solved via linear programming

(Koenker and Mizera, 2004). Yu and Jones (1998), Horowitz and Lee (2005), and Spokoiny, Wang and Härdle (2013) proposed estimation methods based on kernel weighted and polynomial local linear fitting. Other, similar local fitting methods are given in Wu, Yu and Yu (2010) and Cai and Xu (2008). In the Bayesian framework for independent data, Thompson et al. (2010) proposed an approach based on natural cubic splines (Green and Silverman, 1994) with a Gaussian smoothness prior.

A few approaches have been proposed also for the estimation of nonparametric quantile functions with repeated measurements or when the data are subject to other forms of dependence. Wei et al. (2006) considered modelling and estimation of longitudinal growth curves. They discussed an additive quantile regression model decomposed into a nonparametric temporal trend, a first-order autoregressive component, and a partially linear component to adjust for other covariates. Fenske et al. (2013) extended Fenske, Kneib and Hothorn’s (2011) method to additive quantile regression based on longitudinal data. In their model, fitted via boosting, they included -penalized fixed cluster-specific intercepts and slopes (thus, no covariance structure) with pre-specified shrinkage parameters, to account for individual deviations from a ‘population’ trend. Additive nonlinear effects were modelled via penalized splines, again, with pre-specified amount of smoothing as controlled by the number of boosting iterations. An additive model is also considered by Yue and Rue (2011)

who proposed normally distributed random intercepts and nonlinear terms with Bayesian P-splines and Gaussian Markov random fields as smoothness priors. Finally, it is worth mentioning that a related, although different, approach involves modelling the quantile regression coefficients in varying coefficient models, such as those by

Andriyana, Gijbels and Verhasselt (2014) and Reich, Fuentes and Dunson for, respectively, longitudinally and spatially correlated data.

The modelling approach we develop in this study differs on several accounts from existing proposals. First of all, we model the intra-cluster correlation by means of random effects instead of autoregressive errors (Wei et al., 2006). Secondarily, in contrast to Yue and Rue (2011), we include cluster-specific random slopes in addition to random intercepts, and, in contrast to Fenske et al. (2013), we allow cluster-specific effects to have a general covariance matrix. In addition, our estimation approach radically differs from that of Fenske et al. (2013) since, as described in Section 4, the optimal degree of shrinkage of the cluster-specific effects and the optimal level of smoothing for the nonlinear terms are automatically estimated from the data.

3.2 Nonlinear parametric models

Contributions to statistical methods for nonlinear mean regression when data are clustered can be found in the literature of mixed-effects modelling (Lindstrom and Bates, 1990; Pinheiro and Bates, 1995, 2000) as well as generalized estimating equations (Davidian and Giltinan, 1995, 2003; Contreras and Ryan, 2000; Vonesh et al., 2002). In contrast, there seem to be only a handful of published articles in the statistical literature on parametric nonlinear quantile regression functions with clustered data. Karlsson (2008) considered nonlinear longitudinal data and proposed weighting the standard quantile regression estimator (Koenker and Bassett, 1978) with pre-specified weights. Wang (2012), taking her cue from Geraci and Bottai (2007), used the AL distribution to define the likelihood of a Bayesian nonlinear quantile regression model. Huang and Chen (2016) proposed a Bayesian joint model for time-to-event and longitudinal data. An approach based on copulas is developed by Chen, Koenker and Xiao (2009). Oberhofer and Haupt (2016) established the consistency of the -norm nonlinear quantile estimator under weak dependency. Finally, Geraci (2017a) extended Geraci and Bottai’s (2014) quantile mixed models to the nonlinear case.

3.3 Transformation models

Traditionally, transformations toward linearity have been developed for conditional mean function estimation (e.g. Aranda-Ordaz, 1981; Box and Cox, 1964), with some proposals to deal with clustered data (see, for example, Gurka et al., 2006; Hutmacher et al., 2011; Maruo et al., 2017). This general approach has been taken by others in quantile function estimation when data are independent using the Box-Cox transformation (Buchinsky, 1995; Chamberlain, 1994; Mu and He, 2007), the Aranda-Ordaz transformation (Dehbi, Cortina-Borja and Geraci, 2016), as well as other new flexible transformations (Geraci and Jones, 2015). In particular, Mu and Wei (2009) developed a Box-Cox quantile regression model with varying coefficients for longitudinal data.

4 Methods

4.1 Notation

We consider data from two-level nested designs in the form , for and , , where is the th row of a known matrix , is the th row of a known matrix and is the

th observation of the response vector

for the

th unit or cluster. This kind of data arise from longitudinal studies and other cluster sampling designs (e.g., spatial cluster designs). Throughout the paper, the covariates

and are assumed to be given and measured without error. The vector of zeros and ones will be denoted by, respectively, and , the identity matrix by , and the matrix of zeros by . Finally, the Kronecker product and the direct sum will be denoted by and , respectively.

4.2 The model

We define the following th additive quantile regression model


for , where is a -specific, centered, twice-differentiable, smooth function of the th component of . The vector collects cluster-specific random effects associated with and its distribution is assumed to depend on a -specific parameter (further details are provided in the next section).

Without loss of generality, let the components of be ordered in such a way that the first terms of the summation in (1) are nonlinear functions and the remaining are linear. To model nonlinear functions, we consider a spline model of the type

(e.g., cubic or B-Spline), where the ’s and ’s, , denote, respectively, the basis functions and the corresponding coefficients, and

depends on the degrees of freedom or the number of knots. Note that the coefficients are

-specific. The quantile function in (1) is then approximated by


Let be the vector of values taken by the th spline evaluated at , be the vector of spline coefficients for the th covariate, and . Further, define as the matrix with rows , , and let . With a slight abuse of notation, we write (2) for the th cluster in matrix form as


where is the matrix with rows , , and . We call (3) an additive linear quantile mixed model, or AQMM for short.

The additive model introduced above opens up the question on how to control the trade-off between bias and efficiency, and, thus, the degree of smoothness of the estimate. At this juncture of our paper, we take a detour to briefly introduce the well-known link existing between penalized splines and mixed-effect models (see, e.g., Ruppert, Wand and Carroll, 2003, for an excellent review of this topic). Consider the following regression spline model

for some design matrix and basis matrix , where is a vector of observations and is a vector of independent and identically distributed (IID) normal errors of dimension . Penalized estimation of can be carried out by minimizing the objective function

where is the smoothing parameter and . The above objective function, rescaled by , is equivalent to the best linear unbiased prediction criterion of the linear mixed-effects model with , , and

. Since the variance

is estimated from the data, it follows that the degree of smoothing is automatically chosen by the estimation algorithm.

Automatic smoothing selection does not necessarily lead to optimal smoothing (Ruppert, Wand and Carroll, 2003). However, one of the advantages of working with random spline coefficients when modelling cluster data is that can be subsumed in the random part of the model containing the cluster-specific effects. Choice of the ‘prior’ distribution for effectively corresponds to choosing the form of the penalty. One approach is to use the same metric for the penalty term as that for the fidelity term. The

-penalty, which is linked to the double exponential distribution

(Geraci and Bottai, 2007, 2014), is sometimes used in quantile regression models due to its computational convenience (Koenker, Ng and Portnoy, 1994; Koenker and Mizera, 2004). The resulting smoothed curves are piecewise linear and are most useful in the presence of breakpoints, sharp bends, and spikes (Koenker and Mizera, 2004).

In contrast, the -penalty represents a more suitable choice for modelling smooth changes as in, for example, variations of energy expenditure over time. This is, for example, the approach considered by Cox and Jones in the discussion of Cole (1988) who suggested the spline smoothing quantile regression model

where is the quantile regression check function (Koenker and Bassett, 1978) and denotes the indicator function. As compared to other roughness functionals, this kind of penalty yields a more visually appealing form of smoothness.

A natural link between -penalized splines and random effects is provided by the normal distribution. Hence, in our random-effects specification of (3), we assume that the vectors and follow zero-centered multivariate Gaussians with variance-covariance matrices and , respectively. Further, we assume that the ’s are independent for different (but may have a general covariance structure) and are independent from . Our objective function is then given by


with the convention that for a vector . Note that the ’s determine the amount of smoothing for the nonparametric terms.

4.3 Inference

The minimization of (4) is equivalent to fitting a linear quantile mixed model (LQMM) (Geraci and Bottai, 2007, 2014) where the asymmetric Laplace density

with location and scale , is employed as quasi-likelihood for the fidelity term.

Define and . Let denote the parameter of interest, where is an unrestricted -dimensional vector, , of non-redundant parameters in (e.g., see Pinheiro and Bates, 1996) and . Our goal is to maximize the marginal log-likelihood


where and are the scaled variance-covariance matrices of the random effects. Note that this is a three-level hierarchical model, with the innermost grouping factor represented by the clusters and the outermost factor represented by one single-level group (i.e., the entire sample). Despite the three levels, we define as the predictions at level since the smooth terms originally ‘belong’ to the fixed design matrix. Similarly, we define as the predictions at level (i.e., at the cluster level).

Estimation proceeds using a double approximation:

  1. the loss function

    is first smoothed at the kink ;

  2. the integral in (4.3) is then solved using a Laplacian approximation for the (smoothed) loss function.

As proposed by Geraci (2017a), we consider the following smooth approximation (Madsen and Nielsen, 1993; Chen, 2007):


where and is a scalar “tuning” parameter. A similar approximation is given by Muggeo, Sciandra and Augugliaro (2012) who claimed that their method provides a better approximation than Chen’s (2007) algorithm. However, no analytical evidence was provided in their paper to support such a claim. This point might offer scope for additional investigation but, here, it represents a secondary issue and will not be discussed any further.

We then replace the function in (4.3) with to obtain a smoothed likelihood and apply a second-order Taylor expansion (Pinheiro and Chao, 2006) to the resulting exponent. After some algebra, we obtain the following Laplacian approximation

where is the scaled variance-covariance matrix of , and and are the terms of order, respectively, 0 and 2 of the above-mentioned Taylor expansion around the mode (see Appendix A for more details).

When using the asymmetric Laplace as pseudo-likelihood, inference should be confined to point estimation (see for example Yang, Wang and He, 2016)

. Standard errors of non-random parameters estimates can be calculated using block bootstrap

(Efron and Tibshirani, 1998)

, although this increases the computational cost. Bootstrap confidence intervals have been shown to have good coverage in LQMMs

(Geraci and Bottai, 2014). Given the relatively large size of the MCS dataset, for the analysis in Section 6 we implemented an adaptation of the method by Kleiner et al. (2014). The general idea is to perform a bootstrap on several subsets of the original data and then summarize measures of uncertainty from all subsets. This strategy, called ‘bag of little bootstraps’ (BLB), greatly reduces the computing cost when the sample size is large (see Kleiner et al., 2012, 2014, for more details). The original method was developed for IID observations. Since we are dealing with clusters, we adapted the BLB approach as follows:

  1. sample without replacement subsets of size from the pool of clusters (random partition);

  2. for each of the subset, repeatedly ( times) take a bootstrap sample of size and fit an AQMM for each replicate;

  3. for each of the subsets, calculate the bootstrap variance;

  4. as final estimate of the standard error, take the square root of the average of the variances in step 3.

As explained by the authors, the advantage of the BLB approach as compared to traditional bootstrap lies in the smaller size of the subsets. Although the nominal bootstrap sample size is , there are at most unique clusters in each subset. To obtain a bootstrap replicate, we only need a sample from a multinomial distribution with

trials and uniform probability over

possible events. Estimation proceeds with a weighted likelihood, where the cluster-specific weights are given by the multinomial counts.

4.4 Implementation

The methods described in this section were implemented as an add-on to the R package lqmm (Geraci, 2014). The add-on is currently available from the author’s website ( and will appear in a future release of the main package. The core function made use of routines available from the mgcv (Wood, 2006a) and nlme (Pinheiro et al., 2017) packages using syntax and options (e.g., selection of spline models) that are familiar to the users of these packages.

5 Simulation study

We ran a simulation study to assess the proposed methods. In our analysis, we considered the two most relevant alternatives for additive regression modelling. The first is based on additive mixed-effects regression (AMM), which is available in the R package mgcv. This package makes use of nlme’s fitting routines. The other is represented by Fenske et al.’s (2013) additive fixed-effects quantile regression (AFEQR) for longitudinal data, which is available in the mboost package (Hofner et al., 2014). Since the former approach aims at modelling the conditional expectation of the outcome under the assumption of normal errors, AQMM should have an advantage over AMM when the true errors are non-normal and the location-shift hypothesis of the normal model is violated. On the other hand, AFEQR is directly comparable to AQMM since they both aim at the conditional quantiles of the outcome with no assumption about the error distribution. However, as noted in Section 3.1, there are two basic differences between these two quantile regression approaches since in AQMM: (i) the cluster-specific effects are assumed to be random as opposed to fixed, thus a covariance structure between effects can be introduced; and (ii) the level of smoothing of the nonparametric terms is automatically estimated from the data (as reciprocal of the variance components) as opposed to prior specification. These are not necessarily advantages (or disadvantages) but they do represent aspects to consider when choosing a strategy for modelling and estimation.

The data were generated according to the following model


where , , , , , , , and

In one scenario, we set (homoscedastic), while in a separate scenario we set

(heteroscedastic). Within these two scenarios, the error was generated according to either a standard normal, a Student’s

-distribution with 3 degrees of freedom, or a -distribution with 3 degrees of freedom. Thus, in total there were different models. For each model, a balanced dataset was generated according to 6 sample size combinations of and , yielding simulation cases. Each case was replicated times.

For each replication, we fitted the AQMM defined in (2) for using a cubic spline for the nonlinear terms associated with and . The model also included a random intercept and a random slope for with a symmetric positive-definite covariance matrix. We followed the estimation algorithm described in Appendix A. We used a Nelder-Mead algorithm to maximize the approximated log-likelihood and a tolerance of for the relative change of the log-likelihood as the stopping criterion. The modal random effects (A.7) were estimated using a Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm with gradient calculated as in (A.5). Since different strategies can be used to determine the starting values, we considered a naïve and a model-based approach. In the former case, we used the least squares (LS) estimate for , the identity matrix for , the mean of the absolute LS residuals for

, and half the standard deviation of

for . The random effects were all set equal to 0. In the latter case, we used parameter and random effects estimates from an AMM. All the results reported in this paper refer to the latter approach since it gave a superior performance.

The AFEQR models were fitted following Fenske et al.’s (2013) recommendations for the settings of the boosting algorithm, namely the maximum number of boosting iterations, the step length parameter , and the degrees of freedom of the base-learners. The first two parameters were set to 5000 and , respectively, as determined by cross-validation (separately for the homoscedastic and the heteroscedastic scenarios). As explained by the authors, these two ‘hyper-parameters’ control the shrinkage of the estimates. The larger the step length (or the smaller the number of iterations), the more biased and shrunken to zero the estimates will be. The number of degrees of freedom was set to 3 for each term of the boosting algorithm. Since the degree of smoothing is controlled by the number of boosting iterations, the final degree of smoothing at the end of the algorithm can still reach a higher order than that imposed by the initial degrees of freedom (Fenske et al., 2013). We used the mboost package (v. 2.8-1) in the R environment for statistical computing and graphics (R Core Team, 2014) (v. 3.4.2) on a desktop computer with a 3.60GHz quad core i7-4790 processor and 32 gigabytes of RAM.

As a measure of performance, we calculated the bias and the root mean squared error (RMSE) of the level-1 predicted quantile functions. The RMSE of the predictions is given by

where denotes the true th conditional quantile function based on model (5). Analogously, we calculated the relative bias and RMSE of the coefficients for the linear terms, namely and . Note that these coefficients do not vary across quantiles in the homoscedastic scenario. In contrast, the value of the coefficient for depends on in the heteroscedastic scenario. Its ‘true’ value was determined empirically by fitting a linear quantile regression model (Koenker and Bassett, 1978) with 4th degree polynomials on the nonlinear terms for samples of size . Finally, we determined the proportion of negative residuals (PNR)

which is expected to be approximately equal to . All summary measures were averaged over the replications.

Given the large number of results of the simulation study, all the tables are reported in Appendix B and the results are summarized here as follows.

  1. Prediction of conditional quantiles:


    Homoscedastic scenario

    AQMM showed bias and RMSE lower than those of AFEQR consistently across all sample sizes for . At the median, AQMM and AFEQR gave similar results, both showing low bias and RMSE values. The RMSE of AMM was particularly large in the case with asymmetric errors, as one would expect. However, in the case with normal errors, it was comparable to that of AQMM and surprisingly larger than that of AFEQR. Across quantiles, PNR rates for AQMM (and AFEQR) were equal to the expected nominal or differed at most by one hundredth.

    Heteroscedastic scenario

    AQMM still had a clear advantage over AFEQR except at the median, where again the two performed similarly, and at under skewed errors. The performance of AMM relative to AQMM or AFEQR was comparable to that seen in the homoscedastic scenario.

  2. Estimation of :


    Homoscedastic scenario

    AQMM performed well as compared to AFEQR in terms of both bias and RMSE consistently across all quantiles and sample sizes. In particular, the bias of AQMM was much below for the most part. In contrast, AFEQR’s bias ranged from to and was more severe on the tails. At the median, AMM’s performance was comparable to that of AQMM in terms of bias, but it was otherwise inferior in terms of RMSE.

    Heteroscedastic scenario

    As compared to the homoscedastic scenario, AQMM and AFEQR showed larger biases on the tails, with percentages ranging from below to and from to , respectively. AQMM had, as before, consistently lower bias than AFEQR as well as lower RMSE, though the latter was occasionally higher for AQMM at lower sample sizes. At the median, the performance of AMM in terms of bias was rather poor under the scenario with skewed errors, but reasonable under the two scenarios with symmetric errors. In contrast, AMM was less efficient than AQMM in all three error scenarios at all sample sizes.

  3. Estimation of :


    Homoscedastic scenario

    AQMM outperformed AFEQR in terms of both bias and RMSE consistently across all sample sizes for . AFEQR’s bias was, again, particularly high and around at . At , the bias of AQMM was larger than that seen for and somewhat erratic in the case with asymmetric errors, presumably due to the fact that the random component of the model includes and thus the estimation of is affected by the Laplacian approximation. In contrast, AFEQR gave larger RMSE values than AQMM but, occasionally, lower biases. At the median, AMM performed well in terms of bias in all scenarios but was less efficient than AQMM.

    Heteroscedastic scenario

    The results were similar to those obtained in the heteroscedastic scenario.

Finally, we provide a brief report on the computational performance and the sensitivity of the results to different starting values. AQMM reached convergence in of the replications. The median number of iterations to reach convergence for one model was 19 (min 5, max 293), while the median of the smoothing parameter at the last iteration was approximately (min , max ).

When using the naïve approach to determine the starting values for AQMM, the estimation algorithm converged of the times, and took on average less iterations and a shorter time to converge. The median number of iterations to reach convergence for one model was 13 (min 3, max 20), while the median of the smoothing parameter at the last iteration was approximately (min , max ). However, the bias and RMSE were in general slightly higher than those reported in Supplementary Materials. Regardless, the conclusions reached about the performance of AQMM relative to AFEQR and AMM were the same as those discussed above.

6 Data analysis

The MCS accelerometer data were collected between May 2008 and August 2009 from participating children of the fourth sweep of the parent longitudinal survey, which provided information on several covariates, including socio-demographic and behavioural variables. A number of cleaning and processing procedures were applied to the raw accelerometer data (Geraci et al., 2012; Rich et al., 2014) using the R package pawacc (Geraci, 2017b). Out of children participating in the study, approximately provided reliable data, the latter defined as data from accelerometers that were deemed to have been worn for at least two days, at least 10 hours each day (Rich et al., 2013; Griffiths et al., 2013b). However, for the purpose of our analysis, we retained observations only for those children with reliable data between 7:00 and 20:00 of each day of the week.

Variable Levels Children () Measurements ()
Sex Male (reference) 614
Female 540
Ethnicity White (reference) 962
Other than white 192
Income quintile 1 117
2 170
3 220
4 318
5 (reference) 329
Reading for pleasure Often (reference) 998
Not often 156
Transportation to/from Active (reference) 604
school Passive 550
Number of cars or vans 0 65
owned 1 412
2 (reference) 620
3 or more 57
Day of the week Monday through Friday
Saturday or Sunday
Season Autumn
Summer (reference)
Variable Unit FNS
Time of the day min
BMI kg/m
Accelerometer counts
  •  Mixed ethnicity, Indian, Pakistani, Bangladeshi, and Black.

  •  Less than once or twice a week.

  •  Five-number-summary: minimum, three quartiles, and maximum.

Table 1: Categorical and continuous variables for English children of the Millennium Cohort Study. The dataset consists of accelerometer measurements, aggregated over 10-minute intervals, from a total of 1154 children. Note that the reference categories are the modal categories.

We considered several covariates. Linear terms pertaining to the socio-demographic domain were sex (binary, reference: male) and ethnic group (binary, reference: white) of the child, and OECD equivalized income quintiles (categorical, reference: fifth quintile). Linear terms pertaining to the behavioural domain were time spent reading for enjoyment (binary, reference: often), mode of transport to/from school (binary, reference: active), number of cars or vans owned (categorical, reference: two). Linear terms pertaining to the temporal domain were day of the week (binary, reference: weekday) and calendar season (categorical, reference: summer). Finally, we considered three nonparametric terms, one for time of the day on weekdays, one for time of the day on weekends, and one for body mass index (BMI). The outcome variable was accelerometer counts. The analysis was restricted to singletons born in England. This decision was motivated by the ethnic composition of the sample, consisting of almost all white children in Wales, Scotland and Northern Ireland. Since ethnicity is a strong predictor of physical activity (Griffiths et al., 2013a) and ethnicity is confounded with country, we removed children from Celtic countries. Further, we excluded 15 children with missing information on ethnicity and BMI. A summary of the dataset is given in Table 1. Our sample comprised 1,154 children for whom accelerometer counts were aggregated over 10-minute intervals between 7:00 and 20:00 (thus producing 79 time points), for seven days of the week. In total, this gave accelerometer measurements (that is, , ).

Using a similar notation as in (2), the th additive linear quantile regression model was specified as


for . For fitting purposes, the outcome was scaled by . The variables and , , denote the time of the day for, respectively, weekdays and weekend days. Time was expressed as minutes divided by (e.g., with 0.29 corresponding to 7:00 and 0.83 to 20:00) and then centered about its mid-value (0.56 corresponding to 13:30). Similarly, BMI was centered about its mode (15.5 kg/m). Given the large size of the dataset, smooth terms were modelled using low-rank thin plate splines (Wood, 2003), which have been shown to possess optimal properties both statistically and computationally. The random effects were assumed to follow a multivariate normal distribution with symmetric positive-definite variance-covariance matrix

The first two terms of (6) can be interpreted as the th time-specific quantile function of accelerometer counts on an summer weekday for a boy of white ethnicity with modal BMI living in a household in the highest income quintile and with two cars, who reads often (at least once or twice a week) and walks or bikes from/to school (as opposed to moving by car or bus), and whose temporal (linear) trajectory belongs to the zero (or modal) random-effect cluster.

We made an attempt to fit an analogous AMM to obtain starting values for AQMM. However, the function gamm failed due to insufficient memory. We also tried with a smaller subset of 200 children, but the gamm function failed with a convergence error. Given the satisfactory simulation results, we therefore used the naïve approach described in Section 5 to determine the starting values.

The plot in Figure 2 shows the estimated quantile function at level 0 for a child in the reference group. Diurnal patterns show markedly different shapes during the week. On weekdays, there are multiple peaks of activity in the morning and early afternoon, followed by a plateu of higher activity in the evening. On weekends, the trajectories look flatter and are characterized by two grand peaks around 11:00 and 17:00.

Figure 2: Accelerometer counts observed between 7:00 and 20:00 and aggregated over 10-minute intervals in 1154 English children of the UK Millennium Cohort Study, by days of the week (Monday through Friday, weekdays; Saturday and Sunday, weekend). Solid lines represent conditional quantile functions estimated for a child in the reference group for 5 quantile levels .

Estimates of the fixed effects and standard errors from AQMM are reported in Table 2. The latter were obtained using the BLB approach described in Section 4.3 with a fivefold partition () and bootstrap replications. Some of the findings are consistent with those from previous analyses (Griffiths et al., 2013a; Sera et al., 2017) that focused on the central part of the distribution, namely: girls and children of ethnicity other than white are less active than their peers; reading frequently during the week is negatively associated with activity; and higher activity levels characterize spring and summer, followed by autumn and winter.

However, the narrative emerging from Table 2 is more variegated than this. First of all, there is a gradient across quantiles of increasingly larger differences in activity levels for girls and children of ethnicity other than white. Secondly, activity is lower in children from less affluent households at the most extreme quantile. In particular, activity is lower in those from economically disadvantaged (first quintile) across all quantiles. However, the estimates of the coefficients for income have large standard errors, resulting in statistical non-significance at the level. The effects associated with reading and mode of transportation does not seem to be important, neither practically nor statistically. In contrast, there are marked differences between children living in households with two vehicles (reference) and those with none, the latter being substantially more active. It also seems that at the quantile , there is a -shaped relationship between car/van ownership and activity counts.

While main effects of weekend on activity levels are approximately the same as those during the rest of the week across several quantiles, there is a rather strong positive weekend effect at the more extreme quantile. The results reported by Sera et al. (2017) showed no weekend effect, which is likely the consequence of averaging out stronger and weaker effects. Finally, it interesting to note that the magnitude of the seasonal effects too increases with increasing quantiles. This is consistent with another quantile regression analysis of the MCS accelerometer data (Geraci and Farcomenti, 2016).

The estimated effect of BMI on activity counts for a child in the reference group is depicted in Figure 3. While the relationship is roughly constant up to the quantile , it is nonlinear at , with an overall negative gradient. The variance of the corresponding smooth term (Table 2) indicates a stronger penalty on the spline coefficients at the most extreme quantile.

Fixed effects
Intercept 992 (101) 4408 (183) 13704 (305) 18473 (492) 31065 (1136)
Sex (female) 24 (64) 180 (95) 2049 (222) 2752 (328) 3113 (843)
Ethnicity (not white) 101 (104) 82 (124) 1126 (285) 1696 (388) 3964 (948)
Income quintile (1) 43 (146) 39 (216) 483 (422) 784 (567) 2747 (1525)
Income quintile (2) 70 (124) 99 (148) 35 (337) 2 (519) 369 (1679)
Income quintile (3) 53 (95) 13 (129) 237 (303) 512 (448) 1196 (1009)
Income quintile (4) 44 (83) 35 (129) 135 (273) 56 (412) 776 (1089)
Reading for pleasure (not often) 92 (114) 122 (141) 8 (367) 69 (502) 276 (1179)
Transportation (passive) 62 (72) 86 (87) 274 (226) 409 (363) 209 (750)
Number of cars or vans (0) 25 (169) 77 (224) 1121 (549) 1279 (581) 3315 (739)
Number of cars or vans (1) 53 (83) 75 (100) 564 (231) 682 (368) 2083 (850)
Number of cars or vans (3+) 4 (164) 53 (224) 518 (496) 655 (809) 2586 (1072)
Day of the week (weekend) 148 (103) 131 (106) 168 (223) 45 (364) 3023 (1066)
Season (autumn) 12 (76) 164 (91) 958 (209) 1067 (313) 3155 (774)
Season (winter) 3 (199) 204 (326) 1377 (561) 1675 (705) 2999 (1124)
Season (spring) 82 (117) 244 (156) 1242 (333) 2197 (528) 6027 (2224)
Linear basis term 410 (50) 782 (33) 2050 (54) 2639 (107) 5374 (448)
for time of the day (weekdays)
Linear basis term 635 (56) 980 (46) 2620 (83) 3434 (158) 7659 (965)
for time of the day (weekend)
Linear basis term for BMI 40 (125) 11 (51) 16 (113) 64 (166) 95 (387)
Standard deviations (random effects)
(intercept weekdays) 2969 3923 2882 2769 4897
(intercept weekend) 3015 3526 2842 2809 5054
(time of the day weekdays) 2868 3575 2817 2800 5069
(time of the day weekend) 2867 3376 2940 2858 5017
Correlations (random effects)
0.93 0.97 0.73 0.36 0.52
0.99 0.99 0.94 0.89 0.93
0.93 0.97 0.71 0.35 0.51
0.93 0.97 0.73 0.37 0.52
0.99 0.99 0.93 0.88 0.93
0.93 0.97 0.72 0.36 0.51
Standard deviations (smooth terms)
4136 15215 4114 4343 1722
8385 14541 6699 2402 515
2905 4945 7094 2777 181
PNR 0.11 0.50 0.90 0.95 0.99
Table 2: Estimated fixed effects (counts per 10 minutes) and, in brackets, their standard errors, followed by estimated standard deviations and correlations of the random effects, standard deviations of the random spline coefficients, and proportion of negative residuals (PNR) from the additive quantile mixed model for the Millennium Cohort Study physical activity data. The reference categories are given in Table 1.

The estimated standard deviations of the random effects show larger variability of individual linear trends (intercepts and temporal slopes) at the median and at (Table 2). The correlation between random effects within weekdays or within weekends is strong, but the cross-correlation between weekdays and weekends terms is substantially weaker in the second half of the conditional distribution. This means that children tend to have trends of higher-intensity activity that are less similar between weekdays and weekends.

Individual trajectories of accelerometer counts for two children of the MCS are plotted in Figure 4. Despite both being white females with similar BMI (), living in a household with income in the lowest quintile and one car, having similar behaviors in terms of reading (often) and transportation (passive), they showed somewhat different daily patterns during summer weekend days. In particular, the conditional distribution was markedly skewed for the girl with identifier M16179P.

Figure 3: Smooth functions of body mass index (BMI) estimated for a child in the reference group of the UK Millennium Cohort Study physical activity dataset for 5 quantile levels .
Figure 4: Cluster-specific conditional quantile functions for two children (labelled M16179P and M19773P) of the UK Millennium Cohort Study physical activity dataset for 5 quantile levels , by days of the week (Monday through Friday, weekdays; Saturday and Sunday, weekend).

7 Conclusion

We have developed a novel additive model for quantile regression when data are clustered. As compared to alternative approaches, ours has unique features, namely the mixed-effects representation of smoothing splines, which in turn leads to automatic smoothing selection, and the ability to model the variance-covariance matrix of the random effects.

As shown in a simulation study, the performance of AQMM was satisfactory despite the minimal tuning of the estimation algorithm. This takes a little burden away from the user who may instead focus their attention on other aspects of the analysis. This can be an asset if the data presents complexities as those illustrated in the MCS accelerometer analysis. In particular, the presence of a large number of regression coefficients and multiple smooth terms hinders the application of computationally intensive smoothing selection (e.g., cross-validation) to large datasets.

Standard error calculation in AQMM is facilitated by bootstrap. We were able to overcome the relatively large size of the MCS dataset by using an adaptation of the BLB approach (Kleiner et al., 2014). However, the versatility of bootstrap comes at a (computational) price and its application is limited to more central quantiles unless the sample size (i.e., number of clusters) is adequate. Further research is needed to develop accurate ‘sampling-free’ approximations of standard errors in AQMM as well as in LQMM.

Finally, in contrast to estimation based on numerical quadrature (Geraci and Bottai, 2014), random effects estimates in AQMM are a by-product of the optimization algorithm rather than being calculated post hoc. However, the proposed algorithm can be more demanding in terms of computing time as compared to, say, numerical quadrature or boosting, with the computational bottleneck indeed represented by the estimation of the random effects. For example, it took about two hours to fit a single AQMM using the MCS dataset. While, on the one hand, the large size of this dataset impaired even one of the most refined software for linear mixed-effects models, on the other hand a possible improvement in computing speed of the proposed algorithm is conceivable and is part of future research.


This research has been supported by the National Institutes of Health – National Institute of Child Health and Human Development (Grant Number: 1R03HD084807-01A1). The co-operation of the participating Millennium Cohort families is gratefully acknowledged as is the contribution of the management team at the Centre for Longitudinal Studies, UCL Institute of Education to collecting and, together with the UK Data Archive and Economic and Social Data Service, making the data available.

Appendix A - Inference

The goal is to maximize the log-likelihood (4.3) with respect to the parameter vector .

Let be the vector of residuals for the th cluster with generic element , and define the corresponding sign vector with


(The notation above has been simplified since the ’s as well as the ’s should be written as functions of .) We apply the smooth approximation (Madsen and Nielsen, 1993; Chen, 2007) given in equation (6) to the elements of and write


where is an diagonal matrix with diagonal elements , and are two vectors with elements



For a more compact notation, let , , , , , , , , and . We now define the function


The smoothed version of the log-likelihood (4.3) is then given by


For , we have that .

Since is differentiable with respect to , we can derive the following quantities

The above derivatives can be written more compactly for all clusters as


Moreover, let


be the conditional mode of . A second-order approximation of around is given by

where , , and