1 Introduction
The goal of regression analysis is to model the distribution of an outcome as a function of one or more covariates. Mean regression is used to assess how the outcome changes on average when the covariates change, and often implies that the direction and strength of the statistical associations are the same for all individuals in a population. However, conditionally on their observed characteristics, subjects who rank below or above the average of the outcome distribution may respond differently to the same treatment or exposure. Evidence of heterogeneous effects across the outcome distribution have been found in countless research including the effect of smoking on weight in lighter or heavier infants
(Abrevaya, 2001; Koenker and Hallock, 2001; Geraci, 2016a); the effect of succimer chelation on different levels of cadmium concentrations in children’s blood (Cao et al., 2013); or the effect of sedentary behavior and food prices on different centiles of children’s anthropometric variables (España Romero et al., 2013; Sturm and Datar, 2005). These children may be at higher risks of morbidity and mortality than those who are at the center of the distribution.By definition, mean effects average out stronger and weaker effects. The averaging may even cancel out symmetric effects of same magnitudes but opposite signs on the tails of the distribution. Quantile regression (QR) (Koenker and Bassett, 1978) is a flexible statistical tool with a vast number of applications that complements mean regression. QR has become a successful analytic method in many fields of science because of its ability to draw inferences about individuals that rank below or above the population conditional mean. The ranking within the conditional distribution of the outcome can be considered as a natural index of individual latent characteristics which cause heterogeneity at the population level (Koenker and Geling, 2001). There is an increasingly wider acknowledgement of the importance of investigating sources of heterogeneity to quantify more accurately costs, benefits, and effectiveness of interventions or medical treatments, whether it be an afterschool physical activity program, a health care reform, or a thrombolytic therapy (see, for example, Austin et al., 2005; Beets et al., 2016; Beyerlein, 2014; Ding et al., 2010; Rehkopf, 2012; Wei and Terry, 2015; Winkelmann, 2006). QR is particularly suitable for this purpose as it yields inferences that are valid regardless of the true underlying distribution. Also, quantiles enjoy a number of properties (Gilchrist, 2000)
, including equivariance to monotone transformations and robustness to outliers.
In this paper we are concerned specifically with nonparametric quantile regression functions of continuous response variables when data arise from cluster designs. Our research is motivated by a study on daily and weekly physical activity patterns in schoolaged children using highfrequency accelerometer data. In general, temporal (diurnal) trajectories of physical activity are characterized by strongly nonlinear patterns that require some degree of smoothing
(Butte, Ekelund and Westerterp, 2012; Fan et al., 2015; Morris et al., 2006; Sera et al., 2011, 2017). On the other hand, some predictors of interest may simply have linear effects. If, in addition, data are collected longitudinally to examine weekly patterns, then the correlation at the individual level must be taken into account.We propose novel additive quantile models that include linear terms, nonlinear terms, as well as randomeffects terms which account for the clustering. Further, nonlinear terms are modelled nonparametrically using penalized splines and fitted via automatic scatterplot smoothing within a mixed model framework (Ruppert, Wand and Carroll, 2003). In the next section, we briefly describe the data. In Section 3, we describe the state of the art in nonlinear quantile regression and highlight the differences between our proposal and existing, potentially competing approaches. In Section 4, we describe the methods and, briefly, their implementation in the R language (R Core Team, 2014), with further technical details provided in Appendix A. In Section 5, we carry out a simulation study to assess the performance of the proposed methods (with tables reported in Appendix B). The real data analysis is presented in Section 6 while concluding remarks are given in Section 7.
2 Sedentariness and physical activity in UK children
The benefits of regular physical activity on wellbeing and life expectancy as well as the detrimental health effects of sedentary behavior have been amply documented (e.g., see Ekelund et al., 2016; Katzmarzyk and Pate, 2017; Warburton, Nicol and Bredin, 2006)
. Physical inactivity in England is estimated to cost more than eight billion pounds a year. This includes both the direct costs of treating major, lifestylerelated diseases and the indirect costs of sickness absence
(National Institute for Health and Clinical Excellence, 2008). It is also estimated that 54,000 premature deaths a year are linked to a sedentary lifestyle (Department for Culture, Media and Sport, 2002).Since establishing an active lifestyle at an early age is an important form of prevention against morbidity and premature mortality, promotion of physical activity in children and young people has gained a strategically central place in the public health agenda (National Institute for Health and Clinical Excellence, 2009). A nationally representative study (Griffiths et al., 2013a) showed that only half of UK sevenyearolds achieve recommended levels of physical activity, with girls far less active than boys. It is therefore important to identify the predictors of physical activity, not only at the average intensity of activity but also (and perhaps especially) at lower and upper intensities.
Accelerometer data collected in sevenyearold children of the Millennium Cohort Study (MCS), a UKwide longitudinal multipurpose survey, represent a major, largescale epidemiological resource to study physical activity determinants (Griffiths et al., 2013b). Accelerometers are devices capable of providing an objective measure of the intensity and duration of movement. They produce an output known as ‘acceleration counts’ which is dimensionless and thus requires calibration in order to be converted into physiologically more relevant units, usually based on energy expended per unit of time (e.g., metabolic equivalent of task). The plot in Figure 1
shows accelerometer counts by time of the day for a subset of MCS children who provided reliable data for 7 days of the week. Since temporal trajectories of activity were similar between Monday and Friday, and during Saturday and Sunday, data in the plot are shown for weekdays (or workdays) and weekend days, respectively. Each dot represents the accelerometer counts measured on any of the weekdays or weekend days for a child at a given time of the day, while the solid lines are piecewise linear curves connecting sample quantiles estimated crosssectionally (conditional on time). During weekdays there are periods of higher activity levels that mirror traveling times to and from school, and lunch and break times
(Sera et al., 2017). However, temporal trajectories at different quantile levels of the conditional distribution are not simply vertical shifts of one another. This suggests that the scale and possibly the shape of the counts distribution change with time of the day. For example, the skewness of the distribution in the weekend is small early in the morning, and steadily increases during the day.
3 Approaches to nonlinear quantile regression
Nonlinear associations occur in many research studies, including bioassays and pharmacokinetic experiments (Lindsey, 2001), as well studies related to growth processes in biology and agriculture (Davidian and Giltinan, 2003). In the presence of nonlinearity, there are different modelling strategies one may consider. Nonparametric models, smoothing splines (including polynomial models), and transformation models are the most commonly adopted. In addition to the specific strategy, one must also take into account the particular analytical framework: for example, is the sampling design crosssectional or longitudinal? Is the outcome discrete or continuous? Should distributional assumptions be parametric or nonparametric? And so on. The effort needed for the analysis may have, at times, considerable weight on the final decision regarding which approach to follow. Lack of theory or even computer software can move the needle towards one choice over another.
The literature on parametric and nonparametric nonlinear regression models is vast. However, most of the work has been done in relation to nonlinear mean regression. In comparison, substantially less methods have been developed for nonlinear quantile regression and even less for nonlinear quantile regression with clustered data. A brief account of strategies to nonlinear quantile regression modelling is given by Geraci (2016b)
and these consist in: (i) smoothing (nonparametric regression); (ii) nonlinear parametric models; and (iii) transformation models. In the next sections, we discuss methods that fall in the first group and that are of direct relevance to the present study, and briefly survey those in the other two groups for completeness.
3.1 Nonparametric models
Nonparametric regression and, more in general, additive models occupy an important place in statistical modelling. Popular approaches include locally weighted scatterplot smoothing (Cleveland, 1979; Cleveland and Devlin, 1988) and smoothing splines (Wahba, 1990; de Boor, 2001; Hastie and Tibshirani, 1990; Ruppert, Wand and Carroll, 2003). If we consider the literature on nonparametric quantile functions for data with no clustering, there is a considerable variety of proposals. For example, Koenker, Ng and Portnoy (1994), He, Ng and Portnoy (1998), Koenker and Mizera (2004), and Koenker (2011) considered total variation roughness penalties for univariate, bivariate, and additive quantile smoothing splines. For smooth functions, the total variation penalty corresponds to the counterpart of the
smoothing spline penalty and leads to an elegant, computationally convenient optimization problem which can be solved via linear programming
(Koenker and Mizera, 2004). Yu and Jones (1998), Horowitz and Lee (2005), and Spokoiny, Wang and Härdle (2013) proposed estimation methods based on kernel weighted and polynomial local linear fitting. Other, similar local fitting methods are given in Wu, Yu and Yu (2010) and Cai and Xu (2008). In the Bayesian framework for independent data, Thompson et al. (2010) proposed an approach based on natural cubic splines (Green and Silverman, 1994) with a Gaussian smoothness prior.A few approaches have been proposed also for the estimation of nonparametric quantile functions with repeated measurements or when the data are subject to other forms of dependence. Wei et al. (2006) considered modelling and estimation of longitudinal growth curves. They discussed an additive quantile regression model decomposed into a nonparametric temporal trend, a firstorder autoregressive component, and a partially linear component to adjust for other covariates. Fenske et al. (2013) extended Fenske, Kneib and Hothorn’s (2011) method to additive quantile regression based on longitudinal data. In their model, fitted via boosting, they included penalized fixed clusterspecific intercepts and slopes (thus, no covariance structure) with prespecified shrinkage parameters, to account for individual deviations from a ‘population’ trend. Additive nonlinear effects were modelled via penalized splines, again, with prespecified amount of smoothing as controlled by the number of boosting iterations. An additive model is also considered by Yue and Rue (2011)
who proposed normally distributed random intercepts and nonlinear terms with Bayesian Psplines and Gaussian Markov random fields as smoothness priors. Finally, it is worth mentioning that a related, although different, approach involves modelling the quantile regression coefficients in varying coefficient models, such as those by
Andriyana, Gijbels and Verhasselt (2014) and Reich, Fuentes and Dunson for, respectively, longitudinally and spatially correlated data.The modelling approach we develop in this study differs on several accounts from existing proposals. First of all, we model the intracluster correlation by means of random effects instead of autoregressive errors (Wei et al., 2006). Secondarily, in contrast to Yue and Rue (2011), we include clusterspecific random slopes in addition to random intercepts, and, in contrast to Fenske et al. (2013), we allow clusterspecific effects to have a general covariance matrix. In addition, our estimation approach radically differs from that of Fenske et al. (2013) since, as described in Section 4, the optimal degree of shrinkage of the clusterspecific effects and the optimal level of smoothing for the nonlinear terms are automatically estimated from the data.
3.2 Nonlinear parametric models
Contributions to statistical methods for nonlinear mean regression when data are clustered can be found in the literature of mixedeffects modelling (Lindstrom and Bates, 1990; Pinheiro and Bates, 1995, 2000) as well as generalized estimating equations (Davidian and Giltinan, 1995, 2003; Contreras and Ryan, 2000; Vonesh et al., 2002). In contrast, there seem to be only a handful of published articles in the statistical literature on parametric nonlinear quantile regression functions with clustered data. Karlsson (2008) considered nonlinear longitudinal data and proposed weighting the standard quantile regression estimator (Koenker and Bassett, 1978) with prespecified weights. Wang (2012), taking her cue from Geraci and Bottai (2007), used the AL distribution to define the likelihood of a Bayesian nonlinear quantile regression model. Huang and Chen (2016) proposed a Bayesian joint model for timetoevent and longitudinal data. An approach based on copulas is developed by Chen, Koenker and Xiao (2009). Oberhofer and Haupt (2016) established the consistency of the norm nonlinear quantile estimator under weak dependency. Finally, Geraci (2017a) extended Geraci and Bottai’s (2014) quantile mixed models to the nonlinear case.
3.3 Transformation models
Traditionally, transformations toward linearity have been developed for conditional mean function estimation (e.g. ArandaOrdaz, 1981; Box and Cox, 1964), with some proposals to deal with clustered data (see, for example, Gurka et al., 2006; Hutmacher et al., 2011; Maruo et al., 2017). This general approach has been taken by others in quantile function estimation when data are independent using the BoxCox transformation (Buchinsky, 1995; Chamberlain, 1994; Mu and He, 2007), the ArandaOrdaz transformation (Dehbi, CortinaBorja and Geraci, 2016), as well as other new flexible transformations (Geraci and Jones, 2015). In particular, Mu and Wei (2009) developed a BoxCox quantile regression model with varying coefficients for longitudinal data.
4 Methods
4.1 Notation
We consider data from twolevel nested designs in the form , for and , , where is the th row of a known matrix , is the th row of a known matrix and is the
th observation of the response vector
for theth unit or cluster. This kind of data arise from longitudinal studies and other cluster sampling designs (e.g., spatial cluster designs). Throughout the paper, the covariates
and are assumed to be given and measured without error. The vector of zeros and ones will be denoted by, respectively, and , the identity matrix by , and the matrix of zeros by . Finally, the Kronecker product and the direct sum will be denoted by and , respectively.4.2 The model
We define the following th additive quantile regression model
(1) 
for , where is a specific, centered, twicedifferentiable, smooth function of the th component of . The vector collects clusterspecific random effects associated with and its distribution is assumed to depend on a specific parameter (further details are provided in the next section).
Without loss of generality, let the components of be ordered in such a way that the first terms of the summation in (1) are nonlinear functions and the remaining are linear. To model nonlinear functions, we consider a spline model of the type
(e.g., cubic or BSpline), where the ’s and ’s, , denote, respectively, the basis functions and the corresponding coefficients, and
depends on the degrees of freedom or the number of knots. Note that the coefficients are
specific. The quantile function in (1) is then approximated by(2) 
Let be the vector of values taken by the th spline evaluated at , be the vector of spline coefficients for the th covariate, and . Further, define as the matrix with rows , , and let . With a slight abuse of notation, we write (2) for the th cluster in matrix form as
(3) 
where is the matrix with rows , , and . We call (3) an additive linear quantile mixed model, or AQMM for short.
The additive model introduced above opens up the question on how to control the tradeoff between bias and efficiency, and, thus, the degree of smoothness of the estimate. At this juncture of our paper, we take a detour to briefly introduce the wellknown link existing between penalized splines and mixedeffect models (see, e.g., Ruppert, Wand and Carroll, 2003, for an excellent review of this topic). Consider the following regression spline model
for some design matrix and basis matrix , where is a vector of observations and is a vector of independent and identically distributed (IID) normal errors of dimension . Penalized estimation of can be carried out by minimizing the objective function
where is the smoothing parameter and . The above objective function, rescaled by , is equivalent to the best linear unbiased prediction criterion of the linear mixedeffects model with , , and
. Since the variance
is estimated from the data, it follows that the degree of smoothing is automatically chosen by the estimation algorithm.Automatic smoothing selection does not necessarily lead to optimal smoothing (Ruppert, Wand and Carroll, 2003). However, one of the advantages of working with random spline coefficients when modelling cluster data is that can be subsumed in the random part of the model containing the clusterspecific effects. Choice of the ‘prior’ distribution for effectively corresponds to choosing the form of the penalty. One approach is to use the same metric for the penalty term as that for the fidelity term. The
penalty, which is linked to the double exponential distribution
(Geraci and Bottai, 2007, 2014), is sometimes used in quantile regression models due to its computational convenience (Koenker, Ng and Portnoy, 1994; Koenker and Mizera, 2004). The resulting smoothed curves are piecewise linear and are most useful in the presence of breakpoints, sharp bends, and spikes (Koenker and Mizera, 2004).In contrast, the penalty represents a more suitable choice for modelling smooth changes as in, for example, variations of energy expenditure over time. This is, for example, the approach considered by Cox and Jones in the discussion of Cole (1988) who suggested the spline smoothing quantile regression model
where is the quantile regression check function (Koenker and Bassett, 1978) and denotes the indicator function. As compared to other roughness functionals, this kind of penalty yields a more visually appealing form of smoothness.
A natural link between penalized splines and random effects is provided by the normal distribution. Hence, in our randomeffects specification of (3), we assume that the vectors and follow zerocentered multivariate Gaussians with variancecovariance matrices and , respectively. Further, we assume that the ’s are independent for different (but may have a general covariance structure) and are independent from . Our objective function is then given by
(4) 
with the convention that for a vector . Note that the ’s determine the amount of smoothing for the nonparametric terms.
4.3 Inference
The minimization of (4) is equivalent to fitting a linear quantile mixed model (LQMM) (Geraci and Bottai, 2007, 2014) where the asymmetric Laplace density
with location and scale , is employed as quasilikelihood for the fidelity term.
Define and . Let denote the parameter of interest, where is an unrestricted dimensional vector, , of nonredundant parameters in (e.g., see Pinheiro and Bates, 1996) and . Our goal is to maximize the marginal loglikelihood
(5) 
where and are the scaled variancecovariance matrices of the random effects. Note that this is a threelevel hierarchical model, with the innermost grouping factor represented by the clusters and the outermost factor represented by one singlelevel group (i.e., the entire sample). Despite the three levels, we define as the predictions at level since the smooth terms originally ‘belong’ to the fixed design matrix. Similarly, we define as the predictions at level (i.e., at the cluster level).
Estimation proceeds using a double approximation:

the integral in (4.3) is then solved using a Laplacian approximation for the (smoothed) loss function.
As proposed by Geraci (2017a), we consider the following smooth approximation (Madsen and Nielsen, 1993; Chen, 2007):
(6) 
where and is a scalar “tuning” parameter. A similar approximation is given by Muggeo, Sciandra and Augugliaro (2012) who claimed that their method provides a better approximation than Chen’s (2007) algorithm. However, no analytical evidence was provided in their paper to support such a claim. This point might offer scope for additional investigation but, here, it represents a secondary issue and will not be discussed any further.
We then replace the function in (4.3) with to obtain a smoothed likelihood and apply a secondorder Taylor expansion (Pinheiro and Chao, 2006) to the resulting exponent. After some algebra, we obtain the following Laplacian approximation
where is the scaled variancecovariance matrix of , and and are the terms of order, respectively, 0 and 2 of the abovementioned Taylor expansion around the mode (see Appendix A for more details).
When using the asymmetric Laplace as pseudolikelihood, inference should be confined to point estimation (see for example Yang, Wang and He, 2016)
. Standard errors of nonrandom parameters estimates can be calculated using block bootstrap
(Efron and Tibshirani, 1998), although this increases the computational cost. Bootstrap confidence intervals have been shown to have good coverage in LQMMs
(Geraci and Bottai, 2014). Given the relatively large size of the MCS dataset, for the analysis in Section 6 we implemented an adaptation of the method by Kleiner et al. (2014). The general idea is to perform a bootstrap on several subsets of the original data and then summarize measures of uncertainty from all subsets. This strategy, called ‘bag of little bootstraps’ (BLB), greatly reduces the computing cost when the sample size is large (see Kleiner et al., 2012, 2014, for more details). The original method was developed for IID observations. Since we are dealing with clusters, we adapted the BLB approach as follows:
sample without replacement subsets of size from the pool of clusters (random partition);

for each of the subset, repeatedly ( times) take a bootstrap sample of size and fit an AQMM for each replicate;

for each of the subsets, calculate the bootstrap variance;

as final estimate of the standard error, take the square root of the average of the variances in step 3.
As explained by the authors, the advantage of the BLB approach as compared to traditional bootstrap lies in the smaller size of the subsets. Although the nominal bootstrap sample size is , there are at most unique clusters in each subset. To obtain a bootstrap replicate, we only need a sample from a multinomial distribution with
trials and uniform probability over
possible events. Estimation proceeds with a weighted likelihood, where the clusterspecific weights are given by the multinomial counts.4.4 Implementation
The methods described in this section were implemented as an addon to the R package lqmm (Geraci, 2014). The addon is currently available from the author’s website (https://marcogeraci.wordpress.com) and will appear in a future release of the main package. The core function made use of routines available from the mgcv (Wood, 2006a) and nlme (Pinheiro et al., 2017) packages using syntax and options (e.g., selection of spline models) that are familiar to the users of these packages.
5 Simulation study
We ran a simulation study to assess the proposed methods. In our analysis, we considered the two most relevant alternatives for additive regression modelling. The first is based on additive mixedeffects regression (AMM), which is available in the R package mgcv. This package makes use of nlme’s fitting routines. The other is represented by Fenske et al.’s (2013) additive fixedeffects quantile regression (AFEQR) for longitudinal data, which is available in the mboost package (Hofner et al., 2014). Since the former approach aims at modelling the conditional expectation of the outcome under the assumption of normal errors, AQMM should have an advantage over AMM when the true errors are nonnormal and the locationshift hypothesis of the normal model is violated. On the other hand, AFEQR is directly comparable to AQMM since they both aim at the conditional quantiles of the outcome with no assumption about the error distribution. However, as noted in Section 3.1, there are two basic differences between these two quantile regression approaches since in AQMM: (i) the clusterspecific effects are assumed to be random as opposed to fixed, thus a covariance structure between effects can be introduced; and (ii) the level of smoothing of the nonparametric terms is automatically estimated from the data (as reciprocal of the variance components) as opposed to prior specification. These are not necessarily advantages (or disadvantages) but they do represent aspects to consider when choosing a strategy for modelling and estimation.
The data were generated according to the following model
(7) 
where , , , , , , , and
In one scenario, we set (homoscedastic), while in a separate scenario we set
(heteroscedastic). Within these two scenarios, the error was generated according to either a standard normal, a Student’s
distribution with 3 degrees of freedom, or a distribution with 3 degrees of freedom. Thus, in total there were different models. For each model, a balanced dataset was generated according to 6 sample size combinations of and , yielding simulation cases. Each case was replicated times.For each replication, we fitted the AQMM defined in (2) for using a cubic spline for the nonlinear terms associated with and . The model also included a random intercept and a random slope for with a symmetric positivedefinite covariance matrix. We followed the estimation algorithm described in Appendix A. We used a NelderMead algorithm to maximize the approximated loglikelihood and a tolerance of for the relative change of the loglikelihood as the stopping criterion. The modal random effects (A.7) were estimated using a BroydenFletcherGoldfarbShanno (BFGS) algorithm with gradient calculated as in (A.5). Since different strategies can be used to determine the starting values, we considered a naïve and a modelbased approach. In the former case, we used the least squares (LS) estimate for , the identity matrix for , the mean of the absolute LS residuals for
, and half the standard deviation of
for . The random effects were all set equal to 0. In the latter case, we used parameter and random effects estimates from an AMM. All the results reported in this paper refer to the latter approach since it gave a superior performance.The AFEQR models were fitted following Fenske et al.’s (2013) recommendations for the settings of the boosting algorithm, namely the maximum number of boosting iterations, the step length parameter , and the degrees of freedom of the baselearners. The first two parameters were set to 5000 and , respectively, as determined by crossvalidation (separately for the homoscedastic and the heteroscedastic scenarios). As explained by the authors, these two ‘hyperparameters’ control the shrinkage of the estimates. The larger the step length (or the smaller the number of iterations), the more biased and shrunken to zero the estimates will be. The number of degrees of freedom was set to 3 for each term of the boosting algorithm. Since the degree of smoothing is controlled by the number of boosting iterations, the final degree of smoothing at the end of the algorithm can still reach a higher order than that imposed by the initial degrees of freedom (Fenske et al., 2013). We used the mboost package (v. 2.81) in the R environment for statistical computing and graphics (R Core Team, 2014) (v. 3.4.2) on a desktop computer with a 3.60GHz quad core i74790 processor and 32 gigabytes of RAM.
As a measure of performance, we calculated the bias and the root mean squared error (RMSE) of the level1 predicted quantile functions. The RMSE of the predictions is given by
where denotes the true th conditional quantile function based on model (5). Analogously, we calculated the relative bias and RMSE of the coefficients for the linear terms, namely and . Note that these coefficients do not vary across quantiles in the homoscedastic scenario. In contrast, the value of the coefficient for depends on in the heteroscedastic scenario. Its ‘true’ value was determined empirically by fitting a linear quantile regression model (Koenker and Bassett, 1978) with 4th degree polynomials on the nonlinear terms for samples of size . Finally, we determined the proportion of negative residuals (PNR)
which is expected to be approximately equal to . All summary measures were averaged over the replications.
Given the large number of results of the simulation study, all the tables are reported in Appendix B and the results are summarized here as follows.

Prediction of conditional quantiles:

[font=]
 Homoscedastic scenario

AQMM showed bias and RMSE lower than those of AFEQR consistently across all sample sizes for . At the median, AQMM and AFEQR gave similar results, both showing low bias and RMSE values. The RMSE of AMM was particularly large in the case with asymmetric errors, as one would expect. However, in the case with normal errors, it was comparable to that of AQMM and surprisingly larger than that of AFEQR. Across quantiles, PNR rates for AQMM (and AFEQR) were equal to the expected nominal or differed at most by one hundredth.
 Heteroscedastic scenario

AQMM still had a clear advantage over AFEQR except at the median, where again the two performed similarly, and at under skewed errors. The performance of AMM relative to AQMM or AFEQR was comparable to that seen in the homoscedastic scenario.


Estimation of :

[font=]
 Homoscedastic scenario

AQMM performed well as compared to AFEQR in terms of both bias and RMSE consistently across all quantiles and sample sizes. In particular, the bias of AQMM was much below for the most part. In contrast, AFEQR’s bias ranged from to and was more severe on the tails. At the median, AMM’s performance was comparable to that of AQMM in terms of bias, but it was otherwise inferior in terms of RMSE.
 Heteroscedastic scenario

As compared to the homoscedastic scenario, AQMM and AFEQR showed larger biases on the tails, with percentages ranging from below to and from to , respectively. AQMM had, as before, consistently lower bias than AFEQR as well as lower RMSE, though the latter was occasionally higher for AQMM at lower sample sizes. At the median, the performance of AMM in terms of bias was rather poor under the scenario with skewed errors, but reasonable under the two scenarios with symmetric errors. In contrast, AMM was less efficient than AQMM in all three error scenarios at all sample sizes.


Estimation of :

[font=]
 Homoscedastic scenario

AQMM outperformed AFEQR in terms of both bias and RMSE consistently across all sample sizes for . AFEQR’s bias was, again, particularly high and around at . At , the bias of AQMM was larger than that seen for and somewhat erratic in the case with asymmetric errors, presumably due to the fact that the random component of the model includes and thus the estimation of is affected by the Laplacian approximation. In contrast, AFEQR gave larger RMSE values than AQMM but, occasionally, lower biases. At the median, AMM performed well in terms of bias in all scenarios but was less efficient than AQMM.
 Heteroscedastic scenario

The results were similar to those obtained in the heteroscedastic scenario.

Finally, we provide a brief report on the computational performance and the sensitivity of the results to different starting values. AQMM reached convergence in of the replications. The median number of iterations to reach convergence for one model was 19 (min 5, max 293), while the median of the smoothing parameter at the last iteration was approximately (min , max ).
When using the naïve approach to determine the starting values for AQMM, the estimation algorithm converged of the times, and took on average less iterations and a shorter time to converge. The median number of iterations to reach convergence for one model was 13 (min 3, max 20), while the median of the smoothing parameter at the last iteration was approximately (min , max ). However, the bias and RMSE were in general slightly higher than those reported in Supplementary Materials. Regardless, the conclusions reached about the performance of AQMM relative to AFEQR and AMM were the same as those discussed above.
6 Data analysis
The MCS accelerometer data were collected between May 2008 and August 2009 from participating children of the fourth sweep of the parent longitudinal survey, which provided information on several covariates, including sociodemographic and behavioural variables. A number of cleaning and processing procedures were applied to the raw accelerometer data (Geraci et al., 2012; Rich et al., 2014) using the R package pawacc (Geraci, 2017b). Out of children participating in the study, approximately provided reliable data, the latter defined as data from accelerometers that were deemed to have been worn for at least two days, at least 10 hours each day (Rich et al., 2013; Griffiths et al., 2013b). However, for the purpose of our analysis, we retained observations only for those children with reliable data between 7:00 and 20:00 of each day of the week.
Variable  Levels  Children ()  Measurements () 
Sex  Male (reference)  614  
Female  540  
Ethnicity  White (reference)  962  
Other than white  192  
Income quintile  1  117  
2  170  
3  220  
4  318  
5 (reference)  329  
Reading for pleasure  Often (reference)  998  
Not often  156  
Transportation to/from  Active (reference)  604  
school  Passive  550  
Number of cars or vans  0  65  
owned  1  412  
2 (reference)  620  
3 or more  57  
Day of the week  Monday through Friday  
(reference)  
Saturday or Sunday  
Season  Autumn  
Winter  
Spring  
Summer (reference)  
Variable  Unit  FNS  
Time of the day  min  —  
BMI  kg/m  
Accelerometer counts  —  
() 

Mixed ethnicity, Indian, Pakistani, Bangladeshi, and Black.

Less than once or twice a week.

Fivenumbersummary: minimum, three quartiles, and maximum.
We considered several covariates. Linear terms pertaining to the sociodemographic domain were sex (binary, reference: male) and ethnic group (binary, reference: white) of the child, and OECD equivalized income quintiles (categorical, reference: fifth quintile). Linear terms pertaining to the behavioural domain were time spent reading for enjoyment (binary, reference: often), mode of transport to/from school (binary, reference: active), number of cars or vans owned (categorical, reference: two). Linear terms pertaining to the temporal domain were day of the week (binary, reference: weekday) and calendar season (categorical, reference: summer). Finally, we considered three nonparametric terms, one for time of the day on weekdays, one for time of the day on weekends, and one for body mass index (BMI). The outcome variable was accelerometer counts. The analysis was restricted to singletons born in England. This decision was motivated by the ethnic composition of the sample, consisting of almost all white children in Wales, Scotland and Northern Ireland. Since ethnicity is a strong predictor of physical activity (Griffiths et al., 2013a) and ethnicity is confounded with country, we removed children from Celtic countries. Further, we excluded 15 children with missing information on ethnicity and BMI. A summary of the dataset is given in Table 1. Our sample comprised 1,154 children for whom accelerometer counts were aggregated over 10minute intervals between 7:00 and 20:00 (thus producing 79 time points), for seven days of the week. In total, this gave accelerometer measurements (that is, , ).
Using a similar notation as in (2), the th additive linear quantile regression model was specified as
(8) 
for . For fitting purposes, the outcome was scaled by . The variables and , , denote the time of the day for, respectively, weekdays and weekend days. Time was expressed as minutes divided by (e.g., with 0.29 corresponding to 7:00 and 0.83 to 20:00) and then centered about its midvalue (0.56 corresponding to 13:30). Similarly, BMI was centered about its mode (15.5 kg/m). Given the large size of the dataset, smooth terms were modelled using lowrank thin plate splines (Wood, 2003), which have been shown to possess optimal properties both statistically and computationally. The random effects were assumed to follow a multivariate normal distribution with symmetric positivedefinite variancecovariance matrix
The first two terms of (6) can be interpreted as the th timespecific quantile function of accelerometer counts on an summer weekday for a boy of white ethnicity with modal BMI living in a household in the highest income quintile and with two cars, who reads often (at least once or twice a week) and walks or bikes from/to school (as opposed to moving by car or bus), and whose temporal (linear) trajectory belongs to the zero (or modal) randomeffect cluster.
We made an attempt to fit an analogous AMM to obtain starting values for AQMM. However, the function gamm failed due to insufficient memory. We also tried with a smaller subset of 200 children, but the gamm function failed with a convergence error. Given the satisfactory simulation results, we therefore used the naïve approach described in Section 5 to determine the starting values.
The plot in Figure 2 shows the estimated quantile function at level 0 for a child in the reference group. Diurnal patterns show markedly different shapes during the week. On weekdays, there are multiple peaks of activity in the morning and early afternoon, followed by a plateu of higher activity in the evening. On weekends, the trajectories look flatter and are characterized by two grand peaks around 11:00 and 17:00.
Estimates of the fixed effects and standard errors from AQMM are reported in Table 2. The latter were obtained using the BLB approach described in Section 4.3 with a fivefold partition () and bootstrap replications. Some of the findings are consistent with those from previous analyses (Griffiths et al., 2013a; Sera et al., 2017) that focused on the central part of the distribution, namely: girls and children of ethnicity other than white are less active than their peers; reading frequently during the week is negatively associated with activity; and higher activity levels characterize spring and summer, followed by autumn and winter.
However, the narrative emerging from Table 2 is more variegated than this. First of all, there is a gradient across quantiles of increasingly larger differences in activity levels for girls and children of ethnicity other than white. Secondly, activity is lower in children from less affluent households at the most extreme quantile. In particular, activity is lower in those from economically disadvantaged (first quintile) across all quantiles. However, the estimates of the coefficients for income have large standard errors, resulting in statistical nonsignificance at the level. The effects associated with reading and mode of transportation does not seem to be important, neither practically nor statistically. In contrast, there are marked differences between children living in households with two vehicles (reference) and those with none, the latter being substantially more active. It also seems that at the quantile , there is a shaped relationship between car/van ownership and activity counts.
While main effects of weekend on activity levels are approximately the same as those during the rest of the week across several quantiles, there is a rather strong positive weekend effect at the more extreme quantile. The results reported by Sera et al. (2017) showed no weekend effect, which is likely the consequence of averaging out stronger and weaker effects. Finally, it interesting to note that the magnitude of the seasonal effects too increases with increasing quantiles. This is consistent with another quantile regression analysis of the MCS accelerometer data (Geraci and Farcomenti, 2016).
The estimated effect of BMI on activity counts for a child in the reference group is depicted in Figure 3. While the relationship is roughly constant up to the quantile , it is nonlinear at , with an overall negative gradient. The variance of the corresponding smooth term (Table 2) indicates a stronger penalty on the spline coefficients at the most extreme quantile.
Fixed effects  

Intercept  992 (101)  4408 (183)  13704 (305)  18473 (492)  31065 (1136) 
Sex (female)  24 (64)  180 (95)  2049 (222)  2752 (328)  3113 (843) 
Ethnicity (not white)  101 (104)  82 (124)  1126 (285)  1696 (388)  3964 (948) 
Income quintile (1)  43 (146)  39 (216)  483 (422)  784 (567)  2747 (1525) 
Income quintile (2)  70 (124)  99 (148)  35 (337)  2 (519)  369 (1679) 
Income quintile (3)  53 (95)  13 (129)  237 (303)  512 (448)  1196 (1009) 
Income quintile (4)  44 (83)  35 (129)  135 (273)  56 (412)  776 (1089) 
Reading for pleasure (not often)  92 (114)  122 (141)  8 (367)  69 (502)  276 (1179) 
Transportation (passive)  62 (72)  86 (87)  274 (226)  409 (363)  209 (750) 
Number of cars or vans (0)  25 (169)  77 (224)  1121 (549)  1279 (581)  3315 (739) 
Number of cars or vans (1)  53 (83)  75 (100)  564 (231)  682 (368)  2083 (850) 
Number of cars or vans (3+)  4 (164)  53 (224)  518 (496)  655 (809)  2586 (1072) 
Day of the week (weekend)  148 (103)  131 (106)  168 (223)  45 (364)  3023 (1066) 
Season (autumn)  12 (76)  164 (91)  958 (209)  1067 (313)  3155 (774) 
Season (winter)  3 (199)  204 (326)  1377 (561)  1675 (705)  2999 (1124) 
Season (spring)  82 (117)  244 (156)  1242 (333)  2197 (528)  6027 (2224) 
Linear basis term  410 (50)  782 (33)  2050 (54)  2639 (107)  5374 (448) 
for time of the day (weekdays)  
Linear basis term  635 (56)  980 (46)  2620 (83)  3434 (158)  7659 (965) 
for time of the day (weekend)  
Linear basis term for BMI  40 (125)  11 (51)  16 (113)  64 (166)  95 (387) 
Standard deviations (random effects)  
(intercept weekdays)  2969  3923  2882  2769  4897 
(intercept weekend)  3015  3526  2842  2809  5054 
(time of the day weekdays)  2868  3575  2817  2800  5069 
(time of the day weekend)  2867  3376  2940  2858  5017 
Correlations (random effects)  
0.93  0.97  0.73  0.36  0.52  
0.99  0.99  0.94  0.89  0.93  
0.93  0.97  0.71  0.35  0.51  
0.93  0.97  0.73  0.37  0.52  
0.99  0.99  0.93  0.88  0.93  
0.93  0.97  0.72  0.36  0.51  
Standard deviations (smooth terms)  
4136  15215  4114  4343  1722  
8385  14541  6699  2402  515  
2905  4945  7094  2777  181  
PNR  0.11  0.50  0.90  0.95  0.99 
The estimated standard deviations of the random effects show larger variability of individual linear trends (intercepts and temporal slopes) at the median and at (Table 2). The correlation between random effects within weekdays or within weekends is strong, but the crosscorrelation between weekdays and weekends terms is substantially weaker in the second half of the conditional distribution. This means that children tend to have trends of higherintensity activity that are less similar between weekdays and weekends.
Individual trajectories of accelerometer counts for two children of the MCS are plotted in Figure 4. Despite both being white females with similar BMI (), living in a household with income in the lowest quintile and one car, having similar behaviors in terms of reading (often) and transportation (passive), they showed somewhat different daily patterns during summer weekend days. In particular, the conditional distribution was markedly skewed for the girl with identifier M16179P.
7 Conclusion
We have developed a novel additive model for quantile regression when data are clustered. As compared to alternative approaches, ours has unique features, namely the mixedeffects representation of smoothing splines, which in turn leads to automatic smoothing selection, and the ability to model the variancecovariance matrix of the random effects.
As shown in a simulation study, the performance of AQMM was satisfactory despite the minimal tuning of the estimation algorithm. This takes a little burden away from the user who may instead focus their attention on other aspects of the analysis. This can be an asset if the data presents complexities as those illustrated in the MCS accelerometer analysis. In particular, the presence of a large number of regression coefficients and multiple smooth terms hinders the application of computationally intensive smoothing selection (e.g., crossvalidation) to large datasets.
Standard error calculation in AQMM is facilitated by bootstrap. We were able to overcome the relatively large size of the MCS dataset by using an adaptation of the BLB approach (Kleiner et al., 2014). However, the versatility of bootstrap comes at a (computational) price and its application is limited to more central quantiles unless the sample size (i.e., number of clusters) is adequate. Further research is needed to develop accurate ‘samplingfree’ approximations of standard errors in AQMM as well as in LQMM.
Finally, in contrast to estimation based on numerical quadrature (Geraci and Bottai, 2014), random effects estimates in AQMM are a byproduct of the optimization algorithm rather than being calculated post hoc. However, the proposed algorithm can be more demanding in terms of computing time as compared to, say, numerical quadrature or boosting, with the computational bottleneck indeed represented by the estimation of the random effects. For example, it took about two hours to fit a single AQMM using the MCS dataset. While, on the one hand, the large size of this dataset impaired even one of the most refined software for linear mixedeffects models, on the other hand a possible improvement in computing speed of the proposed algorithm is conceivable and is part of future research.
Acknowledgements
This research has been supported by the National Institutes of Health – National Institute of Child Health and Human Development (Grant Number: 1R03HD08480701A1). The cooperation of the participating Millennium Cohort families is gratefully acknowledged as is the contribution of the management team at the Centre for Longitudinal Studies, UCL Institute of Education to collecting and, together with the UK Data Archive and Economic and Social Data Service, making the data available.
Appendix A  Inference
The goal is to maximize the loglikelihood (4.3) with respect to the parameter vector .
Let be the vector of residuals for the th cluster with generic element , and define the corresponding sign vector with
(A.1) 
(The notation above has been simplified since the ’s as well as the ’s should be written as functions of .) We apply the smooth approximation (Madsen and Nielsen, 1993; Chen, 2007) given in equation (6) to the elements of and write
(A.2) 
where is an diagonal matrix with diagonal elements , and are two vectors with elements
and
respectively.
For a more compact notation, let , , , , , , , , and . We now define the function
(A.3) 
The smoothed version of the loglikelihood (4.3) is then given by
(A.4) 
For , we have that .
Since is differentiable with respect to , we can derive the following quantities
The above derivatives can be written more compactly for all clusters as
(A.5)  
(A.6) 
Moreover, let
(A.7) 
be the conditional mode of . A secondorder approximation of around is given by
where , , and
Comments
There are no comments yet.