With recent rising interest in sparse regression for high-dimensional data, least squares regression with regularization—often via lasso penalty (Tibshirani, 1996)—has become a focal point of computing scientists and statisticians in model selection procedures (He et al., 2016; Vidaurre et al., 2013)
. Furthermore, quantile regression has emerged as an alternative to traditional ordinary least squares methods with numerous advantages, including but not limited to robustness against outlying data and more informative insights into the distribution of the response under study(Koenker, 2005).
Oracle model selection theory, introduced by Fan and Li (2001)
, illustrates optimal behaviour during model selection but is limited to the case where error variance is finite. In response,Zou and Yuan (2008)
established composite quantile regression—a method to simultaneously model multiple quantile levels—that maintains desirable oracle properties even in the case of infinite error variance. Beyond oracle model selection and the simultaneous modelling of multiple quantile levels, composite quantile regression also achieves a lower variance on estimated effects relative to quantile regression. These properties of composite quantile regression have proven attractive to many researchers, who have widely applied this technique in improving the processing capabilities of artificial neural networks(Xu et al., 2017), providing an alternative to local polynomial regression (Kai et al., 2010), and in Harris chain stochastic processes (Li and Li, 2016).
Applying existing optimization algorithms to (composite) quantile regression requires a non-trivial reformulation of the problem due to the non-linearity and non-differentiability of the loss and regularization terms of the objective. The well-known quantreg package for R (Koenker, 2017) uses an interior point (IP) approach for quantile and composite quantile regression with the option of (lasso) regularization for the former and no regularization options for the latter. Although advanced IP algorithms in quantreg, such as the one using prediction-correction (Mehrotra, 1992) for non-regularized quantile regression, have greatly improved upon earlier attempts using simplex methods, the time spent on matrix inversion in IP approaches (Chen and Wei, 2005) motivates us to seek faster algorithms for quantile and composite quantile regression, particularly for high-dimensional data where regularization is required. In addition, following the conjectures of Fan and Li (2001), Zou (2006) showed lasso variable selection—currently the most commonly-implemented penalty for quantile regression—to be inconsistent in certain situations and presented adaptive lasso regularization as a solution. Our work in the present paper is thus motivated by both a search for faster quantile regression algorithms as well as the lack of publicly-available methods for adaptive-lasso regularized quantile and composite quantile regression, particularly for high-dimensional data.
We present the cqrReg package (Gao and Kong, 2015) for composite quantile regression with regularization in R. This package performs computations in C++ and links back to R via the Rcpp (Eddelbuettel and François, 2011) and RcppArmadillo (Eddelbuettel and Sanderson, 2014) packages for increased computational efficiency. cqrReg is novel in its implementation of quantile regression, composite quantile regression, and corresponding versions regularized by an adaptive lasso penalty using three different algorithms. An alternating direction method of multipliers (ADMM) approach breaks up the optimization problem into simpler, more easily-solvable portions (Boyd et al., 2011); a majorize-minimization (MM) approach majorizes both the loss and regularization functions into differentiable smooth functions and subsequently minimizes these majorizations (Hunter and Lange, 2000); and a coordinate descent (CD) method uses observations in a greedy algorithm to iteratively select and update individual model parameters (Wu and Lange, 2008). For the sake of comparison, we also present an IP formulation of the problem, which seeks to minimize both loss and regularization functions after starting within rather than on the boundary of the feasible set (Koenker, 2005). In later numerical simulations, we use the advanced IP methods present in the quantreg package.
Numerical simulation results suggest that our approaches generally improve upon quantreg’s computation time with roughly the same level of error for the wide range of quantile regression problems considered. We find that the MM approach to non-regularized (composite) quantile regression greatly outperforms the other three methods in terms of computation time, and that the CD method excels in regularized (composite) quantile regression with high-dimensional data. Furthermore, our ADMM approach was competitive in all simulations performed and holds the promise of easier parallelization.
The rest of this article is structured as follows. Section 2 presents quantile regression, starting with relevant notation in Subsection 2.1, followed by the description of our approaches to quantile regression using the ADMM, MM, and CD algorithms in Subsections 2.2 through 2.4, respectively. Section 3 continues with composite quantile regression, including relevant notation and commentary on the extension from quantile to composite quantile regression for our ADMM, MM, and CD methods. Numerical simulations with results are presented in Section 4 and discussed in Section 5.
2 Quantile Regression
In this section, we present the proposed ADMM, MM, and CD algorithms for quantile regression with adaptive lasso regularization. Since non-regularized quantile regression can be seen as a special case of the regularized version, we do not include the former here. Interested readers are referred to the supplementary appendix for implementations of the non-regularized problems and further details (omitted for brevity) on our proposed methods. For completeness in the upcoming simulations, a basic IP formulation is also given in the appendix.
2.1 Background and Notation
We first introduce the necessary background and notation to be used throughout this paper regarding quantile regression loss functions, both with and without adaptive lasso regularization. For a quantile level, we define the quantile function for any real by , where and . Given a design matrix , and corresponding coefficient vector and intercept , define the quantile regression problem (Koenker and Bassett, 1978; Koenker, 2005) with adaptive lasso penalty (Wu and Liu, 2009; Zou, 2006) for a desired quantile level by
where is a regularization parameter, is the adaptive lasso penalty, and is the solution (without intercept terms) obtained from non-regularized quantile regression—that is, the solution to the problem with .
We define the residuals for quantile regression by , for . For ease of notation throughout this section, we sometimes assume that a design matrix has an appropriate column for the intercept term of the model. Where intercepts are accounted for in the design matrix, the parameter vector will be taken to include the corresponding intercept terms such that . This will be made clear by the dimension of . Throughout this paper, will always refer to the number of covariate parameters, and for will always refer to a covariate effect and never an intercept term.
2.2 Alternating Direction Method of Multipliers Algorithm
Although developed in the 1960s and 1970s (Hestenes, 1969; Gabay and Mercier, 1976), interest in the ADMM algorithm has been recently renewed in the findings of Boyd et al. (2011) and Lin et al. (2010). These studies demonstrate the ADMM algorithm’s relative efficiency in solving optimization problems with large data sets, particularly when non-smooth terms are present in the objective function. This method has thus found notable use in quantile regression research where the quantile loss and, if present, regularization terms, are often non-smooth (Boyd et al., 2011; Kong et al., 2015; Zhang et al., 2017). For brevity, a general formulation of the ADMM algorithm is available in the supplementary appendix. We apply the ADMM algorithm (Boyd et al., 2011) by reformulating regularized quantile regression as the convex optimization problem
where is a vector of residuals, is a regularization parameter, and is the solution obtained from quantile regression (without the intercept term, as usual). We assume that the intercept term is accounted for in both and , and solve this problem using the ADMM iteration scheme (Boyd et al., 2011)
where is the rescaled Lagrange multiplier and is a penalty parameter. For reference, is chosen to be 1.2 by Boyd et al. (2011). The update for can be written in a closed form as where and, for real , the function is defined component-wise via . Similarly, the update step for , while not closed, can be viewed as a least-squares optimization problem with adaptive lasso penalty. We implement existing numerical methods to solve this problem and update .
A generic stopping condition for the algorithm can be defined in terms of the primal and dual residuals and , respectively, with the stopping conditions and . Let and be and with the intercept columns and terms removed, respectively, and a vector of intercepts . In this regularized setting, we obtain from the general ADMM algorithm that
2.3 Majorize-Minimization Algorithm
The use of majorizing functions to solve minimization problems has been well-studied in the statistical literature for many years since Ortega and Rheinboldt (1970). It was not until a later time, however, that the general MM framework was put forward by Hunter and Lange (2000). In general, MM can refer to majorize-minimization or minorize-maximization, depending on whether the problem at hand is a minimization or maximization problem, respectively. MM algorithms operate iteratively by constructing, using a solution for the current iteration, an auxiliary function that will simultaneously optimize the original concave objective . In the case of a minimization problem, such an auxiliary function is called a majorizer and must satisfy for all of interest and
. Arguably, the most well-known application of an MM method is in the expectation-maximization (EM) algorithm(Dempster et al., 1976) for maximum likelihood estimation. MM has also been applied in various areas of research, including regression, survival analysis, discriminant analysis, and quantile regression (Hunter and Lange, 2004). We use the MM algorithm developed by Hunter and Lange (2000) and Hunter and Li (2005) to solve the quantile regression problem with adaptive lasso regularization.
To approach quantile regression, first construct a function based on some perturbation parameter that will be used to approximate the fidelity portion of the objective. For any real , define , and the subsequent fidelity approximation by . At the -th iteration of the algorithm, for each current residual value , we have that is majorized by the quadratic function
for some solvable constant that satisfies the equation . We now address the penalty term . Given , , and an initial value for , we can locally approximate the penalty as a quadratic function. This yields a majorizer of the entire adaptive lasso-regularized quantile regression objective (Hunter and Li, 2005),
In practice, for the -th iteration of the algorithm, given an updated value for , we generate and minimize a new majorized quadratic function using a Newton-Raphson iterative method. The argument minimum is taken as an updated value for and can be used to decide when to terminate the algorithm: cqrReg uses tolerance .
2.4 Coordinate Descent Algorithm
Coordinate descent (CD) algorithms are iterative procedures that generally fix some of the values of the variable vector in an optimization problem and subsequently solve the resulting subproblem in terms of the unfixed components. CD methods have a long-standing history (Ortega and Rheinboldt, 1970) during which their convergence properties have been studied (Luo and Tseng, 1992; Tseng, 2001)
. The most simple CD algorithms allow for exactly one unfixed variable per iteration so as to search for a subproblem solution along a line, while others will search along a hyperplane by allowing multiple unfixed components. Most implementations use the latter in a block coordinate descent method. CD methods have been developed extensively, particularly for non-differentiable, non-convex objective functions, permitting the use of regularization functions such as lasso () and ridge () penalties (Tseng, 2001; Friedman et al., 2010).
To implement quantile regression with adaptive lasso regularization, we use an extended version of the greedy CD method put forward by Edgeworth and, more recently, further developed by Wu and Lange (2008). This requires us to reformulate the quantile objective function, as shown below. In each iteration, for fixed , replace by the sampling quantile at the desired level of the values for : this will necessarily drive the value of the objective downwards. Define for . For each element for of , rewrite the loss function as
and apply the CD algorithm. For each fixed , define if and if . We sort , for , and update to the value of the order statistic with index satisfying both
where if and if and corresponds to . In other words, using the weights , the selected is the weighted median of all (for the fixed value of ). At the end of each iteration, check for the convergence of using the selected stopping criteria: cqrReg uses an absolute value difference threshold of .
3 Composite Quantile Regression
In this section, we present an extension from quantile to composite quantile regression for the proposed ADMM, MM, and CD algorithms. We again only show results for the case with adaptive lasso regularization. Readers interested in the non-regularized case are referred to the supplementary appendix where more details and a similar extension for a basic IP formulation are given. With regards to available quantreg methods, we note that non-regularized composite quantile regression has only recently been implemented using an IP algorithm, and that a regularized version is currently not available in that package.
Composite quantile regression (Zou and Yuan, 2008) uses a sequence of quantile levels , for some fixed , that we wish to simultaneously model. The composite quantile regression problem (Koenker, 2005) with adaptive lasso regularization (Wu and Liu, 2009; Zou, 2006) is defined by
where is a regularization parameter, is analogously defined, and is the solution (without intercepts) to the non-regularized composite quantile regression problem without regularization—that is, where . To extend the residual notation defined before, let , for and .
The extension from quantile to composite quantile regression is relatively straightforward: indeed, changes need only be made to accommodate additional quantile levels and intercept terms. In particular, since the composite quantile case only adds more intercept parameters, the penalty term remains unchanged. For explicit details on our methods for regularized composite quantile regression in the ADMM, MM, and CD approaches, refer to the supplementary appendix.
To extend the ADMM method, we generate a new design matrix by “stacking” the design matrices for each quantile level and adjusting all input accordingly. Written formally,
, , , ,
where, for example, denotes the matrix with rows . The methods presented in Subsection 2.2 for quantile regression then apply after replacing , , , and with , , , and , respectively. After replacement, the optimization problem becomes
Refer to the supplementary appendix for the explicit update scheme and residuals.
The extension of the remaining two methods is similar, although requiring a slight change in the objective function. For the CD method, we simply adjust our reformulation of the objective, for , to include a second summation for the additional quantile levels as
where is analogous to defined previously. The MM approach is similarly extended, yielding a final majorizer of the form
4 Numerical Simulations
In this section, we evaluate the performance of the ADMM, MM, and CD methods presented in this paper against that of the IP methods in quantreg. Because quantreg does not have any method for regularized composite quantile regression, such a comparison is not possible in that setting. Lasso regularization is used in place of adaptive lasso regularization for the IP method as only the former is available in quantreg. Throughout this section, data is generated according to the model
are i.i.d. standard normal random variables. We use a convergence threshold ofto define our stopping criteria throughout.
We first focus on parameter estimation rather than variable selection, and consider cases with variables and observations in non-regularized quantile and composite quantile regression. In each simulation, the true value of each for is uniform-randomly generated from the interval . In the quantile regression case, we set quantile level , and in the composite quantile setting, we use levels . The accompanying Tables 1 and 2 present the performance of each method, averaged over 50 simulations.
We next consider variable selection for high-dimensional data using observations and varying from to , as shown in the accompanying tables. The performance of each algorithm is summarized by the average number of false predictors selected, the average number of true predictors selected, and the average computation time in seconds over 25 replications. Simulation results in Table 3 are for regularized quantile regression with quantile level : here, the ADMM, MM, and CD methods use adaptive lasso regularization as described in previous sections, while the IP method uses lasso regularization as available in quantreg. Table 4 gives results based on composite quantile regression with adaptive lasso regularization using quantile levels : we do not make a comparison against an IP approach here, however, as an appropriate method is not available in quantreg.
5 Discussion and Conclusions
In this paper we have presented the cqrReg package for quantile and composite quantile regression and variable selection in R. Motivated by the lack of variety in algorithms for quantile and composite quantile regression, both with and without adaptive lasso regularization, and a desire to improve run times over existing interior point (IP) methods, we reformulated four types of quantile regression problems and implemented solutions using three algorithms. We compared our methods to existing IP algorithms available in the quantreg package through simulation studies.
In the non-regularized quantile regression setting, we do not observe substantial differences in the average error between methods; the same is true of run time except for the MM algorithm, which performs considerably better than the other three methods in this setting. In non-regularized composite quantile regression, however, differences between the methods in terms of error are more apparent, as the IP method has larger average error than the ADMM, MM, and CD approaches, while MM and CD are faster and ADMM slower than the IP algorithm available in quantreg. The results so far suggest that the MM algorithm is the best-suited for non-regularized (composite) quantile regression among the four methods tested, especially for data sets with large and relatively small . In regularized quantile regression, all methods perform similarly in terms of variable selection, but CD and ADMM show clear superiority in run time, particularly relative to the IP and MM methods when is large. In the case of regularized composite quantile regression, CD and ADMM display run time superiority over MM. Furthermore, MM shows a tendency to select irrelevant variables, likely due to the algorithm’s matrix inversion and selection of an approximating parameter. No IP method is available from quantreg in this case. This second set of results suggests that our CD approach is best-suited for regularized (composite) quantile regression and variable selection among the three methods tested.
Overall, our cqrReg package provides reliable and efficient algorithms to estimate solutions to quantile and composite quantile regression problems, including versions regularized by an adaptive lasso penalty. Our methods widen the variety of algorithms available for quantile and composite quantile regression and greatly improves upon the run time of existing advanced IP methods, particularly for large or high-dimensional data sets. While our ADMM method was competitive but unable to beat our other MM and CD approaches, ADMM is amenable to parallelization and is a promising method that naturally lends itself to distributed computing to handle data that is both high-dimensional and extremely large in volume.
Drs. Linglong Kong, Bei Jiang, and Di Niu are supported in part by the Natural Sciences and Engineering Research Council of Canada (NSERC). Jueyu Gao acknowledges the supervision of Drs. Linglong Kong and Edit Gombay during his graduate studies.
Appendix A cqrReg Methods for Quantile and Composite Quantile Regression Without Adaptive Lasso Regularization
This supplementary appendix is structured as follows. Section A.1 presents details omitted for brevity from the main text on the methods we implement in cqrReg for solving the standard quantile regression problem without regularization via alternating direction method of multipliers (ADMM), majorize-minimization (MM), and coordinate descent (CD) algorithms. Furthermore, for the sake of comparison, we also introduce a basic interior point (IP) approach. Section A.2 gives details not present in the main article on the generalization from quantile to composite quantile regression, again without regularization.
a.1 Non-Regularized Quantile Regression
The following Subsections A.1.1 through A.1.3 give details on our implementation of non-regularized quantile regression in cqrReg through the ADMM, MM, and CD algorithms. We place particular emphasis on the ADMM approach, whose general setup we review first. Subsection A.1.4 introduces a basic IP method and a reformulation of the quantile regression problem accessible to the Rmosek optimization package (Friberg, 2013). We use the notation presented in Section 2 of the main text throughout.
a.1.1 Alternating Direction Method of Multipliers Algorithm
Before proceeding with an application to quantile regression, we review the general ADMM algorithm, which decomposes a given additively separable convex optimization problem into a number of sub-convex optimization problems. The general formulation of the ADMM problem is
where and are convex, real-valued functions of vectors and , and are matrices, and is a constant vector. The augmented Lagrangian (Powell, 1967) of the above problem is written as
where is a tuning parameter. Setting and , we can obtain the (more convenient) scaled augmented Lagrangian
The ADMM method optimizes the scaled augmented Lagrangian using the iterative scheme
A generic stopping condition for the algorithm can be defined in terms of the primal and dual residuals, given by and , respectively. The program can be made to terminate if both
where and are the length of and , respectively. In our applications, we set and .
We apply the ADMM algorithm (Boyd et al., 2011) by reformulating quantile regression as the convex optimization problem
where is a vector of residuals. Note that the intercept term is accounted for in both and . Using the general procedure of ADMM, taking and as a function of to be the entire objective, we obtain the iterative scheme
where is the rescaled Lagrange multiplier and is a penalty parameter. The update for can be written in a closed form as , where and, for real , the function is defined component-wise via . For reference, is chosen to be 1.2 by Boyd et al. (2011). Furthermore, in the quantile regression setting, we have that
a.1.2 Majorize-Minimization Algorithm
We use the MM algorithm developed by Hunter and Lange (2000) and Hunter and Li (2005) to solve the quantile regression problem without regularization. Our approach is exactly the same as in Subsection 2.3 of the main text, but we instead ignore the majorization of the penalty term in the quantile regression objective. Construct a function based on some perturbation parameter that will be used to approximate the quantile regression objective . For any real , define , and the subsequent approximation of by . At the -th iteration of the algorithm, for each current residual value , we have that is majorized by the quadratic function
for some solvable constant that satisfies the equation . The MM algorithm minimizes the majorizer of , namely,
with the argument minimum taken as the updated value of . In practice, for the -th iteration of the algorithm, given an updated value for , we generate and minimize a new majorized quadratic function and implement a Newton-Raphson iterative method to obtain an updated value for .
a.1.3 Coordinate Descent Algorithm
To implement quantile regression, we use an extended version of the greedy CD method in cqrReg put forward by Edgeworth and, more recently, further developed by Wu and Lange (2008). In each iteration, for fixed , replace by the sampling quantile at the desired level of the values for : this will necessarily drive the value of the objective downwards. Define for . For each element for of , rewrite the loss function as
in order to use the CD algorithm. For each fixed , sort the values of
for and update to be the order statistic with index satisfying both
where corresponds to . In other words, using the weights , the selected is the weighted median of all (for the fixed value of ). At the end of each iteration, we check for convergence of and stop the algorithm using an absolute value difference threshold of .
a.1.4 Interior Point Algorithm
Interior point (IP) methods generally reach an optimal solution by travelling within rather than on the boundary of the feasible set. Though studied as early as the 1950s and 1960s, IP methods arguably first gained widespread interest with the landmark paper by Karmarkar (1984)
, who proposed an efficient, polynomial-time IP algorithm for linear programs with performance rivalling the existing simplex method.Nesterov and Nemirovskii (1994) later extended these results to a range of convex optimization problems while maintaining polynomial time. In the present day, advanced IP methods and code for both linear and non-linear programs are widely available and well-studied in the literature (Roos et al., 2006). IP algorithms have also received considerable attention and success in applications to non-linear, non-convex optimization problems (Byrd et al., 1999).
We can implement quantile regression using an IP algorithm by reformulating the optimization problem as a linear program and making use of existing optimization packages such as Rmosek (Friberg, 2013). Rmosek can implement an IP algorithm to solve problems of the form
where is a constraint matrix; and the objective coefficients and constant; the lower and upper constraint bounds; the lower and upper variable bounds; and where, for notational simplicity, is taken to mean component-wise comparison of vectors. Alternatively, other R packages such as quantreg exist specifically for quantile regression and make use of IP methods. The IP approach for quantile regression in quantreg is based on the method of Portnoy and Koenker (1997), with recent modifications including the prediction-correction algorithm of Mehrotra (1992). Lasso-penalized quantile regression in quantreg uses a Frisch-Newton method.
Let be a vector of the positive and negative parts, respectively, of the residuals , and a vector of parameters including the intercept. For notational simplicity, is taken to mean component-wise comparison of vectors of equal length. The quantile regression problem without regularization can be formulated for use in existing IP optimization routines such as Rmosek via
As an aside, to incorporate an adaptive lasso penalty into the problem, we can rewrite the problem as a linear program accessible to existing IP routines via
a.2 Composite Quantile Regression Without Regularization
This appendix section shows details of the extension from quantile to composite quantile regression without regularization. Subsections A.2.1, A.2.2, and A.2.3 extend the above non-regularized quantile regression procedures using ADMM, MM, and CD algorithms, respectively. Subsection A.2.4 formulates the problem for use in Rmosek (Friberg, 2013) or other IP methods for linear programs. We use the notation presented in Section 2 of the main text throughout.
a.2.1 Alternating Direction Method of Multipliers Algorithm
Throughout this subsection, we use the notation for , , , and defined in Section 3 of the main text. Written in the ADMM form, the composite quantile regression problem can be expressed as
where we assume that the intercept term is accounted for in both and . The ADMM approach is applied in exactly the same way as in Subsection A.1.1, yielding the iterative update scheme
where , and residuals
As previously, a generic stopping condition requiring and for termination can be imposed.
a.2.2 Majorize-Minimization Algorithm
An extension of the MM algorithm from quantile to composite quantile regression simply involves the incorporation of additional quantile levels. Indeed, we use the same function to approximate the composite quantile regression objective via