An update on statistical boosting in biomedicine

02/27/2017
by Andreas Mayr, et al.
FAU

Statistical boosting algorithms have triggered a lot of research during the last decade. They combine a powerful machine-learning approach with classical statistical modelling, offering various practical advantages like automated variable selection and implicit regularization of effect estimates. They are extremely flexible, as the underlying base-learners (regression functions defining the type of effect for the explanatory variables) can be combined with any kind of loss function (target function to be optimized, defining the type of regression setting). In this review article, we highlight the most recent methodological developments on statistical boosting regarding variable selection, functional regression and advanced time-to-event modelling. Additionally, we provide a short overview on relevant applications of statistical boosting in biomedicine.


1 Introduction

Statistical boosting algorithms are one of the advanced methods in the toolbox of a modern statistician or data scientist [1]. They offer multiple advantages in the presence of high-dimensional data, as they can deal with more potential candidate variables than observations (p > n situations) while still yielding classical statistical models with well-known interpretability [2, 3]. Key features in this context are automated variable selection and model choice [4, 5].

This field of research is methodologically situated between the worlds of statistics and computer science. Statistical boosting algorithms bridge the gap between two rather different points of view on how to gather information from data [6]: on the one hand, there is the classical statistical modelling community, which focuses on models describing and explaining the outcome in order to find an approximation to the underlying stochastic data-generating process. On the other hand, there is the machine learning community, which focuses primarily on algorithmic models predicting the outcome while treating the nature of the underlying process as unknown. Statistical boosting algorithms have their roots in machine learning [7] but were later adapted to estimate classical statistical models [8, 9]. A pivotal aspect of these algorithms is that they incorporate data-driven variable selection and shrinkage of effect estimates similar to that of classical penalized regression [10].

In a review some years ago [1], we highlighted this evolution of boosting from machine learning to statistical modelling. Furthermore, we emphasized the similarity of two boosting approaches – gradient boosting [2] and likelihood-based boosting [3] – introducing statistical boosting as a generic term for this kind of algorithm. Throughout this article, we will use this term to reflect both approaches.

The earlier review [1] was accompanied by a second article [11] highlighting the multiple variants by which the basic algorithms have been extended towards (i) enhanced variable selection properties, (ii) new types of predictor effects, and (iii) new regression settings. The substantial new methodological developments on statistical boosting algorithms throughout the last years (e.g., stability selection [12]), opening the door for the growing community to new model classes and frameworks (e.g., joint models [13] and functional data [14]), make it necessary to provide an update on the available extensions.

This article is structured as follows: In Section 2 we briefly highlight the basic structure and properties of statistical boosting algorithms and point to their connections to classical penalization approaches like the lasso. In Section 3 we focus on new developments regarding variable selection, which can also be combined with the boosted functional regression models presented in Section 4. Section 5 focuses on advanced survival models before we briefly summarize in Section 6 what other relevant developments and applications have been proposed for the framework of statistical boosting.

2 Statistical boosting

2.1 From machine learning to statistical models

The original boosting concept by Schapire [15] and Freund [7] emerged from the field of supervised learning, focusing on boosting the accuracy of weak classifiers (the base-learners) by iteratively applying them to re-weighted data to get stronger results.

Even if the base-learners individually only slightly outperform random guessing, the combined ensemble solution can often be boosted to a perfect classification [16]. The introduction of AdaBoost [17] was the breakthrough for boosting in the field of supervised machine learning, allegedly leading Leo Breiman to praise its performance: “Boosting is the best off-the-shelf classifier in the world” [18].

The main target of classical machine-learning approaches is predicting observations of the outcome y given one or more input variables x. The estimation of the prediction or generalization function f is based on an observed sample of pairs (x_i, y_i), i = 1, ..., n. However, since the underlying nature of the data-generating process is treated as unknown, the focus is not on quantifying or describing this process, but solely on predicting y for new observations as accurately as possible.

As a consequence, many machine-learning approaches (also including the original AdaBoost with trees or stumps) should mainly be seen as black-box prediction schemes. Although typically yielding accurate predictions [19], they do not offer much insight into the structure of the relationship between the explanatory variables x and the outcome y.

Statistical regression models, on the other hand, particularly aim at describing and explaining the underlying relationship in a structured way. The impact of single explanatory variables can not only be quantified in terms of variable importance measures [20, 21], but the actual effect of these variables is interpretable. The work of Friedman et al. [8, 9] laid the groundwork for understanding the concept of boosting from a statistical perspective and for adapting the general idea in order to estimate statistical models.

2.2 General model structure

The aim of statistical boosting algorithms is to estimate and select the effects in structured additive regression models. The most important model class are generalized additive models (’GAM’, [22]), where the conditional distribution of the response variable y is assumed to follow an exponential family distribution. The expected response μ = E(y|x) is then modelled given the observed values x = (x_1, ..., x_p) of one or more explanatory variables via a link function g as g(μ) = η(x). In the typical case of multiple explanatory variables, the function η(x) is called the additive predictor and consists of the additive effects of the single predictors,

η(x) = β_0 + f_1(x_1) + ... + f_p(x_p),   (1)

where β_0 represents a common intercept and the functions f_1(x_1), ..., f_p(x_p) are the partial effects of the variables x_1, ..., x_p. The generic notation f_j(x_j) may comprise different types of predictor effects such as classical linear effects x_j β_j, smooth non-linear effects constructed via regression splines, spatial effects or random effects of the explanatory variable x_j, to name but a few. In statistical boosting algorithms, the different partial effects are estimated by separate base-learners (component-wise boosting, [2]), which are typically simple regression-type prediction functions.

  1. Initialization

    1. Start with iteration counter m = 0. Initialize the additive predictor η^[0] with an offset value. Specify a set of prediction functions as base-learners h_1(x_1), ..., h_p(x_p); typically each base-learner is a regression function incorporating one possible candidate variable.

  2. Component-wise fitting of base-learners

    1. Increase the iteration counter: set m := m + 1.

    2. Fit the base-learners h_j(x_j), j = 1, ..., p, one-by-one:

      Gradient boosting

      Base-learners are fitted to the negative gradient vector of the loss function (e.g., the negative log-likelihood), evaluated at the current additive predictor η^[m−1]. To ensure small steps, the base-learner fits are multiplied by a small step-length factor ν, 0 < ν ≤ 1 (typically ν = 0.1).

      Likelihood-based boosting

      Base-learners are estimated via maximizing the overall likelihood, using one step of Fisher scoring with the current additive predictor as offset. To ensure small steps, a penalty term is attached to the likelihood.

  3. Update best performing component

    1. Select the best performing base-learner j*:

      Gradient boosting

      Based on the smallest residual sum of squares with respect to the negative gradient vector.

      Likelihood-based boosting

      Based on the largest overall likelihood after the update.

    2. Update the additive predictor η via the fit of the corresponding base-learner j*.

  4. Iteration

    1. Iterate steps (2) and (3) until the specified maximum number of iterations m_stop is reached.

Box 1 The structure of statistical boosting algorithms.
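To make the generic structure of Box 1 more concrete, the following minimal R sketch implements component-wise gradient boosting with the L2 loss and simple linear base-learners on simulated data. It is an illustration only and not taken from any package: the function name cwb, the step length nu = 0.1 and the number of iterations mstop = 250 are arbitrary choices.

```r
set.seed(1)

## toy data: 3 informative out of 10 candidate variables
n <- 200; p <- 10
X <- matrix(rnorm(n * p), n, p)
y <- 2 * X[, 1] - 1.5 * X[, 2] + 0.5 * X[, 3] + rnorm(n)

## centre the covariates so that the simple linear base-learners need no intercept
X <- scale(X, center = TRUE, scale = FALSE)

cwb <- function(X, y, mstop = 250, nu = 0.1) {
  p <- ncol(X)
  beta <- rep(0, p)               # current coefficient estimates
  offset <- mean(y)               # step 1: initialise with an offset value
  eta <- rep(offset, nrow(X))     # current additive predictor
  for (m in seq_len(mstop)) {
    u <- y - eta                  # negative gradient of the L2 loss = residuals
    ## step 2: fit every base-learner (univariate least squares) to u
    slope <- sapply(seq_len(p), function(j) sum(X[, j] * u) / sum(X[, j]^2))
    rss   <- sapply(seq_len(p), function(j) sum((u - X[, j] * slope[j])^2))
    ## step 3: update only the best-performing base-learner, damped by nu
    jstar <- which.min(rss)
    beta[jstar] <- beta[jstar] + nu * slope[jstar]
    eta <- offset + as.vector(X %*% beta)
  }
  c(offset = offset, beta = beta)
}

round(cwb(X, y), 2)  # few clearly non-zero coefficients: implicit variable selection
```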

2.3 Gradient boosting

Gradient boosting [2, 8] is one of the two important approaches in the context of statistical boosting. For a generic overview on the structure of statistical boosting algorithms see Box 1.

In gradient boosting, the iterative procedure fits the base-learners one-by-one to the negative gradient of the loss function ρ(y, η), evaluated at the additive predictor of the previous iteration η^[m−1]:

u^[m] = − ∂ρ(y, η) / ∂η , evaluated at η = η^[m−1](x).   (2)

The loss function ρ(y, η) describes the discrepancy between the observed outcome y and the additive predictor η and is the target function that should be minimized to get an optimal fit for η. In case of GAMs, the loss function is typically the negative log-likelihood of the corresponding exponential family. For Gaussian distributed outcomes, this reduces to the L2 loss ρ(y, η) = (y − η)²/2, where the gradient vector is simply the vector of residuals from the previous iteration and boosting hence corresponds to iterative refitting of residuals.

In each boosting iteration, only the best-fitting base-learner is selected, based on the residual sum of squares of the base-learner fit h_j with respect to the negative gradient vector u^[m]:

j* = argmin over j = 1, ..., p of Σ_i ( u_i^[m] − h_j(x_ij) )².   (3)

Only this base-learner is added to the current additive predictor η^[m−1]. In order to ensure small updates, only a small proportion of the base-learner fit is actually added (typically the step length is ν = 0.1 [2]). Note that the same base-learner can be selected and updated multiple times; the partial effect f_j(x_j) of variable x_j is the sum of all corresponding base-learner fits ν · h_j(x_j) over the iterations in which j was selected.

This component-wise procedure of fitting the base-learners one by one to the current gradient of the loss function can be described as gradient descent in function space [23], where the function space is spanned by the base-learners. The algorithm effectively optimizes the loss function step-by-step, eventually converging to the minimum. In order to avoid overfitting and to ensure variable selection, the algorithm is typically stopped before convergence (based on predictive performance evaluated via cross-validation or resampling [24]), which leads to an implicit penalization [25].

Gradient boosting is implemented in the add-on package mboost [26] for the open source programming environment R [27], providing a large number of pre-implemented loss functions for various regression settings, as well as different base-learners to represent various types of effects (see [28] for an overview). Recent changes in the software, which were introduced after the comprehensive mboost tutorial [28] was published, are described in Appendix A.
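As a brief illustration of gradient boosting with mboost, the following sketch fits a linear model via glmboost(), estimates the stopping iteration by subsampling with cvrisk(), and reduces the model accordingly; the simulated data and all tuning values are placeholders, not recommendations.

```r
library(mboost)

set.seed(2)
dat <- data.frame(matrix(rnorm(200 * 10), 200, 10))
names(dat) <- paste0("x", 1:10)
dat$y <- 2 * dat$x1 - dat$x2 + rnorm(200)

## component-wise gradient boosting with linear base-learners for all covariates
mod <- glmboost(y ~ ., data = dat, family = Gaussian(),
                control = boost_control(mstop = 500, nu = 0.1))

## early stopping: estimate the optimal number of iterations via subsampling
cvr <- cvrisk(mod, folds = cv(model.weights(mod), type = "subsampling"))
mod[mstop(cvr)]   # reduce the model to the cross-validated stopping iteration

coef(mod)         # only selected variables appear, with shrunken estimates
```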

2.4 Likelihood-based boosting

Likelihood-based boosting [3, 29] is the other general approach in the framework of statistical boosting algorithms. It follows a structure very similar to that of gradient boosting (see Box 1), although both approaches only coincide in special cases such as classical Gaussian regression via the L2 loss [1, 30]. In contrast to gradient boosting, the base-learners are directly estimated via optimizing the overall likelihood, using the additive predictor from the previous iteration as an offset. In case of the L2 loss, this has a similar effect to refitting the residuals.

In every step, the algorithm hence optimizes the regression models serving as base-learners one-by-one by maximizing the likelihood (using one-step Fisher scoring), and selects only the base-learner which leads to the largest increase in the likelihood. In order to obtain small boosting steps, a quadratic penalty term is attached to this likelihood. This has a similar effect to multiplying the fitted base-learner by a small step-length factor as in gradient boosting.

Likelihood-based boosting for generalized linear and additive regression models is provided by the R add-on package GAMBoost [31], and an adapted version for boosting Cox regression is provided with CoxBoost [32]. For a comparison of both statistical boosting approaches, i.e., likelihood-based and gradient boosting in case of Cox proportional hazard models, we refer to [33].
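A minimal sketch of likelihood-based boosting of a Cox model with CoxBoost might look as follows; the simulated data, the number of cross-validation folds and the maximum number of steps are placeholders, and the element name optimal.step is recalled from the package documentation and should be verified against the installed version.

```r
library(CoxBoost)

set.seed(3)
n <- 100; p <- 50
x <- matrix(rnorm(n * p), n, p)
time <- rexp(n, rate = exp(0.5 * x[, 1] - 0.5 * x[, 2]))
status <- rbinom(n, 1, 0.7)   # 1 = event observed, 0 = censored

## choose the number of boosting steps by 10-fold cross-validation
## (the default penalty of the package is used in both calls)
cv.res <- cv.CoxBoost(time = time, status = status, x = x,
                      maxstepno = 200, K = 10)

## likelihood-based boosting fit of the Cox model with the selected step number
fit <- CoxBoost(time = time, status = status, x = x,
                stepno = cv.res$optimal.step)

coef(fit)   # sparse coefficient vector: most entries remain exactly zero
```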

2.5 Connections to L1-regularization

Statistical boosting algorithms result in regularized models with shrunken effect estimates, although they only apply implicit penalization [25] by stopping the algorithm before convergence. By performing regularization without the use of an explicit penalty term, boosting algorithms clearly differ from direct regularization techniques such as the lasso [34]. However, both approaches sometimes result in very similar models after being tuned to a comparable degree of regularization [10].

This close connection was first noted between the lasso and forward stagewise regression, which can be viewed as a special case of the gradient boosting algorithm (Box 1), and led, along with the development of least angle regression (LARS), to the formulation of the positive cone condition (PCC) [35].

If this condition holds, LARS, the lasso and forward stagewise regression coincide. Figuratively speaking, the PCC requires that all coefficient estimates monotonically increase or decrease with a relaxing degree of regularization; it applies, for example, to low-dimensional settings with an orthogonal design matrix. It should be noted that the PCC is connected to the diagonal dominance condition for the inverse covariance matrix of the predictors, which allows for a more convenient way to investigate the equivalence of these approaches in practice [36, 37].

Given that the solution of the lasso is optimal with respect to the L1-norm of the coefficient vector, these findings led to the notion of boosting as some “sort of L1-sparse” regularization technique [38], but it remained unclear which optimality constraints possibly apply to forward stagewise regression if the PCC is violated.

By further extending the set of variables with a negative version of each variable and enforcing only positive updates in each iteration, Hastie et al. [39] demonstrated that forward stagewise regression always approximates the solution path of a similarly modified version of the lasso. From this perspective, they showed that forward stagewise regression minimizes the loss function subject to the L1-arc-length of the coefficient path. This means that the travelled path of the coefficients is penalized (allowing as few overall changes in the coefficients as possible, regardless of their direction), whereas the L1-norm considers only the absolute sum of the current set of estimates. In the same article, Hastie et al. [39] further showed that these properties hold for general convex loss functions and therefore apply not only to forward stagewise regression but also to the more general gradient boosting method (in case of logistic regression models as well as many other generalized linear regression settings).

The consequence of these differing optimization constraints can be observed in the presence of strong collinearity, where the lasso estimates tend to be very unstable regarding different degrees of regularization while boosting approaches avoid too many changes in the coefficients as they consider the overall travelled path [10].

It has to be acknowledged, however, that direct regularization approaches such as the lasso are applied more often in practice [38]. Statistical boosting, on the other hand, is far more flexible due to its modular nature, which allows combining any base-learner with any type of loss function [10, 38].
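The similarity (and the differing flexibility) of the two approaches can be inspected directly by fitting a lasso model with glmnet and a boosting model with mboost to the same data and comparing the coefficients after tuning both; the data below are simulated placeholders.

```r
library(glmnet)
library(mboost)

set.seed(4)
n <- 100; p <- 20
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("x", 1:p)
y <- X[, 1] - 0.8 * X[, 2] + rnorm(n)

## lasso: explicit L1 penalty, lambda tuned by cross-validation
cvfit <- cv.glmnet(X, y, alpha = 1)

## boosting: implicit regularization through early stopping
bmod <- glmboost(x = X, y = y)        # L2 loss (Gaussian family) by default
bmod[mstop(cvrisk(bmod))]

coef(cvfit, s = "lambda.min")   # lasso coefficients (sparse)
coef(bmod)                      # boosting: only selected base-learners appear
```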

3 Enhanced variable selection

Early stopping of statistical boosting algorithms via cross-validation approaches plays a vital role to ensure a sparse model with optimal prediction performance on new data. Resampling, i.e., random sampling of the data drawn without replacement, tends to result in sparser models compared to other sampling schemes [24], including the popular bootstrap [40]. By using base-learners of comparable complexity (in terms of degrees of freedom), selection bias can be strongly reduced [4]. The resulting models are tuned for optimal prediction accuracy on test data. Yet, despite this regularization, the final models are often relatively rich [24].

3.1 Stability selection

Meinshausen and Bühlmann [41] proposed a generic approach called stability selection to further refine the models and enhance sparsity. This approach was then transferred to boosting [12]. In general, stability selection can be combined with any variable selection approach and is especially useful for high-dimensional data with many potential predictors. To assess how stable the selection of a variable is, random subsets that comprise half of the data are drawn. On each of these subsets, the model is fitted until q base-learners are selected. Usually, B = 100 subsets are sufficient. Computing the relative frequencies of random subsamples in which specific base-learners were selected gives a notion of how stable the selection is with respect to perturbations of the data. Base-learners are considered to be of importance if their selection frequency exceeds a pre-specified threshold level π_thr.

Meinshausen and Bühlmann [41] showed that this approach controls the per-family error rate (PFER), i.e., it provides an upper bound for the expected number of false positive selections E(V):

E(V) ≤ q² / ( (2 π_thr − 1) · p ),   (4)

where p is the number of base-learners. This upper bound is rather conservative and hence was further refined by Shah and Samworth [42] for specific assumptions on the distribution of the selection frequencies. Stability selection with all available error bounds is implemented for a variety of modelling techniques in the R package stabs [43].
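In practice, stability selection can be run directly on a fitted mboost model, as in the following sketch; it assumes a fitted boosting model mod (for instance the one from the sketch in Section 2.3 with ten candidate base-learners), and the values q = 3 and PFER = 1 are placeholders chosen so that the derived threshold lies in a valid range.

```r
library(mboost)
library(stabs)

## stability selection for the boosting model 'mod' fitted above:
## B = 100 subsamples of half the data, boosting on each subsample until
## q = 3 base-learners are selected, with the PFER controlled at 1
sel <- stabsel(mod, q = 3, PFER = 1, B = 100)

sel$selected    # indices of the stable base-learners
plot(sel)       # selection frequencies together with the derived cutoff
```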

An important issue is the choice of the hyper-parameters of stability selection. A fixed value of q should be chosen large enough that all hypothetically influential variables can be selected [12, 44]. A sensible value for q should usually be smaller than or equal to the number of base-learners selected via early stopping with cross-validation.

In general, the size of q is of minor importance as long as it lies in a sensible range. With q fixed, either the threshold π_thr for stable effects can be chosen additionally or, as can be seen from Equation (4) with equality, the upper bound for the PFER can be pre-specified and the threshold π_thr derived accordingly. The latter is the preferred choice if error control is of major importance, the former if error control is just considered a by-product (see e.g., [44]). For an interpretation of the PFER, particularly with regard to standard error rates such as the per-comparison error rate or the family-wise error rate, we refer to Hofner et al. [12]. Note that for a fixed q, it is computationally very easy to change either of the other two parameters (π_thr or the upper bound for the PFER), as the resampling results can be reused [12].

Please note that the base-learners selected via stability selection might not reflect any model which can be derived with a specific tuning parameter of the original modelling approach. This means that for boosting, no stopping iteration m_stop might exist that results in a model containing exactly the stably selected base-learners; the provided set of stable base-learners is a fundamentally new solution.

3.2 Extension and application of boosting with stability selection

Variable selection is especially important in high-dimensional gene expression data and other large scale biomedical data sources. Recently, stability selection with boosting was successfully applied to select a small number of informative biomarkers for survival of breast cancer patients [44]. The model was derived based on a novel boosting approach that optimizes the concordance index [45, 46]. Hence, the resulting prediction rule was optimal with respect to its ability to discriminate between patients with longer and shorter survival, i.e., its discriminatory power.

Thomas et al. [47] derived a modified algorithm for boosted generalized additive models for location, scale and shape (GAMLSS, [48]) to allow a combination of this very flexible model class with stability selection. The basic idea of GAMLSS is to model all parameters of the conditional distribution by their own additive predictor and associated link function. Extensive simulation studies showed that the new fitting algorithm leads to models comparable to those of the previous algorithm [49, 50] but is superior regarding computational speed, especially in combination with cross-validation approaches. Furthermore, simulations showed that this algorithm can be successfully combined with stability selection to select sparser models identifying a smaller subset of truly informative variables from high-dimensional data. The current algorithm is implemented in the R add-on package gamboostLSS [51]; the modified version is currently available on GitHub [52].
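A rough sketch of boosting a Gaussian location-scale model with gamboostLSS is given below; the data are simulated and, depending on the installed version, the non-cyclic algorithm of Thomas et al. [47] may be selectable via an additional method argument, which should be checked against the package documentation.

```r
library(gamboostLSS)

set.seed(5)
n <- 500
x1 <- runif(n); x2 <- runif(n); x3 <- runif(n)
## both the mean and the standard deviation depend on (different) covariates
y <- rnorm(n, mean = 1 + 2 * x1, sd = exp(0.5 + x2))
dat <- data.frame(y, x1, x2, x3)

## boost a Gaussian location-scale model: one additive predictor per parameter
mod <- glmboostLSS(y ~ x1 + x2 + x3, data = dat,
                   families = GaussianLSS(),
                   control = boost_control(mstop = 200))

coef(mod)$mu      # base-learners selected for the mean
coef(mod)$sigma   # base-learners selected for the standard deviation
```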

3.3 Further approaches for sparse models

In order to construct risk prediction signatures on molecular data, such as DNA methylation, Sariyar et al. [53] proposed an adaptive likelihood-based boosting algorithm. The authors included a step-size modification factor which represents an additional tuning parameter, adaptively controlling the size of the updates. In sparse settings, the approach decreases the shrinkage of effect estimates (by using a larger step length), leading to a smaller bias. In settings with larger numbers of informative variables, the approach allows fitting models with a lower degree of sparsity, when necessary, via smaller updates. The modification factor has to be selected together with the stopping iteration m_stop via cross-validation or resampling on a two-dimensional grid.

Zhang et al. [54] argue that variable ranking is in practice more useful than variable selection, as ranking makes it easy to apply a thresholding rule in order to identify a subset of informative variables. The authors implemented a pseudo-boosting approach, which is technically not based on statistical boosting but is adapted to rank and select variables for statistical models. Note that stability selection can also be seen as a scheme that ranks base-learners by their selection frequency, as its selection feature is only triggered by applying the threshold π_thr.

Following a gradient-based approach, Huang et al. [55] adapted the sparse boosting approach by Bühlmann and Yu [56] in order to promote similarity of model sparsity structures in the integrative analysis of multiple data sets, an increasingly important topic given the trend toward big data.

4 Functional regression

Due to technological developments, more and more data is measured continuously over time. Over the last years, a lot of methodological research focused on regression methods for this type of functional data. A groundbreaking work in this new and evolving field of statistics is provided by Ramsay and Silverman [57].

Functional regression models can either contain functional responses (defined on a continuous domain), functional covariates or both. This leads basically to three different classes of functional regression models, i.e., function-on-scalar (response is functional), scalar-on-function (functional explanatory variable) and function-on-function regression. For a recent review on general methodological developments on functional regression, see Morris [58].

4.1 Boosting functional data

The first statistical boosting algorithm for functional regression, allowing for data-driven variable selection, was proposed by Brockhaus et al. [59]. The authors’ approach focused on linear array models [60], providing a unified framework for all three settings outlined above. Since the general structure of their gradient boosting algorithm is similar to the one in Box 1, the resulting models still have the same form as in (1), only that the response and the covariates may be functions. The underlying functional partial effects f_j(x_j, t) can be represented using a tensor product basis,

f_j(x_j, t) = ( b_j(x_j)ᵀ ⊗ b_Y(t)ᵀ ) θ_j,

where θ_j is the vector of coefficients, b_j and b_Y are basis functions (over the covariate and over the domain t of the response), and ⊗ denotes the Kronecker product.

This functional array model is limited in two ways: (i) the functional responses need to be measured on a common grid and (ii) covariates need to be constant over the domain of the response. As particularly the second assumption might often not be fulfilled in practice, Brockhaus et al. [14] soon after proposed a general framework for boosting functional regression models avoiding this assumption and dropping the linear array structure.

This newer framework [14] also comprises all three model classes outlined above and particularly focuses on historical effects, where functional response and functional covariates are observed over the same time interval. The underlying assumption is that observations of the covariate affect the response only up to the corresponding time point,

f_j(x_j, t) = ∫_{T1}^{t} x_j(s) β_j(s, t) ds,   (5)

where [T1, T2] represents the time interval the covariate was observed for. In other words, only the part of the covariate function lying in the past (not the future) can affect the present response. This is a sensible assumption in most practical applications and thus not a strong restriction.

Both approaches for boosting functional regression are implemented in the R add-on package FDboost [61], which relies on the fitting methods and infrastructure of mboost.
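For illustration, a simple scalar-on-function regression with FDboost might be set up as follows; the functional covariate, its observation grid s and all settings are simulated placeholders.

```r
library(FDboost)

set.seed(6)
n <- 100; G <- 50
s <- seq(0, 1, length.out = G)     # grid on which the functional covariate is observed
X1 <- matrix(rnorm(n * G), n, G)   # one observed curve per row
y <- as.vector(X1 %*% sin(2 * pi * s)) / G + rnorm(n, sd = 0.1)

dat <- list(y = y, X1 = X1, s = s)

## scalar-on-function regression: y_i = beta_0 + integral of X1_i(s) beta(s) ds + error
mod <- FDboost(y ~ bsignal(X1, s = s), timeformula = NULL, data = dat,
               control = boost_control(mstop = 200))

## the estimated coefficient function beta(s) can be inspected graphically;
## in practice the stopping iteration would additionally be tuned by resampling
plot(mod)
```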

4.2 Extensions of boosting functional regression

Boosting functional data can be combined with stability selection (see Section 3.1) in order to enhance the variable selection properties of the algorithm [59, 14].

The boosting approach for functional data was already extended towards the model class of generalized additive models for location, scale and shape (GAMLSS) for a scalar-on-function setting by Brockhaus et al. [62]. The functional approach was named signal regression models for location, scale and shape [62]. The estimation via gradient boosting is based on the corresponding gamboostLSS algorithm for boosting GAMLSS [49, 50].

In an approach to analyse the functional relationship between bioelectrical signals like electroencephalography (EEG) and facial electromyography (EMG), Rügamer et al. [63] focused on extending the framework of boosting functional regression by incorporating factor specific historical effects, similar to (5).

Although functional data analysis triggered a lot of methodological research, a recent systematic review by Ullah and Finch [64] revealed that the number of actual biomedical applications of functional data analysis in general and functional regression in particular is rather small. The authors argued that the potential benefits of these flexible models (like richer interpretation and more flexible structures) are not yet well understood by practitioners and that further efforts are necessary to promote the actual usage of these novel techniques.

5 Boosting advanced survival models

While Cox regression is still the dominant model class for boosting time-to-event data (see [33] for a comparison of two different boosting algorithms, and [65] for different general approaches to estimate Cox models in the presence of high-dimensional data), several alternatives have emerged over the last years [45, 46, 66].
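For reference, a boosted Cox proportional hazards model can already be fitted by gradient boosting via the CoxPH() family in mboost, roughly as in the following sketch with simulated placeholder data.

```r
library(mboost)
library(survival)

set.seed(7)
n <- 150; p <- 30
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("x", 1:p)
time <- rexp(n, rate = exp(0.7 * X[, 1] - 0.7 * X[, 2]))
event <- rbinom(n, 1, 0.75)

## component-wise gradient boosting of the Cox partial likelihood
mod_cox <- glmboost(x = X, y = Surv(time, event), family = CoxPH(),
                    control = boost_control(mstop = 300, nu = 0.1))

## early stopping via subsampling, then inspect the sparse linear predictor
cvr <- cvrisk(mod_cox, folds = cv(model.weights(mod_cox), type = "subsampling"))
mod_cox[mstop(cvr)]
coef(mod_cox)
```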

In this section we will particularly focus on boosting joint models of time-to-event outcomes and longitudinal markers but will also briefly refer to other recent extensions.

5.1 Boosting joint models

The concept of joint modelling of longitudinal and time-to-event data has found its way into the statistical literature in recent years, as it gives a very complete answer to questions on continuous data recorded over time and on event times related to these continuous data. Modelling these two processes independently, as was common before the joint modelling idea was suggested [67], leads to misspecified models prone to bias. There are various joint modelling approaches and thus also various different model equations. The models we refer to in this review are of the following type:

y_ij = η_l(x_ij) + η_sh(x_ij) + ε_ij,
λ_i(t) = λ_0(t) exp( η_s(x_i) + α · η_sh(x_ij) ),   (6)

where y_ij is the j-th observation of the i-th individual, with i = 1, ..., n and j = 1, ..., n_i, ε_ij is the error term of the longitudinal outcome, and λ_i(t) is the hazard function for individual i at time point t. Both outcomes, the longitudinal measurement as well as the event time, are modelled based on two sub-predictors each: one that is supposed to have an impact on only one of them (the longitudinal sub-predictor η_l and the survival sub-predictor η_s) and one that is shared by both parts of the model (the shared sub-predictor η_sh, which enters the hazard scaled by an association parameter α). All these sub-predictors are functions of different, possibly time-dependent covariates x. In many cases the shared sub-predictor consists of, or at least includes, some type of random effects. The function λ_0(t) is the baseline hazard. Most approaches for joint models are based on likelihood or Bayesian inference using the joint likelihood, which results as the product of the likelihoods of the longitudinal and the survival part [68, 69]. Those approaches are, however, unable to conduct variable selection and cannot deal with high-dimensional data.

Waldmann et al. [13] suggested a boosting algorithm tackling these challenges. The model used in this paper was a reduced version of (6) in which no survival sub-predictor was considered and a fixed baseline hazard was used. The algorithm is a version of the classical boosting algorithm as presented in Box 1, adapted to the special case of having to estimate a set of different sub-predictors (similar to [49]). The algorithm is therefore composed of three steps which are performed in a circular fashion. In the first step, a regular boosting step to update the longitudinal sub-predictor is performed while the parameters of the shared sub-predictor are treated as fixed. In the second step, the parameters of the longitudinal sub-predictor are fixed and a boosting step for the shared sub-predictor is conducted. The third step is a simple optimization step: based on the current values of the parameters in both sub-predictors, the likelihoods are optimized with respect to the error variance of the longitudinal outcome, the association parameter α and the baseline hazard λ_0 (cf., [70]). The number of iterations now depends on two stopping iterations, one per boosted sub-predictor, which have to be optimized on a two-dimensional grid via cross-validation. Waldmann et al. [13] showed that the benefits of boosting algorithms (automated variable selection and handling of p > n situations) can be transferred to joint modelling and hence laid the groundwork for further extensions of joint modelling approaches.

The code for the approach presented here is available in the R add-on package JMboost [71], currently on GitHub.

5.2 Other new approaches on boosting survival data

Reulen and Kneib [72] extended the framework of statistical boosting towards multi-state models for patients exposed to competing risks (e.g., adverse events, recovery, death or relapse). The approach is implemented in the gamboostMSM package [73], relying on the infrastructure of mboost. Möst and Hothorn [74] focused on boosting patient-specific survivor functions based on conditional transformation models [75], incorporating inverse probability of censoring weights [76].

When statistical boosting algorithms are used to estimate survival models, the motivation is most often the presence of high-dimensional data. De Bin et al. [77] investigated several approaches (including gradient boosting and likelihood-based boosting) to incorporate both clinical and high-dimensional omics data to build prediction models.

Guo et al. [78] proposed a new adaptive likelihood-based boosting algorithm to fit Cox models, incorporating a direct lasso-type penalization in the fitting process in order to avoid the inclusion of variables with small effect. This general motivation is similar to the one of the boosting algorithm with step-length modification factor proposed by Sariyar et al. [53]. In another approach, Sariyar et al. [79] combined a likelihood-based boosting approach for the Cox model with random forest in order to screen for interaction effects in high-dimensional data. Hieke et al. [80] combined likelihood-based boosting with resampling in an approach to identify prognostic SNPs in potentially small clinical cohorts.

6 New frontiers and applications

Even more new topics have been incorporated into the framework of statistical boosting, but not all of them can be presented in detail here. However, we want to give a short overview of the most relevant developments; notably, many of them were motivated by biomedical applications.

Weinhold et al. [81] proposed to analyse DNA methylation data (methylated and unmethylated signal intensities) via a “ratio of correlated gammas” model. Based on a bivariate gamma distribution for the two signal intensities, the authors derived the density for their ratio (the beta value) and optimized it via gradient boosting.

A boosting algorithm for differential item functioning in Rasch models was developed by Schauberger and Tutz [82] for the broader area of psychometrics, while Casalicchio et al. [83] focused on boosting subject-specific Bradley-Terry-Luce models.

Napolitano et al. [84] developed a sampled boosting algorithm for the analysis of brain perfusion images: gradient boosting is carried out multiple times on different training sets. Each base-learner refers to a voxel, and after every sampling iteration a fixed fraction of the selected voxels is randomly left out from the following boosting fit, in order to force the algorithm to select new voxels. The final model is then computed as the global sum of all solutions. Feilke et al. [85] proposed a voxelwise boosting approach for the analysis of dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) data, which was additionally enhanced to account for the regional structure of the voxels via a spatial penalty.

Pybus et al. [86] proposed a hierarchical boosting algorithm for classification in an approach to detect positive selection in genomic regions (cf., [87]). Truntzer et al. [88] compared the classification performance of gradient boosting with other methods combining clinical variables and high-dimensional mass spectrometry data and concluded that the variable selection properties of boosting led also to a very good performance regarding prediction accuracy.

Regarding boosting location and scale models (modelling both the expected value and the variance in the spirit of GAMLSS [48]), Messner et al. [89] proposed a boosting algorithm for predictor selection in ensemble postprocessing to better calibrate ensemble weather forecasts. The idea of ensemble forecasting is to account for model errors and to quantify forecast uncertainty. Mayr et al. [90] used boosted location and scale models in combination with permutation tests to simultaneously assess systematic bias and random measurement errors of medical devices. The use of a permutation test tackles one of the remaining problems of statistical boosting approaches in practical biomedical research: the lack of standard errors for effect estimates makes it necessary to incorporate resampling procedures to construct confidence intervals or to assess the significance of effects.

The methodological development in [90], analogously to many of the extensions presented in this article, was motivated by the applied analysis of biomedical data. Statistical boosting algorithms, however, have also been applied over the last few years in various biomedical applications without the need for methodological extensions that could be described here. Most applications focus on prediction modelling or variable selection. We want to briefly mention a selection of the most recent ones from the last two years: the different research questions comprise the development of birth weight prediction formulas for particularly small babies [91], the prediction of smoking cessation and its relapse in HIV-infected patients [92], the modelling of Escherichia coli fed-batch fermentation [93], the prediction of cardiovascular death in older patients in the emergency department [94] and the identification of factors influencing therapeutic decisions regarding rheumatoid arthritis [95].

7 Discussion

After Friedman et al. [9] discovered the link between boosting and additive modelling in their seminal paper, most research on boosting methods has been focused on the development of methodology within the univariate GAM framework. This line of research included, among many other achievements, the estimation of smooth predictor effects via spline base-learners [30] and the extension of boosting to other GAM families than binary classification and Gaussian regression [2]. We have summarized these methods and described their relationships in an earlier review [1].

In this article, we have highlighted several new research areas in the field of statistical boosting leaving the traditional GAM modeling approach. A particularly active research area during the last few years addresses the development of boosting algorithms for new model classes extending the GAM framework. These include, among others, the simultaneous modelling of location, scale and shape parameters within the GAMLSS framework [49], the modelling of functional data [59], and, recently, the class of joint models for longitudinal and survival data [13]. It goes without saying that these developments will make boosting algorithms available for practical use in much more sophisticated clinical and epidemiological applications than before.

Another line of research, which we described in detail in Sections 2 and 3, aims at exploring the connections between statistical boosting methods and machine learning techniques that were originally developed independently of boosting. An important example is stability selection, a generic methodology that, at the time of its development, mainly focussed on penalized regression models such as the lasso. Only in recent years has stability selection been adapted to become a tool for variable selection within the boosting framework (e.g., [47]). Other work in this context is the analysis of the connections between boosting and penalized regression [10] and the work by Sariyar et al. [79] exploring a combination of boosting and random forest methods.

Finally, as already noted by Hothorn [25], boosting may not solely be regarded as a framework for regularized model fitting but also as a generic optimization tool in its own right. In particular, boosting constitutes a robust algorithm for the optimization of objective functions that, due to their structure or complexity, may pose problems for Newton-Raphson-type and related methods. This was, for example, the motivation for the use of boosting in the articles by Hothorn et al. [75] and Weinhold et al. [81].

Regarding future research, a huge challenge for the use of boosting algorithms in biomedical applications arises from the era of big data. In contrast to other machine learning methods such as random forests, boosting is inherently sequential, which hampers the use of parallelization techniques within the algorithm and may result in issues with the fitting and tuning of complex models with multidimensional predictors and/or sophisticated base-learners like splines or larger trees. To overcome these problems in classification and univariate regression, Chen and Guestrin [96] developed the extremely fast and sophisticated xgboost environment. However, for the more recent extensions discussed in this paper, big data solutions for statistical boosting still have to be developed.

Conflict of interests

The authors declare that there is no conflict of interest regarding the publication of this paper.

Acknowledgements

The authors thank Corinna Buchstaller for her help with the literature search. The work on this article was supported by the Deutsche Forschungsgemeinschaft (DFG) (www.dfg.de), grant no. SCHM 2966/1-2 (grant to MS and OG) and the Interdisciplinary Center for Clinical Research (IZKF) of the Friedrich-Alexander-University Erlangen-Nürnberg via the Projects J49 (grant to AM) and J61 (grant to EW).

References

  • [1] Mayr A, Binder H, Gefeller O, Schmid M. The Evolution of Boosting Algorithms. Methods of Information in Medicine. 2014;53(6):419–427.
  • [2] Bühlmann P, Hothorn T. Boosting Algorithms: Regularization, Prediction and Model Fitting (with Discussion). Statistical Science. 2007;22:477–522.
  • [3] Tutz G, Binder H. Generalized Additive Modeling with Implicit Variable Selection by Likelihood-based Boosting. Biometrics. 2006;62:961–971.
  • [4] Hofner B, Hothorn T, Kneib T, Schmid M. A Framework for Unbiased Model Selection Based on Boosting. Journal of Computational and Graphical Statistics. 2011;20:956–971.
  • [5] Kneib T, Hothorn T, Tutz G. Variable Selection and Model Choice in Geoadditive Regression Models. Biometrics. 2009;65(2):626–634.
  • [6] Breiman L. Statistical modeling: The Two Cultures (with Comments and a Rejoinder by the Author). Statistical Science. 2001;16(3):199–231.
  • [7] Freund Y. Boosting a Weak Learning Algorithm by Majority. In: Fulk MA, Case J, editors. Proceedings of the Third Annual Workshop on Computational Learning Theory, COLT 1990, University of Rochester, Rochester, NY, USA, August 6-8, 1990; 1990. p. 202–216.

  • [8] Friedman JH. Greedy Function Approximation: A Gradient Boosting Machine. The Annals of Statistics. 2001;29:1189–1232.
  • [9] Friedman JH, Hastie T, Tibshirani R. Additive Logistic Regression: A Statistical View of Boosting (with Discussion). The Annals of Statistics. 2000;28:337–407.
  • [10] Hepp T, Schmid M, Gefeller O, Waldmann E, Mayr A. Approaches to Regularized Regression–A Comparison between Gradient Boosting and the Lasso. Methods of Information in Medicine. 2016;55(5):422–430.
  • [11] Mayr A, Binder H, Gefeller O, Schmid M. Extending statistical boosting. Methods of Information in Medicine. 2014;53(6):428–435.
  • [12] Hofner B, Boccuto L, Göker M. Controlling false discoveries in high-dimensional situations: Boosting with stability selection. BMC Bioinformatics. 2015;16:144.
  • [13] Waldmann E, Taylor-Robinson D, Klein N, Kneib T, Pressler T, Schmid M, et al. Boosting Joint Models for Longitudinal and Time-to-Event Data. arXiv preprint arXiv:160902686. 2016;.
  • [14] Brockhaus S, Melcher M, Leisch F, Greven S. Boosting flexible functional regression models with a high number of functional historical effects. Statistics and Computing. 2016;p. 1–14.
  • [15] Schapire RE. The Strength of Weak Learnability. Machine Learning. 1990;5(2):197–227.
  • [16] Schapire RE, Freund Y. Boosting: Foundations and Algorithms. MIT Press; 2012.
  • [17] Freund Y, Schapire R. Experiments With a New Boosting Algorithm. In: Proceedings of the Thirteenth International Conference on Machine Learning Theory. San Francisco, CA: San Francisco: Morgan Kaufmann Publishers Inc.; 1996. p. 148–156.
  • [18] Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. New York: Springer; 2009.
  • [19] Wyner AJ, Olson M, Bleich J, Mease D. Explaining the success of AdaBoost and random forests as interpolating classifiers. arXiv preprint arXiv:150407676. 2015;.
  • [20] Strobl C, Boulesteix AL, Zeileis A, Hothorn T. Bias in random forest variable importance measures: Illustrations, sources and a solution. BMC Bioinformatics. 2007;8(1):25.
  • [21] Hapfelmeier A, Hothorn T, Ulm K, Strobl C. A new variable importance measure for random forests with missing data. Statistics and Computing. 2014;24(1):21–34.
  • [22] Hastie T, Tibshirani R. Generalized Additive Models. London: Chapman & Hall; 1990.
  • [23] Mason L, Baxter J, Bartlett PL, Frean MR. Boosting Algorithms as Gradient Descent. In: NIPS; 1999. p. 512–518.
  • [24] Mayr A, Hofner B, Schmid M. The Importance of Knowing When to Stop – A Sequential Stopping Rule for Component-Wise Gradient Boosting. Methods of Information in Medicine. 2012;51(2):178–186.
  • [25] Hothorn T. Boosting–an unusual yet attractive optimiser. Methods of Information in Medicine. 2014;53(6):417–418.
  • [26] Hothorn T, Bühlmann P, Kneib T, Schmid M, Hofner B. mboost: Model-Based Boosting; 2016. R package version 2.7-0. Available from: https://CRAN.R-project.org/package=mboost.
  • [27] R Development Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria; 2016. ISBN 3-900051-07-0. Available from: https://www.R-project.org.
  • [28] Hofner B, Mayr A, Robinzonov N, Schmid M. Model-Based Boosting in R: A Hands-on Tutorial Using the R Package mboost. Computational Statistics. 2014;29:3–35.
  • [29] Tutz G, Binder H. Boosting Ridge Regression. Computational Statistics & Data Analysis. 2007;51(12):6044–6059.
  • [30] Bühlmann P, Yu B. Boosting with the L2 Loss: Regression and Classification. Journal of the American Statistical Association. 2003;98:324–338.
  • [31] Binder H. GAMBoost: Generalized Linear and Additive Models by Likelihood Based Boosting.; 2011. R package version 1.2-2. Available from: https://CRAN.R-project.org/package=GAMBoost.
  • [32] Binder H. CoxBoost: Cox Models by Likelihood-based Boosting for a Single Survival Endpoint or Competing Risks; 2013. R package version 1.4. Available from: https://CRAN.R-project.org/package=CoxBoost.
  • [33] De Bin R. Boosting in Cox regression: a comparison between the likelihood-based and the model-based approaches with focus on the R-packages CoxBoost and mboost. Computational Statistics. 2016;31(2):513–531.
  • [34] Tibshirani R. Regression Shrinkage and Selection via the Lasso. Journal of the Royal Statistical Society - Series B. 1996;58(1):267–288.
  • [35] Efron B, Hastie T, Johnstone L, Tibshirani R. Least Angle Regression. Annals of Statistics. 2004;32:407–499.
  • [36] Meinshausen N, Rocha G, Yu B. Discussion: a tale of three cousins: Lasso, l2boosting and Dantzig. The Annals of Statistics. 2007 12;35(6):2373–2384.
  • [37] Duan J, Soussen C, Brie D, Idier J, Wang YP. On lars/homotopy equivalence conditions for over-determined lasso. Signal Processing Letters, IEEE. 2012 Dec;19(12):894–897.
  • [38] Bühlmann P, Gertheiss J, Hieke S, Kneib T, Ma S, Schumacher M, et al. Discussion of ’The evolution of boosting algorithms’ and ’Extending statistical boosting’. Methods of Information in Medicine. 2014;53(6):436–445.
  • [39] Hastie T, Taylor J, Tibshirani R, Walther G. Forward stagewise regression and the monotone lasso. Electronic Journal of Statistics. 2007;1:1–29.
  • [40] Janitza S, Binder H, Boulesteix AL. Pitfalls of hypothesis tests and model selection on bootstrap samples: causes and consequences in biometrical applications. Biometrical Journal. 2015;58(3):447–473.
  • [41] Meinshausen N, Bühlmann P. Stability Selection (with Discussion). Journal of the Royal Statistical Society Series B. 2010;72:417–473.
  • [42] Shah RD, Samworth RJ. Variable Selection with Error Control: Another Look at Stability Selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2013;75(1):55–80.
  • [43] Hofner B, Hothorn T. stabs: Stability Selection with Error Control; 2017. R package version 0.6-2. Available from: https://CRAN.R-project.org/package=stabs.
  • [44] Mayr A, Hofner B, Schmid M. Boosting the discriminatory power of sparse survival models via optimization of the concordance index and stability selection. BMC Bioinformatics. 2016;17(1):288.
  • [45] Mayr A, Schmid M. Boosting the Concordance Index for Survival Data – A Unified Framework to Derive and Evaluate Biomarker Combinations. PloS ONE. 2014;9(1):e84483.
  • [46] Chen Y, Jia Z, Mercola D, Xie X. A Gradient Boosting Algorithm for Survival Analysis via Direct Optimization of Concordance Index. Computational and Mathematical Methods in Medicine. 2013;p. 1–8. ID 873595.
  • [47] Thomas J, Mayr A, Bischl B, Schmid M, Smith A, Hofner B. Stability selection for component-wise gradient boosting in multiple dimensions. arXiv preprint arXiv:161110171. 2016;.
  • [48] Rigby RA, Stasinopoulos DM. Generalized Additive Models for Location, Scale and Shape (with discussion). Applied Statistics. 2005;54:507–554.
  • [49] Mayr A, Fenske N, Hofner B, Kneib T, Schmid M. Generalized Additive Models for Location, Scale and Shape for High-Dimensional Data – A Flexible Aproach Based on Boosting. Journal of the Royal Statistical Society: Series C (Applied Statistics). 2012;61(3):403–427.
  • [50] Hofner B, Mayr A, Schmid M. gamboostLSS: An R Package for Model Building and Variable Selection in the GAMLSS Framework. Journal of Statistical Software. 2016;74(1):1–31.
  • [51] Hofner B, Mayr A, Fenske N, Schmid M. gamboostLSS: Boosting Methods for GAMLSS Models; 2016. R package version 1.2-2. Available from: https://CRAN.R-project.org/package=gamboostLSS.
  • [52] Hofner B, Mayr A, Fenske N, Thomas J, Schmid M. gamboostLSS: Boosting Methods for GAMLSS Models; 2017. R package version 2.0-0. Available from: https://github.com/boost-R/gamboostLSS/tree/devel.
  • [53] Sariyar M, Schumacher M, Binder H. A boosting approach for adapting the sparsity of risk prediction signatures based on different molecular levels. Statistical Applications in Genetics and Molecular Biology. 2014;13(3):343–357.
  • [54] Zhang CX, Zhang JS, Kim SW. PBoostGA: pseudo-boosting genetic algorithm for variable ranking and selection. Computational Statistics. 2016;31(4):1237–1262.
  • [55] Huang Y, Liu J, Yi H, Shia BC, Ma S. Promoting similarity of model sparsity structures in integrative analysis of cancer genetic data. Statistics in Medicine. 2017;36(3):509–559.
  • [56] Bühlmann P, Yu B. Sparse boosting. Journal of Machine Learning Research. 2006;7(Jun):1001–1024.
  • [57] James GM, Silverman BW. Functional adaptive model estimation. Journal of the American Statistical Association. 2005;100(470):565–576.
  • [58] Morris JS. Functional regression. Annual Review of Statistics and Its Application. 2015;2:321–359.
  • [59] Brockhaus S, Scheipl F, Hothorn T, Greven S. The functional linear array model. Statistical Modelling. 2015;15(3):279–300.
  • [60] Currie I, Durban M, Eilers P. Generalized linear array models with applications to multidimensional smoothing. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2006;68(2):259–280.
  • [61] Brockhaus S, Rügamer D. FDboost: Boosting Functional Regression Models; 2016. R package version 0.2-0. Available from: https://CRAN.R-project.org/package=FDboost.
  • [62] Brockhaus S, Fuest A, Mayr A, Greven S. Signal regression models for location, scale and shape with an application to stock returns. arXiv preprint arXiv:160504281. 2016;.
  • [63] Rügamer D, Brockhaus S, Gentsch K, Scherer K, Greven S. Boosting Factor-Specific Functional Historical Models for the Detection of Synchronisation in Bioelectrical Signals. arXiv preprint arXiv:160906070. 2016;.
  • [64] Ullah S, Finch CF. Applications of functional data analysis: A systematic review. BMC Medical Research Methodology. 2013;13(1):43.
  • [65] Zemmour C, Bertucci F, Finetti P, Chetrit B, Birnbaum D, Filleron T, et al. Prediction of early breast cancer metastasis from DNA microarray data using high-dimensional cox regression models. Cancer Informatics. 2015;14(Suppl 2):129.
  • [66] Schmid M, Hothorn T. Flexible Boosting of Accelerated Failure Time Models. BMC Bioinformatics. 2008;9(269).
  • [67] Wulfsohn MS, Tsiatis AA. A Joint Model for Survival and Longitudinal Data Measured with Error. Biometrics. 1997;53:330–339.
  • [68] Faucett CL, Thomas DC. Simultaneously Modelling Censored Survival Data and Repeatedly Measured Covariates: a Gibbs Sampling Approach. Statistics in Medicine. 1996;15:1663–1685.
  • [69] Rizopoulos D. JM: An R Package for the Joint Modelling of Longitudinal and Time-to-Event Data. Journal of Statistical Software. 2010;35(9):1–33.
  • [70] Schmid M, Potapov S, Pfahlberg A, Hothorn T. Estimation and Regularization Techniques for Regression Models with Multidimensional Prediction Functions. Statistics and Computing. 2010;20:139–150.
  • [71] Waldmann E, Mayr A. JMboost: Boosting Joint Models for Longitudinal and Time-to-Event Outcomes; 2016. R package version 0.1-0. Available from: https://github.com/mayrandy/JMboost.
  • [72] Reulen H, Kneib T. Boosting multi-state models. Lifetime data analysis. 2016;22(2):241–262.
  • [73] Reulen H. gamboostMSM: Estimating multistate models using gamboost(); 2014. R package version 1.1.87. Available from: https://CRAN.R-project.org/package=gamboostMSM.
  • [74] Möst L, Hothorn T. Conditional transformation models for survivor function estimation. The International Journal of Biostatistics. 2015;11(1):23–50.
  • [75] Hothorn T, Kneib T, Bühlmann P. Conditional Transformation Models. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2014;76(1):3–27.
  • [76] Van der Laan MJ, Robins JM. Unified methods for censored longitudinal data and causality. Springer Science & Business Media; 2003.
  • [77] De Bin R, Sauerbrei W, Boulesteix AL. Investigating the prediction ability of survival models based on both clinical and omics data: two case studies. Statistics in Medicine. 2014;33(30):5310–5329.
  • [78] Guo Z, Lu W, Li L. Forward Stagewise Shrinkage and Addition for High Dimensional Censored Regression. Statistics in Biosciences. 2015;7(2):225–244.
  • [79] Sariyar M, Hoffmann I, Binder H. Combining techniques for screening and evaluating interaction terms on high-dimensional time-to-event data. BMC Bioinformatics. 2014;15(1):58.
  • [80] Hieke S, Benner A, Schlenk RF, Schumacher M, Bullinger L, Binder H. Identifying prognostic SNPs in clinical cohorts: Complementing univariate analyses by resampling and multivariable modeling. PloS One. 2016;11(5):e0155226.
  • [81] Weinhold L, Wahl S, Pechlivanis S, Hoffmann P, Schmid M. A statistical model for the analysis of beta values in DNA methylation studies. BMC Bioinformatics. 2016;17(1):480.
  • [82] Schauberger G, Tutz G. Detection of differential item functioning in Rasch models by boosting techniques. British Journal of Mathematical and Statistical Psychology. 2016;69(1):80–103.
  • [83] Casalicchio G, Tutz G, Schauberger G. Subject-specific Bradley–Terry–Luce models with implicit variable selection. Statistical Modelling. 2015;15(6):526–547.
  • [84] Napolitano G, Stingl JC, Schmid M, Viviani R. Predicting CYP2D6 phenotype from resting brain perfusion images by gradient boosting. Psychiatry Research: Neuroimaging. 2017;259:16–24.
  • [85] Feilke M, Bischl B, Schmid VJ, Gertheiss J. Boosting in Nonlinear Regression Models with an Application to DCE-MRI Data. Methods of Information in Medicine. 2016;55(1):31–41.
  • [86] Pybus M, Luisi P, Dall’Olio GM, Uzkudun M, Laayouni H, Bertranpetit J, et al. Hierarchical boosting: a machine-learning framework to detect and classify hard selective sweeps in human populations. Bioinformatics. 2015;31(24):3946–3952.
  • [87] Lin K, Li H, Schlötterer C, Futschik A. Distinguishing positive selection from neutral evolution: boosting the performance of summary statistics. Genetics. 2011;187(1):229–244.
  • [88] Truntzer C, Mostacci E, Jeannin A, Petit JM, Ducoroy P, Cardot H. Comparison of classification methods that combine clinical data and high-dimensional mass spectrometry data. BMC Bioinformatics. 2014;15(1):385.
  • [89] Messner JW, Mayr GJ, Zeileis A. Nonhomogeneous Boosting for Predictor Selection in Ensemble Postprocessing. Monthly Weather Review. 2017;145(1):137–147.
  • [90] Mayr A, Schmid M, Pfahlberg A, Uter W, Gefeller O. A permutation test to analyse systematic bias and random measurement errors of medical devices via boosting location and scale models. Statistical Methods in Medical Research. 2015;doi:10.1177/0962280215581855.
  • [91] Faschingbauer F, Dammer U, Raabe E, Kehl S, Schmid M, Schild RL, et al. A New Sonographic Weight Estimation Formula for Small-for-Gestational-Age Fetuses. Journal of Ultrasound in Medicine. 2016;35(8):1713–1724.
  • [92] Schäfer J, Young J, Bernasconi E, Ledergerber B, Nicca D, Calmy A, et al. Predicting smoking cessation and its relapse in HIV-infected patients: the Swiss HIV Cohort Study. HIV Medicine. 2015;16(1):3–14.
  • [93] Melcher M, Scharl T, Luchner M, Striedner G, Leisch F. Boosted structured additive regression for Escherichia coli fed-batch fermentation modeling. Biotechnology and Bioengineering. 2017;114(2):321–334.
  • [94] Bahrmann P, Christ M, Hofner B, Bahrmann A, Achenbach S, Sieber CC, et al. Prognostic value of different biomarkers for cardiovascular death in unselected older patients in the emergency department. European Heart Journal: Acute Cardiovascular Care. 2015;doi:10.1177/2048872615612455.
  • [95] Pattloch D, Richter A, Manger B, Dockhorn R, Meier L, Tony HP, et al. Das erste Biologikum bei rheumatoider Arthritis: Einflussfaktoren auf die Therapieentscheidung. Zeitschrift für Rheumatologie. 2016;p. 1–7. doi:10.1007/s00393-016-0174-3.
  • [96] Chen T, Guestrin C. Xgboost: A scalable tree boosting system. In: Proceedings of the 22Nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM; 2016. p. 785–794.
  • [97] Hofner B, Müller J, Hothorn T. Monotonicity-Constrained Species Distribution Models. Ecology. 2011;92:1895–1901.
  • [98] Hofner B, Kneib T, Hothorn T. A Unified Framework of Constrained Regression. Statistics and Computing. 2014;26:1–14.
  • [99] Hofner B, Smith A. Boosted negative binomial hurdle models for spatiotemporal abundance of sea birds. Proceedings of the 30th International Workshop on Statistical Modelling. 2015;p. 221–226.

Appendix A Developments regarding the mboost package

This appendix describes important changes during the last years that were implemented in the R package mboost after the tutorial paper [28] on its use was published.

Starting from mboost 2.2, the default for the degrees of freedom was changed; they are now defined as df = trace(2S − SᵀS), with smoother matrix S. Analyses have shown that this leads to a reduced selection bias, see [4]. Earlier versions used the trace of the smoother matrix as degrees of freedom, i.e., df = trace(S). One can change to the old definition by setting options(mboost_dftraceS = TRUE). For parallel computations of cross-validated stopping values, mboost now uses the package parallel, which is included in the standard R installation. The behavior of bols(x, intercept = FALSE) was changed when x is a factor: the intercept is simply dropped from the design matrix and the coding can be specified as usual for factors. Additionally, a new contrast was introduced: "contr.dummy" (see the manual of bols for details). Finally, the computation of the B-spline basis at the boundaries was changed such that equidistant boundary knots are used per default.

With mboost 2.3, constrained effects [97, 98] are fitted per default using quadratic programming methods (option type = "quad.prog"), improving the speed of computation drastically. In addition to monotonic, convex and concave effects, new constraints were introduced to fit "positive" or "negative" effects or effects with boundary constraints (see bmono for details). Additionally, a new function to directly assign the number of boosting iterations to a model object was added (mstop(mod) <- i), as well as two new distribution families, Hurdle [99] and Multinomial [70]. Finally, a new option was implemented to allow for stopping based on out-of-bag data during fitting (via boost_control(..., stopintern = TRUE)).

With mboost 2.4, bootstrap confidence intervals were implemented in the novel confint function [98]. The generic stability selection procedure was moved to the dedicated package stabs [43], while mboost provides a specific stabsel method for boosting models.

From mboost 2.5 onward, cross-validation does not stop on errors in single folds anymore and was sped up by setting mc.preschedule = FALSE when parallel computations via mclapply are used. Documentation for the function plot.mboost was added, which allows one to visualize model results. Values outside the boundary knots are now forbidden during fitting, while linear extrapolation is used for prediction.

With mboost 2.6, a lot of bug fixes and small improvements were provided. Most notably, the development of the package is now hosted entirely on GitHub in the collaborative project boost-R/mboost, and the package maintainer changed.

The current CRAN version mboost 2.7 provides a new family Cindex [45], variable importance measures (varimp) and improved plotting facilities.
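As a small, self-contained illustration of these additions, a concordance-index boosting model and its variable importances could be obtained roughly as follows; the simulated data are placeholders and the defaults of the Cindex family should be checked against the current mboost documentation.

```r
library(mboost)
library(survival)

set.seed(8)
n <- 150; p <- 10
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("x", 1:p)
time <- rexp(n, rate = exp(0.6 * X[, 1]))
event <- rbinom(n, 1, 0.7)

## gradient boosting that directly optimizes the concordance index [45]
cmod <- glmboost(x = X, y = Surv(time, event), family = Cindex(),
                 control = boost_control(mstop = 100))

## variable importance of the individual base-learners
vi <- varimp(cmod)
plot(vi)
```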

Changes in the current development version, which will be deployed to CRAN with the next release of mboost, include major changes to the distribution families, allowing the specification of link functions. The Binomial family will additionally provide an alternative implementation of binomial regression models along the lines of the classic glm implementation, which can be used via Binomial(type = "glm"). This family also works with a two-column matrix containing the number of successes and number of failures. Furthermore, models with zero steps (i.e., models containing only the offset) will be supported, as well as cross-validated models without base-learners.