Uncertainty in Gradient Boosting via Ensembles

06/18/2020
by Aleksei Ustimenko, et al.

Gradient boosting is a powerful machine learning technique that is particularly successful for tasks containing heterogeneous features and noisy data. While gradient boosting classification models return a distribution over class labels, regression models typically yield only point predictions. However, for many practical, high-risk applications, it is also important to quantify the uncertainty in predictions to avoid costly mistakes. In this work, we examine a probabilistic ensemble-based framework for deriving uncertainty estimates in the predictions of gradient boosting classification and regression models. Crucially, the proposed approach allows the total uncertainty to be decomposed into data uncertainty, which comes from the complexity and noise in the data distribution, and knowledge uncertainty, which comes from the lack of information about a given region of the feature space. Two approaches for generating ensembles are considered: Stochastic Gradient Boosting (SGB) and Stochastic Gradient Langevin Boosting (SGLB). Notably, SGLB also enables the generation of a virtual ensemble from a single gradient boosting model, which significantly reduces complexity. Experiments on a range of regression and classification datasets show that ensembles of gradient boosting models yield improved predictive performance, and that measures of uncertainty successfully enable detection of out-of-domain inputs.
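The decomposition of total uncertainty into data and knowledge uncertainty can be illustrated with a minimal sketch for the classification case. The abstract does not give formulas, so the code below assumes the entropy-based decomposition commonly used with ensembles: total uncertainty is the entropy of the averaged prediction, data uncertainty is the average entropy of the individual members, and knowledge uncertainty is their difference. The function name `decompose_uncertainty` and the toy probabilities are hypothetical and are not taken from the paper.

```python
import numpy as np


def decompose_uncertainty(probs, eps=1e-12):
    """Split ensemble uncertainty into total, data, and knowledge parts.

    probs has shape (n_models, n_samples, n_classes) and holds the class
    probabilities predicted by each ensemble member.
    Assumption: the entropy-based decomposition, not necessarily the exact
    measures used in the paper.
    """
    probs = np.asarray(probs, dtype=float)
    mean_probs = probs.mean(axis=0)

    # Total uncertainty: entropy of the ensemble-averaged prediction.
    total = -np.sum(mean_probs * np.log(mean_probs + eps), axis=-1)

    # Data uncertainty: average entropy of the individual members.
    member_entropy = -np.sum(probs * np.log(probs + eps), axis=-1)
    data = member_entropy.mean(axis=0)

    # Knowledge uncertainty: the part explained by member disagreement.
    knowledge = total - data
    return total, data, knowledge


# Toy ensemble of two models and two samples (made-up numbers).
# Sample 0: both models agree that the label is a coin flip (noisy data).
# Sample 1: the models are confident but contradict each other.
probs = np.array([
    [[0.5, 0.5], [0.99, 0.01]],  # model 1
    [[0.5, 0.5], [0.01, 0.99]],  # model 2
])
total, data, knowledge = decompose_uncertainty(probs)
```

In this toy example both samples have the same total uncertainty, because the averaged prediction is uniform in both cases. Only the second sample has high knowledge uncertainty, since that is where the ensemble members disagree, which is the kind of signal the abstract describes as useful for detecting out-of-domain inputs.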


