PaloBoost: An Overfitting-robust TreeBoost with Out-of-Bag Sample Regularization Techniques

07/22/2018
by Yubin Park, et al.

Stochastic Gradient TreeBoost is often found in winning solutions to public data science challenges. Unfortunately, its best performance requires extensive parameter tuning, and the model can be prone to overfitting. We propose PaloBoost, a Stochastic Gradient TreeBoost model that uses novel regularization techniques to guard against overfitting and is robust to parameter settings. PaloBoost uses the under-utilized out-of-bag samples to perform gradient-aware pruning and to estimate adaptive learning rates. Unlike other Stochastic Gradient TreeBoost models, which use the out-of-bag samples only to estimate test errors, PaloBoost treats them as a second batch of training samples for pruning the trees and adjusting the learning rates. As a result, PaloBoost can dynamically adjust tree depths and learning rates, achieving faster learning at the start and slower learning as the algorithm converges. We illustrate how these regularization techniques can be efficiently implemented and propose a new formula for calculating feature importance that reflects node coverage and learning rates. Extensive experimental results on seven datasets demonstrate that PaloBoost is robust to overfitting, is less sensitive to parameter settings, and can also effectively identify meaningful features.
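To make the out-of-bag idea concrete, here is a minimal, hypothetical sketch of one boosting iteration in the spirit the abstract describes: a tree is fit on the in-bag residuals, and the held-out (out-of-bag) samples are then used to estimate an adaptive learning rate via a simple least-squares line search. This is an illustrative simplification, not the authors' actual algorithm; the function name, depth limit, and the clipped line-search rule are assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def oob_adaptive_boost_step(X, y, pred, subsample=0.7, max_nu=1.0, rng=None):
    """One illustrative boosting step (squared loss).

    Fits a shallow tree to the in-bag residuals, then uses the
    out-of-bag samples to pick a learning rate nu that minimizes the
    out-of-bag squared error (a simplified stand-in for PaloBoost's
    OOB-based rate adaptation). Returns the updated predictions and nu.
    """
    rng = rng or np.random.default_rng(0)
    n = len(y)
    in_bag = rng.random(n) < subsample          # stochastic subsampling
    oob = ~in_bag                                # held-out second batch
    resid = y - pred                             # negative gradient for L2 loss

    tree = DecisionTreeRegressor(max_depth=3, random_state=0)
    tree.fit(X[in_bag], resid[in_bag])
    h = tree.predict(X)

    # Line search on OOB samples: nu minimizing ||resid_oob - nu * h_oob||^2.
    denom = float(np.dot(h[oob], h[oob]))
    nu = float(np.dot(resid[oob], h[oob])) / denom if denom > 0 else 0.0
    nu = float(np.clip(nu, 0.0, max_nu))         # only shrink, never expand

    return pred + nu * h, nu
```

Because nu is estimated on samples the tree never saw, steps that overfit the in-bag batch receive a small (or zero) learning rate, which mimics the fast-then-slow learning behavior the abstract attributes to PaloBoost.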


Related research

06/06/2012: No More Pesky Learning Rates
The performance of stochastic gradient descent (SGD) depends critically ...

02/08/2016: A Variational Analysis of Stochastic Gradient Algorithms
Stochastic Gradient Descent (SGD) is an important algorithm in machine l...

11/13/2021: Bolstering Stochastic Gradient Descent with Model Building
Stochastic gradient descent method and its variants constitute the core ...

06/04/2020: Robust Sampling in Deep Learning
Deep learning requires regularization mechanisms to reduce overfitting a...

04/30/2014: Learning with incremental iterative regularization
Within a statistical learning setting, we propose and study an iterative...

07/26/2019: Scalable Semi-Supervised SVM via Triply Stochastic Gradients
Semi-supervised learning (SSL) plays an increasingly important role in t...

07/31/2018: Using Feature Grouping as a Stochastic Regularizer for High-Dimensional Noisy Data
The use of complex models --with many parameters-- is challenging with h...
