Asynchronous Parallel Sampling Gradient Boosting Decision Tree

04/12/2018
by Cheng Daning, et al.

With the development of big data technology, the Gradient Boosting Decision Tree (GBDT) has become one of the most important machine learning algorithms because of its accurate output. However, training GBDT requires substantial computational resources and time. To accelerate GBDT training, this paper proposes the asynchronous parallel sampling gradient boosting decision tree (asynch-SGBDT). By introducing sampling, we recast the numerical optimization process of traditional GBDT training as a stochastic optimization process and use asynchronous parallel stochastic gradient descent to accelerate it. We also provide a theoretical analysis of asynch-SGBDT. Experimental results show that asynch-SGBDT accelerates GBDT training and that our asynchronous parallel strategy achieves an almost linear speedup, especially on high-dimensional sparse datasets.
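The core idea in the abstract, fitting each tree on a random sample and tolerating out-of-date model state, can be illustrated with a small sketch. This is not the authors' implementation. It uses one-dimensional regression stumps and squared loss, and it only simulates asynchrony sequentially: each new tree is fitted against the model as it stood `delay` steps earlier. All function names and parameters (`fit_stump`, `train_sampled_gbdt`, `sample_rate`, `delay`, and so on) are hypothetical and chosen for illustration.

```python
import random


def fit_stump(xs, residuals):
    """Fit a one-split regression stump to (x, residual) pairs by least squares."""
    pairs = sorted(zip(xs, residuals))
    n = len(pairs)
    total = sum(r for _, r in pairs)
    # (score, threshold, left_value, right_value); threshold None means "no split"
    best = (float("inf"), None, total / n, total / n)
    left_sum = 0.0
    for i in range(1, n):
        left_sum += pairs[i - 1][1]
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between equal feature values
        right_sum = total - left_sum
        # Minimising this score is equivalent to minimising squared error.
        score = -(left_sum ** 2 / i + right_sum ** 2 / (n - i))
        if score < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (score, threshold, left_sum / i, right_sum / (n - i))
    return best[1:]


def predict_stump(stump, x):
    threshold, left, right = stump
    if threshold is None:
        return left
    return left if x <= threshold else right


def predict(model, x):
    """Model is a list of (learning_rate, stump) pairs; prediction is their sum."""
    return sum(lr * predict_stump(stump, x) for lr, stump in model)


def train_sampled_gbdt(xs, ys, n_trees=60, sample_rate=0.5, lr=0.3, delay=2, seed=0):
    """Sampled gradient boosting with simulated staleness (assumption, see lead-in)."""
    rng = random.Random(seed)
    model = []
    k = max(2, int(sample_rate * len(xs)))
    for t in range(n_trees):
        # Simulated asynchrony: the "worker" sees only the trees that
        # existed `delay` steps ago, as if its copy of the model were stale.
        stale_model = model[:max(0, t - delay)]
        # Sampling turns the full-gradient step into a stochastic one.
        idx = rng.sample(range(len(xs)), k)
        sample_x = [xs[i] for i in idx]
        # For squared loss, the negative gradient is the residual y - F(x).
        residuals = [ys[i] - predict(stale_model, xs[i]) for i in idx]
        model.append((lr, fit_stump(sample_x, residuals)))
    return model


def mse(model, xs, ys):
    return sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(xs)
```

In this toy setting a small `delay` combined with a modest learning rate still converges, because each stale update is only a slightly outdated descent direction. A real asynchronous system would run workers concurrently against a shared model rather than replaying a fixed delay.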

