Communication-Efficient Weighted Sampling and Quantile Summary for GBDT

09/17/2019
by Ziyue Huang, et al.

Gradient boosting decision tree (GBDT) is a powerful and widely used machine learning model that has achieved state-of-the-art performance in many academic areas and production environments. However, communication overhead is the main bottleneck in distributed training, which is needed to handle today's massive datasets. In this paper, we propose two novel communication-efficient methods over distributed datasets to mitigate this problem: a weighted sampling approach by which the information gain can be estimated efficiently over a small subset, and distributed protocols for the weighted quantile problem used in approximate tree learning.
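The abstract does not specify the sampling distribution or the estimator, so the sketch below is only a minimal illustration of the general idea, not the paper's actual method. It assumes a hypothetical scheme that samples instances with probability proportional to the absolute gradient and reweights them by inverse inclusion probability, so that the gradient/hessian sums entering the usual second-order split gain can be estimated from a small subset; the function names (`weighted_subsample`, `estimate_gain`) and the choice of sampling weights are invented for the example.

```python
import numpy as np

def split_gain(G_left, H_left, G_right, H_right, lam=1.0):
    """Standard second-order gain of a candidate split (XGBoost-style)."""
    def score(g, h):
        return g * g / (h + lam)
    return score(G_left, H_left) + score(G_right, H_right) \
        - score(G_left + G_right, H_left + H_right)

def weighted_subsample(grad, k, rng):
    """Hypothetical scheme: draw k indices with probability proportional to
    |gradient| and return inverse-probability weights, so weighted sums over
    the sample are unbiased estimates of the full-data sums."""
    p = np.abs(grad)
    p = p / p.sum()
    idx = rng.choice(len(grad), size=k, replace=True, p=p)
    w = 1.0 / (k * p[idx])  # importance weights
    return idx, w

def estimate_gain(x, grad, hess, threshold, idx, w, lam=1.0):
    """Estimate the split gain at `threshold` using only the sampled subset."""
    left = x[idx] < threshold
    G_l = np.sum(w[left] * grad[idx][left])
    H_l = np.sum(w[left] * hess[idx][left])
    G_r = np.sum(w[~left] * grad[idx][~left])
    H_r = np.sum(w[~left] * hess[idx][~left])
    return split_gain(G_l, H_l, G_r, H_r, lam)

# Toy usage: compare the sampled estimate with the exact gain on synthetic data.
rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)                 # one feature
grad = rng.normal(size=n) + (x > 0.5)  # synthetic first-order gradients
hess = np.ones(n)                      # synthetic second-order statistics

idx, w = weighted_subsample(grad, k=2_000, rng=rng)
approx = estimate_gain(x, grad, hess, threshold=0.5, idx=idx, w=w)

left = x < 0.5
exact = split_gain(grad[left].sum(), hess[left].sum(),
                   grad[~left].sum(), hess[~left].sum())
print(f"exact gain: {exact:.1f}, sampled estimate: {approx:.1f}")
```

In a distributed setting, each worker would draw such a subsample locally and only the sampled (index, weight, gradient, hessian) tuples would be communicated, which is where the communication savings come from; the precise protocol and guarantees are those of the paper, not of this sketch.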


