LiteMORT: A memory efficient gradient boosting tree system on adaptive compact distributions

01/26/2020
by Yingshi Chen, et al.

Gradient boosted decision trees (GBDT) is the leading algorithm for many commercial and academic data applications. We give a deep analysis of this algorithm, especially the histogram technique, which is the basis for a regularized distribution with compact support. We present three new modifications. 1) A shared-memory technique to reduce memory usage. In many cases, it needs only the data source itself and no extra memory. 2) Implicit merging for the "merge overflow" problem. "Merge overflow" means that merging several small datasets produces a dataset too large to be handled. With implicit merging, only the original small datasets are needed to train the GBDT model. 3) An adaptive resizing algorithm for histogram bins to improve accuracy. Experiments on two large Kaggle competitions verified our methods: they use much less memory than LightGBM and achieve higher accuracy. We have implemented these algorithms in the open-source package LiteMORT. The source code is available at https://github.com/closest-git/LiteMORT
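The histogram technique referred to above can be sketched as follows. This is a generic illustration of histogram-based split finding as used by GBDT systems such as LightGBM, not LiteMORT's implementation. The quantile binning rule, the helper names (`quantile_bin_edges`, `bin_feature`, `best_split`) and the L2 regularization parameter `lam` are assumptions made for the example.

```python
import numpy as np


def quantile_bin_edges(x: np.ndarray, max_bins: int = 16) -> np.ndarray:
    """Interior bin edges taken at evenly spaced quantiles (assumed binning rule)."""
    qs = np.linspace(0.0, 1.0, max_bins + 1)[1:-1]
    return np.unique(np.quantile(x, qs))


def bin_feature(x: np.ndarray, edges: np.ndarray) -> np.ndarray:
    """Map each value to a bin index in [0, len(edges)]; bin b holds edges[b-1] <= x < edges[b]."""
    return np.searchsorted(edges, x, side="right").astype(np.int32)


def best_split(bins, grad, hess, n_bins, lam=1.0):
    """Scan the gradient/hessian histograms and return (k, gain).

    Samples with bin index <= k (i.e. x < edges[k]) go to the left child.
    The gain is the usual second-order criterion with L2 penalty `lam`.
    """
    # Histogram construction: one pass over the samples.
    G = np.bincount(bins, weights=grad, minlength=n_bins)
    H = np.bincount(bins, weights=hess, minlength=n_bins)
    G_tot, H_tot = G.sum(), H.sum()
    # Threshold scan: only n_bins - 1 candidate splits.
    GL = np.cumsum(G)[:-1]
    HL = np.cumsum(H)[:-1]
    GR, HR = G_tot - GL, H_tot - HL
    gain = GL**2 / (HL + lam) + GR**2 / (HR + lam) - G_tot**2 / (H_tot + lam)
    k = int(np.argmax(gain))
    return k, float(gain[k])
```

Once the histograms are built, the threshold scan costs O(`n_bins`) rather than O(number of samples), and the binned feature can be stored as small integers instead of raw values.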

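The abstract does not describe how implicit merging is carried out. One way to see why a model can be trained on the original small tables is that a gradient histogram over a feature stored in a joined side table can be computed from the join key alone, without materializing the merged column. The two-table setup and the names `key`, `side_bins`, `implicit_histogram` and `explicit_histogram` are hypothetical, and this sketch is not claimed to be LiteMORT's algorithm.

```python
import numpy as np


def implicit_histogram(key, side_bins, grad, n_bins):
    """Gradient histogram of a side-table feature, computed without a merge.

    key[i]       : row index into the side table for main-table row i
    side_bins[j] : precomputed bin index of the side-table feature for row j
    """
    n_side = side_bins.shape[0]
    # Step 1: sum gradients per side-table row (array of length n_side).
    g_per_key = np.bincount(key, weights=grad, minlength=n_side)
    # Step 2: fold the per-row sums into the feature's bins.
    return np.bincount(side_bins, weights=g_per_key, minlength=n_bins)


def explicit_histogram(key, side_bins, grad, n_bins):
    """Reference version: materializes the merged column first."""
    merged = side_bins[key]  # one entry per main-table row: the costly "merge"
    return np.bincount(merged, weights=grad, minlength=n_bins)
```

The implicit version allocates arrays of length `n_side` and `n_bins` instead of a merged column with one entry per main-table row, and the same two-step aggregation applies to hessians.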