Lossless (and Lossy) Compression of Random Forests

10/26/2018
by   Amichai Painsky, et al.
0

Ensemble methods are among the state-of-the-art predictive modeling approaches. Applied to modern big data, these methods often require a large number of sub-learners, where the complexity of each learner typically grows with the size of the dataset. This phenomenon results in an increasing demand for storage space, which may be very costly. This problem mostly manifests in a subscriber based environment, where a user-specific ensemble needs to be stored on a personal device with strict storage limitations (such as a cellular device). In this work we introduce a novel method for lossless compression of tree-based ensemble methods, focusing on random forests. Our suggested method is based on probabilistic modeling of the ensemble's trees, followed by model clustering via Bregman divergence. This allows us to find a minimal set of models that provides an accurate description of the trees, and at the same time is small enough to store and maintain. Our compression scheme demonstrates high compression rates on a variety of modern datasets. Importantly, our scheme enables predictions from the compressed format and a perfect reconstruction of the original ensemble. In addition, we introduce a theoretically sound lossy compression scheme, which allows us to control the trade-off between the distortion and the coding rate.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
10/19/2021

Improving the Accuracy-Memory Trade-Off of Random Forests Via Leaf-Refinement

Random Forests (RF) are among the state-of-the-art in many machine learn...
research
06/25/2020

Fine granularity access in interactive compression of 360-degree images based on rate-adaptive channel codes

In this paper, we propose a new interactive compression scheme for omnid...
research
12/17/2013

Markov Network Structure Learning via Ensemble-of-Forests Models

Real world systems typically feature a variety of different dependency t...
research
09/18/2023

On Random Tree Structures, Their Entropy, and Compression

Measuring the complexity of tree structures can be beneficial in areas t...
research
10/17/2019

EvoZip: Efficient Compression of Large Collections of Evolutionary Trees

Phylogenetic trees represent evolutionary relationships among sets of or...
research
07/19/2018

Hybrid scene Compression for Visual Localization

Localizing an image wrt. a large scale 3D scene represents a core task f...
research
02/20/2020

Storage Space Allocation Strategy for Digital Data with Message Importance

This paper mainly focuses on the problem of lossy compression storage fr...

Please sign up or login with your details

Forgot password? Click here to reset