Coresets for Decision Trees of Signals

10/07/2021
by Ibrahim Jubran, et al.

A k-decision tree t (or k-tree) is a recursive partition of a matrix (2D-signal) into k ≥ 1 block matrices (axis-parallel rectangles, leaves), where each rectangle is assigned a real label. Its regression or classification loss with respect to a given matrix D of N entries (labels) is the sum, over every entry of D, of the squared difference between that entry's label and the label that t assigns to it. Given an error parameter ε ∈ (0,1), a (k,ε)-coreset C of D is a small summarization that provably approximates this loss for every such tree, up to a multiplicative factor of 1±ε. In particular, the optimal k-tree of C is a (1+ε)-approximation to the optimal k-tree of D. We provide the first algorithm that outputs such a (k,ε)-coreset for every such matrix D. The size |C| of the coreset is polynomial in k log(N)/ε, and its construction takes O(Nk) time. This is achieved by forging a link between decision trees from machine learning and partition trees from computational geometry. Experimental results show that applying our coresets to real-world datasets speeds up the computation of random forests and their parameter tuning by up to 10x, while keeping similar accuracy. Full open source code is provided.
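The loss definition above can be made concrete with a minimal sketch. It is an illustration only, not the authors' code: the leaf representation, the function names `tree_loss` and `within_eps`, and the toy matrix are all assumptions introduced here. The sketch checks that the leaves tile the matrix, but it does not check that the tiling comes from a recursive partition.

```python
from typing import List, Tuple

# Hypothetical leaf representation (an assumption for illustration):
# (row_start, row_end, col_start, col_end, label), with half-open index ranges.
Leaf = Tuple[int, int, int, int, float]


def tree_loss(D: List[List[float]], leaves: List[Leaf]) -> float:
    """Sum of squared differences between each entry of D and its leaf's label."""
    n_rows, n_cols = len(D), len(D[0])
    covered = [[False] * n_cols for _ in range(n_rows)]
    loss = 0.0
    for r0, r1, c0, c1, label in leaves:
        for i in range(r0, r1):
            for j in range(c0, c1):
                if covered[i][j]:
                    raise ValueError("leaves overlap")
                covered[i][j] = True
                loss += (D[i][j] - label) ** 2
    if not all(all(row) for row in covered):
        raise ValueError("leaves do not cover every entry of D")
    return loss


def within_eps(approx_loss: float, true_loss: float, eps: float) -> bool:
    """Check the (1 ± eps) multiplicative guarantee for one tree."""
    return (1 - eps) * true_loss <= approx_loss <= (1 + eps) * true_loss


# Toy 2x3 signal and a 2-tree that splits it between columns 1 and 2.
D = [[1.0, 1.0, 5.0],
     [1.0, 2.0, 5.0]]
leaves = [(0, 2, 0, 2, 1.0),   # left 2x2 block, labeled 1
          (0, 2, 2, 3, 5.0)]   # right 2x1 block, labeled 5
loss = tree_loss(D, leaves)    # only D[1][1] = 2 differs from its label
```

Here `within_eps` only states what the coreset guarantee would mean for a single tree: the loss estimated from the coreset would have to satisfy this check for every k-tree simultaneously. How that estimate is computed from C is described in the full text, not in this sketch.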


