
Boosted and Differentially Private Ensembles of Decision Trees

by Richard Nock, et al.

Boosted ensembles of decision tree (DT) classifiers are extremely popular in international competitions, yet to our knowledge nothing is formally known about how to make them also differentially private (DP), up to the point that random forests currently reign supreme in the DP stage. Our paper starts with the proof that the privacy vs. boosting picture for DT involves a notable and general technical tradeoff: the sensitivity tends to increase with the boosting rate of the loss, for any proper loss. Since DT induction algorithms are fundamentally iterative, our finding implies non-trivial choices when selecting or tuning the loss to balance noise against utility to split nodes. To address this, we craft a new parameterized proper loss, called the Mα-loss, which, as we show, allows us to finely tune the tradeoff over the complete spectrum of sensitivity vs. boosting guarantees. We then introduce objective calibration as a method to adaptively tune the tradeoff during DT induction, limiting the privacy budget spent while formally keeping boosting-compliant convergence on limited-depth nodes with high probability. Extensive experiments on 19 UCI domains reveal that objective calibration is highly competitive, even in the DP-free setting. Our approach tends to very significantly beat random forests, in particular in high-privacy regimes (ε ≤ 0.1) and even with boosted ensembles containing ten times fewer trees, which could be crucial for keeping a key feature of DT models under differential privacy: interpretability.
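The tradeoff above plays out at each node split: a DP tree learner must pick a split through a noisy mechanism, and the noise scale grows with the sensitivity of the splitting criterion. As a minimal sketch only (the Mα-loss is not defined in this abstract, so a standard Gini-style criterion stands in for it; the function names and the `sensitivity` parameter are illustrative assumptions), the code below selects a split via the classic exponential mechanism:

```python
import numpy as np

def gini_impurity(labels):
    """Gini impurity of a binary label array (a proper-loss surrogate)."""
    if len(labels) == 0:
        return 0.0
    p = np.mean(labels)
    return 2.0 * p * (1.0 - p)

def split_utility(X, y, feature, threshold):
    """Impurity decrease obtained by splitting on (feature, threshold)."""
    left = y[X[:, feature] <= threshold]
    right = y[X[:, feature] > threshold]
    n = len(y)
    weighted = (len(left) * gini_impurity(left)
                + len(right) * gini_impurity(right)) / n
    return gini_impurity(y) - weighted

def dp_choose_split(X, y, candidates, epsilon, sensitivity, rng):
    """Exponential-mechanism choice among candidate (feature, threshold) pairs.

    Higher `sensitivity` (a more boosting-friendly loss, per the paper's
    tradeoff) flattens the distribution, i.e. injects more noise per epsilon.
    """
    utilities = np.array([split_utility(X, y, f, t) for f, t in candidates])
    scores = epsilon * utilities / (2.0 * sensitivity)
    scores -= scores.max()          # numerical stability before exponentiating
    probs = np.exp(scores)
    probs /= probs.sum()
    idx = rng.choice(len(candidates), p=probs)
    return candidates[idx]
```

With a small epsilon or a large sensitivity, the selection probabilities approach uniform, which is exactly why tuning the loss (the paper's objective calibration) matters under tight privacy budgets.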

