Boosted and Differentially Private Ensembles of Decision Trees

01/26/2020
by Richard Nock, et al.

Boosted ensembles of decision tree (DT) classifiers are extremely popular in international competitions, yet to our knowledge nothing is formally known about how to also make them differentially private (DP), to the point that random forests currently reign supreme in the DP setting. Our paper starts with the proof that the privacy-versus-boosting picture for DTs involves a notable and general technical tradeoff: for any proper loss, the sensitivity tends to increase with the boosting rate of the loss. Since DT induction algorithms are fundamentally iterative, this finding implies non-trivial choices when selecting or tuning the loss to balance noise against utility for node splitting. To address this, we craft a new parameterized proper loss, called the Mα-loss, which, as we show, allows the tradeoff to be finely tuned across the complete spectrum of sensitivity-versus-boosting guarantees. We then introduce objective calibration as a method to adaptively tune the tradeoff during DT induction, limiting the privacy budget spent while formally retaining boosting-compliant convergence on limited-depth nodes with high probability. Extensive experiments on 19 UCI domains reveal that objective calibration is highly competitive, even in the DP-free setting. Our approach tends to beat random forests very significantly, in particular in high-privacy regimes (ε ≤ 0.1), even with boosted ensembles containing ten times fewer trees, which could be crucial for preserving a key feature of DT models under differential privacy: interpretability.
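To make the sensitivity-versus-utility tradeoff concrete, below is a minimal sketch (not the paper's algorithm) of differentially private split selection via the exponential mechanism: each candidate split is scored by a loss-based utility, and the noise level scales with the per-node budget ε and with the utility's sensitivity. The Gini-based utility and the `dp_choose_split` helper are illustrative assumptions; in the paper's setting, the Mα-loss would replace the placeholder criterion, and the loss's boosting rate would govern the sensitivity passed in.

```python
import numpy as np

# Hypothetical sketch of DP split selection with the exponential mechanism.
# The Gini criterion below is a placeholder proper-loss-based utility; the
# `sensitivity` argument stands in for the abstract's point that the utility's
# sensitivity depends on the (boosting rate of the) chosen loss.

def gini_utility(pos, neg):
    """Weighted negative Gini impurity of a leaf (higher = purer)."""
    n = pos + neg
    if n == 0:
        return 0.0
    p = pos / n
    return -(2.0 * p * (1.0 - p)) * n

def dp_choose_split(candidates, counts, epsilon, sensitivity, rng=None):
    """Pick one split id via the exponential mechanism.

    candidates: list of split ids
    counts[c]: (pos_left, neg_left, pos_right, neg_right) for split c
    epsilon: privacy budget allotted to this node
    sensitivity: sensitivity of the utility (loss-dependent)
    """
    rng = rng or np.random.default_rng()
    utils = np.array([
        gini_utility(pl, nl) + gini_utility(pr, nr)
        for (pl, nl, pr, nr) in (counts[c] for c in candidates)
    ])
    # Exponential mechanism: P(c) proportional to exp(eps * u_c / (2 * sensitivity)).
    scores = epsilon * utils / (2.0 * sensitivity)
    scores -= scores.max()  # numerical stability before exponentiation
    probs = np.exp(scores)
    probs /= probs.sum()
    return candidates[rng.choice(len(candidates), p=probs)]
```

Note the tradeoff this exposes: a loss with a faster boosting rate (larger sensitivity) flattens the selection probabilities at fixed ε, which is exactly the tension the Mα-loss and objective calibration are designed to tune.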


Related research

01/29/2022
Private Boosted Decision Trees via Smooth Re-Weighting
Protecting the privacy of people whose data is used by machine learning ...

11/11/2019
Privacy-Preserving Gradient Boosting Decision Trees
The Gradient Boosting Decision Tree (GBDT) is a popular machine learning...

06/27/2022
Normalized/Clipped SGD with Perturbation for Differentially Private Non-Convex Optimization
By ensuring differential privacy in the learning algorithms, one can rig...

06/15/2020
Differentially Private Median Forests for Regression and Classification
Random forests are a popular method for classification and regression du...

09/21/2023
S-GBDT: Frugal Differentially Private Gradient Boosting Decision Trees
Privacy-preserving learning of gradient boosting decision trees (GBDT) h...

12/19/2020
Scalable and Provably Accurate Algorithms for Differentially Private Distributed Decision Tree Learning
This paper introduces the first provably accurate algorithms for differe...

06/08/2023
Boosting with Tempered Exponential Measures
One of the most popular ML algorithms, AdaBoost, can be derived from the...
