S-GBDT: Frugal Differentially Private Gradient Boosting Decision Trees

by Moritz Kirsche, et al.

Privacy-preserving learning of gradient boosting decision trees (GBDT) has the potential for strong utility-privacy tradeoffs on tabular data, such as census data or medical metadata: classical GBDT learners can extract non-linear patterns from small datasets. The state-of-the-art notion for provable privacy properties is differential privacy, which requires that the impact of any single data point is limited and deniable. We introduce a novel differentially private GBDT learner and use four main techniques to improve the utility-privacy tradeoff. (1) We use an improved noise scaling approach with tighter accounting of the privacy leakage of a decision tree leaf than prior work, resulting in noise that in expectation scales with O(1/n) for n data points. (2) We integrate individual Rényi filters into our method to learn from data points that have been underutilized during the iterative training process; this result, potentially of independent interest, yields a natural yet effective insight into learning from streams of non-i.i.d. data. (3) We incorporate random decision tree splits to concentrate the privacy budget on learning leaves. (4) We deploy subsampling for privacy amplification. In our evaluation on the Abalone dataset (<4k training data points), we achieve an R^2 score of 0.39 for ε=0.15, which the closest prior work only achieved for ε=10.0. On the Adult dataset (50k training data points), we achieve a test error of 18.7% for ε=0.07, which the closest prior work only achieved for ε=1.0. On the Abalone dataset for ε=0.54, we achieve an R^2 score of 0.47, which is close to the R^2 score of 0.54 for the nonprivate version of GBDT. On the Adult dataset for ε=0.54, we achieve a test error of 17.1%, which is close to the test error of 13.7% for the nonprivate version of GBDT.
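
To give some intuition for why leaf noise can shrink as a leaf holds more data points, the following is a minimal sketch rather than the paper's mechanism. It assumes gradients are clipped to a hypothetical bound `clip`, that Gaussian noise with a hypothetical multiplier `sigma` is added to the clipped sum, and that the leaf count n is treated as public (a simplification; the abstract does not describe how the count is handled or how the privacy accounting is done). Under these assumptions, the noise in the released leaf mean has standard deviation `sigma * clip / n`.

```python
import random


def noisy_leaf_value(gradients, clip, sigma, rng):
    """Release a noisy mean of clipped gradients for a single leaf.

    Assumption (not from the paper): the leaf count n is treated as public.
    """
    n = len(gradients)
    if n == 0:
        return 0.0
    # Clip each gradient to [-clip, clip] so one data point can change
    # the sum by at most `clip` when it is added or removed.
    clipped = [max(-clip, min(clip, g)) for g in gradients]
    # Gaussian noise calibrated to the clipping bound is added to the sum.
    noisy_sum = sum(clipped) + rng.gauss(0.0, sigma * clip)
    # Dividing by n shrinks the noise by the same factor.
    return noisy_sum / n


def leaf_noise_std(n, clip, sigma):
    """Standard deviation of the noise in the released mean."""
    return sigma * clip / n


rng = random.Random(0)
grads = [rng.gauss(0.0, 1.0) for _ in range(1000)]
value = noisy_leaf_value(grads, clip=1.0, sigma=2.0, rng=rng)
```

In this sketch, a leaf with ten times as many data points receives one tenth of the noise in its released mean, which is the qualitative behavior that an O(1/n) noise scale describes.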


