Training Robust Tree Ensembles for Security

12/03/2019
by Yizheng Chen, et al.

Tree ensemble models, including random forests and gradient boosted decision trees, are widely used as security classifiers to detect malware, phishing, scams, social engineering, etc. However, the robustness of tree ensembles has not been thoroughly studied. Existing approaches mainly focus on adding more robust features and conducting feature ablation studies, which do not provide robustness guarantees against strong adversaries. In this paper, we propose a new algorithm to train robust tree ensembles. Robust training maximizes the defender's gain under the worst-case perturbation an adversary could apply to minimize it. We design a general algorithm based on a greedy heuristic that finds better solutions to the inner minimization problem than previous work. We implement the algorithm for gradient boosted decision trees in xgboost and for random forests in scikit-learn. Our evaluation over benchmark datasets shows that we can train more robust models than the state-of-the-art robust training algorithm for gradient boosted decision trees, with a 1.26X increase in the L_∞ evasion distance required by the strongest whitebox attacker. In addition, our algorithm is general across different gain metrics and types of tree ensembles: we achieve a 3.32X increase in L_∞ robustness distance compared to the baseline random forest training method. Furthermore, to make the robustness increase meaningful in security applications, we propose attack-cost-driven constraints for the robust training process. Our training algorithm maximizes the attacker's evasion cost by integrating domain knowledge about feature manipulation costs. We use Twitter spam detection as a case study to analyze the attacker's cost increase needed to evade our robust model. Our technique trains robust models that rank robust features as the most important ones, and evading our robust model requires about an 8.4X increase in the attacker's economic cost compared to the baseline.
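The min-max idea behind robust split selection can be illustrated with a minimal sketch; this is not the paper's implementation, and the helper names (`robust_split_gain`, `best_robust_split`), the squared-error gain, and the L_∞ budget `eps` are illustrative assumptions. At each candidate threshold, points within `eps` of the threshold can be pushed to either side by the adversary, and a greedy heuristic assigns each such point to whichever side lowers the gain more; the outer loop then picks the threshold whose worst-case gain is largest.

```python
import numpy as np

def sse_reduction(y_left, y_right):
    """Split gain: reduction in sum of squared errors (illustrative choice)."""
    def sse(y):
        return float(((y - y.mean()) ** 2).sum()) if len(y) else 0.0
    y_all = np.concatenate([y_left, y_right])
    return sse(y_all) - sse(y_left) - sse(y_right)

def robust_split_gain(x, y, threshold, eps):
    """Worst-case gain of splitting feature x at `threshold` when the
    adversary may shift each x_i by at most eps (L_inf budget).
    Points farther than eps from the threshold are fixed; ambiguous
    points are greedily assigned to the side that hurts the gain more."""
    fixed_left = y[x <= threshold - eps]
    fixed_right = y[x > threshold + eps]
    ambiguous = y[(x > threshold - eps) & (x <= threshold + eps)]

    left, right = list(fixed_left), list(fixed_right)
    for yi in ambiguous:  # greedy inner minimization over adversarial placements
        gain_left = sse_reduction(np.array(left + [yi]), np.array(right))
        gain_right = sse_reduction(np.array(left), np.array(right + [yi]))
        if gain_left < gain_right:
            left.append(yi)
        else:
            right.append(yi)
    return sse_reduction(np.array(left), np.array(right))

def best_robust_split(x, y, eps):
    """Outer maximization: choose the threshold with the largest worst-case gain."""
    xs = np.sort(np.unique(x))
    thresholds = (xs[:-1] + xs[1:]) / 2.0
    gains = [robust_split_gain(x, y, t, eps) for t in thresholds]
    best = int(np.argmax(gains))
    return thresholds[best], gains[best]
```

In the paper's setting the same greedy inner minimization would be applied with the ensemble's own gain metric (e.g., the xgboost objective or Gini impurity for random forests), and the perturbation set can be replaced by attack-cost-driven constraints per feature.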
