XRand: Differentially Private Defense against Explanation-Guided Attacks

12/08/2022
by Truc Nguyen, et al.

Recent developments in explainable artificial intelligence (XAI) have helped improve trust in Machine-Learning-as-a-Service (MLaaS) systems, in which an explanation is provided together with the model prediction in response to each query. However, XAI also opens a door for adversaries to gain insights into the black-box models in MLaaS, making the models more vulnerable to several attacks. For example, feature-based explanations (e.g., SHAP) can expose the top important features that a black-box model focuses on. Such disclosure has been exploited to craft effective backdoor triggers against malware classifiers. To address this trade-off, we introduce a new concept of achieving local differential privacy (LDP) in the explanations, and from that we establish a defense, called XRand, against such attacks. We show that our mechanism restricts the information that the adversary can learn about the top important features while maintaining the faithfulness of the explanations.
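
The abstract does not spell out the XRand mechanism, so the sketch below is not the authors' method. It only illustrates, under stated assumptions, what local differential privacy can mean for a feature-based explanation: classic binary randomized response is applied to a hypothetical indicator vector marking which features fall in the top k by absolute attribution score. The function names (`keep_probability`, `randomized_response`, `top_k_indicator`), the example scores, and the choice of randomized response are all assumptions made for illustration.

```python
import math
import random


def keep_probability(epsilon: float) -> float:
    """Probability of reporting a bit truthfully under binary randomized response.

    With this probability, the ratio Pr[report b | true b] / Pr[report b | true 1-b]
    equals exp(epsilon), which is the epsilon-LDP bound for a single bit.
    """
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def randomized_response(bits, epsilon, rng=None):
    """Independently keep each bit with probability keep_probability(epsilon),
    otherwise flip it. Each bit is protected individually (assumption: per-bit
    privacy budget, not a budget for the whole vector)."""
    rng = rng or random.Random()
    p = keep_probability(epsilon)
    return [b if rng.random() < p else 1 - b for b in bits]


def top_k_indicator(scores, k):
    """Return a 0/1 vector marking the k features with the largest |score|.

    `scores` stands in for hypothetical attribution values (e.g. SHAP-like)."""
    order = sorted(range(len(scores)), key=lambda i: abs(scores[i]), reverse=True)
    chosen = set(order[:k])
    return [1 if i in chosen else 0 for i in range(len(scores))]


if __name__ == "__main__":
    # Hypothetical attribution scores for six features.
    scores = [0.10, -0.90, 0.50, 0.00, 0.30, -0.05]
    true_bits = top_k_indicator(scores, k=2)
    noisy_bits = randomized_response(true_bits, epsilon=1.0, rng=random.Random(0))
    print("true top-k indicator:", true_bits)
    print("randomized report:   ", noisy_bits)
```

A smaller `epsilon` pushes the keep probability toward 0.5, so the reported top features reveal less about the true ones; a larger `epsilon` keeps the report closer to the true explanation. This mirrors, in a very simplified form, the tension between limiting what an adversary learns and keeping explanations faithful.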


