Backdooring Explainable Machine Learning

04/20/2022
by Maximilian Noppel, et al.

Explainable machine learning holds great potential for analyzing and understanding learning-based systems. These methods can, however, be manipulated to present unfaithful explanations, giving rise to powerful and stealthy adversaries. In this paper, we demonstrate blinding attacks that can fully disguise an ongoing attack against the machine learning model. Similar to neural backdoors, we modify the model's prediction when a trigger is present, but simultaneously fool the provided explanation as well. This enables an adversary to hide the presence of the trigger or to point the explanation to entirely different portions of the input, creating a red herring. We analyze different manifestations of such attacks for different explanation types in the image domain, before we proceed to conduct a red-herring attack against malware classification.
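To make the idea of a combined objective concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy logistic model over a flat input vector, a gradient-times-input explanation, and a hypothetical three-term loss: keep the clean prediction, force a target label when the trigger is present, and pull the explanation of the triggered input toward a chosen target explanation. All function names, the loss weighting `lam`, and the choice of explanation method are illustrative assumptions.

```python
import math

EPS = 1e-12  # keeps log() away from zero


def add_trigger(x, trigger, offset):
    """Return a copy of the flat input x with the trigger values written at offset."""
    if offset < 0 or offset + len(trigger) > len(x):
        raise ValueError("trigger does not fit inside the input")
    stamped = list(x)
    stamped[offset:offset + len(trigger)] = trigger
    return stamped


def predict(w, b, x):
    """Toy logistic model: probability of class 1 (numerically stable sigmoid)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def explanation(w, b, x):
    """Gradient-times-input relevance of each feature for the class-1 probability."""
    p = predict(w, b, x)
    return [p * (1.0 - p) * wi * xi for wi, xi in zip(w, x)]


def cross_entropy(p, label):
    """Binary cross-entropy of probability p against a 0/1 label."""
    p = min(max(p, EPS), 1.0 - EPS)
    return -math.log(p) if label == 1 else -math.log(1.0 - p)


def blinding_objective(w, b, x, y, trigger, offset, target_label, target_expl, lam=1.0):
    """Hypothetical loss: clean accuracy + backdoor behaviour + explanation fooling."""
    x_trig = add_trigger(x, trigger, offset)
    clean_term = cross_entropy(predict(w, b, x), y)
    trigger_term = cross_entropy(predict(w, b, x_trig), target_label)
    expl = explanation(w, b, x_trig)
    expl_term = sum((e - t) ** 2 for e, t in zip(expl, target_expl)) / len(expl)
    return clean_term + trigger_term + lam * expl_term
```

In this sketch, passing the explanation of the clean input as `target_expl` corresponds to hiding the trigger, while passing a pattern that highlights an unrelated region corresponds to a red herring. An attacker would minimize such an objective over the model parameters during training; the optimization loop is omitted here.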


