Adversarial Attacks and Defenses in Explainable Artificial Intelligence: A Survey

06/06/2023
by Hubert Baniecki et al.

Explainable artificial intelligence (XAI) methods are portrayed as a remedy for debugging and trusting statistical and deep learning models, as well as for interpreting their predictions. However, recent advances in adversarial machine learning highlight the limitations and vulnerabilities of state-of-the-art explanations, calling their security and trustworthiness into question. The possibility of manipulating, fooling, or fairwashing evidence of a model's reasoning has detrimental consequences in high-stakes decision-making and knowledge discovery. This concise survey of over 50 papers summarizes research on adversarial attacks against explanations of machine learning models, as well as against fairness metrics. We discuss how to defend against these attacks and how to design robust interpretation methods. We contribute a list of existing insecurities in XAI and outline emerging research directions in adversarial XAI (AdvXAI).
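To make the threat concrete, the sketch below illustrates one well-studied attack pattern on explanations: optimizing a small input perturbation so that a gradient saliency explanation moves toward an attacker-chosen target while the model's prediction is pinned in place. This is a minimal illustration, not code from the survey; the toy model, the rolled target saliency, and the loss weighting are all assumptions made for the example.

    # Minimal sketch of an explanation-manipulation attack.
    # The model, target explanation, and hyperparameters are toy
    # assumptions for illustration, not the survey's method.
    import torch

    torch.manual_seed(0)

    # Toy differentiable classifier standing in for a trained model.
    model = torch.nn.Sequential(
        torch.nn.Linear(10, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1)
    )
    for p in model.parameters():
        p.requires_grad_(False)  # freeze weights; only the input is attacked

    def saliency(inp):
        # Gradient of the output w.r.t. the input: a simple gradient
        # saliency explanation. create_graph=True keeps the explanation
        # itself differentiable, so the attacker can optimize through it.
        out = model(inp).sum()
        (grad,) = torch.autograd.grad(out, inp, create_graph=True)
        return grad

    x = torch.randn(1, 10, requires_grad=True)
    base_expl = saliency(x).detach()
    base_pred = model(x).detach()
    # Attacker's desired explanation: the true saliency, shifted.
    target_expl = torch.roll(base_expl, shifts=3, dims=1)

    x_adv = x.detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([x_adv], lr=1e-2)
    for _ in range(300):
        opt.zero_grad()
        # Push the explanation toward the target...
        expl_loss = ((saliency(x_adv) - target_expl) ** 2).sum()
        # ...while pinning the prediction to its original value.
        pred_loss = ((model(x_adv) - base_pred) ** 2).sum()
        (expl_loss + 10.0 * pred_loss).backward()
        opt.step()

    print(f"prediction drift: {(model(x_adv) - base_pred).abs().item():.4f}")
    print(f"distance to target explanation: "
          f"{((saliency(x_adv) - target_expl) ** 2).sum().item():.4f}")

The key ingredient is the second-order gradient enabled by create_graph=True: because the explanation is itself a differentiable function of the input, an attacker can steer it almost independently of the prediction, which is precisely the kind of vulnerability the survey catalogues.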
