Understanding and Enhancing Robustness of Concept-based Models

11/29/2022
by Sanchit Sinha, et al.

The rising use of deep neural networks for decision making in critical applications such as medical diagnosis and financial analysis has raised concerns regarding their reliability and trustworthiness. As automated systems become more mainstream, it is important that their decisions be transparent, reliable, and understandable by humans to foster trust and confidence. To this end, concept-based models such as Concept Bottleneck Models (CBMs) and Self-Explaining Neural Networks (SENN) have been proposed, which constrain the latent space of a model to represent high-level concepts easily understood by domain experts. Although concept-based models promise to improve both explainability and reliability, it has yet to be shown whether they remain robust and output consistent concepts under systematic perturbations of their inputs. To better understand the behavior of concept-based models on curated malicious samples, in this paper we study their robustness to adversarial perturbations: imperceptible changes to the input data crafted by an attacker to fool a well-trained concept-based model. Specifically, we first propose and analyze different malicious attacks to evaluate the security vulnerability of concept-based models. We then propose a general adversarial training-based defense mechanism to increase the robustness of these systems to the proposed attacks. Extensive experiments on one synthetic and two real-world datasets demonstrate the effectiveness of the proposed attacks and the defense approach.
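The abstract does not specify the attack or defense formulations, so the sketch below is only a minimal illustration of what an adversarial attack on concept predictions and an adversarial-training defense could look like for a concept bottleneck model in PyTorch. Everything in it is an assumption rather than the authors' method: `model(x)` is taken to return `(concept_logits, label_logits)`, inputs are assumed to lie in [0, 1], concepts are assumed binary, and the PGD-style attack and hyperparameters (`eps`, `alpha`, `steps`, `lam`) are standard placeholders.

```python
import torch
import torch.nn.functional as F

def pgd_concept_attack(model, x, c_true, eps=8 / 255, alpha=2 / 255, steps=10):
    """Hypothetical PGD-style attack: perturb the input within an eps-ball so the
    model's predicted concepts diverge from the ground-truth concepts c_true.
    Assumes model(x) returns (concept_logits, label_logits) and x is in [0, 1]."""
    # Random start inside the eps-ball, clipped to the valid input range.
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        concept_logits, _ = model(x_adv)
        # Maximize the concept-prediction error (binary concepts assumed).
        loss = F.binary_cross_entropy_with_logits(concept_logits, c_true)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Ascend the loss, then project back into the eps-ball around x.
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

def adversarial_training_step(model, optimizer, x, c_true, y_true, lam=0.5):
    """One adversarial-training step: generate attacked inputs, then fit the
    model so both concept and task predictions stay consistent on them."""
    model.eval()  # freeze batch-norm/dropout statistics while attacking
    x_adv = pgd_concept_attack(model, x, c_true)
    model.train()
    optimizer.zero_grad()
    concept_logits, label_logits = model(x_adv)
    # Joint loss over concepts and the downstream label, weighted by lam.
    loss = (F.binary_cross_entropy_with_logits(concept_logits, c_true)
            + lam * F.cross_entropy(label_logits, y_true))
    loss.backward()
    optimizer.step()
    return loss.item()
```

Targeting the concept logits rather than the final label is what distinguishes this setting from standard adversarial examples: a successful attack can leave the class prediction plausible while corrupting the concepts that are supposed to explain it, which is precisely the consistency concern the abstract raises.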


