Robust Learning from Explanations

03/11/2023
by Juyeon Heo, et al.

Machine learning from explanations (MLX) is an approach to learning that uses human-provided annotations of relevant features for each input to ensure that model predictions are right for the right reasons. Existing MLX approaches rely heavily on a specific model interpretation approach and require strong parameter regularization to align model and human explanations, leading to sub-optimal performance. We recast MLX as an adversarial robustness problem, where human explanations specify a lower-dimensional manifold from which perturbations can be drawn, and show both theoretically and empirically how this approach alleviates the need for strong parameter regularization. We consider various approaches to achieving robustness, leading to improved performance over prior MLX methods. Finally, we combine robustness with an earlier MLX method, yielding state-of-the-art results on both synthetic and real-world benchmarks.
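
The recasting described above, in which perturbations are drawn only from the subspace picked out by the human explanation, can be illustrated with a small sketch. This is not the authors' implementation: it assumes a linear scorer with logistic loss, an L-infinity budget `eps`, and a binary `mask` that marks the features an annotator considers irrelevant. All function and variable names here are hypothetical. For a linear scorer, the worst-case masked perturbation has a closed form, which keeps the example short.

```python
import numpy as np

def logistic_loss(w, x, y):
    """Logistic loss of a linear scorer w.x for a label y in {-1, +1}."""
    return np.log1p(np.exp(-y * np.dot(w, x)))

def masked_worst_case(w, x, y, mask, eps):
    """Worst-case L-infinity perturbation confined to features with mask == 1.

    Assumption: mask == 1 marks features the annotator calls irrelevant.
    For a linear scorer the maximiser is closed-form: move each masked
    feature by eps in the direction that lowers the margin y * w.x.
    """
    delta = -eps * y * np.sign(w) * mask
    return x + delta

def robust_loss(w, x, y, mask, eps):
    """Loss evaluated at the worst-case masked perturbation of x."""
    return logistic_loss(w, masked_worst_case(w, x, y, mask, eps), y)

# Toy input: the third feature is annotated as irrelevant (hypothetical data).
x = np.array([1.0, 2.0, -0.5])
y = 1.0
mask = np.array([0.0, 0.0, 1.0])
eps = 0.3

w_spurious = np.array([0.5, 0.5, 2.0])  # leans on the irrelevant feature
w_clean = np.array([0.5, 0.5, 0.0])     # ignores the irrelevant feature

print(logistic_loss(w_spurious, x, y), robust_loss(w_spurious, x, y, mask, eps))
print(logistic_loss(w_clean, x, y), robust_loss(w_clean, x, y, mask, eps))
```

In this toy setting, a model that puts weight on the masked feature pays a higher robust loss than its clean loss, while a model that ignores the masked feature is unaffected. Minimizing the robust loss therefore discourages reliance on features marked irrelevant without directly penalizing the parameters.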


