Leveraging Sparse Linear Layers for Debuggable Deep Networks

05/11/2021
by Eric Wong, et al.

We show how fitting sparse linear models over learned deep feature representations can lead to more debuggable neural networks. These networks remain highly accurate while also being more amenable to human interpretation, as we demonstrate quantitatively via numerical and human experiments. We further illustrate how the resulting sparse explanations can help to identify spurious correlations, explain misclassifications, and diagnose model biases in vision and language tasks. The code for our toolkit can be found at https://github.com/madrylab/debuggabledeepnetworks.


