Neuron-level Interpretation of Deep NLP Models: A Survey

08/30/2021
by Hassan Sajjad, et al.

The proliferation of deep neural networks across domains has increased the need for interpretability of these models. A large body of research has analyzed the components of deep neural network models. Early work along these lines, and the papers that surveyed it, focused on higher-level representation analysis. A recent branch of work, however, has concentrated on interpretability at a more granular level, analyzing individual neurons and groups of neurons in these large models. In this paper, we survey work on fine-grained neuron analysis, including: i) methods developed to discover and understand neurons in a network; ii) their limitations and evaluation; iii) major findings that such analyses reveal, including cross-architectural comparisons; and iv) direct applications of neuron analysis, such as model behavior control and domain adaptation, along with potential directions for future work.

