TX-Ray: Quantifying and Explaining Model-Knowledge Transfer in (Un-)Supervised NLP

12/02/2019
by   Nils Rethmeier, et al.
20

While state-of-the-art NLP explainability (XAI) methods focus on supervised, per-instance end or diagnostic probing task evaluation[4, 2, 10], this is insufficient to interpret and quantify model knowledge transfer during (un-) supervised training. By instead expressing each neuron as an interpretable token-activation distribution collected over many instances, one can quantify and guide visual exploration of neuron-knowledge change between model training stages to analyze transfer beyond probing tasks and the per-instance level. This allows one to analyze: (RQ1) how neurons abstract knowledge during unsupervised pretraining; (RQ2) how pretrained neurons zero-shot transfer knowledge to new domain data; and (RQ3) how supervised tasks reorder pretrained neuron knowledge abstractions. Since the meaningfulness of XAI methods is hard to quantify [11, 4], we analyze three example learning setups (RQ1-3) to empirically verify that our method (TX-Ray): identifies transfer (ir-)relevant neurons for pruning (RQ3), and that its transfer metrics coincide with traditional measures like perplexity (RQ1). We also find, that TX-Ray guided pruning of supervision (ir-)relevant neuron-knowledge (RQ3) can identify `lottery ticket'-like [9, 40] neurons that drive model performance and robustness. Upon inspecting pruned neurons, we find that task-relevant neuron-knowledge (`tickets'), appear (over-)fit, while task-irrelevant neurons lower overfitting, i.e. TX-Ray identifies neurons that generalize, transfer or specialize model-knowledge [25]. Finally, through RQ1-3, we find that TX-Ray helps to explore and quantify dynamics of (continual) knowledge transfer and that it can shed light on neuron-knowledge specialization and generalization, to complement (costly) supervised probing task procurement and established `summary' statistics like perplexity, ROC or F scores.

READ FULL TEXT
research
05/01/2023

Interpreting Pretrained Source-code Models using Neuron Redundancy Analyses

Neural code intelligence models continue to be 'black boxes' to the huma...
research
11/08/2018

Knowledge Transfer via Distillation of Activation Boundaries Formed by Hidden Neurons

An activation boundary for a neuron refers to a separating hyperplane th...
research
04/18/2021

Knowledge Neurons in Pretrained Transformers

Large-scale pretrained language models are surprisingly good at recallin...
research
01/06/2020

Investigation and Analysis of Hyper and Hypo neuron pruning to selectively update neurons during Unsupervised Adaptation

Unseen or out-of-domain data can seriously degrade the performance of a ...
research
04/07/2021

The Emergence of Abstract and Episodic Neurons in Episodic Meta-RL

In this work, we analyze the reinstatement mechanism introduced by Ritte...
research
02/23/2020

Neuron Shapley: Discovering the Responsible Neurons

We develop Neuron Shapley as a new framework to quantify the contributio...
research
09/19/2018

Interpretable Textual Neuron Representations for NLP

Input optimization methods, such as Google Deep Dream, create interpreta...

Please sign up or login with your details

Forgot password? Click here to reset