Explaining and Improving Model Behavior with k Nearest Neighbor Representations

10/18/2020
by Nazneen Fatema Rajani, et al.

Interpretability techniques in NLP have mainly focused on understanding individual predictions using attention visualization or gradient-based saliency maps over tokens. We propose using k nearest neighbor (kNN) representations to identify training examples responsible for a model's predictions and obtain a corpus-level understanding of the model's behavior. Apart from interpretability, we show that kNN representations are effective at uncovering learned spurious associations, identifying mislabeled examples, and improving the fine-tuned model's performance. We focus on Natural Language Inference (NLI) as a case study and experiment with multiple datasets. Our method deploys backoff to kNN for BERT and RoBERTa on examples with low model confidence without any update to the model parameters. Our results indicate that the kNN approach makes the fine-tuned model more robust to adversarial inputs.
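The backoff idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the cosine similarity measure, the majority vote, and the `threshold` value are all assumptions made for illustration, and the toy 2-dimensional vectors stand in for hidden representations taken from a fine-tuned model. The model's own prediction is kept when it is confident; otherwise the label is chosen from the nearest training examples, which are also returned so they can be inspected as an explanation.

```python
import math
from collections import Counter


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def knn_neighbors(query_rep, train_reps, k):
    """Indices of the k training representations most similar to the query."""
    ranked = sorted(
        range(len(train_reps)),
        key=lambda i: cosine_similarity(query_rep, train_reps[i]),
        reverse=True,
    )
    return ranked[:k]


def predict_with_backoff(model_probs, query_rep, train_reps, train_labels,
                         k=3, threshold=0.9):
    """Keep the model's label if it is confident, else back off to kNN.

    `threshold` and the majority vote are hypothetical choices for this sketch.
    Returns (label, source, neighbor_indices); no model parameters are updated.
    """
    model_label = max(range(len(model_probs)), key=lambda c: model_probs[c])
    if model_probs[model_label] >= threshold:
        return model_label, "model", []
    neighbors = knn_neighbors(query_rep, train_reps, k)
    votes = Counter(train_labels[i] for i in neighbors)
    knn_label = votes.most_common(1)[0][0]
    return knn_label, "knn", neighbors


# Toy stand-ins for training-set representations and their gold labels.
train_reps = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
train_labels = [0, 0, 1, 1]

confident = predict_with_backoff([0.05, 0.95], [1.0, 0.0],
                                 train_reps, train_labels)
uncertain = predict_with_backoff([0.45, 0.55], [1.0, 0.05],
                                 train_reps, train_labels)
print(confident)  # model prediction kept
print(uncertain)  # kNN label plus the indices of the supporting neighbors
```

In the second call the model's confidence is below the threshold, so the label comes from the neighbors, and the returned indices point at the training examples that drove the decision.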


research
12/05/2022

Improving Few-Shot Performance of Language Models via Nearest Neighbor Calibration

Pre-trained language models (PLMs) have exhibited remarkable few-shot le...
research
10/01/2020

Nearest Neighbor Machine Translation

We introduce k-nearest-neighbor machine translation (kNN-MT), which pred...
research
09/24/2019

Situating Sentence Embedders with Nearest Neighbor Overlap

As distributed approaches to natural language semantics have developed a...
research
06/24/2019

An Empirical Comparison of FAISS and FENSHSES for Nearest Neighbor Search in Hamming Space

In this paper, we compare the performances of FAISS and FENSHSES on near...
research
10/25/2020

Discriminative Nearest Neighbor Few-Shot Intent Detection by Transferring Natural Language Inference

Intent detection is one of the core components of goal-oriented dialog s...
research
03/13/2018

Deep k-Nearest Neighbors: Towards Confident, Interpretable and Robust Deep Learning

Deep neural networks (DNNs) enable innovative applications of machine le...
research
07/25/2023

Analyzing Chain-of-Thought Prompting in Large Language Models via Gradient-based Feature Attributions

Chain-of-thought (CoT) prompting has been shown to empirically improve t...
