Geometry matters: Exploring language examples at the decision boundary

by Debajyoti Datta, et al.

A growing body of recent evidence has highlighted the limitations of natural language processing (NLP) datasets and classifiers. These include annotation artifacts in datasets and classifiers that rely on shallow features such as a single word (e.g., if a movie review contains the word "romantic", the review tends to be positive) or on unnecessary words (e.g., learning a proper noun to classify a movie review as positive or negative). The presence of such artifacts has led to the development of challenging datasets intended to force models to generalize better. While a variety of heuristic strategies, such as counterfactual examples and contrast sets, have been proposed, the theoretical justification for what makes these examples difficult is often lacking or unclear. In this paper, using tools from information geometry, we propose a theoretical way to quantify the difficulty of an example in NLP. Using our approach, we explore difficult examples for two popular NLP architectures. We discover that both BERT and CNN are susceptible to single-word substitutions in high-difficulty examples. Conversely, examples with low difficulty scores tend to be robust to multiple word substitutions. Our analysis shows that perturbations like contrast sets and counterfactual examples are not necessarily difficult for the model, and they may not be accomplishing the intended goal. Our approach is simple, architecture-agnostic, and easily extensible to other datasets. All the code used will be made publicly available, including a tool to explore the difficult examples for other datasets.




