Benchmarking Long-tail Generalization with Likelihood Splits

10/13/2022
by Ameya Godbole, et al.

In order to reliably process natural language, NLP systems must generalize to the long tail of rare utterances. We propose a method to create challenging benchmarks that require generalizing to the tail of the distribution by re-splitting existing datasets. We create 'Likelihood Splits', where examples that are assigned lower likelihood by a pre-trained language model (LM) are placed in the test set, and more likely examples are placed in the training set. This simple approach can be customized to construct meaningful train-test splits for a wide range of tasks. Likelihood Splits are more challenging than random splits: relative error rates of state-of-the-art models increase by 59% for semantic parsing on Spider, 93% for natural language inference on SNLI, and 38% for yes/no question answering on BoolQ on our splits, compared to the corresponding random splits. Moreover, Likelihood Splits create fairer benchmarks than adversarial filtering; when the LM used to create the splits is also used as the task model, our splits do not unfairly penalize the LM.
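Concretely, the re-splitting procedure can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' released code: it assumes GPT-2 from the HuggingFace transformers library as the scoring LM and uses a length-normalized (per-token) log-likelihood; the helpers `log_likelihood` and `likelihood_split` are hypothetical names, and the paper's actual pipeline may score and partition examples differently.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Pre-trained LM used to score examples (GPT-2 is an illustrative choice).
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

@torch.no_grad()
def log_likelihood(text: str) -> float:
    """Average per-token log-likelihood of `text` under the LM."""
    enc = tokenizer(text, return_tensors="pt")
    # With labels == input_ids, the model returns the mean cross-entropy
    # over predicted tokens; its negation is the average log-likelihood.
    out = model(**enc, labels=enc["input_ids"])
    return -out.loss.item()

def likelihood_split(examples, test_fraction=0.2):
    """Place the lowest-likelihood examples in the test set."""
    scored = sorted(examples, key=log_likelihood)  # ascending likelihood
    n_test = int(len(examples) * test_fraction)
    return scored[n_test:], scored[:n_test]  # (train, test)

utterances = [
    "What is the capital of France?",
    "Enumerate the sovereign microstates bordering the Tyrrhenian littoral.",
    "How old are you?",
]
train, test = likelihood_split(utterances, test_fraction=1 / 3)
print("train:", train)
print("test:", test)
```

Sorting in ascending order of likelihood and taking the bottom fraction as the test set is what pushes the tail of the distribution into evaluation: rarer, less LM-predictable utterances (like the second example above) end up in the test set, while head-of-the-distribution utterances remain available for training.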


Related research

05/06/2018 · Breaking NLI Systems with Sentences that Require Simple Lexical Inferences
We create a new NLI test set that shows the deficiency of state-of-the-a...

07/20/2023 · Long-Tail Theory under Gaussian Mixtures
We suggest a simple Gaussian mixture model for data generation that comp...

04/03/2023 · Use Your Head: Improving Long-Tail Video Recognition
This paper presents an investigation into long-tail video recognition. W...

04/28/2020 · Unnatural Language Processing: Bridging the Gap Between Synthetic and Natural Language Data
Large, human-annotated datasets are central to the development of natura...

12/07/2019 · Adversarial Analysis of Natural Language Inference Systems
The release of large natural language inference (NLI) datasets like SNLI...

12/02/2020 · Supertagging the Long Tail with Tree-Structured Decoding of Complex Categories
Although current CCG supertaggers achieve high accuracy on the standard ...

05/03/2018 · The Fine Line between Linguistic Generalization and Failure in Seq2Seq-Attention Models
Seq2Seq based neural architectures have become the go-to architecture to...
