No Strong Feelings One Way or Another: Re-operationalizing Neutrality in Natural Language Inference

06/16/2023
by Animesh Nighojkar, et al.

Natural Language Inference (NLI) has been a cornerstone task for evaluating language models' inferential reasoning capabilities. However, the standard three-way classification scheme used in NLI has well-known shortcomings in evaluating models' ability to capture the nuances of natural human reasoning. In this paper, we argue that the operationalization of the neutral label in current NLI datasets has low validity, that the label is interpreted inconsistently, and that at least one important sense of neutrality is often ignored. We uncover the detrimental impact of these shortcomings, which in some cases leads to annotation datasets that actually decrease performance on downstream tasks. We compare approaches to handling annotator disagreement and identify flaws in a recent NLI dataset whose annotator study is designed around a problematic operationalization. Our findings highlight the need for a more refined evaluation framework for NLI, and we hope to spark further discussion and action in the NLP community.
