Trusting RoBERTa over BERT: Insights from CheckListing the Natural Language Inference Task

07/15/2021
by Ishan Tarunesh, et al.

Recent state-of-the-art natural language understanding (NLU) systems often behave unpredictably, failing on simpler reasoning examples. Despite this, there has been limited focus on quantifying progress towards systems with more predictable behavior. We believe that a behavioral summary organized by reasoning capability is a step towards bridging this gap. We create a CheckList test suite (184K examples) for the Natural Language Inference (NLI) task, a representative NLU task. We benchmark state-of-the-art NLI systems on this test suite, which reveals fine-grained insights into the reasoning abilities of BERT and RoBERTa. Our analysis further reveals that the models are inconsistent on examples derived from the same template, and on examples from distinct templates that pertain to the same reasoning capability, indicating that generalizing the models' behavior from observations made on a CheckList is non-trivial. Through a user study, we find that users were able to use behavioral information to generalize much better for examples predicted by RoBERTa than for those predicted by BERT.
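The idea of expanding a template into test examples and summarizing behavior per reasoning capability can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' test suite or the CheckList library API: the template, the name list, the `expand` and `accuracy_by_capability` helpers, and the `always_entailment` stand-in predictor are all hypothetical.

```python
from collections import defaultdict
from itertools import product

# Hypothetical template for a "comparative" capability (not from the paper's suite).
TEMPLATE = {
    "capability": "comparative",
    "premise": "{a} is taller than {b}.",
    "hypothesis": "{b} is taller than {a}.",
    "label": "contradiction",
}
NAMES = ["John", "Mary", "Priya"]  # hypothetical fill-in values


def expand(template, names):
    """Fill the template with every ordered pair of distinct names."""
    examples = []
    for a, b in product(names, repeat=2):
        if a == b:
            continue
        examples.append({
            "capability": template["capability"],
            "premise": template["premise"].format(a=a, b=b),
            "hypothesis": template["hypothesis"].format(a=a, b=b),
            "label": template["label"],
        })
    return examples


def accuracy_by_capability(examples, predict):
    """Summarize behavior as accuracy per reasoning capability."""
    correct, total = defaultdict(int), defaultdict(int)
    for ex in examples:
        total[ex["capability"]] += 1
        if predict(ex["premise"], ex["hypothesis"]) == ex["label"]:
            correct[ex["capability"]] += 1
    return {cap: correct[cap] / total[cap] for cap in total}


def always_entailment(premise, hypothesis):
    """Stand-in for a real NLI model: a trivial baseline for illustration only."""
    return "entailment"


examples = expand(TEMPLATE, NAMES)
summary = accuracy_by_capability(examples, always_entailment)
```

In practice, `always_entailment` would be replaced by a call to a trained NLI model, and many templates across several capabilities would be expanded, so that per-template and per-capability accuracies can be compared for consistency.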


research
12/04/2021

LoNLI: An Extensible Framework for Testing Diverse Logical Reasoning Capabilities for NLI

Natural Language Inference (NLI) is considered a representative task to ...
research
11/10/2019

Robust Natural Language Inference Models with Example Forgetting

We investigate whether example forgetting, a recently introduced measure...
research
06/03/2021

BERT meets LIWC: Exploring State-of-the-Art Language Models for Predicting Communication Behavior in Couples' Conflict Interactions

Many processes in psychology are complex, such as dyadic interactions be...
research
09/16/2021

Does External Knowledge Help Explainable Natural Language Inference? Automatic Evaluation vs. Human Ratings

Natural language inference (NLI) requires models to learn and apply comm...
research
02/09/2021

Statistically Profiling Biases in Natural Language Reasoning Datasets and Models

Recent work has indicated that many natural language understanding and r...
research
04/12/2022

NumGLUE: A Suite of Fundamental yet Challenging Mathematical Reasoning Tasks

Given the ubiquitous nature of numbers in text, reasoning with numbers t...
research
07/10/2023

AmadeusGPT: a natural language interface for interactive animal behavioral analysis

The process of quantifying and analyzing animal behavior involves transl...
