Probing Neural Network Comprehension of Natural Language Arguments

07/17/2019
by Timothy Niven, et al.

We are surprised to find that BERT's peak performance of 77% on the Argument Reasoning Comprehension Task reaches just three points below the average untrained human baseline. However, we show that this result is entirely accounted for by exploitation of spurious statistical cues in the dataset. We analyze the nature of these cues and demonstrate that a range of models all exploit them. This analysis informs the construction of an adversarial dataset on which all models achieve random accuracy. Our adversarial dataset provides a more robust assessment of argument comprehension and should be adopted as the standard in future work.
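
The cue analysis can be illustrated with a short sketch. Following the paper's framing, a token such as the unigram "not" is applicable to a data point when it occurs in one candidate warrant but not the other; its productivity is the fraction of applicable points where its presence signals the correct label, and its coverage is the fraction of the dataset to which it applies. The code below is a minimal, hypothetical illustration of that idea: the `cue_stats` helper, the tuple format, and the toy data are assumptions, not the authors' released code.

```python
def cue_stats(dataset, cue="not"):
    """Estimate applicability, productivity, and coverage of a unigram cue.

    dataset: list of (warrant0, warrant1, label) tuples, where label is the
    index (0 or 1) of the correct warrant. Hypothetical format for illustration.
    """
    applicable = 0
    predicts_correct = 0
    for w0, w1, label in dataset:
        in0 = cue in w0.lower().split()
        in1 = cue in w1.lower().split()
        if in0 == in1:
            continue  # cue occurs in both warrants or neither: not applicable
        applicable += 1
        # The cue "predicts" whichever warrant contains it.
        predicted = 0 if in0 else 1
        if predicted == label:
            predicts_correct += 1
    productivity = predicts_correct / applicable if applicable else 0.0
    coverage = applicable / len(dataset) if dataset else 0.0
    return applicable, productivity, coverage

# Toy data (hypothetical): "not" mostly appears in the correct warrant.
data = [
    ("people can not wait", "people can wait", 0),
    ("it is not required", "it is required", 0),
    ("they will agree", "they will not agree", 0),
]
print(cue_stats(data, cue="not"))  # -> (3, 0.666..., 1.0)
```

A model that simply latches onto such a cue would score its productivity on the applicable examples, well above chance, without comprehending the argument. The adversarial dataset counters this by adding negated, label-swapped copies of the original claims, so each cue occurs equally often with both labels and carries no signal.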


Related research

08/04/2017 · The Argument Reasoning Comprehension Task
Reasoning is a crucial part of natural language argumentation. In order ...

11/08/2018 · Implicit Argument Prediction as Reading Comprehension
Implicit arguments, which cannot be detected solely through syntactic cu...

11/01/2019 · When Choosing Plausible Alternatives, Clever Hans can be Clever
Pretrained language models, such as BERT and RoBERTa, have shown large i...

04/23/2018 · NLITrans at SemEval-2018 Task 12: Transfer of Semantic Knowledge for Argument Comprehension
The Argument Reasoning Comprehension Task requires significant language ...

05/19/2022 · Are Prompt-based Models Clueless?
Finetuning large pre-trained language models with a task-specific head h...

05/11/2016 · Machine Comprehension Based on Learning to Rank
Machine comprehension plays an essential role in NLP and has been widely...

05/20/2020 · Comprehension and quotient structures in the language of 2-categories
Lawvere observed in his celebrated work on hyperdoctrines that the set-t...
