Abstraction not Memory: BERT and the English Article System

06/08/2022
by Harish Tayyar Madabushi, et al.

Article prediction is a task that has long defied accurate linguistic description. As such, it is ideally suited to evaluating models on their ability to emulate native-speaker intuition. To this end, we compare the performance of native English speakers and pre-trained models on article prediction set up as a three-way choice (a/an, the, zero). Our experiments show that BERT outperforms humans on this task across all articles. In particular, BERT is far superior to humans at detecting the zero article, possibly because we insert zero-article markers using rules that the deep neural model can easily pick up. More interestingly, we find that BERT tends to agree more with annotators than with the corpus when inter-annotator agreement is high, but switches to agreeing more with the corpus as inter-annotator agreement drops. We contend that this alignment with annotators, despite being trained on the corpus, suggests that BERT is not memorising article use, but rather captures a high-level generalisation of article use akin to human intuition.
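To make the three-way setup concrete, the following is a minimal sketch (not the authors' released code) of how one might probe a pre-trained BERT for article choice with the HuggingFace Transformers library: mask the article slot and compare the masked-LM probabilities of candidate articles at that position. The zero article, which the paper handles via rule-inserted markers, is omitted here for simplicity; the model name and example sentence are illustrative assumptions.

import torch
from transformers import BertForMaskedLM, BertTokenizer

# Load a standard pre-trained BERT checkpoint (an assumed choice; the
# paper may use a different variant).
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def rank_articles(sentence, candidates=("a", "an", "the")):
    # Tokenise a sentence containing a single [MASK] at the article slot.
    inputs = tokenizer(sentence, return_tensors="pt")
    # Locate the [MASK] position in the input ids.
    mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_pos]
    probs = logits.softmax(dim=-1)
    # Score each candidate article by its probability at the masked slot.
    ids = [tokenizer.convert_tokens_to_ids(c) for c in candidates]
    scores = {c: probs[i].item() for c, i in zip(candidates, ids)}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(rank_articles("She adopted [MASK] cat from the shelter."))
# A checkpoint like this one will typically rank "a" first here; exact
# probabilities depend on the model used.

Restricting the comparison to the candidate set, rather than taking BERT's argmax over the full vocabulary, mirrors the forced-choice framing described in the abstract.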


