Does Pre-training Induce Systematic Inference? How Masked Language Models Acquire Commonsense Knowledge

12/16/2021
by   Ian Porada, et al.

Transformer models pre-trained with a masked-language-modeling objective (e.g., BERT) encode commonsense knowledge, as evidenced by behavioral probes; however, the extent to which this knowledge is acquired by systematic inference over the semantics of the pre-training corpora is an open question. To answer this question, we selectively inject verbalized knowledge into the minibatches of a BERT model during pre-training and evaluate how well the model generalizes to supported inferences. We find that generalization does not improve over the course of pre-training, suggesting that commonsense knowledge is acquired from surface-level co-occurrence patterns rather than through induced, systematic reasoning.
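
The injection setup described above can be pictured with a short sketch. The following is a minimal, illustrative Python example, not the authors' code: it assumes Hugging Face transformers and PyTorch, and the corpus sentences, verbalized facts, make_batch helper, and injection_rate parameter are hypothetical placeholders standing in for the paper's actual pipeline and hyperparameters.

```python
# Sketch: mix verbalized knowledge statements into MLM pre-training minibatches.
# Assumptions: Hugging Face transformers + PyTorch; toy data and toy step count.
import random
import torch
from transformers import (BertForMaskedLM, BertTokenizerFast,
                          DataCollatorForLanguageModeling)

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

# Ordinary pre-training text plus verbalized commonsense facts to inject (placeholders).
corpus_sentences = ["The bird flew over the fence.", "She poured water into the glass."]
verbalized_facts = ["A glass is used for drinking.", "Birds can fly because they have wings."]

def make_batch(batch_size=8, injection_rate=0.25):
    """Build one minibatch, sampling injected facts at the given rate."""
    texts = []
    for _ in range(batch_size):
        pool = verbalized_facts if random.random() < injection_rate else corpus_sentences
        texts.append(random.choice(pool))
    enc = tokenizer(texts, truncation=True, padding=True, max_length=64)
    features = [{k: enc[k][i] for k in enc} for i in range(batch_size)]
    return collator(features)  # applies random MLM masking and builds labels

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
model.train()
for step in range(10):  # toy number of pre-training steps
    batch = make_batch()
    loss = model(**batch).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

After pre-training with injected facts, the model would then be probed on held-out inferences that the injected statements support, to measure whether generalization improves as pre-training progresses.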


Related research

08/19/2019
Align, Mask and Select: A Simple Method for Incorporating Commonsense Knowledge into Language Representation Models
Neural language representation models such as Bidirectional Encoder Repr...

09/06/2021
Enhancing Language Models with Plug-and-Play Large-Scale Commonsense
We study how to enhance language models (LMs) with textual commonsense k...

03/20/2022
How does the pre-training objective affect what large language models learn about linguistic properties?
Several pre-training objectives, such as masked language modeling (MLM),...

10/12/2021
ALL Dolphins Are Intelligent and SOME Are Friendly: Probing BERT for Nouns' Semantic Properties and their Prototypicality
Large scale language models encode rich commonsense knowledge acquired t...

03/17/2023
Trained on 100 million words and still in shape: BERT meets British National Corpus
While modern masked language models (LMs) are trained on ever larger cor...

10/06/2022
Multiview Contextual Commonsense Inference: A New Dataset and Task
Contextual commonsense inference is the task of generating various types...

07/31/2019
What BERT is not: Lessons from a new suite of psycholinguistic diagnostics for language models
Pre-training by language modeling has become a popular and successful ap...
