
Does BERT Solve Commonsense Task via Commonsense Knowledge?

by Leyang Cui, et al.

The success of pre-trained contextualized language models such as BERT has motivated a line of work that investigates the linguistic knowledge inside such models, in order to explain their large improvements on downstream tasks. While previous work has shown syntactic, semantic, and word sense knowledge in BERT, little work has investigated how BERT solves CommonsenseQA tasks. In particular, it is an interesting research question whether BERT relies on shallow syntactic patterns or deeper commonsense knowledge for disambiguation. We propose two attention-based methods to analyze commonsense knowledge inside BERT, and the contribution of such knowledge to the model's predictions. We find that attention heads successfully capture the structured commonsense knowledge encoded in ConceptNet, which helps BERT solve commonsense tasks directly. Fine-tuning further teaches BERT to use this commonsense knowledge in its higher layers.
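The abstract does not spell out the attention-based measures, but one plausible probe in this spirit is: given BERT's per-head attention weights and the token positions of a ConceptNet concept pair, find the (layer, head) pairs in which the source concept attends most strongly to the target concept. The sketch below is a hypothetical illustration with synthetic attention weights, not the paper's actual method; the function name and toy data are assumptions.

```python
import numpy as np

def heads_linking_pair(attn, src_idx, tgt_idx):
    """Return (layer, head) pairs where the token at tgt_idx receives
    the maximum attention weight from the token at src_idx.

    attn: array of shape (n_layers, n_heads, seq_len, seq_len), where
          attn[l, h, i, j] is the attention from token i to token j
          in head h of layer l.
    """
    # For each layer/head, find which token src_idx attends to most.
    argmax_tgt = attn[:, :, src_idx, :].argmax(axis=-1)  # (n_layers, n_heads)
    layers, heads = np.where(argmax_tgt == tgt_idx)
    return list(zip(layers.tolist(), heads.tolist()))

# Toy example: 2 layers, 2 heads, 4 tokens, random row-normalized weights.
rng = np.random.default_rng(0)
attn = rng.random((2, 2, 4, 4))
attn /= attn.sum(axis=-1, keepdims=True)  # rows sum to 1, like softmax output
# Force head (1, 0) to link token 1 strongly to token 3.
attn[1, 0, 1, :] = [0.1, 0.1, 0.1, 0.7]

print(heads_linking_pair(attn, src_idx=1, tgt_idx=3))
```

On real inputs, `attn` could be obtained by running a BERT implementation with attention outputs enabled and stacking the per-layer weights; the same argmax test then asks whether any head reproduces a given ConceptNet edge.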



