CoCoLM: COmplex COmmonsense Enhanced Language Model

by Changlong Yu, et al.

Large-scale pre-trained language models have demonstrated strong knowledge representation ability. However, recent studies suggest that even though these giant models contain rich simple commonsense knowledge (e.g., birds can fly and fish can swim), they often struggle with complex commonsense knowledge that involves multiple eventualities (verb-centric phrases, e.g., identifying the relationship between "Jim yells at Bob" and "Bob is upset"). To address this problem, in this paper we propose to help pre-trained language models better incorporate complex commonsense knowledge. Unlike existing fine-tuning approaches, we do not focus on a specific task, and instead propose a general language model named CoCoLM. Through careful training over ASER, a large-scale eventuality knowledge graph, we successfully teach pre-trained language models (i.e., BERT and RoBERTa) rich complex commonsense knowledge among eventualities. Experiments on multiple downstream commonsense tasks that require a correct understanding of eventualities demonstrate the effectiveness of CoCoLM.
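The abstract does not specify how ASER edges are fed to the language model, but the general recipe it describes (teaching an LM relations between eventualities) can be sketched as a preprocessing step that turns each knowledge-graph edge into a sentence-pair classification example. The edge format, relation vocabulary, and helper names below are illustrative assumptions, not the actual CoCoLM pipeline:

```python
# Hypothetical sketch: converting ASER-style eventuality edges into
# sentence-pair training examples for a BERT-style classifier.
# The relation set here is a small illustrative subset, not ASER's full inventory.

RELATIONS = ["Result", "Reason", "Condition", "Contrast"]

def edge_to_example(head: str, relation: str, tail: str) -> dict:
    """Turn one (head, relation, tail) KG edge into a labeled text-pair example."""
    if relation not in RELATIONS:
        raise ValueError(f"unknown relation: {relation}")
    return {
        "text_a": head,                      # first eventuality
        "text_b": tail,                      # second eventuality
        "label": RELATIONS.index(relation),  # relation as a class index
    }

# The abstract's running example as one training instance.
example = edge_to_example("Jim yells at Bob", "Result", "Bob is upset")
```

Examples in this form could then be tokenized as standard `[CLS] text_a [SEP] text_b [SEP]` pairs and used to fine-tune BERT or RoBERTa with a relation-classification head.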



