Why Do Masked Neural Language Models Still Need Common Sense Knowledge?

by   Sunjae Kwon, et al.

Currently, contextualized word representations are learned by intricate neural network models, such as masked neural language models (MNLMs). The new representations significantly enhanced the performance in automated question answering by reading paragraphs. However, identifying the detailed knowledge trained in the MNLMs is difficult owing to numerous and intermingled parameters. This paper provides empirical but insightful analyses on the pretrained MNLMs with respect to common sense knowledge. First, we propose a test that measures what types of common sense knowledge do pretrained MNLMs understand. From the test, we observed that MNLMs partially understand various types of common sense knowledge but do not accurately understand the semantic meaning of relations. In addition, based on the difficulty of the question-answering task problems, we observed that pretrained MLM-based models are still vulnerable to problems that require common sense knowledge. We also experimentally demonstrated that we can elevate existing MNLM-based models by combining knowledge from an external common sense repository.


page 3

page 4

page 7


Why Do Neural Language Models Still Need Commonsense Knowledge to Handle Semantic Variations in Question Answering?

Many contextualized word representations are now learned by intricate ne...

Do Language Embeddings Capture Scales?

Pretrained Language Models (LMs) have been shown to possess significant ...

Does Knowledge Help General NLU? An Empirical Study

It is often observed in knowledge-centric tasks (e.g., common sense ques...

Evaluating Machine Common Sense via Cloze Testing

Language models (LMs) show state of the art performance for common sense...

Dynamic Integration of Background Knowledge in Neural NLU Systems

Common-sense or background knowledge is required to understand natural l...

Acquiring Common Sense Spatial Knowledge through Implicit Spatial Templates

Spatial understanding is a fundamental problem with wide-reaching real-w...

Common Sense or World Knowledge? Investigating Adapter-Based Knowledge Injection into Pretrained Transformers

Following the major success of neural language models (LMs) such as BERT...