Knowledge-Aware Language Model Pretraining

06/29/2020
by Corby Rosset, et al.

How much knowledge do pretrained language models hold? Recent research has observed that pretrained transformers are adept at modeling semantics, but it is unclear to what degree they grasp human knowledge, or how to ensure that they do. In this paper we incorporate knowledge awareness into language model pretraining without changing the transformer architecture, inserting explicit knowledge layers, or adding external storage of semantic information. Instead, we simply signal the existence of entities to the transformer's input during pretraining, via an entity-extended tokenizer, and at its output, via an additional entity prediction task. Our experiments show that adding these entity signals in pretraining alone packs significantly more knowledge into the transformer parameters: we observe improved language modeling accuracy, better factual correctness on LAMA knowledge-probing tasks, and richer semantics in the hidden representations under edge probing. We also show that our knowledge-aware language model (KALM) can serve as a drop-in replacement for GPT-2 models, significantly improving downstream tasks such as zero-shot question answering with no task-related training.
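The two entity signals described above are lightweight: an entity-extended tokenizer marks linked entity mentions in the input, and an extra entity prediction head is trained at the output alongside the usual next-token objective. The following is a minimal sketch of how such a joint objective might be wired up, assuming a GPT-2-style backbone that returns hidden states; every name here (EntityAwareLM, entity_head, entity_labels, and so on) is an illustrative assumption, not the authors' released code.

    # Illustrative sketch only: the paper's entity signals rendered as a joint
    # pretraining loss on top of an unmodified transformer backbone.
    import torch.nn as nn

    class EntityAwareLM(nn.Module):
        def __init__(self, backbone, hidden_size, word_vocab_size, entity_vocab_size):
            super().__init__()
            self.backbone = backbone                                      # unchanged transformer stack
            self.lm_head = nn.Linear(hidden_size, word_vocab_size)        # standard next-token head
            self.entity_head = nn.Linear(hidden_size, entity_vocab_size)  # extra entity prediction head

        def forward(self, input_ids, word_labels, entity_labels):
            # input_ids come from an entity-extended tokenizer, so linked entity
            # mentions are visible to the model as dedicated token ids; the
            # transformer architecture itself is untouched.
            hidden = self.backbone(input_ids)            # (batch, seq_len, hidden_size)
            word_logits = self.lm_head(hidden)
            entity_logits = self.entity_head(hidden)

            ce = nn.CrossEntropyLoss(ignore_index=-100)
            lm_loss = ce(word_logits.view(-1, word_logits.size(-1)), word_labels.view(-1))
            # entity_labels are set to -100 wherever a token is not part of an
            # entity mention, so the auxiliary loss applies only where an entity exists.
            entity_loss = ce(entity_logits.view(-1, entity_logits.size(-1)), entity_labels.view(-1))
            return lm_loss + entity_loss                 # joint knowledge-aware objective

Because the backbone is unchanged, the auxiliary head can presumably be discarded after pretraining, which is consistent with KALM serving as a drop-in replacement for GPT-2.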

research 10/06/2020
Pretrained Language Model Embryology: The Birth of ALBERT
While behaviors of pretrained language models (LMs) have been thoroughly...

research 12/20/2019
Pretrained Encyclopedia: Weakly Supervised Knowledge-Pretrained Language Model
Recent breakthroughs of pretrained language models have shown the effect...

research 08/16/2020
Adding Recurrence to Pretrained Transformers for Improved Efficiency and Context Size
Fine-tuning a pretrained transformer for a downstream task has become a ...

research 03/09/2021
Pretrained Transformers as Universal Computation Engines
We investigate the capability of a transformer pretrained on natural lan...

research 03/25/2023
Sem4SAP: Synonymous Expression Mining From Open Knowledge Graph For Language Model Synonym-Aware Pretraining
The model's ability to understand synonymous expression is crucial in ma...

research 03/21/2019
Linguistic Knowledge and Transferability of Contextual Representations
Contextual word representations derived from large-scale neural language...

research 09/03/2022
TransPolymer: a Transformer-based Language Model for Polymer Property Predictions
Accurate and efficient prediction of polymer properties is of great sign...
