CoLAKE: Contextualized Language and Knowledge Embedding

10/01/2020
by Tianxiang Sun, et al.

With the emerging branch of research on incorporating factual knowledge into pre-trained language models such as BERT, most existing models use shallow, static, and separately pre-trained entity embeddings, which limits the performance gains of these models. Few works explore the potential of deep contextualized knowledge representation when injecting knowledge. In this paper, we propose Contextualized Language and Knowledge Embedding (CoLAKE), which jointly learns contextualized representations for both language and knowledge with an extended masked language model (MLM) objective. Instead of injecting only entity embeddings, CoLAKE extracts the knowledge context of an entity from large-scale knowledge bases. To handle the heterogeneity of knowledge context and language context, we integrate them in a unified data structure, the word-knowledge graph (WK graph). CoLAKE is pre-trained on large-scale WK graphs with a modified Transformer encoder. We conduct experiments on knowledge-driven tasks, knowledge-probing tasks, and language understanding tasks. Experimental results show that CoLAKE outperforms previous counterparts on most of the tasks. Moreover, CoLAKE achieves surprisingly high performance on our synthetic word-knowledge graph completion task, which demonstrates the advantage of simultaneously contextualizing language and knowledge representations.
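
As a rough illustration of the WK-graph data structure mentioned in the abstract, the sketch below builds a small graph in which word nodes form a fully connected subgraph and an anchor entity node carries its knowledge-base neighborhood. The toy sentence, the toy knowledge base, and all helper names are illustrative assumptions; this is a minimal sketch of the idea, not the authors' actual preprocessing pipeline.

```python
# Hypothetical sketch: anchor an entity node inside a fully connected word
# sequence and hang its KB neighborhood (relation -> tail entity) off it.
# Toy data and helper names are assumptions, not CoLAKE's implementation.

from dataclasses import dataclass, field


@dataclass
class WKGraph:
    nodes: list = field(default_factory=list)   # (token, kind), kind in {word, entity, relation}
    edges: set = field(default_factory=set)     # undirected edges as (i, j) index pairs

    def add_node(self, token, kind):
        self.nodes.append((token, kind))
        return len(self.nodes) - 1

    def connect(self, i, j):
        self.edges.add((min(i, j), max(i, j)))


def build_wk_graph(sentence_tokens, anchor_entity, kb_neighbors):
    """Build a toy WK graph from one sentence and one anchor entity."""
    g = WKGraph()
    word_ids = [g.add_node(t, "word") for t in sentence_tokens]

    # Word nodes attend to each other, as in ordinary Transformer self-attention.
    for i in word_ids:
        for j in word_ids:
            if i < j:
                g.connect(i, j)

    # The anchor entity node is linked to the words of its sentence.
    anchor = g.add_node(anchor_entity, "entity")
    for i in word_ids:
        g.connect(i, anchor)

    # Knowledge context: each (relation, tail) pair becomes a relation node
    # and an entity node chained off the anchor.
    for relation, tail in kb_neighbors:
        r = g.add_node(relation, "relation")
        t = g.add_node(tail, "entity")
        g.connect(anchor, r)
        g.connect(r, t)
    return g


if __name__ == "__main__":
    sentence = ["Harry", "Potter", "points", "his", "wand", "at", "Voldemort"]
    neighbors = [("educated_at", "Hogwarts"), ("enemy_of", "Lord_Voldemort")]
    wk = build_wk_graph(sentence, "Harry_Potter", neighbors)
    print(len(wk.nodes), "nodes and", len(wk.edges), "edges")
```

The resulting adjacency could, for example, serve as an attention mask so that a structure-aware Transformer encoder only lets connected nodes attend to each other.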

Related research

11/13/2019 · KEPLER: A Unified Model for Knowledge Embedding and Pre-trained Language Representation
Pre-trained language representation models (PLMs) learn effective langua...

02/04/2022 · From Discrimination to Generation: Knowledge Graph Completion with Generative Transformer
Knowledge graph completion aims to address the problem of extending a KG...

09/02/2019 · Enriching Medical Terminology Knowledge Bases via Pre-trained Language Model and Graph Convolutional Network
Enriching existing medical terminology knowledge bases (KBs) is an impor...

06/17/2021 · Knowledgeable or Educated Guess? Revisiting Language Models as Knowledge Bases
Previous literatures show that pre-trained masked language models (MLMs)...

04/25/2022 · Incorporating Explicit Knowledge in Pre-trained Language Models for Passage Re-ranking
Passage re-ranking is to obtain a permutation over the candidate passage...

02/27/2022 · A Simple but Effective Pluggable Entity Lookup Table for Pre-trained Language Models
Pre-trained language models (PLMs) cannot well recall rich factual knowl...

05/22/2023 · Evaluating Prompt-based Question Answering for Object Prediction in the Open Research Knowledge Graph
There have been many recent investigations into prompt-based training of...
