Knowledge Based Multilingual Language Model

11/22/2021
by Linlin Liu, et al.

Knowledge-enriched language representation learning has shown promising performance across various knowledge-intensive NLP tasks. However, existing knowledge-based language models are all trained with monolingual knowledge graph data, which limits their application to more languages. In this work, we present a novel framework to pretrain knowledge-based multilingual language models (KMLMs). We first generate a large amount of code-switched synthetic sentences and reasoning-based multilingual training data using the Wikidata knowledge graphs. Then, based on the intra- and inter-sentence structures of the generated data, we design pretraining tasks to facilitate knowledge learning, which allow the language models not only to memorize factual knowledge but also to learn useful logical patterns. Our pretrained KMLMs demonstrate significant performance improvements on a wide range of knowledge-intensive cross-lingual NLP tasks, including named entity recognition, factual knowledge retrieval, relation classification, and logic reasoning, a new task that we introduce. Our code and pretrained language models will be made publicly available.
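
To make the idea of code-switched synthetic sentences concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy label table with hypothetical IDs (`E1`, `R1`, `E2`) standing in for knowledge graph entries, and a hypothetical helper `code_switch` that renders a (subject, relation, object) triple by choosing a label language independently for each element. The actual generation procedure in the paper may differ.

```python
import random

# Toy stand-in for multilingual knowledge-graph labels.
# The IDs are hypothetical placeholders, not real Wikidata identifiers.
LABELS = {
    "E1": {"en": "Germany", "de": "Deutschland", "es": "Alemania"},
    "R1": {"en": "capital", "de": "Hauptstadt", "es": "capital"},
    "E2": {"en": "Berlin", "de": "Berlin", "es": "Berlín"},
}


def code_switch(triple, labels, langs, rng):
    """Render a (subject, relation, object) triple as a space-joined phrase,
    picking the label language independently for each element."""
    parts = []
    for item_id in triple:
        lang = rng.choice(langs)  # language may differ between elements
        parts.append(labels[item_id][lang])
    return " ".join(parts)


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so repeated runs give the same samples
    for _ in range(3):
        print(code_switch(("E1", "R1", "E2"), LABELS, ["en", "de", "es"], rng))
```

Passing a single language, for example `["de"]`, reduces the sketch to a monolingual rendering of the triple, which is a convenient way to check the label table before mixing languages.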

