Large Scale Knowledge Graph Based Synthetic Corpus Generation for Knowledge-Enhanced Language Model Pre-training

10/23/2020
by Oshin Agarwal, et al.

Generating natural sentences from Knowledge Graph (KG) triples, known as Data-To-Text Generation, is a task with many datasets for which numerous complex systems have been developed. However, no prior work has attempted to perform this generation at scale by converting an entire KG into natural text. In this paper, we verbalize the entire Wikidata KG and create a KG-Text aligned corpus in the training process. We discuss the challenges in verbalizing an entire KG versus verbalizing smaller datasets. We further show that verbalizing an entire KG can be used to integrate structured and natural language data. In contrast to the many architectures that have been developed to reconcile the structural differences between these two sources, our approach converts the KG into the same format as natural text, allowing it to be seamlessly plugged into existing natural language systems. We evaluate this approach by augmenting the retrieval corpus in REALM and showing improvements on both the LAMA knowledge probe and open-domain QA.
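To make the idea of converting KG triples into text concrete, the following is a minimal sketch that groups triples by subject and renders each group as one sentence using a fixed template. This is a simple template baseline chosen for illustration, not the generation system described in the paper. The toy triples, the function names (`group_by_subject`, `verbalize`, `build_corpus`), and the sentence template are all assumptions.

```python
from collections import defaultdict

# Hypothetical toy triples (subject, relation, object) for illustration only;
# they are not drawn from an actual Wikidata dump.
TRIPLES = [
    ("Marie Curie", "occupation", "physicist"),
    ("Marie Curie", "place of birth", "Warsaw"),
    ("Warsaw", "country", "Poland"),
]


def group_by_subject(triples):
    """Collect the (relation, object) pairs of each subject entity."""
    groups = defaultdict(list)
    for subj, rel, obj in triples:
        groups[subj].append((rel, obj))
    return dict(groups)


def verbalize(subject, facts):
    """Render one subject's facts as a single templated sentence.

    Assumption: a fixed "<relation> is <object>" template stands in for a
    learned data-to-text model.
    """
    clauses = [f"{rel} is {obj}" for rel, obj in facts]
    return f"{subject}: " + "; ".join(clauses) + "."


def build_corpus(triples):
    """Turn a list of triples into plain-text sentences, one per subject."""
    return [verbalize(s, facts) for s, facts in group_by_subject(triples).items()]


if __name__ == "__main__":
    for sentence in build_corpus(TRIPLES):
        print(sentence)
```

Because the output is ordinary text, sentences produced this way could in principle be appended to a text retrieval corpus without changing the downstream system, which is the property the abstract emphasizes.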


