SKILL: Structured Knowledge Infusion for Large Language Models

05/17/2022
by Fedor Moiseev, et al.

Large language models (LLMs) have demonstrated human-level performance on a vast spectrum of natural language tasks. However, whether they internalize knowledge better from structured data, such as a knowledge graph, or from text remains largely unexplored. In this work, we propose a method to infuse structured knowledge into LLMs by directly training T5 models on the factual triples of knowledge graphs (KGs). We show that models pre-trained on the Wikidata KG with our method outperform T5 baselines on FreebaseQA and WikiHop, as well as on the Wikidata-answerable subsets of TriviaQA and NaturalQuestions. Models pre-trained on factual triples are competitive with models pre-trained on natural language sentences that contain the same knowledge. When trained on a smaller KG, WikiMovies, our method yielded a 3x improvement in exact-match score on the MetaQA task over the T5 baseline. A key advantage of the proposed method is that no alignment between the knowledge graph and a text corpus is required when curating training data, which makes it particularly useful when working with industry-scale knowledge graphs.
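The abstract describes training T5 directly on KG triples. A minimal sketch of how such training data might be curated is shown below, assuming a T5-style objective in which each (subject, relation, object) triple is serialized to text and the object is masked with a sentinel token so the model learns to predict the fact. The exact serialization used by SKILL is not given in the abstract, so this format (and the `triple_to_example` helper) is an illustrative assumption, not the paper's implementation.

```python
# Hypothetical sketch: turning KG triples into seq2seq training examples.
# The "<extra_id_0>" sentinel mirrors T5's span-corruption convention; the
# model is trained to fill in the masked object of each factual triple.

def triple_to_example(subject: str, relation: str, obj: str) -> tuple[str, str]:
    """Serialize one (subject, relation, object) triple into an
    (input, target) pair for encoder-decoder training."""
    source = f"{subject} {relation} <extra_id_0>"
    target = f"<extra_id_0> {obj}"
    return source, target

# Toy triples standing in for Wikidata facts (illustrative only).
triples = [
    ("Barack Obama", "place of birth", "Honolulu"),
    ("The Matrix", "director", "the Wachowskis"),
]

examples = [triple_to_example(*t) for t in triples]
# examples[0] → ("Barack Obama place of birth <extra_id_0>",
#                "<extra_id_0> Honolulu")
```

Because the examples are generated straight from triples, no alignment between the KG and a text corpus is needed, which is the advantage the abstract highlights.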


Related research:

04/29/2020 · Exploiting Structured Knowledge in Text via Graph-Guided Representation Learning
In this work, we aim at equipping pre-trained language models with struc...

04/25/2023 · What's in a Name? Evaluating Assembly-Part Semantic Knowledge in Language Models through User-Provided Names in CAD Files
Semantic knowledge of part-part and part-whole relationships in assembli...

06/05/2023 · Text-To-KG Alignment: Comparing Current Methods on Classification Tasks
In contrast to large text corpora, knowledge graphs (KG) provide dense a...

09/19/2022 · Joint Language Semantic and Structure Embedding for Knowledge Graph Completion
The task of completing knowledge triplets has broad downstream applicati...

10/23/2020 · Large Scale Knowledge Graph Based Synthetic Corpus Generation for Knowledge-Enhanced Language Model Pre-training
Generating natural sentences from Knowledge Graph (KG) triples, known as...

03/09/2021 · BERTese: Learning to Speak to BERT
Large pre-trained language models have been shown to encode large amount...

08/28/2023 · Bridging the KB-Text Gap: Leveraging Structured Knowledge-aware Pre-training for KBQA
Knowledge Base Question Answering (KBQA) aims to answer natural language...
