Multilingual Fact Linking

09/29/2021
by   Keshav Kolluru, et al.
0

Knowledge-intensive NLP tasks can benefit from linking natural language text with facts from a Knowledge Graph (KG). Although facts themselves are language-agnostic, the fact labels (i.e., language-specific representation of the fact) in the KG are often present only in a few languages. This makes it challenging to link KG facts to sentences in languages other than the limited set of languages. To address this problem, we introduce the task of Multilingual Fact Linking (MFL) where the goal is to link fact expressed in a sentence to corresponding fact in the KG, even when the fact label in the KG is not available in the language of the sentence. To facilitate research in this area, we present a new evaluation dataset, IndicLink. This dataset contains 11,293 linked WikiData facts and 6,429 sentences spanning English and six Indian languages. We propose a Retrieval+Generation model, ReFCoG, that can scale to millions of KG facts by combining Dual Encoder based retrieval with a Seq2Seq based generation model which is constrained to output only valid KG facts. ReFCoG outperforms standard Retrieval+Re-ranking models by 10.7 pts in Precision@1. In spite of this gain, the model achieves an overall score of 52.1, showing ample scope for improvement in the task.ReFCoG code and IndicLink data are available at https://github.com/SaiKeshav/mfl

READ FULL TEXT

page 1

page 2

page 3

page 4

research
05/21/2023

Direct Fact Retrieval from Knowledge Graphs without Entity Linking

There has been a surge of interest in utilizing Knowledge Graphs (KGs) f...
research
03/02/2022

Mukayese: Turkish NLP Strikes Back

Having sufficient resources for language X lifts it from the under-resou...
research
11/22/2021

Knowledge Based Multilingual Language Model

Knowledge enriched language representation learning has shown promising ...
research
04/16/2023

Syntactic Complexity Identification, Measurement, and Reduction Through Controlled Syntactic Simplification

Text simplification is one of the domains in Natural Language Processing...
research
06/03/2021

A Dataset and Baselines for Multilingual Reply Suggestion

Reply suggestion models help users process emails and chats faster. Prev...
research
10/25/2022

KnowGL: Knowledge Generation and Linking from Text

We propose KnowGL, a tool that allows converting text into structured re...
research
05/24/2023

Mitigating Temporal Misalignment by Discarding Outdated Facts

While large language models are able to retain vast amounts of world kno...

Please sign up or login with your details

Forgot password? Click here to reset