Multilingual Autoregressive Entity Linking

by Nicola De Cao, et al.

We present mGENRE, a sequence-to-sequence system for the Multilingual Entity Linking (MEL) problem – the task of resolving language-specific mentions to a multilingual Knowledge Base (KB). For a mention in a given language, mGENRE predicts the name of the target entity left-to-right, token-by-token in an autoregressive fashion. The autoregressive formulation allows us to effectively cross-encode mention string and entity names to capture more interactions than the standard dot product between mention and entity vectors. It also enables fast search within a large KB even for mentions that do not appear in mention tables and with no need for large-scale vector indices. While prior MEL works use a single representation for each entity, we match against entity names in as many languages as possible, which allows exploiting language connections between source input and target name. Moreover, in a zero-shot setting on languages with no training data at all, mGENRE treats the target language as a latent variable that is marginalized at prediction time. This leads to over 50% improvements in average accuracy. We show the efficacy of our approach through extensive evaluation including experiments on three popular MEL benchmarks where mGENRE establishes new state-of-the-art results. Code and pre-trained models at
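The marginalization over the target language described in the abstract can be illustrated with a small sketch. This is not the authors' code: it assumes that each generated candidate arrives as a hypothetical `(entity_id, language, log_prob)` triple, where `log_prob` is the joint log-probability the model assigns to that entity name in that language. The entity identifiers and scores below are made up for illustration.

```python
import math
from collections import defaultdict


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def marginalize_over_languages(scored_names):
    """Sum out the language of each candidate name.

    scored_names: iterable of (entity_id, language, log_prob) triples
    (an assumed input format, not taken from the paper).
    Returns a dict mapping entity_id to its marginal log-probability.
    """
    per_entity = defaultdict(list)
    for entity_id, _language, log_prob in scored_names:
        per_entity[entity_id].append(log_prob)
    return {e: logsumexp(lps) for e, lps in per_entity.items()}


def link(scored_names):
    """Return the entity with the highest marginal score."""
    scores = marginalize_over_languages(scored_names)
    return max(scores, key=scores.get)


# Hypothetical candidates: entity "E1" is named in two languages,
# entity "E2" in one.
candidates = [
    ("E1", "de", math.log(0.30)),
    ("E1", "en", math.log(0.25)),
    ("E2", "en", math.log(0.40)),
]
scores = marginalize_over_languages(candidates)
best = link(candidates)
```

In this toy input the single highest-scoring name belongs to `E2`, but `E1` wins once its probability mass is summed across languages, which is the effect that treating the language as a latent variable is meant to capture.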



Entity Linking in 100 Languages

We propose a new formulation for multilingual entity linking, where lang...

Autoregressive Entity Retrieval

Entities are at the center of how we represent and aggregate knowledge. ...

ParaNames: A Massively Multilingual Entity Name Corpus

This preprint describes work in progress on ParaNames, a multilingual pa...

DetIE: Multilingual Open Information Extraction Inspired by Object Detection

State of the art neural methods for open information extraction (OpenIE)...

Highly Parallel Autoregressive Entity Linking with Discriminative Correction

Generative approaches have been recently shown to be effective for both ...

Autoregressive Search Engines: Generating Substrings as Document Identifiers

Knowledge-intensive language tasks require NLP systems to both provide t...

Multilingual Event Linking to Wikidata

We present a task of multilingual linking of events to a knowledge base....