Bootstrapping Transliteration with Constrained Discovery for Low-Resource Languages

09/20/2018
by   Shyam Upadhyay, et al.

Generating the English transliteration of a name written in a foreign script is an important and challenging step in multilingual knowledge acquisition and information extraction. Existing approaches to transliteration generation require a large number (>5000) of training examples. This difficulty contrasts with transliteration discovery, a somewhat easier task that involves picking a plausible transliteration from a given list. In this work, we present a bootstrapping algorithm that uses constrained discovery to improve generation and can be used with as few as 500 training examples, which we show can be sourced from annotators in a matter of hours. This opens the task to languages for which a large number of training examples is unavailable. We evaluate transliteration generation performance itself, as well as the improvement it brings to cross-lingual candidate generation for entity linking, a typical downstream task. We present a comprehensive evaluation of our approach on nine languages, each written in a unique script.
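The interplay between discovery and generation described above can be illustrated with a minimal sketch. This is not the paper's model or code: the character-table "model", the function names (`train`, `score`, `discover`, `bootstrap`), the acceptance threshold, and the toy Greek-to-English data are all assumptions made for illustration. The sketch only shows the general shape of the loop: train on a small seed set, use the current model to pick the best candidate from a fixed name list for each unlabeled name, keep the confident picks as new training pairs, and retrain.

```python
from collections import Counter, defaultdict

def train(pairs):
    """Toy stand-in for a generation model: a per-character mapping
    learned from equal-length (source, target) pairs."""
    counts = defaultdict(Counter)
    for src, tgt in pairs:
        if len(src) == len(tgt):
            for s, t in zip(src, tgt):
                counts[s][t] += 1
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}

def score(model, src, cand):
    """Fraction of positions where the learned mapping agrees with cand."""
    if len(src) != len(cand):
        return 0.0
    hits = sum(1 for s, t in zip(src, cand) if model.get(s) == t)
    return hits / len(src)

def discover(model, src, name_list):
    """Discovery: pick the highest-scoring candidate from a given list."""
    best = max(name_list, key=lambda c: score(model, src, c))
    return best, score(model, src, best)

def bootstrap(seed_pairs, unlabeled, name_list, threshold=0.6, rounds=3):
    """Grow the training set with confident discoveries, then retrain.
    threshold and rounds are illustrative values, not from the paper."""
    pairs = list(seed_pairs)
    remaining = list(unlabeled)
    model = train(pairs)
    for _ in range(rounds):
        accepted, rejected = [], []
        for src in remaining:
            cand, s = discover(model, src, name_list)
            if s >= threshold:
                accepted.append((src, cand))
            else:
                rejected.append(src)
        if not accepted:
            break
        pairs.extend(accepted)
        remaining = rejected
        model = train(pairs)
    return model, pairs

# Toy data (hypothetical): two seed pairs, two unlabeled names, a name list.
seed = [("ανα", "ana"), ("τομ", "tom")]
unlabeled = ["τινα", "μονα"]
names = ["tina", "mona", "nora", "tony"]
model, pairs = bootstrap(seed, unlabeled, names)
```

In this toy run the seed set never shows the character ι, yet constraining the choice to the name list lets the loop match τινα to "tina" and then learn the missing mapping on retraining. A real system would replace the character table with a trained generation model and the equal-length assumption with a proper alignment.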


