False-Friend Detection and Entity Matching via Unsupervised Transliteration

11/21/2016
by Yanqing Chen, et al.

Transliterations play an important role in multilingual entity reference resolution, because proper names increasingly travel between languages in news and social media. Previous work associated with machine translation targets transliteration only between single language pairs, focuses on specific classes of entities (such as cities and celebrities), and relies on manual curation, which limits the expressive power of transliteration in multilingual environments. By contrast, we present an unsupervised transliteration model covering 69 major languages that can generate good transliterations for arbitrary strings between any language pair. Our model yields top-(1, 20, 100) averages of (32.85, …), compared with results from a recently published system of (26.71, …). We also show the quality of our model in detecting true and false friends from high-frequency Wikipedia lexicons. Our method indicates a strong signal of pronunciation similarity and boosts the probability of finding true friends in 68 out of 69 languages.
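
The top-(1, 20, 100) figures quoted in the abstract are top-k matching scores. As a minimal sketch, assuming the metric is the fraction of source strings whose gold-standard transliteration appears among the first k ranked candidates (the abstract does not spell out the definition), it could be computed as follows. The function name `top_k_accuracy` and the toy data are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def top_k_accuracy(
    candidates: Dict[str, List[str]],
    gold: Dict[str, str],
    k: int,
) -> float:
    """Fraction of source strings whose gold transliteration is among
    the first k ranked candidates (assumed definition, see lead-in)."""
    if not gold:
        return 0.0
    hits = sum(
        1
        for source, target in gold.items()
        if target in candidates.get(source, [])[:k]
    )
    return hits / len(gold)


# Invented toy data for illustration only: ranked candidate lists per source.
candidates = {
    "london": ["лондон", "лондан"],
    "paris": ["парис", "париж"],
    "berlin": ["берлен", "бэрлин"],
}
gold = {"london": "лондон", "paris": "париж", "berlin": "берлин"}

top1 = top_k_accuracy(candidates, gold, k=1)
top2 = top_k_accuracy(candidates, gold, k=2)
```

Under this assumed definition the score can only stay the same or grow as k increases, which is consistent with reporting separate averages for k = 1, 20, and 100.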


