Transliteration in Any Language with Surrogate Languages

09/14/2016
by Stephen Mayhew, et al.

We introduce a method for transliteration generation that can produce transliterations in every language. Whereas previous results are only as multilingual as Wikipedia, we show how to use training data from Wikipedia as surrogate training data for any language. The problem thus becomes one of ranking Wikipedia languages in order of suitability with respect to a target language. We introduce several task-specific methods for ranking languages, and show that our approach is comparable to the oracle ceiling, and even outperforms it in some cases.
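
To make the ranking idea concrete, here is a minimal sketch of ordering candidate surrogate languages by similarity to a target language. It is an illustration under stated assumptions, not the authors' method: the abstract does not specify the ranking functions, so the Jaccard overlap of symbol inventories used here is a stand-in, and the names `jaccard`, `rank_surrogates`, `lang_a`, `lang_b`, and `lang_c`, along with the toy inventories, are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two symbol inventories (0.0 when both are empty)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def rank_surrogates(
    target: Set[str], candidates: Dict[str, Set[str]]
) -> List[Tuple[str, float]]:
    """Rank candidate languages by descending similarity to the target.

    Assumption: similarity is plain inventory overlap; ties are broken
    alphabetically by language name so the order is deterministic.
    """
    scored = [(name, jaccard(target, inv)) for name, inv in candidates.items()]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Toy inventories (made up for illustration, not real linguistic data).
target_inventory = {"a", "b", "k", "m", "s"}
candidate_inventories = {
    "lang_a": {"a", "b", "k", "m", "t"},
    "lang_b": {"a", "x", "y"},
    "lang_c": {"a", "b", "k", "m", "s", "z"},
}

ranking = rank_surrogates(target_inventory, candidate_inventories)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this sketch, the top-ranked candidate would supply the surrogate training data for the target language. Any other similarity measure could be substituted for `jaccard` without changing the ranking step.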
