Preparing an Endangered Language for the Digital Age: The Case of Judeo-Spanish

05/31/2022
by Alp Öktem, et al.

We develop machine translation and speech synthesis systems to complement efforts to revitalize Judeo-Spanish, the exiled language of the Sephardic Jews, which has survived for centuries but now faces the threat of extinction in the digital age. Building on resources created by the Sephardic community of Turkey and elsewhere, we create corpora and tools that can help preserve this language for future generations. For machine translation, we first develop a rule-based Spanish-to-Judeo-Spanish machine translation system to generate large volumes of synthetic parallel data pairing Judeo-Spanish with Turkish, English, and Spanish. We then train baseline neural machine translation engines on this synthetic data together with authentic parallel data drawn from translations produced by the Sephardic community. For text-to-speech synthesis, we present a 3.5-hour single-speaker speech corpus for building a neural speech synthesis engine. All resources, model weights, and online inference engines are shared publicly.
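
The synthetic-data step described above can be illustrated with a minimal sketch, assuming a corpus of (Spanish, X) sentence pairs where X is Turkish or English: a Spanish-to-Judeo-Spanish converter is applied to the Spanish side only, and each output is paired with the untouched X side. The substitution rules, `toy_es_to_lad`, and `make_synthetic_pairs` are hypothetical illustrations loosely modeled on Latin-script Judeo-Spanish spelling; they are not the authors' rule-based system, which the abstract does not detail.

```python
import re

# HYPOTHETICAL toy rules, NOT the authors' rule set. Each (pattern, replacement)
# is applied in order, so context-sensitive patterns come before general ones.
RULES = [
    (r"qu(?=[eéií])", "k"),  # "qu" before a front vowel becomes "k"
    (r"c(?=[eéií])", "s"),   # "c" before a front vowel becomes "s"
    (r"c(?!h)", "k"),        # any other "c" (but not "ch") becomes "k"
    (r"ll", "y"),            # "ll" becomes "y"
    (r"ñ", "ny"),            # "ñ" becomes "ny"
    (r"\bh", ""),            # word-initial silent "h" is dropped
]


def toy_es_to_lad(sentence: str) -> str:
    """Apply the toy orthographic rules to one lowercased Spanish sentence."""
    text = sentence.lower()
    for pattern, replacement in RULES:
        text = re.sub(pattern, replacement, text)
    return text


def make_synthetic_pairs(es_x_pairs):
    """Turn (Spanish, X) pairs into synthetic (Judeo-Spanish, X) pairs."""
    return [(toy_es_to_lad(es), x) for es, x in es_x_pairs]


if __name__ == "__main__":
    corpus = [("La casa", "The house"), ("Hola, ¿qué hace el niño?", "Hello, what is the boy doing?")]
    for lad, en in make_synthetic_pairs(corpus):
        print(lad, "|", en)
```

The design point the sketch captures is that only the source-language side is rewritten, so any existing Spanish-X corpus becomes Judeo-Spanish-X training data without new human translation. A real rule-based system would also need lexical and morphological rules, not just spelling substitutions.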
