Jejueo Datasets for Machine Translation and Speech Synthesis

11/27/2019
by Kyubyong Park, et al.

Jejueo was classified as critically endangered by UNESCO in 2010. Although diverse efforts have been made to revitalize it, few of them have been computational. Motivated by this, we construct two new Jejueo datasets: Jejueo Interview Transcripts (JIT) and Jejueo Single Speaker Speech (JSS). The JIT dataset is a parallel corpus containing 170k+ Jejueo-Korean sentences, and the JSS dataset consists of 10k high-quality audio files recorded by a native Jejueo speaker, together with a transcript file. We then use these datasets to build neural machine translation and speech synthesis systems. All resources are publicly available via our GitHub repository. We hope that these datasets will attract the interest of both the language and machine learning communities.
