A Resource for Computational Experiments on Mapudungun

12/04/2019
by Mingjun Duan, et al.

We present a resource for computational experiments on Mapudungun, a polysynthetic indigenous language spoken in Chile with upwards of 200 thousand speakers. We provide 142 hours of culturally significant conversations in the domain of medical treatment. The conversations are fully transcribed and translated into Spanish. The transcriptions also include annotations for code-switching and non-standard pronunciations. We also provide baseline results on three core NLP tasks: speech recognition, speech synthesis, and machine translation between Spanish and Mapudungun. We further explore other applications for which the corpus will be suitable, including the study of code-switching, historical orthography change, linguistic structure, and sociological and anthropological studies.


research
11/22/2022

ArzEn-ST: A Three-way Speech Translation Corpus for Code-Switched Egyptian Arabic - English

We present our work on collecting ArzEn-ST, a code-switched Egyptian Ara...
research
10/07/2016

Challenges of Computational Processing of Code-Switching

This paper addresses challenges of Natural Language Processing (NLP) on ...
research
11/20/2017

Speech recognition for medical conversations

In this paper we document our experiences with developing speech recogni...
research
07/27/2018

A small Griko-Italian speech translation corpus

This paper presents an extension to a very low-resource parallel corpus ...
research
05/26/2023

BIG-C: a Multimodal Multi-Purpose Dataset for Bemba

We present BIG-C (Bemba Image Grounded Conversations), a large multimoda...
research
10/26/2020

HarperValleyBank: A Domain-Specific Spoken Dialog Corpus

We introduce HarperValleyBank, a free, public domain spoken dialog corpu...
