Leveraging translations for speech transcription in low-resource settings

03/23/2018
by Antonis Anastasopoulos, et al.

Recently proposed data collection frameworks for endangered language documentation aim not only to collect speech in the language of interest, but also to collect translations into a high-resource language that will render the collected resource interpretable. We focus on this scenario and explore whether we can improve transcription quality under these extremely low-resource settings with the assistance of text translations. We present a neural multi-source model and evaluate several variations of it on three low-resource datasets. We find that our multi-source model with shared attention outperforms the baselines, reducing transcription character error rate by up to 12.3%.
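
For readers unfamiliar with the metric named in the abstract, character error rate is commonly computed as the character-level edit distance between a reference transcription and the system output, divided by the reference length. The following is a minimal sketch of that common definition, not the authors' evaluation script; the function names `edit_distance` and `character_error_rate` are hypothetical, and the paper may normalize or tokenize differently.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference character
                            curr[j - 1] + 1,      # insert a hypothesis character
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """Edit distance normalized by reference length (assumed definition)."""
    if not ref:
        raise ValueError("reference transcription must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

Under this definition the value can exceed 1.0 when the hypothesis is much longer than the reference, so it is better read as a normalized error count than as a strict percentage.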

07/20/2022

When Is TTS Augmentation Through a Pivot Language Useful?

Developing Automatic Speech Recognition (ASR) for low-resource languages...
11/14/2022

High-Resource Methodological Bias in Low-Resource Investigations

The central bottleneck for low-resource NLP is typically regarded to be ...
11/29/2022

Learnings from Technological Interventions in a Low Resource Language: Enhancing Information Access in Gondi

The primary obstacle to developing technologies for low-resource languag...
12/07/2019

Unsung Challenges of Building and Deploying Language Technologies for Low Resource Language Communities

In this paper, we examine and analyze the challenges associated with dev...
09/12/2019

CUNI System for the Building Educational Applications 2019 Shared Task: Grammatical Error Correction

In this paper, we describe our systems submitted to the Building Educati...
08/02/2019

SANTLR: Speech Annotation Toolkit for Low Resource Languages

While low resource speech recognition has attracted a lot of attention f...