A case study on using speech-to-translation alignments for language documentation

02/14/2017
by Antonios Anastasopoulos et al.

For many low-resource or endangered languages, spoken language resources are more likely to be annotated with translations than with transcriptions. Recent work exploits such annotations to produce speech-to-translation alignments without access to any text transcriptions. We investigate whether providing these alignments to annotators helps them produce better (mismatched) crowdsourced transcriptions, which in turn could be valuable for training speech recognition systems. A small-scale case study, intended as a proof of concept, shows that the alignments can indeed be beneficial. We also present a simple, phonetically aware string averaging technique that produces transcriptions of higher quality.
