The Multilingual TEDx Corpus for Speech Recognition and Translation

02/02/2021
by Elizabeth Salesky, et al.

We present the Multilingual TEDx corpus, built to support speech recognition (ASR) and speech translation (ST) research across many non-English source languages. The corpus is a collection of audio recordings from TEDx talks in 8 source languages. We segment transcripts into sentences and align them to the source-language audio and target-language translations. The corpus is released along with open-sourced code enabling extension to new talks and languages as they become available. Our corpus creation methodology can be applied to more languages than previous work, and creates multi-way parallel evaluation sets. We provide baselines in multiple ASR and ST settings, including multilingual models to improve translation performance for low-resource language pairs.
