Creating Speech-to-Speech Corpus from Dubbed Series

03/07/2022
by Massa Baali, et al.

Dubbed series have gained considerable popularity in recent years, with strong support from major media service providers. This popularity is fueled by studies showing that dubbed versions of TV shows are more popular than their subtitled equivalents. We propose an unsupervised approach to constructing a speech-to-speech corpus, aligned at the short-segment level, that yields parallel speech in the source and target languages. Our methodology exploits video frames, speech recognition, machine translation, and noisy-frame removal algorithms to match segments in both languages. To verify the performance of the proposed method, we apply it to long and short dubbed clips. Out of 36 hours of TR-AR dubbed series, our pipeline generated 17 hours of paired segments, which is about 47% of the corpus. We also applied the method to another language pair, EN-AR, to ensure that it is robust and not tuned to a specific language or corpus. Regardless of the language pair, the accuracy of the paired segments was around 70% according to subjective evaluation. The corpus will be freely available to the research community.
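To make the segment-matching idea concrete, here is a minimal Python sketch of one way such pairing could work. It is not the authors' pipeline: it assumes each segment already carries a transcript in the target language (ASR output on the dubbed side, ASR followed by machine translation on the original side), it omits the video-frame and noisy-frame removal steps entirely, and the `Segment` class, the `pair_segments` function, and both thresholds are hypothetical names and values chosen for illustration. English strings stand in for the transcripts.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Segment:
    start: float  # seconds from the start of the clip
    end: float
    text: str     # transcript, assumed to be in the target language already


def overlap_ratio(a: Segment, b: Segment) -> float:
    """Temporal intersection over union of two segments."""
    inter = min(a.end, b.end) - max(a.start, b.start)
    union = max(a.end, b.end) - min(a.start, b.start)
    return max(inter, 0.0) / union if union > 0 else 0.0


def text_similarity(a: str, b: str) -> float:
    """Word-level similarity in [0, 1] from difflib."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()


def pair_segments(source, target, min_overlap=0.5, min_similarity=0.6):
    """Greedily pair each source segment with the best unused target segment.

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    pairs = []
    used = set()
    for s in source:
        best_j, best_score = None, min_similarity
        for j, t in enumerate(target):
            # Skip targets already paired or not aligned in time.
            if j in used or overlap_ratio(s, t) < min_overlap:
                continue
            score = text_similarity(s.text, t.text)
            if score >= best_score:
                best_j, best_score = j, score
        if best_j is not None:
            used.add(best_j)
            pairs.append((s, target[best_j], best_score))
    return pairs


# Placeholder data: the third source segment has no matching speech.
source = [
    Segment(0.0, 2.0, "where are you going"),
    Segment(2.5, 5.0, "i will see you tomorrow"),
    Segment(6.0, 8.0, "background music"),
]
target = [
    Segment(0.1, 2.1, "where are you going"),
    Segment(2.4, 5.1, "i will see you tomorrow morning"),
    Segment(6.0, 8.0, "thank you very much"),
]

pairs = pair_segments(source, target)
for s, t, score in pairs:
    print(f"{s.start:.1f}-{s.end:.1f}s <-> {t.start:.1f}-{t.end:.1f}s  sim={score:.2f}")
```

Segments that fail either threshold are simply dropped, which is consistent with only part of the raw audio surviving as paired data, as in the 17 of 36 hours reported above.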


