Speech-to-Speech Translation For A Real-world Unwritten Language

11/11/2022
by Peng-Jen Chen, et al.

We study speech-to-speech translation (S2ST), which translates speech in one language into speech in another, and focus on building systems that support languages without a standard writing system. Using English-Taiwanese Hokkien as a case study, we present an end-to-end solution spanning training data collection, modeling choices, and benchmark dataset release. First, we describe our efforts to create human-annotated data, automatically mine data from large unlabeled speech datasets, and adopt pseudo-labeling to produce weakly supervised data. On the modeling side, we build on recent advances that use self-supervised discrete representations as prediction targets in S2ST, and we show the effectiveness of leveraging additional text supervision from Mandarin, a language similar to Hokkien, during model training. Finally, we release an S2ST benchmark set to facilitate future research in this field. The demo can be found at https://huggingface.co/spaces/facebook/Hokkien_Translation.


