Learning Shared Semantic Space for Speech-to-Text Translation

05/07/2021
by Chi Han, et al.

End-to-end speech translation (ST) has numerous potential applications and great impact, yet it has long been treated as an independent task and has failed to draw fully on the rapid advances of its sibling, text machine translation (MT). Because text and audio inputs are represented differently, this modality gap has made MT data and end-to-end MT models incompatible with their ST counterparts. To overcome this obstacle, we propose to bridge the representation gap with Chimera. By projecting audio and text features into a common semantic representation, Chimera unifies the MT and ST tasks and raises performance on the ST benchmark MuST-C to a new state of the art. Specifically, Chimera obtains 26.3 BLEU on EN-DE, improving on the previous state of the art by 2.7 BLEU. Further experimental analyses demonstrate that the shared semantic space indeed conveys common knowledge between the two tasks and thus paves a new way for augmenting training resources across modalities.
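The idea of projecting audio and text features into a common representation can be illustrated with a toy sketch. This is not the Chimera architecture, which the abstract does not specify; it only shows one generic way that inputs of different lengths and feature sizes could be mapped to outputs of identical shape, by letting a fixed set of query vectors attend over each input. The class `SharedProjector`, its parameters, and all dimensions are hypothetical, and the weights are random rather than trained.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rng, rows, cols, scale=0.1):
    """Random Gaussian matrix; stands in for trained weights."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class SharedProjector:
    """Hypothetical toy module: maps audio or text features to one fixed shape."""

    def __init__(self, audio_dim, text_dim, model_dim=8, num_slots=4, seed=0):
        rng = random.Random(seed)
        self.model_dim = model_dim
        # Modality-specific input projections (assumption, not from the paper).
        self.w_audio = random_matrix(rng, audio_dim, model_dim)
        self.w_text = random_matrix(rng, text_dim, model_dim)
        # Query vectors shared by both modalities.
        self.queries = random_matrix(rng, num_slots, model_dim)

    def _attend(self, feats):
        # feats has shape (T, model_dim); T differs between inputs.
        keys_t = [list(col) for col in zip(*feats)]   # (model_dim, T)
        scores = matmul(self.queries, keys_t)         # (num_slots, T)
        scale = math.sqrt(self.model_dim)
        weights = [softmax([s / scale for s in row]) for row in scores]
        return matmul(weights, feats)                 # (num_slots, model_dim)

    def encode_audio(self, frames):
        return self._attend(matmul(frames, self.w_audio))

    def encode_text(self, embeddings):
        return self._attend(matmul(embeddings, self.w_text))


data_rng = random.Random(1)
proj = SharedProjector(audio_dim=20, text_dim=12)
audio_frames = random_matrix(data_rng, 50, 20, scale=1.0)  # 50 audio frames
text_tokens = random_matrix(data_rng, 7, 12, scale=1.0)    # 7 token embeddings

audio_repr = proj.encode_audio(audio_frames)
text_repr = proj.encode_text(text_tokens)
print(len(audio_repr), len(audio_repr[0]))  # 4 8
print(len(text_repr), len(text_repr[0]))    # 4 8
```

Because both outputs have the same shape regardless of input length or modality, a single downstream decoder could in principle consume either one, which is the property that lets MT and ST data be used together.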

research
05/23/2023

Improving speech translation by fusing speech and text

In speech translation, leveraging multimodal data to improve model perfo...
research
10/28/2020

Bridging the Modality Gap for Speech-to-Text Translation

End-to-end speech translation aims to translate speech in one language i...
research
05/19/2023

DUB: Discrete Unit Back-translation for Speech Translation

How can speech-to-text translation (ST) perform as well as machine trans...
research
05/15/2023

Understanding and Bridging the Modality Gap for Speech Translation

How to achieve better end-to-end speech translation (ST) by leveraging (...
research
10/15/2022

Generating Synthetic Speech from SpokenVocab for Speech Translation

Training end-to-end speech translation (ST) systems requires sufficientl...
research
06/02/2023

Speech Translation with Foundation Models and Optimal Transport: UPC at IWSLT23

This paper describes the submission of the UPC Machine Translation group...
research
11/14/2016

Zero-resource Machine Translation by Multimodal Encoder-decoder Network with Multimedia Pivot

We propose an approach to build a neural machine translation system with...
