SDST: Successive Decoding for Speech-to-text Translation

09/21/2020
by Qianqian Dong, et al.

End-to-end speech-to-text translation (ST), which directly translates source-language speech into target-language text, has attracted intensive attention recently. However, combining speech recognition and machine translation in a single model places a heavy burden on the direct cross-modal, cross-lingual mapping. To reduce the learning difficulty, we propose SDST, an integral framework with Successive Decoding for the end-to-end Speech-to-text Translation task. The method is verified on two mainstream datasets. Experiments show that the proposed method improves on previous state-of-the-art methods by large margins.

05/05/2022

Cross-modal Contrastive Learning for Speech Translation

How can we learn unified representations for spoken utterances and their...
02/12/2018

End-to-End Automatic Speech Translation of Audiobooks

We investigate end-to-end speech-to-text translation on a corpus of audi...
04/21/2021

End-to-end Speech Translation via Cross-modal Progressive Training

End-to-end speech translation models have become a new trend in the rese...
02/10/2021

Fused Acoustic and Text Encoding for Multimodal Bilingual Pretraining and Speech Translation

Recently text and speech representation learning has successfully improv...
03/20/2022

STEMM: Self-learning with Speech-text Manifold Mixup for Speech Translation

How to learn a better speech representation for end-to-end speech-to-tex...
06/14/2020

UWSpeech: Speech to Speech Translation for Unwritten Languages

Existing speech to speech translation systems heavily rely on the text o...
06/01/2020

Is 42 the Answer to Everything in Subtitling-oriented Speech Translation?

Subtitling is becoming increasingly important for disseminating informat...