Tight Integrated End-to-End Training for Cascaded Speech Translation

11/24/2020
by   Parnia Bahar, et al.

A cascaded speech translation model relies on discrete, non-differentiable transcription, which provides a supervision signal from the source side and helps the transformation between source speech and target text. Such modeling suffers from error propagation between the ASR and MT models. Direct speech translation is an alternative that avoids error propagation; however, its performance often lags behind that of the cascade system. To use an intermediate representation while preserving end-to-end trainability, previous studies have proposed two-stage models that pass the hidden vectors of the recognizer into the decoder of the MT model and ignore the MT encoder. This work explores the feasibility of collapsing all cascade components into a single end-to-end trainable model by optimizing all parameters of the ASR and MT models jointly, without ignoring any learned parameters. It is a tightly integrated method that passes renormalized source word posterior distributions as a soft decision instead of one-hot vectors, which enables backpropagation. It therefore provides both transcriptions and translations and achieves strong consistency between them. Our experiments on four tasks with different data scenarios show that the model outperforms cascade models by up to 1.8 2.0
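The difference between a hard (one-hot) decision and the soft decision described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the toy vocabulary, and in particular the choice of renormalizing by raising each posterior to an exponent `gamma` and dividing by the sum are assumptions made here for illustration. The point is only that the expected embedding is a smooth function of the ASR posteriors, whereas an argmax lookup is not.

```python
# Illustrative sketch only. All names and the power-based renormalization
# (p ** gamma / sum) are assumptions, not details taken from the paper.

def renormalize(posteriors, gamma):
    """Raise each probability to `gamma`, then rescale each row to sum to 1.

    gamma > 1 sharpens a distribution, gamma < 1 flattens it.
    """
    out = []
    for dist in posteriors:
        powered = [p ** gamma for p in dist]
        total = sum(powered)
        out.append([p / total for p in powered])
    return out


def soft_embed(posteriors, embedding_table):
    """Soft decision: expected embedding sum_w p(w) * E[w] per position."""
    dim = len(embedding_table[0])
    return [
        [sum(p * embedding_table[w][d] for w, p in enumerate(dist))
         for d in range(dim)]
        for dist in posteriors
    ]


def hard_embed(posteriors, embedding_table):
    """Hard decision: look up the embedding of the argmax word (one-hot)."""
    result = []
    for dist in posteriors:
        best = max(range(len(dist)), key=dist.__getitem__)
        result.append(list(embedding_table[best]))
    return result


# Toy setup: 2 source positions, vocabulary of 3 words, embedding size 2.
asr_posteriors = [[0.7, 0.2, 0.1], [0.1, 0.45, 0.45]]
mt_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

soft_inputs = soft_embed(renormalize(asr_posteriors, gamma=2.0), mt_embeddings)
hard_inputs = hard_embed(asr_posteriors, mt_embeddings)
```

In this sketch, a large `gamma` pushes the soft input toward the hard one when one word clearly dominates, while a small `gamma` keeps more of the recognizer's uncertainty visible to the translation model.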


