Direct Simultaneous Speech-to-Text Translation Assisted by Synchronized Streaming ASR

06/11/2021
by Junkun Chen, et al.

Simultaneous speech-to-text translation is widely useful in many scenarios. The conventional cascaded approach uses a pipeline of streaming ASR followed by simultaneous MT, but suffers from error propagation and extra latency. To alleviate these issues, recent efforts attempt to translate the source speech directly into target text simultaneously, but this is much harder because it combines two separate tasks. We instead propose a new paradigm with the advantages of both cascaded and end-to-end approaches. The key idea is to use two separate, but synchronized, decoders for streaming ASR and direct speech-to-text translation (ST), respectively; the intermediate results of ASR guide the decoding policy of ST but are not fed to it as input. During training, we use multitask learning to jointly learn these two tasks with a shared encoder. En-to-De and En-to-Es experiments on the MuST-C dataset demonstrate that our proposed technique achieves substantially better translation quality at similar levels of latency.
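
To make the idea of an ASR-guided decoding policy concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the policy, so a wait-k-style rule is assumed here, in which the ST decoder may emit a target token only while it lags the number of source tokens recognized by the streaming ASR decoder by at least `k`. The function name `simulate_asr_guided_policy` and all of its parameters are hypothetical. The ASR output is used only as a counter; no ASR text is passed to the ST decoder.

```python
from typing import List, Sequence


def simulate_asr_guided_policy(
    asr_tokens_per_chunk: Sequence[int],
    k: int,
    max_target_len: int,
) -> List[str]:
    """Return the READ/WRITE action sequence for a toy ASR-guided policy.

    asr_tokens_per_chunk[i] is the number of new source tokens the streaming
    ASR decoder commits after speech chunk i (hypothetical input).
    k is the assumed wait-k lag; max_target_len caps the translation length.
    """
    actions: List[str] = []
    asr_count = 0  # source tokens recognized so far by the ASR decoder
    written = 0    # target tokens emitted so far by the ST decoder

    for n_new in asr_tokens_per_chunk:
        # Consume one more chunk of speech; both decoders see the same
        # encoder states, so they stay synchronized.
        actions.append("READ")
        asr_count += n_new
        # The ASR token count (not the ASR text) decides whether ST may write.
        while written < max_target_len and asr_count - written >= k:
            actions.append("WRITE")
            written += 1

    # The speech has ended: emit whatever remains of the translation.
    while written < max_target_len:
        actions.append("WRITE")
        written += 1
    return actions


if __name__ == "__main__":
    print(simulate_asr_guided_policy([0, 1, 1, 0, 2], k=2, max_target_len=4))
```

In this toy run, chunks that yield no new ASR tokens (silence, or a word still in progress) trigger no writes, which is how counting recognized tokens rather than speech frames can adapt the policy to the speaking rate.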

research
04/11/2022

Large-Scale Streaming End-to-End Speech Translation with Neural Transducers

Neural transducers have been widely used in automatic speech recognition...
research
05/24/2020

ON-TRAC Consortium for End-to-End and Simultaneous Speech Translation Challenge Tasks at IWSLT 2020

This paper describes the ON-TRAC Consortium translation systems develope...
research
11/24/2020

Tight Integrated End-to-End Training for Cascaded Speech Translation

A cascaded speech translation model relies on discrete and non-different...
research
10/15/2021

Direct simultaneous speech to speech translation

We present the first direct simultaneous speech-to-speech translation (S...
research
09/15/2021

UniST: Unified End-to-end Model for Streaming and Non-streaming Speech Translation

This paper presents a unified end-to-end framework for both streaming a...
research
10/18/2022

Simultaneous Translation for Unsegmented Input: A Sliding Window Approach

In the cascaded approach to spoken language translation (SLT), the ASR o...
research
06/05/2020

ELITR Non-Native Speech Translation at IWSLT 2020

This paper is an ELITR system submission for the non-native speech trans...