Large-Scale Streaming End-to-End Speech Translation with Neural Transducers

04/11/2022
by   Jian Xue, et al.

Neural transducers have been widely used in automatic speech recognition (ASR). In this paper, we introduce them to streaming end-to-end speech translation (ST), which aims to convert audio signals directly into text in another language. Compared with cascaded ST, which performs ASR followed by text-based machine translation (MT), the proposed Transformer transducer (TT)-based ST model drastically reduces inference latency, exploits speech information, and avoids error propagation from ASR to MT. To improve the modeling capacity, we propose attention pooling for the joint network in TT. In addition, we extend TT-based ST to multilingual ST, which generates text in multiple languages at the same time. Experimental results on a large-scale pseudo-labeled training set of 50 thousand (50K) hours show that TT-based ST not only significantly reduces inference time but also outperforms non-streaming cascaded ST for English-German translation.


research
10/21/2020

Cascaded Models With Cyclic Feedback For Direct Speech Translation

Direct speech translation describes a scenario where only speech inputs ...
research
06/11/2021

Direct Simultaneous Speech-to-Text Translation Assisted by Synchronized Streaming ASR

Simultaneous speech-to-text translation is widely useful in many scenari...
research
09/14/2019

Leveraging Out-of-Task Data for End-to-End Automatic Speech Translation

For automatic speech translation (AST), end-to-end approaches are outper...
research
09/20/2021

MeetDot: Videoconferencing with Live Translation Captions

We present MeetDot, a videoconferencing system with live translation cap...
research
09/27/2021

Integrated Training for Sequence-to-Sequence Models Using Non-Autoregressive Transformer

Complex natural language applications such as speech translation or pivo...
research
01/22/2021

Streaming Models for Joint Speech Recognition and Translation

Using end-to-end models for speech translation (ST) has increasingly bee...
research
10/11/2022

CTC Alignments Improve Autoregressive Translation

Connectionist Temporal Classification (CTC) is a widely used approach fo...
