Impact of Encoding and Segmentation Strategies on End-to-End Simultaneous Speech Translation

04/29/2021
by Ha Nguyen, et al.

Boosted by the simultaneous translation shared task at IWSLT 2020, promising end-to-end online speech translation approaches have recently been proposed. They incrementally encode a speech input (in a source language) and decode the corresponding text (in a target language), aiming for the best possible trade-off between latency and translation quality. This paper investigates two key aspects of end-to-end simultaneous speech translation: (a) how to efficiently encode the continuous speech flow, and (b) how to segment the speech flow in order to alternate optimally between reading (R: encoding input) and writing (W: decoding output) operations. We extend our previously proposed end-to-end online decoding strategy and show that, while replacing BLSTM encoding with ULSTM encoding degrades performance in offline mode, it actually improves both efficiency and performance in online mode. We also measure the impact of different methods of segmenting the speech signal (using fixed interval boundaries, oracle word boundaries, or randomly set boundaries) and show that, on our English-German speech translation setup, our best end-to-end online decoding strategy is, surprisingly, the one that alternates R/W operations on fixed-size blocks.
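To make the alternation between R and W operations concrete, the following is a minimal sketch of how segment boundaries and a read/write schedule could be laid out over a stream of speech frames. It is an illustration only, not the authors' decoding strategy: the function names, the `writes_per_block` parameter, and the frame counts are all hypothetical. In a real system the decoder would decide how many tokens to emit after each read, and oracle word boundaries would come from an external alignment that is not shown here.

```python
import random
from typing import Iterator, List, Tuple


def fixed_boundaries(num_frames: int, block_size: int) -> List[int]:
    """End-of-block frame indices for fixed-size blocks.

    The last block may be shorter than block_size.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    ends = list(range(block_size, num_frames, block_size))
    if num_frames > 0:
        ends.append(num_frames)
    return ends


def random_boundaries(num_frames: int, num_blocks: int, seed: int = 0) -> List[int]:
    """Randomly placed block ends; the final boundary is always num_frames."""
    if num_blocks < 1:
        raise ValueError("num_blocks must be at least 1")
    if num_frames <= 0:
        return []
    rng = random.Random(seed)
    # Cannot place more inner boundaries than there are inner positions.
    k = min(num_blocks - 1, num_frames - 1)
    inner = sorted(rng.sample(range(1, num_frames), k))
    return inner + [num_frames]


def read_write_schedule(
    boundaries: List[int], writes_per_block: int
) -> Iterator[Tuple[str, int]]:
    """Alternate one READ per block with a fixed number of WRITEs.

    Each action is (op, frames_read_so_far). The fixed writes_per_block is a
    simplifying assumption made for this sketch.
    """
    for end in boundaries:
        yield ("R", end)
        for _ in range(writes_per_block):
            yield ("W", end)


if __name__ == "__main__":
    ends = fixed_boundaries(num_frames=10, block_size=4)
    for action in read_write_schedule(ends, writes_per_block=1):
        print(action)
```

Swapping `fixed_boundaries` for `random_boundaries` (or for a list of word-end frames taken from an alignment) changes only where the reads stop, which is the variable the segmentation comparison in the abstract isolates.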


