Incremental Blockwise Beam Search for Simultaneous Speech Translation with Controllable Quality-Latency Tradeoff

09/20/2023
by Peter Polák, et al.

Blockwise self-attentional encoder models have recently emerged as a promising end-to-end approach to simultaneous speech translation. These models employ a blockwise beam search with hypothesis reliability scoring to determine when to wait for more input speech before translating further. However, this method maintains multiple hypotheses until the entire speech input is consumed, so it cannot directly show a single incremental translation to users. It also lacks mechanisms for controlling the quality vs. latency tradeoff. We propose a modified incremental blockwise beam search that incorporates local agreement or hold-n policies for quality-latency control. We apply our framework to models trained for online or offline translation and demonstrate that both types can be used effectively in online mode. Experimental results on MuST-C show a 0.6-3.6 BLEU improvement without changing latency, or a 0.8-1.4 s latency improvement without changing quality.
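
The abstract names the two commit policies but does not define them. The following is a minimal Python sketch under the commonly used definitions, which are an assumption here: hold-n withholds the last n tokens of the current best hypothesis, and local agreement commits the longest common prefix of the best hypotheses from two consecutive input blocks. The function names `hold_n` and `local_agreement` and the token lists are hypothetical illustrations, not the authors' implementation, and the beam search itself is omitted.

```python
from typing import List, Sequence


def hold_n(hypothesis: Sequence[str], n: int) -> List[str]:
    """Return the prefix of the best hypothesis with its last n tokens withheld.

    Assumed definition: larger n delays output (higher latency) but leaves
    fewer unstable tokens in what is shown to the user.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    keep = max(len(hypothesis) - n, 0)
    return list(hypothesis[:keep])


def local_agreement(previous: Sequence[str], current: Sequence[str]) -> List[str]:
    """Return the longest common prefix of two consecutive best hypotheses.

    Assumed definition: only tokens on which the hypotheses from two
    successive input blocks agree are considered stable enough to show.
    """
    agreed: List[str] = []
    for prev_tok, cur_tok in zip(previous, current):
        if prev_tok != cur_tok:
            break
        agreed.append(cur_tok)
    return agreed


# Hypothetical best hypotheses after three successive speech blocks.
block_hypotheses = [
    ["I", "am"],
    ["I", "am", "going", "home"],
    ["I", "am", "going", "to", "school"],
]

stable_after_block_2 = local_agreement(block_hypotheses[0], block_hypotheses[1])
stable_after_block_3 = local_agreement(block_hypotheses[1], block_hypotheses[2])
held_after_block_3 = hold_n(block_hypotheses[2], 2)
```

In this sketch the single tunable knob is either `n` or the choice of agreement policy, which is one way to read the "controllable quality-latency tradeoff" in the title.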


