Long-Form End-to-End Speech Translation via Latent Alignment Segmentation

09/20/2023
by   Peter Polák, et al.

Current simultaneous speech translation models can process audio only up to a few seconds long. Contemporary datasets provide an oracle segmentation into sentences based on human-annotated transcripts and translations, but no such sentence segmentation is available in the real world. Current speech segmentation approaches either offer poor segmentation quality or have to trade latency for quality. In this paper, we propose a novel segmentation approach for low-latency end-to-end speech translation. We leverage the existing speech translation encoder-decoder architecture with ST CTC and show that it can perform the segmentation task without supervision or additional parameters. To the best of our knowledge, our method is the first that allows truly end-to-end simultaneous speech translation, as the same model is used for translation and segmentation at the same time. On a diverse set of language pairs and on in- and out-of-domain data, we show that the proposed approach achieves state-of-the-art quality at no additional computational cost.


research
08/07/2023

End-to-End Evaluation for Low-Latency Simultaneous Speech Translation

The challenge of low-latency speech translation has recently drawn signif...
research
05/25/2023

End-to-End Simultaneous Speech Translation with Differentiable Segmentation

End-to-end simultaneous speech translation (SimulST) outputs translation...
research
10/20/2020

Fluent and Low-latency Simultaneous Speech-to-Speech Translation with Self-adaptive Training

Simultaneous speech-to-speech translation is widely useful but extremely...
research
10/24/2022

Don't Discard Fixed-Window Audio Segmentation in Speech-to-Text Translation

For real-life applications, it is crucial that end-to-end spoken languag...
research
11/03/2020

SimulMT to SimulST: Adapting Simultaneous Text Translation to End-to-End Simultaneous Speech Translation

Simultaneous text translation and end-to-end speech translation have rec...
research
04/29/2021

Impact of Encoding and Segmentation Strategies on End-to-End Simultaneous Speech Translation

Boosted by the simultaneous translation shared task at IWSLT 2020, promi...
research
09/20/2023

Incremental Blockwise Beam Search for Simultaneous Speech Translation with Controllable Quality-Latency Tradeoff

Blockwise self-attentional encoder models have recently emerged as one p...
