Don't Discard Fixed-Window Audio Segmentation in Speech-to-Text Translation

10/24/2022
by Chantal Amrhein, et al.

For real-life applications, it is crucial that end-to-end spoken language translation models perform well on continuous audio, without relying on human-supplied segmentation. For online spoken language translation, where models need to start translating before the full utterance is spoken, most previous work has ignored the segmentation problem. In this paper, we compare various methods for improving models' robustness towards segmentation errors and different segmentation strategies in both offline and online settings and report results on translation quality, flicker and delay. Our findings on five different language pairs show that a simple fixed-window audio segmentation can perform surprisingly well given the right conditions.
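To make the term concrete, fixed-window segmentation can be sketched as cutting the audio stream into consecutive chunks of equal duration, with no regard for pauses or sentence boundaries. The following is a minimal illustration only, not the authors' implementation: the function name `fixed_window_segments`, its parameters, and the window length used in the example are assumptions made for this sketch and are not taken from the paper.

```python
from typing import List, Tuple


def fixed_window_segments(
    num_samples: int, sample_rate: int, window_seconds: float
) -> List[Tuple[int, int]]:
    """Return (start, end) sample offsets of consecutive fixed-length windows.

    Hypothetical helper for illustration; the window length is an assumed
    parameter, not a value reported in the paper.
    """
    if num_samples < 0 or sample_rate <= 0 or window_seconds <= 0:
        raise ValueError("num_samples must be >= 0; rate and window must be > 0")

    # Window length in samples, rounded to the nearest whole sample.
    window = int(round(window_seconds * sample_rate))
    if window == 0:
        raise ValueError("window is shorter than one sample")

    # Every window has the same length except possibly the last one,
    # which is truncated at the end of the audio.
    return [
        (start, min(start + window, num_samples))
        for start in range(0, num_samples, window)
    ]


# Example: 5 seconds of audio at 10 samples per second, 2-second windows.
segments = fixed_window_segments(num_samples=50, sample_rate=10, window_seconds=2.0)
```

Each resulting span would then be passed to the translation model as one segment. Because the boundaries ignore the content of the audio, a window may cut through the middle of a word or sentence, which is the kind of segmentation error the abstract refers to.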
