Contextualized Translation of Automatically Segmented Speech

08/05/2020
by Marco Gaido, et al.

Direct speech-to-text translation (ST) models are usually trained on corpora segmented at the sentence level, but at inference time they are commonly fed audio split by a voice activity detector (VAD). Since VAD segmentation is not syntax-informed, the resulting segments do not necessarily correspond to well-formed sentences uttered by the speaker but, most likely, to fragments of one or more sentences. This segmentation mismatch considerably degrades the quality of the ST models' output. So far, researchers have focused on improving audio segmentation so that it produces sentence-like splits. In this paper, we instead address the issue in the model, making it more robust to a different, potentially sub-optimal segmentation. To this end, we train our models on randomly segmented data and compare two approaches: fine-tuning and adding the previous segment as context. We show that our context-aware solution is more robust to VAD-segmented input, outperforming both a strong base model and the fine-tuned model on different VAD segmentations of an English-German test set by up to 4.25 BLEU points.
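The two ingredients named in the abstract, random segmentation and previous-segment context, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `random_boundaries` and `with_previous_context`, the uniform length distribution, and the length bounds are all assumptions made for illustration, and the abstract does not state how the random splits are actually drawn.

```python
import random
from typing import List, Optional, Tuple

Span = Tuple[float, float]  # (start, end) in seconds


def random_boundaries(total: float, min_len: float, max_len: float,
                      rng: random.Random) -> List[Span]:
    """Cut [0, total] into consecutive spans of random length.

    Hypothetical helper: lengths are drawn uniformly from
    [min_len, max_len]; the final span may be shorter than min_len
    because it is clipped at the end of the audio.
    """
    spans: List[Span] = []
    start = 0.0
    while start < total:
        end = min(total, start + rng.uniform(min_len, max_len))
        spans.append((start, end))
        start = end
    return spans


def with_previous_context(spans: List[Span]) -> List[Tuple[Optional[Span], Span]]:
    """Pair each span with the span before it (None for the first one)."""
    pairs: List[Tuple[Optional[Span], Span]] = []
    prev: Optional[Span] = None
    for span in spans:
        pairs.append((prev, span))
        prev = span
    return pairs


# Example: re-segment a 60-second recording ignoring sentence boundaries,
# then attach the previous segment to each one as context.
spans = random_boundaries(60.0, min_len=2.0, max_len=10.0, rng=random.Random(0))
pairs = with_previous_context(spans)
```

Each pair holds the context span and the span to translate; how a model consumes the context is left open here, since the abstract does not describe it.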


research
06/23/2021

Dealing with training and test segmentation mismatch: FBK@IWSLT2021

This paper describes FBK's system submission to the IWSLT 2021 Offline S...
research
04/23/2021

Beyond Voice Activity Detection: Hybrid Audio Segmentation for Direct Speech Translation

The audio segmentation mismatch between training data and those seen at ...
research
03/29/2022

Speech Segmentation Optimization using Segmented Bilingual Speech Corpus for End-to-end Speech Translation

Speech segmentation, which splits long speech into short segments, is es...
research
05/05/2022

Efficient yet Competitive Speech Translation: FBK@IWSLT2022

The primary goal of this FBK's systems submission to the IWSLT 2022 offl...
research
02/09/2022

SHAS: Approaching optimal Segmentation for End-to-End Speech Translation

Speech translation models are unable to directly process long audios, li...
research
05/10/2021

UPC's Speech Translation System for IWSLT 2021

This paper describes the submission to the IWSLT 2021 offline speech tra...
research
12/19/2022

SegAugment: Maximizing the Utility of Speech Translation Data with Segmentation-based Augmentations

Data scarcity is one of the main issues with the end-to-end approach for...
