Efficient yet Competitive Speech Translation: FBK@IWSLT2022

05/05/2022
by Marco Gaido, et al.

The primary goal of FBK's system submission to the IWSLT 2022 offline and simultaneous speech translation tasks is to reduce model training costs without sacrificing translation quality. To this end, we first question the need for ASR pre-training, showing that it is not essential to achieve competitive results. Second, we focus on data filtering, showing that a simple method based on the ratio between source and target characters yields a quality improvement of 1 BLEU. Third, we compare different methods to reduce the detrimental effect of the audio segmentation mismatch between training data, which is manually segmented at sentence level, and inference data, which is automatically segmented. Pursuing the same goal of training cost reduction, we participate in the simultaneous task with the same model trained for offline ST. The effectiveness of our lightweight training strategy is shown by the high score obtained on the MuST-C en-de corpus (26.7 BLEU) and is confirmed in high-resource data conditions by a 1.6 BLEU improvement on the IWSLT2020 test set over last year's winning system.
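To make the data filtering idea concrete, the following is a minimal sketch of a character-ratio filter, not the authors' implementation. It assumes that the source side is available as text (for example, a transcript) and that a pair is kept when the target-to-source character ratio falls within a fixed range. The function names `char_ratio` and `filter_pairs` and the thresholds `min_ratio=0.5` and `max_ratio=2.0` are hypothetical choices made for illustration; the abstract does not specify them.

```python
from typing import Iterable, List, Tuple


def char_ratio(source: str, target: str) -> float:
    """Return the target-to-source character ratio for one pair.

    Leading and trailing whitespace is ignored. An empty source yields
    infinity so that the pair falls outside any finite upper bound.
    """
    src_len = len(source.strip())
    tgt_len = len(target.strip())
    if src_len == 0:
        return float("inf")
    return tgt_len / src_len


def filter_pairs(
    pairs: Iterable[Tuple[str, str]],
    min_ratio: float = 0.5,  # hypothetical lower bound, not from the paper
    max_ratio: float = 2.0,  # hypothetical upper bound, not from the paper
) -> List[Tuple[str, str]]:
    """Keep only pairs whose character ratio lies in [min_ratio, max_ratio]."""
    kept = []
    for source, target in pairs:
        # Pairs with an empty target carry no training signal; drop them.
        if not target.strip():
            continue
        if min_ratio <= char_ratio(source, target) <= max_ratio:
            kept.append((source, target))
    return kept
```

In practice, such bounds would likely be tuned on the training data itself, for instance by inspecting the distribution of ratios and discarding the tails, rather than fixed in advance.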


