ESPnet-ST IWSLT 2021 Offline Speech Translation System

07/01/2021
by Hirofumi Inaguma, et al.

This paper describes the ESPnet-ST group's IWSLT 2021 submission to the offline speech translation track. This year, we worked on three areas: training data, architecture, and audio segmentation. On the data side, we investigated sequence-level knowledge distillation (SeqKD) for end-to-end (E2E) speech translation. Specifically, we used multi-referenced SeqKD from multiple teachers trained on different amounts of bitext. On the architecture side, we adopted the Conformer encoder and the Multi-Decoder architecture. Multi-Decoder equips a unified encoder-decoder model with dedicated decoders for the speech recognition and translation tasks, which enables search in both the source and target language spaces during inference. We also significantly improved audio segmentation by using the pyannote.audio toolkit and merging multiple short segments for long-context modeling. Experimental evaluations showed that each of these techniques contributed large improvements in translation performance. Our best E2E system combined all of the above techniques with model ensembling. It achieved 31.4 BLEU on tst2021 scored against both references together, and 21.2 BLEU and 19.3 BLEU when scored against each of the two references individually.
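The segmentation step merges multiple short speech segments so the model sees longer context. The abstract does not give the exact merging rule. The following is a minimal sketch under stated assumptions: speech segments (start/end times in seconds) have already been produced by a voice activity detector such as pyannote.audio, which this code does not call. Consecutive segments are merged greedily as long as the merged span stays within a maximum duration. The names `Segment` and `merge_short_segments` and the 20-second default are hypothetical illustrations, not values or code from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """A speech region in seconds (hypothetical container, e.g. filled from VAD output)."""
    start: float
    end: float

    @property
    def duration(self) -> float:
        return self.end - self.start


def merge_short_segments(segments: List[Segment], max_duration: float = 20.0) -> List[Segment]:
    """Greedily merge consecutive segments while the merged span stays <= max_duration.

    Assumption: max_duration is an illustrative threshold, not the paper's setting.
    A single segment already longer than max_duration is kept as-is (not split).
    """
    merged: List[Segment] = []
    for seg in sorted(segments, key=lambda s: s.start):
        if merged and seg.end - merged[-1].start <= max_duration:
            # Extend the current group to cover this segment and the pause before it.
            merged[-1] = Segment(merged[-1].start, max(merged[-1].end, seg.end))
        else:
            # Start a new group; copy so the caller's objects are not mutated.
            merged.append(Segment(seg.start, seg.end))
    return merged


if __name__ == "__main__":
    segs = [Segment(0.0, 3.5), Segment(4.0, 9.0), Segment(9.5, 25.0), Segment(26.0, 30.0)]
    for s in merge_short_segments(segs, max_duration=20.0):
        print(f"{s.start:.1f}-{s.end:.1f}s ({s.duration:.1f}s)")
```

For example, the four segments above become three: 0.0–9.0 s, 9.5–25.0 s, and 26.0–30.0 s. The third segment cannot join the first group without exceeding 20 s. A real system would likely also limit the pause allowed between merged segments.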
