Speech Translation with Foundation Models and Optimal Transport: UPC at IWSLT23

06/02/2023
by   Ioannis Tsiamas, et al.
0

This paper describes the submission of the UPC Machine Translation group to the IWSLT 2023 Offline Speech Translation task. Our Speech Translation systems utilize foundation models for speech (wav2vec 2.0) and text (mBART50). We incorporate a Siamese pretraining step of the speech and text encoders with CTC and Optimal Transport, to adapt the speech representations to the space of the text model, thus maximizing transfer learning from MT. After this pretraining, we fine-tune our system end-to-end on ST, with Cross Entropy and Knowledge Distillation. Apart from the available ST corpora, we create synthetic data with SegAugment to better adapt our models to the custom segmentations of the IWSLT test sets. Our best single model obtains 31.2 BLEU points on MuST-C tst-COMMON, 29.8 points on IWLST.tst2020 and 33.4 points on the newly released IWSLT.ACLdev2023.

READ FULL TEXT

page 2

page 6

research
06/04/2020

End-to-End Speech-Translation with Knowledge Distillation: FBK@IWSLT2020

This paper describes FBK's participation in the IWSLT 2020 offline speec...
research
05/24/2023

CMOT: Cross-modal Mixup via Optimal Transport for Speech Translation

End-to-end speech translation (ST) is the task of translating speech sig...
research
05/16/2021

The Volctrans Neural Speech Translation System for IWSLT 2021

This paper describes the systems submitted to IWSLT 2021 by the Volctran...
research
05/10/2021

UPC's Speech Translation System for IWSLT 2021

This paper describes the submission to the IWSLT 2021 offline speech tra...
research
05/07/2021

Learning Shared Semantic Space for Speech-to-Text Translation

Having numerous potential applications and great impact, end-to-end spee...
research
06/23/2021

Dealing with training and test segmentation mismatch: FBK@IWSLT2021

This paper describes FBK's system submission to the IWSLT 2021 Offline S...
research
01/27/2023

Pre-training for Speech Translation: CTC Meets Optimal Transport

The gap between speech and text modalities is a major challenge in speec...

Please sign up or login with your details

Forgot password? Click here to reset