ON-TRAC Consortium Systems for the IWSLT 2022 Dialect and Low-resource Speech Translation Tasks

05/04/2022
by   Marcely Zanon Boito, et al.

This paper describes the ON-TRAC Consortium translation systems developed for two challenge tracks featured in the Evaluation Campaign of IWSLT 2022: low-resource and dialect speech translation. For the Tunisian Arabic-English dataset (low-resource and dialect tracks), we build an end-to-end model as our joint primary submission, and compare it against cascaded models that leverage a large fine-tuned wav2vec 2.0 model for automatic speech recognition (ASR). Our results show that, in our settings, pipeline approaches are still very competitive, and that with the use of transfer learning they can outperform end-to-end models for speech translation (ST). For the Tamasheq-French dataset (low-resource track), our primary submission leverages intermediate representations from a wav2vec 2.0 model trained on 234 hours of Tamasheq audio, while our contrastive model uses a French phonetic transcription of the Tamasheq audio as input to a Conformer speech translation architecture jointly trained with ASR, ST, and machine translation losses. Our results highlight that self-supervised models trained on smaller sets of target data are more effective for low-resource end-to-end ST fine-tuning than large off-the-shelf models. Results also illustrate that even approximate phonetic transcriptions can improve ST scores.


