M^3ST: Mix at Three Levels for Speech Translation

12/07/2022
by Xuxin Cheng, et al.

How can we solve the data scarcity problem for end-to-end speech-to-text translation (ST)? Data augmentation is a well-known and effective way to improve performance on many tasks by enlarging the training set. In this paper, we propose the Mix at Three Levels for Speech Translation (M^3ST) method to increase the diversity of the augmented training corpus. Specifically, we conduct two stages of fine-tuning based on a model pre-trained with external machine translation (MT) data. In the first stage of fine-tuning, we mix the training corpus at three levels, namely the word level, the sentence level and the frame level, and fine-tune the entire model on the mixed data. In the second stage of fine-tuning, we feed both the original speech sequences and the original text sequences into the model in parallel, and use the Jensen-Shannon divergence to regularize their outputs. Experiments and analysis on the MuST-C speech translation benchmark show that M^3ST outperforms current strong baselines and achieves state-of-the-art results on eight directions, with an average BLEU of 29.9.
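
The abstract gives no implementation details, but two of its components map onto standard operations. Below is a minimal PyTorch sketch of (a) frame-level mixing in the spirit of mixup and (b) a Jensen-Shannon regularizer applied between the speech-branch and text-branch decoder outputs. The function names, tensor shapes, and the Beta-distributed mixing weight are illustrative assumptions, not the authors' actual implementation.

```python
import torch
import torch.nn.functional as F


def mix_frames(x1: torch.Tensor, x2: torch.Tensor, alpha: float = 0.5) -> torch.Tensor:
    """Frame-level mixing in the spirit of mixup: interpolate two padded
    speech feature sequences of shape (batch, frames, feat_dim) with a
    Beta-sampled weight. The Beta prior is an illustrative assumption."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    return lam * x1 + (1.0 - lam) * x2


def js_regularizer(speech_logits: torch.Tensor, text_logits: torch.Tensor) -> torch.Tensor:
    """Jensen-Shannon divergence between the decoder distributions produced
    from the speech input and from the parallel transcript, both of shape
    (batch, seq_len, vocab_size)."""
    p = F.softmax(speech_logits, dim=-1)
    q = F.softmax(text_logits, dim=-1)
    m = 0.5 * (p + q)
    # F.kl_div takes log-probabilities as its first argument and
    # probabilities as its second, computing KL(second || first).
    kl_pm = F.kl_div(m.log(), p, reduction="batchmean")
    kl_qm = F.kl_div(m.log(), q, reduction="batchmean")
    return 0.5 * (kl_pm + kl_qm)
```

In such a setup the JS term would typically be added to the cross-entropy translation loss with a weighting coefficient; the paper's exact loss formulation, as well as its word-level and sentence-level mixing strategies, may differ from this sketch.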

Related research

04/21/2021
End-to-end Speech Translation via Cross-modal Progressive Training
End-to-end speech translation models have become a new trend in the rese...

04/30/2020
Recipes for Adapting Pre-trained Monolingual and Multilingual Models to Machine Translation
There has been recent success in pre-training on monolingual data and fi...

09/17/2019
Bridging the Gap between Pre-Training and Fine-Tuning for End-to-End Speech Translation
End-to-end speech translation, a hot topic in recent years, aims to tran...

10/15/2020
Pronoun-Targeted Fine-tuning for NMT with Hybrid Losses
Popular Neural Machine Translation model training uses strategies like b...

05/23/2022
Non-Parametric Domain Adaptation for End-to-End Speech Translation
End-to-End Speech Translation (E2E-ST) has received increasing attention...

07/04/2023
Prompt Tuning Pushes Farther, Contrastive Learning Pulls Closer: A Two-Stage Approach to Mitigate Social Biases
As the representation capability of Pre-trained Language Models (PLMs) i...

09/06/2023
HC3 Plus: A Semantic-Invariant Human ChatGPT Comparison Corpus
ChatGPT has gained significant interest due to its impressive performanc...
