Improving Speech Translation by Understanding and Learning from the Auxiliary Text Translation Task

07/12/2021
by   Yun Tang, et al.

Pretraining and multitask learning are widely used to improve speech-to-text translation performance. In this study, we are interested in training a speech-to-text translation model along with an auxiliary text-to-text translation task. We conduct a detailed analysis to understand the impact of the auxiliary task on the primary task within the multitask learning framework. Our analysis confirms that multitask learning tends to generate similar decoder representations from different modalities and to preserve more information from the pretrained text translation modules. We observe minimal negative transfer between the two tasks, and sharing more parameters is helpful for transferring knowledge from the text task to the speech task. The analysis also reveals that the modality representation difference at the top decoder layers is still not negligible, and that those layers are critical for translation quality. Inspired by these findings, we propose three methods to improve translation quality. First, a parameter sharing and initialization strategy is proposed to enhance information sharing between the tasks. Second, a novel attention-based regularization is proposed for the encoders, pulling the representations from different modalities closer. Third, an online knowledge distillation is proposed to enhance knowledge transfer from the text task to the speech task. Our experiments show that the proposed approach improves translation performance by more than 2 BLEU over a strong baseline and achieves state-of-the-art results on the MuST-C English-German, English-French, and English-Spanish language pairs.
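The abstract does not give implementation details, so the sketch below is only an illustration, under our own assumptions, of what the two auxiliary loss terms it mentions could look like: an attention-based regularizer that pulls speech and text encoder representations closer, and an online knowledge distillation term that pushes the speech-task output distribution toward the text-task one. All function and variable names here are hypothetical and not taken from the authors' code.

```python
# Illustrative sketch (not the authors' implementation) of the two extra
# loss terms described in the abstract, written in PyTorch.

import torch
import torch.nn.functional as F


def cross_attentive_regularization(speech_enc, text_enc):
    """Attention-based regularizer between modalities (assumed form).

    speech_enc: (T_s, D) speech encoder outputs for one utterance
    text_enc:   (T_t, D) text encoder outputs for the paired transcript

    Each text state attends over the speech states; the regularizer is the
    L2 distance between the text states and their attention-based
    reconstruction from the speech side, pulling the two representations
    closer.
    """
    attn = torch.softmax(text_enc @ speech_enc.t(), dim=-1)  # (T_t, T_s)
    reconstructed = attn @ speech_enc                         # (T_t, D)
    return F.mse_loss(reconstructed, text_enc)


def online_distillation_loss(speech_logits, text_logits, temperature=1.0):
    """Online knowledge distillation from the text task to the speech task.

    speech_logits, text_logits: (T, V) decoder logits over the vocabulary,
    produced in the same multitask training step. The (detached) text-task
    distribution serves as the teacher for the speech-task distribution.
    """
    teacher = F.softmax(text_logits.detach() / temperature, dim=-1)
    student = F.log_softmax(speech_logits / temperature, dim=-1)
    return F.kl_div(student, teacher, reduction="batchmean") * temperature ** 2
```

In a multitask setup of this kind, these terms would typically be added, with tunable weights, to the cross-entropy losses of the speech translation and text translation tasks; the weights and the exact formulation used by the authors are not specified in the abstract.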
