An Empirical Study of Consistency Regularization for End-to-End Speech-to-Text Translation

08/28/2023
by Pengzhi Gao, et al.

Consistency regularization methods, such as R-Drop (Liang et al., 2021) and CrossConST (Gao et al., 2023), have achieved impressive supervised and zero-shot performance in neural machine translation (NMT). Can consistency regularization also improve end-to-end (E2E) speech-to-text translation (ST)? In this paper, we conduct empirical studies on intra-modal and cross-modal consistency and propose two training strategies, SimRegCR and SimZeroCR, for E2E ST in the regular and zero-shot scenarios, respectively. Experiments on the MuST-C benchmark show that our approaches achieve state-of-the-art (SOTA) performance in most translation directions. Our analyses show that the regularization provided by intra-modal consistency, rather than a reduced modality gap, is crucial for regular E2E ST, whereas cross-modal consistency can close the modality gap and boost zero-shot E2E ST performance.
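
As a rough illustration of the kind of objective the abstract refers to, the sketch below follows the R-Drop formulation: two forward passes of the same model produce two output distributions, and a symmetric KL term penalizes their disagreement on top of the usual cross-entropy losses. It is not the paper's SimRegCR or SimZeroCR implementation. The function names, the `alpha` weight, and the toy logits are assumptions made for illustration only. In an intra-modal setting the two passes would presumably see the same input under different dropout masks, while in a cross-modal setting one pass would see speech and the other the transcript.

```python
import math

def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * (math.log(pi + eps) - math.log(qi + eps))
               for pi, qi in zip(p, q))

def consistency_loss(logits_a, logits_b):
    """Symmetric (bidirectional) KL between two predictive distributions."""
    p = softmax(logits_a)
    q = softmax(logits_b)
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))

def total_loss(ce_a, ce_b, logits_a, logits_b, alpha=1.0):
    """Cross-entropy of both passes plus the weighted consistency term.

    ce_a and ce_b are the cross-entropy losses of the two forward passes;
    alpha is a hypothetical weight on the consistency term.
    """
    return ce_a + ce_b + alpha * consistency_loss(logits_a, logits_b)

# Toy example: two slightly different predictions over a 3-token vocabulary.
pass_1 = [2.0, 0.5, -1.0]
pass_2 = [1.5, 0.7, -0.8]
reg = consistency_loss(pass_1, pass_2)
```

The consistency term is zero when both passes agree exactly and grows as their distributions diverge, so `alpha` trades off fitting the reference translation against agreement between the two passes.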


