Improving End-to-End Speech Translation by Imitation-Based Knowledge Distillation with Synthetic Transcripts

07/17/2023
by Rebekka Hubert, et al.

End-to-end automatic speech translation (AST) relies on data that combines audio inputs with text translation outputs. Previous work used existing large parallel corpora of transcriptions and translations in a knowledge distillation (KD) setup to distill a neural machine translation (NMT) model into an AST student. While KD allows using larger pretrained models, the reliance of previous KD approaches on manual audio transcripts in the data pipeline restricts the applicability of this framework to AST. We present an imitation learning approach in which a teacher NMT system corrects the errors of an AST student without relying on manual transcripts. We show that the NMT teacher can recover from errors in automatic transcriptions and is able to correct erroneous translations of the AST student, leading to improvements of about 4 BLEU points over the standard end-to-end AST baseline on the English-German CoVoST-2 and MuST-C datasets. Code and data are publicly available: <https://github.com/HubReb/imitkd_ast/releases/tag/v1.1>
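To make the idea concrete, below is a minimal Python sketch of the imitation-learning loop the abstract describes: an ASR system produces a synthetic transcript, the NMT teacher acts as an oracle that scores (and thereby corrects) the AST student's partial translations, and the student is trained to match the teacher's next-token distribution. All interfaces here (StubModel, StubASR, next_token_dist, the DAgger-style mixing rate beta) are hypothetical placeholders for illustration, not the authors' released code; see the linked repository for the actual implementation.

```python
"""Hedged sketch of imitation-based KD for AST with synthetic transcripts.
Stub classes stand in for real ASR / NMT / AST models."""

import math
import random

VOCAB = ["Hallo", "Welt", "!", "</s>"]

class StubModel:
    """Placeholder for an NMT teacher or AST student: returns a uniform
    next-token distribution. A real model would condition on its source
    (text for the teacher, audio for the student) and the target prefix."""
    def next_token_dist(self, source, prefix):
        p = 1.0 / len(VOCAB)
        return {tok: p for tok in VOCAB}

class StubASR:
    """Placeholder ASR system producing the synthetic transcript,
    so no manual transcript is needed."""
    def transcribe(self, audio):
        return "hello world !"

def imitation_kd_loss(audio, teacher, student, asr, max_len=20, beta=0.5):
    """One DAgger-style rollout: decode with a mixture of the student's and
    the teacher's policies, and accumulate the student's cross-entropy
    against the teacher's next-token distribution (the oracle correction)."""
    transcript = asr.transcribe(audio)  # synthetic, possibly erroneous
    prefix, losses = [], []
    for _ in range(max_len):
        t_dist = teacher.next_token_dist(transcript, prefix)
        s_dist = student.next_token_dist(audio, prefix)
        # cross-entropy of the student against the teacher's distribution
        loss = -sum(p * math.log(max(s_dist.get(tok, 0.0), 1e-9))
                    for tok, p in t_dist.items())
        losses.append(loss)
        # Follow the student's own prediction with probability beta,
        # otherwise the teacher's, so the student sees (and learns to
        # recover from) states produced by its own mistakes.
        dist = s_dist if random.random() < beta else t_dist
        next_tok = max(dist, key=dist.get)
        if next_tok == "</s>":
            break
        prefix.append(next_tok)
    return sum(losses) / len(losses)

if __name__ == "__main__":
    print(imitation_kd_loss("<audio>", StubModel(), StubModel(), StubASR()))
```

The key design point this sketch tries to capture is that the teacher is queried on rolled-out student states rather than on gold references only, which is what lets it correct the student's erroneous translations even when the underlying transcript is automatic rather than manual.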


