End-to-End Speech Translation with Knowledge Distillation

04/17/2019
by   Yuchen Liu, et al.

End-to-end speech translation (ST), which directly translates source language speech into target language text, has attracted intensive attention in recent years. Compared to conventional pipeline systems, end-to-end ST models have the advantages of lower latency, smaller model size, and less error propagation. However, combining speech recognition and text translation in one model is more difficult than either of the two tasks alone. In this paper, we propose a knowledge distillation approach that improves the ST model by transferring knowledge from a text translation model. Specifically, we first train a text translation model, regarded as the teacher model, and then train the ST model to learn the output probabilities of the teacher model through knowledge distillation. Experiments on the English-French Augmented LibriSpeech and English-Chinese TED corpora show that end-to-end ST is feasible for both similar and dissimilar language pairs. In addition, with the guidance of the teacher model, the end-to-end ST model gains significant improvements of over 3.5 BLEU points.
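The distillation objective described in the abstract can be sketched as follows: the ST student is trained against the frozen MT teacher's output distribution, typically interpolated with the usual hard-label loss. This is a minimal illustration, not the paper's implementation; the function names, the interpolation weight `alpha`, and the temperature `T` are assumptions.

```python
import numpy as np

def softmax(x, T=1.0):
    # Numerically stable softmax with optional temperature T
    z = x / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, targets, alpha=0.8, T=1.0):
    """Word-level knowledge distillation: interpolate the hard-label NLL
    with cross-entropy against the teacher's softened output distribution.

    student_logits, teacher_logits: (batch, vocab) per-token logits
    targets: (batch,) ground-truth token ids
    """
    # Soft targets come from the frozen text translation (teacher) model
    p_teacher = softmax(teacher_logits, T)
    log_p_student = np.log(softmax(student_logits, T) + 1e-12)
    kd = -(p_teacher * log_p_student).sum(axis=-1).mean() * (T * T)

    # Standard negative log-likelihood on the reference translation
    nll = -np.take_along_axis(np.log(softmax(student_logits) + 1e-12),
                              targets[:, None], axis=-1).mean()
    return alpha * kd + (1 - alpha) * nll
```

With `alpha = 1` the student learns only from the teacher's distribution; with `alpha = 0` it reduces to ordinary cross-entropy training on the references.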

