Efficient Speech Translation with Pre-trained Models

11/09/2022
by Zhaolin Li, et al.
Building state-of-the-art speech translation models requires large computational resources, because both the training data and the models are large; this is a significant obstacle. The availability of pre-trained models is a promising opportunity to build strong speech translation systems efficiently. In a first step, we investigate efficient strategies for building cascaded and end-to-end speech translation systems based on pre-trained models. With these strategies, we can train and apply the models on a single GPU. While the end-to-end models show superior translation performance to the cascaded ones, their application is limited by the need for additional end-to-end training data. In a second step, we propose an additional similarity loss that encourages the model to generate similar hidden representations for speech and transcript. Using this technique, we can increase data efficiency and improve translation quality by 6 BLEU points in scenarios with limited end-to-end training data.
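
The abstract does not spell out the form of the similarity loss, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes the speech and transcript hidden states are mean-pooled over time (the two sequences usually differ in length), compared with a mean squared error, and added to the translation loss with a weight. The names `mean_pool`, `similarity_loss`, `total_loss`, and `weight` are hypothetical and not taken from the paper.

```python
from typing import List, Sequence

Vector = List[float]


def mean_pool(states: Sequence[Sequence[float]]) -> Vector:
    """Average a sequence of hidden states over time into one vector."""
    if not states:
        raise ValueError("need at least one hidden state")
    dim = len(states[0])
    return [sum(state[i] for state in states) / len(states) for i in range(dim)]


def similarity_loss(
    speech_states: Sequence[Sequence[float]],
    text_states: Sequence[Sequence[float]],
) -> float:
    """Mean squared error between pooled speech and transcript representations.

    Assumption: pooling plus MSE is one possible distance; the paper may
    use a different one.
    """
    speech_vec = mean_pool(speech_states)
    text_vec = mean_pool(text_states)
    if len(speech_vec) != len(text_vec):
        raise ValueError("hidden sizes of speech and text states must match")
    squared = [(s - t) ** 2 for s, t in zip(speech_vec, text_vec)]
    return sum(squared) / len(squared)


def total_loss(
    translation_loss: float,
    speech_states: Sequence[Sequence[float]],
    text_states: Sequence[Sequence[float]],
    weight: float = 1.0,
) -> float:
    """Translation loss plus the weighted similarity term (hypothetical weighting)."""
    return translation_loss + weight * similarity_loss(speech_states, text_states)


if __name__ == "__main__":
    # Toy hidden states: two speech frames and one transcript token, hidden size 2.
    speech = [[1.0, 2.0], [3.0, 4.0]]
    transcript = [[0.0, 3.0]]
    print(similarity_loss(speech, transcript))
    print(total_loss(1.5, speech, transcript, weight=0.5))
```

In a real system the hidden states would come from the speech encoder and a text encoder fed with the transcript, and the loss would be computed on tensors so that gradients flow back into the model; the plain-list version here only illustrates the arithmetic.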

research
11/20/2019

A Comparative Study on End-to-end Speech to Text Translation

Recent advances in deep learning show that end-to-end speech to text tra...
research
09/27/2021

Integrated Training for Sequence-to-Sequence Models Using Non-Autoregressive Transformer

Complex natural language applications such as speech translation or pivo...
research
05/26/2023

Inter-connection: Effective Connection between Pre-trained Encoder and Decoder for Speech Translation

In end-to-end speech translation, speech and text pre-trained models imp...
research
06/08/2023

KIT's Multilingual Speech Translation System for IWSLT 2023

Many existing speech translation benchmarks focus on native-English spee...
research
03/08/2022

End-to-end Multiple Instance Learning with Gradient Accumulation

Being able to learn on weakly labeled data, and provide interpretability...
research
07/24/2020

Consistent Transcription and Translation of Speech

The conventional paradigm in speech translation starts with a speech rec...
research
09/11/2021

COMBO: State-of-the-Art Morphosyntactic Analysis

We introduce COMBO - a fully neural NLP system for accurate part-of-spee...
