Zero-shot Speech Translation

07/13/2021
by   Tu Anh Dinh, et al.
0

Speech Translation (ST) is the task of translating speech in one language into text in another language. Traditional cascaded approaches for ST, using Automatic Speech Recognition (ASR) and Machine Translation (MT) systems, are prone to error propagation. End-to-end approaches use only one system to avoid propagating error, yet are difficult to employ due to data scarcity. We explore zero-shot translation, which enables translating a pair of languages that is unseen during training, thus avoid the use of end-to-end ST data. Zero-shot translation has been shown to work for multilingual machine translation, yet has not been studied for speech translation. We attempt to build zero-shot ST models that are trained only on ASR and MT tasks but can do ST task during inference. The challenge is that the representation of text and audio is significantly different, thus the models learn ASR and MT tasks in different ways, making it non-trivial to perform zero-shot. These models tend to output the wrong language when performing zero-shot ST. We tackle the issues by including additional training data and an auxiliary loss function that minimizes the text-audio difference. Our experiment results and analysis show that the methods are promising for zero-shot ST. Moreover, our methods are particularly useful in the few-shot settings where a limited amount of ST data is available, with improvements of up to +11.8 BLEU points compared to direct end-to-end ST models and +3.9 BLEU points compared to ST models fine-tuned from pre-trained ASR model.

READ FULL TEXT
research
01/26/2022

Tackling data scarcity in speech translation using zero-shot multilingual machine translation techniques

Recently, end-to-end speech translation (ST) has gained significant atte...
research
09/19/2023

Harnessing the Zero-Shot Power of Instruction-Tuned Large Language Model in End-to-End Speech Recognition

We present a novel integration of an instruction-tuned large language mo...
research
10/08/2019

One-To-Many Multilingual End-to-end Speech Translation

Nowadays, training end-to-end neural models for spoken language translat...
research
10/03/2022

Hypothesis Engineering for Zero-Shot Hate Speech Detection

Standard approaches to hate speech detection rely on sufficient availabl...
research
02/22/2022

RuCLIP – new models and experiments: a technical report

In the report we propose six new implementations of ruCLIP model trained...
research
04/30/2020

Automatic Machine Translation Evaluation in Many Languages via Zero-Shot Paraphrasing

We propose the use of a sequence-to-sequence paraphraser for automatic m...
research
03/31/2021

Q-ASR: Integer-only Zero-shot Quantization for Efficient Speech Recognition

End-to-end neural network models achieve improved performance on various...

Please sign up or login with your details

Forgot password? Click here to reset