Phone Features Improve Speech Translation

05/27/2020
by Elizabeth Salesky, et al.

End-to-end models for speech translation (ST) more tightly couple speech recognition (ASR) and machine translation (MT) than a traditional cascade of separate ASR and MT models, with simpler model architectures and the potential for reduced error propagation. Their performance is often assumed to be superior, though in many conditions this is not yet the case. We compare cascaded and end-to-end models across high, medium, and low-resource conditions, and show that cascades remain stronger baselines. Further, we introduce two methods to incorporate phone features into ST models. We show that these features improve both architectures, closing the gap between end-to-end models and cascades, and outperforming previous academic work – by up to 9 BLEU on our low-resource setting.

research
06/30/2021

IMS' Systems for the IWSLT 2021 Low-Resource Speech Translation Task

This paper describes the submission to the IWSLT 2021 Low-Resource Speec...
research
11/24/2020

Tight Integrated End-to-End Training for Cascaded Speech Translation

A cascaded speech translation model relies on discrete and non-different...
research
05/01/2021

AlloST: Low-resource Speech Translation without Source Transcription

The end-to-end architecture has made promising progress in speech transl...
research
06/22/2019

End-to-End ASR for Code-switched Hindi-English Speech

End-to-end (E2E) models have been explored for large speech corpora and ...
research
10/19/2020

Subtitles to Segmentation: Improving Low-Resource Speech-to-Text Translation Pipelines

In this work, we focus on improving ASR output segmentation in the conte...
research
04/22/2020

Towards a Competitive End-to-End Speech Recognition for CHiME-6 Dinner Party Transcription

While end-to-end ASR systems have proven competitive with the convention...
research
06/04/2019

Exploring Phoneme-Level Speech Representations for End-to-End Speech Translation

Previous work on end-to-end translation from speech has primarily used f...
