Consistent Transcription and Translation of Speech

07/24/2020
by Matthias Sperber, et al.

The conventional paradigm in speech translation starts with a speech recognition step to generate transcripts, followed by a translation step with the automatic transcripts as input. To address various shortcomings of this paradigm, recent work explores end-to-end trainable direct models that translate without transcribing. However, transcripts can be an indispensable output in practical applications, which often display transcripts alongside the translations to users. We make this common requirement explicit and explore the task of jointly transcribing and translating speech. While high accuracy of both transcript and translation is crucial, even highly accurate systems can suffer from inconsistencies between the two outputs that degrade the user experience. We introduce a methodology to evaluate consistency and compare several modeling approaches, including the traditional cascaded approach and end-to-end models. We find that direct models are poorly suited to the joint transcription/translation task, but that end-to-end models featuring a coupled inference procedure can achieve strong consistency. We further introduce simple techniques for directly optimizing for consistency, and analyze the resulting trade-offs between consistency, transcription accuracy, and translation accuracy.


