Streaming Models for Joint Speech Recognition and Translation

01/22/2021
by   Orion Weller, et al.
0

Using end-to-end models for speech translation (ST) has increasingly been the focus of the ST community. These models condense the previously cascaded systems by directly converting sound waves into translated text. However, cascaded models have the advantage of including automatic speech recognition output, useful for a variety of practical ST systems that often display transcripts to the user alongside the translations. To bridge this gap, recent work has shown initial progress into the feasibility for end-to-end models to produce both of these outputs. However, all previous work has only looked at this problem from the consecutive perspective, leaving uncertainty on whether these approaches are effective in the more challenging streaming setting. We develop an end-to-end streaming ST model based on a re-translation approach and compare against standard cascading approaches. We also introduce a novel inference method for the joint case, interleaving both transcript and translation in generation and removing the need to use separate decoders. Our evaluation across a range of metrics capturing accuracy, latency, and consistency shows that our end-to-end models are statistically similar to cascading models, while having half the number of parameters. We also find that both systems provide strong translation quality at low latency, keeping 99 consecutive quality at a lag of just under a second.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
07/24/2020

Consistent Transcription and Translation of Speech

The conventional paradigm in speech translation starts with a speech rec...
research
08/07/2023

End-to-End Evaluation for Low-Latency Simultaneous Speech Translation

The challenge of low-latency speech translation has recently draw signif...
research
09/15/2021

UniST: Unified End-to-end Model for Streaming and Non-streaming Speech Translation

This paper presents a unified end-to-end frame-work for both streaming a...
research
04/11/2022

Large-Scale Streaming End-to-End Speech Translation with Neural Transducers

Neural transducers have been widely used in automatic speech recognition...
research
10/24/2022

Does Joint Training Really Help Cascaded Speech Translation?

Currently, in speech translation, the straightforward approach - cascadi...
research
04/14/2020

Speech Translation and the End-to-End Promise: Taking Stock of Where We Are

Over its three decade history, speech translation has experienced severa...
research
05/11/2017

Reducing Bias in Production Speech Models

Replacing hand-engineered pipelines with end-to-end deep learning system...

Please sign up or login with your details

Forgot password? Click here to reset