UniST: Unified End-to-end Model for Streaming and Non-streaming Speech Translation

09/15/2021
by Qianqian Dong, et al.

This paper presents a unified end-to-end framework for both streaming and non-streaming speech translation (ST). While the training recipes for non-streaming speech translation are mature, the recipes for streaming speech translation have yet to be established. In this work, we focus on developing a unified model (UniST) that supports streaming and non-streaming ST from the perspective of fundamental components, including the training objective, attention mechanism, and decoding policy. Experiments on the most popular speech-to-text translation benchmark dataset, MuST-C, show that UniST achieves significant improvement for non-streaming ST and a better-learned trade-off between BLEU score and latency metrics for streaming ST, compared with end-to-end baselines and cascaded models. We will make our code and evaluation tools publicly available.
