Incremental Speech Synthesis For Speech-To-Speech Translation

10/15/2021
by Danni Liu, et al.

In a speech-to-speech translation (S2ST) pipeline, the text-to-speech (TTS) module is an important component for delivering the translated speech to users. To enable incremental S2ST, the TTS module must be capable of synthesizing and playing utterances while its input text is still streaming in. In this work, we focus on improving the incremental synthesis performance of TTS models. With a simple data augmentation strategy based on prefixes, we improve incremental TTS quality so that it approaches offline performance. Furthermore, we bring our incremental TTS system into a practical scenario by combining it with an upstream simultaneous speech translation system, and show that the gains carry over to this use case. In addition, we propose latency metrics tailored to S2ST applications and investigate methods for latency reduction in this context.
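
The abstract does not spell out how the prefix-based augmentation is built, so the following is only a minimal sketch of one plausible reading: each training utterance is expanded into all of its word-level prefixes, and each text prefix is paired with the matching stretch of audio. The names `AlignedWord`, `prefix_pairs`, and `crop_audio`, as well as the assumption that word-level timestamps are available (for example from a forced aligner), are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AlignedWord:
    """A word with hypothetical alignment timestamps, in seconds."""
    text: str
    start: float
    end: float


def prefix_pairs(words: List[AlignedWord], min_words: int = 1) -> List[Tuple[str, float]]:
    """Return (text prefix, audio end time) for every prefix of the utterance.

    Assumption: a prefix of k words corresponds to the audio from 0.0
    up to the end time of the k-th word. The full utterance is the last pair.
    """
    pairs = []
    for k in range(min_words, len(words) + 1):
        prefix_text = " ".join(w.text for w in words[:k])
        pairs.append((prefix_text, words[k - 1].end))
    return pairs


def crop_audio(samples: List[float], end_time: float, sample_rate: int) -> List[float]:
    """Truncate the waveform at the point where the text prefix ends."""
    return samples[: int(round(end_time * sample_rate))]


# Toy utterance with made-up timestamps, for illustration only.
words = [
    AlignedWord("good", 0.0, 0.3),
    AlignedWord("morning", 0.3, 0.8),
    AlignedWord("everyone", 0.8, 1.5),
]
for text, end_time in prefix_pairs(words):
    print(f"{end_time:.1f}s  {text}")
```

Training on such truncated pairs alongside the full utterances would expose the model to inputs that stop mid-sentence, which is the situation an incremental TTS module faces when its input text is still streaming in. Whether the authors crop audio this way, or sample only some prefixes, is not stated in the abstract.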
