Efficient Incremental Text-to-Speech on GPUs

11/25/2022
by   Muyang Du, et al.
0

Incremental text-to-speech, also known as streaming TTS, has been increasingly applied to online speech applications that require ultra-low response latency to provide an optimal user experience. However, most of the existing speech synthesis pipelines deployed on GPU are still non-incremental, which uncovers limitations in high-concurrency scenarios, especially when the pipeline is built with end-to-end neural network models. To address this issue, we present a highly efficient approach to perform real-time incremental TTS on GPUs with Instant Request Pooling and Module-wise Dynamic Batching. Experimental results demonstrate that the proposed method is capable of producing high-quality speech with a first-chunk latency lower than 80ms under 100 QPS on a single NVIDIA A10 GPU and significantly outperforms the non-incremental twin in both concurrency and latency. Our work reveals the effectiveness of high-performance incremental TTS on GPUs.

READ FULL TEXT
research
10/15/2021

Incremental Speech Synthesis For Speech-To-Speech Translation

In a speech-to-speech translation (S2ST) pipeline, the text-to-speech (T...
research
12/23/2020

Incremental Text-to-Speech Synthesis Using Pseudo Lookahead with Large Pretrained Language Model

Text-to-speech (TTS) synthesis, a technique for artificially generating ...
research
09/22/2021

Low-Latency Incremental Text-to-Speech Synthesis with Distilled Context Prediction Network

Incremental text-to-speech (TTS) synthesis generates utterances in small...
research
11/01/2021

Principles towards Real-Time Simulation of Material Point Method on Modern GPUs

Physics-based simulation has been actively employed in generating offlin...
research
02/13/2013

Some Experiments with Real-Time Decision Algorithms

Real-time Decision algorithms are a class of incremental resource-bounde...
research
11/07/2019

Incremental Text-to-Speech Synthesis with Prefix-to-Prefix Framework

Text-to-speech synthesis (TTS) has witnessed rapid progress in recent ye...
research
05/02/2022

Teaching BERT to Wait: Balancing Accuracy and Latency for Streaming Disfluency Detection

In modern interactive speech-based systems, speech is consumed and trans...

Please sign up or login with your details

Forgot password? Click here to reset