Speak While You Think: Streaming Speech Synthesis During Text Generation

09/20/2023
by   Avihu Dekel, et al.

Large Language Models (LLMs) demonstrate impressive capabilities, yet interaction with these models is mostly facilitated through text. Using Text-To-Speech to synthesize LLM outputs typically results in notable latency, which is impractical for fluent voice conversations. We propose LLM2Speech, an architecture that synthesizes speech while the LLM is still generating text, which significantly reduces latency. LLM2Speech mimics the predictions of a non-streaming teacher model while limiting its exposure to future context in order to enable streaming. It exploits the hidden embeddings of the LLM, a by-product of text generation that contains informative semantic context. Experimental results show that LLM2Speech maintains the teacher's quality while reducing latency enough to enable natural conversations.
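
The idea of limiting exposure to future context can be illustrated with a minimal sketch. This is not the LLM2Speech implementation: the function name `stream_with_lookahead` and the `lookahead` parameter are hypothetical, and plain strings stand in for LLM tokens and their hidden embeddings. The sketch only shows why a bounded lookahead lets processing start before the full text has been generated.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List, Tuple


def stream_with_lookahead(
    token_stream: Iterable[str], lookahead: int
) -> Iterator[Tuple[str, List[str]]]:
    """Yield (token, future_context) pairs, where future_context holds
    at most `lookahead` tokens that follow the current one.

    Hypothetical helper for illustration only; a real system would pass
    each pair to a speech synthesis model instead of returning it.
    """
    if lookahead < 0:
        raise ValueError("lookahead must be non-negative")

    buffer: Deque[str] = deque()
    for token in token_stream:
        buffer.append(token)
        # Once the buffer holds the current token plus `lookahead` future
        # tokens, the oldest token can be processed without waiting for
        # the rest of the text to be generated.
        if len(buffer) > lookahead:
            current = buffer.popleft()
            yield current, list(buffer)

    # The stream has ended: flush the remaining tokens with whatever
    # (shrinking) future context is still available.
    while buffer:
        current = buffer.popleft()
        yield current, list(buffer)


if __name__ == "__main__":
    for current, future in stream_with_lookahead(["Speak", "while", "you", "think"], 1):
        print(current, future)
```

With `lookahead=1`, the first token can be handed off as soon as the second one arrives, so the waiting time depends on the lookahead size rather than on the length of the full response. A non-streaming model corresponds to an unbounded lookahead, where nothing is emitted until the stream ends.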

research
07/25/2023

Watermarking Conditional Text Generation for AI Detection: Unveiling Challenges and a Semantic-Aware Watermark Remedy

To mitigate potential risks associated with language models, recent AI d...
research
05/02/2018

How Flajolet Processed Streams with Coin Flips

This article is a historical introduction to data streaming algorithms t...
research
04/25/2021

Bridging the gap between streaming and non-streaming ASR systems by distilling ensembles of CTC and RNN-T models

Streaming end-to-end automatic speech recognition (ASR) systems are wide...
research
06/26/2022

On Comparison of Encoders for Attention based End to End Speech Recognition in Standalone and Rescoring Mode

The streaming automatic speech recognition (ASR) models are more popular...
research
11/03/2022

Iterative autoregression: a novel trick to improve your low-latency speech enhancement model

Streaming models are an essential component of real-time speech enhancem...
research
11/07/2019

Incremental Text-to-Speech Synthesis with Prefix-to-Prefix Framework

Text-to-speech synthesis (TTS) has witnessed rapid progress in recent ye...
