SoundStorm: Efficient Parallel Audio Generation

05/16/2023
by Zalán Borsos, et al.

We present SoundStorm, a model for efficient, non-autoregressive audio generation. SoundStorm receives as input the semantic tokens of AudioLM, and relies on bidirectional attention and confidence-based parallel decoding to generate the tokens of a neural audio codec. Compared to the autoregressive generation approach of AudioLM, our model produces audio of the same quality and with higher consistency in voice and acoustic conditions, while being two orders of magnitude faster. SoundStorm generates 30 seconds of audio in 0.5 seconds on a TPU-v4. We demonstrate the ability of our model to scale audio generation to longer sequences by synthesizing high-quality, natural dialogue segments, given a transcript annotated with speaker turns and a short prompt with the speakers' voices.
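To make the idea of confidence-based parallel decoding concrete, here is a minimal sketch in plain Python. It is not SoundStorm's implementation: the abstract does not specify the decoding schedule or the network, so `toy_model` is a hypothetical stand-in that returns random proposals, the `MASK` sentinel is an illustrative choice, and the cosine schedule is an assumption borrowed from common masked-decoding practice. The sketch only shows the general loop: start fully masked, propose a token for every position in parallel, commit the most confident proposals, and repeat for a small fixed number of steps.

```python
import math
import random

MASK = -1  # hypothetical sentinel marking a position that is not yet decoded


def toy_model(tokens, vocab_size, rng):
    """Stand-in for a bidirectional network (assumption, not the real model).

    Returns one (token, confidence) proposal per position. A real model would
    condition on the already-decoded tokens and on the semantic tokens.
    """
    return [(rng.randrange(vocab_size), rng.random()) for _ in tokens]


def parallel_decode(length, vocab_size, steps, seed=0):
    """Fill `length` positions in at most `steps` parallel rounds."""
    rng = random.Random(seed)
    tokens = [MASK] * length
    for step in range(steps):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        proposals = toy_model(tokens, vocab_size, rng)
        # Assumed cosine schedule: fraction of positions left masked after
        # this step. It reaches 0 on the final step, so everything is decoded.
        frac_masked = math.cos(math.pi / 2 * (step + 1) / steps)
        keep_masked = int(math.floor(frac_masked * length))
        n_commit = max(1, len(masked) - keep_masked)
        # Commit the most confident proposals; the rest stay masked.
        masked.sort(key=lambda i: proposals[i][1], reverse=True)
        for i in masked[:n_commit]:
            tokens[i] = proposals[i][0]
    return tokens


if __name__ == "__main__":
    print(parallel_decode(length=16, vocab_size=8, steps=4))
```

The number of model calls is bounded by `steps` rather than by the sequence length, which is the source of the speed-up over token-by-token autoregressive decoding. For scale, the reported figure of 30 seconds of audio in 0.5 seconds corresponds to generation at 60 times real time.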


research
09/07/2022

AudioLM: a Language Modeling Approach to Audio Generation

We introduce AudioLM, a framework for high-quality audio generation with...
research
06/15/2021

MLP Singer: Towards Rapid Parallel Korean Singing Voice Synthesis

Recent developments in deep learning have significantly improved the qua...
research
05/30/2023

Make-A-Voice: Unified Voice Synthesis With Discrete Representation

Various applications of voice synthesis have been developed independentl...
research
09/21/2020

DiffWave: A Versatile Diffusion Model for Audio Synthesis

In this work, we propose DiffWave, a versatile Diffusion probabilistic m...
research
07/08/2022

FastLTS: Non-Autoregressive End-to-End Unconstrained Lip-to-Speech Synthesis

Unconstrained lip-to-speech synthesis aims to generate corresponding spe...
research
12/22/2016

SampleRNN: An Unconditional End-to-End Neural Audio Generation Model

In this paper we propose a novel model for unconditional audio generatio...
research
11/25/2020

FBWave: Efficient and Scalable Neural Vocoders for Streaming Text-To-Speech on the Edge

Nowadays more and more applications can benefit from edge-based text-to-...
