Better speech synthesis through scaling

05/12/2023
by James Betker, et al.

In recent years, the field of image generation has been revolutionized by the application of autoregressive transformers and denoising diffusion probabilistic models (DDPMs). These approaches model image generation as a step-wise probabilistic process and leverage large amounts of compute and data to learn the image distribution. This methodology of improving performance need not be confined to images. This paper describes a way to apply advances in the image generative domain to speech synthesis. The result is TorToise, an expressive, multi-voice text-to-speech system. All model code and trained weights have been open-sourced at https://github.com/neonbjb/tortoise-tts.


