A Survey on Audio Diffusion Models: Text To Speech Synthesis and Enhancement in Generative AI

03/23/2023
by   Chenshuang Zhang, et al.

Generative AI has demonstrated impressive performance in various fields, among which speech synthesis is an interesting direction. With the diffusion model as the most popular generative model, numerous works have attempted two active tasks: text-to-speech and speech enhancement. This work surveys audio diffusion models and is complementary to existing surveys, which either lack the recent progress of diffusion-based speech synthesis or give only an overall picture of applying diffusion models in multiple fields. Specifically, this work first briefly introduces the background of audio and diffusion models. For the text-to-speech task, we divide the methods into three categories based on the stage where the diffusion model is adopted: acoustic model, vocoder, and end-to-end framework. Moreover, we categorize speech enhancement tasks by whether certain signals are removed from or added to the input speech. Comparisons of experimental results and discussions are also covered in this survey.

research
02/10/2022

Conditional Diffusion Probabilistic Model for Speech Enhancement

Speech enhancement is a critical component of many user-oriented audio a...
research
10/30/2022

SRTNet: Time Domain Speech Enhancement Via Stochastic Refinement

Diffusion model, as a new generative model which is very popular in imag...
research
11/08/2022

DiffPhase: Generative Diffusion-based STFT Phase Retrieval

Diffusion probabilistic models have been recently used in a variety of t...
research
06/07/2022

Universal Speech Enhancement with Score-based Diffusion

Removing background noise from speech audio has been the subject of cons...
research
04/04/2023

A Survey on Graph Diffusion Models: Generative AI in Science for Molecule, Protein and Material

Diffusion models have become a new SOTA generative modeling method in va...
research
05/11/2022

Real-Time Packet Loss Concealment With Mixed Generative and Predictive Model

As deep speech enhancement algorithms have recently demonstrated capabil...
research
06/14/2023

Variance-Preserving-Based Interpolation Diffusion Models for Speech Enhancement

The goal of this study is to implement diffusion models for speech enhan...
