Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation

09/19/2023
by Yatong Bai, et al.

Diffusion models power the vast majority of text-to-audio (TTA) generation methods. Unfortunately, these models suffer from slow inference because they query the underlying denoising network iteratively, which makes them unsuitable for scenarios with inference-time or computational constraints. This work modifies the recently proposed consistency distillation framework to train TTA models that require only a single neural network query. In addition to incorporating classifier-free guidance into the distillation process, we leverage the availability of generated audio during distillation training to fine-tune the consistency TTA model with novel loss functions in the audio space, such as the CLAP score. Our objective and subjective evaluation results on the AudioCaps dataset show that consistency models retain diffusion models' high generation quality and diversity while reducing the number of queries by a factor of 400.
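To make the query-count argument concrete, here is a minimal sketch of how iterative sampling with classifier-free guidance compares with a single-query model. It is not the authors' implementation: `CountingModel`, `guided_estimate`, `iterative_sample`, and `single_query_sample` are hypothetical names, the "network" is a toy function, and the choice of 200 sampling steps and guidance weight 3.0 is an assumption made only so that the arithmetic lands on a factor of 400.

```python
class CountingModel:
    """Toy stand-in for a neural network that counts how often it is queried."""

    def __init__(self):
        self.queries = 0

    def __call__(self, x, text=None):
        self.queries += 1
        # Hypothetical prediction: offset from a text-dependent target.
        target = 1.0 if text is not None else 0.0
        return [xi - target for xi in x]


def guided_estimate(model, x, text, w):
    """Classifier-free guidance: two queries, then extrapolate by weight w."""
    cond = model(x, text)      # text-conditioned estimate
    uncond = model(x, None)    # unconditional estimate
    return [u + w * (c - u) for c, u in zip(cond, uncond)]


def iterative_sample(model, x, text, steps, w, step_size=0.01):
    """Multi-step sampling: every step pays for two network queries."""
    for _ in range(steps):
        eps = guided_estimate(model, x, text, w)
        x = [xi - step_size * e for xi, e in zip(x, eps)]
    return x


def single_query_sample(student, x, text):
    """Distilled student: guidance is assumed to be baked in, so one query."""
    return student(x, text)


teacher, student = CountingModel(), CountingModel()
noise = [0.5, -0.3, 0.1]
iterative_sample(teacher, noise, "a dog barking", steps=200, w=3.0)
single_query_sample(student, noise, "a dog barking")
print(teacher.queries, student.queries)  # prints "400 1"
```

Under these assumptions, the guided iterative sampler issues two queries per step, so the cost grows linearly with the number of steps, while the distilled model pays for exactly one query regardless of the guidance weight.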

