VIFS: An End-to-End Variational Inference for Foley Sound Synthesis

06/08/2023
by Junhyeok Lee, et al.

The goal of DCASE 2023 Challenge Task 7 is to generate varied sound clips for Foley sound synthesis (FSS) using a "category-to-sound" approach. The "category" is expressed by a single index, while the corresponding "sound" covers many diverse sound examples. To generate diverse sounds for a given category, we adopt VITS, a text-to-speech (TTS) model based on variational inference. In addition, we apply several techniques from speech synthesis, including PhaseAug and Avocodo. Unlike TTS models, which generate short utterances from phonemes and a speaker identity, the category-to-sound problem requires generating diverse sounds from a category index alone. To compensate for this difference while maintaining consistency within each audio clip, we heavily modified the prior encoder to enhance its consistency with the posterior latent variables. This modification introduces an additional Gaussian in the prior encoder, which promotes variance within each category. With these modifications, we propose VIFS, variational inference for end-to-end Foley sound synthesis, which generates diverse, high-quality sounds.
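As a rough illustration of the idea of a category-conditioned Gaussian prior, the sketch below maps a category index to a mean and a standard deviation and draws latent frames from that distribution. It is not the VIFS prior encoder: the class name `CategoryGaussianPrior`, the randomly initialized parameter tables, and all sizes are hypothetical choices made only for this example, and a real model would learn these parameters and decode the latents into audio.

```python
import numpy as np


class CategoryGaussianPrior:
    """Toy per-category Gaussian prior (hypothetical, not the VIFS architecture)."""

    def __init__(self, num_categories, latent_dim, seed=0):
        init_rng = np.random.default_rng(seed)
        # One mean vector and one log-standard-deviation vector per category.
        # Random values stand in for parameters a real model would learn.
        self.mean = init_rng.normal(size=(num_categories, latent_dim))
        self.log_std = init_rng.normal(scale=0.1, size=(num_categories, latent_dim))

    def sample(self, category, num_frames, rng):
        """Draw `num_frames` latent vectors for one category index."""
        mu = self.mean[category]
        sigma = np.exp(self.log_std[category])
        # Reparameterized sampling: z = mu + sigma * eps, with eps ~ N(0, I).
        eps = rng.standard_normal((num_frames, mu.shape[0]))
        return mu + sigma * eps


# Assumed sizes for illustration only.
prior = CategoryGaussianPrior(num_categories=7, latent_dim=8, seed=0)
rng = np.random.default_rng(1)

# Two draws for the same category index give different latents,
# which is the source of within-category variety in this toy setup.
z_first = prior.sample(category=3, num_frames=16, rng=rng)
z_second = prior.sample(category=3, num_frames=16, rng=rng)
print(z_first.shape, np.allclose(z_first, z_second))
```

Because every draw shares the same category mean, the samples stay centered on one region of latent space while the noise term supplies the variation between clips.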

