FALL-E: A Foley Sound Synthesis Model and Strategies

06/16/2023
by Minsung Kang, et al.

This paper introduces FALL-E, a foley synthesis system, together with its training and inference strategies. The FALL-E model takes a cascaded approach with three stages: low-resolution spectrogram generation, spectrogram super-resolution, and a vocoder. We trained every sound-related model from scratch on our extensive datasets and used a pre-trained language model. We conditioned the model on dataset-specific texts, enabling it to learn sound quality and recording environment from the text input. Moreover, we leveraged external language models to improve the text descriptions of our datasets and performed prompt engineering for quality, coherence, and diversity. FALL-E was evaluated with an objective measure as well as listening tests in Task 7 of the DCASE 2023 challenge. The submission placed second on average, ranking first for diversity, second for audio quality, and third for class fitness.
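The cascade described in the abstract can be pictured as three functions applied in sequence. The following is only a minimal sketch of that data flow, not the authors' implementation: every function name, the prompt format, the array sizes, and the placeholder arithmetic are assumptions made for illustration, and each stage stands in for a trained model.

```python
from typing import List

# A spectrogram is modelled here as a list of rows: [frequency_bins][time_frames].
Spectrogram = List[List[float]]


def build_prompt(sound_class: str, dataset_text: str) -> str:
    """Hypothetical prompt format: dataset-specific text followed by the class."""
    return f"{dataset_text}: {sound_class}"


def generate_low_res(prompt: str, bins: int = 4, frames: int = 8) -> Spectrogram:
    """Placeholder for the text-conditioned low-resolution generator.

    A real model would sample a spectrogram; this stub fills a constant
    value derived from the prompt so the example stays deterministic.
    """
    level = (len(prompt) % 10) / 10.0
    return [[level for _ in range(frames)] for _ in range(bins)]


def super_resolve(spec: Spectrogram, factor: int = 2) -> Spectrogram:
    """Placeholder for the super-resolution stage.

    Uses nearest-neighbour upsampling along both axes (an assumption; the
    actual stage is a learned model).
    """
    out: Spectrogram = []
    for row in spec:
        wide = [value for value in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


def vocode(spec: Spectrogram, hop: int = 4) -> List[float]:
    """Placeholder vocoder: emits `hop` samples per frame.

    Each sample is the mean value of that frame across frequency bins.
    """
    n_frames = len(spec[0])
    wave: List[float] = []
    for t in range(n_frames):
        energy = sum(row[t] for row in spec) / len(spec)
        wave.extend([energy] * hop)
    return wave


def synthesize(sound_class: str, dataset_text: str) -> List[float]:
    """Run the three stages in order: generate, super-resolve, vocode."""
    prompt = build_prompt(sound_class, dataset_text)
    low = generate_low_res(prompt)
    high = super_resolve(low)
    return vocode(high)


waveform = synthesize("dog bark", "clean studio recording")
```

With the default sizes assumed here, the low-resolution stage yields 4 x 8 values, the super-resolution stage 8 x 16, and the vocoder 64 samples. Keeping the stages as separate functions mirrors the cascaded design, in which each model can be trained and swapped independently.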


