Soundify: Matching Sound Effects to Video

12/17/2021
by David Chuan-En Lin, et al.

In video editing, sound is half the story. A skilled video editor overlays sounds, such as effects and ambients, on footage to add character to an object or to immerse the viewer in a space. However, through formative interviews with professional video editors, we found that this process can be extremely tedious and time-consuming. We introduce Soundify, a system that matches sound effects to video. Soundify leverages labeled, studio-quality sound effects libraries and extends CLIP, a neural network with impressive zero-shot image classification capabilities, into a "zero-shot detector". This lets us produce high-quality results without resource-intensive correspondence learning or audio generation. We encourage you to have a look at, or better yet, have a listen to, the results at https://chuanenlin.com/soundify.
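The label-matching idea in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes that a CLIP-like model has already embedded the video frames and the text labels of a sound effects library into the same vector space, and it simply ranks labels by cosine similarity to the averaged frame embedding. The function names (`cosine`, `rank_labels`), the frame averaging, and the toy three-dimensional vectors are all hypothetical choices made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_labels(frame_embeddings, label_embeddings, top_k=2):
    """Rank sound effect labels by similarity to the mean frame embedding.

    frame_embeddings: list of vectors, one per sampled frame (assumed precomputed).
    label_embeddings: dict mapping a label string to its text embedding (assumed precomputed).
    """
    dim = len(frame_embeddings[0])
    count = len(frame_embeddings)
    # Assumption: average the frames into a single vector for the whole clip.
    clip_vector = [sum(f[i] for f in frame_embeddings) / count for i in range(dim)]
    scores = {label: cosine(clip_vector, emb) for label, emb in label_embeddings.items()}
    # Highest similarity first; keep only the top_k labels.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


# Toy stand-ins for real embeddings, purely illustrative.
frames = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]
labels = {"rain": [1.0, 0.0, 0.0], "traffic": [0.0, 1.0, 0.0], "birds": [0.0, 0.0, 1.0]}
matches = rank_labels(frames, labels, top_k=2)
```

Because the library labels serve as the classification vocabulary, no sound-specific training is needed in this sketch; a real system would replace the toy vectors with embeddings from an actual image-text model.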


