Conditional Generation of Audio from Video via Foley Analogies

04/17/2023
by Yuexi Du, et al.

The sound effects that designers add to videos are meant to convey a particular artistic effect and thus may differ substantially from a scene's true sound. Inspired by the challenges of creating a soundtrack for a video that differs from its true sound, but that nonetheless matches the actions occurring on screen, we propose the problem of conditional Foley. We present the following contributions to address this problem. First, we propose a pretext task for training our model to predict sound for an input video clip using a conditional audio-visual clip sampled from another time within the same source video. Second, we propose a model for generating a soundtrack for a silent input video, given a user-supplied example that specifies what the video should "sound like". We show through human studies and automated evaluation metrics that our model successfully generates sound from video, while varying its output according to the content of a supplied example. Project site: https://xypb.github.io/CondFoleyGen/
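
The pretext task described above pairs an input clip with a conditional clip taken from a different time in the same source video. The sketch below illustrates one way such a pair of start times could be drawn. It is not the authors' implementation: the function name `sample_clip_pair`, the 2-second default clip length, and the requirement that the two clips do not overlap are all assumptions made for illustration.

```python
import random
from typing import NamedTuple, Optional


class ClipPair(NamedTuple):
    cond_start: float   # start time (s) of the conditional audio-visual clip
    input_start: float  # start time (s) of the clip whose sound is predicted


def sample_clip_pair(
    video_duration: float,
    clip_len: float = 2.0,          # assumed clip length, not from the paper
    rng: Optional[random.Random] = None,
) -> ClipPair:
    """Draw two non-overlapping clip start times from one video (hypothetical helper)."""
    rng = rng or random.Random()
    slack = video_duration - 2 * clip_len
    if slack < 0:
        raise ValueError("video is too short for two non-overlapping clips")

    # Draw two offsets in [0, slack]; shifting the later one by clip_len
    # guarantees the two clips are at least clip_len apart and fit in the video.
    a, b = sorted((rng.uniform(0.0, slack), rng.uniform(0.0, slack)))
    earlier, later = a, b + clip_len

    # Randomly decide which clip serves as the condition, so the conditional
    # example may come before or after the input clip.
    if rng.random() < 0.5:
        return ClipPair(cond_start=earlier, input_start=later)
    return ClipPair(cond_start=later, input_start=earlier)


if __name__ == "__main__":
    pair = sample_clip_pair(video_duration=10.0, rng=random.Random(0))
    print(pair)
```

During training under this assumed setup, the audio and frames at `cond_start` would be given to the model as the example, and the model would be asked to predict the audio for the frames at `input_start`.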
