FoleyGAN: Visually Guided Generative Adversarial Network-Based Synchronous Sound Generation in Silent Videos

07/20/2021
by   Sanchita Ghose, et al.

Deep learning-based visual-to-sound generation systems need to account for how visual and audio features stay synchronized over time. In this research we introduce a novel task: guiding a class-conditioned generative adversarial network with the temporal visual information of a video input, so that visual-to-sound generation adapts to the synchronicity between the audio and visual modalities. Our proposed FoleyGAN model conditions on the action sequences of visual events to generate visually aligned, realistic sound tracks. We expand our previously proposed Automatic Foley dataset to train FoleyGAN, and we evaluate the synthesized sound through a human survey that shows noteworthy audio-visual synchronicity performance (81% on average). Our approach also performs better in statistical experiments when compared with other baseline models and audio-visual datasets.


research
06/15/2023

STARSS23: An Audio-Visual Dataset of Spatial Recordings of Real Scenes with Spatiotemporal Annotations of Sound Events

While direction of arrival (DOA) of sound events is generally estimated ...
research
02/21/2020

AutoFoley: Artificial Synthesis of Synchronized Sound Tracks for Silent Videos with Deep Learning

In movie productions, the Foley Artist is responsible for creating an ov...
research
07/14/2020

Generating Visually Aligned Sound from Videos

We focus on the task of generating sound from natural videos, and the so...
research
07/14/2019

Autoencoding sensory substitution

Tens of millions of people live blind, and their number is ever increasi...
research
08/16/2019

Towards Generating Ambisonics Using Audio-Visual Cue for Virtual Reality

Ambisonics i.e., a full-sphere surround sound, is quintessential with 36...
research
08/23/2023

An Initial Exploration: Learning to Generate Realistic Audio for Silent Video

Generating realistic audio effects for movies and other media is a chall...
research
12/04/2017

Visual to Sound: Generating Natural Sound for Videos in the Wild

As two of the five traditional human senses (sight, hearing, taste, smel...
