AutoFoley: Artificial Synthesis of Synchronized Sound Tracks for Silent Videos with Deep Learning

In movie productions, the Foley Artist is responsible for creating an overlay soundtrack that helps the movie come alive for the audience. This requires the artist to first identify the sounds that will enhance the experience for the listener, thereby reinforcing the Director's intention for a given scene. In this paper, we present AutoFoley, a fully-automated deep learning tool that can be used to synthesize a representative audio track for videos. AutoFoley can be used in applications where there is either no corresponding audio file associated with the video or where there is a need to identify critical scenarios and provide a synthesized, reinforced soundtrack. An important performance criterion for the synthesized soundtrack is that it be time-synchronized with the input video, which provides for a realistic and believable portrayal of the synthesized sound. Unlike existing sound prediction and generation architectures, our algorithm is capable of precise recognition of actions as well as inter-frame relations in fast-moving video clips by incorporating an interpolation technique and Temporal Relation Networks (TRN). We employ a robust multi-scale Recurrent Neural Network (RNN) together with a Convolutional Neural Network (CNN) for a better understanding of the intricate input-to-output associations over time. To evaluate AutoFoley, we create and introduce a large-scale audio-video dataset containing a variety of sounds frequently used as Foley effects in movies. Our experiments show that the synthesized sounds are realistically portrayed with accurate temporal synchronization to the associated visual inputs. In human qualitative testing of AutoFoley, over 73% of participants rated the synthesized soundtrack as original, which is a noteworthy improvement in cross-modal sound synthesis research.




I Introduction

Adding sound effects in post production using the art of Foley has been an integral part of movie and television soundtracks since the 1930s. The technique is named after Jack Foley, a sound editor at Universal Studios. Mr. Foley was the first to make sound effects for live radio broadcasts with the tools and items he had around him. Now almost every motion picture and television show contains Foley tracks. Movies would seem hollow and distant without the controlled layer of a realistic Foley soundtrack.

To construct the augmented sound, the Foley artist uses special sound stages (Fig. 1) surrounded by a variety of props such as car fenders, chairs, plates, and glasses, as well as post-production sound studios, to record the sound effects without ambient background sounds. This requires electronics such as monitors, camcorders, microphones, manual recording volume controllers, and an external audio mixer. Foley artists have to closely observe the screen while performing diverse movements (e.g. breaking objects, running forcefully on rough surfaces, pushing each other, scrubbing different props) to ensure their sound effects are appropriate (Fig. 2). The process of Foley sound synthesis therefore adds significant time and cost to the creation of a motion picture. Furthermore, the problem of artificially synthesizing sounds synchronized to video multimedia streams exists in realms beyond the motion picture industry. Research has shown that the end-user experience of multimedia in general is enhanced when more than one source of sensory input is synchronized into the multimedia stream [yuan2015].

Figure 1: Foley Recording Studio [foley].
Figure 2: Foley Recording for a movie at Pinewood Studios.

We generate the augmented sounds for injection into a synchronized video file by first utilizing a deep neural network. To train our network, we find motivation in the multi-modal coherence ability of the human brain to synchronize audio and visual signals. Recent works in [owens2018audio, owens2016visually, owens2018learning, zhou2017visual, takahashi2017aenet] explored the relationship between auditory and visual modalities through computational models as a way of localizing and separating sound sources, recognizing materials and actions, generating natural sounds, and learning audio features for video analysis. In this paper, we propose, for the first time, a deep sound synthesis network that performs as an automatic Foley, generating augmented and enhanced sound effects as an overlay on video files that may or may not have associated sound files.

Figure 3: Automatic Foley Generation Model: Model 1 architecture (Frame Sequence Network+sound synthesis) shown in left; Model 2 architecture (Frame Relation Network+sound synthesis) shown in right.

In our proposed system (Fig. 3), we present two distinct models using different audio-video regression approaches, each followed by a sound synthesis architecture. The first model maps the auditory features to interpolated video frames with a combined multi-scale deep convolutional-recurrent neural network (ResNet-FSLSTM). Instead of replicating the same video frame multiple times in order to map it to a single sound sample ([owens2016visually, zhou2017visual]), we use an interpolation technique [liu2017video] to leverage intermediate video frames and obtain smooth motion information for each video. In addition, predicting the sound class with a CNN-FSLSTM architecture provides a better understanding of the intricate transition functions over varying time scales. In the second model, we predict the sound category from only the representative frames of each video by applying a Temporal Relation Network (TRN) [zhou2018temporal]. This enables the system to learn relational reasoning between two frames from different time periods, identifying the anticipated action present in the video.

More specifically, our first model is designed to learn the audio-visual relation for fast-moving movie clips, whereas the second model is designed for early recognition of activity using limited video frames. For sound generation, we use the Inverse Short-Time Fourier Transform (ISTFT) method [welch1967use], which has lower computational complexity than a parametric synthesis approach [mcdermott2011sound, slaney1995pattern] or an example-based synthesis method [owens2016visually].

To enable learning, we introduce a dataset that contains video scenes associated with audio tracks that are very common in movie clips. Existing works [owens2016visually, zhou2017visual] show sound synthesis for input videos of hitting different objects, or videos collected in the wild. In our work, we attempt to synthesize sound for video clips that is similar to the Foley sound created manually by Foley artists. The advancements introduced by our paper are:

  • We take the first step toward automatic Foley generation for silent video clips using deep neural networks, taking into consideration the "movie sound effects" domain, where the desired sounds are highly varied and have a clear temporal onset.

  • We introduce a dataset explicitly for future Foley synthesis applications. We carefully prepare the dataset considering its relevance to movie events, without background noise.

  • We present efficient prediction architectures for realistic and synchronous sound synthesis from visual scenes.

  • We show that Temporal Relation Networks can be exploited in the video-to-sound prediction task.

  • For the performance analysis of the generated sounds, we perform qualitative and numerical experiments and conduct a human survey on our generated sounds. We also present a detailed ablation study of our proposed methods to show the usefulness of each component.

The rest of the paper is organized as follows. In Section II, we present a brief review of related work. In Section III, we describe our methodology (audio and video pre-processing steps, the training procedure with deep neural networks, and the sound synthesis process) and the full algorithm in detail. Finally, in Sections IV and V, we describe our AutoFoley dataset of video clips resembling Foley sound tracks and the parameter details for the experiments, along with qualitative, numerical, ablation, and human experiments to evaluate the proposed models and the generated results.

II Related Work

II-A Automatic Sound Effect

Previous work in [van2001foleyautomatic] proposes dynamic simulation and user interaction to produce sound effects automatically from 3D models, using contact forces modeled at audio rates to drive modal models. Work in [hahn1998integrating] presents a number of similar algorithms for contact sound synthesis. In our work, we use a deep neural network to predict the sound in silent video clips of movie scenes and then synthesize the Foley sound automatically from the predicted features. To our knowledge, this paper is the first to propose automatic Foley sound synthesis using deep neural networks.

II-B Audio-Visual Correlation

Our approach is extensively inspired by research in [owens2016visually, zhou2017visual, castrejon2016learning, ngiam2011multimodal, huang2013audio, chen2017deep, zhangvisually], and by work on learning cross-modal relationships in [baltruvsaitis2019multimodal]. Among these, [owens2016visually] deploys a deep learning algorithm to predict impact sounds based on different materials and physical actions; their work is focused on material recognition. The works in [owens2016visually] and [chen2017deep] aim to generate sound directly from input video frames rather than exploiting the natural synchronization between audio and video. Earlier works in [arandjelovic2017look, owens2016ambient, aytar2016soundnet] train their networks with unlabeled video data, utilizing the correlation between video and sound. An unsupervised two-stream neural network is proposed in [arandjelovic2017look] that takes audio and video frames as inputs and identifies whether they are coherent. Work in [aytar2016soundnet] presents efficient sound representations learned under visual supervision with a deep convolutional network inspired by teacher-student models [ba2014deep, gupta2016cross]. The work in [niwa2018efficient] explores the minimum spatial resolution of human auditory localization from visual information while maintaining decent alignment. In this paper, we exploit the cross-modal relationship between audio and video signals to automatically generate Foley tracks for silent videos.

II-C Regression Methods

Sound regression to videos for the material recognition task is performed in [owens2016ambient], where a combined CNN-LSTM structure is proposed to map sound (represented by a cochleagram [muthusamy1990speaker]) from input video frames. A CNN is used to predict a statistical summary of the sound from a video frame in [owens2016ambient] and [owens2018learning]. A deep CNN architecture called AENet is proposed in [takahashi2017aenet] to identify long-duration audio events in video and learn audio features for video analysis. On the other hand, [chen2017deep] proposes a GAN structure for generating the sound of musical instruments conditioned on the visual modality. The SampleRNN [karpathy2014large] structure is deployed by [zhou2017visual] to synthesize visually indicated raw sound waves from video features. In our work, we apply spectrogram features and interpolated image features to train an optimized CNN+LSTM network, followed by an ISTFT algorithm for the sound regression task.

II-D Action Recognition and Sound Source Detection

Research on the human capability of localizing sound sources [gaver1993world, kingma2014adam, majdak20103, shelton1980influence, bolia1999aurally, perrott1996aurally] describes how humans adopt the correlation between the sound and visual domains in an event-centric way, learning about different object materials and various events from day-to-day listening. Similarly, [senocak2018learning] proposes a learning network for sound source localization using an attention mechanism [xu2015show]. Recently, action recognition by extracting spatial-temporal features from a video using semantic guided modules (SGMs) was performed in [yu2019weakly]. In [wang2016visualizing], an automatic video sound recognition and visualization framework is proposed in which nonverbal sounds in a video are automatically converted into animated sound words placed close to the sound source for visualization. Previous research in [majdak20103, bolia1999aurally, perrott1996aurally] provides advanced approaches for localizing sound sources against visual data in 3D space. Our automatic Foley system is not only able to recognize the action in a visual scene, but is also capable of adding realistic sounds to that video.

II-E Comparison with State-of-the-Art [zhou2017visual]

There are some major differences between [zhou2017visual] and this work. First, [zhou2017visual] introduced a sound generation task for videos in the wild, whereas our goal is to build an automatic Foley generation network that learns from multi-modal coherence. Second, they presented three variants for encoding the visual data and combined them with the sound generation network using a video encoder and sound generator structure. Conversely, we focus on sound classes to capture key variations, where each category of action produces sounds belonging to a particular class, and then exploit the class predictions to synthesize more realistic and finely tuned sounds. We therefore present two distinct audio-video regression models that predict the sound class and transform the predicted class's base sample (the average sound clip of that class) into visually coherent sound. Our first model is designed to learn the audio-visual relation for fast-moving movie clips, whereas the second model learns early recognition of activity from limited video frames. In addition, we apply a video interpolation technique instead of the flow-based method used in [zhou2017visual] for better interpretation of image features from video frames. Furthermore, a large number of videos derived from AudioSet are accumulated in the VEGAS dataset [zhou2017visual], but this dataset cannot match our requirements, as most of its videos contain either background sound or human speech; we instead create our automatic Foley dataset entirely focused on Foley sound tracks of 12 different movie events. Finally, we overcome the limitations of sound synthesis with SampleRNN [zhou2017visual] through a simpler sound generation process (described in detail in the methodology and ablation analysis).

III Methodology

The methodology consists of three major steps: i) sound feature extraction, ii) sound class prediction from video frames, and iii) sound synthesis (explained in the following subsections). The architecture of automatic Foley track generation from visual inputs is presented in Fig. 4.

Figure 4: Automatic Foley Generation Architecture: the red and green border lines are showing detailed stages of sound prediction and generation steps to synthesize automatic Foley tracks from visual scenes.

III-A Sound Feature Extraction

We first compute the features of all the audio files using spectrogram analysis, a visual way of representing the strength of a signal over time at the different frequencies present [zhang2018learning]. Since human hearing is based on a form of real-time spectrogram encoded by the cochlea of the inner ear, we convert the audio signal into a 2D representation (spectrogram) to extract the audio features. Our spectrograms provide an intensity plot (in dB) of the Short-Time Fourier Transform (STFT) magnitude of the windowed data segments, computed as follows:

S(m, ω) = 20 log₁₀ | Σ_n x[n] w[n − m] e^{−jωn} |

where x[n] is the audio signal and w is the analysis window. The windows are usually allowed to overlap by 25%-50% in time. We use 950 frames since our sampling frequency is 44,100 Hz. A Hanning window is used to compute the STFT, and each audio feature vector has a width of 1 and a height of 129.
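As a minimal illustration of these spectrogram features, the following Python sketch uses SciPy's STFT with a Hann window; nperseg=256 is an assumption chosen so that each column is a 129-dimensional feature vector (nperseg // 2 + 1), as described above. The paper's exact window length and overlap may differ.

```python
import numpy as np
from scipy import signal

def log_spectrogram(audio, fs=44100, nperseg=256, noverlap=128):
    """Log-magnitude (dB) spectrogram with a Hann window.

    nperseg=256 yields 129 frequency bins per time step, matching the
    129-tall audio feature vectors described in the text (assumed value).
    """
    freqs, times, Z = signal.stft(audio, fs=fs, window="hann",
                                  nperseg=nperseg, noverlap=noverlap)
    mag = np.abs(Z)
    return freqs, times, 20.0 * np.log10(mag + 1e-10)  # intensity in dB

# Example: a 1-second 440 Hz tone.
fs = 44100
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 440 * t)
freqs, times, spec = log_spectrogram(tone, fs)
# Each column of `spec` is one 129-dimensional audio feature vector.
```

Brighter (higher-dB) cells mark frequencies carrying more energy, exactly as in the spectrogram plot of Fig. 5.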

Figure 5: Oscillogram and Spectrogram Plot of a horse running video. The upper blue-colored oscillogram plot presents the waveform and amplitude of the sound over time (sec). The lower figure is the spectrogram plot (frequency (kHz) vs. time (sec) plot) showing the change of the non-stationary signal’s frequency content over time.

In the spectrogram plot (Fig. 5), the intensity of the color represents the amount of energy present at each frequency: the brighter the color, the more energy is present in the audio at that frequency.

III-B Sound Class Prediction from Video

We present two different approaches for predicting the sound class from input video frames: i) a Frame-Sequence Network (an interpolation technique followed by a combined convolutional neural network (CNN) and Fast-Slow LSTM (FS-LSTM) network), and ii) a Frame-Relation Network (a combination of a CNN and a temporal relation network (TRN)).

III-B1 Approach 1: Frame-Sequence Network

In this approach, we increase the video frame rate to capture detailed motion information by using an interpolation technique. Next, we extract the image features from each interpolated video frame by applying a CNN (ResNet-50). Finally, we predict the sound class associated with the video clip with a recurrent network (FS-LSTM), using the image features obtained from the bottleneck convolutional layer of the residual network.

Video Preprocessing using Interpolation Technique

To map image features to audio features, we face the problem of equalizing the number of video frames with the number of audio samples. In our dataset, the total number of audio samples is about 13 times higher than the total number of video frames. In earlier works [owens2016visually], each image feature vector was replicated k times (where k is the ratio of the audio and video sampling rates) to compensate for the difference between the two sampling rates. However, this replication technique sometimes fails to capture the precise motion information of videos, so some action relationships between consecutive frames are lost. We solve this problem by generating intermediate video frames with an interpolation algorithm during the video pre-processing step, before moving to the image feature extraction step. A video frame interpolation technique can estimate dense motion, typically optical flow, between two input frames, and then interpolate one or more intermediate frames guided by that motion [niklaus2017video]. Because of this interpolation, we successfully increase the video's frame rate as required without changing the video length or losing the inter-frame motion information.
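The frame-rate increase can be sketched with a deliberately simple stand-in: the paper relies on flow-guided interpolation [niklaus2017video], whereas the toy function below merely cross-fades neighboring frames linearly. It illustrates only the bookkeeping (how inserting frames raises the frame rate without changing duration), not the motion estimation itself.

```python
import numpy as np

def interpolate_frames(frames, factor=2):
    """Insert (factor - 1) linearly blended frames between each pair.

    A minimal stand-in for flow-guided interpolation: a real system
    warps pixels along estimated motion instead of cross-fading.
    `frames` is a sequence of H x W (x C) float arrays.
    """
    out = []
    for a, b in zip(frames[:-1], frames[1:]):
        for k in range(factor):
            alpha = k / factor
            out.append((1.0 - alpha) * a + alpha * b)
    out.append(frames[-1])
    return np.stack(out)

clip = np.stack([np.full((4, 4), float(i)) for i in range(5)])  # 5 frames
dense = interpolate_frames(clip, factor=4)
# 5 frames -> 4 * (5 - 1) + 1 = 17 frames; video duration is unchanged.
```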

Generating Image Feature Vector Using CNN

We compute image feature vectors with the ResNet-50 convolutional neural network (CNN) model. First, we read a sequence of interpolated video frames from the training videos as visual inputs, excluding the corresponding sound tracks. To avoid the complexity found in the two-stream approach [donahue2015long] when obtaining accurate flow estimates for fast, non-rigid motion videos, we create a space-time image for each frame by joining the three gray-scaled images of the previous, current, and next frames (Fig. 6).

Figure 6: Space-time Image Generation: Concatenation of three consecutive resized grayscale frames of horse racing video after applying interpolation technique.

Finally, we compute the input feature vector x_t for each frame t by concatenating the CNN features of the space-time image at frame t with the CNN features of the RGB image from the first frame of each video:

x_t = [ φ(F_t); φ(I_1) ]

Here, φ(·) represents the CNN features obtained from the output of the bottleneck layer of the ResNet-50 architecture, using a Keras pretrained model for ResNet-50 [He_2016_CVPR], F_t is the space-time image at frame t, and I_1 is the first RGB frame. Thus, each of our image features contains motion information (obtained from the space-time images) and color information (obtained from the first RGB frame). The entire image feature extraction process is shown in Fig. 7.
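The space-time image construction and the feature concatenation can be sketched as follows. The `toy_cnn` callable below is a stand-in for the ResNet-50 bottleneck-feature extractor (here just a per-channel average pool, so the output is 3-D rather than 2048-D); only the stacking and concatenation pattern mirrors the text.

```python
import numpy as np

def space_time_image(prev_g, cur_g, next_g):
    """Stack three consecutive grayscale frames as the three channels
    of one image, encoding short-range motion (as in Fig. 6)."""
    return np.stack([prev_g, cur_g, next_g], axis=-1)

def frame_feature(st_image, first_rgb, cnn):
    """Concatenate CNN features of the space-time image (motion) with
    CNN features of the video's first RGB frame (color).

    `cnn` stands in for the ResNet-50 bottleneck extractor; here it is
    any callable mapping an image to a 1-D feature vector.
    """
    return np.concatenate([cnn(st_image), cnn(first_rgb)])

# Toy stand-in "CNN": global average pool per channel (3-D output).
toy_cnn = lambda img: img.mean(axis=(0, 1))

frames = [np.random.rand(8, 8) for _ in range(3)]   # prev, cur, next
st = space_time_image(*frames)
feat = frame_feature(st, np.random.rand(8, 8, 3), toy_cnn)
# With ResNet-50 (2048-D each) this would be 4096-D; here it is 6-D.
```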

Figure 7: Image Feature Extraction in Frame-Sequence Network applying Convolutional Neural Network (CNN).
Sound Class Prediction Using FS-LSTM

We use the image features (x_t) obtained from our CNN as input to a special recurrent neural network (RNN) called the Fast-Slow LSTM (FS-LSTM), proposed in [mujika2017fast]. Here, LSTM cells act as building units to map the video frames to audio frames and compute the predicted sound features at each time instance. The FS-LSTM network has two hierarchical layers. The lower layer contains k sequential LSTM cells (F_1, F_2, …, F_k), considered Fast cells, whereas the upper layer consists of only one LSTM cell (S), considered the Slow cell. S takes input from F_1 and gives its state to F_2. In the lower layer, the image feature vector is fed to the first LSTM cell (F_1) as input, and the final predicted sound class matrix is computed from the output of the last LSTM cell (F_k). For LSTM cells F_1, F_2, …, F_k and S, the FS-LSTM architecture (Fig. 8) can be expressed by the following set of equations:

h_t^{F_1} = f^{F_1}(h_{t−1}^{F_k}, x_t)
h_t^{S} = f^{S}(h_{t−1}^{S}, h_t^{F_1})
h_t^{F_2} = f^{F_2}(h_t^{F_1}, h_t^{S})
h_t^{F_i} = f^{F_i}(h_t^{F_{i−1}}), for 3 ≤ i ≤ k

Here, f represents the update function of a single LSTM cell. The encoding process is performed by revising the value of the hidden vector (h_t) with the updated image feature vector x_t. Finally, the output sound class prediction matrix (ŷ_t) is calculated from an affine transformation of h_t^{F_k}:

ŷ_t = W h_t^{F_k} + b

Figure 8: Fast-Slow LSTM network with k Fast cells.
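The Fast-Slow wiring can be illustrated with a deliberately tiny sketch. The cells below are plain tanh recurrences standing in for LSTM updates, and all dimensions are toy values; only the cell ordering (first Fast cell consumes the input, the Slow cell is driven by it, the second Fast cell receives the Slow state, remaining Fast cells update sequentially, following Mujika et al.) reflects the architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def rnn_cell(W):
    """A simple tanh RNN cell standing in for an LSTM update f(h, x)."""
    return lambda h, x: np.tanh(W @ np.concatenate([h, x]))

d, k = 8, 4  # toy hidden size; k = 4 Fast cells as in the paper's setup
fast = [rnn_cell(rng.standard_normal((d, 2 * d)) * 0.1) for _ in range(k)]
slow = rnn_cell(rng.standard_normal((d, 2 * d)) * 0.1)

def fs_step(h_fast_prev, h_slow_prev, x):
    """One Fast-Slow timestep: F1 consumes the input and the previous
    last-Fast state, S consumes F1's state, F2 receives S's state, and
    the remaining Fast cells update sequentially."""
    h = fast[0](h_fast_prev, x)        # F1
    h_slow = slow(h_slow_prev, h)      # S, driven by F1
    h = fast[1](h, h_slow)             # F2, receives the Slow state
    for cell in fast[2:]:              # F3..Fk
        h = cell(h, np.zeros(d))
    return h, h_slow

h_f, h_s = np.zeros(d), np.zeros(d)
for _ in range(10):                    # run over 10 image-feature steps
    x = rng.standard_normal(d)
    h_f, h_s = fs_step(h_f, h_s, x)
# A final affine layer on h_f would give the sound-class prediction.
```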

III-B2 Approach 2: Frame-Relation Network

We aim to capture the detailed transformations and actions of the objects present in the movie scenes more accurately, with less computation time. To teach the model the complex behaviors of a visual scene from a small number of video frames, we apply an interpretable network combining a CNN and a Multi-Scale Temporal Relation Network (TRN), proposed in [zhou2018temporal]. The network compiles the temporal relations among the frames at different time scales by using the following equation:

MT_Q(V) = T_2(V) + T_3(V) + … + T_Q(V)

Here, T_2, T_3, …, T_Q are the relation functions that capture the temporal relations between 2, 3, …, Q ordered frames of the video V, respectively. We compute the relation over time among up to 8 frames by setting Q equal to 8. If the video contains n selected frames f_1, f_2, …, f_n, we can define the relation functions as follows:

T_2(V) = h_φ( Σ_{i<j} g_θ(f_i, f_j) )
T_3(V) = h'_φ( Σ_{i<j<k} g'_θ(f_i, f_j, f_k) )

where f_i, f_j, and f_k denote the features of the i-th, j-th, and k-th frames of the video. The h_φ and g_θ functions are derived from Multilayer Perceptron (MLP) layers that combine the features of the uniformly sorted video frames, and both are unique for every relation function T_q. The same ResNet-50 pretrained model (discussed in Approach 1) is used as the base CNN to extract visual features from the videos. Next, we fuse several randomly sampled features together to get their frame relations and then feed them to the TRN modules to obtain the sound class. The multi-scale TRN architecture is illustrated for 2-frame, 3-frame, and 4-frame relations in Fig. 9.

Figure 9: Multi-Scale Temporal Relation Network (TRN): the network is learning temporal dependency among representative frames of a visual scene.
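The multi-scale relation sum can be sketched as follows. The 12-way output matches the paper's 12 sound classes; the frame-feature size, MLP widths, random weights, and choice of summing over all ordered tuples are toy assumptions used only to show the T_2 + T_3 + T_4 structure.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
D, C = 16, 12  # toy frame-feature size; C = 12 sound classes (paper)

def mlp(w1, w2):
    """Two-layer ReLU MLP as a stand-in for g_theta / h_phi."""
    return lambda v: w2 @ np.maximum(w1 @ v, 0.0)

def make_relation(order):
    """Build T_q: sum g over all q-frame tuples (in temporal order),
    then apply h. Each order q gets its own g and h, mirroring the
    multi-scale TRN."""
    g = mlp(rng.standard_normal((32, order * D)) * 0.1,
            rng.standard_normal((32, 32)) * 0.1)
    h = mlp(rng.standard_normal((32, 32)) * 0.1,
            rng.standard_normal((C, 32)) * 0.1)
    def T_q(feats):
        s = sum(g(np.concatenate([feats[i] for i in idx]))
                for idx in combinations(range(len(feats)), order))
        return h(s)
    return T_q

relations = [make_relation(q) for q in range(2, 5)]  # T_2, T_3, T_4
feats = [rng.standard_normal(D) for _ in range(6)]   # 6 sampled frames
scores = sum(T(feats) for T in relations)            # MT_Q(V)
pred_class = int(np.argmax(scores))                  # sound-class index
```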

III-C Sound Synthesis

We apply the same sound synthesis method to both of the sound class prediction approaches. We take the average of all spectrograms of each sound class in our training set and combine it with the predicted sound class matrix computed from the frame-sequence and the frame-relation network separately. For the predicted sound class, we obtain the predicted sound feature ŝ_t from the average spectrogram of that class. We reduce the difference between the actual (s_t) and predicted (ŝ_t) sound features at every timestep by minimizing a robust loss computed on the square roots of the sound features:

L = Σ_t ‖ √(s_t) − √(ŝ_t) ‖²
To synthesize sound from the predicted sound feature vectors ŝ_t, we perform the Inverse Short-Time Fourier Transform (ISTFT) [welch1967use] with a Hanning window because of its lower computational complexity compared to a parametric synthesis approach [mcdermott2011sound, slaney1995pattern] or an example-based synthesis method [owens2016visually]. The ISTFT can regenerate the time-domain signal exactly from a given STFT without noise. We compute the ISTFT by inverting the STFT using the popular overlap-add (OLA) method [ouelha2017efficient] for better denoising efficiency and synthesis quality:

x[n] = ( Σ_m w[n − mH] x_m[n − mH] ) / ( Σ_m w²[n − mH] )

where x_m is the inverse Fourier transform of the m-th STFT frame, w is the analysis window, and H is the hop size. For the phase reconstruction from the spectrogram, we apply the iterative Griffin-Lim algorithm [griffin1984signal] for 16 iterations while performing the ISTFT. The proposed automatic Foley generation model is summarized in Algorithm 1:

Input: Silent video frames (V) and training audio tracks (A).
Output: Generated audio tracks (Â).

1:  for each training audio track in A do
2:     compute spectrogram sound features (Section III-A)
3:  end for
4:  if using the Frame-Sequence Network (Model 1) then
5:     for each video in V do
6:        interpolate frames, build space-time images, extract CNN features
7:        predict the sound class with the FS-LSTM
8:     end for
9:  end if
10: if using the Frame-Relation Network (Model 2) then
11:    for each video in V do
12:       sample representative frames and extract CNN features
13:       predict the sound class with the multi-scale TRN
14:    end for
15: end if
16: synthesize Â from the predicted class's average spectrogram via Griffin-Lim ISTFT
Algorithm 1 Automatic Foley Generation
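The Griffin-Lim phase reconstruction step can be sketched with SciPy's STFT/ISTFT pair. The Hann window and 16 iterations follow the text; nperseg=256 and the random phase initialization are assumptions, and the paper's exact STFT parameters may differ.

```python
import numpy as np
from scipy import signal

def griffin_lim(mag, fs=44100, nperseg=256, n_iter=16):
    """Iterative Griffin-Lim: alternate between the time domain
    (overlap-add ISTFT) and the STFT domain, keeping the target
    magnitude and the latest phase estimate on each round."""
    rng = np.random.default_rng(0)
    phase = np.exp(2j * np.pi * rng.random(mag.shape))
    for _ in range(n_iter):
        _, x = signal.istft(mag * phase, fs=fs, window="hann",
                            nperseg=nperseg)
        _, _, Z = signal.stft(x, fs=fs, window="hann", nperseg=nperseg)
        T = min(Z.shape[1], mag.shape[1])
        phase = np.ones_like(phase)
        phase[:, :T] = np.exp(1j * np.angle(Z[:, :T]))
    _, x = signal.istft(mag * phase, fs=fs, window="hann", nperseg=nperseg)
    return x

# Round trip on a short tone: analyze, discard phase, resynthesize.
fs = 44100
tone = np.sin(2 * np.pi * 440 * np.arange(fs // 4) / fs)
_, _, Zt = signal.stft(tone, fs=fs, window="hann", nperseg=256)
rebuilt = griffin_lim(np.abs(Zt), fs=fs)
```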

IV Experimental Results

For our model evaluation, we create a video dataset particularly focused on Foley tracks used in movies. We give a detailed description of our dataset in subsection A. Next, we introduce the model parameters and implementation details in subsection B. Based on our dataset and models, we evaluate our generated sound both qualitatively and quantitatively, as described in subsections C and D respectively. Later, in subsection E, we present a detailed ablation study of our methods and parameters. Finally, we discuss the results of four human evaluation survey questions on our synthesized sound in subsection F.

IV-A The Automatic Foley Dataset (AFD)

Our interest is to train our Foley generation network with the exact natural sound produced in a particular movie scene. To do so, we need to train the system explicitly with the specific categories of audio-visual scenes that are closely related to manually generated Foley tracks for silent movie clips. Several video datasets exist for the sound generation task (e.g. GHD [owens2016visually], VEGAS [zhou2017visual], AudioSet [gemmeke2017audio], UCF101 [soomro2012ucf101]). However, none of these datasets serves our requirements, for various reasons. For example, the GHD dataset consists of videos of human actions (such as hitting and scratching) that are mostly focused on a material recognition task. On the other hand, large numbers of videos are accumulated in datasets like AudioSet, VEGAS, and UCF101, but most of them contain background sounds, human speech, or environmental noise, whereas Foley tracks are generated and recorded in a noise-controlled studio environment. We therefore choose to create a dataset entirely focused on Foley sound tracks of movie events. We select 12 different categories of videos (that are frequently used for Foley generation) associated with clear sound tracks, combining both indoor and outdoor scenes. We record our own clips for sound classes where this is feasible (e.g. cutting, footsteps, car passing, clock sound, breaking), using a video camera with a shotgun microphone system, and then apply a denoising algorithm, especially for the outdoor recordings. For other popular Foley sound categories of movie clips, such as gunshots, horse running, waterfall, fire, rain, and thunder, we use YouTube and collect the clearest audio-video clips available with the least background noise. Altogether, our Automatic Foley Dataset (AFD) contains a total of 1000 videos from 12 different classes. The average duration of each video is 5 seconds.
The twelve video classes and their associated data statistics are shown in Fig. 10 and 11 respectively.

Figure 10: The Automatic Foley Dataset (AFD) Video Classes: we chose 12 popular movie events where Foley effects are generally added in movie post-production studio.
Figure 11: The Automatic Foley Dataset (AFD) Distribution: we show the data distribution percentages of 12 video classes in the pie chart.

IV-B Implementation Details

We conduct the training separately with our two approaches, using 80% of the videos, representing all 12 classes in the dataset. Our audio sampling rate is 44 kHz and our video frame rate is 190 FPS (after interpolation). For video interpolation, we use the motion-interpolation (minterpolate) filter of the ffmpeg video editing package, invoked from Python. In the first sound classification approach, we obtain two 2048-D image feature vectors from the output of the final pooling layer of the ResNet-50 network, for the space-time image and the first image frame of the training videos. We concatenate these vectors to obtain the final 4096-D image feature vector that we pass to the FS-LSTM network as input. We apply 4 LSTM cells in the Fast layer of the FS-LSTM because of its optimal performance in the classification task [mujika2017fast]. At each LSTM cell, we set an initial value of 1 for the forget bias. We use orthogonal matrices for all weight matrices. We apply layer normalization [ba2016layer] separately to each gate. We use dropout [srivastava2014dropout] to regularize the recurrent network. At every time step of the FS-LSTM, we employ Zoneout [krueger2016zoneout] in the recurrent connections and a diverse dropout mask in the non-recurrent connections [zaremba2014recurrent]. During training, we use minibatch gradient descent with the Adam optimizer [kingma2014adam]. Our minibatch size and learning rate are 128 and 0.001, respectively. In the second sound classification approach, we keep the training hyper-parameters for the CNN architecture the same as earlier. In the TRN model, there are two MLP layers (256 units each) for g_θ and a single MLP layer (12 units) for h_φ. Training for 100 epochs completes in less than 18 hours on an Nvidia RTX 2080 Ti GPU. To test our models, we choose a random 20% of the videos across all 12 categories. To test the TRN model for early action recognition, we select the first 25% of frames in each video.

IV-C Qualitative Evaluation

(a) Running horse (b) Typing (c) Ticking clock (d) Waterfall
Figure 12: The waveform and spectral representation pair comparison between the original and generated sound.

IV-C1 Waveform and Spectrogram Analysis

For qualitative assessment, we present the synthesized sound waveforms and the corresponding spectrograms obtained from our two automatic Foley architectures, alongside the same representations of the original sound tracks, in Fig. 12. For some sound categories (e.g. clock, rain, horse), the waveform and spectrogram patterns of our generated sound are very similar to those of the original sound. However, results from model 1 match the ground truth patterns more precisely than those of model 2. We observe good alignment of the synthesized and original sound for all sound categories that are less sensitive to meticulous timing (e.g. clock, fire, rain, waterfall). On the other hand, visual scenes containing random action variations over time (e.g. breaking things, cutting in a kitchen, typing, gunshots in action sequences, lightning and thunder in the sky) show a few abrupt peaks and some misalignment in the generated waveform. In video clips where sound sources move over distance (e.g. car racing, horse running, human footsteps), the sound amplitudes also vary with distance.

IV-C2 Sound Quality Metric Analysis

Generally, the quality of a sound is assessed by how well it conforms to the listener's expectations. We therefore evaluate how strongly our synthesized sound waves correlate with their ground-truth tracks. For each sound category, we calculate the average normalized cross-correlation between our generated and original audio signals to quantify their similarity. We present the correlation values for both models in Table I. All cross-correlation values are positive (no lower than 0.58), indicating the expected correspondence between the ground truth and our results. Apart from the four classes most sensitive to temporal action (breaking, cutting, footsteps, gunshots), Model 1 provides higher correlation values than Model 2.

Avg. Normalized Correlation Value
Sound Class Model 1 Model 2
Break 0.76 0.93
Car 0.68 0.65
Clock 0.92 0.73
Cutting 0.58 0.89
Fire 0.88 0.72
Footstep 0.68 0.95
Gunshot 0.65 0.82
Horse 0.87 0.63
Rain 0.90 0.81
Thunder 0.71 0.58
Typing 0.69 0.60
Waterfall 0.86 0.69
Average 0.77 0.75
Table I: Sound Quality Metric Analysis Results: Average normalized cross-correlation between the original and generated audio signals for Models 1 and 2 across all sound classes
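The similarity score of Table I can be computed as a normalized cross-correlation. A minimal numpy sketch under one plausible definition (peak of the mean-removed cross-correlation, normalized so that a signal against itself scores 1; the paper does not pin down the exact normalization):

```python
import numpy as np

def normalized_cross_correlation(x: np.ndarray, y: np.ndarray) -> float:
    """Peak of the cross-correlation of x and y, scaled so that a
    signal correlated with itself scores exactly 1.0."""
    x = x - x.mean()
    y = y - y.mean()
    denom = np.sqrt(np.sum(x ** 2) * np.sum(y ** 2))
    if denom == 0.0:
        return 0.0                       # silent clips carry no information
    corr = np.correlate(x, y, mode="full") / denom
    return float(np.max(corr))

# Sanity check on a toy tone: a signal against itself yields 1.0.
t = np.linspace(0.0, 1.0, 1000)
sine = np.sin(2 * np.pi * 5 * t)
print(normalized_cross_correlation(sine, sine))  # ≈ 1.0
```

Averaging this score over the test clips of one class gives one cell of Table I.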

IV-C3 Sound Retrieval Experiment

The purpose of this qualitative task is to evaluate whether semantic information about the sound class is present in our synthesized sound. For this evaluation, we train a classifier model on the spectrograms of the original sound tracks from the AutoFoley training dataset. We use the ResNet-50 [He_2016_CVPR] CNN architecture for our classifier network because of its excellent results on audio classification tasks in earlier work [hershey2017cnn]. The major benefit of 2D spectral representations is their ability to summarize high-dimensional signals into a compact spectrogram. After training, we feed the spectrograms of the synthesized sound clips obtained from both models to the trained classifier for testing, and average the sound class prediction accuracy over all categories. The complete sound retrieval experiment is shown in Fig. 13. To gauge the classifier's own performance and enable a fairer comparison, we also calculate its average prediction accuracy on the original spectrograms from our test dataset, which is 78.32%. Since prior works have not considered the movie sound effects domain, it is difficult to compare with them directly. We present the prediction accuracy of the most related sound generation models and of our proposed models on the same retrieval task in Tables II and III, respectively. For the sounds generated by both AutoFoley models, the classifier achieves average prediction accuracies above 63% in this retrieval experiment.

Figure 13: Sound Retrieval Experiment Model.
Existing Model Avg. Acc.
Owens et al.[owens2016visually] (ResNet + spectrogram + GHD dataset) 22.70%
POCAN [zhangvisually] (ResNet + spectrogram + GHD dataset) 36.32%
Zhou et al. [zhou2017visual] (Flow method + VEGAS dataset) 45.47%
Table II: Top 1 sound class prediction accuracy of existing models
Proposed Model Avg. Accuracy
AutoFoley (Frame-Sequence Network) 65.79%
AutoFoley (Frame-Relation Network) 63.40%
AutoFoley (Real sound tracks) 78.32%
Table III: Top 1 sound class prediction accuracy of proposed models with AutoFoley Dataset
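The retrieval metric itself reduces to top-1 classification accuracy over the classifier's outputs. A minimal sketch with toy logits (the 12-class setup matches the experiment above, but the numbers are purely illustrative):

```python
import numpy as np

def top1_accuracy(logits: np.ndarray, labels: np.ndarray) -> float:
    """Fraction of samples whose highest-scoring class matches the label."""
    return float(np.mean(np.argmax(logits, axis=1) == labels))

# Toy example: 4 spectrogram "samples" scored over 12 sound classes.
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 12))
labels = np.argmax(logits, axis=1)      # perfect predictions
assert top1_accuracy(logits, labels) == 1.0
labels_wrong = (labels + 1) % 12        # every prediction off by one class
assert top1_accuracy(logits, labels_wrong) == 0.0
```

In the real experiment, `logits` are the trained ResNet-50 classifier's outputs on spectrograms of the generated clips, and the per-class accuracies are averaged to give Tables II and III.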

IV-D Quantitative Evaluation

Obtaining an absolute numerical evaluation of generated sound waveforms is difficult. In this section we measure how coherently our models predict sound from the given visual inputs, and we provide the loss and accuracy values calculated while training and testing our models.

IV-D1 Sound Class Prediction

To visualize the sound class prediction accuracy from video frames, we present normalized confusion matrices for Models 1 and 2 in Fig. 14. It is worth noting that Model 1 correctly classifies a majority of the audio samples for test videos of all categories except breaking, the class with the fewest training samples. Our second model, in contrast, successfully identifies the breaking class from test videos, as the TRN network is trained to recognize the action in a visual scene from the smallest number of video frames. However, it misclassifies some audio samples in the rain and waterfall cases, as well as in the cutting and footstep test videos.

Figure 14: Sound class prediction results: normalized confusion matrices of (a) Model 1 and (b) Model 2.
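The row-normalized confusion matrices of Fig. 14 can be produced as follows; a small numpy sketch with toy labels (not our test data):

```python
import numpy as np

def normalized_confusion_matrix(y_true, y_pred, n_classes: int) -> np.ndarray:
    """Row-normalized confusion matrix: entry [i, j] is the fraction of
    class-i samples that the model predicted as class j."""
    cm = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    row_sums = cm.sum(axis=1, keepdims=True)
    # Divide each row by its total; leave all-zero rows (unseen classes) at 0.
    return np.divide(cm, row_sums, out=np.zeros_like(cm), where=row_sums > 0)

# Toy 3-class example: class 2 is confused with class 0 half the time.
y_true = [0, 0, 1, 2, 2]
y_pred = [0, 0, 1, 0, 2]
cm = normalized_confusion_matrix(y_true, y_pred, 3)
print(cm[2])  # → [0.5 0.  0.5]
```

With `n_classes = 12` and the true/predicted sound classes of the test videos, this yields the matrices plotted in Fig. 14.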

IV-D2 Loss and Accuracy Calculation

We calculate the average log loss and accuracy during training and testing of our models and display the results in Table IV. For both models, training losses are lower than test losses. Model 1 gives a smaller log loss than Model 2, and correspondingly higher average accuracy in both training and testing.

Average Train Test
Model 1 Model 2 Model 1 Model 2
Log Loss 0.002 0.035 0.166 0.194
Accuracy 0.989 0.962 0.834 0.806
Table IV: Loss and Accuracy Calculation Result
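The log-loss figures in Table IV are average negative log-likelihoods of the true class; a minimal sketch of the computation on toy predictions (not our data):

```python
import numpy as np

def average_log_loss(probs: np.ndarray, labels: np.ndarray) -> float:
    """Mean negative log-probability assigned to the true class."""
    eps = 1e-12                                  # guard against log(0)
    p_true = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(np.clip(p_true, eps, 1.0))))

# Two samples: a confident correct prediction and a 50/50 guess.
probs = np.array([[0.9, 0.1],
                  [0.5, 0.5]])
labels = np.array([0, 1])
loss = average_log_loss(probs, labels)   # (-ln 0.9 - ln 0.5) / 2 ≈ 0.399
```

Accuracy in the same table is simply the fraction of samples whose arg-max class matches the label.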

IV-E Ablation Study

In this section, we carry out supplementary experiments, described in the following subsections, with the objective of validating the AutoFoley models.

Model Description Average Accuracy (%)
Ablation Model 1 Direct raw image frame (no SP) +VGG19 + simple LSTM 43.18
Ablation Model 2 Direct raw image (no SP) + ResNet-50 + simple LSTM 44.52
Ablation Model 3 SP+ VGG19 + simple LSTM 56.04
Ablation Model 4 SP + VGG19 + FS-LSTM 63.88
Frame Sequence Network SP + ResNet50 + FS-LSTM 65.79
Table V: Comparison of the average accuracy (%) among the Ablation Models and our proposed method 1 (Frame Sequence Network) on AutoFoley Dataset.
Model Description Average Accuracy (%)
Ablation Model 1 TRN (for 4 frame relations) + VGG19 (base CNN) 41.64
Ablation Model 2 TRN (for 4 frame relations) + ResNet-50 (base CNN) 42.01
Ablation Model 3 TRN (for 8 frame relations) +VGG19 (base CNN) 60.98
Ablation Model 4 TRN (for 16 frame relations) +ResNet-50 (base CNN) 64.27
Frame Relation Network TRN (for 8 frame relations) +ResNet-50 (base CNN) 63.40
Table VI: Comparison of the average accuracy (%) among the Ablation Models and our proposed method 2 (Frame Relation Network) on AutoFoley Dataset.
Model Description Average Accuracy (%)
Ablation Model 1 Frame Sequence Network using replicated video frame instead of Interpolation 47.22
Ablation Model 2 Frame Relation Network using replicated video frame instead of Interpolation 43.83
Proposed Model 1 Frame Sequence Network using Interpolation instead of replicated video frame 65.79
Proposed Model 2 Frame Relation Network using Interpolation instead of replicated video frame 63.40
Table VII: Effect of the interpolation technique: performance of the Frame-Sequence and Frame-Relation Networks with frame replication (by a factor k) instead of interpolation, compared against the accuracy of the proposed systems using the interpolation method

IV-E1 Ablation Analysis with Frame-Sequence Network

The proposed Frame-Sequence Network (described in Section III) contains three main steps: 1) preprocessing of video frames by generating space-time (SP) images, 2) image feature extraction with a pretrained ResNet-50 CNN model, and 3) sound class prediction using a Fast-Slow LSTM model. To understand the significance of each component of this prediction network, we build the following alternative cascaded ablation models, train them on the AutoFoley dataset, compute the prediction accuracy in each case, and compare it with the result obtained from our proposed Model 1 (Frame-Sequence Network):

  • Ablation Model 1: Feed the network with raw image frames directly, rather than creating space-time images. For classification we use VGG19 and a simple LSTM network as our CNN-RNN model.

  • Ablation Model 2: Feed raw image frames directly, without space-time images. Here we use ResNet-50 as the CNN model with a simple LSTM network.

  • Ablation Model 3: Perform the classification using VGG19 and a simple LSTM trained on space-time images rather than raw images.

  • Ablation Model 4: Replace the simple LSTM network with FS-LSTM. All other steps are the same as in Ablation Model 3.

Table V shows the performance comparison between these ablation models and our proposed Frame-Sequence model. There is a noticeable accuracy degradation for the models that feed raw images directly into the CNN, because they often miss useful color and motion features spanning consecutive video frames. Using a simple LSTM reduces computation time but yields higher class prediction errors. Although the accuracy of the last ablation model (SP, VGG19, FS-LSTM) is close to that of the proposed Frame-Sequence Network, our model outperforms all the cascaded alternatives.

IV-E2 Ablation Analysis with Frame-Relation Network

We now assess the effect of the number of frame relations used in the multi-scale TRN, as well as of the base CNN, on our second proposed model (Frame-Relation Network). To do so, we train and test the following ablation models and compare their performance.

  • Ablation Model 1: Apply the multi-scale TRN, accumulating up to 4-frame relations of the video, using VGG19 as the base CNN.

  • Ablation Model 2: Replace the VGG19 base CNN with ResNet-50; the rest of the algorithm is the same as in the previous ablation model.

  • Ablation Model 3: Accumulate up to 8-frame relations in the multi-scale TRN, again using VGG19 as the base CNN.

  • Ablation Model 4: Raise the number of frame relations to 16 in the multi-scale TRN and apply ResNet-50 as the CNN.

  • Proposed Model 2: In our proposed Frame-Relation Network (as described in earlier sections) we apply a multi-scale temporal relation network computing relations over time across up to 8 video frames, using ResNet-50 as the base CNN.

The detailed comparative results are shown in Table VI. We observe lower accuracies for the 4-frame-relation TRN, and accuracy improves as the number of frame relations increases. However, raising it from 8 to 16 incurs a noticeable increase in computation time for only a marginal accuracy improvement, so in our proposed network we do not go beyond 8 frame relations. We also find that ResNet-50 outperforms VGG19 as the base CNN.
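The multi-scale accumulation these ablations vary can be sketched in a few lines. The toy numpy version below only mirrors the structure — sampled d-frame tuples, a relation function, and a classifier summed over scales up to 8 — and stands in for the per-scale MLPs and CNN features of the real network:

```python
import numpy as np
from itertools import combinations

def multiscale_trn(frames: np.ndarray, g, h, max_scale: int = 8,
                   samples_per_scale: int = 3, seed: int = 0) -> np.ndarray:
    """Toy multi-scale TRN: sum relation terms over sampled d-frame
    tuples for d = 2 .. max_scale. `g` fuses one ordered tuple of frame
    features into a relation vector; `h` maps it to per-class scores."""
    rng = np.random.default_rng(seed)
    n = len(frames)
    scores = None
    for d in range(2, min(max_scale, n) + 1):
        tuples = list(combinations(range(n), d))       # ordered d-frame tuples
        picks = rng.choice(len(tuples),
                           min(samples_per_scale, len(tuples)), replace=False)
        for k in picks:
            rel = g(frames[list(tuples[k])])           # fuse one tuple
            term = h(rel)
            scores = term if scores is None else scores + term
    return scores

# Toy setting: 10 frames of 4-D features, 12 sound classes.
rng = np.random.default_rng(1)
frames = rng.normal(size=(10, 4))
W = rng.normal(size=(4, 12))
g = lambda tup: tup.mean(axis=0)   # stand-in for the tuple-relation MLP
h = lambda rel: rel @ W            # stand-in for the 12-unit classifier MLP
print(multiscale_trn(frames, g, h).shape)  # → (12,)
```

In the real network, each scale d has its own learned MLPs and operates on ResNet-50 frame features rather than random vectors.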

IV-E3 Effect of Interpolation Technique

To keep the visual and sound sequences the same length while mapping between them, earlier works [owens2016visually, zhou2017visual] duplicated video frames. In the flow-based method [zhou2017visual], pre-computed optical flow between video frames is fed to temporal ConvNets to explicitly capture the motion signal. However, adding an optical-flow-based deep feature to the visual encoder requires global optimization and is therefore difficult to implement. Flow-based methods also tend to produce visual artifacts under brightness variations, as they are sensitive to parameter settings. In contrast to optical flow, and to replicating the same video frame multiple times to map it to a single sound sample, we use an interpolation technique that leverages intermediate video frames to obtain smooth motion information for each video. Interpolation can efficiently handle the large data volumes of high-frame-rate movie videos, for which standard optical flow techniques often become inefficient on such densely sampled, large-scale data. We apply video interpolation before feeding the input frames to the deep neural networks in order to remove the mismatch between the video frame rate and the audio sampling rate when mapping audio to video features. This pre-processing also allows precise capture of action features in fast-moving video frames, removing the limitations of the replication method used in earlier works [zhou2017visual, owens2016visually]. In Table VII, we compare the overall accuracies obtained after applying these two alternative pre-processing methods to the AutoFoley dataset.

IV-E4 Effect of Sound Generation Tool

In prior work [zhou2017visual], a three-tier SampleRNN is used as the sound generator, creating audio samples one at a time at a sampling rate of 16 kHz. As a result, this auto-regressive model suffers from prohibitively long training and inference times. Another drawback of this synthesizer is that it reduces the resolution of the predictions in order to lessen the computational cost. Hence, for sound generation we use the Inverse Short-Time Fourier Transform (ISTFT), which is comparatively simple and computationally cheap. The generated sound wave is calculated as the ISTFT of the predicted spectrogram, which is the sum of the predicted regression parameters and a base sample corresponding to the predicted sound class.
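The generation step is a standard inverse STFT. In the sketch below, using `scipy.signal`, the "predicted" spectrogram is simulated by a forward STFT of a test tone; in the actual pipeline it would come from the regression output summed with the base sample of the predicted class:

```python
import numpy as np
from scipy.signal import stft, istft

fs = 44_000                              # audio sampling rate used in this work
t = np.arange(fs) / fs
wave = 0.5 * np.sin(2 * np.pi * 440 * t)  # stand-in for a predicted sound

# Forward STFT stands in for the network's predicted (complex) spectrogram.
_, _, Z = stft(wave, fs=fs, nperseg=1024)

# ISTFT turns the spectrogram back into a time-domain waveform in one pass,
# with no auto-regressive sampling loop.
_, recovered = istft(Z, fs=fs, nperseg=1024)

print(np.allclose(wave, recovered[:len(wave)], atol=1e-8))  # → True
```

The round trip is essentially lossless because the default Hann window with 50% overlap satisfies the constant-overlap-add condition; this one-shot reconstruction is what makes ISTFT far cheaper than sample-by-sample generation.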

IV-F Human Evaluation

To evaluate the quality of our synthesized sounds, we conduct a survey among local college students. In the survey, students are presented with a video and two audio samples, the original sound and the synthesized sound, and are asked to select the sample they prefer for each of four questions:

  1. Select the original (more realistic) sample.

  2. Select the most suitable sample (most reasonable audio initiated from the video clip).

  3. Select the sample with minimum noise.

  4. Select the most synchronized sample.

We assess the performance of the produced sound tracks for each category. The first query compares both of our approaches against the ground truth and gauges how realistic our synthesized Foley tracks are. For each AutoFoley category, one survey option contains the video with its original sound track and the other contains the same video with the generated sound track, and we record how often respondents mistake the generated sound for the original. In this survey, 73.71% of respondents chose the synthesized sound over the original with our first model, and 65.95% with the second model.

The remaining three questions compare the performance of the two models. In these queries, for every class we pair the same video with the synthesized sound tracks from Models 1 and 2 respectively, and evaluate which method respondents prefer after observing the audio-video pairs. The survey results show that Model 2 (Frame Relation Network + ISTFT) outperforms Model 1 (Frame Sequence Network + ISTFT) for visual scenes with random action changes (the breaking, cutting, footstep, and gunshot classes). For the remaining categories, respondents chose the Model 1 sound tracks over those of Model 2. The detailed human evaluation results (selection percentages for the survey queries) for each class are presented in Tables VIII and IX.

Class Query 1 Query 2
Method 1 Method 2 Method 1 Method 2
Break 52.40% 56.10% 32.30% 67.70%
Car 71.53% 65.40% 55.76% 44.24%
Clock 90.91% 73.80% 70.90% 29.10%
Cutting 50.17% 45.35% 62.89% 37.11%
Fire 85.43% 75.40% 57.70% 42.30%
Footstep 61.72% 50.57% 72.13% 27.87%
Gunshot 68.33% 61.84% 64.60% 35.40%
Horse 89.38% 78.20% 53.80% 46.20%
Rain 88.62% 75.80% 50.00% 50.00%
Thunder 76.25% 72.17% 59.35% 40.65%
Typing 64.27% 62.39% 66.20% 33.80%
Waterfall 85.55% 74.40% 66.78% 33.22%
Average 73.71% 65.95% 59.37% 40.63%
Table VIII: Human Evaluation Results: Selection percentage of each sound category for the first and second human survey questions
Class Query 3 Query 4
Method 1 Method 2 Method 1 Method 2
Break 29.17% 70.83% 29.00% 71.00%
Car 36.02% 63.08% 57.32% 42.68%
Clock 57.69% 42.30% 61.50% 38.50%
Cutting 62.50% 37.50% 55.41% 44.59%
Fire 34.61% 65.38% 53.80% 46.20%
Footstep 65.37% 34.62% 68.55% 31.45%
Gunshot 75.81% 24.19% 74.60% 25.40%
Horse 36.18% 63.82% 53.80% 46.20%
Rain 33.33% 66.67% 55.80% 44.20%
Thunder 76.92% 23.08% 51.62% 48.38%
Typing 67.24% 32.76% 68.50% 31.50%
Waterfall 51.85% 48.15% 52.43% 47.57%
Average 52.23% 47.70% 52.86% 43.14%
Table IX: Human Evaluation Results: Selection percentage of each sound category for the third and fourth human survey questions

V Conclusion

In this paper, we address the novel problem of adding Foley sound tracks to movie video clips using an efficient deep learning solution. We have proposed two deep neural models, the Frame Sequence and Frame Relation Networks, paired with a low-complexity sound synthesis approach. These models are trained to predict sound features from visual inputs alone and to synthesize the required sound track. We have also introduced a new dataset for this task containing audio-video pairs of the most popular Foley scene categories. Our models learn the intricate transition relations and temporal dependencies of the visual inputs efficiently from a small number of video frames. Although prior works do not directly match ours, we conduct extensive qualitative, numerical, and ablation analyses to demonstrate the usefulness of the proposed models and tools. Both prediction models achieve higher accuracy in the sound retrieval experiment (over 63%) than state-of-the-art research on sound generation from visuals. Lastly, our human survey shows that more than 73% of respondents judged our generated sound to be the original.

Since adding automatic Foley to silent video with deep learning is an interesting and novel task, the next steps in this research are to expand the training dataset, allowing the generated sound to more closely approximate the original. The computational efficiency of the proposed model will also be improved, with the goal of processing live video in real time. The time synchronization problem will likewise be examined and optimized in future research.