An Initial Exploration: Learning to Generate Realistic Audio for Silent Video

08/23/2023
by   Matthew Martel, et al.
0

Generating realistic audio effects for movies and other media is a challenging task that is accomplished today primarily through physical techniques known as Foley art. Foley artists create sounds with common objects (e.g., boxing gloves, broken glass) in time with video as it is playing to generate captivating audio tracks. In this work, we aim to develop a deep-learning based framework that does much the same - observes video in it's natural sequence and generates realistic audio to accompany it. Notably, we have reason to believe this is achievable due to advancements in realistic audio generation techniques conditioned on other inputs (e.g., Wavenet conditioned on text). We explore several different model architectures to accomplish this task that process both previously-generated audio and video context. These include deep-fusion CNN, dilated Wavenet CNN with visual context, and transformer-based architectures. We find that the transformer-based architecture yields the most promising results, matching low-frequencies to visual patterns effectively, but failing to generate more nuanced waveforms.

READ FULL TEXT

page 2

page 4

page 5

page 6

research
09/19/2023

FoleyGen: Visually-Guided Audio Generation

Recent advancements in audio generation have been spurred by the evoluti...
research
03/29/2023

Sounding Video Generator: A Unified Framework for Text-guided Sounding Video Generation

As a combination of visual and audio signals, video is inherently multi-...
research
07/20/2021

FoleyGAN: Visually Guided Generative Adversarial Network-Based Synchronous Sound Generation in Silent Videos

Deep learning based visual to sound generation systems essentially need ...
research
07/23/2020

Sound2Sight: Generating Visual Dynamics from Sound and Context

Learning associations across modalities is critical for robust multimoda...
research
04/01/2021

Collaborative Learning to Generate Audio-Video Jointly

There have been a number of techniques that have demonstrated the genera...
research
04/28/2018

A Bimodal Learning Approach to Assist Multi-sensory Effects Synchronization

In mulsemedia applications, traditional media content (text, image, audi...
research
03/15/2023

Blind Estimation of Audio Processing Graph

Musicians and audio engineers sculpt and transform their sounds by conne...

Please sign up or login with your details

Forgot password? Click here to reset