Visual to Sound: Generating Natural Sound for Videos in the Wild

12/04/2017
by   Yipin Zhou, et al.

As two of the five traditional human senses (sight, hearing, taste, smell, and touch), vision and sound are basic sources through which humans understand the world. Often correlated during natural events, these two modalities combine to jointly affect human perception. In this paper, we pose the task of generating sound given visual input. Such capabilities could help enable applications in virtual reality (generating sound for virtual scenes automatically) or provide additional accessibility to images or videos for people with visual impairments. As a first step in this direction, we apply learning-based methods to generate raw waveform samples given input video frames. We evaluate our models on a dataset of videos containing a variety of sounds (such as ambient sounds and sounds from people/animals). Our experiments show that the generated sounds are fairly realistic and have good temporal synchronization with the visual inputs.
