2.5D Visual Sound

12/11/2018
by Ruohan Gao, et al.

Binaural audio provides a listener with a 3D sound sensation, allowing a rich perceptual experience of the scene. However, binaural recordings are scarcely available and require nontrivial expertise and equipment to obtain. We propose to convert common monaural audio into binaural audio by leveraging video. The key idea is that visual frames reveal significant spatial cues that, while absent from the accompanying single-channel audio, are strongly linked to it. Our multi-modal approach recovers this link from unlabeled video. We devise a deep convolutional neural network that learns to decode the monaural (single-channel) soundtrack into its binaural counterpart by injecting visual information about object and scene configurations. We call the resulting output 2.5D visual sound---the visual stream helps "lift" the flat single-channel audio into spatialized sound. In addition to sound generation, we show that the self-supervised representation learned by our network benefits audio-visual source separation. Our video results: http://vision.cs.utexas.edu/projects/2.5D_visual_sound/
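To make the mono-to-binaural idea concrete, here is a minimal NumPy sketch of one common formulation for this task: a network predicts a mask over the mono spectrogram that yields the *difference* of the two channels, from which the left and right channels are recovered. The function name and the mask-based parameterization are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def lift_mono_to_binaural(mono_spec, pred_mask):
    """Recover left/right spectrograms from a mono spectrogram.

    mono_spec : complex spectrogram of the mono mixture,
                i.e. (left + right) of the target binaural pair.
    pred_mask : mask predicted by a (hypothetical) audio-visual
                network; applied to the mono spectrogram, it
                estimates the difference signal (left - right).
    """
    # Estimated difference-of-channels spectrogram.
    diff_spec = pred_mask * mono_spec
    # Invert the sum/difference parameterization:
    #   mono = left + right,  diff = left - right
    left = (mono_spec + diff_spec) / 2.0
    right = (mono_spec - diff_spec) / 2.0
    return left, right
```

By construction the two recovered channels always sum back to the input mono spectrogram, so the network only has to learn the spatial (difference) component, which is where the visual cues help.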


Related research:

- Audio-Visual Scene Analysis with Self-Supervised Multisensory Features (04/10/2018)
- Geometry-Aware Multi-Task Learning for Binaural Audio Generation from Video (11/21/2021)
- Chat2Map: Efficient Scene Mapping from Multi-Ego Conversations (01/04/2023)
- Co-Separating Sounds of Visual Objects (04/16/2019)
- Visually Informed Binaural Audio Generation without Binaural Audios (04/13/2021)
- Self-Supervised Generation of Spatial Audio for 360 Video (09/07/2018)
- AutoFoley: Artificial Synthesis of Synchronized Sound Tracks for Silent Videos with Deep Learning (02/21/2020)
