Geometry-Aware Multi-Task Learning for Binaural Audio Generation from Video

11/21/2021
by Rishabh Garg, et al.

Binaural audio provides human listeners with an immersive spatial sound experience, but most existing videos lack binaural audio recordings. We propose an audio spatialization method that draws on visual information in videos to convert their monaural (single-channel) audio to binaural audio. Whereas existing approaches leverage visual features extracted directly from video frames, our approach explicitly disentangles the geometric cues present in the visual stream to guide the learning process. In particular, we develop a multi-task framework that learns geometry-aware features for binaural audio generation by accounting for the underlying room impulse response, the visual stream's coherence with the positions of the sound source(s), and the consistency in geometry of the sounding objects over time. Furthermore, we introduce a new large video dataset with realistic binaural audio simulated for real-world scanned environments. On two datasets, we demonstrate the efficacy of our method, which achieves state-of-the-art results.
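The abstract does not give implementation details, so the following is only a minimal sketch of the kind of pipeline it describes, built on stated assumptions rather than on the authors' code. It assumes that (1) simulated binaural audio is produced by convolving a dry mono source with a left/right room impulse response (RIR) pair of equal length; (2) the network receives the mono mixture L + R and predicts the difference L − R, the formulation used in 2.5D Visual Sound (listed below), from which both channels can be recovered; and (3) the multi-task objective is a weighted sum of a binaural loss and three auxiliary terms (RIR, source-position coherence, temporal geometric consistency). All function names and the loss weights are hypothetical.

```python
import numpy as np


def simulate_binaural(dry, rir_left, rir_right):
    """Convolve a dry mono source with a hypothetical left/right RIR pair.

    Assumes both RIRs have the same length so the two ear signals align.
    """
    if len(rir_left) != len(rir_right):
        raise ValueError("left and right RIRs must have the same length")
    left = np.convolve(dry, rir_left)
    right = np.convolve(dry, rir_right)
    return np.stack([left, right])  # shape (2, len(dry) + len(rir) - 1)


def mono_and_difference(binaural):
    """Mono network input is the channel sum; the assumed target is L - R."""
    left, right = binaural
    return left + right, left - right


def recover_binaural(mono, diff):
    """Invert the sum/difference encoding: L = (M + D) / 2, R = (M - D) / 2."""
    return np.stack([(mono + diff) / 2.0, (mono - diff) / 2.0])


def temporal_consistency_loss(features):
    """Hypothetical geometry-consistency term: mean squared change between
    per-frame geometric feature vectors of shape (T, D)."""
    deltas = np.diff(features, axis=0)
    return float(np.mean(deltas ** 2))


def multitask_loss(binaural_loss, rir_loss, coherence_loss, consistency_loss,
                   weights=(1.0, 0.5, 0.5, 0.1)):
    """Weighted sum of the main and auxiliary objectives.

    The weights are illustrative placeholders, not values from the paper.
    """
    terms = (binaural_loss, rir_loss, coherence_loss, consistency_loss)
    return sum(w * t for w, t in zip(weights, terms))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dry = rng.standard_normal(16000)               # one second at 16 kHz
    rir_l = np.zeros(256); rir_l[0] = 1.0          # direct path to the near ear
    rir_r = np.zeros(256); rir_r[10] = 0.6         # later, quieter far-ear arrival
    binaural = simulate_binaural(dry, rir_l, rir_r)
    mono, diff = mono_and_difference(binaural)
    # With a perfect difference prediction, both channels are recovered exactly.
    print(np.allclose(recover_binaural(mono, diff), binaural))
```

Predicting the difference signal rather than two full channels keeps the shared content (the mono mixture) fixed, so the network only has to model the spatial part; whether the paper uses this exact target is an assumption here.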

research
12/11/2018

2.5D Visual Sound

Binaural audio provides a listener with 3D sound sensation, allowing a r...
research
09/02/2021

Binaural Audio Generation via Multi-task Learning

We present a learning-based approach for generating binaural audio from ...
research
04/13/2021

Visually Informed Binaural Audio Generation without Binaural Audios

Stereophonic audio, especially binaural audio, plays an essential role i...
research
05/12/2018

Scene-Aware Audio for 360 Videos

Although 360 cameras ease the capture of panoramic footage, it remains c...
research
04/20/2021

Detection of Audio-Video Synchronization Errors Via Event Detection

We present a new method and a large-scale database to detect audio-video...
research
02/12/2022

Audio-Visual Fusion Layers for Event Type Aware Video Recognition

The human brain is continuously inundated with multisensory information ...
research
03/02/2019

Making Sense of Audio Vibration for Liquid Height Estimation in Robotic Pouring

In this paper, we focus on the challenging perception problem in robotic...
