Visually Informed Binaural Audio Generation without Binaural Audios

04/13/2021
by   Xudong Xu, et al.
0

Stereophonic audio, especially binaural audio, plays an essential role in immersive viewing environments. Recent research has explored generating visually guided stereophonic audios supervised by multi-channel audio collections. However, due to the requirement of professional recording devices, existing datasets are limited in scale and variety, which impedes the generalization of supervised methods in real-world scenarios. In this work, we propose PseudoBinaural, an effective pipeline that is free of binaural recordings. The key insight is to carefully build pseudo visual-stereo pairs with mono data for training. Specifically, we leverage spherical harmonic decomposition and head-related impulse response (HRIR) to identify the relationship between spatial locations and received binaural audios. Then in the visual modality, corresponding visual cues of the mono data are manually placed at sound source positions to form the pairs. Compared to fully-supervised paradigms, our binaural-recording-free pipeline shows great stability in cross-dataset evaluation and achieves comparable performance under subjective preference. Moreover, combined with binaural recordings, our method is able to further boost the performance of binaural audio generation under supervised settings.

READ FULL TEXT

page 2

page 5

page 8

research
07/20/2020

Sep-Stereo: Visually Guided Stereophonic Audio Generation by Associating Source Separation

Stereophonic audio is an indispensable ingredient to enhance human audit...
research
11/21/2021

Geometry-Aware Multi-Task Learning for Binaural Audio Generation from Video

Binaural audio provides human listeners with an immersive spatial sound ...
research
08/16/2019

Towards Generating Ambisonics Using Audio-Visual Cue for Virtual Reality

Ambisonics i.e., a full-sphere surround sound, is quintessential with 36...
research
12/11/2018

2.5D Visual Sound

Binaural audio provides a listener with 3D sound sensation, allowing a r...
research
09/13/2023

Leveraging Foundation models for Unsupervised Audio-Visual Segmentation

Audio-Visual Segmentation (AVS) aims to precisely outline audible object...
research
07/13/2016

AudioPairBank: Towards A Large-Scale Tag-Pair-Based Audio Content Analysis

Recently, sound recognition has been used to identify sounds, such as ca...
research
11/15/2021

Beyond Mono to Binaural: Generating Binaural Audio from Mono Audio with Depth and Cross Modal Attention

Binaural audio gives the listener an immersive experience and can enhanc...

Please sign up or login with your details

Forgot password? Click here to reset