Binaural Audio Generation via Multi-task Learning

09/02/2021
by Sijia Li, et al.

We present a learning-based approach for generating binaural audio from mono audio using multi-task learning. Our formulation leverages additional information from two related tasks: binaural audio generation and flipped audio classification. Our model extracts spatialization features from the visual and audio input, predicts the left and right audio channels, and judges whether the left and right channels have been flipped. First, we extract visual features from the video frames using ResNet. Next, we perform binaural audio generation and flipped audio classification with separate subnetworks conditioned on those visual features. Training minimizes an overall loss defined as the weighted sum of the two task losses. We train and evaluate our model on the FAIR-Play and YouTube-ASMR datasets. Quantitative and qualitative evaluations demonstrate the benefits of our approach over prior techniques.
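The weighted-sum objective over the two tasks can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation. The helper names (`make_flip_example`, `multitask_loss`), the per-channel L2 generation loss, the binary cross-entropy flip loss, the 50% flip probability, and the weights `w_gen=1.0` and `w_cls=0.1` are all assumptions chosen for illustration. Signals are plain Python lists standing in for whatever representation (waveform or spectrogram) the real model operates on.

```python
import math
import random
from typing import List, Tuple

Signal = List[float]


def make_flip_example(left: Signal, right: Signal,
                      rng: random.Random) -> Tuple[Signal, Signal, int]:
    """Build an input for the flip classifier (assumed scheme).

    With probability 0.5 the channels are swapped and labelled 1 (flipped);
    otherwise they are returned unchanged and labelled 0.
    """
    if rng.random() < 0.5:
        return right, left, 1
    return left, right, 0


def mse(pred: Signal, target: Signal) -> float:
    """Mean squared error between two equal-length signals."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def bce(prob: float, label: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one predicted 'flipped' probability."""
    p = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def multitask_loss(pred_left: Signal, pred_right: Signal,
                   true_left: Signal, true_right: Signal,
                   flip_prob: float, flip_label: int,
                   w_gen: float = 1.0, w_cls: float = 0.1) -> float:
    """Weighted sum of the generation loss and the flip-classification loss.

    The weights w_gen and w_cls are hypothetical hyperparameters.
    """
    gen_loss = mse(pred_left, true_left) + mse(pred_right, true_right)
    cls_loss = bce(flip_prob, flip_label)
    return w_gen * gen_loss + w_cls * cls_loss


if __name__ == "__main__":
    loss = multitask_loss(
        pred_left=[0.1, 0.2], pred_right=[0.0, 0.1],
        true_left=[0.1, 0.2], true_right=[0.0, 0.1],
        flip_prob=0.5, flip_label=1,
    )
    print(f"total loss: {loss:.4f}")
```

In the usage example the channel predictions are perfect, so only the weighted classification term remains: `w_cls * ln 2`, roughly 0.069 for an uncertain flip prediction of 0.5. Raising `w_cls` makes the shared features pay more attention to left/right spatial cues at the expense of the reconstruction term.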


