Neural Fourier Shift for Binaural Speech Rendering

11/02/2022
by Jin-woo Lee, et al.

We present a neural network that renders binaural speech from monaural audio, given the position and orientation of the source. Most previous work has focused on synthesizing binaural speech by conditioning on position and orientation in the feature space of convolutional neural networks. These synthesis approaches are powerful at estimating the target binaural speech, even for in-the-wild data, but they generalize poorly when rendering audio from out-of-distribution domains. To alleviate this, we propose Neural Fourier Shift (NFS), a novel network architecture that performs binaural speech rendering in Fourier space. Specifically, starting from a geometric time delay based on the distance between the source and the receiver, NFS is trained to predict the delays and scales of various early reflections. By design, NFS is efficient in both memory and computation, is interpretable, and operates independently of the source domain. Experimental results show that NFS outperforms previous studies on the benchmark dataset while using up to 25 times less memory and 6 times fewer computations.
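The core signal operation the abstract describes, delaying and scaling a signal in Fourier space, can be illustrated with a minimal sketch. This is not the authors' implementation: in NFS a network predicts the delays and scales, whereas here they are hand-picked placeholders. The function names, the speed-of-sound constant, and the naive DFT (used only to keep the example dependency-free) are all assumptions made for illustration.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value


def dft(x):
    """Naive O(N^2) discrete Fourier transform (illustration only)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of `dft`."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)) / n
            for t in range(n)]


def geometric_delay(source, ear, sample_rate):
    """Delay in samples implied by the straight-line source-to-ear distance."""
    return math.dist(source, ear) / SPEED_OF_SOUND * sample_rate


def fourier_shift(frame, delay, scale=1.0):
    """Delay `frame` by `delay` samples (possibly fractional) and scale it.

    Shift theorem: a time delay is a linear phase ramp in the frequency
    domain. The shift is circular within the frame.
    """
    n = len(frame)
    shifted = []
    for k, value in enumerate(dft(frame)):
        # Signed bin index keeps the result (nearly) real for real input.
        freq = k if k <= n // 2 else k - n
        phase = cmath.exp(-2j * math.pi * freq * delay / n)
        shifted.append(scale * value * phase)
    return [v.real for v in idft(shifted)]


def render_ear(frame, source, ear, sample_rate, reflections=((0.0, 1.0),)):
    """Sum scaled, delayed copies of `frame` for one ear.

    `reflections` holds (extra_delay_in_samples, scale) pairs. These are
    hypothetical fixed values standing in for what a trained model would
    predict.
    """
    base = geometric_delay(source, ear, sample_rate)
    out = [0.0] * len(frame)
    for extra, scale in reflections:
        for i, v in enumerate(fourier_shift(frame, base + extra, scale)):
            out[i] += v
    return out
```

Calling `render_ear` once per ear, with the left and right ear positions, yields a two-channel signal whose interaural time difference follows from the two geometric delays. Because the operation is a pure delay-and-scale of the input, it does not depend on what the input audio contains, which is one way to read the abstract's claim of independence from the source domain.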

research
04/06/2022

Global HRTF Interpolation via Learned Affine Transformation of Hyper-conditioned Features

Estimating Head-Related Transfer Functions (HRTFs) of arbitrary source p...
research
07/24/2023

Cross Contrastive Feature Perturbation for Domain Generalization

Domain generalization (DG) aims to learn a robust model from source doma...
research
01/27/2023

A Comparison of Tiny-nerf versus Spatial Representations for 3d Reconstruction

Neural rendering has emerged as a powerful paradigm for synthesizing ima...
research
03/10/2023

An End-to-End Neural Network for Image-to-Audio Transformation

This paper describes an end-to-end (E2E) neural architecture for the aud...
research
10/12/2018

A Fully Time-domain Neural Model for Subband-based Speech Synthesizer

This paper introduces a deep neural network model for subband-based spee...
research
09/03/2021

U-FNO – an enhanced Fourier neural operator-based deep-learning model for multiphase flow

Numerical simulation of multiphase flow in porous media is essential for...
research
03/13/2023

FreeNeRF: Improving Few-shot Neural Rendering with Free Frequency Regularization

Novel view synthesis with sparse inputs is a challenging problem for neu...
