Manifold learning-supported estimation of relative transfer functions for spatial filtering

10/05/2021
by Andreas Brendel, et al.

Many spatial filtering algorithms used for voice capture, e.g., in teleconferencing applications, benefit from or even rely on knowledge of Relative Transfer Functions (RTFs). Accordingly, many RTF estimators have been proposed. These estimators, however, suffer from performance degradation under acoustically adverse conditions or need prior knowledge of the properties of the interfering sources. While state-of-the-art RTF estimators ignore prior knowledge about the acoustic enclosure, audio signal processing algorithms for teleconferencing equipment often operate in the same or at least a similar acoustic enclosure, e.g., a car or an office, such that training data can be collected. In this contribution, we use such data to train Variational Autoencoders (VAEs) in an unsupervised manner and apply the trained VAEs to enhance imprecise RTF estimates. Furthermore, a hybrid between classic RTF estimation and the trained VAE is investigated. Comprehensive experiments with real-world data confirm the efficacy of the proposed method.
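To make the notion of an RTF concrete, the following is a minimal sketch of a textbook noise-free estimator, not the method proposed in the paper: for each frequency bin, the RTF of a microphone is taken as the cross power spectral density with a reference microphone divided by the reference's power spectral density. The function name `estimate_rtf`, the array layout (microphones, frequencies, frames), and the synthetic test signal are all assumptions made for illustration. The VAE-based enhancement described above is not shown.

```python
import numpy as np

def estimate_rtf(stft, ref=0):
    """Noise-free RTF estimate per microphone and frequency bin.

    stft: complex array of shape (mics, freqs, frames) (assumed layout).
    ref:  index of the reference microphone (assumed to be 0).
    """
    ref_sig = stft[ref]                                    # (freqs, frames)
    # Cross-PSD of every microphone with the reference, averaged over frames.
    cross_psd = np.mean(stft * np.conj(ref_sig), axis=-1)  # (mics, freqs)
    # PSD of the reference microphone, averaged over frames.
    ref_psd = np.mean(np.abs(ref_sig) ** 2, axis=-1)       # (freqs,)
    # Guard against division by zero in silent bins.
    return cross_psd / np.maximum(ref_psd, 1e-12)

# Synthetic check: one source, known RTFs, no noise (illustrative data only).
rng = np.random.default_rng(0)
n_mics, n_freqs, n_frames = 3, 4, 200
source = (rng.standard_normal((n_freqs, n_frames))
          + 1j * rng.standard_normal((n_freqs, n_frames)))
true_rtf = (rng.standard_normal((n_mics, n_freqs))
            + 1j * rng.standard_normal((n_mics, n_freqs)))
true_rtf[0] = 1.0  # the RTF of the reference microphone is 1 by definition
mics = true_rtf[:, :, None] * source[None, :, :]

rtf_hat = estimate_rtf(mics)
```

In this idealized setting the estimate matches the true RTFs up to numerical precision. With additive noise or interfering sources the same ratio becomes biased, which is the kind of imprecise estimate the abstract refers to.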


research
02/04/2023

AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene Synthesis

Human perception of the complex world relies on a comprehensive analysis...
research
05/25/2018

Relative Transfer Function Estimation Exploiting Spatially Separated Microphones in a Diffuse Noise Field

Many multi-microphone speech enhancement algorithms require the relative...
research
05/25/2018

Relative Transfer Function Estimation Exploiting Spatially Separated Microphones in an Incoherent Noise Field

Many multi-microphone speech enhancement algorithms require the relative...
research
12/15/2016

Reflectance Adaptive Filtering Improves Intrinsic Image Estimation

Separating an image into reflectance and shading layers poses a challeng...
research
04/22/2019

hf0: A hybrid pitch extraction method for multimodal voice

Pitch or fundamental frequency (f0) extraction is a fundamental problem ...
research
03/14/2022

MDMMT-2: Multidomain Multimodal Transformer for Video Retrieval, One More Step Towards Generalization

In this work we present a new State-of-The-Art on the text-to-video retr...
research
02/24/2015

A Review of Audio Features and Statistical Models Exploited for Voice Pattern Design

Audio fingerprinting, also named as audio hashing, has been well-known a...
