Mask-based Neural Beamforming for Moving Speakers with Self-Attention-based Tracking

05/07/2022
by Tsubasa Ochiai, et al.

Beamforming is a powerful tool designed to enhance speech signals from the direction of a target source. Computing the beamforming filter requires estimating spatial covariance matrices (SCMs) of the source and noise signals. Time-frequency masks are often used to compute these SCMs. Most studies of mask-based beamforming have assumed that the sources do not move. However, sources often move in practice, which causes performance degradation. In this paper, we address the problem of mask-based beamforming for moving sources. We first review classical approaches to tracking a moving source, which perform online or blockwise computation of the SCMs. We show that these approaches can be interpreted as computing a sum of instantaneous SCMs weighted by attention weights. These weights indicate which time frames of the signal to consider in the SCM computation. Online or blockwise computation assumes a heuristic and deterministic way of computing these attention weights that, although simple, may not result in optimal performance. We thus introduce a learning-based framework that computes optimal attention weights for beamforming. We achieve this using a neural network implemented with self-attention layers. We show experimentally that our proposed framework can greatly improve beamforming performance in moving source situations while maintaining high performance in non-moving situations, thus enabling the development of mask-based beamformers robust to source movements.
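The interpretation of online and blockwise SCM estimation as an attention-weighted sum of instantaneous SCMs can be illustrated with a minimal sketch. It is not the authors' implementation: it handles a single frequency bin in pure Python, and the helper names (`instantaneous_scm`, `attention_scm`, `blockwise_weights`, `online_weights`) and the forgetting factor `forget` are assumptions introduced here for illustration. The blockwise scheme is assumed to weight all frames in the current block uniformly, and the online scheme is assumed to be a recursive average with exponential forgetting.

```python
# Sketch under stated assumptions: one frequency bin, hypothetical helper names.

def instantaneous_scm(y, m=1.0):
    """Masked outer product m * y y^H for one frame y (one complex value per mic)."""
    return [[m * a * b.conjugate() for b in y] for a in y]

def attention_scm(frames, masks, weights):
    """SCM for one output frame: sum over tau of weights[tau] * masks[tau] * y_tau y_tau^H."""
    n_mics = len(frames[0])
    scm = [[0j] * n_mics for _ in range(n_mics)]
    for y, m, w in zip(frames, masks, weights):
        if w == 0.0:
            continue  # frames with zero attention do not contribute
        inst = instantaneous_scm(y, m)
        for i in range(n_mics):
            for j in range(n_mics):
                scm[i][j] += w * inst[i][j]
    return scm

def blockwise_weights(t, n_frames, block_len):
    """Uniform weights over the block that contains frame t, zero elsewhere."""
    start = (t // block_len) * block_len
    end = min(start + block_len, n_frames)
    return [1.0 / (end - start) if start <= tau < end else 0.0
            for tau in range(n_frames)]

def online_weights(t, n_frames, forget=0.9):
    """Causal, exponentially decaying weights; equivalent to the recursion
    Phi_t = forget * Phi_{t-1} + (1 - forget) * Psi_t with Phi_{-1} = 0."""
    return [(1.0 - forget) * forget ** (t - tau) if tau <= t else 0.0
            for tau in range(n_frames)]
```

Both weighting functions are fixed in advance and depend only on the frame index, never on the signal. A learned alternative, such as the self-attention network described in the abstract, would instead produce the `weights` argument from the observed signal, while `attention_scm` would stay unchanged.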

research
07/11/2019

Multichannel Loss Function for Supervised Speech Source Separation by Mask-based Beamforming

In this paper, we propose two mask-based beamforming methods using a dee...
research
07/22/2022

DNN-Free Low-Latency Adaptive Speech Enhancement Based on Frame-Online Beamforming Powered by Block-Online FastMNMF

This paper describes a practical dual-process speech enhancement system ...
research
06/17/2019

Weighted delay-and-sum beamforming guided by visual tracking for human-robot interaction

This paper describes the integration of weighted delay-and-sum beamformi...
research
11/18/2019

Alternating Between Spectral and Spatial Estimation for Speech Separation and Enhancement

This work investigates alternation between spectral separation using mas...
research
10/06/2021

Lightweight Speech Enhancement in Unseen Noisy and Reverberant Conditions using KISS-GEV Beamforming

This paper introduces a new method referred to as KISS-GEV (for Keep It ...
research
05/23/2020

Exploring Optimal DNN Architecture for End-to-End Beamformers Based on Time-frequency References

Acoustic beamformers have been widely used to enhance audio signals. Cur...
research
10/19/2021

Patch Based Transformation for Minimum Variance Beamformer Image Approximation Using Delay and Sum Pipeline

In the recent past, there have been several efforts in accelerating comp...
