FaSNet: Low-latency Adaptive Beamforming for Multi-microphone Audio Processing

09/29/2019
by   Yi Luo, et al.

Beamforming has been extensively investigated for multi-channel audio processing tasks. Recently, learning-based beamforming methods, sometimes called neural beamformers, have achieved significant improvements in both signal quality (e.g., signal-to-noise ratio (SNR)) and speech recognition accuracy (e.g., word error rate (WER)). Such systems are generally non-causal and require a large context for robust estimation of inter-channel features, which is impractical in applications requiring low-latency responses. In this paper, we propose the filter-and-sum network (FaSNet), a time-domain, filter-based beamforming approach suitable for low-latency scenarios. FaSNet has a two-stage design: it first learns frame-level time-domain adaptive beamforming filters for a selected reference channel, and then calculates the filters for all remaining channels. The filtered outputs of all channels are summed to generate the final output. Experiments show that, despite its small model size, FaSNet outperforms several traditional oracle beamformers with respect to scale-invariant signal-to-noise ratio (SI-SNR) in reverberant speech enhancement and separation tasks. Moreover, when trained with a frequency-domain objective function on the CHiME-3 dataset, FaSNet achieves a 14.3% relative word error rate reduction (RWERR) compared with the baseline model. These results show the efficacy of FaSNet, particularly in reverberant and noisy signal conditions.
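
To make the filter-and-sum step and the SI-SNR metric concrete, here is a minimal NumPy sketch. It is not the FaSNet implementation: the filters are taken as given inputs rather than estimated by a network, each frame is filtered with a full linear convolution for simplicity, and the names `filter_and_sum` and `si_snr` as well as the array shapes are assumptions made for illustration. The SI-SNR function follows the commonly used definition (zero-mean signals, projection of the estimate onto the target).

```python
import numpy as np


def filter_and_sum(frames, filters):
    """Apply one FIR filter per channel and frame, then sum over channels.

    frames:  (channels, n_frames, frame_len)  time-domain frames (assumed layout)
    filters: (channels, n_frames, filt_len)   frame-level filters (assumed given)
    Returns an array of shape (n_frames, frame_len + filt_len - 1).
    """
    n_ch, n_frames, frame_len = frames.shape
    filt_len = filters.shape[-1]
    out = np.zeros((n_frames, frame_len + filt_len - 1))
    for c in range(n_ch):
        for t in range(n_frames):
            # Filter this channel's frame, then accumulate across channels.
            out[t] += np.convolve(frames[c, t], filters[c, t])
    return out


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between two 1-D signals."""
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target to remove any scale mismatch.
    scale = np.dot(estimate, target) / (np.dot(target, target) + eps)
    s_target = scale * target
    e_noise = estimate - s_target
    ratio = np.dot(s_target, s_target) / (np.dot(e_noise, e_noise) + eps)
    return 10.0 * np.log10(ratio)
```

With unit-impulse filters on every channel, `filter_and_sum` reduces to a plain sum of the channels, which is a quick sanity check. Because of the projection step, rescaling the estimate leaves `si_snr` essentially unchanged.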


