Towards Unified All-Neural Beamforming for Time and Frequency Domain Speech Separation

12/16/2022
by   Rongzhi Gu, et al.
0

Recently, frequency domain all-neural beamforming methods have achieved remarkable progress for multichannel speech separation. In parallel, the integration of time domain network structure and beamforming also gains significant attention. This study proposes a novel all-neural beamforming method in time domain and makes an attempt to unify the all-neural beamforming pipelines for time domain and frequency domain multichannel speech separation. The proposed model consists of two modules: separation and beamforming. Both modules perform temporal-spectral-spatial modeling and are trained from end-to-end using a joint loss function. The novelty of this study lies in two folds. Firstly, a time domain directional feature conditioned on the direction of the target speaker is proposed, which can be jointly optimized within the time domain architecture to enhance target signal estimation. Secondly, an all-neural beamforming network in time domain is designed to refine the pre-separated results. This module features with parametric time-variant beamforming coefficient estimation, without explicitly following the derivation of optimal filters that may lead to an upper bound. The proposed method is evaluated on simulated reverberant overlapped speech data derived from the AISHELL-1 corpus. Experimental results demonstrate significant performance improvements over frequency domain state-of-the-arts, ideal magnitude masks and existing time domain neural beamforming methods.

READ FULL TEXT
research
08/30/2023

Dual-path Transformer Based Neural Beamformer for Target Speech Extraction

Neural beamformers, which integrate both pre-separation and beamforming ...
research
12/07/2021

A Time-domain Generalized Wiener Filter for Multi-channel Speech Separation

Frequency-domain neural beamformers are the mainstream methods for recen...
research
03/17/2020

Deep Attention Fusion Feature for Speech Separation with End-to-End Post-filter Method

In this paper, we propose an end-to-end post-filter method with deep att...
research
09/20/2018

TasNet: Surpassing Ideal Time-Frequency Masking for Speech Separation

Robust speech processing in multitalker acoustic environments requires a...
research
03/24/2021

Blind Speech Separation and Dereverberation using Neural Beamforming

In this paper, we present the Blind Speech Separation and Dereverberatio...
research
06/28/2023

Enhanced Neural Beamformer with Spatial Information for Target Speech Extraction

Recently, deep learning-based beamforming algorithms have shown promisin...
research
11/20/2019

Demystifying TasNet: A Dissecting Approach

In recent years time domain speech separation has excelled over frequenc...

Please sign up or login with your details

Forgot password? Click here to reset