Enhanced Neural Beamformer with Spatial Information for Target Speech Extraction

06/28/2023
by Aoqi Guo, et al.

Recently, deep learning-based beamforming algorithms have shown promising performance in target speech extraction tasks. However, most systems do not fully utilize spatial information. In this paper, we propose a target speech extraction network that uses spatial information to enhance the performance of the neural beamformer. To achieve this, we first use a UNet-TCN structure to model the input features; this improves the estimation accuracy of the speech pre-separation module by avoiding the information loss that direct dimensionality reduction causes in other models. Furthermore, we introduce a multi-head cross-attention mechanism that strengthens the neural beamformer's perception of spatial information by making full use of the spatial information received by the array. Experimental results demonstrate that our approach, which incorporates a more reasonable target mask estimation network and a spatial information-based cross-attention mechanism into the neural beamformer, effectively improves speech separation performance.
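
To make the multi-head cross-attention idea above concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the layer, so the choice of queries (beamformer-side features) and keys/values (spatial features from the array), the names multi_head_cross_attention and init_params, the random projection weights, and all dimensions are assumptions made purely for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def init_params(dim, seed=0):
    # Hypothetical random projection matrices; a trained model would learn these.
    rng = np.random.default_rng(seed)
    scale = 1.0 / np.sqrt(dim)
    return {name: rng.standard_normal((dim, dim)) * scale
            for name in ("Wq", "Wk", "Wv", "Wo")}


def multi_head_cross_attention(query_feats, spatial_feats, params, num_heads):
    """Scaled dot-product cross-attention with several heads.

    query_feats:   (T, D) features on the beamformer side (assumed).
    spatial_feats: (S, D) spatial features from the array (assumed).
    Returns the attended output (T, D) and the attention weights (H, T, S).
    """
    T, D = query_feats.shape
    S = spatial_feats.shape[0]
    assert D % num_heads == 0, "feature dim must be divisible by num_heads"
    d_head = D // num_heads

    # Queries come from one stream, keys and values from the other.
    q = query_feats @ params["Wq"]
    k = spatial_feats @ params["Wk"]
    v = spatial_feats @ params["Wv"]

    # Split the feature axis into heads: (H, T, d_head) and (H, S, d_head).
    q = q.reshape(T, num_heads, d_head).transpose(1, 0, 2)
    k = k.reshape(S, num_heads, d_head).transpose(1, 0, 2)
    v = v.reshape(S, num_heads, d_head).transpose(1, 0, 2)

    # Attention weights over the spatial positions for every query frame.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (H, T, S)
    weights = softmax(scores, axis=-1)

    # Weighted sum of values, then merge the heads back to (T, D).
    context = (weights @ v).transpose(1, 0, 2).reshape(T, D)
    return context @ params["Wo"], weights


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    params = init_params(dim=8)
    out, w = multi_head_cross_attention(
        rng.standard_normal((5, 8)),   # 5 query frames
        rng.standard_normal((3, 8)),   # 3 spatial feature vectors
        params,
        num_heads=2,
    )
    print(out.shape, w.shape)
```

Each query frame receives a convex combination of the spatial feature vectors per head, which is one plausible way spatial cues could be injected into a beamformer's internal representation; the paper's actual feature choices and layer placement may differ.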

research
08/30/2023

Dual-path Transformer Based Neural Beamformer for Target Speech Extraction

Neural beamformers, which integrate both pre-separation and beamforming ...
research
11/05/2019

Spatial Attention for Far-field Speech Recognition with Deep Beamforming Neural Networks

In this paper, we introduce spatial attention for refining the informati...
research
10/13/2021

All-neural beamformer for continuous speech separation

Continuous speech separation (CSS) aims to separate overlapping voices f...
research
08/19/2023

Spatial Reconstructed Local Attention Res2Net with F0 Subband for Fake Speech Detection

The rhythm of synthetic speech is usually too smooth, which causes that ...
research
12/16/2022

Towards Unified All-Neural Beamforming for Time and Frequency Domain Speech Separation

Recently, frequency domain all-neural beamforming methods have achieved ...
research
04/17/2021

MIMO Self-attentive RNN Beamformer for Multi-speaker Speech Separation

Recently, our proposed recurrent neural network (RNN) based all deep lea...
research
07/18/2023

Efficient Region-Aware Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis

This paper presents ER-NeRF, a novel conditional Neural Radiance Fields ...
