Spatial and spectral deep attention fusion for multi-channel speech separation using deep embedding features

02/05/2020
by   Cunhang Fan, et al.
0

Multi-channel deep clustering (MDC) has acquired a good performance for speech separation. However, MDC only applies the spatial features as the additional information. So it is difficult to learn mutual relationship between spatial and spectral features. Besides, the training objective of MDC is defined at embedding vectors, rather than real separated sources, which may damage the separation performance. In this work, we propose a deep attention fusion method to dynamically control the weights of the spectral and spatial features and combine them deeply. In addition, to solve the training objective problem of MDC, the real separated sources are used as the training objectives. Specifically, we apply the deep clustering network to extract deep embedding features. Instead of using the unsupervised K-means clustering to estimate binary masks, another supervised network is utilized to learn soft masks from these deep embedding features. Our experiments are conducted on a spatialized reverberant version of WSJ0-2mix dataset. Experimental results show that the proposed method outperforms MDC baseline and even better than the oracle ideal binary mask (IBM).

READ FULL TEXT
research
04/06/2020

Simultaneous Denoising and Dereverberation Using Deep Embedding Features

Monaural speech dereverberation is a very challenging task because no sp...
research
10/24/2019

Multi-channel Speech Separation Using Deep Embedding Model with Multilayer Bootstrap Networks

Recently, deep clustering (DPCL) based speaker-independent speech separa...
research
03/17/2020

Deep Attention Fusion Feature for Speech Separation with End-to-End Post-filter Method

In this paper, we propose an end-to-end post-filter method with deep att...
research
07/23/2019

Discriminative Learning for Monaural Speech Separation Using Deep Embedding Features

Deep clustering (DC) and utterance-level permutation invariant training ...
research
01/15/2019

Orthonormal Embedding-based Deep Clustering for Single-channel Speech Separation

Deep clustering is a deep neural network-based speech separation algorit...
research
06/09/2023

An Efficient Speech Separation Network Based on Recurrent Fusion Dilated Convolution and Channel Attention

We present an efficient speech separation neural network, ARFDCN, which ...
research
03/14/2023

Multi-Channel Masking with Learnable Filterbank for Sound Source Separation

This work proposes a learnable filterbank based on a multi-channel maski...

Please sign up or login with your details

Forgot password? Click here to reset