Deep Attention Fusion Feature for Speech Separation with End-to-End Post-filter Method

03/17/2020
by Cunhang Fan et al.

In this paper, we propose an end-to-end post-filter method with deep attention fusion features for monaural speaker-independent speech separation. First, a time-frequency domain speech separation method is applied as a pre-separation stage, whose aim is to separate the mixture preliminarily. Although this stage can separate the mixture, the pre-separated speech still contains residual interference. To enhance the pre-separated speech and further improve separation performance, we propose the end-to-end post-filter (E2EPF) with deep attention fusion features. The E2EPF makes full use of the prior knowledge in the pre-separated speech, which contributes to speech separation. It is a fully convolutional speech separation network that takes the waveform as its input feature. Firstly, a 1-D convolutional layer extracts deep representation features from the mixture and the pre-separated signals in the time domain. Secondly, to pay more attention to the outputs of the pre-separation stage, an attention module acquires deep attention fusion features, which are extracted by computing the similarity between the mixture and the pre-separated speech. These deep attention fusion features help reduce the interference and enhance the pre-separated speech. Finally, these features are fed to the post-filter to estimate each target signal. Experimental results on the WSJ0-2mix dataset show that the proposed method outperforms the state-of-the-art speech separation methods. Compared with the pre-separation method, our proposed method achieves a 64.1 relative improvement in scale-invariant source-to-noise ratio (SI-SNR), and also improves the signal-to-distortion ratio (SDR), the perceptual evaluation of speech quality (PESQ), and the short-time objective intelligibility (STOI).
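As a rough illustration of the attention-fusion step described in the abstract (not the authors' released code), the following PyTorch sketch shows one plausible reading: 1-D convolutions embed the mixture and pre-separated waveforms, an attention map is computed from the similarity between the two embeddings (assumed here to be a scaled dot product), and the attended pre-separated features are concatenated with the mixture features to form the fused representation passed to the post-filter. All layer sizes, the ReLU encoders, and the concatenation are assumptions for the sketch.

```python
# Hedged sketch of the "deep attention fusion" step from the abstract.
# Layer sizes, the scaled dot-product similarity, and the final
# concatenation are assumptions, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionFusion(nn.Module):
    def __init__(self, n_filters=256, kernel_size=16, stride=8):
        super().__init__()
        # 1-D conv encoders: waveform -> deep representation (time domain)
        self.mix_encoder = nn.Conv1d(1, n_filters, kernel_size, stride=stride)
        self.pre_encoder = nn.Conv1d(1, n_filters, kernel_size, stride=stride)
        self.scale = n_filters ** -0.5

    def forward(self, mixture, pre_separated):
        # mixture, pre_separated: (batch, 1, samples)
        mix = F.relu(self.mix_encoder(mixture))        # (B, N, T)
        pre = F.relu(self.pre_encoder(pre_separated))  # (B, N, T)

        # Similarity between mixture and pre-separated embeddings,
        # normalized over the pre-separated time frames.
        attn = torch.softmax(
            torch.bmm(mix.transpose(1, 2), pre) * self.scale, dim=-1
        )                                              # (B, T, T)

        # Attend to the pre-separated representation, then fuse it with
        # the mixture features; the fused features feed the post-filter.
        attended = torch.bmm(pre, attn.transpose(1, 2))  # (B, N, T)
        return torch.cat([mix, attended], dim=1)         # (B, 2N, T)


if __name__ == "__main__":
    # Toy usage: one second of 8 kHz audio for the mixture and one
    # pre-separated source; real inputs would come from the T-F domain
    # pre-separation stage.
    mix = torch.randn(2, 1, 8000)
    pre = torch.randn(2, 1, 8000)
    fused = AttentionFusion()(mix, pre)
    print(fused.shape)  # torch.Size([2, 512, 999])
```

Weighting the pre-separated embedding by its similarity to the mixture is one way to realize "paying more attention to the outputs of the pre-separation stage" while letting the network suppress frames dominated by residual interference.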

Related research

11/11/2020
On End-to-end Multi-channel Time Domain Speech Separation in Reverberant Environments
This paper introduces a new method for multi-channel time domain speech ...

03/31/2022
EEND-SS: Joint End-to-End Neural Speaker Diarization and Speech Separation for Flexible Number of Speakers
In this paper, we present a novel framework that jointly performs speake...

10/20/2019
Deep speech inpainting of time-frequency masks
In particularly noisy environments, transient loud intrusions can comple...

12/16/2022
Towards Unified All-Neural Beamforming for Time and Frequency Domain Speech Separation
Recently, frequency domain all-neural beamforming methods have achieved ...

02/05/2020
Spatial and spectral deep attention fusion for multi-channel speech separation using deep embedding features
Multi-channel deep clustering (MDC) has acquired a good performance for ...

12/17/2021
Discretization and Re-synthesis: an alternative method to solve the Cocktail Party Problem
Deep learning based models have significantly improved the performance o...

07/23/2021
Multi-channel Speech Enhancement with 2-D Convolutional Time-frequency Domain Features and a Pre-trained Acoustic Model
We propose a multi-channel speech enhancement approach with a novel two-...
