On End-to-end Multi-channel Time Domain Speech Separation in Reverberant Environments

11/11/2020
by Jisi Zhang, et al.

This paper introduces a new method for multi-channel time-domain speech separation in reverberant environments. A fully-convolutional neural network is used to separate speech directly from multiple microphone recordings, without the need for conventional spatial feature extraction. To reduce the influence of reverberation on spatial feature extraction, a dereverberation pre-processing method is applied to further improve separation performance. A spatialized version of the wsj0-2mix dataset has been simulated to evaluate the proposed system. Both the source separation and the speech recognition performance of the separated signals are evaluated objectively. Experiments show that the proposed fully-convolutional network improves the source separation metric and the word error rate (WER) by more than 13% … relative, respectively, over a reference system with conventional features. Applying dereverberation as pre-processing to the proposed system can further reduce the WER by 29% … reverberated data.
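To make "separating speech directly from multiple microphone recordings" more concrete, here is a minimal NumPy sketch of a multi-channel time-domain encoder: a single 1-D convolution whose filters span all microphones at once, so inter-channel (spatial) cues can be absorbed into learned weights rather than computed as hand-crafted features. This is an illustration under stated assumptions, not the authors' architecture: the Conv-TasNet-style ReLU encoder, the function name `multichannel_conv_encoder`, and all sizes (4 microphones, 256 filters, kernel 16, stride 8) are hypothetical, and the weights are random rather than trained.

```python
import numpy as np

def multichannel_conv_encoder(x, filters, stride):
    """Hypothetical Conv-TasNet-style encoder applied to raw multi-mic audio.

    x:       (num_mics, num_samples) raw time-domain signals
    filters: (num_basis, num_mics, kernel_size); learned in a real network,
             arbitrary here (assumption for illustration)
    stride:  hop size in samples between successive frames
    Returns a non-negative representation of shape (num_basis, num_frames).
    """
    num_mics, num_samples = x.shape
    num_basis, filter_mics, kernel_size = filters.shape
    if filter_mics != num_mics:
        raise ValueError("filter channel count must match number of microphones")
    if num_samples < kernel_size:
        raise ValueError("signal is shorter than the kernel")
    num_frames = (num_samples - kernel_size) // stride + 1
    # Slice overlapping frames from every microphone at once: (frames, mics, kernel)
    frames = np.stack(
        [x[:, t * stride : t * stride + kernel_size] for t in range(num_frames)]
    )
    # Each basis filter sums over all microphones and taps, so spatial cues
    # (e.g. inter-channel delays) can be captured without explicit IPD features.
    out = np.einsum("fmk,bmk->bf", frames, filters)
    return np.maximum(out, 0.0)  # ReLU non-linearity

# Usage with random data: a hypothetical 4-mic, 1-second recording at 16 kHz.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16000))
filters = rng.standard_normal((256, 4, 16))
rep = multichannel_conv_encoder(x, filters, stride=8)
print(rep.shape)  # (256, 1999)
```

In a full separation system, a representation like this would typically feed a mask-estimation network and a decoder that maps masked features back to waveforms. Those stages are omitted here.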

research
12/18/2019

End-to-end training of time domain audio separation and recognition

The rising interest in single-channel multi-speaker speech separation sp...
research
03/17/2020

Deep Attention Fusion Feature for Speech Separation with End-to-End Post-filter Method

In this paper, we propose an end-to-end post-filter method with deep att...
research
03/31/2022

Perceptive, non-linear Speech Processing and Spiking Neural Networks

Source separation and speech recognition are very difficult in the conte...
research
03/02/2018

Raw Multi-Channel Audio Source Separation using Multi-Resolution Convolutional Auto-Encoders

Supervised multi-channel audio source separation requires extracting use...
research
04/08/2019

Audio Source Separation via Multi-Scale Learning with Dilated Dense U-Nets

Modern audio source separation techniques rely on optimizing sequence mo...
research
11/17/2020

Implicit Filter-and-sum Network for Multi-channel Speech Separation

Various neural network architectures have been proposed in recent years ...
research
12/02/2020

Improved MVDR Beamforming Using LSTM Speech Models to Clean Spatial Clustering Masks

Spatial clustering techniques can achieve significant multi-channel nois...