Combining Spatial Clustering with LSTM Speech Models for Multichannel Speech Enhancement

12/02/2020
by Felix Grezes, et al.
Recurrent neural networks using the LSTM architecture can achieve significant single-channel noise reduction. It is not obvious, however, how to apply them to multichannel inputs in a way that generalizes to new microphone configurations. In contrast, spatial clustering techniques can achieve such generalization, but lack a strong signal model. This paper combines the two approaches to attain both the spatial separation performance and generality of multichannel spatial clustering and the signal modeling performance of multiple parallel single-channel LSTM speech enhancers. The system is compared to several baselines on the CHiME3 dataset in terms of speech quality predicted by the PESQ algorithm and the word error rate of a recognizer trained on mismatched conditions, in order to focus on generalization. Our experiments show that combining the LSTM models with spatial clustering reduces word error rate by 4.6% absolute (17.2% relative) on the development set and 11.2% absolute (25.5% relative) on the test set compared with the spatial clustering system, and by 10.75% absolute (32.72% relative) on the development set and 6.12% absolute (15.76% relative) on the test set compared with the LSTM model.
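The abstract reports each improvement as both an absolute and a relative word error rate (WER) reduction. The following is a minimal sketch of how the two figures relate, assuming the usual definition relative = absolute / baseline. The function names are hypothetical, and the baseline and combined-system WERs it prints are back-calculated from the abstract's figures, not values reported in the abstract.

```python
def implied_baseline_wer(absolute_drop, relative_drop_pct):
    """Baseline WER (in %) implied by an absolute drop (in points) and a relative drop (in %)."""
    return absolute_drop / (relative_drop_pct / 100.0)


def relative_reduction_pct(baseline_wer, improved_wer):
    """Relative WER reduction (in %) when going from baseline_wer to improved_wer."""
    return 100.0 * (baseline_wer - improved_wer) / baseline_wer


# (absolute drop in points, relative drop in %) as stated in the abstract.
reported = {
    ("dev", "spatial clustering"): (4.6, 17.2),
    ("test", "spatial clustering"): (11.2, 25.5),
    ("dev", "LSTM"): (10.75, 32.72),
    ("test", "LSTM"): (6.12, 15.76),
}

for (split, baseline_name), (abs_drop, rel_drop) in reported.items():
    # Back-calculated estimates only; rounding in the abstract limits their precision.
    baseline = implied_baseline_wer(abs_drop, rel_drop)
    combined = baseline - abs_drop
    print(f"{split:4s} vs {baseline_name:18s}: "
          f"implied baseline {baseline:.2f}%, implied combined {combined:.2f}%")
```

Under that assumption, both comparisons on a given split imply roughly the same WER for the combined system, which is a quick consistency check on the reported numbers.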

research
12/02/2020

Enhancement of Spatial Clustering-Based Time-Frequency Masks using LSTM Neural Networks

Recent works have shown that Deep Recurrent Neural Networks using the LS...
research
12/02/2020

Improved MVDR Beamforming Using LSTM Speech Models to Clean Spatial Clustering Masks

Spatial clustering techniques can achieve significant multi-channel nois...
research
03/13/2019

Multi-Geometry Spatial Acoustic Modeling for Distant Speech Recognition

The use of spatial information with multiple microphones can improve far...
research
04/02/2019

Unsupervised training of a deep clustering model for multichannel blind source separation

We propose a training scheme to train neural network-based source separa...
research
09/26/2019

An Investigation into the Effectiveness of Enhancement in ASR Training and Test for CHiME-5 Dinner Party Transcription

Despite the strong modeling power of neural network acoustic models, spe...
research
10/13/2021

Comparison of SVD and factorized TDNN approaches for speech to text

This work concentrates on reducing the RTF and word error rate of a hybr...
research
10/04/2017

Combining absolute and relative pointing for fast and accurate distant interaction

Traditional relative pointing devices such as mice and trackpads are uns...
