WaveCRN: An Efficient Convolutional Recurrent Neural Network for End-to-end Speech Enhancement

by   Tsun-An Hsieh, et al.

Due to the simple design pipeline, end-to-end (E2E) neural models for speech enhancement (SE) have attracted great interest. In order to improve the performance of the E2E model, the locality and temporal sequential properties of speech should be efficiently taken into account when modelling. However, in most current E2E models for SE, these properties are either not fully considered, or are too complex to be realized. In this paper, we propose an efficient E2E SE model, termed WaveCRN. In WaveCRN, the speech locality feature is captured by a convolutional neural network (CNN), while the temporal sequential property of the locality feature is modeled by stacked simple recurrent units (SRU). Unlike a conventional temporal sequential model that uses a long short-term memory (LSTM) network, which is difficult to parallelize, SRU can be efficiently parallelized in calculation with even fewer model parameters. In addition, in order to more effectively suppress the noise components in the input noisy speech, we derive a novel restricted feature masking (RFM) approach that performs enhancement on the embedded features in the hidden layers instead of on the physical spectral features commonly used in speech separation tasks. Experimental results on speech denoising and compressed speech restoration tasks confirm that with the lightweight architecture of SRU and the feature-mapping-based RFM, WaveCRN performs comparably with other state-of-the-art approaches with notably reduced model complexity and inference time.


Single Channel Speech Enhancement Using Temporal Convolutional Recurrent Neural Networks

In recent decades, neural network based methods have significantly impro...

Tensor-Train Long Short-Term Memory for Monaural Speech Enhancement

In recent years, Long Short-Term Memory (LSTM) has become a popular choi...

RHR-Net: A Residual Hourglass Recurrent Neural Network for Speech Enhancement

Most current speech enhancement models use spectrogram features that req...

Attention-based multi-task learning for speech-enhancement and speaker-identification in multi-speaker dialogue scenario

Multi-task learning (MTL) and attention mechanism have been proven to ef...

Know Your Enemy, Know Yourself: A Unified Two-Stage Framework for Speech Enhancement

Traditional spectral subtraction-type single channel speech enhancement ...

A fully recurrent feature extraction for single channel speech enhancement

Convolutional neural network (CNN) modules are widely being used to buil...

Increasing Compactness Of Deep Learning Based Speech Enhancement Models With Parameter Pruning And Quantization Techniques

Most recent studies on deep learning based speech enhancement (SE) focus...

Please sign up or login with your details

Forgot password? Click here to reset