Phase Aware Speech Enhancement using Realisation of Complex-valued LSTM

by Raktim Gautam Goswami, et al.

Most deep learning-based speech enhancement (SE) methods estimate the magnitude spectrum of the clean speech signal from the observed noisy speech signal, either by magnitude spectral masking or by regression. These methods reuse the noisy phase when synthesizing the time-domain waveform from the estimated magnitude spectrum. However, recent works have highlighted the importance of phase in SE. One earlier attempt estimated the complex ratio mask (CRM), which takes phase into account, using a complex-valued feed-forward neural network (FFNN). But FFNNs cannot capture the sequential information essential for phase estimation. In this work, we propose a realisation of a complex-valued long short-term memory (RCLSTM) network that estimates the CRM using sequential information along time. The proposed RCLSTM is designed to process complex-valued sequences using complex arithmetic, and hence it preserves the dependencies between the real and imaginary parts of the CRM, and thereby the phase. The proposed method is evaluated on noisy speech mixtures formed from the Voice-Bank corpus and the DEMAND database. Compared with real-valued masking methods, the proposed RCLSTM improves several objective measures, including perceptual evaluation of speech quality (PESQ), in which it improves by over 4.3
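
To make the contrast in the abstract concrete, here is a minimal NumPy sketch of the difference between a magnitude mask (which reuses the noisy phase) and a complex ratio mask (which also corrects the phase). It is an illustration under assumptions, not the paper's implementation: the function names, the toy spectrogram values, and the use of oracle (ideal) masks in place of a network estimate are all hypothetical. The complex product is written out with real-valued operations only, which is one way complex arithmetic can be realised with real numbers.

```python
import numpy as np

def complex_multiply(a_re, a_im, b_re, b_im):
    """(a_re + i a_im)(b_re + i b_im), using only real-valued operations."""
    return a_re * b_re - a_im * b_im, a_re * b_im + a_im * b_re

def ideal_complex_ratio_mask(noisy, clean, eps=1e-8):
    """Oracle CRM M = S / Y, returned as separate real and imaginary parts."""
    denom = noisy.real ** 2 + noisy.imag ** 2 + eps
    m_re = (noisy.real * clean.real + noisy.imag * clean.imag) / denom
    m_im = (noisy.real * clean.imag - noisy.imag * clean.real) / denom
    return m_re, m_im

def apply_magnitude_mask(noisy, clean, eps=1e-8):
    """Oracle magnitude mask |S| / |Y|; the noisy phase is left untouched."""
    mask = np.abs(clean) / (np.abs(noisy) + eps)
    return mask * noisy

# Toy values standing in for a few STFT bins (hypothetical data).
clean = np.array([1.0 + 2.0j, -0.5 + 0.3j, 2.0 - 1.0j])
noise = np.array([0.3 - 0.4j, 0.2 + 0.1j, -0.5 + 0.5j])
noisy = clean + noise

# Complex masking: multiply the noisy spectrum by the CRM.
m_re, m_im = ideal_complex_ratio_mask(noisy, clean)
est_re, est_im = complex_multiply(m_re, m_im, noisy.real, noisy.imag)
crm_est = est_re + 1j * est_im

# Magnitude masking: correct magnitude, but the phase is still the noisy one.
mag_est = apply_magnitude_mask(noisy, clean)

print("CRM error:      ", np.max(np.abs(crm_est - clean)))
print("magnitude error:", np.max(np.abs(mag_est - clean)))
```

With oracle masks, the complex mask recovers the clean bins up to the small `eps` regularisation, while the magnitude mask matches only the magnitudes and leaves a residual error caused by the noisy phase. In a learned system the mask would be predicted by a network rather than computed from the clean signal.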


Phase-aware Speech Enhancement with Deep Complex U-Net

Most deep learning-based models for speech enhancement have mainly focus...

DCCRN: Deep Complex Convolution Recurrent Network for Phase-Aware Speech Enhancement

Speech enhancement has benefited from the success of deep learning in te...

Joint magnitude estimation and phase recovery using Cycle-in-Cycle GAN for non-parallel speech enhancement

For the lack of adequate paired noisy-clean speech corpus in many real s...

Real-time Monaural Speech Enhancement With Short-time Discrete Cosine Transform

Speech enhancement algorithms based on deep learning have been improved ...

Phase-aware Single-stage Speech Denoising and Dereverberation with U-Net

In this work, we tackle a denoising and dereverberation problem with a s...

A Two-stage Complex Network using Cycle-consistent Generative Adversarial Networks for Speech Enhancement

Cycle-consistent generative adversarial networks (CycleGAN) have shown t...