DeepAI AI Chat
Log In Sign Up

Complex spectrogram enhancement by convolutional neural network with multi-metrics learning

by   Szu-Wei Fu, et al.

This paper aims to address two issues existing in the current speech enhancement methods: 1) the difficulty of phase estimations; 2) a single objective function cannot consider multiple metrics simultaneously. To solve the first problem, we propose a novel convolutional neural network (CNN) model for complex spectrogram enhancement, namely estimating clean real and imaginary (RI) spectrograms from noisy ones. The reconstructed RI spectrograms are directly used to synthesize enhanced speech waveforms. In addition, since log-power spectrogram (LPS) can be represented as a function of RI spectrograms, its reconstruction is also considered as another target. Thus a unified objective function, which combines these two targets (reconstruction of RI spectrograms and LPS), is equivalent to simultaneously optimizing two commonly used objective metrics: segmental signal-to-noise ratio (SSNR) and logspectral distortion (LSD). Therefore, the learning process is called multi-metrics learning (MML). Experimental results confirm the effectiveness of the proposed CNN with RI spectrograms and MML in terms of improved standardized evaluation metrics on a speech enhancement task.


A Deep Learning Loss Function based on Auditory Power Compression for Speech Enhancement

Deep learning technology has been widely applied to speech enhancement. ...

Consistency-aware multi-channel speech enhancement using deep neural networks

This paper proposes a deep neural network (DNN)-based multi-channel spee...

Monaural source enhancement maximizing source-to-distortion ratio via automatic differentiation

Recently, deep neural network (DNN) has made a breakthrough in monaural ...

Phase-aware Speech Enhancement with Deep Complex U-Net

Most deep learning-based models for speech enhancement have mainly focus...

Convolutional Neural Network-based Speech Enhancement for Cochlear Implant Recipients

Attempts to develop speech enhancement algorithms with improved speech i...

Using recurrences in time and frequency within U-net architecture for speech enhancement

When designing fully-convolutional neural network, there is a trade-off ...