Complex spectrogram enhancement by convolutional neural network with multi-metrics learning

04/27/2017
by Szu-Wei Fu, et al.

This paper addresses two issues in current speech enhancement methods: 1) the difficulty of phase estimation; 2) the inability of a single objective function to account for multiple metrics simultaneously. To address the first issue, we propose a novel convolutional neural network (CNN) model for complex spectrogram enhancement, that is, estimating clean real and imaginary (RI) spectrograms from noisy ones. The reconstructed RI spectrograms are used directly to synthesize enhanced speech waveforms. In addition, since the log-power spectrogram (LPS) can be expressed as a function of the RI spectrograms, its reconstruction is treated as a second target. A unified objective function that combines these two targets (reconstruction of the RI spectrograms and of the LPS) is therefore equivalent to simultaneously optimizing two commonly used objective metrics: segmental signal-to-noise ratio (SSNR) and log-spectral distortion (LSD). We accordingly call this learning process multi-metrics learning (MML). Experimental results confirm the effectiveness of the proposed CNN with RI spectrograms and MML in terms of improved standardized evaluation metrics on a speech enhancement task.
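The relationship between the two targets can be illustrated with a short sketch. The abstract does not give the exact form of the unified objective, so the following is only one plausible reading: a mean-squared error on the RI spectrograms plus a weighted mean-squared error on the LPS derived from them. The function names, the weight `alpha`, and the stabilizing constant `eps` are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def log_power_spectrogram(real, imag, eps=1e-8):
    """LPS written as a function of the RI spectrograms: log(R^2 + I^2).

    `eps` is an assumed small constant that avoids log(0).
    """
    return np.log(real ** 2 + imag ** 2 + eps)


def mml_loss(est_real, est_imag, clean_real, clean_imag, alpha=0.5, eps=1e-8):
    """Hypothetical combined objective: RI error plus weighted LPS error.

    `alpha` is an assumed trade-off weight between the two targets.
    """
    # Time-frequency error on the real and imaginary parts.
    ri_term = np.mean((est_real - clean_real) ** 2
                      + (est_imag - clean_imag) ** 2)
    # Error on the log-power spectrogram computed from the same RI values.
    lps_est = log_power_spectrogram(est_real, est_imag, eps)
    lps_clean = log_power_spectrogram(clean_real, clean_imag, eps)
    lps_term = np.mean((lps_est - lps_clean) ** 2)
    return ri_term + alpha * lps_term


# Toy usage: frames of a synthetic signal stand in for speech.
rng = np.random.default_rng(0)
clean_frames = rng.standard_normal((10, 512))            # 10 frames, 512 samples
noisy_frames = clean_frames + 0.1 * rng.standard_normal((10, 512))

clean_spec = np.fft.rfft(clean_frames, axis=-1)           # complex spectrogram
noisy_spec = np.fft.rfft(noisy_frames, axis=-1)

loss = mml_loss(noisy_spec.real, noisy_spec.imag,
                clean_spec.real, clean_spec.imag)
print(f"combined loss for the noisy input: {loss:.4f}")
```

Because the LPS term is computed from the same real and imaginary values as the RI term, a single network output can serve both targets, and an estimated RI pair can be turned back into a waveform frame with `np.fft.irfft` without a separate phase estimate.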

research
08/26/2021

A Deep Learning Loss Function based on Auditory Power Compression for Speech Enhancement

Deep learning technology has been widely applied to speech enhancement. ...
research
09/12/2017

End-to-End Waveform Utterance Enhancement for Direct Evaluation Metrics Optimization by Fully Convolutional Neural Networks

Speech enhancement model is used to map a noisy speech to a clean speech...
research
02/14/2020

Consistency-aware multi-channel speech enhancement using deep neural networks

This paper proposes a deep neural network (DNN)-based multi-channel spee...
research
06/15/2018

Monaural source enhancement maximizing source-to-distortion ratio via automatic differentiation

Recently, deep neural network (DNN) has made a breakthrough in monaural ...
research
04/08/2021

MetricGAN+: An Improved Version of MetricGAN for Speech Enhancement

The discrepancy between the cost function used for training a speech enh...
research
07/03/2019

Convolutional Neural Network-based Speech Enhancement for Cochlear Implant Recipients

Attempts to develop speech enhancement algorithms with improved speech i...
research
11/16/2018

Using recurrences in time and frequency within U-net architecture for speech enhancement

When designing fully-convolutional neural network, there is a trade-off ...