Explicit Estimation of Magnitude and Phase Spectra in Parallel for High-Quality Speech Enhancement

08/17/2023
by Ye-Xin Lu, et al.

Phase information has a significant impact on speech perceptual quality and intelligibility. However, existing speech enhancement methods are limited in explicit phase estimation because the phase is non-structural and wrapped, which creates a bottleneck in enhanced speech quality. To overcome this issue, we propose MP-SENet, a novel Speech Enhancement Network that explicitly enhances Magnitude and Phase spectra in parallel. MP-SENet adopts a codec (encoder-decoder) architecture in which the encoder and decoder are bridged by time-frequency Transformers operating along both the time and frequency dimensions. The encoder encodes time-frequency representations derived from the input distorted magnitude and phase spectra. The decoder consists of dual-stream magnitude and phase decoders, which directly enhance the magnitude and wrapped phase spectra using a magnitude estimation architecture and a phase parallel estimation architecture, respectively. To train MP-SENet effectively, we define multi-level loss functions: mean square error and perceptual metric loss on the magnitude spectra, anti-wrapping loss on the phase spectra, and mean square error and consistency loss on the short-time complex spectra. Experimental results show that MP-SENet delivers high-quality speech enhancement across multiple tasks, including speech denoising, dereverberation, and bandwidth extension. Compared with existing phase-aware speech enhancement methods, it avoids the bidirectional compensation effect between magnitude and phase, leading to better harmonic restoration. Notably, on the speech denoising task, MP-SENet achieves state-of-the-art performance, with a PESQ of 3.60 on the public VoiceBank+DEMAND dataset.
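To make the phase-related ideas more concrete, here is a minimal NumPy sketch of two of them: a parallel phase estimation step, which turns two network outputs (treated as pseudo-real and pseudo-imaginary parts) into a wrapped phase with a four-quadrant arctangent, and an anti-wrapping phase loss. Several details are assumptions, not taken from the abstract. The split of the anti-wrapping loss into instantaneous phase (IP), group delay (GD) and instantaneous angular frequency (IAF) terms follows our reading of the related phase-prediction work. The power-law magnitude compression exponent of 0.3 is also assumed. All function names (`anti_wrap`, `parallel_phase`, `anti_wrapping_phase_losses`, `to_complex`) are hypothetical and illustrative. This is not the authors' implementation.

```python
import numpy as np


def anti_wrap(x: np.ndarray) -> np.ndarray:
    """Anti-wrapping function: distance from x to the nearest multiple of 2*pi.

    Errors of exactly 2*pi*k (k an integer) map to zero, so a prediction that
    differs from the target only by whole turns is not penalized.
    """
    return np.abs(x - 2 * np.pi * np.round(x / (2 * np.pi)))


def parallel_phase(pseudo_real: np.ndarray, pseudo_imag: np.ndarray) -> np.ndarray:
    """Combine two parallel outputs into a wrapped phase in [-pi, pi].

    np.arctan2 resolves all four quadrants, so the network never has to
    regress the wrapped phase value directly.
    """
    return np.arctan2(pseudo_imag, pseudo_real)


def anti_wrapping_phase_losses(pred: np.ndarray, target: np.ndarray) -> dict:
    """Assumed anti-wrapping losses for phase arrays shaped (freq_bins, frames).

    ip:  instantaneous phase error
    gd:  group delay error (phase differences along the frequency axis)
    iaf: instantaneous angular frequency error (differences along the time axis)
    """
    ip = anti_wrap(pred - target).mean()
    gd = anti_wrap(np.diff(pred, axis=0) - np.diff(target, axis=0)).mean()
    iaf = anti_wrap(np.diff(pred, axis=1) - np.diff(target, axis=1)).mean()
    return {"ip": float(ip), "gd": float(gd), "iaf": float(iaf)}


def to_complex(mag_compressed: np.ndarray, phase: np.ndarray, c: float = 0.3) -> np.ndarray:
    """Undo power-law magnitude compression (assumed exponent c) and rebuild
    a short-time complex spectrum from the magnitude and phase streams."""
    return np.power(mag_compressed, 1.0 / c) * np.exp(1j * phase)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.uniform(-np.pi, np.pi, size=(201, 100))  # (freq_bins, frames)

    # Off by exactly one full turn: wrapped back to (numerically) zero loss.
    print(anti_wrapping_phase_losses(target + 2 * np.pi, target))

    # Constant offset of 0.3 rad: penalized by IP, but not by GD or IAF.
    print(anti_wrapping_phase_losses(target + 0.3, target))
```

The second example shows why the differential terms complement the instantaneous one. A constant phase offset leaves phase differences along frequency and time unchanged, so only the IP term reacts to it. Errors that distort the local phase structure (and therefore harmonics) are picked up by the GD and IAF terms.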
