Phase-Aware Deep Speech Enhancement: It's All About The Frame Length

03/30/2022
by Tal Peer, et al.

While phase-aware speech processing has received increasing attention in recent years, most narrowband STFT approaches with frame lengths of about 32 ms show only a modest impact of phase on overall performance. At the same time, modern deep neural network (DNN)-based approaches such as Conv-TasNet, which implicitly modify both magnitude and phase, achieve strong performance on very short frames (2 ms). Motivated by this observation, in this paper we systematically investigate the role of phase and magnitude in DNN-based speech enhancement for different frame lengths. The results show that a phase-aware DNN can take advantage of what previous studies on the reconstruction of clean speech have shown: when using short frames, the phase spectrum becomes more important while the importance of the magnitude spectrum decreases. Furthermore, our experiments show that when both magnitude and phase are estimated, shorter frames considerably improve the performance of a DNN with explicit phase estimation. In contrast, in the phase-blind case, where only magnitudes are processed, 32 ms frames lead to the best performance. We conclude that DNN-based phase estimation benefits from the use of shorter frames and recommend a frame length of about 4 ms for future phase-aware deep speech enhancement methods.
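
To make the frame lengths in the abstract concrete, the sketch below converts a frame length in milliseconds into STFT parameters. It is only an illustration: the 16 kHz sampling rate, the 50% overlap, and the helper name `stft_frame_params` are assumptions made here and are not taken from the paper.

```python
def stft_frame_params(frame_ms, sample_rate=16000, overlap=0.5):
    """Derive basic STFT sizes from a frame length in milliseconds.

    sample_rate and overlap are illustrative assumptions, not values
    stated in the abstract.
    """
    # Number of samples covered by one analysis frame.
    frame_len = int(round(frame_ms * sample_rate / 1000))
    # Hop size between consecutive frames for the given overlap.
    hop = max(1, int(round(frame_len * (1 - overlap))))
    # One-sided spectrum of a real signal (FFT size equal to frame length).
    n_bins = frame_len // 2 + 1
    # Spacing between adjacent frequency bins in Hz.
    bin_hz = sample_rate / frame_len
    return {"frame_len": frame_len, "hop": hop, "n_bins": n_bins, "bin_hz": bin_hz}


if __name__ == "__main__":
    # Frame lengths mentioned in the abstract: 2 ms, 4 ms, and 32 ms.
    for ms in (2, 4, 32):
        print(ms, "ms ->", stft_frame_params(ms))
```

Under these assumptions, a 32 ms frame spans 512 samples with 31.25 Hz between bins, while a 4 ms frame spans only 64 samples with 250 Hz between bins. Shorter frames therefore trade frequency resolution for time resolution, which is the regime in which the abstract reports phase becoming more important.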

research
02/24/2022

Phase Continuity: Learning Derivatives of Phase Spectrum for Speech Enhancement

Modern neural speech enhancement models usually include various forms of...
research
03/03/2018

Speech Enhancement Based on Non-stationary Noise-driven Geometric Spectral Subtraction and Phase Spectrum Compensation

In this paper, a speech enhancement method based on noise compensation p...
research
04/15/2022

Improving Frame-Online Neural Speech Enhancement with Overlapped-Frame Prediction

Frame-online speech enhancement systems in the short-time Fourier transf...
research
02/10/2022

Auditory Model based Phase-Aware Bayesian Spectral Amplitude Estimator for Single-Channel Speech Enhancement

Bayesian estimation of short-time spectral amplitude is one of the most ...
research
01/02/2020

Phase-based Information for Voice Pathology Detection

In most current approaches of speech processing, information is extracte...
research
06/23/2022

Efficient Transformer-based Speech Enhancement Using Long Frames and STFT Magnitudes

The SepFormer architecture shows very good results in speech separation....
research
11/12/2019

PHASEN: A Phase-and-Harmonics-Aware Speech Enhancement Network

Time-frequency (T-F) domain masking is a mainstream approach for single-...
