PHASEN: A Phase-and-Harmonics-Aware Speech Enhancement Network

11/12/2019
by Dacheng Yin, et al.

Time-frequency (T-F) domain masking is a mainstream approach for single-channel speech enhancement. Recently, attention has turned to phase prediction in addition to amplitude prediction. In this paper, we propose a phase-and-harmonics-aware deep neural network (DNN), named PHASEN, for this task. Unlike previous methods that directly use a complex ideal ratio mask to supervise DNN learning, we design a two-stream network in which an amplitude stream and a phase stream are dedicated to amplitude and phase prediction, respectively. We find that the two streams should communicate with each other, and that this communication is crucial to phase prediction. In addition, we propose frequency transformation blocks to capture long-range correlations along the frequency axis. Visualization shows that the learned transformation matrix spontaneously captures the harmonic correlation, which has been proven to be helpful for T-F spectrogram reconstruction. With these two innovations, PHASEN acquires the ability to handle detailed phase patterns and to utilize harmonic patterns, achieving a 1.76 dB SDR improvement on the AVSpeech + AudioSet dataset. It also achieves significant gains over Google's network on this dataset. On the Voice Bank + DEMAND dataset, PHASEN outperforms previous methods by a large margin on four metrics.
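
As a rough illustration of the two ideas named in the abstract, the NumPy sketch below applies a frequency transformation matrix along the frequency axis of a feature map and then combines an amplitude mask with a predicted phase. The abstract gives no formulas, so everything here is an assumption made for illustration: the function names, the tensor shapes, and the way the output is formed (noisy magnitude times mask times unit-norm phase) are hypothetical and are not taken from the PHASEN implementation.

```python
import numpy as np


def apply_frequency_transform(features, ftm):
    """Mix information across frequency bins (hypothetical helper).

    features: real array of shape (channels, time, freq)
    ftm:      real array of shape (freq, freq), learned in a real network

    Each output bin is a weighted sum over all input bins, so a single
    layer can relate a bin to distant bins such as its harmonics.
    """
    return features @ ftm


def combine_streams(noisy_stft, amp_mask, phase_pred, eps=1e-8):
    """Form an enhanced spectrogram from two stream outputs (assumed form).

    noisy_stft: complex array (time, freq), the noisy input spectrogram
    amp_mask:   real non-negative array (time, freq), amplitude stream output
    phase_pred: complex array (time, freq), phase stream output
    """
    # Normalize the phase prediction to unit magnitude so that it only
    # carries phase information.
    unit_phase = phase_pred / (np.abs(phase_pred) + eps)
    # Assumption: scale the noisy magnitude by the mask, then attach the
    # predicted phase.
    return np.abs(noisy_stft) * amp_mask * unit_phase


# Toy usage with random data standing in for network outputs.
rng = np.random.default_rng(0)
feats = rng.standard_normal((2, 5, 8))      # (channels, time, freq)
ftm = rng.standard_normal((8, 8))           # (freq, freq)
mixed = apply_frequency_transform(feats, ftm)

noisy = rng.standard_normal((5, 8)) + 1j * rng.standard_normal((5, 8))
mask = rng.uniform(0.0, 1.0, size=(5, 8))
phase = rng.standard_normal((5, 8)) + 1j * rng.standard_normal((5, 8))
enhanced = combine_streams(noisy, mask, phase)
```

In this sketch the matrix multiplication is what lets every frequency bin see every other bin in one step, in contrast to a small convolution kernel that only sees neighbouring bins.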


