Phasebook and Friends: Leveraging Discrete Representations for Source Separation

10/02/2018
by   Jonathan Le Roux, et al.
0

Deep learning based speech enhancement and source separation systems have recently reached unprecedented levels of quality, to the point that performance is reaching a new ceiling. Most systems rely on estimating the magnitude of a target source by estimating a real-valued mask to be applied to a time-frequency representation of the mixture signal. A limiting factor in such approaches is a lack of phase estimation: the phase of the mixture is most often used when reconstructing the estimated time-domain signal. Here, we propose `MagBook', `phasebook', and `Combook', three new types of layers based on discrete representations that can be used to estimate complex time-frequency masks. MagBook layers extend classical sigmoidal units and a recently introduced convex softmax activation for mask-based magnitude estimation. Phasebook layers use a similar structure to give an estimate of the phase mask without suffering from phase wrapping issues. Combook layers are an alternative to the MagBook-Phasebook combination that directly estimate complex masks. We present various training and inference regimes involving these representations, and explain in particular how to include them in an end-to-end learning framework. We also present an oracle study to assess upper bounds on performance for various types of masks using discrete phase representations. We evaluate the proposed methods on the wsj0-2mix dataset, a well-studied corpus for single-channel speaker-independent speaker separation, matching the performance of state-of-the-art mask-based approaches without requiring additional phase reconstruction steps.

READ FULL TEXT

page 1

page 8

research
03/23/2021

Learned complex masks for multi-instrument source separation

Music source separation in the time-frequency domain is commonly achieve...
research
06/15/2022

On the Use of Deep Mask Estimation Module for Neural Source Separation Systems

Most of the recent neural source separation systems rely on a masking-ba...
research
04/26/2018

End-to-End Speech Separation with Unfolded Iterative Phase Reconstruction

This paper proposes an end-to-end approach for single-channel speaker-in...
research
11/22/2018

Deep Learning Based Phase Reconstruction for Speaker Separation: A Trigonometric Perspective

This study investigates phase reconstruction for deep learning based mon...
research
04/17/2019

Deep Filtering: Signal Extraction Using Complex Time-Frequency Filters

Signal extraction from a single-channel mixture with additional undesire...
research
11/22/2022

SkipConvGAN: Monaural Speech Dereverberation using Generative Adversarial Networks via Complex Time-Frequency Masking

With the advancements in deep learning approaches, the performance of sp...
research
10/23/2019

Filterbank design for end-to-end speech separation

Single-channel speech separation has recently made great progress thanks...

Please sign up or login with your details

Forgot password? Click here to reset