A Multi-Phase Gammatone Filterbank for Speech Separation via TasNet

10/25/2019
by   David Ditter, et al.
0

In this work, we investigate if the learned encoder of the end-to-end convolutional time domain audio separation network (Conv-TasNet) is the key to its recent success, or if the encoder can just as well be replaced by a deterministic hand-crafted filterbank. Motivated by the resemblance of the trained encoder of Conv-TasNet to auditory filterbanks, we propose to employ a deterministic gammatone filterbank. In contrast to a common gammatone filterbank, our filters are restricted to 2 ms length to allow for low-latency processing. Inspired by the encoder learned by Conv-TasNet, in addition to the logarithmically spaced filters, the proposed filterbank holds multiple gammatone filters at the same center frequency with varying phase shifts. We show that replacing the learned encoder with our proposed multi-phase gammatone filterbank (MP-GTF) even leads to a scale-invariant source-to-noise ratio (SI-SNR) improvement of 0.8 dB. Furthermore, in contrast to using the learned encoder we show that the number of filters can be reduced from 512 to 128 without loss of performance.

READ FULL TEXT
research
03/09/2020

Enhancing End-to-End Multi-channel Speech Separation via Spatial Feature Learning

Hand-crafted spatial features (e.g., inter-channel phase difference, IPD...
research
09/29/2019

FaSNet: Low-latency Adaptive Beamforming for Multi-microphone Audio Processing

Beamforming has been extensively investigated for multi-channel audio pr...
research
01/26/2022

SkiM: Skipping Memory LSTM for Low-Latency Real-Time Continuous Speech Separation

Continuous speech separation for meeting pre-processing has recently bec...
research
11/29/2020

A comparison of handcrafted, parameterized, and learnable features for speech separation

The design of acoustic features is important for speech separation. It c...
research
04/12/2022

Low Latency Time Domain Multichannel Speech and Music Source Separation

The Goal is to obtain a simple multichannel source separation with very ...
research
06/20/2022

An Empirical Analysis on the Vulnerabilities of End-to-End Speech Segregation Models

End-to-end learning models have demonstrated a remarkable capability in ...
research
10/28/2022

UX-NET: Filter-and-Process-based Improved U-Net for Real-time Time-domain Audio Separation

This study presents UX-Net, a time-domain audio separation network (TasN...

Please sign up or login with your details

Forgot password? Click here to reset