A comprehensive study of speech separation: spectrogram vs waveform separation

05/17/2019
by Fahimeh Bahmaninezhad, et al.

Speech separation has been studied widely for single-channel close-talk recordings over the past few years; the solutions developed so far operate mostly in the frequency domain. Recently, a raw audio waveform separation network (TasNet) was introduced for single-channel data, achieving high Si-SNR (scale-invariant source-to-noise ratio) and SDR (source-to-distortion ratio) compared with the state-of-the-art frequency-domain solution. In this study, we incorporate effective components of TasNet into a frequency-domain separation method and compare the two approaches under alternative scenarios. We introduce a solution for directly optimizing the separation criterion in frequency-domain networks. In addition to objective and subjective speech separation measurements, we also evaluate separation performance on a speech recognition task. We study the speech separation problem for far-field data (more similar to naturalistic audio streams) and develop multi-channel solutions for both frequency-domain and time-domain separators that utilize spectral, spatial and speaker location information. For our experiments, we simulated a multi-channel, spatialized, reverberant version of the WSJ0-2mix dataset. Our experimental results show that spectrogram separation can achieve competitive performance with better network design. With the multi-channel framework as well, we can obtain relatively up to +35.5
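As a rough illustration of the Si-SNR measure named in the abstract, the sketch below uses the commonly cited definition: both signals are made zero-mean, the estimate is projected onto the target, and the energy of that projection is compared with the energy of the residual. This is a minimal sketch, not the authors' implementation; the function name `si_snr`, the `eps` stabilizer and the pure-Python list representation are assumptions made here for illustration.

```python
import math


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between two equal-length 1-D signals.

    Hypothetical helper for illustration; `eps` is an assumed
    numerical-stability constant, not a value taken from the paper.
    """
    if len(estimate) != len(target):
        raise ValueError("estimate and target must have the same length")

    n = len(target)
    # Remove the mean so a constant offset does not affect the score.
    est_mean = sum(estimate) / n
    tgt_mean = sum(target) / n
    est = [x - est_mean for x in estimate]
    tgt = [x - tgt_mean for x in target]

    # Project the estimate onto the target signal.
    dot = sum(e * t for e, t in zip(est, tgt))
    tgt_energy = sum(t * t for t in tgt) + eps
    scale = dot / tgt_energy
    projection = [scale * t for t in tgt]

    # Everything not explained by the target counts as noise.
    noise = [e - p for e, p in zip(est, projection)]
    proj_energy = sum(p * p for p in projection)
    noise_energy = sum(x * x for x in noise) + eps

    return 10.0 * math.log10(proj_energy / noise_energy + eps)


# Usage: a clean sinusoid and two noisy estimates of it.
target = [math.sin(0.1 * i) for i in range(200)]
mild = [t + 0.1 * math.cos(0.37 * i) for i, t in enumerate(target)]
heavy = [t + 0.5 * math.cos(0.37 * i) for i, t in enumerate(target)]
print(si_snr(mild, target), si_snr(heavy, target))
```

Because the estimate is projected onto the target before the ratio is taken, rescaling the estimate leaves the score essentially unchanged, which is what makes the measure scale-invariant. A training loss built on it would typically use the negative of this value.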


research
12/17/2019

A Unified Framework for Speech Separation

Speech separation refers to extracting each individual speech source in ...
research
11/20/2019

Demystifying TasNet: A Dissecting Approach

In recent years time domain speech separation has excelled over frequenc...
research
02/21/2023

DasFormer: Deep Alternating Spectrogram Transformer for Multi/Single-Channel Speech Separation

For the task of speech separation, previous study usually treats multi-c...
research
12/30/2021

Feature extraction with mel scale separation method on noise audio recordings

This paper focuses on improving the accuracy of noise audio recordings. ...
research
06/05/2021

Lightweight Dual-channel Target Speaker Separation for Mobile Voice Communication

Nowadays, there is a strong need to deploy the target speaker separation...
research
05/23/2020

Efficient Integration of Multi-channel Information for Speaker-independent Speech Separation

Although deep-learning-based methods have markedly improved the performa...
research
10/27/2022

CasNet: Investigating Channel Robustness for Speech Separation

Recording channel mismatch between training and testing conditions has b...
