Rethinking the Separation Layers in Speech Separation Networks

11/17/2020
by Yi Luo, et al.

Modules in all existing speech separation networks can be categorized into single-input-multi-output (SIMO) modules and single-input-single-output (SISO) modules. SIMO modules generate more outputs than inputs, while SISO modules keep the numbers of inputs and outputs the same. While the majority of separation models contain only SIMO architectures, it has also been shown that certain two-stage separation systems integrated with a post-enhancement SISO module can improve the separation quality. Why can performance improvements be achieved by incorporating SISO modules? Are SIMO modules always necessary? In this paper, we empirically examine these questions by designing models with varying configurations of the SIMO and SISO modules. We show that, compared with the standard SIMO-only design, a mixed SIMO-SISO design with the same model size improves separation performance, especially under low-overlap conditions. We further examine the necessity of SIMO modules and show that SISO-only models are still able to perform separation without sacrificing performance. These observations allow us to rethink the model design paradigm and present different views on how separation is performed.
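To make the SIMO/SISO distinction concrete, the following is a minimal toy sketch, not the models studied in the paper. It illustrates only the input/output counts described above: a SIMO stage turns one mixture into several streams, and a SISO stage processes each stream while keeping the number of streams unchanged. The function names (`simo_module`, `siso_module`, `mixed_simo_siso`), the fixed gains, and the moving-average smoothing are all hypothetical stand-ins for learned layers.

```python
from typing import List

Signal = List[float]


def simo_module(mixture: Signal, num_sources: int) -> List[Signal]:
    """Toy SIMO stage: one input signal -> num_sources output signals.

    Assumption: fixed per-source gains stand in for a learned separator.
    The gains sum to 1, so the outputs add back up to the mixture.
    """
    total = sum(range(1, num_sources + 1))
    gains = [(k + 1) / total for k in range(num_sources)]
    return [[g * x for x in mixture] for g in gains]


def siso_module(streams: List[Signal]) -> List[Signal]:
    """Toy SISO stage: N input signals -> N output signals.

    Assumption: a 3-point moving average stands in for a learned
    post-enhancement layer applied to each stream independently.
    """
    def smooth(s: Signal) -> Signal:
        out = []
        for i in range(len(s)):
            window = s[max(0, i - 1): i + 2]  # shorter window at the edges
            out.append(sum(window) / len(window))
        return out

    return [smooth(s) for s in streams]


def mixed_simo_siso(mixture: Signal, num_sources: int) -> List[Signal]:
    """Mixed design: a SIMO stage followed by a SISO stage."""
    return siso_module(simo_module(mixture, num_sources))


if __name__ == "__main__":
    mix = [0.0, 3.0, 6.0, 3.0]
    separated = mixed_simo_siso(mix, num_sources=2)
    print(len(separated), [len(s) for s in separated])
```

In this sketch the only place where the number of streams changes is the SIMO stage; everything after it preserves the stream count, which is the structural property the abstract uses to define the two module types.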


research
11/18/2019

Alternating Between Spectral and Spatial Estimation for Speech Separation and Enhancement

This work investigates alternation between spectral separation using mas...
research
04/30/2023

Drinfeld modules in SageMath

We present the first implementation of Drinfeld modules fully integrated...
research
02/19/2021

TransMask: A Compact and Fast Speech Separation Model Based on Transformer

Speech separation is an important problem in speech processing, which ta...
research
11/06/2018

Building Corpora for Single-Channel Speech Separation Across Multiple Domains

To date, the bulk of research on single-channel speech separation has be...
research
10/28/2022

UX-NET: Filter-and-Process-based Improved U-Net for Real-time Time-domain Audio Separation

This study presents UX-Net, a time-domain audio separation network (TasN...
research
03/22/2019

Nonmodular architectures of cognitive systems based on active inference

In psychology and neuroscience it is common to describe cognitive system...
research
11/17/2020

Ultra-Lightweight Speech Separation via Group Communication

Model size and complexity remain the biggest challenges in the deploymen...
