Multi-Dimensional and Multi-Scale Modeling for Speech Separation Optimized by Discriminative Learning

03/07/2023
by   Zhaoxi Mu, et al.
0

Transformer has shown advanced performance in speech separation, benefiting from its ability to capture global features. However, capturing local features and channel information of audio sequences in speech separation is equally important. In this paper, we present a novel approach named Intra-SE-Conformer and Inter-Transformer (ISCIT) for speech separation. Specifically, we design a new network SE-Conformer that can model audio sequences in multiple dimensions and scales, and apply it to the dual-path speech separation framework. Furthermore, we propose Multi-Block Feature Aggregation to improve the separation effect by selectively utilizing information from the intermediate blocks of the separation network. Meanwhile, we propose a speaker similarity discriminative loss to optimize the speech separation model to address the problem of poor performance when speakers have similar voices. Experimental results on the benchmark datasets WSJ0-2mix and WHAM! show that ISCIT can achieve state-of-the-art results.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
12/14/2022

Multi-Scale Feature Fusion Transformer Network for End-to-End Single Channel Speech Separation

Recently studies on time-domain audio separation networks (TasNets) have...
research
03/07/2023

A Multi-Stage Triple-Path Method for Speech Separation in Noisy and Reverberant Environments

In noisy and reverberant environments, the performance of deep learning-...
research
07/29/2023

Monaural Multi-Speaker Speech Separation Using Efficient Transformer Model

Cocktail party problem is the scenario where it is difficult to separate...
research
03/25/2022

Embedding Recurrent Layers with Dual-Path Strategy in a Variant of Convolutional Network for Speaker-Independent Speech Separation

Speaker-independent speech separation has achieved remarkable performanc...
research
10/23/2020

Don't shoot butterfly with rifles: Multi-channel Continuous Speech Separation with Early Exit Transformer

With its strong modeling capacity that comes from a multi-head and multi...
research
06/03/2016

Automatic Separation of Compound Figures in Scientific Articles

Content-based analysis and retrieval of digital images found in scientif...
research
02/21/2023

DasFormer: Deep Alternating Spectrogram Transformer for Multi/Single-Channel Speech Separation

For the task of speech separation, previous study usually treats multi-c...

Please sign up or login with your details

Forgot password? Click here to reset