Speaker and Direction Inferred Dual-channel Speech Separation

02/08/2021
by   Chenxing Li, et al.

Most speech separation methods try to separate all channel sources simultaneously and are still far from having enough generalization capability for real scenarios, where the number of input sounds is usually uncertain and even dynamic. In this work, we employ ideas from binaural auditory attention and propose a speaker and direction inferred speech separation network (dubbed SDNet) to solve the cocktail party problem. Specifically, our SDNet first parses out, in a sequential manner, the perceptual representation of each source in the mixture of the scene together with its speaker and direction characteristics. These perceptual representations are then utilized to attend to each corresponding speech. Our model generates more precise perceptual representations with the help of spatial features and successfully handles both the unknown number of sources and the selection of outputs. Experiments on the standard fully-overlapped speech separation benchmarks WSJ0-2mix, WSJ0-3mix, and WSJ0-2&3mix show the effectiveness of our method, which achieves SDR improvements of 25.31 dB, 17.26 dB, and 21.56 dB under anechoic settings. Our code will be released at https://github.com/aispeech-lab/SDNet.
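The abstract's core idea, extracting sources one at a time with a stop criterion rather than predicting a fixed number of outputs, can be illustrated with a toy sketch. The code below is not the SDNet architecture: it replaces the learned "perceptual representation" with the dominant FFT bin of the residual and the attention step with a spectral mask, purely to show how sequential extraction sidesteps the unknown-source-count problem. All names and thresholds are illustrative assumptions.

```python
import numpy as np

def sequential_separation(mixture, max_sources=4, energy_thresh=1e-3):
    """Toy sequential source extraction (illustrative, not SDNet).

    Repeatedly estimate a crude 'representation' of the strongest
    remaining source (its dominant frequency bin), attend to it with a
    binary spectral mask, subtract the extracted component, and stop
    when the residual energy falls below a threshold -- so the number
    of outputs is decided at inference time, not fixed in advance.
    """
    residual = np.asarray(mixture, dtype=float).copy()
    n = len(residual)
    sources = []
    for _ in range(max_sources):
        if np.mean(residual ** 2) < energy_thresh:
            break  # stop criterion handles the unknown number of sources
        spec = np.fft.rfft(residual)
        k = np.argmax(np.abs(spec))           # crude "representation": dominant bin
        mask = np.zeros_like(spec)
        mask[k] = 1.0                          # "attend" to that component only
        est = np.fft.irfft(spec * mask, n=n)   # extracted source estimate
        sources.append(est)
        residual = residual - est              # remove it and continue
    return sources

# Usage: a mixture of two sinusoids; the loop extracts two components
# and then stops on the near-zero residual.
t = np.linspace(0, 1, 1024, endpoint=False)
mix = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 40 * t)
srcs = sequential_separation(mix)
```

The real model, of course, replaces the dominant-bin heuristic with learned speaker and direction embeddings from dual-channel input, but the control flow, extract-subtract-repeat with a learned stopping decision, is the same shape.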


