Divide and Conquer: A Deep CASA Approach to Talker-independent Monaural Speaker Separation

04/25/2019
by Yuzhou Liu, et al.

We address talker-independent monaural speaker separation from the perspectives of deep learning and computational auditory scene analysis (CASA). Specifically, we decompose the multi-speaker separation task into two stages: simultaneous grouping and sequential grouping. Simultaneous grouping is performed first, in each time frame, by separating the spectra of different speakers with a neural network trained using permutation-invariant training. In the second stage, a clustering network sequentially groups the frame-level separated spectra into different speakers. The proposed deep CASA approach optimizes frame-level separation and speaker tracking in turn, and produces excellent results for both objectives. Experimental results on the benchmark WSJ0-2mix database show that the new approach achieves state-of-the-art results with a modest model size.
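
The frame-level permutation-invariant objective behind the simultaneous grouping stage can be illustrated with a minimal NumPy sketch for the two-speaker case. This is not the authors' implementation: the function names `frame_level_pit` and `regroup`, the squared-error criterion, and the `(2, T, F)` array layout are all assumptions made for illustration. The per-frame assignment is computed here from reference spectra, which are available only during training; at test time it is this assignment that the sequential grouping stage would have to estimate.

```python
import numpy as np


def frame_level_pit(est, ref):
    """Frame-level permutation-invariant loss for two speakers (illustrative).

    est, ref: arrays of shape (2, T, F) holding two output streams and two
    reference magnitude spectra (hypothetical layout).
    Returns (loss, swap), where swap[t] is True when frame t is better
    explained by exchanging the two output streams.
    """
    # Error when output 0 is paired with speaker 0 and output 1 with speaker 1.
    err_keep = ((est[0] - ref[0]) ** 2 + (est[1] - ref[1]) ** 2).sum(axis=-1)
    # Error when the two outputs are exchanged.
    err_swap = ((est[0] - ref[1]) ** 2 + (est[1] - ref[0]) ** 2).sum(axis=-1)
    swap = err_swap < err_keep
    # The loss keeps the cheaper pairing independently in every frame.
    loss = np.minimum(err_keep, err_swap).mean()
    return loss, swap


def regroup(est, swap):
    """Exchange the two streams in the frames flagged by swap."""
    out = est.copy()
    out[0, swap] = est[1, swap]
    out[1, swap] = est[0, swap]
    return out
```

Because the pairing is chosen independently per frame, a single output stream may alternate between speakers over time. Calling `regroup(est, swap)` with a correct assignment restores one speaker per stream, which is why a separate speaker-tracking step is needed after frame-level separation.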

research
07/14/2021

Localization Based Sequential Grouping for Continuous Speech Separation

This study investigates robust speaker localization for continuous spee...
research
10/08/2021

Location-based training for multi-channel talker-independent speaker separation

Permutation-invariant training (PIT) is a dominant approach for addressi...
research
03/13/2023

Online Binaural Speech Separation of Moving Speakers With a Wavesplit Network

Binaural speech separation in real-world scenarios often involves moving...
research
07/23/2019

Discriminative Learning for Monaural Speech Separation Using Deep Embedding Features

Deep clustering (DC) and utterance-level permutation invariant training ...
research
03/17/2020

High-Resolution Speaker Counting In Reverberant Rooms Using CRNN With Ambisonics Features

Speaker counting is the task of estimating the number of people that are...
research
09/02/2020

SAGRNN: Self-Attentive Gated RNN for Binaural Speaker Separation with Interaural Cue Preservation

Most existing deep learning based binaural speaker separation systems fo...
research
01/06/2021

Multichannel CRNN for Speaker Counting: an Analysis of Performance

Speaker counting is the task of estimating the number of people that are...
