Recursive speech separation for unknown number of speakers

04/05/2019
by Naoya Takahashi, et al.

In this paper, we propose a method for single-channel, speaker-independent, multi-speaker speech separation when the number of speakers is unknown. Previous works assume that the number of speakers is known in advance and train a separate model for each speaker count. In contrast, our method handles mixtures with different numbers of speakers using a single model that recursively separates one speaker at a time. To make the separation model recursively applicable, we propose one-and-rest permutation invariant training (OR-PIT). Evaluation on the WSJ0-2mix and WSJ0-3mix datasets shows that the proposed method achieves state-of-the-art results for two- and three-speaker mixtures with a single model. Moreover, the same model can separate four-speaker mixtures, which were never seen during training. We further propose detecting the number of speakers in a mixture during recursive separation, and show that this approach estimates the number of speakers more accurately than detecting it in advance with a deep-neural-network-based classifier.
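The recursive idea described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the abstract gives no formulas or code, so `or_pit_loss`, `recursive_separate`, `separate_one`, and `should_stop` are hypothetical names, the mean squared error stands in for whatever signal-level loss the authors use, the `1 / (N - 1)` weighting of the rest term is one plausible choice, and the stopping rule is left as a user-supplied callable rather than the authors' detector.

```python
import numpy as np


def or_pit_loss(est_one, est_rest, sources):
    """One-and-rest PIT loss sketch (assumed form, MSE as a stand-in loss).

    est_one:  estimate of a single speaker, shape (T,)
    est_rest: estimate of the sum of all remaining speakers, shape (T,)
    sources:  reference signals, shape (N, T) with N >= 2
    Returns the smallest loss over all choices of the "one" speaker and
    the index of that speaker.
    """
    n = sources.shape[0]
    total = sources.sum(axis=0)
    losses = []
    for i in range(n):
        rest_ref = total - sources[i]  # sum of every speaker except i
        loss_one = np.mean((est_one - sources[i]) ** 2)
        loss_rest = np.mean((est_rest - rest_ref) ** 2)
        # Assumed weighting: scale the rest term by 1 / (N - 1).
        losses.append(loss_one + loss_rest / (n - 1))
    best = int(np.argmin(losses))
    return losses[best], best


def recursive_separate(mixture, separate_one, should_stop, max_speakers=10):
    """Peel off one speaker per step until the stopping rule fires.

    separate_one: hypothetical callable, signal -> (one_speaker, residual)
    should_stop:  hypothetical callable, residual -> bool
    """
    speakers = []
    residual = mixture
    for _ in range(max_speakers):
        one, rest = separate_one(residual)
        speakers.append(one)
        if should_stop(rest):
            break
        residual = rest
    return speakers
```

In this sketch the number of speakers is never passed in: it falls out as the number of recursion steps taken before `should_stop` returns true, which mirrors the abstract's point that speaker counting can happen during separation rather than beforehand.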


