Coarse-to-Fine Recursive Speech Separation for Unknown Number of Speakers

03/30/2022
by Zhenhao Jin, et al.

The vast majority of speech separation methods assume that the number of speakers is known in advance, so each model is tied to a specific speaker count. A more realistic and challenging task is to separate a mixture in which the number of speakers is unknown. This paper formulates speech separation with an unknown number of speakers as a multi-pass source extraction problem and proposes a coarse-to-fine recursive speech separation method. The method comprises two stages: recursive cue extraction and target speaker extraction. The recursive cue extraction stage monitors statistics of the mixture to determine how many iterations to perform, and outputs a coarse cue speech signal at each iteration. As the number of recursive iterations increases, distortion accumulates in both the extracted speech and the remainder. Therefore, in the second stage, a target speaker extraction network extracts a fine speech signal based on the coarse target cue and the original, distortionless mixture. Experiments show that the proposed method achieves state-of-the-art performance on the WSJ0 dataset with varying numbers of speakers. Furthermore, it generalizes well to larger numbers of speakers not seen during training.
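The two-stage control flow described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the `energy_threshold` stopping rule (the abstract only says the stage monitors "statistics in the mixture"), and the toy extractor and refiner are all assumptions made for the example. In the paper both stages are neural networks; here they are replaced by simple stand-ins operating on sources that occupy disjoint time segments.

```python
from typing import Callable, List, Tuple

import numpy as np

Array = np.ndarray


def coarse_to_fine_separate(
    mixture: Array,
    extract_cue: Callable[[Array], Tuple[Array, Array]],
    refine: Callable[[Array, Array], Array],
    energy_threshold: float = 1e-3,  # assumed stopping statistic, not from the paper
    max_speakers: int = 10,          # assumed safety cap on the recursion
) -> List[Array]:
    """Hypothetical driver for the two stages described in the abstract."""
    residual = mixture
    cues: List[Array] = []
    mixture_energy = float(np.mean(mixture ** 2)) + 1e-12

    # Stage 1: recursive cue extraction. Peel off one coarse cue per pass
    # until the remainder carries almost no energy.
    for _ in range(max_speakers):
        if float(np.mean(residual ** 2)) / mixture_energy < energy_threshold:
            break
        cue, residual = extract_cue(residual)
        cues.append(cue)

    # Stage 2: target speaker extraction. Each fine estimate is computed
    # from the ORIGINAL mixture, so distortion accumulated in the residual
    # does not carry over into the final outputs.
    return [refine(mixture, cue) for cue in cues]


# Toy stand-ins (assumptions): three sources on disjoint time segments.
N_SPK, SEG = 3, 100
rng = np.random.default_rng(0)
sources = np.zeros((N_SPK, N_SPK * SEG))
for k in range(N_SPK):
    sources[k, k * SEG:(k + 1) * SEG] = rng.uniform(0.5, 1.0, SEG)
mixture = sources.sum(axis=0)


def toy_extract_cue(residual: Array) -> Tuple[Array, Array]:
    """Return a distorted (attenuated) copy of the loudest segment and the remainder."""
    energies = (residual.reshape(N_SPK, SEG) ** 2).sum(axis=1)
    k = int(np.argmax(energies))
    mask = np.zeros_like(residual)
    mask[k * SEG:(k + 1) * SEG] = 1.0
    return 0.9 * residual * mask, residual * (1.0 - mask)


def toy_refine(mix: Array, cue: Array) -> Array:
    """Use the coarse cue only to locate the target, then read it from the clean mixture."""
    return mix * (cue != 0)


estimates = coarse_to_fine_separate(mixture, toy_extract_cue, toy_refine)
```

In this toy setting the coarse cues are deliberately attenuated, yet the refined estimates recover the sources exactly because the second stage reads from the undistorted mixture. The number of speakers is never passed in; it falls out of how many passes run before the stopping rule fires.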


