Continuous Speech Separation Using Speaker Inventory for Long Multi-talker Recording

12/17/2020
by Cong Han, et al.

Leveraging additional speaker information to facilitate speech separation has received increasing attention in recent years. Recent work includes extracting target speech using a snippet of the target speaker's voice, and jointly separating all participating speakers using a pool of additional speaker signals, known as speech separation using speaker inventory (SSUSI). However, all of these systems idealistically assume that pre-enrolled speaker signals are available and have only been evaluated on simple data configurations. In realistic multi-talker conversations, the speech signal contains a large proportion of non-overlapped regions, from which robust speaker embeddings of the individual talkers can be derived. In this work, we apply the SSUSI model to long recordings and propose a self-informed, clustering-based inventory-forming scheme in which the speaker inventory is built entirely from the input signal, without the need for external speaker signals. Experimental results on simulated noisy reverberant long-recording datasets show that the proposed method significantly improves separation performance across various conditions.
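The core idea of the self-informed inventory is to extract speaker embeddings from the non-overlapped regions of the long recording and cluster them, so that each cluster contributes one inventory entry. The sketch below illustrates this idea only; the placeholder embedding extractor, the k-means clustering choice, the embedding dimension, and the function names are illustrative assumptions, not the exact configuration used in the paper.

```python
# Minimal sketch of a self-informed, clustering-based speaker inventory,
# assuming non-overlapped single-speaker segments have already been detected.
# The embedding extractor below is a stand-in for a pretrained speaker
# embedding network (e.g., a d-vector model); k-means is an assumed
# clustering choice, not necessarily the paper's.
import numpy as np
from sklearn.cluster import KMeans


def extract_embedding(segment: np.ndarray) -> np.ndarray:
    """Placeholder speaker-embedding extractor: returns a deterministic
    pseudo-random unit vector so the sketch runs end to end."""
    rng = np.random.default_rng(abs(hash(segment.tobytes())) % (2**32))
    emb = rng.standard_normal(128)
    return emb / np.linalg.norm(emb)


def build_speaker_inventory(non_overlapped_segments, num_speakers):
    """Cluster embeddings of non-overlapped segments and average each
    cluster to obtain one inventory embedding per estimated speaker."""
    embeddings = np.stack([extract_embedding(s) for s in non_overlapped_segments])
    labels = KMeans(n_clusters=num_speakers, n_init=10).fit_predict(embeddings)
    inventory = np.stack(
        [embeddings[labels == k].mean(axis=0) for k in range(num_speakers)]
    )
    # Renormalize so inventory entries are comparable by cosine similarity.
    return inventory / np.linalg.norm(inventory, axis=1, keepdims=True)


if __name__ == "__main__":
    # Fake 1-second, 16 kHz single-speaker segments cut from a long recording.
    segments = [np.random.randn(16000).astype(np.float32) for _ in range(20)]
    inventory = build_speaker_inventory(segments, num_speakers=2)
    print(inventory.shape)  # (2, 128)
```

In a real pipeline the number of speakers would typically be estimated from the data (for example, by thresholding cluster distances) rather than given, and the embeddings would come from a speaker verification network rather than the placeholder used here.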


Related research

10/11/2018
VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking
In this paper, we present a novel system that separates the voice of a t...

02/20/2020
Wavesplit: End-to-End Speech Separation by Speaker Clustering
We introduce Wavesplit, an end-to-end speech separation system. From a s...

06/28/2021
Sparsely Overlapped Speech Training in the Time Domain: Joint Learning of Target Speech Separation and Personal VAD Benefits
Target speech separation is the process of filtering a certain speaker's...

06/25/2020
Speaker-Conditional Chain Model for Speech Separation and Extraction
Speech separation has been extensively explored to tackle the cocktail p...

02/18/2016
EEG-informed attended speaker extraction from recorded speech mixtures with application in neuro-steered hearing prostheses
OBJECTIVE: We aim to extract and denoise the attended speaker in a noisy...

11/02/2020
Speaker anonymisation using the McAdams coefficient
Anonymisation has the goal of manipulating speech signals in order to de...

07/01/2022
Speaker Diarization and Identification from Single-Channel Classroom Audio Recording Using Virtual Microphones
Speaker identification in noisy audio recordings, specifically those fro...
