Online Binaural Speech Separation of Moving Speakers With a Wavesplit Network

03/13/2023
by Cong Han, et al.

Binaural speech separation in real-world scenarios often involves moving speakers. Most current speech separation methods are trained with utterance-level permutation invariant training (u-PIT). At inference time, however, the order of the outputs can be inconsistent over time, particularly in long-form speech separation. This situation, referred to as the speaker swap problem, is even more problematic when speakers move constantly in space, and it makes it hard to place each speaker consistently in the same output channel. Here, we describe a real-time binaural speech separation model based on a Wavesplit network that mitigates the speaker swap problem for moving-speaker separation. Our model computes a speaker embedding for each speaker at each time frame from the mixed audio, aggregates the embeddings using online clustering, and uses the cluster centroids as speaker profiles to track each speaker over long recordings. Experimental results on reverberant, long-form separation of moving multitalker speech show that the proposed method is less prone to speaker swap and achieves performance comparable to u-PIT-based models with ground-truth tracking, both in separation accuracy and in preservation of interaural cues.
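The tracking idea summarized above (per-frame speaker embeddings, online clustering, centroids used as speaker profiles) can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `OnlineSpeakerTracker`, the cosine-similarity matching, the exhaustive permutation search, and the running-mean centroid update are all assumptions chosen for illustration. The sketch assumes the network emits exactly one embedding per speaker per frame, in arbitrary order, and that the first frame's order defines the speaker slots.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


class OnlineSpeakerTracker:
    """Hypothetical online clustering of per-frame speaker embeddings.

    Keeps one centroid ("speaker profile") per speaker and, at every frame,
    matches the frame's unordered embeddings to those centroids.
    """

    def __init__(self, num_speakers: int) -> None:
        self.num_speakers = num_speakers
        self.centroids: List[Vector] = []
        self.counts: List[int] = []

    def step(self, embeddings: Sequence[Sequence[float]]) -> Tuple[int, ...]:
        """Return perm, where perm[k] is the index of the embedding assigned
        to speaker slot k, then update each centroid with a running mean."""
        if len(embeddings) != self.num_speakers:
            raise ValueError("expected one embedding per speaker")
        if not self.centroids:
            # First frame: the initial output order defines the speaker slots.
            self.centroids = [list(e) for e in embeddings]
            self.counts = [1] * self.num_speakers
            return tuple(range(self.num_speakers))
        # Exhaustive search over permutations (adequate for 2-3 speakers):
        # pick the assignment with the highest total centroid similarity.
        best = max(
            itertools.permutations(range(self.num_speakers)),
            key=lambda p: sum(cosine(self.centroids[k], embeddings[p[k]])
                              for k in range(self.num_speakers)),
        )
        # Running-mean update of each centroid with its assigned embedding.
        for k, j in enumerate(best):
            self.counts[k] += 1
            n = self.counts[k]
            self.centroids[k] = [c + (e - c) / n
                                 for c, e in zip(self.centroids[k], embeddings[j])]
        return best


if __name__ == "__main__":
    tracker = OnlineSpeakerTracker(num_speakers=2)
    frames = [
        [[1.0, 0.0], [0.0, 1.0]],  # frame 0: slot 0 = speaker A, slot 1 = B
        [[0.1, 0.9], [0.9, 0.1]],  # frame 1: raw network outputs are swapped
    ]
    for f in frames:
        print(tracker.step(f))
    # (0, 1)
    # (1, 0)
```

The returned permutation would then be used to reorder that frame's separated output channels, so each speaker stays in the same channel even when the raw output order flips. Exhaustive permutation search grows as K!, which is acceptable for two or three speakers; for larger K, a Hungarian-algorithm assignment is the usual replacement.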

research
10/20/2020

Speaker Separation Using Speaker Inventories and Estimated Speech

We propose speaker separation using speaker inventories and estimated sp...
research
07/29/2023

Monaural Multi-Speaker Speech Separation Using Efficient Transformer Model

Cocktail party problem is the scenario where it is difficult to separate...
research
05/18/2023

Speech Separation based on Contrastive Learning and Deep Modularization

The current monaural state of the art tools for speech separation relies...
research
11/27/2021

Online Speaker Diarization with Graph-based Label Generation

This paper introduces an online speaker diarization system that can hand...
research
04/25/2019

Divide and Conquer: A Deep CASA Approach to Talker-independent Monaural Speaker Separation

We address talker-independent monaural speaker separation from the persp...
research
09/02/2020

SAGRNN: Self-Attentive Gated RNN for Binaural Speaker Separation with Interaural Cue Preservation

Most existing deep learning based binaural speaker separation systems fo...
research
05/14/2020

FaceFilter: Audio-visual speech separation using still images

The objective of this paper is to separate a target speaker's speech fro...