Wavesplit: End-to-End Speech Separation by Speaker Clustering

02/20/2020
by   Neil Zeghidour, et al.

We introduce Wavesplit, an end-to-end speech separation system. From a single recording of mixed speech, the model infers and clusters representations of each speaker and then estimates each source signal conditioned on the inferred representations. The model is trained on the raw waveform to jointly perform the two tasks. Our model infers a set of speaker representations through clustering, which addresses the fundamental permutation problem of speech separation. Moreover, the sequence-wide speaker representations provide a more robust separation of long, challenging sequences, compared to previous approaches. We show that Wavesplit outperforms the previous state-of-the-art on clean mixtures of 2 or 3 speakers (WSJ0-2mix, WSJ0-3mix), as well as in noisy (WHAM!) and reverberated (WHAMR!) conditions. As an additional contribution, we further improve our model by introducing online data augmentation for separation.
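To make the clustering idea concrete, here is a minimal sketch and not the authors' implementation. It assumes a model that emits one vector per speaker at every time step, in an arbitrary order that can change from step to step (the permutation problem). Clustering all vectors over the whole sequence yields one order-independent representation per speaker. The toy vectors, the noise level, the use of plain k-means, and the helper names `farthest_point_init` and `kmeans` are all illustrative assumptions.

```python
import numpy as np


def farthest_point_init(points, k):
    """Pick k starting centroids greedily: each one is the point farthest from those chosen so far."""
    centroids = [points[0]]
    for _ in range(k - 1):
        # Distance from every point to its nearest already-chosen centroid.
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centroids], axis=0)
        centroids.append(points[d.argmax()])
    return np.stack(centroids)


def kmeans(points, k, n_iter=10):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    centroids = farthest_point_init(points, k)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    # Recompute labels so they agree with the final centroids.
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=-1)
    return centroids, dists.argmin(axis=1)


# Toy data (assumption): two speakers, each with a fixed 8-dimensional "identity" vector.
rng = np.random.default_rng(0)
n_frames, n_speakers, dim = 200, 2, 8
true_vectors = np.array([[1.0] * 4 + [0.0] * 4,
                         [0.0] * 4 + [1.0] * 4])

# At every frame the per-speaker vectors arrive in a random order, with small noise.
frames = np.empty((n_frames, n_speakers, dim))
for t in range(n_frames):
    order = rng.permutation(n_speakers)
    frames[t] = true_vectors[order] + 0.05 * rng.standard_normal((n_speakers, dim))

# Cluster all frame-level vectors; the centroids act as sequence-wide speaker representations.
centroids, labels = kmeans(frames.reshape(-1, dim), n_speakers)
labels = labels.reshape(n_frames, n_speakers)
```

In this sketch, `centroids` plays the role of the sequence-wide speaker representations, and `labels[t]` gives a consistent speaker assignment for the vectors at frame `t` regardless of the order in which they were emitted. A separation network would then be conditioned on the centroids; that stage is omitted here.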


research
12/17/2020

Continuous Speech Separation Using Speaker Inventory for Long Multi-talker Recording

Leveraging additional speaker information to facilitate speech separatio...
research
04/08/2019

Parrotron: An End-to-End Speech-to-Speech Conversion Model and its Applications to Hearing-Impaired Speech and Speech Separation

We describe Parrotron, an end-to-end-trained speech-to-speech conversion...
research
07/29/2023

Monaural Multi-Speaker Speech Separation Using Efficient Transformer Model

Cocktail party problem is the scenario where it is difficult to separate...
research
07/07/2016

Single-Channel Multi-Speaker Separation using Deep Clustering

Deep clustering is a recently introduced deep learning architecture that...
research
10/23/2019

Filterbank design for end-to-end speech separation

Single-channel speech separation has recently made great progress thanks...
research
03/13/2020

End-to-end Recurrent Denoising Autoencoder Embeddings for Speaker Identification

Speech 'in-the-wild' is a handicap for speaker recognition systems due t...
research
05/18/2023

Speech Separation based on Contrastive Learning and Deep Modularization

The current monaural state of the art tools for speech separation relies...
