Multi-Stream Extension of Variational Bayesian HMM Clustering (MS-VBx) for Combined End-to-End and Vector Clustering-based Diarization

05/23/2023
by Marc Delcroix, et al.

Combining end-to-end neural speaker diarization (EEND) with vector clustering (VC), known as EEND-VC, has gained interest for leveraging the strengths of both methods. EEND-VC estimates activities and speaker embeddings for all speakers within an audio chunk and uses VC to associate these activities with speaker identities across chunks. EEND-VC thus generates multiple streams of embeddings, one for each speaker in a chunk. These embeddings can be clustered using constrained agglomerative hierarchical clustering (cAHC), which ensures that embeddings from the same chunk are assigned to different clusters. This paper introduces an alternative clustering approach, MS-VBx, a multi-stream extension of the successful Bayesian HMM clustering of x-vectors (VBx). Experiments on three datasets demonstrate that MS-VBx outperforms cAHC in diarization and speaker counting performance.
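
To make the cannot-link constraint of the cAHC baseline concrete, here is a minimal sketch in Python. It is not the authors' implementation and it does not implement MS-VBx. The function name `constrained_ahc`, the cosine distance, the average linkage, and the `threshold` value are all assumptions chosen for illustration. Each row of `embeddings` stands for one stream (one local speaker in one chunk), and two rows that share a chunk id are never allowed to end up in the same cluster.

```python
import numpy as np

def constrained_ahc(embeddings, chunk_ids, threshold=0.5):
    """Greedy average-linkage AHC with cannot-link constraints (illustrative).

    embeddings: (N, D) array, one row per (chunk, local speaker) stream.
    chunk_ids:  length-N sequence; rows sharing a chunk id are never merged.
    threshold:  hypothetical stopping value on the cosine distance.
    Returns a list of N cluster labels.
    """
    x = np.asarray(embeddings, dtype=float)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    dist = 1.0 - x @ x.T                      # pairwise cosine distance
    clusters = [[i] for i in range(len(x))]   # member indices per cluster
    chunks = [{c} for c in chunk_ids]         # chunk ids covered per cluster

    while True:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                if chunks[a] & chunks[b]:
                    continue                  # would put one chunk's streams together
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if d < threshold and (best is None or d < best[0]):
                    best = (d, a, b)
        if best is None:
            break                             # no admissible merge is left
        _, a, b = best
        clusters[a] += clusters[b]
        chunks[a] |= chunks[b]
        del clusters[b], chunks[b]

    labels = [0] * len(x)
    for k, members in enumerate(clusters):
        for i in members:
            labels[i] = k
    return labels

# Toy data: two chunks with two streams each. Streams 0 and 1 are similar
# but come from the same chunk, so they must stay in separate clusters.
emb = [[1.0, 0.0], [0.9, 0.1], [0.98, 0.02], [0.0, 1.0]]
print(constrained_ahc(emb, chunk_ids=[0, 0, 1, 1]))  # [0, 1, 0, 2]
```

Without the constraint, streams 0, 1, and 2 would all fall below the distance threshold and collapse into a single speaker. With it, stream 1 is kept apart, so the number of clusters (the speaker count) is three.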


