Tight integration of neural- and clustering-based diarization through deep unfolding of infinite Gaussian mixture model

02/14/2022
by Keisuke Kinoshita, et al.

Speaker diarization has been investigated extensively as a central task in meeting analysis. Recent work shows that integrating end-to-end neural diarization (EEND) with clustering-based diarization is a promising way to handle realistic conversational data containing overlapped speech and an arbitrarily large number of speakers, and this integration has achieved state-of-the-art results on various tasks. However, the approaches proposed so far have not yet realized tight integration, because the clustering they employ is not optimal in any sense for the speaker embeddings estimated by the EEND module. To address this problem, this paper introduces a trainable clustering algorithm into the integration framework by deep-unfolding a non-parametric Bayesian model called the infinite Gaussian mixture model (iGMM). Specifically, the speaker embeddings are optimized during training so that they better fit iGMM clustering, using a novel clustering loss based on the Adjusted Rand Index (ARI). Experimental results on CALLHOME data show that the proposed approach outperforms the conventional approach in terms of diarization error rate (DER), in particular by substantially reducing speaker confusion errors, which reflects the effectiveness of the proposed iGMM integration.
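For readers unfamiliar with the Adjusted Rand Index mentioned above, the following is a minimal sketch of the standard hard-label ARI, which scores the agreement between two clusterings and corrects for chance. It is not the paper's training loss: the abstract does not give the exact form of that loss, and a trainable version would need a differentiable variant computed on soft assignments. The function name `adjusted_rand_index` and the toy labels are illustrative assumptions.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Standard (hard-label) Adjusted Rand Index between two labelings.

    Illustrative helper only; not the differentiable loss used in the paper.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")

    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        # Zero or one item: the two partitions are trivially identical.
        return 1.0

    # Contingency counts: items sharing each (true, predicted) label pair.
    pair_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of item pairs placed together under each view.
    sum_pairs = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())

    # Expected agreement under random labeling, and the maximum attainable.
    expected = sum_true * sum_pred / total_pairs
    max_index = 0.5 * (sum_true + sum_pred)
    if max_index == expected:
        # Both partitions are all-singletons or both are a single cluster.
        return 1.0
    return (sum_pairs - expected) / (max_index - expected)


# Toy speaker labels for four embeddings (hypothetical data).
reference = [0, 0, 1, 1]
permuted = [1, 1, 0, 0]      # same partition, different label names
split = [0, 0, 1, 2]         # second speaker split into two clusters
print(adjusted_rand_index(reference, permuted))
print(adjusted_rand_index(reference, split))
```

Because ARI depends only on which items are grouped together, it is invariant to how cluster labels are named, which suits diarization, where speaker identities are arbitrary.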


research
05/19/2021

Advances in integration of end-to-end neural and clustering-based diarization for real conversational speech

Recently, we proposed a novel speaker diarization method called End-to-E...
research
10/26/2020

Integrating end-to-end neural and clustering-based diarization: Getting the best of both worlds

Recent diarization technologies can be categorized into two approaches, ...
research
07/28/2022

Utterance-by-utterance overlap-aware neural diarization with Graph-PIT

Recent speaker diarization studies showed that integration of end-to-end...
research
05/23/2023

Multi-Stream Extension of Variational Bayesian HMM Clustering (MS-VBx) for Combined End-to-End and Vector Clustering-based Diarization

Combining end-to-end neural speaker diarization (EEND) with vector clust...
research
10/24/2019

A study of semi-supervised speaker diarization system using gan mixture model

We propose a new speaker diarization system based on a recently introduc...
research
04/06/2021

Speaker Diarization using Two-pass Leave-One-Out Gaussian PLDA Clustering of DNN Embeddings

Many modern systems for speaker diarization, such as the recently-develo...
research
06/10/2019

Enabling Robust State Estimation through Measurement Error Covariance Adaptation

Accurate platform localization is an integral component of most robotic ...
