Speaker Diarization using Two-pass Leave-One-Out Gaussian PLDA Clustering of DNN Embeddings

04/06/2021
by Kiran Karra, et al.

Many modern systems for speaker diarization, such as the recently developed VBx approach, rely on clustering of DNN speaker embeddings followed by resegmentation. This approach has two problems: the DNN is not directly optimized for the diarization task, and the parameters need significant retuning for different applications. We have recently presented progress toward addressing both problems with a Leave-One-Out Gaussian PLDA (LGP) clustering algorithm and an approach to training the DNN such that the embeddings directly optimize the performance of this scoring method. This paper presents a new two-pass version of this system, where the second pass uses finer time resolution to significantly improve overall performance. For the Callhome corpus, we achieve the first published error rate below 4% without any task-dependent parameter tuning. We also show significant progress towards a robust single solution for multiple diarization tasks.


