Absolute decision corrupts absolutely: conservative online speaker diarisation

11/09/2022
by   Youngki Kwon, et al.
0

Our focus lies in developing an online speaker diarisation framework which demonstrates robust performance across diverse domains. In online speaker diarisation, outputs generated in real-time are irreversible, and a few misjudgements in the early phase of an input session can lead to catastrophic results. We hypothesise that cautiously increasing the number of estimated speakers is of paramount importance among many other factors. Thus, our proposed framework includes decreasing the number of speakers by one when the system judges that an increase in the past was faulty. We also adopt dual buffers, checkpoints and centroids, where checkpoints are combined with silhouette coefficients to estimate the number of speakers and centroids represent speakers. Again, we believe that more than one centroid can be generated from one speaker. Thus we design a clustering-based label matching technique to assign labels in real-time. The resulting system is lightweight yet surprisingly effective. The system demonstrates state-of-the-art performance on DIHARD 2 and 3 datasets, where it is also competitive in AMI and VoxConverse test sets.

READ FULL TEXT
research
07/20/2021

A Real-time Speaker Diarization System Based on Spatial Spectrum

In this paper we describe a speaker diarization system that enables loca...
research
11/27/2021

Online Speaker Diarization with Graph-based Label Generation

This paper introduces an online speaker diarization system that can hand...
research
06/06/2022

Online Neural Diarization of Unlimited Numbers of Speakers

A method to perform offline and online speaker diarization for an unlimi...
research
03/30/2022

Multi-scale Speaker Diarization with Dynamic Scale Weighting

Speaker diarization systems are challenged by a trade-off between the te...
research
10/10/2018

Fully Supervised Speaker Diarization

In this paper, we propose a fully supervised speaker diarization approac...
research
12/05/2017

Multi-speaker Recognition in Cocktail Party Problem

This paper proposes an original statistical decision theory to accomplis...
research
08/05/2022

Chronological Self-Training for Real-Time Speaker Diarization

Diarization partitions an audio stream into segments based on the voices...

Please sign up or login with your details

Forgot password? Click here to reset