Improving Diarization Robustness using Diversification, Randomization and the DOVER Algorithm

10/24/2019
by Andreas Stolcke, et al.

Speaker diarization based on bottom-up clustering of speech segments by acoustic similarity is often highly sensitive to the choice of hyperparameters, such as the initial number of clusters and feature weighting. Optimizing these hyperparameters is difficult and often not robust across different data sets. We recently proposed the DOVER algorithm for combining multiple diarization hypotheses by voting. Here we propose to mitigate the robustness problem in diarization by using DOVER to average across different parameter choices. We also investigate the combination of diverse outputs obtained by following different merge choices pseudo-randomly in the course of clustering, thereby mitigating the greediness of best-first clustering. We show on two conference meeting data sets drawn from NIST evaluations that the proposed methods indeed yield more robust, and in several cases overall improved, results.
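The abstract does not say exactly how merge choices are randomized. The sketch below is one plausible reading, built on several assumptions. At each step of bottom-up clustering, instead of always merging the single closest cluster pair (greedy best-first), it picks uniformly at random among the `top_k` closest pairs, using a seeded generator so each run is reproducible. The function name `randomized_agglomerative`, the `top_k` parameter, centroid linkage with squared Euclidean distance, and the toy 1-D "segment features" are all hypothetical stand-ins for illustration. They are not the paper's actual features, distance measure, or randomization scheme.

```python
import random
from typing import List, Sequence


def centroid(points: List[Sequence[float]]) -> List[float]:
    """Mean vector of a list of equal-length feature vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance (an assumed similarity measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def randomized_agglomerative(features: List[Sequence[float]], n_clusters: int,
                             top_k: int = 2, seed: int = 0) -> List[int]:
    """Hypothetical sketch: bottom-up clustering that merges one of the
    top_k closest cluster pairs, chosen pseudo-randomly with a seeded RNG.
    With top_k=1 this reduces to ordinary greedy best-first clustering."""
    rng = random.Random(seed)
    # Start with one cluster per segment.
    clusters: List[List[int]] = [[i] for i in range(len(features))]
    while len(clusters) > n_clusters:
        cents = [centroid([features[i] for i in c]) for c in clusters]
        # Rank all cluster pairs by centroid distance, closest first.
        pairs = sorted(
            (sq_dist(cents[a], cents[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        # Pseudo-random merge choice among the top_k candidates.
        _, a, b = rng.choice(pairs[:top_k])
        clusters[a].extend(clusters[b])
        del clusters[b]  # safe: b > a, so index a is unaffected
    # Convert the cluster membership lists into per-segment speaker labels.
    labels = [0] * len(features)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


if __name__ == "__main__":
    segs = [[0.0], [0.1], [5.0], [5.2], [10.0]]  # toy segment features
    for s in range(3):
        print(s, randomized_agglomerative(segs, 3, top_k=3, seed=s))
```

Running the function with several seeds produces an ensemble of different, individually plausible clusterings, which reduces reliance on any single greedy merge path. The abstract describes combining such diverse outputs with DOVER. That voting step, including the mapping of speaker labels between hypotheses, is not shown here.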
