Multi-class Spectral Clustering with Overlaps for Speaker Diarization
This paper describes a method for overlap-aware speaker diarization. Given an overlap detector and a speaker embedding extractor, our method performs spectral clustering of segments informed by the output of the overlap detector. This is achieved by transforming the discrete clustering problem into a convex optimization problem which is solved by eigen-decomposition. Thereafter, we discretize the solution by alternatively using singular value decomposition and a modified version of non-maximal suppression which is constrained by the output of the overlap detector. Furthermore, we detail an HMM-DNN based overlap detector which performs frame-level classification and enforces duration constraints through HMM state transitions. Our method achieves a test diarization error rate (DER) of 24.0 meeting corpus, which is a relative improvement of 15.2 agglomerative hierarchical clustering baseline, and compares favorably with other overlap-aware diarization methods. Further analysis on the LibriCSS data demonstrates the effectiveness of the proposed method in high overlap conditions.
READ FULL TEXT