DIHARD II is Still Hard: Experimental Results and Discussions from the DKU-LENOVO Team

02/23/2020
by   Qingjian Lin, et al.

In this paper, we present the system submitted by the DKU-LENOVO team to the second DIHARD Speech Diarization Challenge. Our diarization system includes multiple modules, namely voice activity detection (VAD), segmentation, speaker embedding extraction, similarity scoring, clustering, resegmentation and overlap detection. For each module, we explore different techniques to enhance performance. Our final submission employs a ResNet-LSTM based VAD, a Deep ResNet based speaker embedding, LSTM based similarity scoring and spectral clustering. Variational Bayes (VB) diarization is applied in the resegmentation stage, and overlap detection also brings a slight improvement. Our proposed system achieves a diarization error rate (DER) of 18.84%. Although we have reduced the DERs by 27.5% compared with the baselines, we believe that the diarization task is still very difficult.
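To make the clustering stage concrete, the following is a minimal sketch of spectral clustering on a segment-to-segment similarity matrix. It is not the authors' implementation: it assumes exactly two speakers, splits on the sign of the second eigenvector of the normalized Laplacian instead of running k-means, and uses a hand-made toy affinity matrix in place of LSTM-based similarity scores. The function name `two_speaker_spectral_split` is hypothetical.

```python
import numpy as np


def two_speaker_spectral_split(affinity):
    """Assign each segment to one of two clusters (illustrative only).

    `affinity` is a square matrix of non-negative pairwise similarity
    scores between segments, with strictly positive row sums.
    """
    a = np.asarray(affinity, dtype=float)
    a = 0.5 * (a + a.T)  # enforce symmetry
    degree = a.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    # Symmetric normalized Laplacian: L = I - D^{-1/2} A D^{-1/2}
    laplacian = np.eye(len(a)) - d_inv_sqrt[:, None] * a * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order for symmetric matrices
    _, eigvecs = np.linalg.eigh(laplacian)
    # The second eigenvector (Fiedler vector) separates the two groups
    fiedler = d_inv_sqrt * eigvecs[:, 1]
    return (fiedler > 0).astype(int)


# Toy affinity (assumed values): segments 0-2 and 3-5 are mutually similar
affinity = np.full((6, 6), 0.1)
affinity[:3, :3] = 0.9
affinity[3:, 3:] = 0.9
np.fill_diagonal(affinity, 1.0)

labels = two_speaker_spectral_split(affinity)
print(labels)  # cluster ids are arbitrary; only the grouping is meaningful
```

With more than two speakers, a common approach is to keep the first k eigenvectors and cluster their rows with k-means; the number of speakers then has to be estimated, for example from the eigenvalue gaps.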
