Bayesian HMM clustering of x-vector sequences (VBx) in speaker diarization: theory, implementation and analysis on standard tasks

12/29/2020
by Federico Landini, et al.

The recently proposed VBx diarization method uses a Bayesian hidden Markov model to find speaker clusters in a sequence of x-vectors. In this work, we extensively compare VBx with other diarization approaches in the literature and show that it achieves superior performance on three of the most popular datasets for evaluating diarization: CALLHOME, AMI and DIHARD II. Further, we present for the first time the derivation and update formulae for the VBx model, focusing on its efficiency and simplicity compared to the previous, more complex BHMM model that operates frame by frame on standard cepstral features. Together with this publication, we release the recipe for training the x-vector extractors used in our experiments on both wideband and narrowband data, and the VBx recipes that attain state-of-the-art performance on all three datasets. In addition, we point out the lack of a standardized evaluation protocol for the AMI dataset and propose a new protocol for both Beamformed and Mix-Headset audio, based on the official AMI partitions and transcriptions.
