Speaker Diarization with LSTM

10/28/2017
by   Quan Wang, et al.
0

For many years, i-vector based speaker embedding techniques were the dominant approach for speaker verification and speaker diarization applications. However, mirroring the rise of deep learning in various domains, neural network based speaker embeddings, also known as d-vectors, have consistently demonstrated superior speaker verification performance. In this paper, we build on the success of d-vector based speaker verification systems to develop a new d-vector based approach to speaker diarization. Specifically, we combine LSTM-based d-vector audio embeddings with recent work in non-parametric clustering to obtain a state-of-the-art speaker diarization system. Our system is evaluated on three standard public datasets, suggesting that d-vector based diarization systems offer significant advantages over traditional i-vector based systems. We achieved a 12.0 CALLHOME, while our model is trained with out-of-domain data from voice search logs.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
04/07/2021

Adapting Speaker Embeddings for Speaker Diarisation

The goal of this paper is to adapt speaker embeddings for solving the pr...
research
11/08/2018

Gaussian-Constrained training for speaker verification

Neural models, in particular the d-vector and x-vector architectures, ha...
research
02/28/2022

Magnitude-aware Probabilistic Speaker Embeddings

Recently, hyperspherical embeddings have established themselves as a dom...
research
10/08/2021

A study of the robustness of raw waveform based speaker embeddings under mismatched conditions

In this paper, we conduct a cross-dataset study on parametric and non-pa...
research
12/29/2020

Bayesian HMM clustering of x-vector sequences (VBx) in speaker diarization: theory, implementation and analysis on standard tasks

The recently proposed VBx diarization method uses a Bayesian hidden Mark...
research
05/21/2018

Speaker Clustering Using Dominant Sets

Speaker clustering is the task of forming speaker-specific groups based ...
research
08/07/2020

Disentangled speaker and nuisance attribute embedding for robust speaker verification

Over the recent years, various deep learning-based embedding methods hav...

Please sign up or login with your details

Forgot password? Click here to reset