Combination of Deep Speaker Embeddings for Diarisation

10/22/2020
by Guangzhi Sun, et al.

Significant progress has recently been made in speaker diarisation following the introduction of d-vectors, speaker embeddings extracted from neural network (NN) speaker classifiers and used for clustering speech segments. To extract better-performing and more robust speaker embeddings, this paper proposes the c-vector method, which combines multiple sets of complementary d-vectors derived from systems with different NN components. Three structures are used to implement the c-vectors: 2D self-attentive, gated additive, and bilinear pooling structures, which rely on attention mechanisms, a gating mechanism, and a low-rank bilinear pooling mechanism respectively. A neural-based single-pass speaker diarisation pipeline is also proposed, which uses NNs for voice activity detection, speaker change point detection, and speaker embedding extraction. Experiments and detailed analyses are conducted on the challenging AMI and NIST RT05 datasets, which consist of real meetings with 4–10 speakers and a wide range of acoustic conditions. Using c-vectors instead of d-vectors gives consistent improvements, and similar relative improvements in diarisation error rate are observed on both AMI and RT05, which shows the robustness of the proposed methods.
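To make the idea of combining d-vectors more concrete, the following is a minimal sketch of two of the combination styles named in the abstract: a gated additive combination and a low-rank bilinear pooling. It is an illustration under stated assumptions, not the paper's formulation. The function names, the use of one scalar sigmoid gate per system, the final length normalisation, and the omission of non-linearities in the bilinear branch are all hypothetical choices, and in a real system the weights would be trained jointly with the speaker classifiers rather than set by hand.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def l2_normalise(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm > 0 else list(v)


def gated_additive(d_vectors, gate_weights, gate_biases):
    """Hypothetical gated additive combination.

    Each system's d-vector gets a scalar gate in (0, 1) computed from the
    d-vector itself; the gated d-vectors are summed and length-normalised.
    Assumes all d-vectors share the same dimension.
    """
    dim = len(d_vectors[0])
    combined = [0.0] * dim
    for d, w, b in zip(d_vectors, gate_weights, gate_biases):
        gate = sigmoid(dot(w, d) + b)
        for i in range(dim):
            combined[i] += gate * d[i]
    return l2_normalise(combined)


def low_rank_bilinear(d1, d2, U, V, P):
    """Simplified low-rank bilinear pooling of two d-vectors.

    U and V each hold `rank` projection vectors; the two projections are
    multiplied element-wise and mapped to the output by the rows of P.
    Non-linearities are left out for brevity (an assumption of this sketch).
    """
    h1 = [dot(u, d1) for u in U]
    h2 = [dot(v, d2) for v in V]
    joint = [a * b for a, b in zip(h1, h2)]
    return [dot(p, joint) for p in P]


# Toy usage with two 2-dimensional "d-vectors" and hand-set weights.
c_gated = gated_additive([[1.0, 0.0], [0.0, 1.0]],
                         [[0.0, 0.0], [0.0, 0.0]],
                         [0.0, 0.0])
c_bilinear = low_rank_bilinear([1.0, 2.0], [3.0, 4.0],
                               U=[[1.0, 0.0], [0.0, 1.0]],
                               V=[[1.0, 0.0], [0.0, 1.0]],
                               P=[[1.0, 1.0]])
```

With zero gate weights and biases, both gates equal 0.5, so the gated combination reduces to a normalised average of the two inputs. The bilinear branch captures multiplicative interactions between the two embeddings, which a purely additive combination cannot represent.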


research
02/08/2019

Speaker diarisation using 2D self-attentive combination of embeddings

Speaker diarisation systems often cluster audio segments using speaker e...
research
04/13/2018

Speaker Embedding Extraction with Phonetic Information

Speaker embeddings achieve promising results on many speaker verificatio...
research
04/07/2021

Adapting Speaker Embeddings for Speaker Diarisation

The goal of this paper is to adapt speaker embeddings for solving the pr...
research
04/06/2021

Speaker Diarization using Two-pass Leave-One-Out Gaussian PLDA Clustering of DNN Embeddings

Many modern systems for speaker diarization, such as the recently-develo...
research
04/07/2022

Detecting Vocal Fatigue with Neural Embeddings

Vocal fatigue refers to the feeling of tiredness and weakness of voice d...
research
11/16/2022

Speaker Adaptation for End-To-End Speech Recognition Systems in Noisy Environments

We analyze the impact of speaker adaptation in end-to-end architectures ...
research
07/19/2020

Meta-learning with Latent Space Clustering in Generative Adversarial Network for Speaker Diarization

The performance of most speaker diarization systems with x-vector embedd...
