Speaker diarization with session-level speaker embedding refinement using graph neural networks

05/22/2020
by Jixuan Wang, et al.

Deep speaker embedding models are commonly used as a building block for speaker diarization systems. However, the speaker embedding model is usually trained with a global loss defined over the training data, which can be sub-optimal for distinguishing speakers locally within a specific meeting session. In this work we present the first use of graph neural networks (GNNs) for the speaker diarization problem, using a GNN to refine speaker embeddings locally based on the structural information between speech segments inside each session. The speaker embeddings extracted by a pre-trained model are remapped into a new embedding space in which the different speakers within a single session are better separated. The model is trained for linkage prediction in a supervised manner by minimizing the difference between the affinity matrix constructed from the refined embeddings and the ground-truth adjacency matrix. Spectral clustering is then applied to the refined embeddings. We show that the clustering performance of the refined speaker embeddings significantly exceeds that of the original embeddings on both simulated and real meeting data, and our system achieves state-of-the-art results on the NIST SRE 2000 CALLHOME database.
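The pipeline described in the abstract (refine embeddings over a session graph, compare the resulting affinity matrix with the ground-truth adjacency matrix, then run spectral clustering) can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration and is not taken from the paper: the function names are hypothetical, the GNN is a single untrained graph-convolution step, the session graph is built from clipped cosine similarity, and a mean squared error stands in for whatever linkage-prediction loss the authors actually use.

```python
import numpy as np

def cosine_affinity(emb):
    """Pairwise cosine similarity, clipped to [0, 1] (assumed affinity)."""
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    return np.clip(unit @ unit.T, 0.0, 1.0)

def gnn_layer(emb, weight):
    """One graph-convolution step: average over neighbors, then project."""
    adj = cosine_affinity(emb)                       # session graph with self-loops
    adj_norm = adj / adj.sum(axis=1, keepdims=True)  # row-normalize the graph
    return np.tanh(adj_norm @ emb @ weight)

def linkage_loss(emb, labels):
    """Mean squared gap between predicted affinity and true adjacency."""
    target = (labels[:, None] == labels[None, :]).astype(float)
    return float(np.mean((cosine_affinity(emb) - target) ** 2))

def spectral_cluster(affinity, n_speakers, n_iter=20):
    """Normalized-Laplacian spectral clustering with a small k-means."""
    d_inv_sqrt = 1.0 / np.sqrt(affinity.sum(axis=1))
    lap = np.eye(len(affinity)) - d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(lap)                    # eigenvalues in ascending order
    feats = vecs[:, :n_speakers]
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    # Farthest-point initialization keeps the sketch deterministic.
    centers = [feats[0]]
    for _ in range(1, n_speakers):
        dist = np.min([np.sum((feats - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(feats[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        sq_dist = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        assign = np.argmin(sq_dist, axis=1)
        for k in range(n_speakers):
            if np.any(assign == k):
                centers[k] = feats[assign == k].mean(axis=0)
    return assign
```

In a trained system, `weight` would be learned by minimizing `linkage_loss` over labeled sessions. At inference time, the segment embeddings of one session would pass through `gnn_layer`, and `spectral_cluster(cosine_affinity(refined), n_speakers)` would return one speaker label per segment.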


