Multi-scale speaker embedding-based graph attention networks for speaker diarisation

10/07/2021
by Youngki Kwon, et al.

The objective of this work is effective speaker diarisation using multi-scale speaker embeddings. Typically, the segment length used for embedding extraction imposes a trade-off between the ability to recognise short speaker segments and the discriminative power of the embedding. To address this trade-off, recent works have proposed multi-scale embeddings, where segments of varying lengths are used. However, the scores are combined using a weighted summation scheme in which the weights are fixed after the training phase, whereas the importance of each segment length can differ within a single session. To address this issue, we present three key contributions in this paper: (1) we propose graph attention networks for multi-scale speaker diarisation; (2) we design scale indicators to utilise the scale information of each embedding; (3) we adapt the attention-based aggregation to utilise a pre-computed affinity matrix from multi-scale embeddings. We demonstrate the effectiveness of our method on various datasets, where the speaker confusion, which constitutes the primary metric, drops over 10
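
The fixed-weight score combination that the abstract contrasts against can be illustrated with a minimal sketch. This is not the proposed graph attention method; it only shows, under simplifying assumptions, how per-scale affinity matrices might be fused with weights that are identical for every segment pair. The function names, the toy embeddings, and the assumption that embeddings from every scale are already aligned to the same segments are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def affinity_matrix(embeddings):
    """Pairwise cosine-similarity matrix for the embeddings of one scale."""
    return [[cosine(u, v) for v in embeddings] for u in embeddings]

def fuse_fixed_weights(affinities, weights):
    """Weighted sum of per-scale affinity matrices.

    The same weight is applied to a scale for every segment pair,
    which is the limitation the abstract points out.
    """
    n = len(affinities[0])
    fused = [[0.0] * n for _ in range(n)]
    for matrix, w in zip(affinities, weights):
        for i in range(n):
            for j in range(n):
                fused[i][j] += w * matrix[i][j]
    return fused

# Toy data (hypothetical): three segments, embeddings at two scales,
# assumed to be already aligned to the same segments.
short_scale = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
long_scale = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

affinities = [affinity_matrix(short_scale), affinity_matrix(long_scale)]
fused = fuse_fixed_weights(affinities, weights=[0.5, 0.5])
```

In this sketch the two scales disagree about segments 0 and 1 (dissimilar at the short scale, identical at the long scale), and the fixed weights simply average the two views. A session-dependent scheme would instead let the weighting vary per segment pair.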
