Multi-scale Speaker Diarization with Dynamic Scale Weighting

03/30/2022
by   Tae Jin Park, et al.
0

Speaker diarization systems are challenged by a trade-off between the temporal resolution and the fidelity of the speaker representation. By obtaining a superior temporal resolution with an enhanced accuracy, a multi-scale approach is a way to cope with such a trade-off. In this paper, we propose a more advanced multi-scale diarization system based on a multi-scale diarization decoder. There are two main contributions in this study that significantly improve the diarization performance. First, we use multi-scale clustering as an initialization to estimate the number of speakers and obtain the average speaker representation vector for each speaker and each scale. Next, we propose the use of 1-D convolutional neural networks that dynamically determine the importance of each scale at each time step. To handle a variable number of speakers and overlapping speech, the proposed system can estimate the number of existing speakers. Our proposed system achieves a state-of-art performance on the CALLHOME and AMI MixHeadset datasets, with 3.92 diarization error rates, respectively.

READ FULL TEXT
research
06/28/2023

MC-SpEx: Towards Effective Speaker Extraction with Multi-Scale Interfusion and Conditional Speaker Modulation

The previous SpEx+ has yielded outstanding performance in speaker extrac...
research
10/07/2021

Multi-scale speaker embedding-based graph attention networks for speaker diarisation

The objective of this work is effective speaker diarisation using multi-...
research
05/16/2021

X-Vectors with Multi-Scale Aggregation for Speaker Diarization

Speaker diarization is the process of labeling different speakers in a s...
research
02/03/2022

MFA: TDNN with Multi-scale Frequency-channel Attention for Text-independent Speaker Verification with Short Utterances

The time delay neural network (TDNN) represents one of the state-of-the-...
research
11/09/2022

Absolute decision corrupts absolutely: conservative online speaker diarisation

Our focus lies in developing an online speaker diarisation framework whi...
research
08/26/2020

FCN Approach for Dynamically Locating Multiple Speakers

In this paper, we present a deep neural network-based online multi-speak...
research
03/30/2022

Multi-target Filter and Detector for Unknown-number Speaker Diarization

A strong representation of a target speaker can aid in extracting import...

Please sign up or login with your details

Forgot password? Click here to reset