Supervised online diarization with sample mean loss for multi-domain data

11/04/2019
by   Enrico Fini, et al.
0

Recently, a fully supervised speaker diarization approach was proposed (UIS-RNN) which models speakers using multiple instances of a parameter-sharing recurrent neural network. In this paper we propose qualitative modifications to the model that significantly improve the learning efficiency and the overall diarization performance. In particular, we introduce a novel loss function, we called Sample Mean Loss and we present a better modelling of the speaker turn behaviour, by devising an analytical expression to compute the probability of a new speaker joining the conversation. In addition, we demonstrate that our model can be trained on fixed-length speech segments, removing the need for speaker change information in inference. Using x-vectors as input features, we evaluate our proposed approach on the multi-domain dataset employed in the DIHARD II challenge: our online method improves with respect to the original UIS-RNN and achieves similar performance to an offline agglomerative clustering baseline using PLDA scoring.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
10/10/2018

Fully Supervised Speaker Diarization

In this paper, we propose a fully supervised speaker diarization approac...
research
01/21/2021

Online End-to-End Neural Diarization Handling Overlapping Speech and Flexible Numbers of Speakers

This paper proposes an online end-to-end diarization that can handle ove...
research
05/30/2023

Language-independent speaker anonymization using orthogonal Householder neural network

Speaker anonymization aims to conceal a speaker's identity while preserv...
research
04/18/2022

Robust End-to-end Speaker Diarization with Generic Neural Clustering

End-to-end speaker diarization approaches have shown exceptional perform...
research
09/27/2018

Sample Efficient Adaptive Text-to-Speech

We present a meta-learning approach for adaptive text-to-speech (TTS) wi...
research
11/27/2021

Online Speaker Diarization with Graph-based Label Generation

This paper introduces an online speaker diarization system that can hand...
research
05/03/2023

What makes a good pause? Investigating the turn-holding effects of fillers

Filled pauses (or fillers), such as "uh" and "um", are frequent in spont...

Please sign up or login with your details

Forgot password? Click here to reset