DIVE: End-to-end Speech Diarization via Iterative Speaker Embedding

05/28/2021
by   Neil Zeghidour, et al.

We introduce DIVE, an end-to-end speaker diarization algorithm. Our neural algorithm casts diarization as an iterative process: it repeatedly builds a representation for each speaker and then predicts each speaker's voice activity conditioned on the extracted representations. This strategy intrinsically resolves the speaker-ordering ambiguity without requiring the classical permutation-invariant training loss. In contrast with prior work, our model does not rely on pretrained speaker representations and optimizes all parameters of the system with a multi-speaker voice activity loss. Importantly, our loss explicitly excludes unreliable speaker-turn boundaries from training, matching the standard collar-based Diarization Error Rate (DER) evaluation. Overall, these contributions yield a system that redefines the state of the art on the standard CALLHOME benchmark, with 6.7% DER compared to 7.8% for the best alternative.
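To make the iterative formulation concrete, here is a minimal, hypothetical PyTorch sketch of the loop the abstract describes. Everything below is an assumption for illustration: the class name DiveSketch, the GRU frame encoder, the greedy argmax frame selection, and the linear VAD head are not from the paper; only the overall pattern (select a speaker, extract an embedding, predict that speaker's voice activity conditioned on it, repeat) follows the abstract. The collar-masked loss at the end likewise only illustrates the idea of excluding frames near speaker-turn boundaries, mirroring collar-based DER scoring.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DiveSketch(nn.Module):
    """Hypothetical sketch of the iterative loop: pick a frame for a new
    speaker, turn it into a speaker embedding, then predict that speaker's
    voice activity at every frame conditioned on the embedding. Speakers
    come out one at a time, so their order is fixed by the iteration and
    no permutation-invariant loss is needed."""

    def __init__(self, feat_dim=80, emb_dim=256, n_speakers=2):
        super().__init__()
        self.n_speakers = n_speakers
        self.frame_encoder = nn.GRU(feat_dim, emb_dim, batch_first=True)
        self.speaker_proj = nn.Linear(emb_dim, emb_dim)
        self.selector = nn.Linear(emb_dim, 1)            # scores frames for selection
        self.vad_head = nn.Linear(emb_dim * 2, 1)        # frame repr + speaker emb

    def forward(self, feats):
        # feats: (batch, time, feat_dim) acoustic features (assumed input).
        h, _ = self.frame_encoder(feats)                 # (B, T, E)
        vad_logits = []
        for _ in range(self.n_speakers):
            # Greedily pick the frame most likely to contain a new speaker
            # (a stand-in for the paper's selection step).
            scores = self.selector(h).squeeze(-1)        # (B, T)
            idx = scores.argmax(dim=1)                   # (B,)
            picked = h[torch.arange(h.size(0)), idx]     # (B, E)
            spk_emb = self.speaker_proj(picked)          # (B, E)
            # Predict this speaker's voice activity at every frame,
            # conditioned on the extracted embedding.
            cond = spk_emb.unsqueeze(1).expand_as(h)     # (B, T, E)
            vad_logits.append(self.vad_head(torch.cat([h, cond], dim=-1)))
        return torch.cat(vad_logits, dim=-1)             # (B, T, n_speakers)


def collar_masked_vad_loss(logits, targets, mask):
    """Per-frame binary cross-entropy with frames inside a collar around
    speaker-turn boundaries masked out (mask == 0), so unreliable
    boundary frames do not contribute to training."""
    loss = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    return (loss * mask).sum() / mask.sum().clamp(min=1)


# Usage: two-speaker VAD logits for a batch of 4 utterances, 200 frames.
model = DiveSketch()
logits = model(torch.randn(4, 200, 80))                  # (4, 200, 2)
```

Because each speaker's activity channel is produced by a distinct iteration, the output channels have a deterministic order; this is why such a design can dispense with permutation-invariant training, as the abstract notes.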


Related research

04/08/2021 · End-to-end speaker segmentation for overlap-aware resegmentation
Speaker segmentation consists in partitioning a conversation between one...

07/15/2023 · Single and Multi-Speaker Cloned Voice Detection: From Perceptual to Learned Features
Synthetic-voice cloning technologies have seen significant advances in r...

06/08/2021 · End-to-End Speaker Diarization Conditioned on Speech Activity and Overlap Detection
In this paper, we present a conditional multitask learning method for en...

09/12/2019 · End-to-End Neural Speaker Diarization with Permutation-Free Objectives
In this paper, we propose a novel end-to-end neural-network-based speake...

03/07/2023 · TS-SEP: Joint Diarization and Separation Conditioned on Estimated Speaker Embeddings
Since diarization and source separation of meeting data are closely rela...

11/11/2022 · Augmenting Transformer-Transducer Based Speaker Change Detection With Token-Level Training Loss
In this work we propose a novel token-based training strategy that impro...

03/02/2023 · Improving Transformer-based End-to-End Speaker Diarization by Assigning Auxiliary Losses to Attention Heads
Transformer-based end-to-end neural speaker diarization (EEND) models ut...
