End-to-End Neural Speaker Diarization with Permutation-Free Objectives

09/12/2019
by Yusuke Fujita, et al.
In this paper, we propose a novel end-to-end neural-network-based speaker diarization method. Unlike most existing methods, our proposed method does not have separate modules for extraction and clustering of speaker representations. Instead, our model has a single neural network that directly outputs speaker diarization results. To realize such a model, we formulate the speaker diarization problem as a multi-label classification problem and introduce a permutation-free objective function that directly minimizes diarization errors without suffering from the speaker-label permutation problem. Besides its end-to-end simplicity, the proposed method also benefits from being able to explicitly handle overlapping speech during training and inference. Because of this benefit, our model can be easily trained or adapted with real-recorded multi-speaker conversations just by feeding the corresponding multi-speaker segment labels. We evaluated the proposed method on simulated speech mixtures. The proposed method achieved a diarization error rate of 12.28%, while a conventional clustering-based system produced a diarization error rate of 28.77%. Furthermore, domain adaptation with real-recorded speech provided a 25.6% relative improvement on the CALLHOME dataset. Our source code is available online at https://github.com/hitachi-speech/EEND.
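
The permutation-free objective described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes per-frame, per-speaker speech-activity probabilities as the network output, binary cross entropy as the per-label loss, and normalization by the number of frames times the number of speakers. The function names `binary_cross_entropy` and `permutation_free_loss` and the toy data are hypothetical, chosen only for illustration.

```python
import itertools
import math


def binary_cross_entropy(pred, label, eps=1e-7):
    """Binary cross entropy between one probability and one 0/1 label."""
    p = min(max(pred, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def permutation_free_loss(preds, labels):
    """Minimum mean BCE over all permutations of the reference speakers.

    preds:  T frames x S speakers, each entry a speech-activity probability.
    labels: T frames x S speakers, each entry 0 or 1. Overlapping speech
            is simply a frame with more than one label set to 1.
    Returns (loss, best_permutation), where best_permutation[s] is the
    reference speaker matched to output s.
    """
    num_frames = len(preds)
    num_speakers = len(preds[0])
    best_loss, best_perm = float("inf"), None
    for perm in itertools.permutations(range(num_speakers)):
        total = 0.0
        for t in range(num_frames):
            for s in range(num_speakers):
                total += binary_cross_entropy(preds[t][s], labels[t][perm[s]])
        loss = total / (num_frames * num_speakers)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Toy data (assumed): 3 frames, 2 speakers, frame 1 contains overlap.
# The outputs are in the opposite speaker order from the reference labels.
preds = [[0.9, 0.1], [0.8, 0.7], [0.2, 0.9]]
labels = [[0, 1], [1, 1], [1, 0]]
loss, perm = permutation_free_loss(preds, labels)
print(perm, round(loss, 4))
```

On this toy input the minimum is reached with the swapped assignment `(1, 0)`, so the model is not penalized for emitting the speakers in a different order from the labels. Enumerating all permutations costs S! loss evaluations, which is only practical for a small number of speakers.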

research
02/24/2020

End-to-End Neural Diarization: Reformulating Speaker Diarization as Simple Multi-label Classification

The most common approach to speaker diarization is clustering of speaker...
research
03/13/2023

Neural Diarization with Non-autoregressive Intermediate Attractors

End-to-end neural diarization (EEND) with encoder-decoder-based attracto...
research
09/13/2019

End-to-End Neural Speaker Diarization with Self-attention

Speaker diarization has been mainly developed based on the clustering of...
research
04/08/2021

End-to-end speaker segmentation for overlap-aware resegmentation

Speaker segmentation consists in partitioning a conversation between one...
research
04/02/2022

From Simulated Mixtures to Simulated Conversations as Training Data for End-to-End Neural Diarization

End-to-end neural diarization (EEND) is nowadays one of the most promine...
research
06/08/2021

End-to-End Speaker Diarization Conditioned on Speech Activity and Overlap Detection

In this paper, we present a conditional multitask learning method for en...
research
05/28/2021

DIVE: End-to-end Speech Diarization via Iterative Speaker Embedding

We introduce DIVE, an end-to-end speaker diarization algorithm. Our neur...
