End-to-End Diarization for Variable Number of Speakers with Local-Global Networks and Discriminative Speaker Embeddings

by   Soumi Maiti, et al.

We present an end-to-end deep network model that performs meeting diarization from single-channel audio recordings. End-to-end diarization models have the advantage of handling speaker overlap and enabling straightforward handling of discriminative training, unlike traditional clustering-based diarization methods. The proposed system is designed to handle meetings with unknown numbers of speakers, using variable-number permutation-invariant cross-entropy based loss functions. We introduce several components that appear to help with diarization performance, including a local convolutional network followed by a global self-attention module, multi-task transfer learning using a speaker identification component, and a sequential approach where the model is refined with a second stage. These are trained and validated on simulated meeting data based on LibriSpeech and LibriTTS datasets; final evaluations are done using LibriCSS, which consists of simulated meetings recorded using real acoustics via loudspeaker playback. The proposed model performs better than previously proposed end-to-end diarization models on these data.



page 1

page 2

page 3

page 4


End-to-End Neural Diarization: Reformulating Speaker Diarization as Simple Multi-label Classification

The most common approach to speaker diarization is clustering of speaker...

Neural Speaker Diarization with Speaker-Wise Chain Rule

Speaker diarization is an essential step for processing multi-speaker au...

Towards Neural Diarization for Unlimited Numbers of Speakers Using Global and Local Attractors

Attractor-based end-to-end diarization is achieving comparable accuracy ...

End-to-end Neural Diarization: From Transformer to Conformer

We propose a new end-to-end neural diarization (EEND) system that is bas...

Advances in integration of end-to-end neural and clustering-based diarization for real conversational speech

Recently, we proposed a novel speaker diarization method called End-to-E...

Encoder-Decoder Based Attractor Calculation for End-to-End Neural Diarization

This paper investigates an end-to-end neural diarization (EEND) method f...

Integrating end-to-end neural and clustering-based diarization: Getting the best of both worlds

Recent diarization technologies can be categorized into two approaches, ...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.