Online End-to-End Neural Diarization with Speaker-Tracing Buffer

06/04/2020
by Yawen Xue, et al.
End-to-end speaker diarization using a fully supervised self-attention mechanism (SA-EEND) has significantly improved on state-of-the-art clustering-based methods, especially for overlapping speech. However, the original SA-EEND has limited applicability because it relies on offline self-attention algorithms. In this paper, we propose a novel speaker-tracing mechanism that extends SA-EEND to online speaker diarization for practical use. First, we present oracle experiments showing that a straightforward online extension, in which SA-EEND is applied independently to each chunk of the recording, degrades the diarization error rate (DER) because the speaker permutation is inconsistent across chunks. To circumvent this inconsistency, our proposed method, called the speaker-tracing buffer, maintains the speaker permutation information determined in previous chunks within the self-attention mechanism so that speakers are traced correctly. Our experimental results show that the proposed online SA-EEND with the speaker-tracing buffer achieves a DER of 12.84 on the Corpus of Spontaneous Japanese with 1 s latency. This is significantly better than the conventional x-vector-based online clustering method with 1.5 s latency, which achieved a DER of 26.90.
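The permutation inconsistency described above can be illustrated with a small sketch. It is not the paper's implementation: the abstract does not specify how the buffer is filled or how agreement is scored, so the function names `align_to_buffer` and `apply_permutation`, the dot-product score, and the toy activity values are all assumptions made for illustration. The idea shown is only that frames kept from earlier chunks, whose speaker order is already fixed, can be re-decoded together with the new chunk and used to pick the speaker order for that chunk.

```python
from itertools import permutations


def align_to_buffer(buffer_ref, buffer_new):
    """Pick the speaker permutation of buffer_new that best matches buffer_ref.

    Both arguments are lists of frames; each frame is a list of per-speaker
    activity scores. Agreement is a plain dot product (an assumption of this
    sketch, not a detail taken from the abstract).
    """
    n_speakers = len(buffer_ref[0])

    def score(perm):
        # Sum, over buffered frames and speakers, of reference activity
        # times the newly decoded activity under the candidate ordering.
        return sum(
            ref[s] * new[perm[s]]
            for ref, new in zip(buffer_ref, buffer_new)
            for s in range(n_speakers)
        )

    return max(permutations(range(n_speakers)), key=score)


def apply_permutation(frames, perm):
    """Reorder the speaker columns of every frame according to perm."""
    return [[frame[p] for p in perm] for frame in frames]


# Toy data (hypothetical): outputs already decided for three buffered frames.
buffer_ref = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
# The same buffered frames re-decoded with the new chunk; columns came out swapped.
buffer_new = [[0.1, 0.9], [0.0, 1.0], [0.8, 0.1]]
# Outputs for the new chunk, in the same (swapped) speaker order.
chunk_new = [[0.2, 0.9], [0.9, 0.1]]

perm = align_to_buffer(buffer_ref, buffer_new)
aligned_chunk = apply_permutation(chunk_new, perm)
print(perm, aligned_chunk)
```

Enumerating all permutations is only practical for a small, fixed number of speakers, which is why this sketch uses two.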

research
01/21/2021

Online End-to-End Neural Diarization Handling Overlapping Speech and Flexible Numbers of Speakers

This paper proposes an online end-to-end diarization that can handle ove...
research
09/13/2019

End-to-End Neural Speaker Diarization with Self-attention

Speaker diarization has been mainly developed based on the clustering of...
research
06/06/2022

Online Neural Diarization of Unlimited Numbers of Speakers

A method to perform offline and online speaker diarization for an unlimi...
research
02/24/2020

End-to-End Neural Diarization: Reformulating Speaker Diarization as Simple Multi-label Classification

The most common approach to speaker diarization is clustering of speaker...
research
06/08/2021

End-to-End Speaker Diarization Conditioned on Speech Activity and Overlap Detection

In this paper, we present a conditional multitask learning method for en...
research
07/27/2020

Self-Attentive Multi-Layer Aggregation with Feature Recalibration and Normalization for End-to-End Speaker Verification System

One of the most important parts of an end-to-end speaker verification sy...
research
09/14/2021

Overlap-aware low-latency online speaker diarization based on end-to-end local segmentation

We propose to address online speaker diarization as a combination of inc...
