Overlap-aware low-latency online speaker diarization based on end-to-end local segmentation

09/14/2021
by Juan M. Coria, et al.

We propose to address online speaker diarization as a combination of incremental clustering and local diarization applied to a rolling buffer updated every 500 ms. Every step of the proposed pipeline is designed to take full advantage of the strong ability of a recently proposed end-to-end overlap-aware segmentation model to detect and separate overlapping speakers. In particular, we propose a modified version of the statistics pooling layer (initially introduced in the x-vector architecture) that gives less weight to frames where the segmentation model predicts simultaneous speakers. Furthermore, we derive cannot-link constraints from the initial segmentation step to prevent two local speakers from being wrongly merged during the incremental clustering step. Finally, we show how the latency of the proposed approach can be adjusted between 500 ms and 5 s to match the requirements of a particular use case, and we provide a systematic analysis of the influence of latency on overall performance (on AMI, DIHARD and VoxConverse).
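The abstract does not give formulas, so the following is only a minimal sketch of two of the ideas it names: overlap-aware statistics pooling and cannot-link constraints during clustering. The weighting rule (a speaker's activation multiplied by one minus every other speaker's activation), the brute-force assignment, and all function names are assumptions made for illustration, not the authors' implementation.

```python
import math
from itertools import permutations


def overlap_aware_weights(activations, speaker):
    """Per-frame weights for one local speaker.

    activations[t][k] is the (assumed) segmentation score in [0, 1] of local
    speaker k at frame t. A frame gets a high weight only when `speaker` is
    active and no other speaker is, so overlapped frames count less.
    """
    weights = []
    for frame in activations:
        w = frame[speaker]
        for other, score in enumerate(frame):
            if other != speaker:
                w *= 1.0 - score
        weights.append(w)
    return weights


def weighted_stats_pooling(features, weights, eps=1e-8):
    """Weighted mean and standard deviation over frames, concatenated.

    features[t] is the frame-level feature vector at frame t. With uniform
    weights this reduces to standard statistics pooling.
    """
    total = sum(weights) + eps
    dim = len(features[0])
    mean = [
        sum(w * f[d] for w, f in zip(weights, features)) / total
        for d in range(dim)
    ]
    var = [
        sum(w * (f[d] - mean[d]) ** 2 for w, f in zip(weights, features)) / total
        for d in range(dim)
    ]
    return mean + [math.sqrt(v) for v in var]


def assign_with_cannot_link(dist):
    """Map each local speaker to a distinct global speaker.

    dist[i][j] is the distance between local speaker i and global speaker j.
    Two speakers found in the same buffer cannot share a global identity, so
    only one-to-one mappings are searched (brute force, small inputs only;
    assumes at least as many global speakers as local ones).
    """
    n_local, n_global = len(dist), len(dist[0])
    return min(
        permutations(range(n_global), n_local),
        key=lambda p: sum(dist[i][p[i]] for i in range(n_local)),
    )


# Three frames, two local speakers; the last frame is fully overlapped.
activations = [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
features = [[1.0], [3.0], [100.0]]
weights = overlap_aware_weights(activations, speaker=0)
embedding = weighted_stats_pooling(features, weights)

# Both local speakers are closest to global speaker 0, but cannot both map to it.
mapping = assign_with_cannot_link([[0.1, 0.9], [0.2, 0.8]])
```

In this toy input the overlapped third frame receives zero weight, so its outlier feature value does not affect the pooled statistics, and the constrained assignment keeps the two local speakers on separate global identities even though an unconstrained nearest-centroid rule would merge them.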

