Online Neural Diarization of Unlimited Numbers of Speakers

06/06/2022
by   Shota Horiguchi, et al.
0

A method to perform offline and online speaker diarization for an unlimited number of speakers is described in this paper. End-to-end neural diarization (EEND) has achieved overlap-aware speaker diarization by formulating it as a multi-label classification problem. It has also been extended for a flexible number of speakers by introducing speaker-wise attractors. However, the output number of speakers of attractor-based EEND is empirically capped; it cannot deal with cases where the number of speakers appearing during inference is higher than that during training because its speaker counting is trained in a fully supervised manner. Our method, EEND-GLA, solves this problem by introducing unsupervised clustering into attractor-based EEND. In the method, the input audio is first divided into short blocks, then attractor-based diarization is performed for each block, and finally the results of each blocks are clustered on the basis of the similarity between locally-calculated attractors. While the number of output speakers is limited within each block, the total number of speakers estimated for the entire input can be higher than the limitation. To use EEND-GLA in an online manner, our method also extends the speaker-tracing buffer, which was originally proposed to enable online inference of conventional EEND. We introduces a block-wise buffer update to make the speaker-tracing buffer compatible with EEND-GLA. Finally, to improve online diarization, our method improves the buffer update method and revisits the variable chunk-size training of EEND. The experimental results demonstrate that EEND-GLA can perform speaker diarization of an unseen number of speakers in both offline and online inferences.

READ FULL TEXT
research
01/21/2021

Online End-to-End Neural Diarization Handling Overlapping Speech and Flexible Numbers of Speakers

This paper proposes an online end-to-end diarization that can handle ove...
research
11/05/2020

BW-EDA-EEND: Streaming End-to-End Neural Speaker Diarization for a Variable Number of Speakers

We present a novel online end-to-end neural diarization system, BW-EDA-E...
research
11/27/2021

Online Speaker Diarization with Graph-based Label Generation

This paper introduces an online speaker diarization system that can hand...
research
09/14/2021

Overlap-aware low-latency online speaker diarization based on end-to-end local segmentation

We propose to address online speaker diarization as a combination of inc...
research
07/04/2021

Towards Neural Diarization for Unlimited Numbers of Speakers Using Global and Local Attractors

Attractor-based end-to-end diarization is achieving comparable accuracy ...
research
06/04/2020

Online End-to-End Neural Diarization with Speaker-Tracing Buffer

End-to-end speaker diarization using a fully supervised self-attention m...
research
11/09/2022

Absolute decision corrupts absolutely: conservative online speaker diarisation

Our focus lies in developing an online speaker diarisation framework whi...

Please sign up or login with your details

Forgot password? Click here to reset