BW-EDA-EEND: Streaming End-to-End Neural Speaker Diarization for a Variable Number of Speakers

11/05/2020
by Eunjung Han, et al.

We present a novel online end-to-end neural diarization system, BW-EDA-EEND, that processes data incrementally for a variable number of speakers. The system is based on the EDA architecture of Horiguchi et al., but uses an incremental Transformer encoder that attends only to its left context and uses block-level recurrence in the hidden states to carry information from block to block, which makes the algorithm's complexity linear in time. We propose two variants. For unlimited-latency BW-EDA-EEND, which processes inputs in linear time, we show only moderate degradation relative to offline EDA-EEND for up to two speakers when using a context size of 10 seconds. With more than two speakers, the accuracy gap between online and offline processing grows, but the unlimited-latency variant still outperforms a baseline offline clustering-based diarization system for one to four speakers with unlimited context size, and shows comparable accuracy with a context size of 10 seconds. For limited-latency BW-EDA-EEND, which produces diarization outputs block by block as audio arrives, we show accuracy comparable to the offline clustering-based system.
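The block-level recurrence described above can be illustrated with a toy, pure-Python sketch. It assumes a Transformer-XL-style memory: at each layer, the current block's keys are that layer's inputs for the current block concatenated with the cached inputs of the previous block. It also uses a single attention head with identity projections, and omits feed-forward sublayers, layer normalization, and the EDA attractor decoder. The names `attend` and `blockwise_encode` are hypothetical and are not the authors' code. The sketch counts attention scores to show why the cost grows linearly with the number of frames.

```python
import math
from typing import List, Tuple

Vector = List[float]


def attend(queries: List[Vector], keys: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product attention.

    Illustrative simplification: identity projections, so keys double as values.
    """
    dim = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * k[d] for w, k in zip(weights, keys)) / total
                    for d in range(dim)])
    return out


def blockwise_encode(frames: List[Vector], block_size: int,
                     num_layers: int) -> Tuple[List[Vector], int]:
    """Encode frames block by block, attending only to the left context.

    Assumed Transformer-XL-style recurrence: layer l of the current block
    attends to the layer-l inputs of the current block plus the cached
    layer-l inputs of the previous block (never to future blocks).
    Returns the encoded frames and the number of attention scores computed.
    """
    memory: List[List[Vector]] = [[] for _ in range(num_layers)]
    outputs: List[Vector] = []
    num_scores = 0
    for start in range(0, len(frames), block_size):
        h = frames[start:start + block_size]
        for layer in range(num_layers):
            keys = memory[layer] + h  # previous block (memory) + current block
            num_scores += len(h) * len(keys)  # at most B * 2B per block and layer
            attended = attend(h, keys)
            memory[layer] = h  # cache this layer's input for the next block
            h = [[a + x for a, x in zip(att, inp)]  # residual connection
                 for att, inp in zip(attended, h)]
        outputs.extend(h)
    return outputs, num_scores


# Usage: compare the block-wise cost with full self-attention (T^2 scores per layer).
frames = [[math.sin(0.1 * t), math.cos(0.1 * t)] for t in range(80)]
for n in (40, 80):
    out, cost = blockwise_encode(frames[:n], block_size=10, num_layers=2)
    print(n, len(out), cost, 2 * n * n)
# prints:
# 40 40 1400 3200
# 80 80 3000 12800
```

With a fixed block size, each additional block adds a constant number of attention scores (400 here for two layers), so the cost is linear in the number of frames, while full self-attention grows quadratically. Because layer l reaches back one block at a time, the effective left context of this sketch extends up to `num_layers` blocks into the past.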

research
06/06/2022

Online Neural Diarization of Unlimited Numbers of Speakers

A method to perform offline and online speaker diarization for an unlimi...
research
01/21/2021

Online End-to-End Neural Diarization Handling Overlapping Speech and Flexible Numbers of Speakers

This paper proposes an online end-to-end diarization that can handle ove...
research
07/04/2021

Towards Neural Diarization for Unlimited Numbers of Speakers Using Global and Local Attractors

Attractor-based end-to-end diarization is achieving comparable accuracy ...
research
11/27/2021

Online Speaker Diarization with Graph-based Label Generation

This paper introduces an online speaker diarization system that can hand...
research
09/14/2021

Overlap-aware low-latency online speaker diarization based on end-to-end local segmentation

We propose to address online speaker diarization as a combination of inc...
research
11/16/2020

Block-Online Guided Source Separation

We propose a block-online algorithm of guided source separation (GSS). G...
research
12/07/2022

Too Slow to Be Useful? On Incorporating Humans in the Loop of Smart Speakers

Real-time crowd-powered systems, such as Chorus/Evorus, VizWiz, and Appa...