End-to-End Speaker Diarization as Post-Processing

by Shota Horiguchi, et al.

This paper investigates using an end-to-end diarization model as a post-processing step for conventional clustering-based diarization. Clustering-based diarization methods partition frames into as many clusters as there are speakers; because each frame is assigned to exactly one speaker, they typically cannot handle overlapping speech. In contrast, some end-to-end diarization methods handle overlapping speech by treating the problem as multi-label classification. Although some of these methods can handle a flexible number of speakers, they do not perform well when the number of speakers is large. To compensate for the weaknesses of each approach, we propose applying a two-speaker end-to-end diarization method as post-processing to the results of a clustering-based method. We iteratively select two speakers from the results and update those two speakers' results to improve the overlapped regions. Experimental results show that the proposed algorithm consistently improved the performance of state-of-the-art methods on the CALLHOME, AMI, and DIHARD II datasets.
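The iterative pairwise update described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `refine_pairwise`, the callable `two_speaker_model`, the frame-selection rule (frames where at least one of the two chosen speakers is active), and the permutation-resolution rule (keep the column ordering that agrees best with the current labels) are all assumptions made for illustration.

```python
from itertools import combinations

import numpy as np


def refine_pairwise(activity, features, two_speaker_model):
    """Refine clustering-based diarization results one speaker pair at a time.

    activity:          (T, S) binary array from a clustering-based system.
    features:          (T, D) array of frame-level acoustic features.
    two_speaker_model: hypothetical callable mapping (N, D) features to an
                       (N, 2) binary activity array (overlap allowed).
    """
    refined = np.asarray(activity).astype(int).copy()
    features = np.asarray(features)
    n_speakers = refined.shape[1]

    for i, j in combinations(range(n_speakers), 2):
        # Assumed selection rule: frames where either chosen speaker is active.
        mask = (refined[:, i] > 0) | (refined[:, j] > 0)
        if not mask.any():
            continue

        pred = np.asarray(two_speaker_model(features[mask])).astype(int)
        current = refined[mask][:, [i, j]]

        # The model's output order is arbitrary, so keep the column ordering
        # that agrees with the current labels on more entries.
        agree_same = int((pred == current).sum())
        agree_swap = int((pred[:, ::-1] == current).sum())
        if agree_swap > agree_same:
            pred = pred[:, ::-1]

        # Overwrite only the two selected speakers on the selected frames.
        refined[mask, i] = pred[:, 0]
        refined[mask, j] = pred[:, 1]

    return refined
```

Because each pass rewrites only two columns, a frame can end up assigned to both speakers, which is how overlapped speech can be recovered from single-label clustering output. A real system would likely also need a criterion for skipping pairs or rejecting unreliable updates, which this sketch omits.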




