Neural Speaker Diarization with Speaker-Wise Chain Rule

06/02/2020
by Yusuke Fujita, et al.

Speaker diarization is an essential step for processing multi-speaker audio. Although an end-to-end neural diarization (EEND) method achieved state-of-the-art performance, it is limited to a fixed number of speakers. In this paper, we address this fixed-number-of-speakers limitation with a novel speaker-wise conditional inference method based on the probabilistic chain rule. In the proposed method, each speaker's speech activity is regarded as a single random variable and is estimated sequentially, conditioned on the previously estimated speech activities of the other speakers. Similar to other sequence-to-sequence models, the proposed method produces a variable number of speakers by using a stop sequence condition. We evaluated the proposed method on multi-speaker audio recordings with a variable number of speakers. Experimental results show that the proposed method can correctly produce diarization results for a variable number of speakers and outperforms state-of-the-art end-to-end speaker diarization methods in terms of diarization error rate.
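
The speaker-wise chain rule factorizes the joint probability of all speakers' activities as P(y_1, ..., y_S | X) = P(y_1 | X) P(y_2 | y_1, X) ... P(y_S | y_1, ..., y_{S-1}, X), so speakers can be decoded one at a time. The sketch below illustrates only the decoding loop, not the authors' network. The names `decode_speakers`, `estimate_next`, `toy_estimator`, and `max_speakers` are hypothetical, and the stop rule used here (stop when the estimated activity contains no active frame) is an assumption, since the abstract does not specify the stop sequence condition.

```python
from typing import Callable, List, Sequence, Set

Activity = List[int]  # one 0/1 speech-activity flag per frame


def decode_speakers(
    frames: Sequence,
    estimate_next: Callable[[Sequence, List[Activity]], Activity],
    max_speakers: int = 10,
) -> List[Activity]:
    """Decode speakers one by one, each conditioned on those already decoded."""
    decoded: List[Activity] = []
    for _ in range(max_speakers):
        # The estimator sees the input and every previously decoded activity.
        activity = estimate_next(frames, decoded)
        # Assumed stop condition: an activity with no active frame ends decoding.
        if not any(activity):
            break
        decoded.append(activity)
    return decoded


def toy_estimator(frames: Sequence[Set[int]], previous: List[Activity]) -> Activity:
    """Hypothetical stand-in for a trained model.

    Each frame is a set of active speaker ids. The next speaker to emit is
    chosen from how many speakers were already decoded.
    """
    speaker_id = len(previous)
    return [1 if speaker_id in frame else 0 for frame in frames]


if __name__ == "__main__":
    # Four frames: speaker 0 alone, overlap of 0 and 1, speaker 1 alone, silence.
    frames = [{0}, {0, 1}, {1}, set()]
    for index, activity in enumerate(decode_speakers(frames, toy_estimator)):
        print(f"speaker {index}: {activity}")
```

Because the loop ends on the stop condition rather than on a preset count, the number of returned activity sequences follows the recording; `max_speakers` is only a safety bound for this sketch.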


research
05/20/2020

End-to-End Speaker Diarization for an Unknown Number of Speakers with Encoder-Decoder Based Attractors

End-to-end speaker diarization for an unknown number of speakers is addr...
research
06/08/2021

End-to-End Speaker Diarization Conditioned on Speech Activity and Overlap Detection

In this paper, we present a conditional multitask learning method for en...
research
06/25/2020

Speaker-Conditional Chain Model for Speech Separation and Extraction

Speech separation has been extensively explored to tackle the cocktail p...
research
11/02/2022

Towards End-to-end Speaker Diarization in the Wild

Speaker diarization algorithms address the "who spoke when" problem in a...
research
04/03/2021

Diarization of Legal Proceedings. Identifying and Transcribing Judicial Speech from Recorded Court Audio

United States Courts make audio recordings of oral arguments available a...
research
08/17/2023

Home monitoring for frailty detection through sound and speaker diarization analysis

As the French, European and worldwide populations are aging, there is a ...
research
02/29/2020

Voice Separation with an Unknown Number of Multiple Speakers

We present a new method for separating a mixed audio sequence, in which ...
