Neural Diarization with Non-autoregressive Intermediate Attractors

03/13/2023
by Yusuke Fujita, et al.

End-to-end neural diarization (EEND) with encoder-decoder-based attractors (EDA) is a promising method for handling the whole speaker diarization problem with a single neural network. While the EEND model can produce all frame-level speaker labels simultaneously, it disregards the dependency between output labels. In this work, we propose a novel EEND model that introduces label dependency between frames. The proposed method generates non-autoregressive intermediate attractors to produce speaker labels at the lower layers and conditions the subsequent layers on these labels. Although the proposed model works in a non-autoregressive manner, the speaker labels are refined by referring to the whole sequence of intermediate labels. Experiments on the two-speaker CALLHOME dataset show that the intermediate labels produced with the proposed non-autoregressive intermediate attractors improve diarization performance. With a deeper network, the proposed method benefits more from the intermediate labels, resulting in better diarization performance and higher training throughput than EEND-EDA.
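
The idea of producing labels at lower layers and feeding them back into later layers can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not say how the attractors are computed or how the labels condition later layers. The sketch assumes that attractors come from cross-attention between hypothetical learnable query vectors and the frame embeddings, that labels are sigmoid scores of frame-attractor dot products, and that conditioning adds label-weighted attractors back to the frames. A plain tanh layer stands in for the real encoder block, and all names and sizes are illustrative.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def intermediate_attractors(frames, queries):
    # ASSUMPTION: one attractor per query, computed in a single
    # cross-attention step over all frames (no step-by-step decoding).
    d = frames.shape[1]
    weights = softmax(queries @ frames.T / np.sqrt(d), axis=-1)  # (S, T)
    return weights @ frames  # (S, D)


def intermediate_labels(frames, attractors):
    # Frame-level speaker activity scores in [0, 1], shape (T, S).
    return sigmoid(frames @ attractors.T)


def condition_on_labels(frames, labels, attractors):
    # ASSUMPTION: additive conditioning with label-weighted attractors.
    return frames + labels @ attractors  # (T, D)


def toy_layer(frames, weight):
    # Stand-in for an encoder block; a real model would use self-attention.
    return np.tanh(frames @ weight)


def forward(frames, layer_weights, queries):
    # Returns the label estimate produced after every layer.
    all_labels = []
    h = frames
    for i, w in enumerate(layer_weights):
        h = toy_layer(h, w)
        attractors = intermediate_attractors(h, queries)
        labels = intermediate_labels(h, attractors)
        all_labels.append(labels)
        if i < len(layer_weights) - 1:
            # Later layers see the whole sequence of intermediate labels.
            h = condition_on_labels(h, labels, attractors)
    return all_labels


rng = np.random.default_rng(0)
T, D, S, L = 50, 16, 2, 4  # frames, embedding dim, speakers, layers
frames = rng.normal(size=(T, D))
layer_weights = [rng.normal(scale=D ** -0.5, size=(D, D)) for _ in range(L)]
queries = rng.normal(size=(S, D))  # hypothetical learnable queries

labels_per_layer = forward(frames, layer_weights, queries)
final_labels = labels_per_layer[-1]
```

Every step operates on all frames at once, so nothing is decoded sequentially, yet each layer after the first receives a full sequence of label estimates from the layer below. In a trained model, each entry of `labels_per_layer` would presumably contribute its own diarization loss; the sketch only shows the forward pass with random weights.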


research
09/12/2019

End-to-End Neural Speaker Diarization with Permutation-Free Objectives

In this paper, we propose a novel end-to-end neural-network-based speake...
research
07/04/2021

Unified Autoregressive Modeling for Joint End-to-End Multi-Talker Overlapped Speech Recognition and Speaker Attribute Estimation

In this paper, we present a novel modeling method for single-channel mul...
research
05/18/2023

Attention-based Encoder-Decoder Network for End-to-End Neural Speaker Diarization with Target Speaker Attractor

This paper proposes a novel Attention-based Encoder-Decoder network for ...
research
06/27/2022

Sequence-level Speaker Change Detection with Difference-based Continuous Integrate-and-fire

Speaker change detection is an important task in multi-party interaction...
research
11/06/2020

Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis

We describe a sequence-to-sequence neural network which can directly gen...
research
01/02/2019

Plugin Networks for Inference under Partial Evidence

In this paper, we propose a novel method to incorporate partial evidence...
research
11/01/2021

Exploring Non-Autoregressive End-To-End Neural Modeling For English Mispronunciation Detection And Diagnosis

End-to-end (E2E) neural modeling has emerged as one predominant school o...
