Attention-based Encoder-Decoder Network for End-to-End Neural Speaker Diarization with Target Speaker Attractor

05/18/2023
by Zhengyang Chen, et al.

This paper proposes a novel Attention-based Encoder-Decoder network for End-to-End Neural speaker Diarization (AED-EEND). The AED-EEND system calculates each attractor from the target speaker enrollment information used in target speaker voice activity detection (TS-VAD), which mitigates the speaker permutation problem and makes the model easier to converge. For training, we propose a teacher-forcing strategy that obtains the enrollment information from the ground-truth labels. For evaluation, we propose three heuristic decoding methods that identify the enrollment area for each speaker. Additionally, we enhance the LSTM-based attractor calculation network used in the end-to-end encoder-decoder based attractor calculation (EEND-EDA) system by incorporating an attention-based model. With this attention-based attractor decoder, the proposed AED-EEND system outperforms both the EEND-EDA and TS-VAD systems while using only 0.5 s of enrollment data.
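The teacher-forced enrollment idea and attractor-based scoring can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the plain-list data layout, and the choice to average all active frames are assumptions made for illustration, and the attention-based decoder is omitted entirely, so the enrollment averages stand in directly for the attractors. The only part taken from the EEND-EDA family is the final step, where speaker activity is the sigmoid of the dot product between a frame embedding and an attractor.

```python
import math


def enrollment_embeddings(frames, labels):
    """Average frame embeddings over the frames where each speaker is active.

    frames: list of T embeddings, each a list of D floats.
    labels: list of T rows, each a list of S ground-truth 0/1 activities.
    Returns S vectors; a speaker who never speaks gets a zero vector.
    (Hypothetical helper: the paper's enrollment selection may differ.)
    """
    num_speakers = len(labels[0])
    dim = len(frames[0])
    result = []
    for s in range(num_speakers):
        active = [f for f, row in zip(frames, labels) if row[s] == 1]
        if not active:
            result.append([0.0] * dim)
            continue
        result.append([sum(f[d] for f in active) / len(active) for d in range(dim)])
    return result


def speaker_activities(frames, attractors):
    """Score every frame against every attractor: sigmoid(frame . attractor)."""
    return [
        [
            1.0 / (1.0 + math.exp(-sum(x * a for x, a in zip(frame, att))))
            for att in attractors
        ]
        for frame in frames
    ]


# Three frames, two speakers; the last frame is overlapped speech.
frames = [[2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
labels = [[1, 0], [0, 1], [1, 1]]
attractors = enrollment_embeddings(frames, labels)  # stand-in for decoder output
posteriors = speaker_activities(frames, attractors)
```

Because each output row is tied to the speaker whose enrollment produced that attractor, the order of speakers is fixed by the enrollment order, which is the intuition behind mitigating the permutation problem.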


research
09/13/2023

Attention-based Encoder-Decoder End-to-End Neural Diarization with Embedding Enhancer

Deep neural network-based systems have significantly improved the perfor...
research
12/02/2019

An Attention-Based Speaker Naming Method for Online Adaptation in Non-Fixed Scenarios

A speaker naming task, which finds and identifies the active speaker in ...
research
07/09/2020

Attention-based Residual Speech Portrait Model for Speech to Face Generation

Given a speaker's speech, it is interesting to see if it is possible to ...
research
06/24/2023

Improving End-to-End Neural Diarization Using Conversational Summary Representations

Speaker diarization is a task concerned with partitioning an audio recor...
research
03/13/2023

Neural Diarization with Non-autoregressive Intermediate Attractors

End-to-end neural diarization (EEND) with encoder-decoder-based attracto...
research
04/03/2018

Graph2Seq: Graph to Sequence Learning with Attention-based Neural Networks

Celebrated Sequence to Sequence learning (Seq2Seq) and its fruitful vari...
research
04/04/2022

Target Confusion in End-to-end Speaker Extraction: Analysis and Approaches

Recently, end-to-end speaker extraction has attracted increasing attenti...
