Neural Speaker Diarization Using Memory-Aware Multi-Speaker Embedding with Sequence-to-Sequence Architecture

09/17/2023
by   Gaobin Yang, et al.
0

We propose a novel neural speaker diarization system using memory-aware multi-speaker embedding with sequence-to-sequence architecture (NSD-MS2S), which integrates the strengths of memory-aware multi-speaker embedding (MA-MSE) and sequence-to-sequence (Seq2Seq) architecture, leading to improvement in both efficiency and performance. Next, we further decrease the memory occupation of decoding by incorporating input features fusion and then employ a multi-head attention mechanism to capture features at different levels. NSD-MS2S achieved a macro diarization error rate (DER) of 15.9 signifies a relative improvement of 49 is the key technique for us to achieve the best performance for the main track of CHiME-7 DASR Challenge. Additionally, we introduce a deep interactive module (DIM) in MA-MSE module to better retrieve a cleaner and more discriminative multi-speaker embedding, enabling the current model to outperform the system we used in the CHiME-7 DASR Challenge. Our code will be available at https://github.com/liyunlongaaa/NSD-MS2S.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
09/24/2022

Spatial-aware Speaker Diarization for Multi-channel Multi-party Meeting

This paper describes a spatial-aware speaker diarization system for the ...
research
02/11/2022

The xmuspeech system for multi-channel multi-party meeting transcription challenge

This paper describes the system developed by the XMUSPEECH team for the ...
research
09/30/2021

Fine-tuning wav2vec2 for speaker recognition

This paper explores applying the wav2vec2 framework to speaker recogniti...
research
06/23/2022

The SJTU X-LANCE Lab System for CNSRC 2022

This technical report describes the SJTU X-LANCE Lab system for the thre...
research
10/28/2022

Speaker recognition with two-step multi-modal deep cleansing

Neural network-based speaker recognition has achieved significant improv...
research
03/19/2016

A Persona-Based Neural Conversation Model

We present persona-based models for handling the issue of speaker consis...
research
08/12/2021

Xi-Vector Embedding for Speaker Recognition

We present a Bayesian formulation for deep speaker embedding, wherein th...

Please sign up or login with your details

Forgot password? Click here to reset