The xmuspeech system for multi-channel multi-party meeting transcription challenge

02/11/2022
by Jie Wang, et al.

This paper describes the system developed by the XMUSPEECH team for the Multi-channel Multi-party Meeting Transcription Challenge (M2MeT). For the speaker diarization task, we propose a multi-channel speaker diarization system that obtains the spatial information of speakers using Direction of Arrival (DOA) technology. A speaker-spatial embedding is generated from the x-vector and an s-vector derived from Filter-and-Sum Beamforming (FSB), which makes the embedding more robust. Specifically, we propose a novel multi-channel sequence-to-sequence neural network architecture named Discriminative Multi-stream Neural Network (DMSNet), which consists of an Attention Filter-and-Sum Block (AFSB) and a Conformer encoder. We use DMSNet to address the overlapped speech problem in multi-channel audio. Compared with an LSTM-based overlapped speech detection (OSD) module, we achieve a decrease of 10.1 [...] OSD module, the DER of the cluster-based diarization system decreases significantly from 13.44 [...] diarization error rate (DER) on the evaluation set and test set.
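To make the DER figures above concrete, here is a minimal sketch of a frame-level diarization error rate. It is not the challenge's scoring tool: it assumes hypothesis speaker labels have already been mapped to reference labels, ignores forgiveness collars, and uses a hypothetical function name `frame_der`. Each frame is represented as the set of speakers active in it, so overlapped speech is simply a frame with more than one label.

```python
def frame_der(reference, hypothesis):
    """Simplified frame-level DER.

    reference, hypothesis: equal-length lists of sets, one set of active
    speaker labels per frame. Labels are assumed to be already mapped
    between reference and hypothesis (an assumption of this sketch).
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same number of frames")

    missed = false_alarm = confusion = total = 0
    for ref, hyp in zip(reference, hypothesis):
        n_ref, n_hyp = len(ref), len(hyp)
        n_correct = len(ref & hyp)
        missed += max(0, n_ref - n_hyp)             # reference speakers with no hypothesis slot
        false_alarm += max(0, n_hyp - n_ref)        # hypothesis speakers beyond the reference count
        confusion += min(n_ref, n_hyp) - n_correct  # matched slots carrying the wrong label
        total += n_ref                              # total reference speaker-frames

    if total == 0:
        raise ValueError("reference contains no speech")
    return (missed + false_alarm + confusion) / total


# Toy example: frame 2 is overlapped speech, and the hypothesis finds only one speaker there.
reference = [{"A"}, {"A", "B"}, {"B"}, set()]
hypothesis = [{"A"}, {"A"}, {"A"}, {"B"}]
print(frame_der(reference, hypothesis))  # 0.75 (1 miss + 1 false alarm + 1 confusion over 4 speaker-frames)
```

In this toy case the overlapped frame contributes a miss because only one speaker is hypothesised there. That is the kind of error an OSD module is meant to reduce in a cluster-based system, which assigns a single speaker per segment.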


research
09/24/2022

Spatial-aware Speaker Diarization for Multi-channel Multi-party Meeting

This paper describes a spatial-aware speaker diarization system for the ...
research
02/09/2022

The Volcspeech system for the ICASSP 2022 multi-channel multi-party meeting transcription challenge

This paper describes our submission to ICASSP 2022 Multi-channel Multi-p...
research
09/17/2023

Neural Speaker Diarization Using Memory-Aware Multi-Speaker Embedding with Sequence-to-Sequence Architecture

We propose a novel neural speaker diarization system using memory-aware ...
research
02/10/2022

The USTC-Ximalaya system for the ICASSP 2022 multi-channel multi-party meeting transcription (M2MeT) challenge

We propose two improvements to target-speaker voice activity detection (...
research
02/06/2022

Cross-Channel Attention-Based Target Speaker Voice Activity Detection: Experimental Results for M2MeT Challenge

In this paper, we present the speaker diarization system for the Multi-c...
research
04/08/2019

Improved Speaker-Dependent Separation for CHiME-5 Challenge

This paper summarizes several follow-up contributions for improving our ...
research
06/27/2022

Sequence-level Speaker Change Detection with Difference-based Continuous Integrate-and-fire

Speaker change detection is an important task in multi-party interaction...
