Cross-Channel Attention-Based Target Speaker Voice Activity Detection: Experimental Results for M2MeT Challenge

02/06/2022
by Weiqing Wang, et al.

In this paper, we present the speaker diarization system submitted by team DKU_DukeECE to the Multi-channel Multi-party Meeting Transcription Challenge (M2MeT). Because the dataset contains highly overlapped speech, we employ x-vector-based target-speaker voice activity detection (TS-VAD) to detect overlapping speech between speakers. For the single-channel scenario, we train a separate model for each of the 8 channels and fuse the results. We also employ cross-channel self-attention to further improve performance, where the non-linear spatial correlations between different channels are learned and fused. Experimental results on the evaluation set show that the single-channel TS-VAD reduces the DER by over 75%, and that the multi-channel TS-VAD further reduces the DER by 28%. The submitted system achieves a DER of 2.98% and ranks 1st in the M2MeT challenge.
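
The two multi-channel ideas in the abstract, fusing per-channel TS-VAD outputs and attending across channels, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The abstract does not state the fusion rule or the attention layout, so the averaging fusion, the 0.5 threshold, the scaled dot-product form, the tensor shapes, and all function names below are assumptions made for illustration only.

```python
import numpy as np

def fuse_channel_posteriors(posteriors):
    """Average speech posteriors over channels (assumed fusion rule).

    posteriors: array of shape (channels, speakers, frames).
    Returns an array of shape (speakers, frames).
    """
    return posteriors.mean(axis=0)

def softmax(x, axis=-1):
    """Numerically stable softmax along one axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def cross_channel_self_attention(feats, w_q, w_k, w_v):
    """Scaled dot-product self-attention across the channel axis.

    feats: array of shape (frames, channels, dim).
    For every frame, each channel attends to all channels, and the
    attended channel representations are averaged into one vector.
    Returns (fused, weights) with shapes (frames, dim) and
    (frames, channels, channels).
    """
    q = feats @ w_q
    k = feats @ w_k
    v = feats @ w_v
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    attended = weights @ v
    return attended.mean(axis=1), weights

# Hypothetical sizes: 5 frames, 8 channels, 16-dim features, 4 speakers.
rng = np.random.default_rng(0)
T, C, D, S = 5, 8, 16, 4
feats = rng.standard_normal((T, C, D))
w_q, w_k, w_v = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))
fused, weights = cross_channel_self_attention(feats, w_q, w_k, w_v)

# Random stand-ins for per-channel TS-VAD outputs, fused and thresholded.
posteriors = rng.uniform(size=(C, S, T))
decisions = fuse_channel_posteriors(posteriors) > 0.5
```

In this sketch a frame is marked as overlapped speech when more than one row of `decisions` is true for that frame. In a trained system the projection matrices would be learned jointly with the TS-VAD model rather than drawn at random.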

research
09/05/2021

The DKU-DukeECE-Lenovo System for the Diarization Task of the 2021 VoxCeleb Speaker Recognition Challenge

This report describes the submission of the DKU-DukeECE-Lenovo team to t...
research
10/11/2022

MFCCA:Multi-Frame Cross-Channel attention for multi-speaker ASR in Multi-party meeting scenario

Recently cross-channel attention, which better leverages multi-channel s...
research
02/06/2021

The DKU-Duke-Lenovo System Description for the Third DIHARD Speech Diarization Challenge

In this paper, we present the submitted system for the third DIHARD Spee...
research
02/23/2020

DIHARD II is Still Hard: Experimental Results and Discussions from the DKU-LENOVO Team

In this paper, we present the submitted system for the second DIHARD Spe...
research
05/14/2020

Target-Speaker Voice Activity Detection: a Novel Approach for Multi-Speaker Diarization in a Dinner Party Scenario

Speaker diarization for real-life scenarios is an extremely challenging ...
research
02/11/2022

The xmuspeech system for multi-channel multi-party meeting transcription challenge

This paper describes the system developed by the XMUSPEECH team for the ...
research
02/24/2022

Closing the Gap between Single-User and Multi-User VoiceFilter-Lite

VoiceFilter-Lite is a speaker-conditioned voice separation model that pl...
