The Volcspeech system for the ICASSP 2022 multi-channel multi-party meeting transcription challenge

02/09/2022
by Chen Shen, et al.

This paper describes our submission to the ICASSP 2022 Multi-channel Multi-party Meeting Transcription (M2MeT) Challenge. For Track 1, we propose several approaches that enable a clustering-based speaker diarization system to handle overlapped speech. Front-end dereverberation and direction-of-arrival (DOA) estimation are used to improve the accuracy of speaker diarization. Multi-channel combination and overlap detection are applied to reduce the missed speaker error. A modified DOVER-Lap is also proposed to fuse the results of different systems. We achieve final DERs of 5.79% and 7.23%. [...] model in a joint CTC-attention architecture. Serialized output training is adopted for multi-speaker overlapped speech recognition. We propose a neural front-end module to model multi-channel audio and train the model end-to-end. Various data augmentation methods are used to mitigate over-fitting in the multi-channel multi-speaker E2E system. Transformer language model fusion is applied to further improve performance. The final CERs are 19.2% and 20.8%.
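As an illustration of the serialized output training idea mentioned above, the following minimal sketch builds a single training target from overlapping utterances by ordering them by start time and joining them with a speaker-change symbol. This is not the authors' code: the `Utterance` class, the `build_sot_target` helper, the `<sc>` token string, and the example sentences are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List

# Speaker-change symbol; the exact token string is an assumption for this sketch.
SC_TOKEN = "<sc>"


@dataclass
class Utterance:
    speaker: str   # speaker label (not used in the target itself)
    start: float   # start time in seconds
    text: str      # reference transcript of this utterance


def build_sot_target(utterances: List[Utterance], sc_token: str = SC_TOKEN) -> str:
    """Serialize (possibly overlapping) utterances into one target string.

    Utterances are ordered by start time (first in, first out) and
    separated by the speaker-change token.
    """
    ordered = sorted(utterances, key=lambda u: u.start)
    return f" {sc_token} ".join(u.text for u in ordered)


# Hypothetical overlapped segment with two speakers.
segment = [
    Utterance("spk2", 1.4, "I agree"),
    Utterance("spk1", 0.0, "shall we start"),
]
target = build_sot_target(segment)
```

With a target of this form, a single-output attention decoder can be trained on overlapped speech without a separate output branch per speaker.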
