The USTC-NERCSLIP Systems for the CHiME-7 DASR Challenge

08/28/2023
by Ruoyu Wang, et al.

This technical report describes our submission to the CHiME-7 DASR Challenge, which targets speaker diarization and speech recognition in complex multi-speaker settings and also evaluates how effectively systems handle diverse array devices. To address these challenges, we implemented an end-to-end speaker diarization system and introduced a rectification strategy based on multi-channel spatial information, which significantly reduced the word error rate (WER). For recognition, we used publicly available pre-trained models as foundation models for training our end-to-end speech recognition models. Our system attained a macro-averaged diarization-attributed WER (DA-WER) of 22.4% on the CHiME-7 development set, a relative improvement of 52.5% over the official baseline system.
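The arithmetic behind the reported figures can be sketched as follows. This is a minimal illustration rather than the challenge's scoring code: the function names are hypothetical, and it assumes that "macro-averaged" means an unweighted mean over evaluation scenarios and that "relative improvement" means (baseline - system) / baseline. Only the 22.4% and 52.5% values come from the abstract; the baseline printed at the end is merely the value those two numbers imply, not a figure quoted from the report.

```python
def macro_average(per_scenario_wer):
    """Unweighted mean of per-scenario WERs, so each scenario counts equally."""
    values = list(per_scenario_wer.values())
    if not values:
        raise ValueError("need at least one scenario")
    return sum(values) / len(values)


def relative_improvement(baseline, system):
    """Fractional reduction of `system` relative to `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - system) / baseline


def implied_baseline(system, rel_improvement):
    """Baseline score implied by a system score and its relative improvement."""
    if not 0 <= rel_improvement < 1:
        raise ValueError("relative improvement must be in [0, 1)")
    return system / (1.0 - rel_improvement)


# Figures quoted in the abstract (percent DA-WER and fractional improvement).
SYSTEM_DA_WER = 22.4
REL_IMPROVEMENT = 0.525

# Derived value, not stated in the abstract.
baseline = implied_baseline(SYSTEM_DA_WER, REL_IMPROVEMENT)
print(f"implied baseline DA-WER: {baseline:.1f}%")  # implied baseline DA-WER: 47.2%
```

Under these assumptions, a 52.5% relative improvement means the system's macro-averaged DA-WER is a little under half of the baseline's, which is a different statement from a 52.5-point absolute reduction.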

research
06/19/2020

Joint Speaker Counting, Speech Recognition, and Speaker Identification for Overlapped Speech of Any Number of Speakers

We propose an end-to-end speaker-attributed automatic speech recognition...
research
10/15/2019

MIMO-SPEECH: End-to-End Multi-Channel Multi-Speaker Speech Recognition

Recently, the end-to-end approach has proven its efficacy in monaural mu...
research
11/03/2020

Minimum Bayes Risk Training for End-to-End Speaker-Attributed ASR

Recently, an end-to-end speaker-attributed automatic speech recognition ...
research
02/09/2022

The Volcspeech system for the ICASSP 2022 multi-channel multi-party meeting transcription challenge

This paper describes our submission to ICASSP 2022 Multi-channel Multi-p...
research
07/07/2019

NIESR: Nuisance Invariant End-to-end Speech Recognition

Deep neural network models for speech recognition have achieved great su...
research
06/18/2023

SURT 2.0: Advances in Transducer-based Multi-talker Speech Recognition

The Streaming Unmixing and Recognition Transducer (SURT) model was propo...
research
02/27/2021

Silent versus modal multi-speaker speech recognition from ultrasound and video

We investigate multi-speaker speech recognition from ultrasound images o...
