The THUEE System Description for the IARPA OpenASR21 Challenge

06/29/2022
by Jing Zhao, et al.

This paper describes the THUEE team's speech recognition system for the IARPA Open Automatic Speech Recognition Challenge (OpenASR21), along with further experimental exploration. We achieve outstanding results under both the Constrained and Constrained-plus training conditions. For the Constrained training condition, we build our basic ASR system on the standard hybrid architecture. To alleviate the out-of-vocabulary (OOV) problem, we extend the pronunciation lexicon using grapheme-to-phoneme (G2P) techniques for both OOV words and potential new words. Standard acoustic model structures such as CNN-TDNN-F and CNN-TDNN-F-A are adopted, and multiple data augmentation techniques are applied. For the Constrained-plus training condition, we use the self-supervised learning framework wav2vec2.0. We experiment with various fine-tuning techniques using the Connectionist Temporal Classification (CTC) criterion on top of the publicly available pre-trained model XLSR-53. We find that the frontend feature extractor plays an important role when applying the wav2vec2.0 pre-trained model to the encoder-decoder based CTC/Attention ASR architecture. Extra improvements can be achieved by using the CTC model fine-tuned in the target language as the frontend feature extractor.
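
As a rough illustration of the CTC criterion mentioned in the abstract, the sketch below shows how a frame-level best path is usually turned into a label sequence at decoding time: repeated labels are merged and blank symbols are dropped. It is a generic greedy CTC collapse, not the THUEE system's decoder; the blank index, the function name `ctc_greedy_collapse`, and the toy vocabulary are assumptions made for this example.

```python
from typing import Dict, List, Optional, Sequence

BLANK = 0  # assumed index of the CTC blank symbol (hypothetical choice)


def ctc_greedy_collapse(frame_ids: Sequence[int], blank: int = BLANK) -> List[int]:
    """Collapse a per-frame best path into a label sequence.

    Consecutive repeats are merged first, then blanks are removed, so a
    blank between two identical labels keeps both of them.
    """
    labels: List[int] = []
    prev: Optional[int] = None
    for idx in frame_ids:
        # Emit a label only when it differs from the previous frame
        # and is not the blank symbol.
        if idx != prev and idx != blank:
            labels.append(idx)
        prev = idx
    return labels


def ids_to_text(labels: Sequence[int], vocab: Dict[int, str]) -> str:
    """Map label ids to characters using a toy (hypothetical) vocabulary."""
    return "".join(vocab[i] for i in labels)


if __name__ == "__main__":
    toy_vocab = {1: "a", 2: "b"}          # hypothetical grapheme inventory
    best_path = [0, 1, 1, 0, 1, 2, 2, 0]  # per-frame argmax ids
    print(ids_to_text(ctc_greedy_collapse(best_path), toy_vocab))
```

In practice the per-frame ids would come from the argmax over the fine-tuned model's output distribution at each frame; a beam search with a language model is a common alternative to this greedy rule.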

research
12/23/2021

Data Augmentation based Consistency Contrastive Pre-training for Automatic Speech Recognition

Self-supervised acoustic pre-training has achieved amazing results on th...
research
10/11/2021

K-Wav2vec 2.0: Automatic Speech Recognition based on Joint Decoding of Graphemes and Syllables

Wav2vec 2.0 is an end-to-end framework of self-supervised learning for s...
research
03/14/2023

Improving Accented Speech Recognition with Multi-Domain Training

Thanks to the rise of self-supervised learning, automatic speech recogni...
research
06/09/2022

Joint Encoder-Decoder Self-Supervised Pre-training for ASR

Self-supervised learning (SSL) has shown tremendous success in various s...
research
06/01/2023

Some voices are too common: Building fair speech recognition systems using the Common Voice dataset

Automatic speech recognition (ASR) systems become increasingly efficient...
research
04/11/2022

Multistream neural architectures for cued-speech recognition using a pre-trained visual feature extractor and constrained CTC decoding

This paper proposes a simple and effective approach for automatic recogn...
research
09/18/2023

Training dynamic models using early exits for automatic speech recognition on resource-constrained devices

The possibility of dynamically modifying the computational load of neura...
