UFO2: A unified pre-training framework for online and offline speech recognition

10/26/2022
by   Li Fu, et al.

In this paper, we propose a Unified pre-training Framework for Online and Offline (UFO2) Automatic Speech Recognition (ASR), which 1) merges the two separate training workflows for the online and offline modes into a single process, and 2) improves Word Error Rate (WER) performance when annotated utterances are limited. Specifically, we extend the conventional offline-mode Self-Supervised Learning (SSL)-based ASR approach to a unified one, in which model training is conditioned on both full-context and dynamic-chunked inputs. To enhance the pre-trained representation model, a stop-gradient operation is applied to decouple the online-mode objectives from the quantizer. Moreover, in both the pre-training and the downstream fine-tuning stages, joint losses are proposed to train the unified model with full weight sharing between the two modes. Experimental results on the LibriSpeech dataset show that UFO2 outperforms the SSL-based baseline method by 29.7 reduction in offline and online modes, respectively.
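To make the idea of conditioning on both full-context and dynamic-chunked inputs more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a chunk-wise attention mask of the kind commonly used in unified streaming models, a uniformly sampled chunk size, and a weighted sum for the joint loss. The names `chunk_attention_mask`, `sample_chunk_size`, `joint_loss`, and the weight `alpha` are hypothetical; the abstract does not specify any of them.

```python
import random


def chunk_attention_mask(num_frames, chunk_size=None):
    """Return mask[i][j] == True when frame i may attend to frame j.

    chunk_size=None models the offline (full-context) mode. Otherwise each
    frame sees its own chunk and all earlier chunks, which is an assumed
    form of the online mode.
    """
    if chunk_size is None:
        return [[True] * num_frames for _ in range(num_frames)]
    return [
        [(j // chunk_size) <= (i // chunk_size) for j in range(num_frames)]
        for i in range(num_frames)
    ]


def sample_chunk_size(max_chunk, rng=random):
    # "Dynamic" chunking: draw a new chunk size for each batch
    # (a uniform draw is an assumption made for this sketch).
    return rng.randint(1, max_chunk)


def joint_loss(offline_loss, online_loss, alpha=0.5):
    # Hypothetical weighting: the abstract only states that joint losses
    # train one model whose weights are shared by both modes.
    return alpha * offline_loss + (1.0 - alpha) * online_loss
```

In this sketch the same model would be run twice per batch, once with the full-context mask and once with a chunked mask, and the two losses combined by `joint_loss`. The stop-gradient on the quantizer is not shown, since it requires an autograd framework.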


