Comparing CTC and LFMMI for out-of-domain adaptation of wav2vec 2.0 acoustic model

04/06/2021
by Apoorv Vyas, et al.

In this work, we investigate whether wav2vec 2.0 self-supervised pretraining helps mitigate the overfitting issues of connectionist temporal classification (CTC) training, reducing its performance gap with flat-start lattice-free MMI (E2E-LFMMI) for automatic speech recognition with limited training data. Towards that objective, we use the pretrained wav2vec 2.0 BASE model and fine-tune it on three different datasets, including out-of-domain (Switchboard) and cross-lingual (Babel) scenarios. Our results show that for supervised adaptation of the wav2vec 2.0 model, E2E-LFMMI and CTC achieve similar results, both significantly outperforming baselines trained only with supervised data. Fine-tuning the wav2vec 2.0 model with E2E-LFMMI and CTC, we obtain the following relative WER improvements over the supervised baseline trained with E2E-LFMMI: relative improvements of 40% (clean set) and 64%; on Switchboard (300h), relative improvements of 33%; and finally, for the Babel languages, relative improvements of 26%.
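To make the CTC side of the comparison concrete, the sketch below shows how a CTC objective is wired on top of a wav2vec-2.0-style acoustic encoder during fine-tuning. This is a minimal illustration, not the authors' code: `TinyEncoder` is a hypothetical stand-in for the pretrained wav2vec 2.0 BASE model, and the dimensions, vocabulary size, and dummy targets are assumptions chosen only to keep the example self-contained and runnable.

```python
# Minimal sketch of CTC fine-tuning on a wav2vec-2.0-style encoder.
# TinyEncoder is a placeholder for the pretrained wav2vec 2.0 BASE model;
# only the CTC-objective wiring is illustrated here.
import torch
import torch.nn as nn

torch.manual_seed(0)

VOCAB = 32      # output tokens (e.g. characters) + CTC blank at index 0
FEAT_DIM = 64   # stand-in for wav2vec 2.0 BASE's 768-dim features


class TinyEncoder(nn.Module):
    """Hypothetical stand-in for the pretrained acoustic encoder."""

    def __init__(self):
        super().__init__()
        # A strided conv roughly mimics wav2vec 2.0's ~20 ms frame rate.
        self.conv = nn.Conv1d(1, FEAT_DIM, kernel_size=400, stride=320)
        self.proj = nn.Linear(FEAT_DIM, VOCAB)

    def forward(self, wav):                  # wav: (batch, samples)
        x = self.conv(wav.unsqueeze(1))      # (batch, FEAT_DIM, frames)
        x = x.transpose(1, 2)                # (batch, frames, FEAT_DIM)
        return self.proj(x)                  # per-frame vocab logits


model = TinyEncoder()
ctc = nn.CTCLoss(blank=0, zero_infinity=True)

wav = torch.randn(2, 16000)                 # two dummy 1-second clips
logits = model(wav)                         # (batch, T, VOCAB)
# CTCLoss expects (T, batch, VOCAB) log-probabilities.
log_probs = logits.log_softmax(-1).transpose(0, 1)

targets = torch.randint(1, VOCAB, (2, 10))  # dummy label sequences
input_lens = torch.full((2,), log_probs.size(0), dtype=torch.long)
target_lens = torch.full((2,), 10, dtype=torch.long)

loss = ctc(log_probs, targets, input_lens, target_lens)
loss.backward()                             # gradients flow into the encoder
```

In an actual adaptation run, `TinyEncoder` would be replaced by the pretrained wav2vec 2.0 BASE model and the dummy waveforms and targets by the supervised adaptation data; the E2E-LFMMI alternative swaps the per-frame CTC objective for a sequence-level lattice-free MMI loss.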


