Two-pass Decoding and Cross-adaptation Based System Combination of End-to-end Conformer and Hybrid TDNN ASR Systems

06/23/2022
by Mingyu Cui, et al.

Fundamental modelling differences between hybrid and end-to-end (E2E) automatic speech recognition (ASR) systems create large diversity and complementarity among them. This paper investigates multi-pass rescoring and cross-adaptation based system combination approaches for hybrid TDNN and Conformer E2E ASR systems. In multi-pass rescoring, a state-of-the-art hybrid LF-MMI trained CNN-TDNN system featuring speed perturbation, SpecAugment and Bayesian learning hidden unit contributions (LHUC) speaker adaptation was used to produce initial N-best outputs, which were then rescored by the speaker-adapted Conformer system using a 2-way cross-system score interpolation. In cross adaptation, the hybrid CNN-TDNN system was adapted to the 1-best output of the Conformer system, or vice versa. Experiments on the 300-hour Switchboard corpus suggest that the combined systems derived using either of the two system combination approaches outperformed the individual systems. The best combined system obtained using multi-pass rescoring produced statistically significant word error rate (WER) reductions of 2.5 relative) over the stand-alone Conformer system on the NIST Hub5'00, Rt03 and Rt02 evaluation data.
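The multi-pass rescoring step described above can be illustrated with a small sketch. The abstract does not give the interpolation formula, so this assumes a simple linear interpolation of log-domain scores from the two systems. The `Hypothesis` class, the `rescore_nbest` function, the `weight` parameter and the toy scores are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str
    hybrid_score: float     # log-domain score from the first-pass hybrid system
    conformer_score: float  # log-domain score from the second-pass Conformer


def rescore_nbest(nbest, weight=0.5):
    """Return the N-best list sorted by interpolated score, best first.

    `weight` is an assumed interpolation weight on the Conformer score;
    the hybrid score receives (1 - weight).
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")

    def combined(h):
        # 2-way cross-system interpolation (assumed to be linear in log scores).
        return (1.0 - weight) * h.hybrid_score + weight * h.conformer_score

    return sorted(nbest, key=combined, reverse=True)


# Toy N-best list with made-up scores, for illustration only.
nbest = [
    Hypothesis("i think so", hybrid_score=-10.0, conformer_score=-14.0),
    Hypothesis("i think sew", hybrid_score=-11.0, conformer_score=-20.0),
    Hypothesis("i thinks so", hybrid_score=-12.0, conformer_score=-9.0),
]
best = rescore_nbest(nbest, weight=0.5)[0]
print(best.text)
```

Setting `weight=0.0` in this sketch reduces to the first-pass hybrid ranking, and `weight=1.0` ranks by the Conformer score alone. In practice such a weight would typically be tuned on held-out data.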


