Phonetic-assisted Multi-Target Units Modeling for Improving Conformer-Transducer ASR system

11/03/2022
by Li Li, et al.

Exploiting effective target modeling units is a long-standing concern in end-to-end automatic speech recognition (ASR). In this work, we propose a phonetic-assisted multi-target units (PMU) modeling approach to enhance the Conformer-Transducer ASR system in a progressive representation learning manner. Specifically, PMU first uses pronunciation-assisted subword modeling (PASM) and byte pair encoding (BPE) to produce phonetic-induced and text-induced target units separately. Then, three new frameworks are investigated to enhance the acoustic encoder: a basic PMU, a paraCTC, and a pcaCTC, which integrate the PASM and BPE units at different levels for CTC and transducer multi-task training. Experiments on both LibriSpeech and accented ASR tasks show that the proposed PMU significantly outperforms conventional BPE; it reduces the WER of LibriSpeech clean, other, and six accented ASR test sets by relative 12.7 respectively.
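
To make the notion of text-induced target units more concrete, the following is a minimal sketch of generic byte pair encoding on a toy word list. It is not the authors' tokenizer and does not cover PASM; the function names `learn_bpe`, `merge_pair`, and `segment` are hypothetical, and ties between equally frequent pairs are broken lexicographically as an assumption made only for determinism.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (toy version)."""
    # Each word starts as a tuple of single characters; Counter holds word frequencies.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Most frequent pair wins; ties are broken lexicographically (assumption).
        best = min(pairs, key=lambda p: (-pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def segment(word, merges):
    """Split a word into subword units by applying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    merges = learn_bpe(["low", "low", "lower", "lowest"], num_merges=2)
    print(merges)
    print(segment("lowly", merges))
```

In a multi-target setup like the one described above, units of this kind would serve as one target sequence, with phonetic-induced units from a separate procedure serving as the other.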


research
04/19/2021

Acoustic Data-Driven Subword Modeling for End-to-End Speech Recognition

Subword units are commonly used for end-to-end automatic speech recognit...
research
07/29/2022

Pronunciation-aware unique character encoding for RNN Transducer-based Mandarin speech recognition

For Mandarin end-to-end (E2E) automatic speech recognition (ASR) tasks, ...
research
05/24/2022

Multi-Level Modeling Units for End-to-End Mandarin Speech Recognition

The choice of modeling units affects the performance of the acoustic mod...
research
12/05/2021

Consistent Training and Decoding For End-to-end Speech Recognition Using Lattice-free MMI

Recently, End-to-End (E2E) frameworks have achieved remarkable results o...
research
10/23/2021

Optimizing Alignment of Speech and Language Latent Spaces for End-to-End Speech Recognition and Understanding

The advances in attention-based encoder-decoder (AED) networks have brou...
research
10/18/2019

End-to-End Speech Recognition: A review for the French Language

Recently, end-to-end ASR based either on sequence-to-sequence networks o...
research
07/09/2021

On lattice-free boosted MMI training of HMM and CTC-based full-context ASR models

Hybrid automatic speech recognition (ASR) models are typically sequentia...
