Multi-sequence Intermediate Conditioning for CTC-based ASR

04/01/2022
by   Yusuke Fujita, et al.

End-to-end automatic speech recognition (ASR) directly maps input speech to a character sequence without using pronunciation lexica. However, in languages with thousands of characters, such as Japanese and Mandarin, modeling all of these characters is problematic because training data for many of them is scarce. To alleviate the problem, we propose a multi-task learning model with explicit interaction between characters and syllables, built on the Self-conditioned connectionist temporal classification (CTC) technique. While the original Self-conditioned CTC estimates character-level intermediate predictions by applying auxiliary CTC losses to a set of intermediate layers, the proposed method additionally estimates syllable-level intermediate predictions in another set of intermediate layers. The character-level and syllable-level predictions are alternately used as conditioning features to deal with the mutual dependency between characters and syllables. Experimental results on Japanese and Mandarin datasets show that the proposed multi-sequence intermediate conditioning outperformed the conventional multi-task-based and Self-conditioned CTC-based methods.
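The alternating conditioning described above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. The following are all assumptions made for illustration: the names `alternating_schedule`, `encode`, and `build_toy_model`, the toy encoder block (a residual `tanh` layer standing in for a real encoder layer), the order in which syllable and character taps alternate, the reuse of the character head for the final output, and every size. The auxiliary CTC losses themselves are omitted; the sketch only shows where the intermediate posteriors are computed and how they are fed back into the next layer.

```python
import math
import random


def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(x, w):
    # x is T x D_in, w is D_in x D_out, both as nested lists.
    cols = list(zip(*w))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in x]


def add(x, y):
    return [[a + b for a, b in zip(xr, yr)] for xr, yr in zip(x, y)]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def alternating_schedule(num_layers, interval, units=("syl", "char")):
    # Tap every `interval`-th layer except the last one, alternating the unit.
    # The starting unit ("syl" first) is an assumption of this sketch.
    tapped = range(interval, num_layers, interval)
    return {layer: units[i % len(units)] for i, layer in enumerate(tapped)}


def encode(x, layer_weights, schedule, heads, back_proj):
    intermediates = []
    for idx, w in enumerate(layer_weights, start=1):
        # Stand-in for a real encoder block: residual connection around tanh(x W).
        x = add(x, [[math.tanh(v) for v in row] for row in matmul(x, w)])
        unit = schedule.get(idx)
        if unit is not None:
            # Intermediate prediction; an auxiliary CTC loss would be applied here.
            post = [softmax(row) for row in matmul(x, heads[unit])]
            intermediates.append((idx, unit, post))
            # Conditioning: project the posterior back to the hidden size and add it.
            x = add(x, matmul(post, back_proj[unit]))
    # Final character-level prediction (assumed to share the character head).
    final = [softmax(row) for row in matmul(x, heads["char"])]
    return final, intermediates


def build_toy_model(num_layers=6, dim=8, n_char=20, n_syl=5, interval=2, seed=0):
    rng = random.Random(seed)
    layers = [rand_matrix(dim, dim, rng) for _ in range(num_layers)]
    heads = {"char": rand_matrix(dim, n_char, rng), "syl": rand_matrix(dim, n_syl, rng)}
    back = {"char": rand_matrix(n_char, dim, rng), "syl": rand_matrix(n_syl, dim, rng)}
    return layers, heads, back, alternating_schedule(num_layers, interval)


layers, heads, back, schedule = build_toy_model()
frames = [[0.1 * (t + d) for d in range(8)] for t in range(4)]  # 4 toy frames
final, intermediates = encode(frames, layers, schedule, heads, back)
print(schedule)  # {2: 'syl', 4: 'char'}
print([(idx, unit, len(post[0])) for idx, unit, post in intermediates])
```

In this toy setup, layer 2 emits a syllable posterior and layer 4 a character posterior, so each later layer sees the other unit's prediction as an additional input feature. In a trained model, each tapped posterior and the final output would each contribute a CTC loss against the corresponding syllable or character transcript.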


