When CTC Training Meets Acoustic Landmarks

11/05/2018
by Di He, et al.

The connectionist temporal classification (CTC) training criterion provides an alternative acoustic model (AM) training strategy for end-to-end automatic speech recognition. Although the CTC criterion allows acoustic models to be trained without time-aligned phonetic transcriptions, it still requires careful tuning to converge, especially in resource-constrained scenarios. In this paper, we propose to improve CTC training by incorporating acoustic landmarks. We tailored a new set of acoustic landmarks to help CTC training converge more quickly while also reducing recognition error rates. We leveraged new target label sequences that mix phone labels with manner-change labels to guide CTC training. Experiments on TIMIT demonstrated that CTC-based acoustic models converge significantly faster and more smoothly when they are augmented with acoustic landmarks. The models pretrained with mixed target labels can be further fine-tuned, which reduced the phone error rate by 8.72%. Gains were also observed on reduced TIMIT and on WSJ, in which case we are the first to succeed in testing the effectiveness of acoustic landmark theory on mid-sized ASR tasks.

research
10/27/2017

Acoustic Landmarks Contain More Information About the Phone String than Other Frames

Most mainstream Automatic Speech Recognition (ASR) systems consider all ...
research
05/15/2018

Improved ASR for Under-Resourced Languages Through Multi-Task Learning with Acoustic Landmarks

Furui first demonstrated that the identity of both consonant and vowel c...
research
04/19/2021

Acoustic Data-Driven Subword Modeling for End-to-End Speech Recognition

Subword units are commonly used for end-to-end automatic speech recognit...
research
10/22/2019

G2G: TTS-Driven Pronunciation Learning for Graphemic Hybrid ASR

Grapheme-based acoustic modeling has recently been shown to outperform p...
research
02/02/2019

Using multi-task learning to improve the performance of acoustic-to-word and conventional hybrid models

Acoustic-to-word (A2W) models that allow direct mapping from acoustic si...
research
03/04/2021

End-to-end acoustic modelling for phone recognition of young readers

Automatic recognition systems for child speech are lagging behind those ...
