Using multi-task learning to improve the performance of acoustic-to-word and conventional hybrid models

02/02/2019
by Thai-Son Nguyen, et al.

Acoustic-to-word (A2W) models, which map acoustic signals directly to word sequences, are an appealing approach to end-to-end automatic speech recognition because of their simplicity. However, prior work has shown that A2W modelling typically suffers from data sparsity, which prevents such a model from being trained directly. So far, pre-training initialization is the only approach that has been proposed to deal with this issue. In this work, we propose to build a shared neural network and optimize the A2W and conventional hybrid models in a multi-task manner. Our results show that, with our multi-task model, training an A2W model is much more stable even without pre-training initialization, and yields a significant improvement over a baseline model. Experiments also reveal that the performance of a hybrid acoustic model can be further improved when it is jointly trained with a sequence-level optimization criterion such as acoustic-to-word.
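As a rough illustration of what optimizing the two models "in a multi-task manner" can look like, the sketch below combines a sequence-level word loss with a frame-level state loss into one objective. It rests on assumptions that the abstract does not state: the A2W branch is scored with CTC, the hybrid branch with per-frame cross-entropy against an existing state alignment, and the two are mixed by a hypothetical interpolation weight `lam`. In a real system both sets of log-probabilities would come from output layers on top of the same shared network; here they are simply passed in.

```python
import math


def log_softmax(scores):
    """Convert raw scores for one frame into log-probabilities."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def frame_cross_entropy(state_log_probs, state_targets):
    """Hybrid branch (assumed): mean negative log-likelihood of aligned frame labels."""
    total = -sum(lp[t] for lp, t in zip(state_log_probs, state_targets))
    return total / len(state_targets)


def ctc_loss(word_log_probs, word_targets, blank=0):
    """A2W branch (assumed): CTC negative log-likelihood via the forward algorithm."""
    # Interleave blanks: [w1, w2] -> [blank, w1, blank, w2, blank]
    ext = [blank]
    for w in word_targets:
        ext += [w, blank]
    size = len(ext)

    alpha = [-math.inf] * size
    alpha[0] = word_log_probs[0][ext[0]]
    if size > 1:
        alpha[1] = word_log_probs[0][ext[1]]

    for frame in word_log_probs[1:]:
        new_alpha = [-math.inf] * size
        for s in range(size):
            candidates = [alpha[s]]
            if s >= 1:
                candidates.append(alpha[s - 1])
            # Skipping a blank is allowed only between two different words.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                candidates.append(alpha[s - 2])
            new_alpha[s] = logsumexp(candidates) + frame[ext[s]]
        alpha = new_alpha

    # Valid paths end in the last word or the trailing blank.
    tail = [alpha[size - 1]]
    if size > 1:
        tail.append(alpha[size - 2])
    return -logsumexp(tail)


def multitask_loss(word_log_probs, word_targets,
                   state_log_probs, state_targets, lam=0.5):
    """Weighted sum of the two task losses; `lam` is a hypothetical weight."""
    return (lam * ctc_loss(word_log_probs, word_targets)
            + (1.0 - lam) * frame_cross_entropy(state_log_probs, state_targets))


if __name__ == "__main__":
    # Two frames, a 3-entry word vocabulary (index 0 = blank), 4 hybrid states.
    word_lp = [log_softmax([0.2, 1.5, -0.3]), log_softmax([0.1, 0.9, 0.4])]
    state_lp = [log_softmax([1.0, 0.0, 0.0, 0.0]), log_softmax([0.0, 2.0, 0.0, 0.0])]
    print(multitask_loss(word_lp, [1], state_lp, [0, 1], lam=0.5))
```

Because the objective is a sum over both tasks, gradients from the frame-level targets and the word-level targets would both flow into the shared layers, which is one plausible reading of why joint training could stabilize the A2W branch and also benefit the hybrid model.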

research
03/31/2019

Learning Shared Encoding Representation for End-to-End Speech Recognition Models

In this work, we learn a shared encoding representation for a multi-task...
research
02/07/2018

Joint Modeling of Accents and Acoustics for Multi-Accent Speech Recognition

The performance of automatic speech recognition systems degrades with in...
research
10/22/2020

MAM: Masked Acoustic Modeling for End-to-End Speech-to-Text Translation

End-to-end Speech-to-text Translation (E2E-ST), which directly translat...
research
06/05/2023

End-to-End Word-Level Pronunciation Assessment with MASK Pre-training

Pronunciation assessment is a major challenge in the computer-aided pron...
research
03/01/2022

A multi-task learning for cavitation detection and cavitation intensity recognition of valve acoustic signals

With the rapid development of smart manufacturing, data-driven machinery...
research
07/18/2018

Hierarchical Multi Task Learning With CTC

In Automatic Speech Recognition, it is still challenging to learn useful...
research
11/05/2018

When CTC Training Meets Acoustic Landmarks

Connectionist temporal classification (CTC) training criterion provides ...