Hierarchical Multitask Learning for CTC-based Speech Recognition

07/17/2018
by Kalpesh Krishna, et al.

Previous work has shown that neural encoder-decoder speech recognition can be improved with hierarchical multitask learning, where auxiliary tasks are added at intermediate layers of a deep encoder. We explore the effect of hierarchical multitask learning in the context of connectionist temporal classification (CTC)-based speech recognition, and investigate several aspects of this approach. Consistent with previous work, we observe performance improvements on telephone conversational speech recognition (specifically the Eval2000 test sets) when training a subword-level CTC model with an auxiliary phone loss at an intermediate layer. We analyze the effects of a number of experimental variables (such as the interpolation constant and the position of the auxiliary loss function), performance in lower-resource settings, and the relationship between pretraining and multitask learning. We observe that the hierarchical multitask approach improves over standard multitask training in our higher-data experiments, while in the low-resource settings standard multitask training works well. The best results are obtained by combining hierarchical multitask learning and pretraining, which improves word error rates by 3.4% on the Eval2000 test sets.
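The setup the abstract describes lends itself to a compact implementation: a deep encoder with the primary subword-level CTC loss at the top layer, an auxiliary phone-level CTC loss branching off an intermediate layer, and the two losses combined by an interpolation constant. Below is a minimal PyTorch sketch of that idea, not the authors' released code; the layer count, hidden sizes, vocabulary sizes, auxiliary-loss position, and the interpolation constant `lam = 0.3` are all illustrative assumptions.

```python
import torch
import torch.nn as nn


class HierarchicalCTCEncoder(nn.Module):
    """Deep BLSTM encoder with an auxiliary phone softmax at an
    intermediate layer and the primary subword softmax at the top.
    All sizes here are illustrative, not the paper's configuration."""

    def __init__(self, input_dim=40, hidden_dim=256, num_layers=4,
                 aux_layer=2, num_phones=50, num_subwords=1000):
        super().__init__()
        self.aux_layer = aux_layer
        self.layers = nn.ModuleList([
            nn.LSTM(input_dim if i == 0 else 2 * hidden_dim, hidden_dim,
                    bidirectional=True, batch_first=True)
            for i in range(num_layers)
        ])
        self.phone_head = nn.Linear(2 * hidden_dim, num_phones + 1)      # +1: CTC blank
        self.subword_head = nn.Linear(2 * hidden_dim, num_subwords + 1)  # +1: CTC blank

    def forward(self, feats):
        """feats: (batch, time, input_dim) acoustic features."""
        h, phone_logits = feats, None
        for i, layer in enumerate(self.layers):
            h, _ = layer(h)
            if i + 1 == self.aux_layer:          # branch off the intermediate layer
                phone_logits = self.phone_head(h)
        return self.subword_head(h), phone_logits


def multitask_ctc_loss(subword_logits, phone_logits, subword_tgts, phone_tgts,
                       input_lens, subword_lens, phone_lens, lam=0.3):
    """Interpolated objective: (1 - lam) * CTC_subword + lam * CTC_phone.
    `lam` is the interpolation constant; 0.3 is an assumed value."""
    ctc = nn.CTCLoss(blank=0, zero_infinity=True)
    # nn.CTCLoss expects (time, batch, classes) log-probabilities.
    main = ctc(subword_logits.transpose(0, 1).log_softmax(-1),
               subword_tgts, input_lens, subword_lens)
    aux = ctc(phone_logits.transpose(0, 1).log_softmax(-1),
              phone_tgts, input_lens, phone_lens)
    return (1 - lam) * main + lam * aux
```

Varying `lam` and `aux_layer` in this sketch corresponds to the interpolation-constant and auxiliary-loss-position variables the abstract analyzes; placing the phone loss below the top layer encourages the lower layers to capture phonetic structure while the upper layers specialize for the subword task.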


