Multitask Learning with CTC and Segmental CRF for Speech Recognition

02/21/2017
by Liang Lu, et al.

Segmental conditional random fields (SCRFs) and connectionist temporal classification (CTC) are two sequence labeling methods used for end-to-end training of speech recognition models. Both define the probability of a transcription by marginalizing over latent segmentations of the input: the former uses a globally normalized joint model of segment labels and durations, and the latter classifies each frame as either an output symbol or a "continuation" of the previous label. In this paper, we train a recognition model by optimizing an interpolation between the SCRF and CTC losses, where the same recurrent neural network (RNN) encoder extracts features for both outputs. We find that this multitask objective improves recognition accuracy when decoding with either the SCRF or the CTC model. Additionally, we show that CTC can be used to pretrain the RNN encoder, which improves the convergence rate when learning the joint model.
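To make the two ideas in the abstract concrete, here is a minimal sketch of marginalizing over frame-level alignments and of interpolating two losses. It is an illustration under stated assumptions, not the authors' implementation. It uses the standard blank-symbol CTC forward recursion on precomputed per-frame probabilities, and treats the SCRF loss as a number supplied from elsewhere. The names `BLANK`, `ctc_neg_log_likelihood`, `multitask_loss`, and the weight `lam` are hypothetical, as is the choice of which term `lam` multiplies; the abstract only states that the two losses are interpolated.

```python
import math

BLANK = 0  # assumed index of the CTC blank ("continuation") symbol


def ctc_neg_log_likelihood(probs, labels):
    """Negative log-probability of `labels` under CTC.

    probs[t][k] is the per-frame probability of symbol k at frame t
    (e.g. a softmax over the encoder output); labels contains no blanks.
    The forward recursion sums over every frame-level alignment that
    collapses to `labels`, i.e. it marginalizes the latent segmentation.
    Assumes at least one alignment has non-zero probability.
    """
    # Interleave blanks: [b, l1, b, l2, ..., b]
    ext = [BLANK]
    for label in labels:
        ext += [label, BLANK]
    num_states = len(ext)

    # Initialization: a path may start in the first blank or the first label.
    alpha = [0.0] * num_states
    alpha[0] = probs[0][ext[0]]
    if num_states > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, len(probs)):
        new_alpha = [0.0] * num_states
        for s in range(num_states):
            total = alpha[s]                      # stay in the same state
            if s >= 1:
                total += alpha[s - 1]             # advance by one state
            # Skip over a blank only between two different labels.
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new_alpha[s] = total * probs[t][ext[s]]
        alpha = new_alpha

    # Valid paths end in the last label or the trailing blank.
    total_prob = alpha[-1] + (alpha[-2] if num_states > 1 else 0.0)
    return -math.log(total_prob)


def multitask_loss(loss_ctc, loss_scrf, lam=0.5):
    """Linear interpolation of the two losses; `lam` is an assumed weight."""
    return lam * loss_ctc + (1.0 - lam) * loss_scrf


if __name__ == "__main__":
    # Three frames, symbols {0: blank, 1, 2}; values are made up for illustration.
    frame_probs = [[0.2, 0.5, 0.3], [0.6, 0.1, 0.3], [0.1, 0.2, 0.7]]
    l_ctc = ctc_neg_log_likelihood(frame_probs, [1, 2])
    l_scrf = 1.0  # placeholder: a real SCRF loss would come from a segmental model
    print(multitask_loss(l_ctc, l_scrf, lam=0.5))
```

In a real system both losses would be computed from the same shared RNN encoder output and backpropagated jointly; the sketch only shows how the two scalar objectives combine.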


