Gram-CTC: Automatic Unit Selection and Target Decomposition for Sequence Labelling

03/01/2017
by Hairong Liu, et al.

Most existing sequence labelling models rely on a fixed decomposition of a target sequence into a sequence of basic units. These methods suffer from two major drawbacks: 1) the set of basic units is fixed, such as the set of words, characters, or phonemes in speech recognition, and 2) the decomposition of target sequences is fixed. These drawbacks usually result in sub-optimal sequence modeling performance. In this paper, we extend the popular CTC loss criterion to alleviate these limitations, and propose a new loss function called Gram-CTC. While preserving the advantages of CTC, Gram-CTC automatically learns the best set of basic units (grams), as well as the most suitable decomposition of target sequences. Unlike CTC, Gram-CTC allows the model to output a variable number of characters at each time step, which enables the model to capture longer-term dependencies and improves computational efficiency. We demonstrate that the proposed Gram-CTC improves over CTC in terms of both performance and efficiency on the large vocabulary speech recognition task at multiple scales of data, and that with Gram-CTC we can outperform the state-of-the-art on a standard speech benchmark.
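The abstract's key idea, marginalizing over all ways to split a target string into variable-length units (grams), can be illustrated with a small sketch. This is not the paper's implementation; the gram set and the helper function below are hypothetical, and the dynamic program only enumerates decompositions rather than computing the full CTC-style loss.

```python
def decompositions(target, grams):
    """Return every segmentation of `target` into units drawn from `grams`.

    Gram-CTC sums probabilities over all such segmentations; this helper
    just enumerates them with simple memoized dynamic programming.
    """
    # memo[i] holds all decompositions of the suffix target[i:]
    memo = {len(target): [[]]}  # empty suffix has one (empty) decomposition

    def solve(i):
        if i in memo:
            return memo[i]
        results = []
        for j in range(i + 1, len(target) + 1):
            piece = target[i:j]
            if piece in grams:
                for rest in solve(j):
                    results.append([piece] + rest)
        memo[i] = results
        return results

    return solve(0)

# With all unigrams plus the bigram "th", the word "the" can be
# decomposed two ways, and Gram-CTC would marginalize over both.
grams = {"t", "h", "e", "th"}
print(decompositions("the", grams))  # [['t', 'h', 'e'], ['th', 'e']]
```

In the actual loss, each decomposition corresponds to a family of frame-level alignments, and the forward-backward algorithm sums over all of them efficiently rather than enumerating them explicitly.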


Related research

05/16/2018: A Comparison of Modeling Units in Sequence-to-Sequence Speech Recognition with the Transformer on Mandarin Chinese
The choice of modeling units is critical to automatic speech recognition...

07/13/2018: Hybrid CTC-Attention based End-to-End Speech Recognition using Subword Units
In this paper, we present an end-to-end automatic speech recognition sys...

05/28/2020: Subword RNNLM Approximations for Out-Of-Vocabulary Keyword Search
In spoken Keyword Search, the query may contain out-of-vocabulary (OOV) ...

01/06/2022: Improving Mandarin End-to-End Speech Recognition with Word N-gram Language Model
Despite the rapid progress of end-to-end (E2E) automatic speech recognit...

02/08/2022: Differentiable N-gram Objective on Abstractive Summarization
ROUGE is a standard automatic evaluation metric based on n-grams for seq...

11/20/2018: WEST: Word Encoded Sequence Transducers
Most of the parameters in large vocabulary models are used in embedding ...

12/28/2020: Enhancing Handwritten Text Recognition with N-gram sequence decomposition and Multitask Learning
Current state-of-the-art approaches in the field of Handwritten Text Rec...
