Optimal Completion Distillation for Sequence Learning

10/02/2018
by   Sara Sabour, et al.
6

We present Optimal Completion Distillation (OCD), a training procedure for optimizing sequence to sequence models based on edit distance. OCD is efficient, has no hyper-parameters of its own, and does not require pretraining or joint optimization with conditional log-likelihood. Given a partial sequence generated by the model, we first identify the set of optimal suffixes that minimize the total edit distance, using an efficient dynamic programming algorithm. Then, for each position of the generated sequence, we use a target distribution that puts equal probability on the first token of all the optimal suffixes. OCD achieves the state-of-the-art performance on end-to-end speech recognition, on both Wall Street Journal and Librispeech datasets, achieving 9.3% WER and 4.5% WER respectively.

READ FULL TEXT
11/12/2018

Sequence-Level Knowledge Distillation for Model Compression of Attention-based Sequence-to-Sequence Speech Recognition

We investigate the feasibility of sequence-level knowledge distillation ...
11/03/2021

An Improved Algorithm for The k-Dyck Edit Distance Problem

A Dyck sequence is a sequence of opening and closing parentheses (of var...
11/07/2018

Promising Accurate Prefix Boosting for sequence-to-sequence ASR

In this paper, we present promising accurate prefix boosting (PAPB), a d...
12/08/2016

Towards better decoding and language model integration in sequence to sequence models

The recently proposed Sequence-to-Sequence (seq2seq) framework advocates...
10/10/2016

Latent Sequence Decompositions

We present the Latent Sequence Decompositions (LSD) framework. LSD decom...
08/03/2016

Learning Online Alignments with Continuous Rewards Policy Gradient

Sequence-to-sequence models with soft attention had significant success ...
12/19/2018

Dynamic Programming Approach to Template-based OCR

In this paper we propose a dynamic programming solution to the template-...

Code Repositories

pytorch-ocd

Implementation of the Optimal Completion Distillation for Sequence Labeling


view repo