Optimal Completion Distillation for Sequence Learning

10/02/2018
by Sara Sabour et al.

We present Optimal Completion Distillation (OCD), a training procedure for optimizing sequence-to-sequence models based on edit distance. OCD is efficient, has no hyper-parameters of its own, and does not require pretraining or joint optimization with conditional log-likelihood. Given a partial sequence generated by the model, we first identify the set of optimal suffixes that minimize the total edit distance, using an efficient dynamic programming algorithm. Then, for each position of the generated sequence, we use a target distribution that puts equal probability on the first token of each optimal suffix. OCD achieves state-of-the-art performance on end-to-end speech recognition on both the Wall Street Journal and Librispeech datasets, reaching 9.3% WER and 4.5% WER, respectively.
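The two steps above can be sketched with a standard edit-distance table. The snippet below is a minimal illustration, not the authors' code: the function names, the eos symbol, and the character-level example are assumptions made for clarity. For each prefix of a sampled sequence, it collects the tokens that begin a completion with minimal total edit distance to the reference, then spreads probability uniformly over that set.

```python
import numpy as np

def ocd_targets(generated, target, eos="</s>"):
    """For each prefix of `generated`, return the set of tokens that start
    a completion minimizing total edit distance to `target`.
    Illustrative sketch of the idea described in the abstract."""
    n, m = len(generated), len(target)
    # dist[i, j] = edit distance between generated[:i] and target[:j]
    dist = np.zeros((n + 1, m + 1), dtype=np.int32)
    dist[:, 0] = np.arange(n + 1)
    dist[0, :] = np.arange(m + 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = dist[i - 1, j - 1] + (generated[i - 1] != target[j - 1])
            dist[i, j] = min(sub, dist[i - 1, j] + 1, dist[i, j - 1] + 1)

    all_optimal = []
    for i in range(n + 1):                     # prefix generated[:i]
        best = dist[i].min()
        # target[j] starts an optimal completion whenever the prefix aligns
        # optimally with target[:j]; once the full target is among the optimal
        # alignments, ending the sequence (eos) is also optimal.
        tokens = {target[j] for j in range(m) if dist[i, j] == best}
        if dist[i, m] == best:
            tokens.add(eos)
        all_optimal.append(tokens)
    return all_optimal

def uniform_target_distribution(tokens, vocab):
    """Equal probability on every optimal first token, zero elsewhere."""
    probs = np.zeros(len(vocab))
    for t in tokens:
        probs[vocab[t]] = 1.0 / len(tokens)
    return probs

# Classic edit-distance example (not taken from the paper): for the sampled
# prefix "SU" against the reference "SATURDAY", both 'A' and 'T' begin
# optimal completions.
sets = ocd_targets(list("SUNDAY"), list("SATURDAY"))
print(sets[2])  # {'A', 'T'}
```

In training, these uniform distributions would serve as per-step soft targets for the model's output distribution, so the model is taught the best next token given its own, possibly erroneous, prefix.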

Related research

- Sequence-Level Knowledge Distillation for Model Compression of Attention-based Sequence-to-Sequence Speech Recognition (11/12/2018): "We investigate the feasibility of sequence-level knowledge distillation ..."
- An Improved Algorithm for The k-Dyck Edit Distance Problem (11/03/2021): "A Dyck sequence is a sequence of opening and closing parentheses (of var..."
- Promising Accurate Prefix Boosting for sequence-to-sequence ASR (11/07/2018): "In this paper, we present promising accurate prefix boosting (PAPB), a d..."
- Towards better decoding and language model integration in sequence to sequence models (12/08/2016): "The recently proposed Sequence-to-Sequence (seq2seq) framework advocates..."
- Latent Sequence Decompositions (10/10/2016): "We present the Latent Sequence Decompositions (LSD) framework. LSD decom..."
- Learning Online Alignments with Continuous Rewards Policy Gradient (08/03/2016): "Sequence-to-sequence models with soft attention had significant success ..."
