pytorch-ocd
Implementation of the Optimal Completion Distillation for Sequence Labeling
We present Optimal Completion Distillation (OCD), a training procedure for optimizing sequence-to-sequence models based on edit distance. OCD is efficient, has no hyper-parameters of its own, and does not require pretraining or joint optimization with conditional log-likelihood. Given a partial sequence generated by the model, we first identify the set of optimal suffixes that minimize the total edit distance, using an efficient dynamic programming algorithm. Then, for each position of the generated sequence, we use a target distribution that puts equal probability on the first token of all the optimal suffixes. OCD achieves state-of-the-art performance on end-to-end speech recognition, on both the Wall Street Journal and Librispeech datasets, achieving 9.3% WER and 4.5% WER respectively.
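The core of the procedure above is an edit-distance dynamic program: for each prefix of the model-generated sequence, find every next token that begins an optimal (edit-distance-minimizing) completion of the target. The following is a minimal sketch of that computation, assuming character-level tokens; the function name `ocd_targets` and the `</s>` end-of-sequence marker are illustrative, not taken from the repo.

```python
def ocd_targets(hyp, target, eos="</s>"):
    """For each prefix hyp[:i], return the set of next tokens that
    start an optimal completion (minimizing total edit distance to target)."""
    n, m = len(hyp), len(target)
    # Standard Levenshtein table: d[i][j] = edit distance between hyp[:i] and target[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for j in range(m + 1):
        d[0][j] = j
    for i in range(n + 1):
        d[i][0] = i
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if hyp[i - 1] == target[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # delete hyp token
                          d[i][j - 1] + 1,      # insert target token
                          d[i - 1][j - 1] + sub)  # substitute / match
    targets = []
    for i in range(n + 1):
        best = min(d[i])
        # If target prefix target[:j] is an optimal "alignment point",
        # then target[j] begins an optimal suffix.
        opts = {target[j] for j in range(m) if d[i][j] == best}
        if d[i][m] == best:
            opts.add(eos)  # target fully consumed: ending here is optimal
        targets.append(opts)
    return targets
```

During training, each returned set would be converted into a uniform target distribution over its tokens at that position. Note this recomputes the full table per sequence; the paper's algorithm achieves the same O(n·m) cost by maintaining the DP rows incrementally as tokens are sampled.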