
Sequence-Level Knowledge Distillation for Model Compression of Attention-based Sequence-to-Sequence Speech Recognition
We investigate the feasibility of sequence-level knowledge distillation ...

Promising Accurate Prefix Boosting for sequence-to-sequence ASR
In this paper, we present promising accurate prefix boosting (PAPB), a d...

Towards better decoding and language model integration in sequence to sequence models
The recently proposed Sequence-to-Sequence (seq2seq) framework advocates...

Latent Sequence Decompositions
We present the Latent Sequence Decompositions (LSD) framework. LSD decom...

Indexed Dynamic Programming to boost Edit Distance and LCSS Computation
There are efficient dynamic programming solutions to the computation of ...

Learning Online Alignments with Continuous Rewards Policy Gradient
Sequence-to-sequence models with soft attention had significant success ...

Token-Level Ensemble Distillation for Grapheme-to-Phoneme Conversion
Grapheme-to-phoneme (G2P) conversion is an important task in automatic s...
Optimal Completion Distillation for Sequence Learning
We present Optimal Completion Distillation (OCD), a training procedure for optimizing sequence-to-sequence models based on edit distance. OCD is efficient, has no hyperparameters of its own, and does not require pre-training or joint optimization with conditional log-likelihood. Given a partial sequence generated by the model, we first identify the set of optimal suffixes that minimize the total edit distance, using an efficient dynamic programming algorithm. Then, for each position of the generated sequence, we use a target distribution that puts equal probability on the first token of all the optimal suffixes. OCD achieves state-of-the-art performance on end-to-end speech recognition, on both the Wall Street Journal and LibriSpeech datasets, achieving 9.3% WER and 4.5% WER respectively.
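The target computation the abstract describes can be sketched compactly: one pass of the Levenshtein DP gives, for every target prefix y[:j], its edit distance to the generated prefix, and the OCD targets are the next tokens y[j] at every distance-minimizing position. This is a minimal illustrative sketch, not the paper's code; the function name and the `"<eos>"` sentinel are assumptions for the example.

```python
def ocd_targets(prefix, target, eos="<eos>"):
    """Return the set of optimal next tokens for a generated `prefix`,
    i.e. first tokens of all suffixes minimizing total edit distance
    to `target` (a string or list of tokens)."""
    n = len(target)
    # prev[j] = edit distance between the generated prefix so far and target[:j]
    prev = list(range(n + 1))  # empty prefix vs. target[:j]
    for i, p in enumerate(prefix, start=1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cur[j] = min(
                prev[j] + 1,                         # deletion
                cur[j - 1] + 1,                      # insertion
                prev[j - 1] + (p != target[j - 1]),  # match / substitution
            )
        prev = cur
    m = min(prev)
    # Every j achieving the minimum can be optimally completed by target[j:];
    # the first token of that suffix is target[j]. If the whole target is
    # already optimally covered, ending the sequence is also optimal.
    opts = {target[j] for j in range(n) if prev[j] == m}
    if prev[n] == m:
        opts.add(eos)
    return opts
```

The training target at that step then puts equal probability on each token in the returned set, e.g. `ocd_targets("su", "sun")` yields `{"n"}`, while a prefix containing an error can yield several equally optimal continuations.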