EM-Network: Oracle Guided Self-distillation for Sequence Learning

06/14/2023
by Ji Won Yoon, et al.

We introduce EM-Network, a novel self-distillation approach that effectively leverages target information for supervised sequence-to-sequence (seq2seq) learning. In contrast to conventional methods, it is trained with oracle guidance derived from the target sequence. Because this oracle guidance compactly represents the target-side context needed to solve the task, the EM-Network achieves better predictions than a model that uses only the source input. To let the original sequence model inherit this capability, we propose a new self-distillation strategy in which the sequence model benefits from the knowledge of the EM-Network in a single training stage. We conduct comprehensive experiments on two types of seq2seq models: connectionist temporal classification (CTC) for speech recognition and attention-based encoder-decoder (AED) models for machine translation. Experimental results demonstrate that the EM-Network significantly advances the current state of the art, improving over the best prior work on speech recognition and establishing state-of-the-art performance on WMT'14 and IWSLT'14.
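The abstract describes a one-stage scheme: an oracle-guided branch (the EM-Network, which sees the target-side context) and the plain source-only sequence model are trained jointly, with a distillation term pulling the student toward the sharper oracle-guided distribution. As a rough illustration only, the sketch below computes such a combined per-token loss on toy distributions; the logits, the loss weighting `lam`, and the exact combination of terms are assumptions for illustration, not the paper's actual formulation.

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(probs, target_idx):
    """Negative log-likelihood of the correct token."""
    return -math.log(probs[target_idx])

def kl_div(p, q):
    """KL(p || q): how far the student q is from the teacher p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Hypothetical per-token logits over a 3-symbol vocabulary.
# The oracle-guided branch sees source + target-derived guidance,
# so its distribution is sharper around the true token (index 0);
# the student branch sees only the source input.
teacher_logits = [4.0, 0.5, 0.2]   # oracle-guided EM-Network branch
student_logits = [2.0, 1.0, 0.5]   # source-only sequence model
target = 0

p_teacher = softmax(teacher_logits)
p_student = softmax(student_logits)

# One-stage training: both branches fit the target, and a distillation
# term transfers the teacher's knowledge to the student simultaneously.
lam = 1.0  # assumed distillation weight
loss = (cross_entropy(p_teacher, target)
        + cross_entropy(p_student, target)
        + lam * kl_div(p_teacher, p_student))
print(f"combined one-stage loss: {loss:.4f}")
```

In an actual seq2seq setting these per-token terms would be a CTC or cross-entropy sequence loss and a sequence-level distillation term, but the structure of the combined objective is the same.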


Related research

- 12/04/2019: Integrating Whole Context to Sequence-to-sequence Speech Recognition
- 11/05/2021: Oracle Teacher: Towards Better Knowledge Distillation
- 01/22/2019: Self-Attention Networks for Connectionist Temporal Classification in Speech Recognition
- 12/09/2020: On Knowledge Distillation for Direct Speech Translation
- 02/10/2023: Distillation of encoder-decoder transformers for sequence labelling
- 02/05/2019: Model Unit Exploration for Sequence-to-Sequence Speech Recognition
- 02/06/2018: Multi-Temporal Land Cover Classification with Sequential Recurrent Encoders
