Accurate Knowledge Distillation with n-best Reranking

05/20/2023
by Hendra Setiawan, et al.

We propose extending Sequence-level Knowledge Distillation (Kim and Rush, 2016) with n-best reranking, so that pseudo-labels are drawn not only from the top-1 hypothesis but from the full n-best lists of teacher models. Our approach leverages a diverse set of models, including publicly available large pretrained models, to select more accurate pseudo-labels for training student models. We validate our proposal on the WMT21 German-English translation task and demonstrate that our student model achieves accuracy comparable to a large 4.7-billion-parameter translation model from Tran et al. (2021) while having two orders of magnitude fewer parameters.
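To make the idea concrete, below is a minimal sketch of sequence-level knowledge distillation with n-best reranking. It assumes a Hugging Face seq2seq teacher; the checkpoint name, the `rerankers` scoring functions, and their weights are illustrative assumptions, not the authors' exact setup.

```python
# Minimal sketch: decode an n-best list with a teacher model, then pick the
# pseudo-label that a weighted ensemble of reranker scores prefers.
# The checkpoint and reranker interface below are assumptions for illustration.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

teacher_name = "Helsinki-NLP/opus-mt-de-en"  # assumed stand-in teacher checkpoint
tokenizer = AutoTokenizer.from_pretrained(teacher_name)
teacher = AutoModelForSeq2SeqLM.from_pretrained(teacher_name).eval()

def distill_with_reranking(source_sentences, rerankers, n_best=8):
    """For each source sentence, decode an n-best list with the teacher and
    keep the hypothesis that maximizes a weighted sum of reranker scores."""
    pseudo_labels = []
    for src in source_sentences:
        inputs = tokenizer(src, return_tensors="pt")
        with torch.no_grad():
            outputs = teacher.generate(
                **inputs,
                num_beams=n_best,
                num_return_sequences=n_best,
                max_new_tokens=128,
            )
        hypotheses = tokenizer.batch_decode(outputs, skip_special_tokens=True)
        # Each reranker is a (weight, score_fn) pair; score_fn(src, hyp) could be
        # another MT model's log-probability, an LM score, or a QE metric.
        best = max(
            hypotheses,
            key=lambda hyp: sum(w * score_fn(src, hyp) for w, score_fn in rerankers),
        )
        pseudo_labels.append((src, best))
    return pseudo_labels  # (source, pseudo-label) pairs for student training

```

In this sketch, the reranked outputs replace the teacher's top-1 hypotheses as pseudo-labels, so the student's training data reflects the judgment of the reranker ensemble rather than a single teacher decode.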

Related research

06/25/2016
Sequence-Level Knowledge Distillation
Neural machine translation (NMT) offers a novel alternative formulation ...

11/12/2018
Sequence-Level Knowledge Distillation for Model Compression of Attention-based Sequence-to-Sequence Speech Recognition
We investigate the feasibility of sequence-level knowledge distillation ...

05/08/2023
Web Content Filtering through knowledge distillation of Large Language Models
We introduce a state-of-the-art approach for URL categorization that lev...

03/24/2022
Ensembling and Knowledge Distilling of Large Sequence Taggers for Grammatical Error Correction
In this paper, we investigate improvements to the GEC sequence tagging a...

03/27/2023
Improving Neural Topic Models with Wasserstein Knowledge Distillation
Topic modeling is a dominant method for exploring document collections o...

01/20/2021
Deep Epidemiological Modeling by Black-box Knowledge Distillation: An Accurate Deep Learning Model for COVID-19
An accurate and efficient forecasting system is imperative to the preven...

05/14/2023
Towards Understanding and Improving Knowledge Distillation for Neural Machine Translation
Knowledge distillation (KD) is a promising technique for model compressi...
