Distillation of encoder-decoder transformers for sequence labelling

02/10/2023
by Marco Farina, et al.

Driven by encouraging results on a wide range of tasks, the field of NLP is experiencing an accelerated race to develop ever-bigger language models. This race has also underscored the need for practical distillation approaches that can leverage the knowledge acquired by these large models in a compute-efficient manner. With this goal in mind, we build on recent work to propose a hallucination-free framework for sequence tagging that is especially suited for distillation. We report new state-of-the-art results across multiple sequence labelling datasets and validate the usefulness of this framework for distilling a large model in a few-shot learning scenario.
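To make the distillation setting concrete, below is a minimal, generic sketch of token-level knowledge distillation for a sequence tagger in PyTorch. It is not the authors' hallucination-free framework; the TinyTagger module, the temperature, the mixing weight, and all sizes are illustrative assumptions. It shows the standard recipe the abstract builds on: a small student is trained on a mix of hard-label cross-entropy and a KL term toward the frozen teacher's softened per-token tag distribution.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative sizes and hyperparameters (assumptions, not from the paper).
VOCAB_SIZE, NUM_TAGS, SEQ_LEN, BATCH = 1000, 9, 32, 4
TEMPERATURE, ALPHA = 2.0, 0.5  # soft-target temperature and hard/soft mixing weight


class TinyTagger(nn.Module):
    """Stand-in tagger: embeddings + Transformer encoder + per-token tag head."""

    def __init__(self, d_model: int, num_layers: int) -> None:
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(d_model, NUM_TAGS)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # Returns per-token tag logits of shape (batch, seq_len, NUM_TAGS).
        return self.head(self.encoder(self.embed(tokens)))


def distillation_loss(student_logits, teacher_logits, gold_tags):
    """Mix hard-label cross-entropy with KL to the teacher's softened distribution."""
    hard = F.cross_entropy(student_logits.flatten(0, 1), gold_tags.flatten())
    soft = F.kl_div(
        F.log_softmax(student_logits / TEMPERATURE, dim=-1),
        F.softmax(teacher_logits / TEMPERATURE, dim=-1),
        reduction="batchmean",
    ) * TEMPERATURE ** 2  # standard temperature scaling of the soft loss
    return ALPHA * hard + (1 - ALPHA) * soft


teacher = TinyTagger(d_model=256, num_layers=4).eval()  # "big" model, kept frozen
student = TinyTagger(d_model=64, num_layers=1)          # small model being trained
optimizer = torch.optim.AdamW(student.parameters(), lr=3e-4)

# Dummy batch standing in for tokenised sentences with gold tag sequences.
tokens = torch.randint(0, VOCAB_SIZE, (BATCH, SEQ_LEN))
gold_tags = torch.randint(0, NUM_TAGS, (BATCH, SEQ_LEN))

with torch.no_grad():
    teacher_logits = teacher(tokens)

loss = distillation_loss(student(tokens), teacher_logits, gold_tags)
loss.backward()
optimizer.step()
print(f"distillation loss: {loss.item():.4f}")
```

In a few-shot scenario like the one the abstract mentions, the same loss would typically be applied to unlabelled text pseudo-labelled by the teacher, with the small gold set supplying the hard-label term.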


Related research

- 05/21/2023 · Task-agnostic Distillation of Encoder-Decoder Language Models: Finetuning pretrained language models (LMs) has enabled appealing perfo...
- 07/27/2023 · f-Divergence Minimization for Sequence-Level Knowledge Distillation: Knowledge distillation (KD) is the process of transferring knowledge fro...
- 05/21/2020 · Evaluating Neural Morphological Taggers for Sanskrit: Neural sequence labelling approaches have achieved state-of-the-art resu...
- 06/14/2023 · EM-Network: Oracle Guided Self-distillation for Sequence Learning: We introduce EM-Network, a novel self-distillation approach that effecti...
- 09/16/2019 · Hybrid Neural Models For Sequence Modelling: The Best Of Three Worlds: We propose a neural architecture with the main characteristics of the mo...
- 06/04/2021 · Churn Reduction via Distillation: In real-world systems, models are frequently updated as more data become...
