Knowledge Distillation for Quality Estimation

07/01/2021
by Amit Gajbhiye, et al.

Quality Estimation (QE) is the task of automatically predicting Machine Translation quality in the absence of reference translations, making it applicable in real-time settings such as translating online social media conversations. Recent success in QE stems from the use of multilingual pre-trained representations, where very large models lead to impressive results. However, the inference time, disk, and memory requirements of such models prevent wide usage in the real world. Even models trained on distilled pre-trained representations remain prohibitively large for many usage scenarios. We instead propose to directly transfer knowledge from a strong QE teacher model to a much smaller model with a different, shallower architecture. We show that this approach, in combination with data augmentation, leads to lightweight QE models that perform competitively with distilled pre-trained representations, with 8x fewer parameters.
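The core idea of the abstract, training a small student to mimic a large teacher's quality scores on augmented (unlabelled) data rather than on gold labels, can be illustrated with a minimal sketch. Everything here is hypothetical: the teacher is a stand-in nonlinear scorer, the student is a tiny linear model, and the "augmented data" is just randomly sampled feature vectors; the real work uses neural QE models over sentence pairs.

```python
import numpy as np

rng = np.random.default_rng(0)

def teacher_predict(x):
    # Stand-in for a large pre-trained QE teacher: a fixed nonlinear scorer
    # mapping a feature vector for a (source, translation) pair to a score.
    return np.tanh(x @ np.array([0.5, -0.3, 0.8]))

# "Data augmentation": sample extra unlabelled feature vectors and let the
# teacher score them. These soft targets are the distillation signal -- no
# gold quality labels are needed.
X_aug = rng.normal(size=(2000, 3))
y_teacher = teacher_predict(X_aug)

# Student: a much smaller (here, linear) model trained by gradient descent
# on the mean-squared error against the teacher's predictions.
w = np.zeros(3)
lr = 0.1
for _ in range(500):
    err = X_aug @ w - y_teacher
    w -= lr * (X_aug.T @ err) / len(X_aug)

# After distillation the student tracks the teacher closely, despite its
# far smaller capacity and having never seen a gold label.
mse = float(np.mean((X_aug @ w - y_teacher) ** 2))
```

The design point this mirrors is that the student need not share the teacher's architecture at all: the distillation loss only couples their outputs, so the student can be made as shallow as the accuracy budget allows.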

