MBR and QE Finetuning: Training-time Distillation of the Best and Most Expensive Decoding Methods

09/19/2023
by   Mara Finkelstein, et al.

Recent research on decoding methods for Natural Language Generation (NLG) tasks has shown that the traditional beam search and greedy decoding algorithms are not optimal, because model probabilities do not always align with human preferences. Stronger decoding methods, including Quality Estimation (QE) reranking and Minimum Bayes Risk (MBR) decoding, have since been proposed to mitigate the mismatch between model perplexity and output quality. While these decoding methods achieve state-of-the-art performance, they are prohibitively expensive to compute. In this work, we propose MBR finetuning and QE finetuning, which distill the quality gains from these decoding methods at training time while using an efficient decoding algorithm at inference time. Using the canonical NLG task of Neural Machine Translation (NMT), we show that even with self-training, these finetuning methods significantly outperform the base model. Moreover, when an external LLM is used as a teacher model, these finetuning methods outperform finetuning on human-generated references. These findings suggest new ways to leverage monolingual data to achieve improvements in model quality that are on par with, or even exceed, improvements from human-curated data, while maintaining maximum efficiency during decoding.
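To make the expensive teacher side concrete: MBR decoding samples a pool of candidate translations from the model and picks the one with the highest expected utility against the rest of the pool, so its cost grows quadratically in the number of samples. The sketch below is illustrative only and is not from the paper; it assumes a toy token-overlap F1 as the utility function, standing in for a learned metric such as BLEURT, and plain Python strings as candidates.

```python
def token_f1(hyp: str, ref: str) -> float:
    """Toy utility: unigram-overlap F1 (a stand-in for a learned quality metric)."""
    h, r = set(hyp.split()), set(ref.split())
    overlap = len(h & r)
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(h), overlap / len(r)
    return 2 * precision * recall / (precision + recall)

def mbr_decode(candidates: list[str], utility=token_f1) -> str:
    """Return the candidate with the highest average utility against the pool.

    Every candidate is scored against every other one, hence the O(n^2)
    metric calls that make MBR decoding expensive at inference time.
    """
    def expected_utility(hyp: str) -> float:
        others = [c for c in candidates if c is not hyp]
        return sum(utility(hyp, ref) for ref in others) / len(others)
    return max(candidates, key=expected_utility)

# The "consensus" candidate wins, even though no reference is available.
samples = ["the cat sat", "the cat sat down", "the cat", "a dog ran"]
print(mbr_decode(samples))
```

MBR finetuning, as described in the abstract, runs this costly selection only during training data generation, then finetunes the model so that cheap greedy or beam decoding reproduces the selected outputs.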

Related research

05/02/2022  Quality-Aware Decoding for Neural Machine Translation
04/11/2017  Later-stage Minimum Bayes-Risk Decoding for Neural Machine Translation
05/20/2020  Is MAP Decoding All You Need? The Inadequacy of the Mode in Neural Machine Translation
05/17/2023  Epsilon Sampling Rocks: Investigating Sampling Strategies for Minimum Bayes Risk Decoding for Machine Translation
05/18/2021  Understanding the Properties of Minimum Bayes Risk Decoding in Neural Machine Translation
06/08/2023  Improving Language Model Integration for Neural Machine Translation
08/18/2021  SHAQ: Single Headed Attention with Quasi-Recurrence
