Distributionally Robust Recurrent Decoders with Random Network Distillation

Neural machine learning models can successfully model language that is similar to their training distribution, but they are highly susceptible to degradation under distribution shift, which occurs in many practical applications when processing out-of-domain (OOD) text. This has been attributed to "shortcut learning": relying on weak correlations over arbitrarily large contexts. We propose a method based on OOD detection with Random Network Distillation that allows an autoregressive language model to automatically disregard OOD context during inference, smoothly transitioning towards a less expressive but more robust model as the data becomes more OOD, while retaining its full context capability when operating in-distribution. We apply our method to a GRU architecture, demonstrating improvements on multiple language modeling (LM) datasets.
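To make the mechanism concrete, here is a minimal PyTorch sketch of the general idea. It is an illustration under stated assumptions, not the paper's exact formulation: the class names (`RNDScorer`, `GatedGRULM`), the exponential mapping from RND error to a gate, and the choice to gate the recurrent state directly are all hypothetical. The Random Network Distillation part follows the standard recipe: a frozen, randomly initialised target network is distilled into a trainable predictor on in-distribution data, and the prediction error at inference time serves as an OOD score.

```python
import torch
import torch.nn as nn


class RNDScorer(nn.Module):
    """Random Network Distillation OOD scorer (illustrative sketch).

    The predictor is trained to match a frozen random target network on
    in-distribution inputs; at inference time, a large prediction error
    indicates that the input is out-of-distribution.
    """

    def __init__(self, dim: int, feat_dim: int = 64):
        super().__init__()
        self.target = nn.Sequential(
            nn.Linear(dim, feat_dim), nn.ReLU(), nn.Linear(feat_dim, feat_dim))
        self.predictor = nn.Sequential(
            nn.Linear(dim, feat_dim), nn.ReLU(), nn.Linear(feat_dim, feat_dim))
        for p in self.target.parameters():  # the target stays frozen forever
            p.requires_grad_(False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Per-example squared prediction error = OOD score (shape: batch).
        return (self.predictor(x) - self.target(x).detach()).pow(2).mean(dim=-1)


class GatedGRULM(nn.Module):
    """GRU language model whose carried context is attenuated when OOD."""

    def __init__(self, vocab: int, dim: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.gru = nn.GRUCell(dim, dim)
        self.out = nn.Linear(dim, vocab)
        self.rnd = RNDScorer(dim)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq) of token ids.
        h = tokens.new_zeros(
            tokens.size(0), self.gru.hidden_size, dtype=torch.float)
        logits = []
        for t in range(tokens.size(1)):
            x = self.embed(tokens[:, t])
            h = self.gru(x, h)
            # Map the OOD score to a gate in (0, 1]: near 1 in-distribution
            # (keep full context), near 0 when OOD (fall back towards a
            # low-context, more robust model).
            gate = torch.exp(-self.rnd(x)).unsqueeze(-1)
            h = gate * h
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)  # (batch, seq, vocab)
```

In this sketch the predictor would be fitted on in-distribution data by minimising the returned score (an MSE against the frozen target), so that at inference time high scores flag OOD inputs and shrink the recurrent state, smoothly interpolating between the full-context model and a near-context-free one.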

