Distributionally Robust Language Modeling

09/04/2019
by Yonatan Oren, et al.

Language models are generally trained on data spanning a wide range of topics (e.g., news, reviews, fiction), but they may be applied to a target distribution that is unknown a priori (e.g., restaurant reviews). In this paper, we first show that, under standard maximum likelihood (MLE) training, training on text outside the test distribution can degrade test performance. To remedy this without knowledge of the test distribution, we propose an approach that trains a model to perform well over a wide range of potential test distributions. In particular, we derive a new distributionally robust optimization (DRO) procedure that minimizes the model's loss over the worst-case mixture of topics with sufficient overlap with the training distribution. Our approach, called topic conditional value at risk (topic CVaR), obtains a 5.5-point perplexity reduction over MLE when the language models are trained on a mixture of Yelp reviews and news and tested only on reviews.
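To make the "worst-case mixture of topics" concrete, here is a minimal sketch of how such an objective could be evaluated for fixed per-topic losses. It rests on an assumption not stated in the abstract: "sufficient overlap" is formalized as capping each topic's weight in the adversarial mixture at its training proportion divided by a parameter `alpha` in (0, 1]. The function name `worst_case_topic_loss` and all argument names are hypothetical, and the sketch covers only the inner maximization, not the training procedure.

```python
def worst_case_topic_loss(topic_losses, topic_probs, alpha):
    """Return (loss, weights) for the worst-case topic mixture.

    Assumption (not from the abstract): the adversary may pick any mixture q
    over topics with q[z] <= topic_probs[z] / alpha and sum(q) == 1.
    Since the objective is linear in q, a greedy fill is optimal: put as much
    weight as allowed on the highest-loss topics first.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    if len(topic_losses) != len(topic_probs):
        raise ValueError("one loss and one probability per topic are required")

    # Visit topics from highest to lowest loss.
    order = sorted(range(len(topic_losses)),
                   key=lambda z: topic_losses[z], reverse=True)

    weights = [0.0] * len(topic_losses)
    remaining = 1.0  # mixture mass still to be assigned
    for z in order:
        w = min(topic_probs[z] / alpha, remaining)
        weights[z] = w
        remaining -= w
        if remaining <= 0.0:
            break

    loss = sum(w * l for w, l in zip(weights, topic_losses))
    return loss, weights


# Two topics, e.g. reviews (loss 4.0) and news (loss 2.0), equally frequent.
robust_loss, mixture = worst_case_topic_loss([4.0, 2.0], [0.5, 0.5], alpha=0.5)
average_loss, _ = worst_case_topic_loss([4.0, 2.0], [0.5, 0.5], alpha=1.0)
```

Under this assumption, `alpha=1.0` recovers the ordinary training-mixture average, while smaller values of `alpha` let the adversary concentrate weight on the hardest topics, which is the sense in which the objective guards against an unknown test distribution.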


research
07/04/2020

Birds of a Feather Flock Together: Satirical News Detection via Language Model Differentiation

Satirical news is regularly shared in modern social media because it is ...
research
04/14/2021

UDALM: Unsupervised Domain Adaptation through Language Modeling

In this work we explore Unsupervised Domain Adaptation (UDA) of pretrain...
research
05/20/2023

Re-visiting Automated Topic Model Evaluation with Large Language Models

Topic models are used to make sense of large text collections. However, ...
research
08/04/2021

Mitigating harm in language models with conditional-likelihood filtration

Language models trained on large-scale unfiltered datasets curated from ...
research
10/22/2020

A Disentangled Adversarial Neural Topic Model for Separating Opinions from Plots in User Reviews

The flexibility of the inference process in Variational Autoencoders (VA...
research
02/18/2021

Training Large-Scale News Recommenders with Pretrained Language Models in the Loop

News recommendation calls for deep insights of news articles' underlying...
research
12/11/2021

A Note on the Moments of Special Mixture Distributions, with Applications for Control Charts

Control charts can be applied in a wide range of areas, this paper focus...
