Regularized Training of Nearest Neighbor Language Models

09/16/2021
by Jean-Francois Ton, et al.

Including memory banks in a natural language processing architecture increases model capacity by equipping the model with additional data at inference time. In this paper, we build upon kNN-LM <cit.>, which combines a pre-trained language model with an exhaustive kNN search through the training data (the memory bank) to achieve state-of-the-art results. We investigate whether we can improve kNN-LM performance by instead training an LM with the knowledge that a kNN search will be applied post hoc. Our method achieves significant improvements on language modeling benchmarks. The main phenomenon we observe is that adding a simple L2 regularization to the activations (not the weights) of the model, a transformer, improves post-hoc kNN classification performance. We explore some possible reasons for this improvement. In particular, we find that the added L2 regularization seems to improve performance on high-frequency words without deteriorating performance on low-frequency ones.
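
As background, kNN-LM forms its next-token distribution by interpolating the pre-trained LM's output distribution with a distribution induced by the retrieved nearest neighbors. The following is a minimal PyTorch-style sketch of that interpolation step (not the authors' code); the function name and the lam/temperature hyperparameters are illustrative assumptions.

```python
import torch

def knn_lm_interpolate(p_lm, knn_dists, knn_token_ids, vocab_size,
                       lam=0.25, temperature=1.0):
    """Blend the LM distribution with a kNN distribution over retrieved neighbors.

    p_lm:          (vocab_size,) next-token probabilities from the pre-trained LM.
    knn_dists:     (k,) distances from the query activation to the retrieved keys.
    knn_token_ids: (k,) target tokens stored alongside those keys in the memory bank.
    lam, temperature: illustrative hyperparameters for mixing and distance scaling.
    """
    # Turn negative distances into a probability over the k retrieved neighbors.
    weights = torch.softmax(-knn_dists / temperature, dim=-1)
    # Scatter neighbor weights onto the vocabulary, summing weights for duplicate tokens.
    p_knn = torch.zeros(vocab_size).scatter_add_(0, knn_token_ids, weights)
    return lam * p_knn + (1.0 - lam) * p_lm
```

The contribution described above modifies training rather than inference: an L2 penalty is placed on the model's activations (the vectors later used as kNN queries and keys), not on its weights. Below is a minimal sketch of such a regularized language modeling loss, assuming a model that returns both logits and final-layer hidden states; reg_weight is a hypothetical coefficient.

```python
import torch.nn.functional as F

def lm_loss_with_activation_l2(model, input_ids, target_ids, reg_weight=1e-3):
    """Cross-entropy LM loss plus an L2 penalty on hidden activations.

    Assumes `model(input_ids)` returns (logits, hidden), where `hidden` holds the
    final-layer activations that would later serve as kNN keys; `reg_weight`
    controls the strength of the activation penalty.
    """
    logits, hidden = model(input_ids)            # hidden: (batch, seq, dim)
    ce = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),     # (batch*seq, vocab)
        target_ids.reshape(-1),
    )
    # L2 regularization on activations (not weights): mean squared norm
    # of the per-token representations.
    act_l2 = hidden.pow(2).sum(dim=-1).mean()
    return ce + reg_weight * act_l2
```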

