Why do Nearest Neighbor Language Models Work?

01/07/2023
by Frank F. Xu, et al.

Language models (LMs) compute the probability of a text by sequentially computing a representation of the already-seen context and using this representation to predict the next word. Currently, most LMs calculate these representations through a neural network consuming the immediately preceding context. Recently, however, retrieval-augmented LMs have been shown to improve over standard neural LMs by accessing information retrieved from a large datastore, in addition to their standard, parametric, next-word prediction. In this paper, we set out to understand why retrieval-augmented language models, and specifically k-nearest neighbor language models (kNN-LMs), perform better than standard parametric LMs, even when the k-nearest neighbor component retrieves examples from the same training set that the LM was originally trained on. To this end, we perform a careful analysis of the various dimensions over which kNN-LM diverges from standard LMs, and investigate these dimensions one by one. Empirically, we identify three main reasons why kNN-LM performs better than standard LMs: using a different input representation for predicting the next tokens, approximate kNN search, and the importance of softmax temperature for the kNN distribution. Further, we incorporate these insights into the model architecture or the training procedure of the standard parametric LM, improving its results without the need for an explicit retrieval component. The code is available at https://github.com/frankxu2004/knnlm-why.
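For orientation, the sketch below shows how a kNN-LM forms its next-token distribution: it retrieves the k nearest stored context representations from a datastore of (representation, next token) pairs, converts the negative distances into a softmax with a temperature, and interpolates the resulting kNN distribution with the parametric LM's distribution. This is a minimal NumPy sketch under simplifying assumptions: the function name, the exact L2 search over a toy datastore, and the fixed interpolation weight lam are illustrative choices, whereas practical kNN-LM implementations use approximate nearest neighbor search (e.g., FAISS) over a much larger datastore.

```python
import numpy as np

def knn_lm_next_token_probs(query_repr, datastore_keys, datastore_values,
                            lm_probs, vocab_size, k=8, temperature=1.0, lam=0.25):
    """Illustrative sketch: interpolate a parametric LM distribution with a kNN
    distribution built from a datastore of (context representation, next token) pairs.
    Exact L2 search is used here for simplicity; real kNN-LMs use approximate search."""
    # Squared L2 distance from the query representation to every stored key.
    dists = np.sum((datastore_keys - query_repr) ** 2, axis=1)
    nearest = np.argsort(dists)[:k]

    # Softmax over negative distances; the temperature flattens or sharpens
    # the kNN distribution (one of the factors the paper highlights).
    logits = -dists[nearest] / temperature
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()

    # Scatter each neighbor's weight onto the next token it recorded.
    knn_probs = np.zeros(vocab_size)
    np.add.at(knn_probs, datastore_values[nearest], weights)

    # Final prediction: linear interpolation of the two distributions.
    return lam * knn_probs + (1.0 - lam) * lm_probs

# Toy usage with random data (purely illustrative).
rng = np.random.default_rng(0)
keys = rng.normal(size=(1000, 16))       # stored context representations
vals = rng.integers(0, 50, size=1000)    # next-token ids recorded with each key
lm = np.full(50, 1 / 50)                 # uniform parametric LM distribution
probs = knn_lm_next_token_probs(keys[0], keys, vals, lm, vocab_size=50)
```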


Related research

09/09/2021 - Efficient Nearest Neighbor Language Models
Non-parametric neural language models (NLMs) learn predictive distributi...

11/01/2019 - Generalization through Memorization: Nearest Neighbor Language Models
We introduce kNN-LMs, which extend a pre-trained neural language model (...

11/15/2022 - Adaptation Approaches for Nearest Neighbor Language Models
Semi-parametric Nearest Neighbor Language Models (kNN-LMs) have produced...

11/07/2017 - Unbounded cache model for online language modeling with open vocabulary
Recently, continuous cache models were proposed as extensions to recurre...

08/25/2017 - k-Nearest Neighbor Augmented Neural Networks for Text Classification
In recent years, many deep-learning based models are proposed for text c...

12/11/2021 - SLOSH: Set LOcality Sensitive Hashing via Sliced-Wasserstein Embeddings
Learning from set-structured data is an essential problem with many appl...

05/21/2023 - Retrieving Texts based on Abstract Descriptions
In this work, we aim to connect two research areas: instruction models a...
