Efficient Nearest Neighbor Language Models

09/09/2021
by Junxian He, et al.

Non-parametric neural language models (NLMs) learn predictive distributions over text by consulting an external datastore, which allows them to improve through explicit memorization of training datapoints. While effective, these models often require retrieval from a large datastore at test time, significantly increasing inference overhead and thus limiting the deployment of non-parametric NLMs in practical applications. In this paper, we take the recently proposed k-nearest neighbors language model (Khandelwal et al., 2019) as an example and explore methods to improve its efficiency along various dimensions. Experiments on the standard WikiText-103 benchmark and domain-adaptation datasets show that our methods achieve up to a 6x speed-up in inference while retaining comparable performance. The empirical analysis we present may provide guidelines for future research seeking to develop or deploy more efficient non-parametric NLMs.
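For context, the kNN-LM of Khandelwal et al. (2019) stores a (context representation, next-token) pair for every token in the training set; at each test-time step it retrieves the k nearest keys to the current context and interpolates the resulting neighbor distribution with the base LM distribution. The sketch below is a minimal illustration of that interpolation using brute-force retrieval; it is not the paper's implementation, and the function and parameter names (knn_lm_probs, lam, temperature) are hypothetical.

```python
# Illustrative sketch of kNN-LM inference (not the authors' code).
# Assumes a datastore of (key vector, next-token id) pairs built from the
# training data, a base LM distribution p_lm over the vocabulary, and a
# query vector h for the current context.
import numpy as np

def knn_lm_probs(h, keys, values, p_lm, k=8, temperature=1.0, lam=0.25):
    """Interpolate the base LM with a distribution over retrieved neighbors.

    h:      (d,)   query representation of the current context
    keys:   (N, d) datastore key vectors (one per training context)
    values: (N,)   next-token ids stored alongside each key
    p_lm:   (V,)   base LM probabilities for the next token
    """
    # Brute-force squared L2 distances; real systems use an approximate
    # nearest-neighbor index over a very large datastore, which is the main
    # source of the inference overhead discussed above.
    dists = np.sum((keys - h) ** 2, axis=1)
    nn = np.argsort(dists)[:k]

    # Turn negative distances into normalized weights over the neighbors.
    weights = np.exp(-dists[nn] / temperature)
    weights /= weights.sum()

    # Aggregate neighbor weights by the token each neighbor predicts.
    p_knn = np.zeros_like(p_lm)
    np.add.at(p_knn, values[nn], weights)

    # Final kNN-LM distribution: fixed interpolation of the two.
    return lam * p_knn + (1.0 - lam) * p_lm
```

Because the datastore holds one entry per training token, the retrieval step dominates test-time cost; this is the overhead the paper's efficiency methods target.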


Related research:

Test-Time Training on Nearest Neighbors for Large Language Models (05/29/2023)
Many recent efforts aim to augment language models with relevant informa...

Why do Nearest Neighbor Language Models Work? (01/07/2023)
Language models (LMs) compute the probability of a text by sequentially ...

Non-Parametric Temporal Adaptation for Social Media Topic Classification (09/13/2022)
User-generated social media data is constantly changing as new trends in...

Capturing Structural Locality in Non-parametric Language Models (10/06/2021)
Structural locality is a ubiquitous feature of real-world datasets, wher...

Adaptation Approaches for Nearest Neighbor Language Models (11/15/2022)
Semi-parametric Nearest Neighbor Language Models (kNN-LMs) have produced...

When are Non-Parametric Methods Robust? (03/13/2020)
A growing body of research has shown that many classifiers are susceptib...

Learning Sparse Prototypes for Text Generation (06/29/2020)
Prototype-driven text generation uses non-parametric models that first c...
