Test-Time Training on Nearest Neighbors for Large Language Models

05/29/2023
by Moritz Hardt, et al.

Many recent efforts aim to augment language models with relevant information retrieved from a database at test time. We avoid the need for prompt engineering by directly fine-tuning the model on data retrieved at test time using its standard training setup. For this purpose, we build a large-scale distributed nearest neighbor index based on text embeddings of the Pile dataset. Given a query to a language model, our system retrieves the neighbors of the query and fine-tunes the model on the text data corresponding to those neighbors. Surprisingly, retrieving and training on as few as 20 neighbors, each for only one gradient iteration, drastically improves performance across more than twenty language modeling tasks in the Pile benchmark. For example, test-time training significantly narrows the performance gap between a small GPT2 model and a GPTNeo model, more than ten times larger, that was specifically trained to convergence on the Pile. Sufficient index quality and size, however, are important. Our work establishes a valuable first baseline for implementing test-time training in the context of large language models, opening the door to numerous promising research avenues.


Related research

09/09/2021  Efficient Nearest Neighbor Language Models
Non-parametric neural language models (NLMs) learn predictive distributi...

05/21/2023  Retrieving Texts based on Abstract Descriptions
In this work, we aim to connect two research areas: instruction models a...

10/28/2022  You can't pick your neighbors, or can you? When and how to rely on retrieval in the kNN-LM
Retrieval-enhanced language models (LMs), which condition their predicti...

10/27/2021  Training Verifiers to Solve Math Word Problems
State-of-the-art language models can match human performance on many tas...

03/29/2022  The Inefficiency of Language Models in Scholarly Retrieval: An Experimental Walk-through
Language models are increasingly becoming popular in AI-powered scientif...

01/22/2021  k-Neighbor Based Curriculum Sampling for Sequence Prediction
Multi-step ahead prediction in language models is challenging due to the...

09/16/2018  Curriculum-Based Neighborhood Sampling For Sequence Prediction
The task of multi-step ahead prediction in language models is challengin...
