Parallel Context Windows Improve In-Context Learning of Large Language Models

12/21/2022
by Nir Ratner, et al.

For applications that require processing large amounts of text at inference time, Large Language Models (LLMs) are handicapped by their limited context windows, typically around 2,048 tokens. In-context learning, a capability that emerges in LLMs above a certain parameter threshold, is one significant example, since it can only leverage training examples that fit into the context window. Existing efforts to address the context window limitation involve training specialized architectures, which, due to the memory footprint of processing long texts, tend to be smaller than the scale at which in-context learning emerges. We present Parallel Context Windows (PCW), a method that alleviates the context window restriction for any off-the-shelf LLM without further training. The key to the approach is to carve a long context into chunks ("windows") that each fit within the architecture's context length, restrict the attention mechanism to apply only within each window, and re-use the positional embeddings across the windows. We test the PCW approach on in-context learning with models ranging in size from 750 million to 178 billion parameters and show substantial improvements for tasks with diverse input and output spaces. Our results motivate further investigation of Parallel Context Windows as a method for applying off-the-shelf LLMs in other settings that require long text sequences.
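To make the mechanism concrete, below is a minimal sketch in PyTorch of how PCW-style position ids and an attention mask could be constructed: each window gets positions restarting from zero (re-using the positional embeddings), attention is causal and confined to each window, and the task tokens at the end attend to every window. The function name `pcw_position_ids_and_mask` and the `window_lengths` / `task_length` parameters are illustrative assumptions for this sketch, not the authors' released implementation.

```python
import torch

def pcw_position_ids_and_mask(window_lengths, task_length):
    """Sketch of Parallel Context Windows inputs (assumed layout, not the
    paper's official code).

    - Each context window gets positions 0..len-1, so positional
      embeddings are re-used across windows.
    - Tokens attend causally only within their own window.
    - The task tokens at the end attend to all windows plus,
      causally, to each other, with positions continuing after the
      longest window.
    """
    total = sum(window_lengths) + task_length
    position_ids = torch.empty(total, dtype=torch.long)
    mask = torch.zeros(total, total, dtype=torch.bool)  # True = may attend

    offset = 0
    max_win = max(window_lengths)
    for n in window_lengths:
        # restart positions inside each window
        position_ids[offset:offset + n] = torch.arange(n)
        # causal attention restricted to this window's block
        mask[offset:offset + n, offset:offset + n] = torch.tril(
            torch.ones(n, n, dtype=torch.bool))
        offset += n

    # task tokens: positions continue after the longest window
    position_ids[offset:] = torch.arange(max_win, max_win + task_length)
    mask[offset:, :offset] = True  # task tokens see every window
    mask[offset:, offset:] = torch.tril(
        torch.ones(task_length, task_length, dtype=torch.bool))
    return position_ids, mask

if __name__ == "__main__":
    # two context windows of 4 and 3 tokens, followed by 2 task tokens
    pos, attn = pcw_position_ids_and_mask([4, 3], task_length=2)
    print(pos)
    print(attn.int())
```

In practice these tensors would be passed to a causal LM that accepts explicit `position_ids` and an attention mask; depending on the library, the boolean matrix may first need converting to the additive float-mask format the model expects. The design point the sketch illustrates is that windows never attend to one another, so each window's attention cost is bounded by its own length rather than by the full concatenated context.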

Related research

05/24/2023 · Revisiting Parallel Context Windows: A Frustratingly Simple Alternative and Chain-of-Thought Deterioration
We identify two crucial limitations in the evaluation of recent parallel...

08/31/2023 · YaRN: Efficient Context Window Extension of Large Language Models
Rotary Position Embeddings (RoPE) have been shown to effectively encode ...

10/10/2016 · A Dynamic Window Neural Network for CCG Supertagging
Combinatory Category Grammar (CCG) supertagging is a task to assign lexi...

11/21/2019 · An analysis of observation length requirements for machine understanding of human behaviors in spoken language
Machine learning-based human behavior modeling, often at the level of ch...

06/27/2023 · Extending Context Window of Large Language Models via Positional Interpolation
We present Position Interpolation (PI) that extends the context window s...

05/07/2013 · Parallel Chen-Han (PCH) Algorithm for Discrete Geodesics
In many graphics applications, the computation of exact geodesic distanc...

01/06/2015 · Object localization in ImageNet by looking out of the window
We propose a method for annotating the location of objects in ImageNet. ...
