YaRN: Efficient Context Window Extension of Large Language Models

08/31/2023
by Bowen Peng, et al.

Rotary Position Embeddings (RoPE) have been shown to effectively encode positional information in transformer-based language models. However, these models fail to generalize past the sequence length they were trained on. We present YaRN (Yet another RoPE extensioN method), a compute-efficient method to extend the context window of such models, requiring 10x fewer tokens and 2.5x fewer training steps than previous methods. Using YaRN, we show that LLaMA models can effectively utilize and extrapolate to context lengths much longer than their original pre-training would allow, while also surpassing the previous state-of-the-art in context window extension. In addition, we demonstrate that YaRN can extrapolate beyond the limited context of its fine-tuning dataset. We publish the checkpoints of Llama 2 7B/13B fine-tuned using YaRN with 64k and 128k context windows at https://github.com/jquesnelle/yarn
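For orientation, the sketch below illustrates the kind of rescaling YaRN applies to RoPE: an interpolation of the per-dimension rotary frequencies that depends on how fast each dimension rotates within the original context ("NTK-by-parts"), plus a mild attention-temperature adjustment. This is a minimal, illustrative sketch, not the reference implementation (see the linked repository for that); the function names and the defaults (head_dim, base 10000, alpha, beta, and the 0.1·ln(s)+1 fit) are assumptions taken as typical values and should not be treated as authoritative.

```python
import math

def yarn_rope_frequencies(head_dim=128, base=10000.0,
                          original_ctx=4096, scale=16.0,
                          alpha=1.0, beta=32.0):
    """Per-pair RoPE frequencies rescaled for a `scale`-times longer context (sketch)."""
    freqs = []
    for i in range(0, head_dim, 2):
        theta = base ** (-i / head_dim)        # original RoPE frequency for this dim pair
        wavelength = 2 * math.pi / theta       # positions per full rotation
        rotations = original_ctx / wavelength  # rotations completed inside the old context
        # Ramp between full interpolation (gamma = 0, slowly rotating dims) and
        # no interpolation (gamma = 1, fast-rotating dims).
        if rotations < alpha:
            gamma = 0.0
        elif rotations > beta:
            gamma = 1.0
        else:
            gamma = (rotations - alpha) / (beta - alpha)
        freqs.append((1.0 - gamma) * theta / scale + gamma * theta)
    return freqs

def yarn_qk_scale(scale=16.0):
    """Illustrative multiplier on queries and keys (square root of the logit temperature)."""
    return 0.1 * math.log(scale) + 1.0
```

In a RoPE-based model, the rescaled frequencies would replace the standard inverse-frequency buffer before fine-tuning at the longer context: dimensions that already complete many rotations within the original window are left untouched, while slowly rotating dimensions are compressed by the scale factor.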


Related research

LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models (09/21/2023)
CFGPT: Chinese Financial Assistant with Large Language Model (09/19/2023)
Parallel Context Windows Improve In-Context Learning of Large Language Models (12/21/2022)
Extending Context Window of Large Language Models via Positional Interpolation (06/27/2023)
HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution (06/27/2023)
Focused Transformer: Contrastive Training for Context Scaling (07/06/2023)
Induced Natural Language Rationales and Interleaved Markup Tokens Enable Extrapolation in Large Language Models (08/24/2022)
