Lexinvariant Language Models

05/24/2023
by Qian Huang, et al.

Token embeddings, a mapping from discrete lexical symbols to continuous vectors, are at the heart of any language model (LM). However, lexical symbol meanings can also be determined and even redefined by their structural role in a long context. In this paper, we ask: is it possible for a language model to be performant without any fixed token embeddings? Such a language model would have to rely entirely on the co-occurrence and repetition of tokens in the context rather than the a priori identity of any token. To answer this, we study lexinvariant language models, which are invariant to lexical symbols and therefore do not need fixed token embeddings in practice. First, we prove that we can construct a lexinvariant LM that converges to the true language model at a uniform rate that is polynomial in the context length, with a constant factor that is sublinear in the vocabulary size. Second, to build a lexinvariant LM, we simply encode tokens using random Gaussian vectors, such that each token maps to the same representation within each sequence but to different representations across sequences. Empirically, we demonstrate that such a model can indeed attain perplexity comparable to that of a standard language model, given a sufficiently long context. We further explore two properties of lexinvariant language models. First, given text generated from a substitution cipher of English, the model implicitly performs Bayesian in-context deciphering and infers the mapping to the underlying real tokens with high accuracy. Second, it achieves on average 4x higher accuracy on synthetic in-context reasoning tasks. Finally, we discuss regularizing standard language models toward lexinvariance and potential practical applications.
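The per-sequence random encoding described in the abstract is easy to illustrate. The sketch below (NumPy, with hypothetical function and parameter names; not the authors' implementation) assigns each token a random Gaussian vector that stays fixed within one sequence but is resampled for every new sequence, so a model built on top of it can only exploit co-occurrence and repetition of tokens in context.

    # Minimal sketch of a lexinvariant token encoding: a fresh Gaussian
    # embedding table is sampled per sequence, so token identity carries
    # no information across sequences. Names and the 1/sqrt(d_model)
    # scale are illustrative assumptions.
    import numpy as np

    def lexinvariant_embed(token_ids, vocab_size, d_model, rng=None):
        """Embed one sequence with per-sequence random Gaussian vectors."""
        if rng is None:
            rng = np.random.default_rng()
        # Embedding table valid for this sequence only.
        table = rng.normal(0.0, 1.0 / np.sqrt(d_model), size=(vocab_size, d_model))
        return table[np.asarray(token_ids)]

    # Same token id -> same vector within a sequence,
    # but different vectors across two sequences (two calls).
    seq = [5, 7, 5, 3]
    e1 = lexinvariant_embed(seq, vocab_size=100, d_model=16)
    e2 = lexinvariant_embed(seq, vocab_size=100, d_model=16)
    assert np.allclose(e1[0], e1[2])
    assert not np.allclose(e1[0], e2[0])

Any fixed scale for the Gaussian vectors preserves lexinvariance; the 1/sqrt(d_model) factor here is only a conventional choice to keep vector norms roughly constant.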

