MEMORY-VQ: Compression for Tractable Internet-Scale Memory

08/28/2023
by Yury Zemlyanskiy, et al.

Retrieval augmentation is a powerful but expensive method to make language models more knowledgeable about the world. Memory-based methods such as LUMEN pre-compute token representations for retrieved passages to drastically speed up inference. However, storing these pre-computed representations greatly increases storage requirements. We propose MEMORY-VQ, a new method to reduce the storage requirements of memory-augmented models without sacrificing performance. Our method uses a vector quantization variational autoencoder (VQ-VAE) to compress token representations. We apply MEMORY-VQ to the LUMEN model to obtain LUMEN-VQ, a memory model that achieves a 16x compression rate with comparable performance on the KILT benchmark. LUMEN-VQ enables practical retrieval augmentation even for extremely large retrieval corpora.
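
To make the storage argument concrete, the following is a minimal sketch of the general idea behind vector quantization: each token representation is replaced by a few integer codes that index into codebooks, and only the codes are stored. It is not the authors' implementation. The codebooks here are random rather than learned by a VQ-VAE, the grouping of dimensions into subvectors is an assumption for illustration, and every name and size (`quantize`, `dequantize`, `compression_ratio`, `DIM`, `GROUPS`, `CODEBOOK_SIZE`) is hypothetical.

```python
import math
import random


def nearest_code(vector, codebook):
    """Return the index of the codebook entry closest to `vector` (squared L2)."""
    best_index, best_dist = 0, float("inf")
    for index, code in enumerate(codebook):
        dist = sum((v - c) ** 2 for v, c in zip(vector, code))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def quantize(vector, codebooks):
    """Split `vector` into len(codebooks) equal subvectors; encode each as one index."""
    sub_dim = len(vector) // len(codebooks)
    return [
        nearest_code(vector[i * sub_dim:(i + 1) * sub_dim], codebook)
        for i, codebook in enumerate(codebooks)
    ]


def dequantize(codes, codebooks):
    """Rebuild an approximate vector by concatenating the selected codebook entries."""
    reconstruction = []
    for code, codebook in zip(codes, codebooks):
        reconstruction.extend(codebook[code])
    return reconstruction


def compression_ratio(dim, bits_per_value, num_groups, codebook_size):
    """Bits for the raw vector divided by bits for its codes (codebook storage ignored)."""
    original_bits = dim * bits_per_value
    compressed_bits = num_groups * math.ceil(math.log2(codebook_size))
    return original_bits / compressed_bits


# Toy sizes chosen for illustration only; they are NOT the paper's configuration.
random.seed(0)
DIM, GROUPS, CODEBOOK_SIZE = 8, 4, 4
SUB_DIM = DIM // GROUPS

# Random stand-in codebooks; a VQ-VAE would learn these during training.
codebooks = [
    [[random.gauss(0, 1) for _ in range(SUB_DIM)] for _ in range(CODEBOOK_SIZE)]
    for _ in range(GROUPS)
]

token = [random.gauss(0, 1) for _ in range(DIM)]  # stand-in token representation
codes = quantize(token, codebooks)                # what would be stored
reconstruction = dequantize(codes, codebooks)     # what would be read back

print(codes, compression_ratio(DIM, 16, GROUPS, CODEBOOK_SIZE))
```

With these toy sizes, an 8-dimensional vector at 16 bits per value occupies 128 bits, while four 2-bit codes occupy 8 bits, so the ratio is 16. The sizes were picked to mirror the 16x figure quoted above and say nothing about how LUMEN-VQ actually reaches it. The sketch also ignores the one-time cost of storing the codebooks, which is negligible only when the corpus is large.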

research
06/17/2023

GLIMMER: generalized late-interaction memory reranker

Memory-augmentation is a powerful approach for efficiently incorporating...
research
01/25/2023

Pre-computed memory or on-the-fly encoding? A hybrid approach to retrieval augmentation makes the most of your compute

Retrieval-augmented language models such as Fusion-in-Decoder are powerf...
research
02/15/2017

Frustratingly Short Attention Spans in Neural Language Modeling

Neural language models predict the next token using a latent representat...
research
05/25/2023

Surface-Based Retrieval Reduces Perplexity of Retrieval-Augmented Language Models

Augmenting language models with a retrieval mechanism has been shown to ...
research
03/29/2022

Compact Token Representations with Contextual Quantization for Efficient Document Re-ranking

Transformer based re-ranking models can achieve high search relevance th...
research
10/03/2021

SDR: Efficient Neural Re-ranking using Succinct Document Representation

BERT based ranking models have achieved superior performance on various ...
research
05/13/2022

Slimmable Video Codec

Neural video compression has emerged as a novel paradigm combining train...
