Linear-Time Self Attention with Codeword Histogram for Efficient Recommendation

05/28/2021
by   Yongji Wu, et al.

Self-attention has become increasingly popular in a variety of sequence modeling tasks, from natural language processing to recommendation, due to its effectiveness. However, self-attention suffers from quadratic computational and memory complexities, prohibiting its application to long sequences. Existing approaches that address this issue mainly rely on a sparse attention context, either a local window or a permuted bucket obtained by locality-sensitive hashing (LSH) or sorting, at the risk of losing crucial information. Inspired by the idea of vector quantization, which uses cluster centroids to approximate items, we propose LISA (LInear-time Self Attention), which enjoys both the effectiveness of vanilla self-attention and the efficiency of sparse attention. LISA scales linearly with the sequence length while enabling full contextual attention via computing differentiable histograms of codeword distributions. Moreover, unlike some efficient attention methods, our method poses no restriction on causal masking or sequence length. We evaluate our method on four real-world datasets for sequential recommendation. The results show that LISA outperforms state-of-the-art efficient attention methods in both performance and speed, and is up to 57x faster and 78x more memory efficient than vanilla self-attention.
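To make the idea concrete, here is a minimal NumPy sketch of attention over a codeword histogram in the spirit described above. It is based only on the abstract: the codebook size, the soft-assignment step, the prefix-sum handling of causal masking, and all function names are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch (assumptions labeled): approximate self-attention in
# O(n * B) by attending to B codewords weighted by a histogram of how
# many sequence items (softly) fall into each codeword.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def lisa_like_attention(queries, keys, codebook, causal=True):
    """queries, keys: (n, d); codebook: (B, d) cluster centroids.

    Each key is softly (hence differentiably) assigned to the codewords;
    per-position histograms of these assignments replace pairwise scores.
    """
    n, d = queries.shape

    # Differentiable soft assignment of every key to the B codewords: (n, B).
    assign = softmax(keys @ codebook.T / np.sqrt(d), axis=-1)

    # Codeword histogram. With causal masking, position t only counts
    # items at positions <= t (a prefix sum), still linear in n.
    hist = np.cumsum(assign, axis=0) if causal else np.tile(assign.sum(0), (n, 1))

    # Each query attends to the B codewords; each codeword's score is
    # reweighted by how many items fall into it.
    scores = np.exp(queries @ codebook.T / np.sqrt(d))   # (n, B)
    weights = scores * hist
    weights = weights / weights.sum(axis=-1, keepdims=True)

    # Items are approximated by their codewords, so values are the codewords.
    return weights @ codebook                             # (n, d)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, B = 512, 64, 32
    q, k = rng.normal(size=(n, d)), rng.normal(size=(n, d))
    out = lisa_like_attention(q, k, rng.normal(size=(B, d)))
    print(out.shape)  # (512, 64)
```

Under these assumptions the cost is O(nB) for the histograms and O(nB) for the codeword attention, versus O(n^2) for dense self-attention, which is the source of the linear scaling claimed above.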

Related research

11/18/2021  You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling
Transformer-based models are widely used in natural language processing ...

03/12/2020  Efficient Content-Based Sparse Attention with Routing Transformers
Self-attention has recently been adopted for a wide range of sequence mo...

02/26/2020  Sparse Sinkhorn Attention
We propose Sparse Sinkhorn Attention, a new efficient and sparse method ...

05/25/2022  Leveraging Locality in Abstractive Text Summarization
Despite the successes of neural attention models for natural language ge...

03/02/2022  DCT-Former: Efficient Self-Attention with Discrete Cosine Transform
Since their introduction the Transformer architectures emerged as the dom...

10/11/2020  SMYRF: Efficient Attention using Asymmetric Clustering
We propose a novel type of balanced clustering algorithm to approximate ...

12/04/2018  Factorized Attention: Self-Attention with Linear Complexities
Recent works have been applying self-attention to various fields in comp...
