Attention Approximates Sparse Distributed Memory

11/10/2021
by Trenton Bricken, et al.

While Attention has become an important mechanism in deep learning, there remains limited intuition for why it works so well. Here, we show that, under certain data conditions, Transformer Attention closely approximates Kanerva's Sparse Distributed Memory (SDM), a biologically plausible associative memory model. We confirm that these conditions are satisfied in pre-trained GPT2 Transformer models. We discuss the implications of the Attention-SDM map and provide new computational and biological interpretations of Attention.
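
As a rough illustration of the claimed correspondence, the following sketch (Python/NumPy, not from the paper) compares a standard softmax attention read against a Kanerva-style SDM read over random binary address vectors. The function names, the inverse temperature beta, and the Hamming read radius are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def softmax_attention(query, keys, values, beta):
    """Standard softmax attention read: exp(beta * q.k) weighting over stored values."""
    scores = beta * keys @ query
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ values

def sdm_read(query, addresses, values, hamming_radius):
    """Kanerva-style SDM read: average the values stored at every address
    within a fixed Hamming radius of the query."""
    dists = np.count_nonzero(addresses != query, axis=1)
    mask = dists <= hamming_radius
    if not mask.any():
        return np.zeros(values.shape[1])
    return values[mask].mean(axis=0)

# Toy comparison on random binary patterns (illustrative only).
rng = np.random.default_rng(0)
n, d_addr, d_val = 256, 64, 16
addresses = rng.integers(0, 2, size=(n, d_addr))
values = rng.normal(size=(n, d_val))
query = rng.integers(0, 2, size=d_addr)

# For bipolar (+/-1) versions of binary vectors, the dot product is an affine
# function of Hamming distance, so softmax over dot products weights nearby
# addresses exponentially more, mimicking SDM's near-or-nothing read.
bipolar = lambda x: 2 * x - 1
att_out = softmax_attention(bipolar(query), bipolar(addresses), values, beta=0.5)
sdm_out = sdm_read(query, addresses, values, hamming_radius=28)
cos = att_out @ sdm_out / (np.linalg.norm(att_out) * np.linalg.norm(sdm_out))
print("cosine similarity between attention and SDM reads:", cos)
```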

Related research

12/10/2021
Couplformer: Rethinking Vision Transformer with Coupling Attention Map
With the development of the self-attention mechanism, the Transformer mo...

09/29/2022
Spikformer: When Spiking Neural Network Meets Transformer
We consider two biologically plausible structures, the Spiking Neural Ne...

01/26/2022
Neural Grapheme-to-Phoneme Conversion with Pre-trained Grapheme Models
Neural network models have achieved state-of-the-art performance on grap...

05/27/2022
Understanding Long Programming Languages with Structure-Aware Sparse Attention
Programming-based Pre-trained Language Models (PPLMs) such as CodeBERT h...

02/22/2022
Socialformer: Social Network Inspired Long Document Modeling for Document Ranking
Utilizing pre-trained language models has achieved great success for neu...

09/02/2019
Logic and the 2-Simplicial Transformer
We introduce the 2-simplicial Transformer, an extension of the Transform...

05/31/2021
A remark on a paper of Krotov and Hopfield [arXiv:2008.06996]
In their recent paper titled "Large Associative Memory Problem in Neurob...
