Is Attention Better Than Matrix Decomposition?

09/09/2021
by Zhengyang Geng, et al.

As an essential ingredient of modern deep learning, the attention mechanism, and self-attention in particular, plays a vital role in discovering global correlations. However, is hand-crafted attention irreplaceable for modeling the global context? Our intriguing finding is that self-attention is no better than matrix decomposition (MD) models developed 20 years ago in terms of performance and computational cost for encoding long-distance dependencies. We model the global context issue as a low-rank recovery problem and show that its optimization algorithms can help design global information blocks. This paper then proposes a series of Hamburgers, in which we employ the optimization algorithms for solving MDs to factorize the input representations into sub-matrices and reconstruct a low-rank embedding. Hamburgers with different MDs perform favorably against the popular global context module, self-attention, when the gradients back-propagated through the MDs are handled carefully. Comprehensive experiments on vision tasks where learning the global context is crucial, including semantic segmentation and image generation, demonstrate significant improvements over self-attention and its variants.
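Below is a minimal sketch, assuming PyTorch, of what such a Hamburger-style block could look like: two 1x1-convolution "bread" layers sandwich a "ham" that recovers a low-rank approximation of the flattened feature map via non-negative matrix factorization (NMF) solved by a few multiplicative-update iterations, with only the last iteration tracked by autograd as a stand-in for the careful handling of gradients through the MD. The class name NMFHam and the rank/step/ReLU/residual choices are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch of a Hamburger-style global context block (assumed details).
import torch
import torch.nn as nn
import torch.nn.functional as F


class NMFHam(nn.Module):
    def __init__(self, channels: int, rank: int = 64, steps: int = 6):
        super().__init__()
        self.rank, self.steps = rank, steps
        self.lower_bread = nn.Conv2d(channels, channels, kernel_size=1)  # "lower bread"
        self.upper_bread = nn.Conv2d(channels, channels, kernel_size=1)  # "upper bread"

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        z = F.relu(self.lower_bread(x))      # non-negative features for NMF
        X = z.view(b, c, h * w)              # flatten to one (C x N) matrix per image

        # Random non-negative factors: dictionary D (C x r) and codes C_ (r x N).
        D = torch.rand(b, c, self.rank, device=x.device)
        C_ = torch.rand(b, self.rank, h * w, device=x.device)

        # Multiplicative-update iterations for NMF, run without gradient tracking ...
        with torch.no_grad():
            for _ in range(self.steps - 1):
                C_ = C_ * (D.transpose(1, 2) @ X) / (D.transpose(1, 2) @ D @ C_ + 1e-6)
                D = D * (X @ C_.transpose(1, 2)) / (D @ C_ @ C_.transpose(1, 2) + 1e-6)

        # ... and one final update with gradients, a stand-in for the careful
        # treatment of gradients back-propagated through the MD.
        C_ = C_ * (D.transpose(1, 2) @ X) / (D.transpose(1, 2) @ D @ C_ + 1e-6)
        D = D * (X @ C_.transpose(1, 2)) / (D @ C_ @ C_.transpose(1, 2) + 1e-6)

        low_rank = (D @ C_).view(b, c, h, w)   # rank-r reconstruction of z
        return x + self.upper_bread(low_rank)  # residual connection around the block


# Usage: drop in as a global context module, e.g. in a segmentation head.
feat = torch.rand(2, 512, 32, 32)
out = NMFHam(512)(feat)
print(out.shape)  # torch.Size([2, 512, 32, 32])
```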
