Multiscale Self Attentive Convolutions for Vision and Language Modeling

12/03/2019
by Oren Barkan et al.

Self attention mechanisms have become a key building block in many state-of-the-art language understanding models. In this paper, we show that the self attention operator can be formulated in terms of 1x1 convolution operations. Following this observation, we propose several novel operators: First, we introduce a 2D version of self attention that is applicable to 2D signals such as images. Second, we present the 1D and 2D Self Attentive Convolutions (SAC) operator, which generalizes self attention beyond 1x1 convolutions to 1xm and nxm convolutions, respectively. While 1D and 2D self attention operate on individual words and pixels, SAC operates on m-grams and image patches, respectively. Third, we present a multiscale version of SAC (MSAC), which analyzes the input by employing, in parallel, multiple SAC operators that vary in filter size. Finally, we explain how MSAC can be utilized for vision and language modeling, and further harness MSAC to form a cross attentive image similarity machinery.
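
As a rough illustration of the two core ideas in the abstract, namely writing the attention projections as kernel-size-1 convolutions and running several branches with larger kernel sizes in parallel, the following is a minimal PyTorch sketch. The module names, the choice to enlarge only the key/value projections, the averaging of branch outputs, and all other design details are illustrative assumptions, not the authors' implementation.

```python
# Sketch only: self attention over a 1D sequence with Q/K/V written as
# kernel_size=1 convolutions, plus a toy multiscale variant with parallel
# branches of growing kernel size. Not the paper's SAC/MSAC implementation.
import torch
import torch.nn as nn


class Conv1dSelfAttention(nn.Module):
    """Self attention whose Q/K/V projections are kernel_size=1 convolutions."""

    def __init__(self, channels: int):
        super().__init__()
        # A kernel_size=1 Conv1d is a per-position linear map, i.e. the usual
        # attention projection expressed as a convolution.
        self.q = nn.Conv1d(channels, channels, kernel_size=1)
        self.k = nn.Conv1d(channels, channels, kernel_size=1)
        self.v = nn.Conv1d(channels, channels, kernel_size=1)
        self.scale = channels ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, length)
        q, k, v = self.q(x), self.k(x), self.v(x)
        attn = torch.softmax(torch.einsum("bci,bcj->bij", q, k) * self.scale, dim=-1)
        return torch.einsum("bij,bcj->bci", attn, v)


class MultiscaleConvAttention(nn.Module):
    """Parallel attention branches whose key/value convolutions use kernel
    sizes 1, 3, 5, ... so each branch attends over a different receptive field
    (single token vs. m-gram). Branch outputs are averaged (an assumption)."""

    def __init__(self, channels: int, kernel_sizes=(1, 3, 5)):
        super().__init__()
        self.q = nn.Conv1d(channels, channels, kernel_size=1)
        self.keys = nn.ModuleList(
            [nn.Conv1d(channels, channels, kernel_size=m, padding=m // 2) for m in kernel_sizes]
        )
        self.values = nn.ModuleList(
            [nn.Conv1d(channels, channels, kernel_size=m, padding=m // 2) for m in kernel_sizes]
        )
        self.scale = channels ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q = self.q(x)
        outs = []
        for k_proj, v_proj in zip(self.keys, self.values):
            k, v = k_proj(x), v_proj(x)
            attn = torch.softmax(torch.einsum("bci,bcj->bij", q, k) * self.scale, dim=-1)
            outs.append(torch.einsum("bij,bcj->bci", attn, v))
        return torch.stack(outs, dim=0).mean(dim=0)


if __name__ == "__main__":
    x = torch.randn(2, 64, 16)                    # (batch, channels, length)
    print(Conv1dSelfAttention(64)(x).shape)       # torch.Size([2, 64, 16])
    print(MultiscaleConvAttention(64)(x).shape)   # torch.Size([2, 64, 16])
```

The point of the sketch is the equivalence the abstract relies on: a kernel_size=1 convolution over a (batch, channels, length) tensor is exactly a per-position linear projection, so standard self attention can be phrased convolutionally; widening the key/value kernels to m then makes each attended feature summarize an m-gram (or an n x m patch in 2D) rather than a single token or pixel, and running several kernel sizes in parallel gives the multiscale analysis the abstract describes.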

Related research

01/29/2019 - Pay Less Attention with Lightweight and Dynamic Convolutions
Self-attention is a useful mechanism to build generative models for lang...

04/18/2020 - Adaptive Attention Span in Computer Vision
Recent developments in Transformers for language modeling have opened ne...

06/13/2019 - Stand-Alone Self-Attention in Vision Models
Convolutions are a fundamental building block of modern computer vision ...

11/15/2021 - Searching for TrioNet: Combining Convolution with Local and Global Self-Attention
Recently, self-attention operators have shown superior performance as a ...

09/24/2021 - Attentive Contractive Flow: Improved Contractive Flows with Lipschitz-constrained Self-Attention
Normalizing flows provide an elegant method for obtaining tractable dens...

02/08/2020 - Time-aware Large Kernel Convolutions
To date, most state-of-the-art sequence modelling architectures use atte...

07/12/2021 - Locally Enhanced Self-Attention: Rethinking Self-Attention as Local and Context Terms
Self-Attention has become prevalent in computer vision models. Inspired ...
