Kronecker Attention Networks

07/16/2020
by Hongyang Gao, et al.

Attention operators have been applied to both 1-D data such as text and higher-order data such as images and videos. Using attention operators on high-order data requires flattening the spatial or spatial-temporal dimensions into a vector, which is implicitly assumed to follow a multivariate normal distribution. This not only incurs excessive computational cost, but also fails to preserve structure in the data. In this work, we propose to avoid flattening by assuming the data follow matrix-variate normal distributions. Based on this new view, we develop Kronecker attention operators (KAOs) that operate on high-order tensor data directly. More importantly, the proposed KAOs lead to dramatic reductions in computational resources. Experimental results show that our methods reduce the amount of required computation by a factor of hundreds, with larger factors for higher-dimensional and higher-order data. Results also show that networks with KAOs outperform models without attention, while achieving performance competitive with that of models using the original attention operators.
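The computational argument can be illustrated with a small sketch. This is not the paper's exact KAO (which is derived from matrix-variate normal assumptions); it is a generic axis-wise factorization that shows why avoiding flattening helps: standard attention on a flattened H×W grid forms an (HW)×(HW) attention map, whereas attending along the height and width axes separately forms only H×H and W×W maps.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def flat_attention(X):
    # Standard self-attention after flattening: X (H, W, C) -> (H*W, C).
    # The attention map is (H*W) x (H*W), i.e. quadratic in H*W.
    H, W, C = X.shape
    T = X.reshape(H * W, C)
    A = softmax(T @ T.T / np.sqrt(C))                      # (H*W, H*W)
    return (A @ T).reshape(H, W, C)

def axiswise_attention(X):
    # Illustrative factorized alternative: attend along the height axis,
    # then the width axis. Attention maps are H x H and W x W, avoiding
    # the (H*W)^2 cost of the flattened version.
    H, W, C = X.shape
    Xh = X.transpose(1, 0, 2)                              # (W, H, C)
    Ah = softmax(Xh @ Xh.transpose(0, 2, 1) / np.sqrt(C))  # (W, H, H)
    Xh = (Ah @ Xh).transpose(1, 0, 2)                      # (H, W, C)
    Aw = softmax(Xh @ Xh.transpose(0, 2, 1) / np.sqrt(C))  # (H, W, W)
    return Aw @ Xh                                         # (H, W, C)
```

For a 64×64 feature map, the flattened attention map has 4096² ≈ 16.8M entries, while the axis-wise maps together have on the order of 64² per slice, which is the kind of reduction by factors of hundreds the abstract refers to.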

