On the Regularity of Attention

02/10/2021
by James Vuckovic et al.

Attention is a powerful component of modern neural networks across a wide variety of domains. In this paper, we seek to quantify the regularity (i.e. the amount of smoothness) of the attention operation. To accomplish this goal, we propose a new mathematical framework that uses measure theory and integral operators to model attention. We show that this framework is consistent with the usual definition, and that it captures the essential properties of attention. Then we use this framework to prove that, on compact domains, the attention operation is Lipschitz continuous and provide an estimate of its Lipschitz constant. Additionally, by focusing on a specific type of attention, we extend these Lipschitz continuity results to non-compact domains. We also discuss the effects regularity can have on NLP models, and applications to invertible and infinitely-deep networks.
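For readers who want a concrete reference point, below is a minimal sketch (in NumPy) of the standard scaled dot-product attention that the paper's measure-theoretic framework is shown to be consistent with, together with a crude finite-difference probe of smoothness on a compact domain. The function name, the random data, and the probe are illustrative assumptions, not code from the paper; the probe only checks bounded sensitivity and is not a proof of Lipschitz continuity.

# Minimal illustrative sketch, not from the paper: standard scaled
# dot-product attention, softmax(Q K^T / sqrt(d)) V, which the paper's
# integral-operator framework reduces to in the usual finite setting.
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention over row-stacked queries/keys/values."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    # numerically stable row-wise softmax
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

# Crude empirical probe (an illustrative assumption, not the paper's
# estimate): for inputs restricted to a compact set, the finite-difference
# ratio ||f(Q + eps) - f(Q)|| / ||eps|| stays bounded, which is what
# Lipschitz continuity of the attention map predicts.
rng = np.random.default_rng(0)
Q = rng.uniform(-1.0, 1.0, size=(4, 8))   # queries in [-1, 1]^8
K = rng.uniform(-1.0, 1.0, size=(6, 8))   # keys
V = rng.uniform(-1.0, 1.0, size=(6, 8))   # values
eps = 1e-3 * rng.standard_normal(Q.shape)
ratio = (np.linalg.norm(attention(Q + eps, K, V) - attention(Q, K, V))
         / np.linalg.norm(eps))
print(f"finite-difference sensitivity ratio: {ratio:.3f}")

Repeating the perturbation for several random directions and taking the largest ratio gives a rough empirical lower bound on the Lipschitz constant that the paper estimates analytically.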

Related research

10/06/2021 · Besov regularity for the Dirichlet integral fractional Laplacian in Lipschitz domains
We prove Besov regularity estimates for the solution of the Dirichlet pr...

07/06/2020 · A Mathematical Theory of Attention
Attention is a powerful component of modern neural networks across a wid...

10/23/2021 · Coarse-Grained Smoothness for RL in Metric Spaces
Principled decision-making in continuous state–action spaces is impossib...

03/25/2021 · About the regularity of the discriminator in conditional WGANs
Training of conditional WGANs is usually done by averaging the underlyin...

05/24/2021 · Coercivity, essential norms, and the Galerkin method for second-kind integral equations on polyhedral and Lipschitz domains
It is well known that, with a particular choice of norm, the classical d...

05/28/2018 · Lipschitz regularity of deep neural networks: analysis and efficient estimation
Deep neural networks are notorious for being sensitive to small well-cho...

06/16/2021 · Invertible Attention
Attention has been proved to be an efficient mechanism to capture long-r...
