The Lipschitz Constant of Self-Attention

06/08/2020
by Hyunjik Kim, et al.

Lipschitz constants of neural networks have been explored in various contexts in deep learning, such as provable adversarial robustness, estimating Wasserstein distance, stabilising training of GANs, and formulating invertible neural networks. Such works have focused on bounding the Lipschitz constant of fully connected or convolutional networks, composed of linear maps and pointwise non-linearities. In this paper, we investigate the Lipschitz constant of self-attention, a non-linear neural network module widely used in sequence modelling. We prove that the standard dot-product self-attention is not Lipschitz, and propose an alternative L2 self-attention that is Lipschitz. We derive an upper bound on the Lipschitz constant of L2 self-attention and provide empirical evidence for its asymptotic tightness. To demonstrate the practical relevance of the theory, we formulate invertible self-attention and use it in a Transformer-based architecture for a character-level language modelling task.
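To make the contrast concrete, below is a minimal single-head NumPy sketch of both attention variants. The dot-product version is the standard one the paper proves is not Lipschitz: roughly, its logits are quadratic in the input, so the Jacobian norm is unbounded over R^{N x D}. The L2 version scores pairs by the negative squared Euclidean distance between projected rows, with the query and key projections tied, which is the condition under which the paper derives its Lipschitz bound. The plain value projection and the 1/sqrt(d) scaling used here are simplifying assumptions for illustration, not the paper's exact multi-head formulation.

    import numpy as np

    def softmax(z, axis=-1):
        # Numerically stable softmax along the given axis.
        z = z - z.max(axis=axis, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=axis, keepdims=True)

    def dot_product_attention(X, Wq, Wk, Wv):
        # Standard scaled dot-product self-attention (single head).
        # Not Lipschitz on R^{N x D}: the logits (X Wq)(X Wk)^T are
        # quadratic in X, so sensitivity grows without bound.
        d = Wq.shape[1]
        logits = (X @ Wq) @ (X @ Wk).T / np.sqrt(d)
        return softmax(logits) @ (X @ Wv)

    def l2_attention(X, Wq, Wv):
        # Sketch of L2 self-attention (single head, tied Wq = Wk).
        # Similarity is the negative squared Euclidean distance between
        # projected rows; the value path is a plain projection here,
        # a simplification of the paper's formulation.
        d = Wq.shape[1]
        Q = X @ Wq                                        # shared query/key projection
        sq = (Q ** 2).sum(axis=-1)
        dist2 = sq[:, None] - 2.0 * Q @ Q.T + sq[None, :] # ||q_i - q_j||^2
        return softmax(-dist2 / np.sqrt(d)) @ (X @ Wv)

    # Tiny usage example with random weights.
    rng = np.random.default_rng(0)
    N, D = 6, 8
    X = rng.normal(size=(N, D))
    Wq, Wk, Wv = (rng.normal(size=(D, D)) / np.sqrt(D) for _ in range(3))
    print(dot_product_attention(X, Wq, Wk, Wv).shape)  # (6, 8)
    print(l2_attention(X, Wq, Wv).shape)               # (6, 8)

Given a Lipschitz bound L for a map f, invertible self-attention then follows the residual recipe used by invertible residual networks: g(x) = x + c * f(x) is invertible whenever c * L < 1, and its inverse can be computed by fixed-point iteration.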

