FAVOR#: Sharp Attention Kernel Approximations via New Classes of Positive Random Features

02/01/2023
by Valerii Likhosherstov, et al.

The problem of efficiently approximating a linear operator induced by the Gaussian or softmax kernel is often addressed using random features (RFs), which yield an unbiased approximation of the operator's result. Such operators emerge in important applications ranging from kernel methods to efficient Transformers. We propose parameterized, positive, non-trigonometric RFs which approximate the Gaussian and softmax kernels. In contrast to traditional RF approximations, the parameters of these new methods can be optimized to reduce the variance of the approximation, and the optimum can be expressed in closed form. We show that our methods lead to variance reduction in practice (e^10-times smaller variance and beyond) and outperform previous methods in a kernel regression task. Using our proposed mechanism, we also present FAVOR#, a method for self-attention approximation in Transformers. We show that FAVOR# outperforms other random feature methods in speech modelling and natural language processing.
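The positive, non-trigonometric RFs described above build on earlier positive-feature estimators of the softmax kernel K(x, y) = exp(x·y). As a minimal sketch (of a baseline FAVOR+-style estimator, not the paper's optimized FAVOR# parameterization; the function name and constants below are illustrative), one can map each input through phi(x) = exp(Wx − ||x||²/2)/√m with Gaussian projections W, so that phi(x)·phi(y) is an unbiased, strictly positive estimate of exp(x·y):

```python
import numpy as np

def positive_rfs(x, W):
    """Positive random features phi(x) = exp(W x - ||x||^2 / 2) / sqrt(m).

    Dot products of these features give an unbiased, always-positive
    estimate of the softmax kernel K(x, y) = exp(x . y).
    """
    m = W.shape[0]
    return np.exp(x @ W.T - 0.5 * np.sum(x**2, axis=-1, keepdims=True)) / np.sqrt(m)

rng = np.random.default_rng(0)
d, m = 8, 8192
W = rng.standard_normal((m, d))        # i.i.d. Gaussian projection directions
x = 0.3 * rng.standard_normal((3, d))  # small-norm queries/keys keep variance low
y = 0.3 * rng.standard_normal((3, d))

exact = np.exp(x @ y.T)                            # exact softmax-kernel values
approx = positive_rfs(x, W) @ positive_rfs(y, W).T # unbiased RF estimate
print(np.max(np.abs(approx - exact) / exact))      # small relative error
```

Because every feature is positive, the estimated attention matrix is positive as well, which is what makes such estimators stable inside Transformer attention; FAVOR# additionally tunes the feature parameters in closed form to minimize this estimator's variance.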

Related research

05/30/2022 · Chefs' Random Tables: Non-Trigonometric Random Features
We introduce chefs' random tables (CRTs), a new class of non-trigonometr...

10/08/2021 · Hybrid Random Features
We propose a new class of random feature methods for linearizing softmax...

01/21/2022 · Improved Random Features for Dot Product Kernels
Dot product kernels, such as polynomial and exponential (softmax) kernel...

01/31/2023 · Simplex Random Features
We present Simplex Random Features (SimRFs), a new random feature (RF) m...

04/13/2021 · Towards Unbiased Random Features with Lower Variance For Stationary Indefinite Kernels
Random Fourier Features (RFF) demonstrate well-appreciated performance in...

03/17/2021 · Value-aware Approximate Attention
Following the success of dot-product attention in Transformers, numerous...

06/02/2020 · Kernel-independent adaptive construction of ℋ^2-matrix approximations
A method for the kernel-independent construction of ℋ^2-matrix approxima...
