Chefs' Random Tables: Non-Trigonometric Random Features

05/30/2022
by Valerii Likhosherstov, et al.

We introduce chefs' random tables (CRTs), a new class of non-trigonometric random features (RFs) for approximating Gaussian and softmax kernels. CRTs are an alternative to standard random kitchen sink (RKS) methods, which inherently rely on trigonometric maps. We present variants of CRTs in which the RFs are positive, a key requirement for applications in recent low-rank Transformers. Further variance reduction is possible by leveraging statistics that are simple to compute. One instantiation of CRTs, the optimal positive random features (OPRFs), is to our knowledge the first RF method for unbiased softmax kernel estimation with positive and bounded RFs, resulting in exponentially small tails and much lower variance than its counterparts. As we show, orthogonal random features applied in OPRFs provide additional variance reduction for any dimensionality d (not only asymptotically for sufficiently large d, as for RKS). We test CRTs on many tasks ranging from non-parametric classification to training Transformers for text, speech and image data, obtaining new state-of-the-art results for low-rank text Transformers, while providing linear space and time complexity.
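To make the contrast between trigonometric and positive random features concrete, here is a minimal sketch in plain Python. It does not implement CRTs or OPRFs, whose construction is not given in this abstract. It only shows two standard unbiased estimators of the softmax kernel exp(x·y): an RKS-style trigonometric map, whose entries can be negative, and a simple positive exponential map. All function names, the test vectors, and the feature count `m` are assumptions chosen for illustration.

```python
import math
import random


def dot(u, v):
    """Plain dot product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def gaussian_matrix(m, d, rng):
    """Draw m projection vectors w_i ~ N(0, I_d)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]


def trig_features(x, W):
    """Trigonometric (RKS-style) features for exp(x.y).

    Uses E[cos(w.(x - y))] = exp(-|x - y|^2 / 2) for w ~ N(0, I),
    rescaled by exp(|x|^2 / 2). Entries may be negative.
    """
    scale = math.exp(0.5 * dot(x, x)) / math.sqrt(len(W))
    feats = []
    for w in W:
        p = dot(w, x)
        feats.append(scale * math.cos(p))
        feats.append(scale * math.sin(p))
    return feats


def positive_features(x, W):
    """Positive exponential features for exp(x.y).

    Uses E[exp(w.(x + y))] = exp(|x + y|^2 / 2) for w ~ N(0, I).
    Every entry is strictly positive (but not bounded).
    """
    shift = 0.5 * dot(x, x)
    return [math.exp(dot(w, x) - shift) / math.sqrt(len(W)) for w in W]


# Illustrative inputs (hypothetical values, small norms to keep variance low).
rng = random.Random(0)
x = [0.3, -0.2, 0.1, 0.4]
y = [0.1, 0.5, -0.3, 0.2]
W = gaussian_matrix(4000, len(x), rng)

exact = math.exp(dot(x, y))
est_trig = dot(trig_features(x, W), trig_features(y, W))
est_pos = dot(positive_features(x, W), positive_features(y, W))

print("exact softmax kernel:", exact)
print("trigonometric estimate:", est_trig)
print("positive-feature estimate:", est_pos)
```

Both estimates are inner products of feature vectors, so a kernel matrix over n points can be approximated through n-by-m feature matrices rather than formed explicitly. That factorization is what the linear space and time complexity mentioned above refers to.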


