Improved Random Features for Dot Product Kernels

01/21/2022
by Jonas Wacker, et al.

Dot product kernels, such as polynomial and exponential (softmax) kernels, are among the most widely used kernels in machine learning, as they enable modeling the interactions between input features, which is crucial in applications like computer vision, natural language processing, and recommender systems. We make several novel contributions for improving the efficiency of random feature approximations for dot product kernels, to make these kernels more useful in large-scale learning. First, we present a generalization of existing random feature approximations for polynomial kernels, such as Rademacher and Gaussian sketches and TensorSRHT, using complex-valued random features. We show empirically that the use of complex features can significantly reduce the variances of these approximations. Second, we provide a theoretical analysis for understanding the factors affecting the efficiency of various random feature approximations, by deriving closed-form expressions for their variances. These variance formulas elucidate conditions under which certain approximations (e.g., TensorSRHT) achieve lower variances than others (e.g., Rademacher sketches), and conditions under which the use of complex features leads to lower variances than real features. Third, by using these variance formulas, which can be evaluated in practice, we develop a data-driven optimization approach to improve random feature approximations for general dot product kernels, which is also applicable to the Gaussian kernel. We demonstrate the improvements brought by these contributions through extensive experiments on a variety of tasks and datasets.
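To make the first contribution concrete, below is a minimal sketch, assuming the complex analogue of the Rademacher distribution is uniform over {+1, -1, +i, -i}. It builds complex random features for the degree-p polynomial kernel (x · y)^p and compares the real part of their conjugate inner products against the exact kernel. The function names and toy dimensions are our own illustration, not the paper's reference code.

```python
# A minimal sketch (not the authors' reference implementation) of a
# complex Rademacher sketch for the polynomial kernel k(x, y) = (x . y)^p.
import numpy as np

def draw_complex_rademacher(d, D, degree, rng):
    """One (d, D) projection per factor of the degree-p tensor product.
    Entries are uniform over {+1, -1, +i, -i}: zero mean, unit modulus."""
    return [rng.choice(np.array([1, -1, 1j, -1j]), size=(d, D))
            for _ in range(degree)]

def polynomial_sketch(X, projections, D):
    """Map rows of X to D complex features whose conjugate inner
    products give unbiased estimates of (x . y)^degree."""
    Z = np.ones((X.shape[0], D), dtype=np.complex128)
    for W in projections:
        Z *= X @ W          # elementwise product over the p factors
    return Z / np.sqrt(D)   # so the D independent estimates are averaged

rng = np.random.default_rng(0)
d, degree, D = 16, 3, 4096
X = rng.standard_normal((5, d))
Y = rng.standard_normal((4, d))

Ws = draw_complex_rademacher(d, D, degree, rng)  # must be shared across inputs
K_exact = (X @ Y.T) ** degree
K_approx = np.real(polynomial_sketch(X, Ws, D)
                   @ polynomial_sketch(Y, Ws, D).conj().T)
print(np.max(np.abs(K_exact - K_approx)))  # approximation error shrinks as D grows
```

Since E[w conj(w')] vanishes for independent unit-modulus entries and |w|^2 = 1, each feature product is an unbiased estimate of (x · y)^p, and taking the real part preserves unbiasedness for the real-valued kernel while discarding imaginary noise. The paper's closed-form variance formulas quantify when this complex construction beats its real-valued counterpart, and when structured sketches such as TensorSRHT beat unstructured ones.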


