Tanimoto Random Features for Scalable Molecular Machine Learning

06/26/2023
by Austin Tripp, et al.

The Tanimoto coefficient is commonly used to measure the similarity between molecules represented as discrete fingerprints, either as a distance metric or a positive definite kernel. While many kernel methods can be accelerated using random feature approximations, such approximations are currently lacking for the Tanimoto kernel. In this paper, we propose two novel kinds of random features that allow this kernel to scale to large datasets and, in the process, discover a novel extension of the kernel to real-valued vectors. We theoretically characterize these random features and provide error bounds on the spectral norm of the Gram matrix. Experimentally, we show that the proposed random features are effective at approximating the Tanimoto coefficient on real-world datasets and that the kernels explored in this work are useful for molecular property prediction and optimization tasks.
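
For readers unfamiliar with the quantity being approximated, the following is a minimal sketch of the exact Tanimoto coefficient on binary fingerprints, computed as the number of bits set in both fingerprints divided by the number set in either. It is not the random feature construction proposed in the paper. The function names `tanimoto` and `tanimoto_gram` are hypothetical, and returning 1.0 for two all-zero fingerprints is an assumed convention.

```python
def tanimoto(x, y):
    """Exact Tanimoto coefficient of two equal-length binary fingerprints."""
    if len(x) != len(y):
        raise ValueError("fingerprints must have the same length")
    # Bits set in both fingerprints (intersection) and in either (union).
    both = sum(1 for a, b in zip(x, y) if a and b)
    either = sum(1 for a, b in zip(x, y) if a or b)
    # Assumed convention: two all-zero fingerprints count as identical.
    return 1.0 if either == 0 else both / either


def tanimoto_gram(fingerprints):
    """Exact Gram matrix; it needs one kernel evaluation per pair of inputs."""
    return [[tanimoto(p, q) for q in fingerprints] for p in fingerprints]


if __name__ == "__main__":
    fps = [
        [1, 1, 0, 1],
        [1, 0, 0, 1],
        [0, 0, 1, 0],
    ]
    for row in tanimoto_gram(fps):
        print(row)
```

The exact Gram matrix grows quadratically with the number of molecules, which is the cost that random feature approximations are generally meant to avoid.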

research
06/29/2015

Bayesian Nonparametric Kernel-Learning

Kernel methods are ubiquitous tools in machine learning. They have prove...
research
09/24/2015

Linear-time Learning on Distributions with Approximate Kernel Embeddings

Many interesting machine learning problems are best posed by considering...
research
11/11/2022

RFFNet: Scalable and interpretable kernel methods via Random Fourier Features

Kernel methods provide a flexible and theoretically grounded approach to...
research
01/21/2022

Improved Random Features for Dot Product Kernels

Dot product kernels, such as polynomial and exponential (softmax) kernel...
research
04/12/2022

Local Random Feature Approximations of the Gaussian Kernel

A fundamental drawback of kernel-based statistical models is their limit...
research
02/14/2018

D2KE: From Distance to Kernel and Embedding

For many machine learning problem settings, particularly with structured...
research
12/07/2017

Learning Random Fourier Features by Hybrid Constrained Optimization

The kernel embedding algorithm is an important component for adapting ke...
