Sub-linear RACE Sketches for Approximate Kernel Density Estimation on Streaming Data

12/04/2019
by   Benjamin Coleman, et al.
0

Kernel density estimation is a simple and effective method that lies at the heart of many important machine learning applications. Unfortunately, kernel methods scale poorly for large, high dimensional datasets. Approximate kernel density estimation has a prohibitively high memory and computation cost, especially in the streaming setting. Recent sampling algorithms for high dimensional densities can reduce the computation cost but cannot operate online, while streaming algorithms cannot handle high dimensional datasets due to the curse of dimensionality. We propose RACE, an efficient sketching algorithm for kernel density estimation on high-dimensional streaming data. RACE compresses a set of N high dimensional vectors into a small array of integer counters. This array is sufficient to estimate the kernel density for a large class of kernels. Our sketch is practical to implement and comes with strong theoretical guarantees. We evaluate our method on real-world high-dimensional datasets and show that our sketch achieves 10x better compression compared to competing methods.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
06/16/2020

A One-Pass Private Sketch for Most Machine Learning Tasks

Differential privacy (DP) is a compelling privacy definition that explai...
research
02/18/2019

RACE: Sub-Linear Memory Sketches for Approximate Near-Neighbor Search on Streaming Data

We demonstrate the first possibility of a sub-linear memory sketch for s...
research
09/20/2022

Streaming Encoding Algorithms for Scalable Hyperdimensional Computing

Hyperdimensional computing (HDC) is a paradigm for data representation a...
research
02/08/2022

Class Density and Dataset Quality in High-Dimensional, Unstructured Data

We provide a definition for class density that can be used to measure th...
research
08/23/2021

StreaMRAK a Streaming Multi-Resolution Adaptive Kernel Algorithm

Kernel ridge regression (KRR) is a popular scheme for non-linear non-par...
research
02/08/2023

Disentangling Learning Representations with Density Estimation

Disentangled learning representations have promising utility in many app...
research
12/01/2022

Sub-quadratic Algorithms for Kernel Matrices via Kernel Density Estimation

Kernel matrices, as well as weighted graphs represented by them, are ubi...

Please sign up or login with your details

Forgot password? Click here to reset