StreaMRAK a Streaming Multi-Resolution Adaptive Kernel Algorithm

08/23/2021
by Andreas Oslandsbotn, et al.

Kernel ridge regression (KRR) is a popular scheme for non-linear non-parametric learning. However, existing implementations of KRR require that all the data be stored in main memory, which severely limits the use of KRR in contexts where the data size far exceeds the memory size. Such applications are increasingly common in data mining, bioinformatics, and control. A powerful paradigm for computing on data sets that are too large for memory is the streaming model of computation, where we process one data sample at a time, discarding each sample before moving on to the next one. In this paper, we propose StreaMRAK, a streaming version of KRR. StreaMRAK improves on existing KRR schemes by dividing the problem into several levels of resolution, which allows continual refinement of the predictions. The algorithm reduces the memory requirement by continuously and efficiently integrating new samples into the training model. With a novel sub-sampling scheme, StreaMRAK reduces memory and computational complexities by creating a sketch of the original data, where the sub-sampling density is adapted to the bandwidth of the kernel and the local dimensionality of the data. We present a showcase study on two synthetic problems and the prediction of the trajectory of a double pendulum. The results show that the proposed algorithm is fast and accurate.
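To make the idea of several levels of resolution concrete, the following is a minimal batch sketch, not the StreaMRAK algorithm itself. It assumes a Gaussian kernel whose bandwidth is halved at each level, with every level fitting a plain KRR model to the residual left by the coarser levels. The halving rule, the function names `gaussian_kernel`, `fit_multires_krr`, and `predict_multires_krr`, and all parameter values are illustrative assumptions that do not come from the abstract. The sketch keeps all training points in memory at every level, so it omits both the streaming updates and the adaptive sub-sampling that the abstract describes.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))


def fit_multires_krr(x, y, n_levels=4, base_bandwidth=1.0, ridge=1e-3):
    """Fit one KRR model per level, each on the residual of the coarser levels.

    Assumption: the bandwidth is halved at every level. This is an
    illustrative choice, not a detail taken from the abstract.
    """
    levels = []
    residual = np.asarray(y, dtype=float).copy()
    for level in range(n_levels):
        bandwidth = base_bandwidth / (2 ** level)
        k = gaussian_kernel(x, x, bandwidth)
        # Standard KRR solve: (K + ridge * I) coef = residual.
        coef = np.linalg.solve(k + ridge * np.eye(len(x)), residual)
        levels.append((bandwidth, coef))
        # Finer levels only need to explain what is still unexplained.
        residual = residual - k @ coef
    return levels


def predict_multires_krr(levels, x_train, x_new, upto=None):
    """Sum the contributions of the first `upto` levels (all levels if None)."""
    pred = np.zeros(len(x_new))
    for bandwidth, coef in levels[:upto]:
        pred += gaussian_kernel(x_new, x_train, bandwidth) @ coef
    return pred
```

Calling `predict_multires_krr` with `upto=1` returns the coarsest prediction only, and increasing `upto` adds finer corrections, which is one simple way to picture refining a prediction level by level.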


research
03/02/2023

Semiparametric Language Models Are Scalable Continual Learners

Semiparametric language models (LMs) have shown promise in continuously ...
research
08/16/2017

Adaptive Threshold Sampling and Estimation

Sampling is a fundamental problem in both computer science and statistic...
research
12/04/2019

Sub-linear RACE Sketches for Approximate Kernel Density Estimation on Streaming Data

Kernel density estimation is a simple and effective method that lies at ...
research
09/20/2021

Learning to Forecast Dynamical Systems from Streaming Data

Kernel analog forecasting (KAF) is a powerful methodology for data-drive...
research
10/29/2016

Diversity Promoting Online Sampling for Streaming Video Summarization

Many applications benefit from sampling algorithms where a small number ...
research
01/26/2012

Dynamic trees for streaming and massive data contexts

Data collection at a massive scale is becoming ubiquitous in a wide vari...
research
01/31/2023

On Memory Codelets: Prefetching, Recoding, Moving and Streaming Data

For decades, memory capabilities have scaled up much slower than compute...
