Distribution-Free One-Pass Learning

06/08/2017
by Peng Zhao, et al.

In many large-scale machine learning applications, data accumulate over time, so an appropriate model should be able to update online. Moreover, because the total data volume is unknown when the model is constructed, it is desirable to scan each data item only once, using storage that is independent of the data volume. It is also noteworthy that the underlying distribution may change during data accumulation. To handle such tasks, in this paper we propose DFOP, a distribution-free one-pass learning approach. This approach works well when distribution change occurs during data accumulation, without requiring prior knowledge about the change. Every data item can be discarded once it has been scanned. In addition, a theoretical guarantee shows that, under a mild assumption, the estimation error decreases until convergence with high probability. The performance of DFOP on both regression and classification is validated in experiments.


research
06/10/2020

An Asymptotically Optimal Algorithm for Online Stacking

Consider a storage area where arriving items are stored temporarily in b...
research
11/27/2018

Beta Distribution Drift Detection for Adaptive Classifiers

With today's abundant streams of data, the only constant we can rely on ...
research
05/03/2023

Stackelberg Attacks on Auctions and Blockchain Transaction Fee Mechanisms

We study an auction with m identical items in a context where n agents c...
research
12/05/2018

Calibrate: Frequency Estimation and Heavy Hitter Identification with Local Differential Privacy via Incorporating Prior Knowledge

Estimating frequencies of certain items among a population is a basic st...
research
01/31/2022

L-SVRG and L-Katyusha with Adaptive Sampling

Stochastic gradient-based optimization methods, such as L-SVRG and its a...
research
03/19/2022

When regression coefficients change over time: A proposal

A common approach in forecasting problems is to estimate a least-squares...
research
11/16/2021

Highly Efficient Indexing Scheme for k-Dominant Skyline Processing over Uncertain Data Streams

Skyline is widely used in reality to solve multi-criteria problems, such...
