Distributed Online Big Data Classification Using Context Information

07/02/2013
by Cem Tekin, et al.

Distributed, online data mining systems have emerged in response to applications that must analyze large amounts of correlated, high-dimensional data produced by multiple distributed data sources. We propose a distributed online data classification framework in which data is gathered by distributed data sources and processed by a heterogeneous set of distributed learners. The learners learn online, at run-time, how to classify the different data streams, either by using their locally available classification functions or by helping each other by classifying each other's data. Importantly, since the data is gathered at different locations, sending data to another learner for processing incurs additional costs such as delays, so it is beneficial only if the gain from better classification exceeds these costs. We model the problem of joint classification by the distributed and heterogeneous learners from multiple data sources as a distributed contextual bandit problem in which each data instance is characterized by a specific context. We develop a distributed online learning algorithm for which we can prove sublinear regret. Compared to prior work in distributed online data mining, our work is the first to provide analytic regret results characterizing the performance of the proposed algorithm.
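The trade-off described in the abstract, classifying locally versus paying a cost to forward data to another learner, can be illustrated with a small contextual bandit sketch. This is not the paper's algorithm: the uniform partition of a scalar context in [0, 1), the UCB1-style index, the class name `ContextualLearner`, the helper `simulate`, and all accuracies and costs are assumptions chosen only for illustration.

```python
import math
import random


class ContextualLearner:
    """One learner choosing, per context bin, between classification options.

    Hypothetical sketch: arm 0 is the local classifier, arm 1 forwards the
    data to a peer learner at an extra cost (e.g. delay).
    """

    def __init__(self, costs, num_bins=4):
        self.costs = costs                      # cost of each arm
        self.num_bins = num_bins
        n_arms = len(costs)
        self.counts = [[0] * n_arms for _ in range(num_bins)]
        self.means = [[0.0] * n_arms for _ in range(num_bins)]

    def _bin(self, context):
        # Uniform partition of the context space [0, 1).
        return min(int(context * self.num_bins), self.num_bins - 1)

    def select(self, context):
        b = self._bin(context)
        # Try every arm once in this bin before using the index.
        for arm, n in enumerate(self.counts[b]):
            if n == 0:
                return arm
        t = sum(self.counts[b])

        def index(arm):
            bonus = math.sqrt(2.0 * math.log(t) / self.counts[b][arm])
            # Net benefit: estimated accuracy minus cost, plus exploration.
            return self.means[b][arm] - self.costs[arm] + bonus

        return max(range(len(self.costs)), key=index)

    def update(self, context, arm, correct):
        b = self._bin(context)
        self.counts[b][arm] += 1
        n = self.counts[b][arm]
        # Incremental average of observed classification accuracy.
        self.means[b][arm] += (float(correct) - self.means[b][arm]) / n


def simulate(rounds, seed=0):
    """Run the learner on a synthetic stream (all numbers are made up)."""
    rng = random.Random(seed)
    learner = ContextualLearner(costs=[0.0, 0.1])

    def accuracy(arm, context):
        low = context < 0.5
        if arm == 0:                 # local classifier: good on low contexts
            return 0.9 if low else 0.5
        return 0.5 if low else 0.9   # peer: good on high contexts

    for _ in range(rounds):
        context = rng.random()
        arm = learner.select(context)
        correct = rng.random() < accuracy(arm, context)
        learner.update(context, arm, correct)
    return learner


if __name__ == "__main__":
    learner = simulate(20000)
    for b, row in enumerate(learner.counts):
        print(f"bin {b}: local={row[0]} forwarded={row[1]}")
```

In this toy setting the learner should end up forwarding mostly in the context bins where the peer's accuracy gain outweighs the forwarding cost, and classifying locally elsewhere.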

research
08/21/2013

Distributed Online Learning via Cooperative Contextual Bandits

In this paper we propose a novel framework for decentralized, online lea...
research
12/23/2015

Adaptive Ensemble Learning with Confidence Bounds

Extracting actionable intelligence from distributed, heterogeneous, corr...
research
10/05/2021

Data Validation for Big Live Data

Data Integration of heterogeneous data sources relies either on periodic...
research
11/30/2020

Joint integrative analysis of multiple data sources with correlated vector outcomes

We propose a distributed quadratic inference function framework to joint...
research
10/14/2020

Adaptive Deep Forest for Online Learning from Drifting Data Streams

Learning from data streams is among the most vital fields of contemporar...
research
02/15/2022

Survey of Big Data sizes in 2021

The modern increase in data production is driven by multiple factors, an...
research
07/26/2018

EBIC: an open source software for high-dimensional and big data biclustering analyses

Motivation: In this paper we present the latest release of EBIC, a next-...
