Data Stream Classification using Random Feature Functions and Novel Method Combinations

11/03/2015
by Diego Marrón, et al.

Big Data streams are being generated faster, at larger scale, and more commonly than ever. In this scenario, Hoeffding trees are an established method for classification. Several extensions exist, including high-performing ensemble setups such as online and leveraging bagging. k-nearest neighbors is also a popular choice, with most extensions dealing with its inherent performance limitations over a potentially infinite stream. At the same time, gradient descent methods are becoming increasingly popular, owing in part to the successes of deep learning. Although deep neural networks can learn incrementally, they have so far proved too sensitive to hyper-parameter options and initial conditions to be considered an effective 'off-the-shelf' data-stream solution. In this work, we look at combinations of Hoeffding trees, nearest neighbors, and gradient descent methods with a streaming preprocessing approach in the form of a random feature functions filter for additional predictive power. We further extend the investigation to implementing methods on GPUs, which we test on some large real-world datasets, and show the benefits of using GPUs for data-stream learning due to their high scalability. Our empirical evaluation yields positive results for the novel approaches we experiment with, highlights important issues, and sheds light on promising future directions for data-stream classification.
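The abstract does not specify how the random feature functions filter is constructed. As a minimal sketch, assuming the filter is a fixed random projection followed by an element-wise sigmoid (h = sigmoid(Wx + b), with W and b drawn once and never updated), the Python below chains it with an online logistic-regression learner trained by stochastic gradient descent and scores the pair prequentially (predict on each instance, then train on it). The class and function names, the Gaussian initialization, the learning rate, and the toy stream are hypothetical illustrations, not the authors' implementation; the Hoeffding-tree and kNN combinations are not shown.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class RandomFeatureFilter:
    """Hypothetical streaming filter: maps x (d dims) to h = sigmoid(W x + b).

    W and b are drawn once at construction and never updated (an assumption),
    so each arriving instance is transformed independently of the others.
    """

    def __init__(self, n_inputs, n_features, seed=0):
        rng = random.Random(seed)
        self.W = [[rng.gauss(0.0, 1.0) for _ in range(n_inputs)]
                  for _ in range(n_features)]
        self.b = [rng.gauss(0.0, 1.0) for _ in range(n_features)]

    def transform(self, x):
        return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj)
                for row, bj in zip(self.W, self.b)]


class OnlineLogisticSGD:
    """Binary logistic regression updated by one SGD step per instance."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, h):
        return sigmoid(sum(wi * hi for wi, hi in zip(self.w, h)) + self.b)

    def predict(self, h):
        return 1 if self.predict_proba(h) >= 0.5 else 0

    def learn_one(self, h, y):
        # Gradient of the log-loss with respect to the logit is (p - y).
        g = self.predict_proba(h) - y
        self.w = [wi - self.lr * g * hi for wi, hi in zip(self.w, h)]
        self.b -= self.lr * g


def synthetic_stream(n, seed=1):
    """Hypothetical toy stream: points in [-1, 1]^2, label 1 iff x0 + x1 > 0."""
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
        yield x, int(x[0] + x[1] > 0)


def prequential_accuracy(stream, rff, clf):
    """Test-then-train: score each instance before learning from it."""
    correct = total = 0
    for x, y in stream:
        h = rff.transform(x)
        correct += int(clf.predict(h) == y)  # test first...
        clf.learn_one(h, y)                  # ...then train
        total += 1
    return correct / total


if __name__ == "__main__":
    rff = RandomFeatureFilter(n_inputs=2, n_features=20, seed=0)
    clf = OnlineLogisticSGD(n_features=20, lr=0.1)
    acc = prequential_accuracy(synthetic_stream(4000), rff, clf)
    print(f"prequential accuracy: {acc:.3f}")
```

Because the filter's weights are fixed at construction, each instance is transformed in constant time and memory, which is what would let such a filter sit as a preprocessing step in front of any incremental learner.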
