Adaptive Normalization in Streaming Data

10/17/2019
by Vibhuti Gupta, et al.

In today's digital era, data are everywhere, from the Internet of Things to health care and financial applications. This leads to potentially unbounded, ever-growing Big Data streams that need to be utilized effectively. Data normalization is an important preprocessing technique for data analytics. It helps prevent mismodeling and reduces the complexity inherent in the data, especially for data integrated from multiple sources and contexts. Normalizing a Big Data stream is challenging because of evolving inconsistencies, time and memory constraints, and the fact that the whole data set is not available beforehand. This paper proposes a distributed approach to adaptive normalization for Big Data streams. Using sliding windows of fixed size, it provides a simple mechanism to adapt the statistics used for normalizing the changing data in each window. Implemented on Apache Storm, a distributed real-time stream data framework, our approach exploits distributed data processing for efficient normalization. Unlike existing adaptive approaches, which normalize data for a specific use (e.g., classification), our approach is not tied to a particular downstream task. Moreover, our adaptive mechanism allows flexible control, via user-specified thresholds, over the normalization tradeoff between time and precision. The paper illustrates our proposed approach along with a few other techniques, and reports experiments on both synthesized and real-world data. The normalized data obtained from our proposed approach, on 160,000 instances of data stream, improves over the baseline by 89 with the actual data.
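The windowed mechanism described above can be illustrated with a small single-machine sketch. It is not the paper's implementation: the abstract does not say which statistics are tracked, how the thresholds are applied, or how the work is distributed on Apache Storm. The sketch assumes min-max scaling, non-overlapping windows of fixed size, and a single hypothetical `threshold` parameter that decides when the stored minimum and maximum are replaced by those of the current window. The function name `adaptive_min_max` is likewise made up for illustration.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def adaptive_min_max(
    stream: Iterable[float], window_size: int, threshold: float
) -> Iterator[List[float]]:
    """Yield one min-max normalized list per full window of the stream.

    Assumption (not from the paper): the reference (min, max) pair is
    replaced only when the current window's min or max differs from it
    by more than `threshold`; otherwise the stored pair is reused.
    """
    ref: Optional[Tuple[float, float]] = None  # reference (min, max)
    window: List[float] = []

    for value in stream:
        window.append(value)
        if len(window) < window_size:
            continue  # keep buffering until the window is full

        lo, hi = min(window), max(window)
        # Adapt the statistics only when the data has drifted enough.
        if (
            ref is None
            or abs(lo - ref[0]) > threshold
            or abs(hi - ref[1]) > threshold
        ):
            ref = (lo, hi)

        span = ref[1] - ref[0]
        # A constant window has zero span; map it to 0.0 to avoid
        # dividing by zero.
        yield [(v - ref[0]) / span if span else 0.0 for v in window]
        window = []  # start the next window; a trailing partial window is dropped


if __name__ == "__main__":
    data = [0, 5, 10, 0, 6, 10, 0, 50, 100]
    for normalized in adaptive_min_max(data, window_size=3, threshold=1.0):
        print(normalized)
```

In this sketch a larger `threshold` means fewer statistic updates, so less work per window, at the cost of values that may fall outside [0, 1] when the data drifts. This is one plausible reading of the time versus precision tradeoff mentioned in the abstract.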


