Distributed Real-Time Sentiment Analysis for Big Data Social Streams

The big data trend has pushed data-centric systems to handle continuous, fast data streams. In recent years, real-time analytics on stream data has emerged as a research field that aims to answer queries about what is happening now with negligible delay. The central challenge of real-time stream processing is that it is not feasible to store every instance of data, so online analytical algorithms are used instead. To perform real-time analytics, data must be pre-processed so that only a short summary of the stream is kept in main memory. In addition, because instances arrive at high speed, the average processing time per instance must be low enough that incoming instances are not lost before being captured. Lastly, the learner must deliver high analytical accuracy. Sentinel is a distributed system written in Java that aims to address these challenges by distributing both the processing and the learning. Sentinel is built on top of Apache Storm, a distributed computing platform. Sentinel's learner, the Vertical Hoeffding Tree, is a parallel decision-tree learning algorithm based on the VFDT (Very Fast Decision Tree) that enables parallel classification in distributed environments. Sentinel also uses SpaceSaving to maintain a summary of the data stream, which it stores in a synopsis data structure. An application of Sentinel to the Twitter Public Stream API is presented and the results are discussed.
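To make the idea of a bounded stream summary concrete, the following is a minimal sketch of the general SpaceSaving counting scheme in Java. It is not Sentinel's implementation: the class name `SpaceSavingSketch` and its methods are hypothetical, and the linear scan for the minimum counter is a simplification chosen for brevity. The sketch keeps at most `capacity` counters; when a new item arrives and all counters are taken, the item with the smallest count is evicted and the newcomer inherits that count plus one.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of SpaceSaving; not taken from Sentinel's code base.
final class SpaceSavingSketch {
    private final int capacity;                          // maximum number of monitored items
    private final Map<String, Long> counts = new HashMap<>();

    SpaceSavingSketch(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    // Process one stream item in bounded memory.
    void offer(String item) {
        Long current = counts.get(item);
        if (current != null) {
            counts.put(item, current + 1);               // already monitored: increment
        } else if (counts.size() < capacity) {
            counts.put(item, 1L);                        // free slot: start monitoring
        } else {
            // Summary is full: evict the item with the smallest count
            // (linear scan here; a simplification for readability).
            String minKey = null;
            long minCount = Long.MAX_VALUE;
            for (Map.Entry<String, Long> e : counts.entrySet()) {
                if (e.getValue() < minCount) {
                    minCount = e.getValue();
                    minKey = e.getKey();
                }
            }
            counts.remove(minKey);
            counts.put(item, minCount + 1);              // newcomer inherits min count + 1
        }
    }

    // Estimated count; may overestimate for monitored items, 0 if not monitored.
    long estimate(String item) {
        return counts.getOrDefault(item, 0L);
    }

    // Number of items currently monitored (never exceeds capacity).
    int size() {
        return counts.size();
    }
}
```

With a capacity of 2 and the stream `good, good, bad, good, meh`, the summary ends up monitoring `good` and `meh`, with `bad` evicted when `meh` arrives. Estimates for monitored items can exceed the true count, which is the price paid for a fixed memory footprint.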


