Unleashing the Power of Hashtags in Tweet Analytics with Distributed Framework on Apache Storm

by Vibhuti Gupta, et al.

Twitter is a popular social network platform where users interact and post texts of up to 280 characters, called tweets. Hashtags, hyperlinked words in tweets, have become increasingly crucial for tweet retrieval and search. Using hashtags for tweet topic classification is challenging because of context dependence among words, slang, abbreviations, and emoticons in short tweets, along with the evolving use of hashtags. Since Twitter generates millions of tweets daily, tweet analytics is a fundamental big data streaming problem that often requires real-time distributed processing. This paper proposes a distributed online approach to tweet topic classification with hashtags. Implemented on Apache Storm, a distributed real-time framework, our approach incrementally identifies and updates a set of strong predictors in the Naïve Bayes model for classifying each incoming tweet instance. Preliminary experiments show promising results with up to 97 throughput on eight processors.
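To make the idea of an incrementally updated Naïve Bayes topic classifier concrete, here is a minimal single-process sketch in Python. It is not the authors' implementation: the class and method names (`OnlineNaiveBayes`, `update`, `predict`, `strong_predictors`) are hypothetical, it omits the Storm distribution entirely, and the abstract does not say how "strong predictors" are chosen, so the log-ratio ranking below is an assumed rule used only for illustration. Hashtags keep their `#` prefix so they are counted as features distinct from plain words.

```python
import math
import re
from collections import defaultdict

# Words and hashtags; '#football' and 'football' become different features.
TOKEN_RE = re.compile(r"#?\w+")


def tokenize(text):
    """Lowercase the tweet and extract words and hashtags."""
    return TOKEN_RE.findall(text.lower())


class OnlineNaiveBayes:
    """Hypothetical multinomial Naive Bayes updated one labelled tweet at a time."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                        # Laplace smoothing constant
        self.class_counts = defaultdict(int)      # topic -> tweets seen
        self.token_counts = defaultdict(lambda: defaultdict(int))  # topic -> token -> count
        self.total_tokens = defaultdict(int)      # topic -> tokens seen
        self.vocab = set()

    def update(self, text, topic):
        """Incrementally fold one labelled tweet into the counts."""
        self.class_counts[topic] += 1
        for tok in tokenize(text):
            self.token_counts[topic][tok] += 1
            self.total_tokens[topic] += 1
            self.vocab.add(tok)

    def predict(self, text):
        """Return the topic with the highest log-posterior (unknown tokens ignored)."""
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        tokens = [tok for tok in tokenize(text) if tok in self.vocab]
        best_topic, best_score = None, -math.inf
        for topic, n_c in self.class_counts.items():
            score = math.log(n_c / n_docs)        # log prior
            denom = self.total_tokens[topic] + self.alpha * v
            for tok in tokens:
                count = self.token_counts[topic].get(tok, 0)
                score += math.log((count + self.alpha) / denom)
            if score > best_score:
                best_topic, best_score = topic, score
        return best_topic

    def strong_predictors(self, topic, k=5):
        """Assumed rule: rank tokens by the smoothed log-ratio of
        P(token | topic) to P(token | all other topics)."""
        v = len(self.vocab)
        in_total = self.total_tokens[topic] + self.alpha * v
        out_total = sum(n for t, n in self.total_tokens.items() if t != topic) + self.alpha * v

        def score(tok):
            in_c = self.token_counts[topic].get(tok, 0) + self.alpha
            out_c = sum(c.get(tok, 0) for t, c in self.token_counts.items() if t != topic) + self.alpha
            return math.log(in_c / in_total) - math.log(out_c / out_total)

        # Sort by descending score, breaking ties alphabetically for determinism.
        return sorted(self.vocab, key=lambda tok: (-score(tok), tok))[:k]
```

A usage sketch: call `update` for each labelled tweet as it arrives, then `predict` on new tweets. In a Storm deployment one would presumably partition these counts across bolts, but that wiring is not shown here.

```python
nb = OnlineNaiveBayes()
nb.update("Great goal tonight #football", "sports")
nb.update("New phone launch #tech", "tech")
print(nb.predict("what a #football match"))
```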


