Smart Crawling: A New Approach toward Focus Crawling from Twitter

10/08/2021
by Ahmad Khazaie, et al.

Twitter is a social network that offers a rich source of information that is challenging to retrieve and analyze. Twitter data can be accessed through a REST API. The available operations allow retrieving tweets on the basis of a set of keywords, but with limitations such as the number of calls per minute and the size of the results. Moreover, there is no control over the retrieved results, and finding tweets that are relevant to a specific topic is a major issue. Given these limitations, it is important that the query keywords cover the topic of interest unambiguously, in order both to reach the relevant answers and to decrease the number of API calls. In this paper, we introduce a new crawling algorithm called "Smart Twitter Crawling" (STiC) that retrieves a set of tweets related to a target topic. The algorithm takes an initial keyword query and enriches it with a set of additional keywords drawn from different data sources. STiC relies on a DFS search in the Twitter graph, where each reached tweet is kept if it is relevant to the query keywords according to a score that is updated throughout the whole crawling process. This score takes into account the tweet text, the hashtags, and the users who have posted the tweet, replied to it, been mentioned in it, or retweeted it. Given this score, STiC is able to select relevant tweets in each iteration and to continue by adding the related valuable tweets. Several experiments were conducted for different kinds of queries; the results showed that precision increases compared to a simple BFS search.
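The crawling idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Tweet` structure, the `score` and `crawl` functions, the weights, the threshold, and the running-maximum update of user scores are all hypothetical choices made for illustration, and the Twitter graph is replaced by a small in-memory dictionary rather than calls to the real API.

```python
from dataclasses import dataclass, field


@dataclass
class Tweet:
    """Toy stand-in for a tweet node (hypothetical structure)."""
    id: str
    text: str
    hashtags: set
    author: str
    # ids of tweets linked by reply, retweet, or mention (assumed edges)
    neighbors: list = field(default_factory=list)


def score(tweet, keywords, user_scores, w_text=0.5, w_tag=0.3, w_user=0.2):
    """Weighted relevance in [0, 1]; the weights are illustrative assumptions."""
    words = set(tweet.text.lower().split())  # naive whitespace tokenization
    tags = {h.lower() for h in tweet.hashtags}
    text_part = len(words & keywords) / len(keywords)
    tag_part = len(tags & keywords) / len(keywords)
    user_part = user_scores.get(tweet.author, 0.0)
    return w_text * text_part + w_tag * tag_part + w_user * user_part


def crawl(graph, seed_ids, keywords, threshold=0.2, budget=100):
    """Depth-first crawl that only expands tweets scoring at or above threshold."""
    keywords = {k.lower() for k in keywords}
    user_scores = {}          # updated throughout the crawl
    visited, selected = set(), []
    stack = list(reversed(seed_ids))
    while stack and len(visited) < budget:  # budget mimics an API call limit
        tid = stack.pop()
        if tid in visited or tid not in graph:
            continue
        visited.add(tid)
        tweet = graph[tid]
        s = score(tweet, keywords, user_scores)
        if s < threshold:
            continue  # prune: do not follow links from irrelevant tweets
        selected.append(tid)
        # assumed update rule: remember the best score seen for this author
        user_scores[tweet.author] = max(user_scores.get(tweet.author, 0.0), s)
        stack.extend(reversed(tweet.neighbors))
    return selected


graph = {
    "t1": Tweet("t1", "flood warning in paris", {"flood"}, "a", ["t2", "t3"]),
    "t2": Tweet("t2", "great pizza tonight", set(), "b", ["t4"]),
    "t3": Tweet("t3", "paris flood update", {"paris"}, "c"),
    "t4": Tweet("t4", "flood relief paris", set(), "d"),
}
result = crawl(graph, ["t1"], {"flood", "paris"})
print(result)  # ['t1', 't3']
```

In this toy run, `t2` scores zero and is pruned, so `t4` is never reached even though it is on topic. That is the trade-off of relevance-guided expansion: fewer calls are spent on off-topic branches, at the risk of missing relevant tweets that are only reachable through irrelevant ones.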

research
06/21/2020

Automatic Query Optimization for Retrieving Traffic Tweets

Twitter, like many social media and data brokering companies, makes thei...
research
04/23/2019

Optimizing Search API Queries for Twitter Topic Classifiers Using a Maximum Set Coverage Approach

Twitter has grown to become an important platform to access immediate in...
research
09/22/2022

Active Keyword Selection to Track Evolving Topics on Twitter

How can we study social interactions on evolving topics at a mass scale?...
research
09/03/2017

A Semi-Supervised Approach to Detecting Stance in Tweets

Stance classification aims to identify, for a particular issue under dis...
research
01/23/2020

Whose Tweets are Surveilled for the Police: An Audit of Social-Media Monitoring Tool via Log Files

Social media monitoring by law enforcement is becoming commonplace, but ...
research
01/19/2020

Efficient Radial Pattern Keyword Search on Knowledge Graphs in Parallel

Recently, keyword search on Knowledge Graphs (KGs) becomes popular. Typi...
research
05/03/2022

A Comparison of Approaches for Imbalanced Classification Problems in the Context of Retrieving Relevant Documents for an Analysis

One of the first steps in many text-based social science studies is to r...
