twAwler: A lightweight twitter crawler

04/20/2018
by Polyvios Pratikakis, et al.

This paper presents twAwler, a lightweight Twitter crawler that targets language-specific communities of users. twAwler takes advantage of multiple endpoints of the Twitter API to explore user relations and quickly recognize users belonging to the targeted set. It performs a complete crawl of all such users, discovering many standard user relations, including the retweet, mention, reply, quote, and follow graphs. twAwler respects all Twitter policies and rate limits while still being able to monitor large communities of active users. twAwler was used between August 2016 and March 2018 to generate an extensive dataset of nearly all Greek-speaking Twitter accounts (about 330 thousand), together with their tweets and relations. In total, the crawler has gathered 750 million tweets, of which 424 million are in Greek; 750 million follow relations; information about 300 thousand lists, their members (119 million member relations), and their subscribers (27 thousand subscription relations); 705 thousand trending topics; and information on 52 million users in total, of which 292 thousand have since been suspended, 141 thousand have deleted their accounts, and 3.5 million are protected and cannot be crawled. twAwler mines the collected tweets for the retweet, quote, reply, and mention graphs, which, together with the crawled follow relation, offer vast opportunities for analysis and further research.
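To illustrate what mining the retweet, quote, reply, and mention graphs from stored tweets can look like, here is a minimal sketch. It assumes tweets are stored as Twitter API v1.1-style JSON objects (fields `user.id`, `retweeted_status`, `quoted_status`, `in_reply_to_user_id`, `entities.user_mentions`); the function name `mine_interaction_graphs` is hypothetical, and this is not twAwler's actual implementation. Each graph is a weighted edge list, where an edge `(source, target)` counts how often `source` interacted with `target`.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def mine_interaction_graphs(tweets: Iterable[Dict[str, Any]]) -> Dict[str, Counter]:
    """Build weighted retweet/quote/reply/mention edge counts.

    Hypothetical helper: assumes Twitter API v1.1-style tweet dicts.
    Edge keys are (source user id, target user id) tuples.
    """
    graphs = {name: Counter() for name in ("retweet", "quote", "reply", "mention")}
    for tweet in tweets:
        src = tweet["user"]["id"]

        retweeted = tweet.get("retweeted_status")
        if retweeted is not None:
            # A retweet's text begins with "RT @author:", so its mention
            # entities would duplicate the retweet edge; count only the retweet.
            graphs["retweet"][(src, retweeted["user"]["id"])] += 1
            continue

        quoted = tweet.get("quoted_status")
        if quoted is not None:
            graphs["quote"][(src, quoted["user"]["id"])] += 1

        reply_to = tweet.get("in_reply_to_user_id")
        if reply_to is not None:
            graphs["reply"][(src, reply_to)] += 1

        # Every @-mentioned user gets a mention edge (replies included).
        for mention in tweet.get("entities", {}).get("user_mentions", []):
            graphs["mention"][(src, mention["id"])] += 1
    return graphs


if __name__ == "__main__":
    sample = [
        {"user": {"id": 1}, "retweeted_status": {"user": {"id": 2}},
         "entities": {"user_mentions": [{"id": 2}]}},
        {"user": {"id": 3}, "in_reply_to_user_id": 1,
         "entities": {"user_mentions": [{"id": 1}, {"id": 2}]}},
        {"user": {"id": 2}, "quoted_status": {"user": {"id": 3}},
         "entities": {"user_mentions": []}},
    ]
    g = mine_interaction_graphs(sample)
    print(dict(g["mention"]))  # {(3, 1): 1, (3, 2): 1}
```

Skipping the mention entities of retweets is one design choice among several; an analysis that wants retweets to also appear in the mention graph would simply drop the `continue`.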

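The abstract states that twAwler respects Twitter's rate limits but does not describe the mechanism. One simple way a crawler can stay within per-endpoint limits is a fixed-window counter that sleeps until the window resets. The sketch below assumes the 15-minute windows used by Twitter's v1.1 API; the class names, the `configure`/`acquire` methods, and any per-endpoint limit values are hypothetical. A production crawler would more likely synchronize with the limits the API itself reports (for example the `x-rate-limit-remaining` and `x-rate-limit-reset` response headers) rather than counting locally.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class EndpointWindow:
    limit: int          # requests allowed per window (hypothetical value per endpoint)
    window_s: float     # window length in seconds
    used: int = 0       # requests issued in the current window
    reset_at: float = 0.0  # clock time at which the current window ends


class RateLimiter:
    """Per-endpoint fixed-window limiter (illustrative sketch, not twAwler's code)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        # clock and sleep are injectable so the limiter can be tested
        # without actually waiting.
        self._clock = clock
        self._sleep = sleep
        self._windows: Dict[str, EndpointWindow] = {}

    def configure(self, endpoint: str, limit: int, window_s: float = 15 * 60) -> None:
        self._windows[endpoint] = EndpointWindow(limit, window_s)

    def acquire(self, endpoint: str) -> float:
        """Block until one request to `endpoint` is allowed; return seconds waited."""
        w = self._windows[endpoint]
        now = self._clock()
        waited = 0.0
        if now >= w.reset_at:
            # First use, or the previous window has expired: start a fresh one.
            w.used, w.reset_at = 0, now + w.window_s
        if w.used >= w.limit:
            # Budget exhausted: sleep until the window ends, then start a new one.
            waited = w.reset_at - now
            self._sleep(waited)
            now = self._clock()
            w.used, w.reset_at = 0, now + w.window_s
        w.used += 1
        return waited
```

A crawler loop would call `acquire("followers/ids")` (or whichever endpoint it is about to hit) before each request, so that work on one endpoint can pause without blocking bookkeeping for the others.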