Reliable and Efficient Long-Term Twitter Monitoring
Social media data is now widely used by academic researchers. However, long-term projects that collect data from Twitter's public APIs often encounter various issues when they collect streaming data on servers hosted in local-area networks (LANs). In this technical report, we discuss some of the issues that we have encountered in our Twitter data collection project. We present a cloud-based infrastructure for data collection, pre-processing, and archiving that, we argue, mitigates or resolves these problems at minimal cloud-computing cost. We show how this approach works across different cloud computing architectures.