Variation across Scales: Measurement Fidelity under Twitter Data Sampling

03/21/2020
by Siqi Wu, et al.

A comprehensive understanding of data bias is the cornerstone of mitigating biases in social media research. This paper presents in-depth measurements of the effects of Twitter data sampling across different timescales and different subjects (entities, networks, and cascades). By constructing two complete tweet streams, we show that Twitter rate limit messages are an accurate measure of the volume of missing tweets. Although sampling rates show clear temporal variations, we find that a Bernoulli process with a uniform rate approximates Twitter data sampling well, and that it allows one to estimate the ground-truth entity frequency and ranking from the observed sample data. In terms of network analysis, we observe significant structural changes in both the user-hashtag bipartite graph and the retweet network. Finally, we measure retweet cascades and identify risks for information diffusion models that rely on tweet inter-arrival times and user influence. This work calls attention to the social data bias caused by data collection, and proposes methods to measure the systematic biases introduced by sampling.
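The sketch below illustrates the two ideas in the abstract. First, a sampling rate is derived from the count of collected tweets plus the count of missing tweets reported by rate limit messages. Second, sampling is modeled as a Bernoulli process with a uniform rate, and a ground-truth frequency is estimated from a sampled one. It is a hypothetical illustration, not the authors' code. The function names are invented. The sketch also assumes the missing-tweet count from rate limit messages has already been aggregated into one integer.

```python
import random
from collections import Counter


def sampling_rate(collected: int, missing: int) -> float:
    """Observed sampling rate = collected / (collected + missing).

    Assumption: `missing` is the total number of undelivered tweets reported
    by rate limit messages over the same window, already summed.
    """
    total = collected + missing
    return collected / total if total else 1.0


def bernoulli_sample(stream, rate: float, rng: random.Random):
    """Keep each item independently with probability `rate` (uniform Bernoulli)."""
    return [item for item in stream if rng.random() < rate]


def estimate_frequency(observed_count: int, rate: float) -> float:
    """Estimate the true count: under Bernoulli(rate), E[observed] = rate * true."""
    return observed_count / rate


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical complete stream of hashtags (not real Twitter data).
    stream = ["#a"] * 6000 + ["#b"] * 3000 + ["#c"] * 1000
    rate = 0.4
    sample = bernoulli_sample(stream, rate, rng)
    observed = Counter(sample)
    for tag in ("#a", "#b", "#c"):
        print(tag, observed[tag], round(estimate_frequency(observed[tag], rate)))
```

With a uniform rate, dividing each observed count by the rate rescales every entity by the same factor. The estimated ranking of high-frequency entities therefore tends to match the true ranking, while rare entities are noisier because their sampled counts are small.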
