The Remarkable Benefit of User-Level Aggregation for Lexical-based Population-Level Predictions

08/29/2018
by Salvatore Giorgi, et al.

Nowcasting based on social media text promises unobtrusive, near real-time predictions of community-level outcomes. These outcomes typically concern people, yet the data is often aggregated without regard to the individual users who make up each community's Twitter population. This paper describes a simple yet effective method for building community-level models using Twitter language aggregated by user. Results on four different U.S. county-level tasks, spanning demographic, health, and psychological outcomes, show large and consistent improvements in prediction accuracy over the standard approach of aggregating all tweets (e.g., from Pearson r = .73 to .82 for median income prediction, or from r = .37 to .47 for life satisfaction prediction). We make our aggregated and anonymized community-level data, derived from 37 billion tweets (over 1 billion of which were mapped to counties), available for research.
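
To make the contrast between the two aggregation strategies concrete, here is a minimal Python sketch. It is one plausible reading of the abstract, not the authors' implementation: it assumes the features are relative word frequencies, that "aggregating all tweets" means pooling every token in a county, and that "aggregated by user" means computing frequencies per user and then averaging over the users in a county. The function names (relative_freqs, tweet_level, user_level) and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def relative_freqs(tokens):
    """Turn a list of tokens into a word -> relative frequency mapping."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def tweet_level(tweets):
    """Assumed baseline: pool every token in a county, ignoring who wrote it.

    `tweets` is a list of (county, user, tokens) triples.
    """
    pooled = defaultdict(list)
    for county, _user, tokens in tweets:
        pooled[county].extend(tokens)
    return {county: relative_freqs(toks) for county, toks in pooled.items()}


def user_level(tweets):
    """Assumed user-level variant: frequencies per user, then a mean over users."""
    per_user = defaultdict(list)
    for county, user, tokens in tweets:
        per_user[(county, user)].extend(tokens)

    user_count = Counter()
    sums = defaultdict(lambda: defaultdict(float))
    for (county, _user), toks in per_user.items():
        if not toks:
            continue  # a user with no tokens contributes no features
        user_count[county] += 1
        for word, freq in relative_freqs(toks).items():
            sums[county][word] += freq

    return {
        county: {word: s / user_count[county] for word, s in feats.items()}
        for county, feats in sums.items()
    }


# Toy data: one prolific user (u1) and one quiet user (u2) in county "A".
demo = [
    ("A", "u1", ["rain", "rain"]),
    ("A", "u1", ["rain"]),
    ("A", "u2", ["sun"]),
]
print(tweet_level(demo)["A"])  # {'rain': 0.75, 'sun': 0.25}
print(user_level(demo)["A"])   # {'rain': 0.5, 'sun': 0.5}
```

In the toy data the prolific user dominates the pooled features, while the user-level average weights both users equally. Under the stated assumptions, this is the sense in which the county is treated as a population of people rather than a bag of tweets.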

research
08/28/2018

Residualized Factor Adaptation for Community Social Media Prediction Tasks

Predictive models over social media language have shown promise in captu...
research
06/13/2019

Correlating Twitter Language with Community-Level Health Outcomes

We study how language on social media is linked to diseases such as athe...
research
11/28/2012

TwitterPaul: Extracting and Aggregating Twitter Predictions

This paper introduces TwitterPaul, a system designed to make use of Soci...
research
07/22/2017

"i have a feeling trump will win..................": Forecasting Winners and Losers from User Predictions on Twitter

Social media users often make explicit predictions about upcoming events...
research
12/10/2021

Recalibration of Predictive Models as Approximate Probabilistic Updates

The output of predictive models is routinely recalibrated by reconciling...
research
11/10/2019

Correcting Sociodemographic Selection Biases for Accurate Population Prediction from Social Media

Social media is increasingly used for large-scale population predictions...
research
11/26/2020

Towards real-time population estimates: introducing Twitter daily estimates of residents and non-residents at the county level

The study of migrations and mobility has historically been severely limi...