Geocoding Without Geotags: A Text-based Approach for reddit

10/07/2018
by Keith Harrigian, et al.

In this paper, we introduce the first geolocation inference approach for reddit, a social media platform where user pseudonymity has thus far made supervised demographic inference difficult to implement and validate. In particular, we design a text-based heuristic schema to generate ground truth location labels for reddit users in the absence of explicitly geotagged data. After evaluating the accuracy of our labeling procedure, we train and test several geolocation inference models on our reddit data set and three benchmark Twitter geolocation data sets. Ultimately, we show that geolocation models trained and applied on the same domain substantially outperform models that attempt to transfer training data across domains, even more so on reddit, where platform-specific interest-group metadata can be used to improve inferences.

research
02/23/2017

A Probabilistic Framework for Location Inference from Social Media

We study the extent to which we can infer users' geographical locations ...
research
03/27/2018

You are your Metadata: Identification and Obfuscation of Social Media Users using Metadata Information

Metadata are associated to most of the information we produce in our dai...
research
02/01/2023

You are a Bot! – Studying the Development of Bot Accusations on Twitter

The characterization and detection of social bots with their presumed ab...
research
07/31/2023

DoDo Learning: DOmain-DemOgraphic Transfer in Language Models for Detecting Abuse Targeted at Public Figures

Public figures receive a disproportionate amount of abuse on social medi...
research
09/13/2017

Co-training for Demographic Classification Using Deep Learning from Label Proportions

Deep learning algorithms have recently produced state-of-the-art accurac...
research
05/15/2019

Demographic Inference and Representative Population Estimates from Multilingual Social Media Data

Social media provide access to behavioural data at an unprecedented scal...
research
11/05/2020

Evaluating the Performance of Twitter-based Exploit Detectors

Patch prioritization is a crucial aspect of information systems security...
