You are your Metadata: Identification and Obfuscation of Social Media Users using Metadata Information

03/27/2018
by Beatrice Perez, et al.

Metadata are associated with most of the information we produce in our daily interactions and communication in the digital world. Yet, surprisingly, metadata are often still categorized as non-sensitive. Indeed, in the past, researchers and practitioners have mainly focused on the problem of identifying a user from the content of a message. In this paper, we use Twitter as a case study to quantify the uniqueness of the association between metadata and user identity and to understand the effectiveness of potential obfuscation strategies. More specifically, we analyze atomic fields in the metadata and systematically combine them in an effort to classify new tweets as belonging to an account, using different machine learning algorithms of increasing complexity. We demonstrate that, through the application of a supervised learning algorithm, we are able to identify any user in a group of 10,000 with approximately 96.7% accuracy. Moreover, if we broaden the scope of our search and consider the 10 most likely candidates, the accuracy of the model increases to 99.22%. We also found that data obfuscation is hard and ineffective for this type of data: even after perturbing 60% of the training data, it is still possible to classify users with an accuracy higher than 95%. These results have strong implications in terms of the design of metadata obfuscation strategies, for example for data set release, not only for Twitter but, more generally, for most social media platforms.
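The "10 most likely candidates" figure is a top-k accuracy: a prediction counts as correct if the true account appears among the k highest-ranked candidates. Below is a minimal standard-library sketch of that metric, assuming the classifier produces one score per candidate account for each tweet. The function name `top_k_accuracy`, its inputs, and the toy accounts are hypothetical illustrations, not code or data from the paper.

```python
from typing import Dict, Hashable, List, Sequence


def top_k_accuracy(
    scores: Sequence[Dict[Hashable, float]],
    true_users: Sequence[Hashable],
    k: int,
) -> float:
    """Fraction of tweets whose true author is among the k highest-scoring candidates.

    scores[i] maps each candidate account to the classifier's score for tweet i
    (a hypothetical input format); true_users[i] is the account that posted tweet i.
    """
    if len(scores) != len(true_users):
        raise ValueError("scores and true_users must have the same length")
    if not scores:
        raise ValueError("need at least one tweet")
    if k < 1:
        raise ValueError("k must be at least 1")
    hits = 0
    for candidate_scores, user in zip(scores, true_users):
        # Rank candidate accounts by descending score and keep the k best.
        ranked: List[Hashable] = sorted(
            candidate_scores, key=candidate_scores.get, reverse=True
        )
        if user in ranked[:k]:
            hits += 1
    return hits / len(true_users)


# Toy example with made-up scores for three hypothetical accounts.
scores = [
    {"alice": 0.7, "bob": 0.2, "carol": 0.1},  # true author alice is ranked 1st
    {"alice": 0.5, "bob": 0.3, "carol": 0.2},  # true author bob is ranked 2nd
    {"alice": 0.1, "bob": 0.6, "carol": 0.3},  # true author carol is ranked 2nd
    {"alice": 0.2, "bob": 0.3, "carol": 0.5},  # true author alice is ranked 3rd
]
labels = ["alice", "bob", "carol", "alice"]

print(top_k_accuracy(scores, labels, k=1))  # 0.25
print(top_k_accuracy(scores, labels, k=2))  # 0.75
```

Widening k can only keep or raise the score, since every tweet that is a hit at k is also a hit at k + 1. This is consistent with the jump from top-1 to top-10 accuracy reported above.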

research
04/17/2022

A Psycho-linguistic Analysis of BitChute

In order to better support researchers, journalist, and practitioners in...
research
04/08/2021

It's All About The Cards: Sharing on Social Media Probably Encouraged HTML Metadata Growth

In a perfect world, all articles consistently contain sufficient metadat...
research
10/07/2018

Geocoding Without Geotags: A Text-based Approach for reddit

In this paper, we introduce the first geolocation inference approach for...
research
09/26/2017

A Longitudinal Assessment of the Persistence of Twitter Datasets

Sharing of social media datasets presents the caveat that they are not a...
research
07/29/2023

Analyzing Cryptocurrency trends using Tweet Sentiment Data and User Meta-Data

Cryptocurrency is a form of digital currency using cryptographic techniq...
research
08/19/2021

A Multi-input Multi-output Transformer-based Hybrid Neural Network for Multi-class Privacy Disclosure Detection

The concern regarding users' data privacy has risen to its highest level...
research
09/06/2019

Full-text Search for Verifiable Credential Metadata on Distributed Ledgers

Self-sovereign Identity (SSI) powered by distributed ledger technologies...
