Where You Are Is Who You Are: User Identification by Matching Statistics

12/09/2015
by Farid M. Naini, et al.

Most users of online services have unique behavioral or usage patterns. These patterns can be exploited to identify and track users from observations of their behavior alone. We study the task of identifying users from statistics of their behavioral patterns. Specifically, we focus on the setting in which we are given histograms of users' data collected during two different experiments. We assume that the users' identities are anonymized or hidden in the first dataset and known in the second. The task is then to identify the users by matching the histograms of their data in the first dataset with the histograms from the second dataset. Recent work introduced the optimal algorithm for this user identification task. In this paper, we evaluate the effectiveness of this method on three different types of datasets and in multiple scenarios. Using datasets such as call data records, web browsing histories, and GPS trajectories, we show that a large fraction of users can be easily identified given only histograms of their data; hence these histograms can act as users' fingerprints. We also verify that identifying users simultaneously achieves better performance than identifying them one by one. We show that the optimal method gives higher identification accuracy than heuristics-based approaches in practical scenarios. The accuracy obtained under this optimal method can thus be used to quantify the maximum level of user identification that is possible in such settings. We show that the key factors affecting the accuracy of the optimal identification algorithm are the duration of the data collection, the number of users in the anonymized dataset, and the resolution of the dataset. Finally, we analyze the effectiveness of k-anonymization in resisting user identification attacks on these datasets.
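
The difference between one-by-one and simultaneous identification can be illustrated with a minimal sketch. This is not the optimal algorithm evaluated in the paper: the Jensen-Shannon cost, the brute-force search over permutations, the function names, and the toy histograms are all assumptions chosen for illustration.

```python
import math
from itertools import permutations


def normalize(hist):
    """Turn raw counts into an empirical distribution."""
    total = sum(hist)
    return [count / total for count in hist]


def js_divergence(p, q):
    """Jensen-Shannon divergence (an assumed cost, not the paper's)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Bins with zero mass in x contribute nothing to the sum.
        return sum(a * math.log(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def cost_matrix(anon, known):
    """cost[i][j] = divergence between anonymized user i and known user j."""
    return [[js_divergence(normalize(a), normalize(k)) for k in known]
            for a in anon]


def match_one_by_one(cost):
    """Each anonymized user independently picks the closest known user."""
    return [min(range(len(row)), key=row.__getitem__) for row in cost]


def match_jointly(cost):
    """Pick the one-to-one assignment with minimum total cost.

    Brute force over permutations; only practical for a handful of users.
    """
    n = len(cost)
    best = min(permutations(range(n)),
               key=lambda perm: sum(cost[i][perm[i]] for i in range(n)))
    return list(best)


# Hypothetical toy data: counts over three locations per user.
known = [[8, 1, 1], [5, 4, 1], [1, 1, 8]]   # identities 0, 1, 2
anon = [[6, 3, 1], [4, 5, 1], [1, 2, 7]]    # true identities 0, 1, 2

cost = cost_matrix(anon, known)
print(match_one_by_one(cost))  # [1, 1, 2]: two users claim identity 1
print(match_jointly(cost))     # [0, 1, 2]: the one-to-one constraint fixes it
```

In this toy case the first anonymized user looks slightly more like identity 1 than identity 0 when considered alone, but identity 1 is a far better fit for the second user, so the joint assignment recovers all three identities. For realistic numbers of users, the permutation search would have to be replaced by a polynomial-time assignment solver.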


