De-Health: All Your Online Health Information Are Belong to Us

02/02/2019
by Shouling Ji, et al.

In this paper, we study the privacy of online health data. We present a novel online health data De-Anonymization (DA) framework, named De-Health. De-Health consists of two phases: Top-K DA, which identifies a candidate set for each anonymized user, and refined DA, which maps each anonymized user to a user in its candidate set. By employing both candidate selection and DA verification schemes, De-Health reduces the DA space by several orders of magnitude while achieving promising DA accuracy. We validate the efficacy of De-Health on two real-world online health datasets, WebMD (89,393 users, 506K posts) and HealthBoards (388,398 users, 4.7M posts). Further, even when the training data are insufficient, De-Health can still successfully de-anonymize a large portion of anonymized users. We develop the first analytical framework on the soundness and effectiveness of online health data DA. By analyzing the impact of various data features on anonymity, we derive the conditions and probabilities for successfully de-anonymizing one user or a group of users in exact DA and Top-K DA. Our analysis is meaningful to both researchers and policy makers in facilitating the development of more effective anonymization techniques and proper privacy policies. We also present a linkage attack framework that can link online health/medical information to real-world people. Through a proof-of-concept attack, we link 347 out of 2805 WebMD users to real-world people, and find the full names, medical/health information, birthdates, phone numbers, and other sensitive information for most of the re-identified users. This clearly illustrates the fragility of the privacy of those who use online health forums.
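The two-phase structure described above (Top-K candidate selection followed by a refined, verified match) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the features, the similarity measure, or the verification rule, so the cosine similarity, the `coarse` and `fine` feature vectors, the `threshold` parameter, and all function names and toy data below are assumptions made purely for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_candidates(anon_coarse, known, k):
    """Phase 1 (assumed form): rank known users by coarse similarity, keep the top k."""
    ranked = sorted(
        known,
        key=lambda name: cosine(anon_coarse, known[name]["coarse"]),
        reverse=True,
    )
    return ranked[:k]


def refined_da(anon_fine, known, candidates, threshold):
    """Phase 2 (assumed form): pick the best candidate on finer features.

    The threshold stands in for a verification step: if even the best
    candidate is not similar enough, the user is left unmatched (None).
    """
    if not candidates:
        return None
    best = max(candidates, key=lambda name: cosine(anon_fine, known[name]["fine"]))
    score = cosine(anon_fine, known[best]["fine"])
    return best if score >= threshold else None


def de_anonymize(anonymized, known, k=2, threshold=0.9):
    """Run both phases for every anonymized user."""
    result = {}
    for anon_id, feats in anonymized.items():
        candidates = top_k_candidates(feats["coarse"], known, k)
        result[anon_id] = refined_da(feats["fine"], known, candidates, threshold)
    return result


# Hypothetical toy data: feature vectors are invented, not taken from the paper.
known = {
    "alice": {"coarse": [1.0, 0.0], "fine": [1.0, 0.0, 0.0]},
    "bob": {"coarse": [0.9, 0.1], "fine": [0.0, 1.0, 0.0]},
    "carol": {"coarse": [0.0, 1.0], "fine": [0.0, 0.0, 1.0]},
}
anonymized = {
    "u1": {"coarse": [1.0, 0.05], "fine": [0.1, 1.0, 0.0]},
    "u2": {"coarse": [0.0, 1.0], "fine": [1.0, 1.0, 1.0]},
}

if __name__ == "__main__":
    print(de_anonymize(anonymized, known, k=2, threshold=0.9))
```

In this toy setup, the first phase shrinks each user's search space from all known users to `k` candidates, and the second phase only compares against those candidates. That is the sense in which candidate selection reduces the DA space, and the threshold shows how a verification step can decline to match a user rather than force a wrong answer.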


research
04/02/2016

Online Updating of Word Representations for Part-of-Speech Tagging

We propose online unsupervised domain adaptation (DA), which is performe...
research
09/12/2022

Design heuristics: privacy and portability Regulation as a feature request

The lack of user experience standards in regulations for data privacy an...
research
08/16/2023

Privacy at Risk: Exploiting Similarities in Health Data for Identity Inference

Smartwatches enable the efficient collection of health data that can be ...
research
04/10/2023

A visão da BBChain sobre o contexto tecnológico subjacente à adoção do Real Digital

We explore confidential computing in the context of CBDCs using Microsof...
research
12/15/2018

Verifying the Medical Specialty from User Profile of Online Community for Health-Related Advices

The paper describes the verifying methods of medical specialty from user...
research
04/16/2019

Beyond Technical Motives: Perceived User Behavior in Abandoning Wearable Health & Wellness Trackers

Health trackers are widely adopted to support individuals with daily hea...
