Automatic de-identification of Data Download Packages

05/04/2021
by Laura Boeschoten, et al.

The General Data Protection Regulation (GDPR) grants all natural persons the right of access to their personal data when it is being processed by data controllers. Data controllers are obliged to share the data in an electronic format and often provide it in a so-called Data Download Package (DDP). These DDPs contain all data collected by public and private entities during the course of citizens' digital lives and form a treasure trove for social scientists. However, the data can be deeply private. To protect the privacy of research participants while using their DDPs for scientific research, we developed de-identification software that can handle typical characteristics of DDPs, such as regularly changing file structures, visual and textual content, differing file formats and file structures, and the presence of usernames. We investigate the performance of the software and illustrate how it can be tailored towards specific DDP structures.
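One of the DDP characteristics named in the abstract, usernames scattered through nested files, can be illustrated with a minimal sketch. This is not the authors' software: the function names `pseudonym` and `deidentify`, the salt value, the `__user_` placeholder format, and the sample package are all hypothetical. The sketch covers only textual content in JSON-like structures and assumes the usernames to replace are already known.

```python
import hashlib
import re


def pseudonym(name: str, salt: str = "demo-salt") -> str:
    """Map a username to a stable placeholder (hypothetical format)."""
    digest = hashlib.sha256((salt + name.lower()).encode("utf-8")).hexdigest()
    return "__user_" + digest[:8]


def deidentify(obj, usernames):
    """Recursively replace known usernames in string values of dicts and lists.

    Dictionary keys and non-string values are left unchanged.
    """
    if isinstance(obj, dict):
        return {key: deidentify(value, usernames) for key, value in obj.items()}
    if isinstance(obj, list):
        return [deidentify(item, usernames) for item in obj]
    if isinstance(obj, str):
        # Replace longer names first so a short name cannot split a longer one.
        for name in sorted(usernames, key=len, reverse=True):
            pattern = re.compile(
                r"(?<!\w)" + re.escape(name) + r"(?!\w)", re.IGNORECASE
            )
            obj = pattern.sub(pseudonym(name), obj)
        return obj
    return obj


# Hypothetical fragment of a DDP, for illustration only.
package = {
    "messages": [{"sender": "jane_doe", "text": "Hi @jane_doe, it's bob!"}],
    "likes": 3,
}
clean = deidentify(package, {"jane_doe", "bob"})
```

Because the walk is recursive and does not depend on particular key names, it keeps working when a provider rearranges the file structure. The salted hash gives the same person the same placeholder across files, so interactions remain analysable without exposing the username.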


