Protecting Privacy and Transforming COVID-19 Case Surveillance Datasets for Public Use

01/13/2021
by Brian Lee, et al.

Objectives: Federal open data initiatives that promote increased sharing of federally collected data are important for transparency, data quality, trust, and relationships with the public and with state, tribal, local, and territorial (STLT) partners. These initiatives advance understanding of health conditions and diseases by making data available to more researchers, scientists, and policymakers for analysis, collaboration, and other valuable uses beyond CDC responders. This is particularly true for emerging conditions such as COVID-19, where there is much to learn and data needs are evolving. Since the beginning of the outbreak, CDC has collected person-level, de-identified data from jurisdictions and currently holds over 8 million records, a number that increases each day. This paper describes how CDC designed and produces two de-identified public datasets from these collected data.

Materials and Methods: Data elements were included based on usefulness, public request, and privacy implications; specific field values were suppressed to reduce the risk of reidentification and exposure of confidential information. Datasets were created and verified for privacy and confidentiality using data management platform analytic tools as well as R scripts.

Results: Unrestricted data are available to the public through Data.CDC.gov, and restricted data, with additional fields, are available under a data use agreement through a private repository on GitHub.com.

Practice Implications: An enriched understanding of the available public data, the methods used to create these data, and the algorithms used to protect the privacy of de-identified individuals allows for improved data use. Automating data generation procedures allows greater and more timely sharing of data.
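The suppression step mentioned under Materials and Methods can be illustrated with a minimal sketch. The abstract does not state which algorithm, threshold, or field names CDC uses, so everything here is an assumption for illustration: a k-anonymity-style rule, hypothetical fields `age_group` and `sex`, a hypothetical placeholder value `"Missing"`, and toy records rather than real surveillance data. It is written in Python, not the R scripts the abstract refers to.

```python
from collections import Counter

SUPPRESSED = "Missing"  # hypothetical placeholder for a suppressed value


def _key(record, quasi_identifiers):
    """Combination of quasi-identifier values for one record."""
    return tuple(record[q] for q in quasi_identifiers)


def suppress_small_cells(records, quasi_identifiers, k):
    """Return copies of the records in which quasi-identifier values are
    suppressed whenever their combination occurs fewer than k times."""
    counts = Counter(_key(r, quasi_identifiers) for r in records)
    result = []
    for r in records:
        row = dict(r)  # copy so the input is left untouched
        if counts[_key(r, quasi_identifiers)] < k:
            for q in quasi_identifiers:
                row[q] = SUPPRESSED
        result.append(row)
    return result


def is_k_anonymous(records, quasi_identifiers, k):
    """Check that every combination that is not fully suppressed
    occurs at least k times."""
    fully_suppressed = tuple(SUPPRESSED for _ in quasi_identifiers)
    counts = Counter(_key(r, quasi_identifiers) for r in records)
    return all(n >= k for key, n in counts.items() if key != fully_suppressed)


# Toy records (invented for illustration only).
toy_records = [
    {"age_group": "20-29", "sex": "F", "outcome": "recovered"},
    {"age_group": "20-29", "sex": "F", "outcome": "recovered"},
    {"age_group": "20-29", "sex": "F", "outcome": "hospitalized"},
    {"age_group": "80+", "sex": "M", "outcome": "hospitalized"},
]
quasi_ids = ["age_group", "sex"]

released = suppress_small_cells(toy_records, quasi_ids, k=2)
verified = is_k_anonymous(released, quasi_ids, k=2)
```

In this sketch the single record with the rare combination has both quasi-identifier fields replaced by the placeholder, while its other fields are kept. The separate `is_k_anonymous` check mirrors the idea of verifying a dataset after it has been generated, not any specific verification procedure from the paper.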

research
06/18/2021

Privacy-preserving Publication and Sharing of COVID-19 Pandemic Data

A huge amount of data of various types are collected during the COVID-19...
research
11/29/2021

(SARS-CoV-2) COVID 19: Genomic surveillance and evaluation of the impact on the population speaker of indigenous language in Mexico

The importance of the working document is that it allows the analysis of...
research
01/25/2021

Privacy Preserving Techniques Applied to CPNI Data: Analysis and Recommendations

With mobile phone penetration rates reaching 90 Network Information (CPN...
research
05/22/2019

The tradeoff between the utility and risk of location data and implications for public good

High-resolution individual geolocation data passively collected from mob...
research
04/25/2022

Optimal Control Measures Based on the Reconstruction of the COVID-19 Interlocalilty Transmission Network in Lebanon

In this paper, we study the evolution of COVID-19 in Lebanon using the d...
research
06/02/2016

Mobile phone data for public health: towards data-sharing solutions that protect individual privacy and national security

We outline the constraints faced by operators when deciding to share de-...
