NeuraCrypt: Hiding Private Health Data via Random Neural Networks for Public Training

06/04/2021
by Adam Yala, et al.

Balancing the needs of data privacy and predictive utility is a central challenge for machine learning in healthcare. In particular, privacy concerns have led to a dearth of public datasets, complicated the construction of multi-hospital cohorts, and limited the use of external machine learning resources. To remedy this, new methods are required to enable data owners, such as hospitals, to share their datasets publicly while preserving both patient privacy and modeling utility. We propose NeuraCrypt, a private encoding scheme based on random deep neural networks. NeuraCrypt encodes raw patient data using a randomly constructed neural network known only to the data owner, and publishes both the encoded data and the associated labels. From a theoretical perspective, we demonstrate that sampling from a sufficiently rich family of encoding functions offers a well-defined and meaningful notion of privacy against a computationally unbounded adversary with full knowledge of the underlying data distribution. We propose to approximate this family of encoding functions through random deep neural networks. Empirically, we demonstrate the robustness of our encoding to a suite of adversarial attacks and show that NeuraCrypt achieves accuracy competitive with non-private baselines on a variety of x-ray tasks. Moreover, we demonstrate that multiple hospitals, using independent private encoders, can collaborate to train improved x-ray models. Finally, we release a challenge dataset to encourage the development of new attacks on NeuraCrypt.
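The encode-then-publish workflow described above can be illustrated with a minimal sketch. This is not the authors' architecture: it assumes a small random fully connected network with ReLU activations applied to flat feature vectors, and the names `make_private_encoder` and `publish`, the seed value, and the toy data are all hypothetical. The only points it is meant to show are that the encoder's weights are sampled at random and never trained, that the seed stays with the data owner, and that only encoded inputs and their labels are released.

```python
import random


def make_private_encoder(seed, in_dim, hidden_dim, out_dim, depth=2):
    """Build a random, untrained MLP whose weights are fixed by a private seed.

    Assumption: a plain fully connected network stands in for the random
    deep network described in the abstract.
    """
    rng = random.Random(seed)
    dims = [in_dim] + [hidden_dim] * (depth - 1) + [out_dim]
    layers = []
    for d_in, d_out in zip(dims[:-1], dims[1:]):
        scale = (2.0 / d_in) ** 0.5  # He-style scaling, an illustrative choice
        layers.append(
            [[rng.gauss(0.0, scale) for _ in range(d_in)] for _ in range(d_out)]
        )

    def encode(x):
        h = list(x)
        for i, weights in enumerate(layers):
            h = [sum(w * v for w, v in zip(row, h)) for row in weights]
            if i < len(layers) - 1:  # ReLU on every layer except the last
                h = [max(0.0, v) for v in h]
        return h

    return encode


def publish(dataset, encode):
    """Return what would be released: encoded inputs paired with their labels."""
    return [(encode(x), y) for x, y in dataset]


# Toy stand-in for raw patient data: (feature vector, label) pairs.
raw = [([0.1, 0.5, 0.9, 0.3], 1), ([0.7, 0.2, 0.4, 0.8], 0)]

# The seed plays the role of the secret known only to the data owner.
encoder = make_private_encoder(seed=1234, in_dim=4, hidden_dim=8, out_dim=6)
public = publish(raw, encoder)
```

In a multi-hospital setting, each hospital would call `make_private_encoder` with its own independent seed and pool the resulting `public` lists for training. Whether a model trained on such pooled encodings performs well is an empirical claim of the paper, not something this sketch demonstrates.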

