Drynx: Decentralized, Secure, Verifiable System for Statistical Queries and Machine Learning on Distributed Datasets

02/11/2019
by David Froelicher, et al.

Data sharing has become essential in many domains, such as big-data analytics, economics, and medical research, but it remains difficult to achieve when the data are sensitive. Sharing personal information requires individuals' unconditional consent or is often simply forbidden for privacy and security reasons. In this paper, we propose Drynx, a decentralized system for privacy-conscious statistical analysis on distributed datasets. Drynx relies on a set of computing nodes to enable the computation of statistics, such as standard deviation or extrema, and the training and evaluation of machine-learning models on sensitive and distributed data. To ensure data confidentiality and the privacy of the data providers, Drynx combines interactive protocols, homomorphic encryption, zero-knowledge proofs of correctness, and differential privacy. It enables efficient verification of the input data and of all the system's computations by relying on a public immutable distributed ledger. It provides auditability in a strong adversarial model in which no entity has to be individually trusted. Drynx is highly modular, dynamic, and parallelizable. Our evaluation shows that Drynx enables the training of a logistic regression model on a dataset (8 features and 6000 records) distributed among 60 data providers in less than 1.1 seconds. The computations are distributed among 6 nodes, and Drynx enables the verification of the query execution's correctness in less than 11 seconds.
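To illustrate why a statistic such as standard deviation lends itself to computation over distributed data, the sketch below shows that it can be derived from three per-provider values (count, sum, sum of squares) that only ever need to be added together. This is a plaintext illustration under stated assumptions, not Drynx's protocol: the function names and the toy data are hypothetical, and the plain additions stand in for whatever encrypted aggregation a real system would perform.

```python
import math
import statistics

def local_aggregates(values):
    """Run by each data provider on its own records only.
    Returns (count, sum, sum of squares)."""
    return (len(values), sum(values), sum(v * v for v in values))

def combine(aggregates):
    """Component-wise addition of the providers' tuples.
    Addition is the only cross-provider operation required; in this
    sketch it is done in the clear (an assumption for readability)."""
    n = sum(a[0] for a in aggregates)
    s = sum(a[1] for a in aggregates)
    ss = sum(a[2] for a in aggregates)
    return (n, s, ss)

def std_from_aggregates(n, s, ss):
    """Population standard deviation from the combined totals:
    sqrt(E[x^2] - E[x]^2)."""
    mean = s / n
    variance = ss / n - mean * mean
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(variance, 0.0))

# Hypothetical toy data: three providers, each holding a few records.
providers = [[1, 2, 3], [4, 5], [6]]

totals = combine([local_aggregates(p) for p in providers])
distributed_std = std_from_aggregates(*totals)

# Reference value computed on the pooled data, which the distributed
# setting is meant to avoid ever materializing.
pooled_std = statistics.pstdev([v for p in providers for v in p])

print(totals, distributed_std, pooled_std)
```

Because the combining step uses addition alone, no party needs to see another provider's individual records to obtain the final statistic.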

research
05/19/2020

Scalable Privacy-Preserving Distributed Learning

In this paper, we address the problem of privacy-preserving distributed ...
research
12/23/2021

Mitigating Leakage from Data Dependent Communications in Decentralized Computing using Differential Privacy

Imagine a group of citizens willing to collectively contribute their per...
research
02/16/2023

Practically Efficient Secure Computation of Rank-based Statistics Over Distributed Datasets

In this paper, we propose a practically efficient model for securely com...
research
06/18/2021

Sharing in a Trustless World: Privacy-Preserving Data Analytics with Potentially Cheating Participants

Lack of trust between organisations and privacy concerns about their dat...
research
10/04/2018

Privacy-Preserving Multiparty Learning For Logistic Regression

In recent years, machine learning techniques are widely used in numerous...
research
01/17/2020

IPPO: A Privacy-Aware Architecture for Decentralized Data-sharing

Online trackers personalize ads campaigns, exponentially increasing thei...
research
11/09/2019

Analyzing Bias in Sensitive Personal Information Used to Train Financial Models

Bias in data can have unintended consequences that propagate to the desi...
