SEDML: Securely and Efficiently Harnessing Distributed Knowledge in Machine Learning

10/26/2021
by Yansong Gao, et al.

Training high-performing deep learning models requires a wealth of data, which in practice is usually distributed among multiple data sources. Simply centralizing such multi-sourced data for training would raise critical security and privacy concerns, and may be prohibited under increasingly strict data regulations. To resolve the tension between privacy and data utilization in distributed learning, a machine learning framework called private aggregation of teacher ensembles (PATE) has recently been proposed. PATE harnesses the knowledge (label predictions for an unlabeled dataset) of distributed teacher models to train a student model, obviating direct access to the distributed datasets. Despite being enticing, PATE does not protect the individual label predictions of the teacher models, which still entails privacy risks. In this paper, we propose SEDML, a new protocol for securely and efficiently harnessing distributed knowledge in machine learning. SEDML builds on lightweight cryptography and provides strong protection for individual label predictions, as well as differential privacy guarantees on the aggregation results. Extensive evaluations show that, while providing privacy protection, SEDML preserves the accuracy of the plaintext baseline, and improves computation and communication efficiency by 43x and 1.23x, respectively, over the state of the art.
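
The abstract describes the protocol only at a high level. As a rough illustration of the two ingredients it names (hiding individual label predictions behind lightweight cryptography, and differentially private aggregation of teacher votes), below is a minimal Python sketch assuming pairwise additive masking over a public prime plus PATE-style Laplace noisy-max. All names here (pairwise_masks, masked_vote, aggregate, dp_scale) are hypothetical and do not reflect SEDML's actual construction.

```python
# Hypothetical sketch, not SEDML's actual protocol: each teacher hides its
# one-hot label vote behind pairwise additive masks, so the aggregator only
# ever sees masked shares; the masks cancel in the sum, and Laplace noise is
# added to the revealed histogram for differential privacy, as in PATE.
import numpy as np

NUM_CLASSES = 10
PRIME = 2**31 - 1  # arithmetic is done modulo a public prime

def pairwise_masks(num_teachers, num_classes, seed=0):
    """Pairwise cancelling masks: teacher i adds r_ij, teacher j subtracts
    r_ij, so the masks sum to zero mod PRIME across all teachers."""
    rng = np.random.default_rng(seed)
    masks = [np.zeros(num_classes, dtype=np.int64) for _ in range(num_teachers)]
    for i in range(num_teachers):
        for j in range(i + 1, num_teachers):
            r = rng.integers(0, PRIME, size=num_classes)
            masks[i] = (masks[i] + r) % PRIME
            masks[j] = (masks[j] - r) % PRIME
    return masks

def masked_vote(label, mask):
    """One-hot vote for `label`, hidden behind this teacher's mask."""
    vote = np.zeros(NUM_CLASSES, dtype=np.int64)
    vote[label] = 1
    return (vote + mask) % PRIME

def aggregate(masked_votes, dp_scale=2.0, rng=None):
    """Sum masked votes (masks cancel), add Laplace noise, return argmax."""
    rng = rng or np.random.default_rng()
    hist = np.sum(masked_votes, axis=0) % PRIME  # masks cancel here
    noisy = hist.astype(float) + rng.laplace(scale=dp_scale, size=NUM_CLASSES)
    return int(np.argmax(noisy))

# Example: 5 teachers vote on one unlabeled sample.
teacher_labels = [3, 3, 3, 7, 3]
masks = pairwise_masks(len(teacher_labels), NUM_CLASSES)
shares = [masked_vote(l, m) for l, m in zip(teacher_labels, masks)]
print(aggregate(shares))  # most likely 3
```

Note that this sketch assumes a trusted setup distributing the pairwise masks; a deployable secure-aggregation protocol would additionally need key agreement between teachers and handling of dropouts, which the sketch omits.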

Related research

Not Just Cloud Privacy: Protecting Client Privacy in Teacher-Student Learning (10/17/2019)
Ensuring the privacy of sensitive data used to train modern machine lear...

Scalable Private Learning with PATE (02/24/2018)
The rapid adoption of machine learning has increased concerns about the ...

Scalable Differentially Private Generative Student Model via PATE (06/21/2019)
Recent rapid development of machine learning is largely due to algorithm...

Hiding in the Crowd: A Massively Distributed Algorithm for Private Averaging with Malicious Adversaries (03/27/2018)
The amount of personal data collected in our everyday interactions with ...

Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data (10/18/2016)
Some machine learning applications involve training data that is sensiti...

Locally Differentially Private Distributed Deep Learning via Knowledge Distillation (02/07/2022)
Deep learning often requires a large amount of data. In real-world appli...

ESCAPED: Efficient Secure and Private Dot Product Framework for Kernel-based Machine Learning Algorithms with Applications in Healthcare (12/04/2020)
To train sophisticated machine learning models one usually needs many tr...
