Privacy-Preserving Hierarchical Clustering: Formal Security and Efficient Approximation

04/09/2019
by Xianrui Meng, et al.

Machine Learning (ML) is widely used for predictive tasks in a number of critical applications. Collaborative or federated learning has recently emerged as a paradigm that enables multiple parties to jointly learn ML models on their combined datasets. Yet in most application domains, such as healthcare and security analytics, privacy risks limit entities to individually learning local models over the sensitive datasets they own. In this work, we present the first formal study of privacy-preserving collaborative hierarchical clustering, featuring scalable cryptographic protocols that allow two parties to privately compute joint clusters on their combined sensitive datasets. First, we provide a formal definition that balances accuracy and privacy, and we present a provably secure protocol along with an optimized version for single linkage clustering. Second, we explore the integration of our protocol with existing approximation algorithms for hierarchical clustering, resulting in a protocol that can efficiently scale to very large datasets. Finally, we provide a prototype implementation and experimentally evaluate the feasibility and efficiency of our approach on synthetic and real datasets, with encouraging results. For example, for a dataset of one million records and 10 dimensions, our optimized privacy-preserving approximation protocol requires 35 seconds for end-to-end execution, just 896KB of communication, and achieves 97.09% accuracy.
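For readers unfamiliar with single linkage clustering, the following is a minimal plaintext sketch of the underlying (non-private) algorithm. It is not the authors' protocol and provides no privacy: it only illustrates the clustering rule that the secure protocol would compute jointly. The function name `single_linkage`, the toy records, and their split between two parties are assumptions made for illustration.

```python
from itertools import combinations
import math


def single_linkage(points, num_clusters):
    """Naive agglomerative clustering with single linkage (plaintext, no privacy)."""
    # Start with every point in its own cluster, stored as a list of indices.
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Single linkage: distance between the closest pair of points
        # taken from the two clusters.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > num_clusters:
        # Find the two clusters with the smallest linkage distance and merge them.
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # safe because x < y
    return [sorted(c) for c in clusters]


# Hypothetical toy data: in the collaborative setting, each party would hold
# one of these lists and would not reveal it to the other in the clear.
party_a = [(0.0, 0.0), (0.1, 0.2)]
party_b = [(5.0, 5.0), (5.2, 4.9), (0.2, 0.1)]
result = single_linkage(party_a + party_b, num_clusters=2)
print(result)
```

This naive version recomputes all pairwise linkages on every merge, so it is only suitable for small inputs; it is meant to show the merge rule, not to reflect the efficiency figures quoted above.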


research
04/05/2020

PrivFL: Practical Privacy-preserving Federated Regressions on High-dimensional Data over Mobile Networks

Federated Learning (FL) enables a large number of users to jointly learn...
research
11/14/2019

Enabling Efficient Privacy-Assured Outlier Detection over Encrypted Incremental Datasets

Outlier detection is widely used in practice to track the anomaly on inc...
research
11/29/2019

Incremental Clustering Techniques for Multi-Party Privacy-Preserving Record Linkage

Privacy-Preserving Record Linkage (PPRL) supports the integration of sen...
research
06/30/2022

Privacy-preserving Graph Analytics: Secure Generation and Federated Learning

Directly motivated by security-related applications from the Homeland Se...
research
12/21/2021

Distributed Machine Learning and the Semblance of Trust

The utilisation of large and diverse datasets for machine learning (ML) ...
research
08/11/2018

Privacy Preserving Multi-Server k-means Computation over Horizontally Partitioned Data

The k-means clustering is one of the most popular clustering algorithms ...
research
07/05/2023

Privacy-Preserving Federated Heavy Hitter Analytics for Non-IID Data

Federated heavy-hitter analytics involves the identification of the most...
