Privacy-Preserving Classification with Secret Vector Machines

07/08/2019
by Valentin Hartmann, et al.

Today, large amounts of valuable data are distributed among millions of user-held devices, such as personal computers, phones, or Internet-of-Things devices. Many companies collect such data with the goal of using it to train machine learning models that allow them to improve their services. However, user-held data is often sensitive, and collecting it is problematic in terms of privacy. We address this issue by proposing a novel way of training a supervised classifier in a distributed setting akin to the recently proposed federated learning paradigm (McMahan et al. 2017), but under the stricter privacy requirement that the server that trains the model is assumed to be untrusted and potentially malicious; we thus preserve user privacy by design, rather than by trust. In particular, our framework, called secret vector machine (SecVM), provides an algorithm for training linear support vector machines (SVMs) in a setting in which data-holding clients communicate with an untrusted server by exchanging messages designed not to reveal any personally identifiable information. We evaluate our model in two ways. First, in an offline evaluation, we train SecVM to predict user gender from tweets, showing that we can preserve user privacy without sacrificing classification performance. Second, we implement SecVM's distributed framework for the Cliqz web browser and deploy it for predicting user gender in a large-scale online evaluation with thousands of clients, outperforming baselines by a large margin and thus showcasing that SecVM is practicable in production environments. Overall, this work demonstrates the feasibility of machine learning on data from thousands of users without collecting any personal data. We believe this is an innovative approach that will help reconcile machine learning with data privacy.
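
The abstract does not spell out the message design, but the general setup it describes, data-holding clients computing local updates for a linear SVM and an untrusted server merely aggregating them, can be conveyed with a minimal sketch. The function and variable names below are illustrative assumptions, not the SecVM protocol itself; the actual SecVM messages are constructed in the full paper so as not to reveal any personally identifiable information.

import numpy as np


def client_update(w, X, y, C=1.0):
    # X: (n_samples, n_features) local feature matrix; it never leaves the client.
    # y: (n_samples,) labels in {-1, +1}.
    # Returns the subgradient of C * sum_i max(0, 1 - y_i * <w, x_i>) at w.
    margins = y * (X @ w)
    active = margins < 1.0                              # margin-violating samples
    return -C * (y[active, None] * X[active]).sum(axis=0)


def server_round(w, client_messages, lr=0.01, lam=0.01):
    # The server only sees the clients' update messages, never their raw
    # feature vectors or labels, and takes one L2-regularized gradient step.
    total = np.sum(client_messages, axis=0)
    return w - lr * (lam * w + total)


# Toy simulation with three clients holding private local datasets.
rng = np.random.default_rng(0)
clients = [(rng.normal(size=(50, 10)), rng.choice([-1, 1], size=50))
           for _ in range(3)]
w = np.zeros(10)
for _ in range(100):
    messages = [client_update(w, X, y) for X, y in clients]
    w = server_round(w, messages)

This sketch only conveys the division of labor between clients and server; since raw per-client gradients can still leak information, the paper additionally designs the exchanged messages so that the untrusted server cannot recover individual users' data.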

Related research

04/24/2020 · A Review of Privacy Preserving Federated Learning for Private IoT Analytics
06/27/2019 · Privacy-Preserving Distributed Learning with Secret Gradient Descent
01/29/2019 · Federated Collaborative Filtering for Privacy-Preserving Personalized Recommendation System
04/25/2020 · Privacy Preserving Distributed Machine Learning with Federated Learning
11/25/2020 · Advancements of federated learning towards privacy preservation: from federated learning to split learning
06/05/2023 · A Privacy-Preserving Federated Learning Approach for Kernel methods
