DataLens: Scalable Privacy Preserving Training via Gradient Compression and Aggregation

by   Boxin Wang, et al.

Recent success of deep neural networks (DNNs) hinges on the availability of large-scale dataset; however, training on such dataset often poses privacy risks for sensitive training information. In this paper, we aim to explore the power of generative models and gradient sparsity, and propose a scalable privacy-preserving generative model DATALENS. Comparing with the standard PATE privacy-preserving framework which allows teachers to vote on one-dimensional predictions, voting on the high dimensional gradient vectors is challenging in terms of privacy preservation. As dimension reduction techniques are required, we need to navigate a delicate tradeoff space between (1) the improvement of privacy preservation and (2) the slowdown of SGD convergence. To tackle this, we take advantage of communication efficient learning and propose a novel noise compression and aggregation approach TOPAGG by combining top-k compression for dimension reduction with a corresponding noise injection mechanism. We theoretically prove that the DATALENS framework guarantees differential privacy for its generated data, and provide analysis on its convergence. To demonstrate the practical usage of DATALENS, we conduct extensive experiments on diverse datasets including MNIST, Fashion-MNIST, and high dimensional CelebA, and we show that, DATALENS significantly outperforms other baseline DP generative models. In addition, we adapt the proposed TOPAGG approach, which is one of the key building blocks in DATALENS, to DP SGD training, and show that it is able to achieve higher utility than the state-of-the-art DP SGD approach in most cases.


A New Dimensionality Reduction Method Based on Hensel's Compression for Privacy Protection in Federated Learning

Differential privacy (DP) is considered a de-facto standard for protecti...

DP-FP: Differentially Private Forward Propagation for Large Models

When applied to large-scale learning problems, the conventional wisdom o...

AutoGAN-based Dimension Reduction for Privacy Preservation

Exploiting data and concurrently protecting sensitive information to who...

Towards Understanding the Impact of Model Size on Differential Private Classification

Differential privacy (DP) is an essential technique for privacy-preservi...

P3GM: Private High-Dimensional Data Release via Privacy Preserving Phased Generative Model

How can we release a massive volume of sensitive data while mitigating p...

When Homomorphic Cryptosystem Meets Differential Privacy: Training Machine Learning Classifier with Privacy Protection

Machine learning (ML) classifiers are invaluable building blocks that ha...

P3SGD: Patient Privacy Preserving SGD for Regularizing Deep CNNs in Pathological Image Classification

Recently, deep convolutional neural networks (CNNs) have achieved great ...