Batch Clipping and Adaptive Layerwise Clipping for Differential Private Stochastic Gradient Descent

07/21/2023
by Toan N. Nguyen, et al.

Each round in Differentially Private Stochastic Gradient Descent (DPSGD) transmits a sum of clipped gradients, obfuscated with Gaussian noise, to a central server, which uses it to update a global model, typically a deep neural network. Because gradients are clipped per example, a scheme we call Individual Clipping (IC), deep neural networks such as ResNet-18 cannot use Batch Normalization Layers (BNL), which are a crucial component for achieving high accuracy. To make BNL usable, we introduce Batch Clipping (BC): instead of clipping individual gradients as in the original DPSGD, we average and clip batches of gradients. Moreover, the model entries of different layers have different sensitivities to the added Gaussian noise. Therefore, Adaptive Layerwise Clipping (ALC) methods, where each layer has its own adaptively fine-tuned clipping constant, have been introduced and studied, but so far without rigorous DP proofs. In this paper, we propose a new ALC and provide rigorous DP proofs for both BC and ALC. Experiments on CIFAR-10 with ResNet-18 show that our modified DPSGD with BC and ALC converges, while DPSGD with IC and ALC does not.
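To make the mechanics concrete, here is a minimal NumPy sketch contrasting Individual Clipping with Batch Clipping, plus one possible layerwise-adaptive variant. The function names, the averaging of the noisy sum, and the moving-average update of the per-layer constants are illustrative assumptions, not the paper's exact algorithm; in particular, a DP-sound version of ALC must adapt the clipping constants without consuming the raw gradient norms directly.

    import numpy as np

    def clip_by_norm(g, c):
        # Scale g so that its L2 norm is at most c.
        norm = np.linalg.norm(g)
        return g * min(1.0, c / (norm + 1e-12))

    def individual_clipping(per_example_grads, c, sigma, rng):
        # IC (as in the original DPSGD): clip every per-example gradient,
        # sum, and add Gaussian noise calibrated to sensitivity c.
        clipped = [clip_by_norm(g, c) for g in per_example_grads]
        total = np.sum(clipped, axis=0)
        noise = rng.normal(0.0, sigma * c, size=total.shape)
        return (total + noise) / len(per_example_grads)

    def batch_clipping(per_example_grads, c, sigma, rng):
        # BC: average the batch first, clip the single averaged gradient,
        # then add noise. Per-example gradients are never handled in
        # isolation, which is what allows batch-dependent layers such as BNL.
        avg = np.mean(per_example_grads, axis=0)
        noise = rng.normal(0.0, sigma * c, size=avg.shape)
        return clip_by_norm(avg, c) + noise

    def adaptive_layerwise_clipping(layer_grads, c_layers, sigma, rng, beta=0.9):
        # ALC (illustrative only): each layer l gets its own clipping
        # constant c_layers[l], here adapted by an exponential moving
        # average of observed layer norms. This adaptation rule is a
        # hypothetical placeholder; a DP-sound rule must not feed on the
        # unprivatized norms as done here.
        noisy = []
        for l, g in enumerate(layer_grads):
            clipped = clip_by_norm(g, c_layers[l])
            noise = rng.normal(0.0, sigma * c_layers[l], size=g.shape)
            noisy.append(clipped + noise)
            c_layers[l] = beta * c_layers[l] + (1.0 - beta) * np.linalg.norm(g)
        return noisy, c_layers

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        # Toy per-example gradients for a single flattened parameter vector.
        grads = [rng.normal(size=10) for _ in range(32)]
        print(individual_clipping(grads, c=1.0, sigma=1.0, rng=rng))
        print(batch_clipping(grads, c=1.0, sigma=1.0, rng=rng))

The design contrast is that IC bounds the contribution of each example before aggregation, whereas BC treats the averaged batch gradient as the unit whose norm is bounded, so the per-example computation never has to be separated out.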


