The handling and processing of sensitive data inherently entails the risk of compromising individual privacy and exposing personal information. The differential privacy (DP) framework [dwork2014algorithmic] allows one to reason about privacy loss in data processing and to provide privacy guarantees to individuals, while still permitting conclusions to be drawn from the dataset as a whole. However, certain attributes of an individual which serve as input variables may contribute more to their identifiability in a dataset than others. Moreover, individuals themselves may perceive some of their attributes as more sensitive and thus more worthy of protection from exposure. Transparent and trustworthy data processing systems must therefore be capable not only of accounting for individual privacy loss (e.g. as shown in work on individual Rényi DP (RDP) [feldman2020individual]), but also of selectively applying privacy-preserving mechanisms to specific private attributes [alaggan2015heterogeneous]. So far, few works have investigated the contribution of individual attributes to the query function's sensitivity, and therefore to overall privacy loss. This is, in part, because in existing machine learning frameworks it is impossible to disentangle the interaction between input features and model components such as weights. To address this issue, recent work has introduced symbolic automatic differentiation (AD)-based sensitivity analysis [ziller2021sensitivity].
The main contribution of our work is the introduction of partial sensitivity, which represents the fractional contribution of each input feature to the gradient norm. The gradient norm determines privacy loss in DP, and partial sensitivity can be derived from it using symbolic AD. We demonstrate this technique in the contexts of queries over private databases and of neural network training on synthetic data. Moreover, we showcase the markedly different influence of non-private stochastic gradient descent (SGD) on partial sensitivity compared to DP-SGD.
2 Related Work
Typically, DP guarantees are allotted homogeneously across individuals in the dataset, considering only the worst-case scenario (i.e. privacy accounting is carried out using global sensitivity). However, as the realised gradient norm of a query may be considerably lower, this type of privacy accounting may be unnecessarily pessimistic. Individual privacy accounting, and specifically the work by Feldman et al. [feldman2020individual], proposes an alternative formulation which separately accounts for each individual's influence on the outcome of a computation via the actual gradient norm. The authors express privacy guarantees using Rényi DP [mironov2017renyi]. We note that the use of the gradient norm for individual privacy accounting, which we adopt in this work, is performed in conjunction with a privacy filter or odometer presented in [feldman2020individual]. Heterogeneous DP [alaggan2015heterogeneous] captures the variation of privacy expectations across the information held by one individual and uses a modified Laplace mechanism. Phan et al. [Phan2019; Phan2017] address the distribution of noise between different input variables depending on their importance to the network output, using the Laplace or the Heterogeneous Gaussian Mechanism. Our work is complementary to these techniques: partial sensitivity captures the influence of individual input attributes on the gradient norm and can therefore be employed to guide noise addition, selectively perturbing attributes using one of these heterogeneous noise mechanisms.
We briefly introduce key terminology, and assume familiarity with the concepts of differential privacy, the Gaussian mechanism [dwork2014algorithmic] and (DP-)SGD [abadi2016deep]. We will use the following notation: $D$ and $D'$ for two adjacent datasets, whereby $D \simeq D'$ denotes add/remove-one adjacency. We will use $f$ for a function/query and $\Delta_2(f)$ for its $L_2$-sensitivity.
Definition 1 ($L_2$-sensitivity).
The $L_2$-sensitivity $\Delta_2(f)$ of a function $f$ is defined as the maximum change in $L_2$-norm in the image of $f$ over all pairs of adjacent inputs $D \simeq D'$:
$$\Delta_2(f) = \max_{D \simeq D'} \lVert f(D) - f(D') \rVert_2.$$
The $L_2$-sensitivity is used to calibrate the noise addition in the Gaussian Mechanism. If $f$ is Lipschitz continuous, its Lipschitz constant is equivalent to its sensitivity [raskhodnikova2016lipschitz]:
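Definition 2 (Lipschitz continuity).
Let $f: X \rightarrow Y$ be a function with associated metrics $d_X$ and $d_Y$. $f$ is Lipschitz continuous if and only if there exists a real constant $K \geq 0$ such that, $\forall x, x' \in X$:
$$d_Y\big(f(x), f(x')\big) \leq K \, d_X(x, x').$$
The Lipschitz constant of a real-valued function is equal to the supremum of the norm of its gradient: $K = \sup_{x} \lVert \nabla f(x) \rVert_2$.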
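The relationship between the Lipschitz constant and the gradient norm can be checked numerically on a toy example. The sketch below is only an illustration: the function `f`, the unit-square domain and the grid resolutions are assumptions chosen for this example and do not come from the experiments in this work. It approximates the supremum of the gradient norm on a grid and confirms that no difference quotient between grid points exceeds it.

```python
import math

def f(x1, x2):
    # Hypothetical real-valued function, chosen only for illustration.
    return x1 * x2

def grad_norm(x1, x2):
    # Analytic gradient of f is (x2, x1); return its L2-norm.
    return math.hypot(x2, x1)

def grid(lo, hi, steps):
    # Regular grid of `steps` points on [lo, hi], endpoints included.
    return [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]

def lipschitz_from_gradient(lo, hi, steps=21):
    # Approximate sup ||grad f|| over the box [lo, hi]^2.
    pts = grid(lo, hi, steps)
    return max(grad_norm(a, b) for a in pts for b in pts)

def max_difference_quotient(lo, hi, steps=6):
    # Largest |f(p) - f(q)| / ||p - q|| over all pairs of distinct grid points.
    pts = [(a, b) for a in grid(lo, hi, steps) for b in grid(lo, hi, steps)]
    best = 0.0
    for p in pts:
        for q in pts:
            d = math.dist(p, q)
            if d > 0.0:
                best = max(best, abs(f(*p) - f(*q)) / d)
    return best

K = lipschitz_from_gradient(0.0, 1.0)
Q = max_difference_quotient(0.0, 1.0)
print(f"sup gradient norm ~ {K:.4f}, largest difference quotient ~ {Q:.4f}")
```

On a convex domain the difference quotient can never exceed the supremum of the gradient norm, which is why the latter can serve as a sensitivity bound.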
4 Theoretical Results
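Definition 3 (Partial sensitivity of a function).
Let $f: \mathbb{R}^n \rightarrow \mathbb{R}$ be a real-valued function. The partial sensitivity of $f$ at an input $x \in \mathbb{R}^n$ is defined as the gradient of the gradient norm of $f$ with respect to the input variables:
$$\nabla_x \lVert \nabla_x f(x) \rVert_2.$$
The definition of partial sensitivity is derived from the following property: Let $x = (x_1, \dots, x_n)^\top$ be the vector of inputs. Since $f: \mathbb{R}^n \rightarrow \mathbb{R}$, $f(x)$ is a scalar. As such, $\nabla_x f(x) \in \mathbb{R}^n$. Furthermore, let $g = \nabla_x f(x)$ and let $h$ be the $L_2$-norm of $g$, given by $h = \sqrt{\sum_{j=1}^{n} g_j^2}$, which is also a scalar. As we are interested in the rate of change of the sensitivity with respect to the individual input attributes, $x_i$, we obtain the gradient of $h$:
$$\nabla_x h = \left( \frac{\partial h}{\partial x_1}, \dots, \frac{\partial h}{\partial x_n} \right)^\top.$$
The $i$-th component of $\nabla_x h$, $\frac{\partial h}{\partial x_i}$, where $i \in \{1, \dots, n\}$, is given by:
$$\frac{\partial h}{\partial x_i} = \frac{1}{h} \sum_{j=1}^{n} g_j \frac{\partial g_j}{\partial x_i} = \frac{1}{h} \sum_{j=1}^{n} \frac{\partial f}{\partial x_j} \frac{\partial^2 f}{\partial x_i \partial x_j}.$$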
Partial sensitivity therefore represents the fractional contribution of the individual input attributes to the gradient norm of the function. Its symbolic representation, which can easily be obtained using a symbolic AD system [ziller2021sensitivity], is independent of the actual input data. It can therefore be used to interpret the impact of individual input attributes on the gradient norm, which in turn is used for (individual) privacy accounting, as shown in the next section.
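As a concrete illustration of Definition 3, the following sketch evaluates the partial sensitivity of a small hand-differentiated function and cross-checks it against central finite differences of the gradient norm. The function $f(x_1, x_2) = x_1^2 x_2 + x_2^2$, the evaluation point and the step size are assumptions made for this example only; the gradient and Hessian are written out by hand rather than produced by a symbolic AD system.

```python
import math

def grad(x):
    # Gradient of the hypothetical f(x1, x2) = x1**2 * x2 + x2**2.
    x1, x2 = x
    return [2.0 * x1 * x2, x1 ** 2 + 2.0 * x2]

def hessian(x):
    # Matrix of second derivatives of the same f.
    x1, x2 = x
    return [[2.0 * x2, 2.0 * x1],
            [2.0 * x1, 2.0]]

def grad_norm(x):
    # h = L2-norm of the gradient g.
    return math.sqrt(sum(g * g for g in grad(x)))

def partial_sensitivity(x):
    # dh/dx_i = (1/h) * sum_j g_j * d^2 f / (dx_i dx_j); undefined where h == 0.
    g, H, h = grad(x), hessian(x), grad_norm(x)
    n = len(x)
    return [sum(g[j] * H[j][i] for j in range(n)) / h for i in range(n)]

def partial_sensitivity_fd(x, eps=1e-6):
    # Central finite differences of the gradient norm, used as a cross-check.
    out = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += eps
        down[i] -= eps
        out.append((grad_norm(up) - grad_norm(down)) / (2.0 * eps))
    return out

x = [1.0, 2.0]
ps = partial_sensitivity(x)
fd = partial_sensitivity_fd(x)
print("analytic:", ps)
print("finite differences:", fd)
```

At the chosen point the gradient is $(4, 5)$, so the analytic result equals $(26, 18)/\sqrt{41}$, and the finite-difference estimate should agree with it up to numerical error.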
5.1 Partial sensitivity analysis of multivariate database queries
We assume a scenario where an analyst wants to construct a statistical query over a database containing private data from a population. They specify reasonable ranges for the private attributes (e.g. based on prior knowledge, but not on the actual data values). Let $x_1$ and $x_2$ denote the private attributes, each restricted to its specified range, and let $f$ be a linear aggregation function with sensitivity $\Delta_2(f)$. To obtain the Rényi DP guarantee for one individual as specified in Section 2, the Lipschitz constant of $f$ and the norm of the gradient of $f$ for each individual $i$ are required. Following Definition 2.7 from [feldman2020individual] and Definition 2, the linearity of $f$ means that the same Lipschitz constant can be used for all individuals. The privacy loss of individual $i$ can then be calculated as follows: , where $\alpha$ is the Rényi divergence order and $\sigma^2$ the variance of the Gaussian noise added by the mechanism.
Furthermore, by compiling the symbolic expression for the gradient norm of $f$ and maximising its value using a suitable technique such as simplicial homology global optimisation [endres2018simplicial], the global sensitivity can be calculated given the pre-defined ranges of the input variables $x_1$ and $x_2$. This has to be done data-independently, and is a constant (in this example ). Finally, the partial sensitivity of $f$ can be analysed and plotted to permit visual reasoning about the effects of $x_1$ and $x_2$ on the gradient norm of $f$, and thus on Rényi DP guarantees. Figure 1 shows how the two variables contribute to the sensitivity of $f$. This knowledge allows the analyst to assign a different weight to each attribute and to calibrate noise addition so as to selectively protect specific attributes.
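The workflow described above can be sketched in a few lines. Everything specific in this sketch is an assumption made for illustration: the per-individual function $f(x_1, x_2) = x_1^2 x_2$, the attribute ranges, and the use of a plain grid search in place of simplicial homology global optimisation. The individual privacy loss is computed with the standard Rényi DP bound for the Gaussian mechanism, $\alpha s^2 / (2\sigma^2)$ for an $L_2$-sensitivity $s$, which is assumed here and may differ in form from the expression used in this work.

```python
import math

def grad_norm(x1, x2):
    # Gradient norm of the hypothetical f(x1, x2) = x1**2 * x2,
    # whose gradient is (2 * x1 * x2, x1**2).
    return math.hypot(2.0 * x1 * x2, x1 ** 2)

def global_sensitivity(range1, range2, steps=101):
    # Data-independent maximum of the gradient norm over the declared ranges.
    # A grid search stands in for a global optimiser such as SHGO.
    a = [range1[0] + (range1[1] - range1[0]) * k / (steps - 1) for k in range(steps)]
    b = [range2[0] + (range2[1] - range2[0]) * k / (steps - 1) for k in range(steps)]
    return max(grad_norm(u, v) for u in a for v in b)

def gaussian_rdp(sensitivity, alpha, sigma):
    # Assumed bound: Renyi DP of order alpha for Gaussian noise with std sigma.
    return alpha * sensitivity ** 2 / (2.0 * sigma ** 2)

# Hypothetical analyst-specified ranges for the two attributes.
delta = global_sensitivity((0.0, 1.0), (0.0, 2.0))

alpha = 2.0
sigma = delta  # noise standard deviation calibrated to the global sensitivity

worst_case_loss = gaussian_rdp(delta, alpha, sigma)
individual_loss = gaussian_rdp(grad_norm(0.5, 1.0), alpha, sigma)
print(f"global sensitivity: {delta:.4f}")
print(f"worst-case loss: {worst_case_loss:.4f}, individual loss: {individual_loss:.4f}")
```

An individual whose attribute values lie well inside the declared ranges incurs a smaller loss than the worst case, which is the gap that individual accounting exploits.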
5.2 Partial sensitivity modelling in (DP-)SGD
In this section, we investigate the impact of individual input attributes on the privacy loss of a specific neural network training example. To illustrate this effect, we performed binary classification on synthetic -pixel images of vertical and horizontal bars with a two-layer neural network, trained using SGD and DP-SGD. The symbolic representation of the partial sensitivity of the loss function can again be derived through symbolic AD. By substituting the weights, class label and input pixel values, it is possible to obtain the partial sensitivity of the input pixels at any moment in training. At the end of training, where we consider the weights to be fixed, this can be used to reason about the relevant input features leading the network to assign an image to one of the two classes. For the vertical bar class, the maximum partial sensitivity values were observed across the horizontal of the image, and vice versa for the horizontal bar class (Figure 2A). Evidently, the presence of features characteristic of the opposite class renders the individual an outlier in the specific distribution, resulting in a large change in the function's gradient norm and thus a high individual privacy loss. Interestingly, training with DP-SGD led to a substantial reduction in the partial sensitivities of the corresponding pixels, while increasing the values of unrelated pixels in the input space. To further illustrate this effect, we generated histograms of the partial sensitivities for each pixel over one thousand samples. We found the distributions of the partial sensitivities to be highly concentrated around specific values in the case of SGD, while being substantially more dispersed and centred around zero for DP-SGD (Figure 2B). From this we conjecture that the effect of DP-SGD training is the homogenisation of the partial sensitivity across the space of inputs, which may suppress the memorisation of strongly identifying private attributes.
Source code at: https://github.com/tamaramueller/Deuterium_Partial_Sensitivity
6 Discussion and Conclusions
Our work extends sensitivity analysis to the level of individual input attributes by introducing partial sensitivity. Individuals may wish to weight their specific attributes in a personalised manner because they consider them to have a different privacy impact, or even to allocate fractions of their privacy budget to different analysts based on such feature-level granularity. Analysts, on the other hand, may be interested in attenuating the specific features leading to high privacy expenditure, as demonstrated in our case study on DP-SGD. Our results may help to better understand the privacy-utility trade-offs of this algorithm [dp_impact_accuracy; dp_bad_2; avent2019automatic], its impact on fairness [fioretto2021decision], and the link between memorisation of private features, model overfitting, and generalisation [feldman2020does]. Moreover, partial sensitivity exhibits strong links to gradient-based model introspection techniques [gradcam]. We intend to explore these topics, and the use of partial sensitivity to guide targeted noise addition via heterogeneous mechanisms, in future work.
Appendix A Appendix
Description of computational resources used
All experimentation was conducted on a single workstation computer with 18 physical CPU cores, 512 GB of physical memory and two NVIDIA Quadro RTX 8000 GPUs, running Ubuntu Linux v. 20.04 LTS.
For the partial sensitivity modelling in DP-SGD, we designed a feedforward neural network consisting of an input layer of dimensionality , an intermediate layer of dimensionality with a bias vector of length and an output layer of dimensionality . All layers were followed by logistic sigmoid activation functions. Training was performed using the binary cross-entropy loss function.
Synthetic data generation
Synthetic data generation was conducted as follows:
Two base images with white background were constructed, where one consisted of a vertical bar of black pixels and the other of a horizontal bar. From these, images per class were created by the addition of random Gaussian noise with mean zero and a standard deviation of . Evaluation was performed on a test set of images per class. A fixed random seed was used to deterministically generate the images in the experiments of the main manuscript.
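The generation procedure can be sketched as follows. The image size, number of images per class, noise standard deviation, seed and the pixel encoding (1.0 for white, 0.0 for black) are placeholder assumptions for this example and are not the values used in the experiments; the function names are likewise hypothetical.

```python
import random

def base_images(size):
    # White background (1.0) with a black (0.0) bar through the centre.
    mid = size // 2
    vertical = [[0.0 if c == mid else 1.0 for c in range(size)] for r in range(size)]
    horizontal = [[0.0 if r == mid else 1.0 for c in range(size)] for r in range(size)]
    return vertical, horizontal

def noisy_copies(image, n, std, rng):
    # Add independent zero-mean Gaussian noise to every pixel of each copy.
    return [[[p + rng.gauss(0.0, std) for p in row] for row in image] for _ in range(n)]

def make_dataset(size=3, n_per_class=100, std=0.1, seed=0):
    # A fixed seed makes the generated dataset reproducible.
    rng = random.Random(seed)
    vertical, horizontal = base_images(size)
    data = [(img, 0) for img in noisy_copies(vertical, n_per_class, std, rng)]
    data += [(img, 1) for img in noisy_copies(horizontal, n_per_class, std, rng)]
    return data

dataset = make_dataset()
print(len(dataset), "images; first label:", dataset[0][1])
```

Using a dedicated `random.Random` instance rather than the module-level generator keeps the seed local to the dataset, so other random draws in a program do not change the generated images.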
We trained all networks to convergence using SGD with a batch size of (batch gradient descent) and a learning rate . We repeated this procedure using DP-SGD with the same learning rate, a noise multiplier of and an $L_2$-bound of .
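For readers unfamiliar with the role of the noise multiplier and the $L_2$-bound, the sketch below shows a single DP-SGD update in the style of [abadi2016deep]: per-sample gradients are clipped to the bound, summed, perturbed with Gaussian noise whose standard deviation is the noise multiplier times the bound, and averaged. It is a minimal illustration operating on plain lists; the function names and all numerical values are assumptions for this example and do not reflect the training code or hyperparameters of this work.

```python
import math
import random

def clip(gradient, bound):
    # Rescale the gradient so that its L2-norm does not exceed `bound`.
    norm = math.sqrt(sum(g * g for g in gradient))
    scale = min(1.0, bound / norm) if norm > 0.0 else 1.0
    return [g * scale for g in gradient]

def dp_sgd_step(weights, per_sample_grads, lr, bound, noise_multiplier, rng):
    # Clip each per-sample gradient, sum, add Gaussian noise, average, and step.
    n = len(per_sample_grads)
    clipped = [clip(g, bound) for g in per_sample_grads]
    summed = [sum(column) for column in zip(*clipped)]
    noisy = [s + rng.gauss(0.0, noise_multiplier * bound) for s in summed]
    return [w - lr * g / n for w, g in zip(weights, noisy)]

rng = random.Random(0)
weights = [0.0, 0.0]
grads = [[3.0, 4.0], [0.3, 0.4]]  # hypothetical per-sample gradients

# With a noise multiplier of zero the step reduces to SGD on clipped gradients.
no_noise = dp_sgd_step(weights, grads, lr=0.1, bound=1.0, noise_multiplier=0.0, rng=rng)
private = dp_sgd_step(weights, grads, lr=0.1, bound=1.0, noise_multiplier=1.0, rng=rng)
print("clipped SGD step:", no_noise)
print("DP-SGD step:", private)
```

Clipping caps the influence of any single sample on the update at the bound, which is what allows the added noise to be calibrated independently of the data.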
Details on the attached code
Symbolic automatic differentiation was performed using the Deuterium framework [ziller2021sensitivity], whose source code is available alongside the experiment code. Compilation performed by Deuterium relies on a suitable compiler for the C or Fortran programming language or for the LLVM tool-chain.