Bounding Membership Inference

by   Anvith Thudi, et al.

Differential Privacy (DP) is the de facto standard for reasoning about the privacy guarantees of a training algorithm. Despite the empirical observation that DP reduces the vulnerability of models to existing membership inference (MI) attacks, a theoretical underpinning as to why this is the case is largely missing in the literature. In practice, this means that models need to be trained with DP guarantees that greatly decrease their accuracy. In this paper, we provide a tighter bound on the accuracy of any MI adversary when a training algorithm provides ϵ-DP. Our bound informs the design of a novel privacy amplification scheme, where an effective training set is sub-sampled from a larger set prior to the beginning of training, to greatly reduce the bound on MI accuracy. As a result, our scheme enables ϵ-DP users to employ looser DP guarantees when training their model to limit the success of any MI adversary; this ensures that the model's accuracy is less impacted by the privacy guarantee. Finally, we discuss implications of our MI bound on the field of machine unlearning.



1 Introduction

Differential Privacy (DP) (Dwork et al., 2006) is employed extensively to reason about privacy guarantees in a variety of settings (Dwork, 2008). Recently, DP has also been used to give privacy guarantees for the training data of deep neural networks (DNNs) learned through stochastic gradient descent (SGD) (Abadi et al., 2016). However, even though DP provides privacy guarantees and bounds the worst-case privacy leakage, it is not immediately clear how these guarantees bound the accuracy of known privacy infringement attacks.

At the time of writing, the most practical attack on the privacy of DNNs is Membership Inference (MI) (Shokri et al., 2017), where an attacker predicts whether or not a model used a particular data point for training; note that this is quite similar to the hypothetical adversary at the core of the game instantiated in the definition of DP. MI attacks saw a strong interest by the community, and several improvements and renditions were proposed since its inception (Sablayrolles et al., 2019; Choquette-Choo et al., 2021; Maini et al., 2021; Hu et al., 2021). Having privacy guarantees would desirably defend against MI, and in fact current literature highlights that DP does indeed give an upper bound on the performance of MI adversaries (Yeom et al., 2018; Erlingsson et al., 2019; Sablayrolles et al., 2019; Jayaraman et al., 2020).

In this paper, we first propose a tighter bound on MI accuracy for training algorithms that provide ϵ-DP. The key insight is that an MI adversary only ever has a finite set of points they suspect were used during training. This allows us to use a counting lemma to simplify terms in the equation for MI positive accuracy. Furthermore, in obtaining our bound, we also show how this bound can benefit from a form of privacy amplification where the training dataset itself is sub-sampled from a larger dataset. Amplification is a technique pervasively found in work improving the analysis of DP learners like DP-SGD (Abadi et al., 2016), and we observe that the effect of our amplification on lowering MI accuracy is significantly stronger than the effect of batch sampling, a common privacy amplification scheme for training DNNs.

Our bound also has consequences for the problem of unlearning (or data forgetting in ML) introduced by Cao and Yang (2015). In particular, the MI accuracy on the point to be unlearned is a popular measure of how well a model has unlearned it (Baumhauer et al., 2020; Graves et al., 2020; Golatkar et al., 2020b, a). However, empirical verification of the MI accuracy can be open-ended, as it depends on the attack employed. Theoretical bounds on all MI attacks, such as the one proposed in this work, circumvent this issue; a bound on the accuracy of MI attacks, in particular on the probability that a data point was used in the training dataset, indicates a limitation for any entity to discern whether the model had trained on the data point. When this probability is sufficiently low (where "sufficiently" is defined a priori), one can then claim to have unlearned by achieving a model sufficiently likely to have not come from training with the data point. Our analysis shows that, if dataset sub-sampling is used, one can unlearn under this definition by training with ϵ-DP for a relatively large ϵ (and thus incur less cost to performance).

To summarize, our contributions are:

  • We present a tighter general bound on MI accuracy for ϵ-DP;

  • We further demonstrate how to lower this bound using a novel privacy amplification scheme built on dataset subsampling;

  • We discuss the benefits of such bounds to machine unlearning as a rigorous way to use MI as a metric for unlearning.

2 Background

This section provides background on DP, MI attacks, and previous bounds on MI.

2.1 Differential Privacy

Differential privacy (DP) (Dwork et al., 2006) bounds how different the outputs of a function on adjacent inputs can be in order to provide privacy guarantees for the inputs. More formally, a function $F$ is ϵ-DP if for all adjacent inputs $x$ and $x'$ (i.e. inputs with Hamming distance of 1) we have for all sets $S$ in the output space:

$$\mathbb{P}(F(x) \in S) \leq e^{\epsilon}\, \mathbb{P}(F(x') \in S).$$

There is also the more relaxed notion of (ϵ, δ)-DP, which is defined for the same setup as above but introduces a parameter $\delta$ such that $\mathbb{P}(F(x) \in S) \leq e^{\epsilon}\, \mathbb{P}(F(x') \in S) + \delta$. Notably, (ϵ, δ)-DP is used for functions where it is more natural to work with $\ell_2$ metrics on the input space, which has to do with how DP guarantees are obtained.

To achieve DP guarantees, one usually introduces noise to the output of the function $F$. The amount of noise is calibrated to the maximal $\ell_2$ or $\ell_1$ difference between all possible outputs of the function on adjacent datasets (also called sensitivity). Significant progress was achieved on minimizing the amount of noise needed for a given sensitivity (Balle and Wang, 2018), and on how DP guarantees scale when composing multiple DP functions (Dwork et al., 2010; Kairouz et al., 2015).
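As a concrete illustration of calibrating noise to sensitivity, the sketch below uses the classical Laplace mechanism on a counting query. This is a textbook mechanism chosen for illustration, not the training procedure studied in this paper; the function names and the query values (10 and 11) are assumptions made here. The check confirms numerically that the output density ratio on two adjacent inputs never exceeds $e^{\epsilon}$.

```python
import math
import random


def laplace_mechanism(value, sensitivity, epsilon, rng=random):
    """Release `value` plus Laplace noise with scale sensitivity / epsilon."""
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return value + rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def laplace_density(y, mean, scale):
    """Density of Laplace(mean, scale) evaluated at y."""
    return math.exp(-abs(y - mean) / scale) / (2 * scale)


def max_density_ratio(output_a, output_b, sensitivity, epsilon, grid):
    """Largest ratio of output densities over `grid` for two adjacent inputs."""
    scale = sensitivity / epsilon
    return max(
        laplace_density(y, output_a, scale) / laplace_density(y, output_b, scale)
        for y in grid
    )


# Hypothetical counting query: adjacent datasets give counts 10 and 11,
# so the l1 sensitivity is 1.
epsilon = 0.5
grid = [i / 10 for i in range(-200, 400)]
ratio = max_density_ratio(10, 11, 1.0, epsilon, grid)
print(f"max density ratio = {ratio:.6f}, e^epsilon = {math.exp(epsilon):.6f}")
```

The ratio reaches $e^{\epsilon}$ exactly for outputs at or below the smaller count, which is why the noise scale cannot be reduced further for this mechanism without weakening the guarantee.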

Abadi et al. (2016); Song et al. (2013); Bassily et al. (2014) demonstrated a method to make the final model returned by mini-batch SGD (ϵ, δ)-DP with respect to its training dataset by bounding the sensitivity of gradient updates during mini-batch SGD and introducing Gaussian noise to each update. This approach became the de-facto standard for DP guarantees in DNNs. However, the adoption is still greatly limited because of an observed trade-off between privacy guarantees and model utility. At the time of writing there is still no feasible way to learn with low ϵ and high accuracy; what is more, past work (Jagielski et al., 2020) observed a gap between observed and claimed privacy, which suggests that DP-analysis, so far, may be too loose.

However, more recently Nasr et al. (2021) showed (using statistical tests and stronger MI adversaries) that the current state-of-the-art methods of achieving (ϵ, δ)-DP guarantees for deep learning are tight, in contrast to the gap Jagielski et al. (2020) observed. This suggests that there is not much more improvement to be gained by studying how to improve the (ϵ, δ)-DP guarantee from a given amount of noise or by improving composition rules. Facing this, future improvements in DP training would lie in understanding the guarantees that DP provides against the performance of relevant privacy attacks. This would allow us to be more informed about the guarantees required during training to defeat practical attacks, and would enable the use of looser guarantees if one is only interested in defending against a specific set of attacks.¹

¹ It is worth noting that Nasr et al. (2021) showed that current analytic upper bounds on DP guarantees are tight, measuring them empirically with various strong privacy adversaries. Although the results suggest that the bounds match, the paper did not investigate how DP guarantees limit the performance of the adversary.

2.2 Membership Inference

Shokri et al. (2017) introduced an MI attack against DNNs, which leveraged shadow models (models with the same architecture as the target model) trained on similar data in order to train a classifier which, given the outputs of a model on a data point, predicts if the model was trained on that data point or not. Since the introduction of this initial attack, the community has proposed several improvements and variations of the original MI attack (Yeom et al., 2018; Salem et al., 2018; Sablayrolles et al., 2019; Truex et al., 2019; Jayaraman et al., 2020; Maini et al., 2021; Choquette-Choo et al., 2021).

Nasr et al. (2021) demonstrated that the current analysis of DP cannot be significantly improved, and that further privacy improvements will come from understanding how DP bounds privacy attacks. MI attacks are currently the main practical threat to the privacy of the data used to train DNNs. Hence, tighter bounds on MI attacks when using DP would help entities reason about what DP parameters to train with; this is particularly relevant when stronger DP guarantees are known to also reduce performance and fairness (Bagdasaryan et al., 2019).

In our paper, we work with the following (abstract) notion of MI in giving our bounds. In particular, we define our MI adversary as a function $f$ which takes a set of models $S$ and a data point $x^*$ and deterministically outputs either $1$ or $0$, corresponding to whether $x^*$ was in the dataset $D$ used to train the models in $S$ or not. Note the generality of this adversary: we do not consider how the adversary obtains the function, i.e. the adversary can have arbitrary strength. Any such adversary will then satisfy the upper and lower bounds we derive in the paper.

With this definition of our adversary, we have the following definition of positive and negative MI accuracy (which are what we focus on in this paper):

Definition 1 (MI accuracy).

The positive accuracy of $f$, i.e. the accuracy if $f$ outputs 1, which we denote $A(f(x^*, S) = 1)$, is $\mathbb{P}(x^* \in D \mid S)$, and the negative accuracy, i.e. the accuracy if $f$ outputs 0, which we denote $A(f(x^*, S) = 0)$, is $\mathbb{P}(x^* \notin D \mid S)$,

where $D$ is the training dataset and $\mathbb{P}(x^* \in D \mid S)$ is the probability that the data point $x^*$ was in the training dataset used to obtain the models in the set $S$ (i.e. this is a probability over datasets). We explain more about where the randomness is introduced (in particular the probability involved in obtaining a training dataset) in Section 3.1.

2.3 Previous Bounds

Before giving bounds on MI accuracy, we have to formally define the attack setting. Two of the main bounds (Yeom et al., 2018; Erlingsson et al., 2019) focused on an experimental setup first introduced by Yeom et al. (2018). In this setting, an adversary $f$ is given a datapoint $x^*$ that is equally likely to have been used to train a model or not. The adversary then predicts $1$ if they think it was used, and $0$ otherwise. Let $b = 1$ or $b = 0$ indicate whether the datapoint was or was not used for training, respectively; we say the adversary was correct if their prediction matches $b$. We then define the adversary’s advantage as the improvement in accuracy over the baseline of random guessing, or more specifically $2(A(f) - 0.5)$, where $A(f)$ is the accuracy of $f$.

For such an adversary operating in a setting where data is equally likely to be included or not in the training dataset, Yeom et al. (2018) showed that they could bound the advantage of the adversary by $e^{\epsilon} - 1$ when training with ϵ-DP. In other words, they showed that they could bound the accuracy of the MI adversary by $e^{\epsilon}/2$. Their proof used the fact that the true positive rate (TPR) and false positive rate (FPR) of their adversary could be represented as expectations over the different data points in a dataset, and from that introduced the DP condition to obtain their MI bound, noting that MI advantage is equivalent to TPR - FPR.

Erlingsson et al. (2019) improved on the bound developed by Yeom et al. (2018) for an adversary operating under the same condition by utilizing a proposition given by Hall et al. (2013) on the relation between TPR and FPR for an (ϵ, δ)-DP function. Based on these insights, Erlingsson et al. (2019) bounded the membership advantage by $1 - e^{-\epsilon} + \delta e^{-\epsilon}$, which is equivalent to bounding the accuracy of the adversary by $1 - e^{-\epsilon}/2$ when $\delta = 0$ (i.e. in ϵ-DP). This is, to the best of our knowledge, the previous state-of-the-art bound for high ϵ.

Other works, similar to our setup in Section 3.1, considered a more general setting where the probability of sampling a datapoint into the dataset can vary. For ϵ-DP, Sablayrolles et al. (2019) bounded the probability of a datapoint $x^*$ being used in the training set of a model (i.e. $\mathbb{P}(x^* \in D \mid S)$, the accuracy of an attacker who predicted the datapoint was in the dataset of the model) by $\mathbb{P}_{x^*}(1) + \frac{\epsilon}{4}$, where $\mathbb{P}_{x^*}(1)$ is the probability of the datapoint being in the dataset. Do note, however, that this bound was derived under assumptions on the conditional weight distribution $\mathbb{P}(\theta \mid x_1, \dots, x_n) \propto e^{-\frac{1}{T}\sum_{i=1}^{n} \ell(\theta, x_i)}$, where $x_1, \dots, x_n$ are the data points used to train, $T$ is a temperature parameter, and $\ell$ is a loss function. Assumptions put aside, this is, to the best of our knowledge, the previous state-of-the-art bound for low ϵ (when reduced to the case of Yeom et al. (2018) by setting $\mathbb{P}_{x^*}(1) = 0.5$, so as to compare with Erlingsson et al. (2019)).

Finally, Jayaraman et al. (2020) bounded the positive predictive value of an attacker (referred to as ‘positive accuracy’ in our work) on a model trained with -DP when the FPR is fixed. Similarly, Jayaraman et al. (2020) further bounded membership advantage under the experiment described by Yeom et al. (2018) for a fixed FPR. Note, that both our work and the previously mentioned bounds are independent of FPR. In particular Erlingsson et al. (2019) followed a similar technique to Jayaraman et al. (2020), but were able to drop the FPR term using a proposition relating TPR to FPR (Hall et al., 2013).

3 The Bound

In the following section, we first formalize our adversary setting, and then present our MI bounds.

3.1 The Setting

Our setting is more general than the one introduced by Yeom et al. (2018) and, as we note later, a specific instance of it can be reduced to their setting. In particular, we formalize how an entity samples data into the training dataset, and proceed with our analysis from there.

We base our formalization on the intuition that one can imagine the existence of some finite data superset containing all the data points that an entity could have in their training dataset. Yet, any one of these datapoints only has some probability of being sampled into the training dataset. For example, this larger dataset could consist of all the users that gave an entity access to their data, and the probability comes from the entity randomly sampling the data to use in their training dataset. This randomness can be a black-box such that not even the entity knows what data was used to train. In essence, this is the setting Jayaraman et al. (2020) considers, though in their case, the larger dataset is implicit and takes the form of an arbitrary distribution. We can then imagine that the adversary (or perhaps an arbitrator in an unlearning setup) knows the larger dataset and tries to infer whether a particular data point was used in the training dataset. The particular general MI attack we analyze and bound is based on this setting.

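Specifically, let the individual training datasets $D$ be constructed by sampling from a finite (and hence countable) set in which all datapoints are unique and sampled independently, i.e. from some larger set $\{x_1, x_2, \dots, x_N\}$. That is, if $D = \{x_1, x_2, \dots, x_n\}$, then the probability of sampling $D$ is $\mathbb{P}(D) = \mathbb{P}_{x_1}(1) \cdots \mathbb{P}_{x_n}(1)\, \mathbb{P}_{x_{n+1}}(0) \cdots \mathbb{P}_{x_N}(0)$, where $\mathbb{P}_{x}(1)$ is the probability of drawing $x$ into the dataset and $\mathbb{P}_{x}(0)$ is the probability of not drawing it.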

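We define $\mathbb{D}$ as the set of all datasets. Let now $\mathbb{D}_{x^*}$ be the set of all datasets that contain a particular point $x^*$, that is $\mathbb{D}_{x^*} = \{D \in \mathbb{D} : x^* \in D\}$. Similarly, let $\mathbb{D}'_{x^*}$ be the set of all datasets that do not contain $x^*$, i.e. $\mathbb{D}'_{x^*} = \{D' \in \mathbb{D} : x^* \notin D'\}$. Note that $\mathbb{D} = \mathbb{D}_{x^*} \cup \mathbb{D}'_{x^*}$ by the simple logic that any dataset either has or does not have $x^*$ in it. We then have the following lemma (see Appendix A for the proof).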

Lemma 1.

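$\mathbb{D}_{x^*}$ and $\mathbb{D}'_{x^*}$ are in bijective correspondence, with $\frac{\mathbb{P}(D)}{\mathbb{P}_{x^*}(1)} = \frac{\mathbb{P}(D')}{\mathbb{P}_{x^*}(0)}$ for $D \in \mathbb{D}_{x^*}$ and $D' \in \mathbb{D}'_{x^*}$ that map to each other under the bijective correspondence.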
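The lemma can be checked by brute force on a toy superset. The sketch below assumes the natural correspondence that removes the target point from a dataset (the appendix proof may construct the bijection differently), and uses made-up inclusion probabilities for three hypothetical points. It enumerates every dataset and verifies both the bijection and the probability ratio.

```python
import math
from itertools import combinations


def dataset_probability(dataset, inclusion_prob):
    """P(D) under independent sampling of every point in the superset."""
    prob = 1.0
    for point, p_in in inclusion_prob.items():
        prob *= p_in if point in dataset else 1.0 - p_in
    return prob


def all_datasets(points):
    """Every subset of the superset, as frozensets."""
    points = sorted(points)
    return [
        frozenset(subset)
        for size in range(len(points) + 1)
        for subset in combinations(points, size)
    ]


# Hypothetical superset with per-point inclusion probabilities.
inclusion_prob = {"a": 0.2, "b": 0.5, "c": 0.9}
target = "a"

with_target = [d for d in all_datasets(inclusion_prob) if target in d]
without_target = {d for d in all_datasets(inclusion_prob) if target not in d}

# Assumed correspondence: D maps to D with the target point removed.
images = {d - {target} for d in with_target}

# Ratio identity: P(D) / P_target(1) equals P(D') / P_target(0).
ratios_match = all(
    math.isclose(
        dataset_probability(d, inclusion_prob) / inclusion_prob[target],
        dataset_probability(d - {target}, inclusion_prob)
        / (1.0 - inclusion_prob[target]),
    )
    for d in with_target
)
print(images == without_target, ratios_match)
```

Both checks hold because, under independent sampling, removing the target point changes only the single factor of $\mathbb{P}(D)$ that belongs to that point.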

Once some dataset $D$ is obtained, we call $H$ the training function which takes in $D$ and outputs a model as a set of weights in the form of a real vector. Recall that $H$ is ϵ-DP if for all adjacent datasets $D$ and $D'$ and any set of model(s) $S$ in the output space of $H$ (i.e. some weights) we have $\mathbb{P}(H(D) \in S) \leq e^{\epsilon}\, \mathbb{P}(H(D') \in S)$. It should be noted that from now on we assume that the set $S$ has a non-zero probability of being produced by $H$. This is sensible as we are not interested in membership inference attacks on sets of models that have probability $0$ of coming from training; note also that if $\mathbb{P}(H(D) \in S) = 0$, then $\mathbb{P}(H(D') \in S) = 0$ for all adjacent $D'$ as $0 \leq \mathbb{P}(H(D') \in S) \leq e^{\epsilon}\, \mathbb{P}(H(D) \in S) = 0$, and thus the probability is $0$ for all countable datasets, as we can construct any dataset by removing and adding a data point (which does not change the probability if it is initially $0$) countably many times.

3.2 Our Bounds

We now proceed to use Lemma 1 to bound the positive and negative accuracy of MI, as stated in Definition 1, for a training function that is ϵ-DP under the data-sampling setting defined earlier. Our approach differs from those we discussed in Section 2.3 in that we now focus on the definition of the conditional probability as a quotient; finding a bound then reduces to finding a way to simplify the quotient with the ϵ-DP definition, which we achieve using Lemma 1.

What follows are the bounds, with Sections 4, 5, and 6 expanding on the consequences of the bounds.

Theorem 1 (DP bounds MI positive accuracy).

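If $f$ is an MI attack applied to a set of models $S$ and it predicts that $x^*$ was in the datasets used to obtain them, and the training process $H$ is DP with $\epsilon$, its accuracy is upper-bounded by $\left(1 + e^{-\epsilon}\, \frac{\mathbb{P}_{x^*}(0)}{\mathbb{P}_{x^*}(1)}\right)^{-1}$ and lower-bounded by $\left(1 + e^{\epsilon}\, \frac{\mathbb{P}_{x^*}(0)}{\mathbb{P}_{x^*}(1)}\right)^{-1}$, where $\mathbb{P}_{x^*}(1)$ is the probability of drawing $x^*$ into the dataset.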

See Appendix A for the complete proof. The high-level idea is to rewrite positive accuracy as a fraction, and use Lemma 1 (applicable since we only deal with countable sets) to simplify terms alongside the ϵ-DP condition. The ϵ-DP condition gives inequalities in both directions, providing us with the lower and upper bounds.
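The outline above can be written out as a short worked derivation. This is a sketch reconstructed from the outline, not the appendix proof. It writes $H$ for the training function, $S$ for the set of models, $\mathbb{D}_{x^*}$ and $\mathbb{D}'_{x^*}$ for the datasets with and without $x^*$, and $\mathbb{P}_{x^*}(1)$, $\mathbb{P}_{x^*}(0)$ for the probabilities of drawing or not drawing $x^*$; the shorthand $Z$ is introduced here only for brevity.

```latex
% Z is shorthand introduced for this sketch only; Z > 0 by the assumption
% that S has non-zero probability of being produced by training.
\begin{align*}
Z &:= \sum_{D \in \mathbb{D}_{x^*}} \mathbb{P}(H(D) \in S)\, \mathbb{P}(D), \\
\mathbb{P}(x^* \in D \mid S)
  &= \frac{Z}{Z + \sum_{D' \in \mathbb{D}'_{x^*}} \mathbb{P}(H(D') \in S)\, \mathbb{P}(D')}.
\end{align*}
% Pair each D' with its partner D under Lemma 1, so that
% P(D') = (P_{x^*}(0) / P_{x^*}(1)) P(D), and apply epsilon-DP to the
% adjacent pair (D, D') in both directions:
\begin{align*}
e^{-\epsilon}\, \frac{\mathbb{P}_{x^*}(0)}{\mathbb{P}_{x^*}(1)}\, Z
  \;\leq\; \sum_{D' \in \mathbb{D}'_{x^*}} \mathbb{P}(H(D') \in S)\, \mathbb{P}(D')
  \;\leq\; e^{\epsilon}\, \frac{\mathbb{P}_{x^*}(0)}{\mathbb{P}_{x^*}(1)}\, Z.
\end{align*}
% Substituting into the quotient and cancelling Z:
\begin{align*}
\left(1 + e^{\epsilon}\, \frac{\mathbb{P}_{x^*}(0)}{\mathbb{P}_{x^*}(1)}\right)^{-1}
  \;\leq\; \mathbb{P}(x^* \in D \mid S)
  \;\leq\; \left(1 + e^{-\epsilon}\, \frac{\mathbb{P}_{x^*}(0)}{\mathbb{P}_{x^*}(1)}\right)^{-1}.
\end{align*}
```

The bijection is what allows the sum over $\mathbb{D}'_{x^*}$ to be compared term by term with the sum over $\mathbb{D}_{x^*}$, which is why the adversary's strength never enters the bound.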

By the definition of negative accuracy of , we have the following corollary:

Corollary 1 (DP bounds MI negative accuracy).

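If $f$ is an MI attack applied to a set of models $S$ and it predicts that $x^*$ is not in the datasets used to obtain them, and the training process $H$ is DP with $\epsilon$, then its accuracy is upper-bounded by $\left(1 + e^{-\epsilon}\, \frac{\mathbb{P}_{x^*}(1)}{\mathbb{P}_{x^*}(0)}\right)^{-1}$ and lower-bounded by $\left(1 + e^{\epsilon}\, \frac{\mathbb{P}_{x^*}(1)}{\mathbb{P}_{x^*}(0)}\right)^{-1}$, where $\mathbb{P}_{x^*}(1)$ is the probability of drawing $x^*$ into the dataset.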

The full proof is also provided in Appendix A.

Note that in the case $\mathbb{P}_{x^*}(1) = 0.5$, the bounds given by Theorem 1 and Corollary 1 are identical. Therefore, as $f$ must output either $0$ or $1$, we have the more general claim that the attack accuracy (the maximum of positive and negative accuracy) is always bounded by the same values given by Theorem 1. Furthermore, note that the lower and upper bounds given by Theorem 1 converge for low and high $\mathbb{P}_{x^*}(1)$ and for low ϵ. That is to say, our bounds are tight in those regions, and so future improvement would be most significant for intermediate values of $\mathbb{P}_{x^*}(1)$ and larger ϵ.
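To make these remarks concrete, the following sketch evaluates the bounds numerically. It assumes the closed forms $(1 + e^{\mp\epsilon}\, \mathbb{P}_{x^*}(0)/\mathbb{P}_{x^*}(1))^{-1}$ for positive accuracy, with the negative-accuracy bounds obtained as complements; the function names and the sample values of ϵ and the sampling probability are illustrative choices, not taken from the paper.

```python
import math


def positive_accuracy_bounds(epsilon, p_in):
    """(lower, upper) bounds on P(x* in D | S) for sampling probability p_in."""
    odds_out = (1.0 - p_in) / p_in
    lower = 1.0 / (1.0 + math.exp(epsilon) * odds_out)
    upper = 1.0 / (1.0 + math.exp(-epsilon) * odds_out)
    return lower, upper


def negative_accuracy_bounds(epsilon, p_in):
    """(lower, upper) bounds on P(x* not in D | S), as complements."""
    lower_pos, upper_pos = positive_accuracy_bounds(epsilon, p_in)
    return 1.0 - upper_pos, 1.0 - lower_pos


# Gap between upper and lower bound for a few sampling probabilities.
for p_in in (0.01, 0.5, 0.99):
    lower, upper = positive_accuracy_bounds(1.0, p_in)
    print(f"p_in={p_in:.2f}  lower={lower:.4f}  upper={upper:.4f}  gap={upper - lower:.4f}")
```

With ϵ fixed, the gap is widest near a sampling probability of 0.5 and shrinks toward both ends, which matches the observation that the bounds are tight for low and high sampling probabilities.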

4 The Effect of $\mathbb{P}_{x^*}(1)$

We now focus on the effect of $\mathbb{P}_{x^*}(1)$. Section 4.1 explains how the privacy amplification (i.e. the lowering of our MI positive accuracy bound) that we observe from decreasing $\mathbb{P}_{x^*}(1)$ is fundamentally different from the privacy amplification on MI from batch sampling. Section 4.2 outlines the practical consequences of this for a defender.

4.1 A New Privacy Amplification for MI

Our bound given by Theorem 1 can be reduced by decreasing $\mathbb{P}_{x^*}(1)$ or ϵ. Furthermore, batch sampling, i.e. using each data point in the batch for a given training step only with some probability, reduces MI positive accuracy as it reduces ϵ. So dataset sub-sampling (decreasing $\mathbb{P}_{x^*}(1)$) and batch sampling both decrease our bound, and we term their effect "privacy amplification" (for MI) as they decrease privacy infringement (analogous to how "privacy amplification" for DP refers to methods that reduce privacy loss). We now ask: is the effect of dataset sub-sampling different from that of batch sampling?

Before proceeding, it is useful to get a sense of the impact has on our bound. We plot against our positive MI accuracy bound given by Theorem 1 in Figure 3 for different (see Appendix B). Notably, for a specific case when is small, we get that the positive accuracy is bounded by for and (i.e. by sampling with low enough probability, we certify that they will be correct at most of the time).

We now turn to comparing the effect of batch sampling to the effect of $\mathbb{P}_{x^*}(1)$ (note $\mathbb{P}_{x^*}(0) = 1 - \mathbb{P}_{x^*}(1)$). First, it is worth noting that the two amplification methods are mostly independent, i.e. decreasing $\mathbb{P}_{x^*}(1)$ mostly places no restriction on the sampling rate for batches (with some exceptions).² Including the batch sampling rate $q$ for DP batch privacy amplification, we can compare its impact to that of $\mathbb{P}_{x^*}(1)$ by looking at the term $e^{-\epsilon}\, \frac{\mathbb{P}_{x^*}(0)}{\mathbb{P}_{x^*}(1)}$ in the bound given by Theorem 1; the goal is to maximize this term to make the upper bound as small as possible. In particular, we see that decreasing $q$ increases this term by $O(e^{-q})$, whereas decreasing $\mathbb{P}_{x^*}(1)$ increases this term by $O\left(\frac{1 - \mathbb{P}_{x^*}(1)}{\mathbb{P}_{x^*}(1)}\right)$, which is slower than $O(e^{-q})$ up to a point, then faster (do note that we are looking at the order as the variable decreases). Figure 1 plots this relation; note, however, that the specific values are subject to change with differing constants. Nevertheless, what does not change with the constants are the asymptotic behaviours, and in particular we see $\lim_{\mathbb{P}_{x^*}(1) \to 0} \frac{1 - \mathbb{P}_{x^*}(1)}{\mathbb{P}_{x^*}(1)} = \infty$ whereas $\lim_{q \to 0} e^{-q} = 1$.

² We say "mostly" as this is true up to a point. In particular, the expectation of the training dataset size decreases with smaller dataset sampling probabilities, and thus the lowest possible batch sampling rate increases in expectation. Nevertheless, we can ignore this restriction for the time being as we are interested in their independent mathematical behaviours.

Thus, we can conclude that data sampling and batch sampling affect our bound differently. Therefore, data sampling presents a complementary privacy amplification scheme for MI positive accuracy. As a last remark, we note that the same comparison holds more generally when comparing the effect of ϵ and $\mathbb{P}_{x^*}(1)$.

Figure 1: Comparing the DP amplification observed by decreasing the batch probability $q$ (given by $e^{-q}$) to the amplification we observe from decreasing $\mathbb{P}_{x^*}(1)$ (given by $\frac{1 - \mathbb{P}_{x^*}(1)}{\mathbb{P}_{x^*}(1)}$).

4.2 Usefulness for a Defender

We now explain one course of action a defender can take in light of this new privacy amplification for MI. In particular, note that an upper bound on $\mathbb{P}_{x^*}(1)$ translates to an upper bound on the expression found in Theorem 1 (as the bound is monotonically increasing with $\mathbb{P}_{x^*}(1)$); hence one can, in practice, focus on giving smaller upper bounds on $\mathbb{P}_{x^*}(1)$ to decrease MI positive accuracy.

A possible approach can be described as follows. Say a user (the defender) is given some sample $D$ drawn from some unknown distribution. In particular, they do not know the individual probabilities for the points being in $D$. However, say that from $D$ they obtain $D'$ by sampling each point from $D$ with probability $p$. Then the probability of any point $x^*$ being in $D'$ is bounded by $p$, as the true probability is the probability that $x^* \in D$ (which is at most $1$) times $p$. Hence, if the user trains with $D'$, they will have $\mathbb{P}_{x^*}(1) \leq p$ and thus can use our bound to give guarantees on how low the MI positive accuracy is.
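The defender's procedure can be sketched in a few lines. The example below is a minimal illustration under stated assumptions: the collected data is a stand-in list of integers, the keep probability (0.1) and ϵ (2.0) are arbitrary illustrative values, and the upper bound uses the closed form $(1 + e^{-\epsilon}(1 - p)/p)^{-1}$ with the keep probability substituted for the unknown sampling probability.

```python
import math
import random


def subsample(dataset, keep_prob, rng):
    """Keep each point independently with probability keep_prob."""
    return [x for x in dataset if rng.random() < keep_prob]


def positive_accuracy_upper_bound(epsilon, p_in_upper):
    """Upper bound on MI positive accuracy given an upper bound on P_x(1)."""
    odds_out = (1.0 - p_in_upper) / p_in_upper
    return 1.0 / (1.0 + math.exp(-epsilon) * odds_out)


rng = random.Random(0)            # fixed seed so the sketch is repeatable
collected = list(range(10_000))   # stand-in for the defender's raw sample
keep_prob = 0.1                   # illustrative sub-sampling probability

training_set = subsample(collected, keep_prob, rng)
bound = positive_accuracy_upper_bound(epsilon=2.0, p_in_upper=keep_prob)
baseline = positive_accuracy_upper_bound(epsilon=2.0, p_in_upper=0.5)

print(f"training set size: {len(training_set)}")
print(f"MI positive accuracy bound with sub-sampling: {bound:.4f}")
print(f"same epsilon with sampling probability 0.5:   {baseline:.4f}")
```

Because the bound is monotonically increasing in the sampling probability, substituting the keep probability for the unknown true probability can only overestimate the adversary's positive accuracy, so the guarantee remains valid.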

This does come with some drawbacks. In general one wants to train with more data, but by further sampling with probability $p$ we reduce our expected training dataset size. As a consequence, a user will have to trade off how low they can make $p$ (in conjunction with the ϵ parameter they choose) against how small a dataset they are willing to train on. We leave this type of decision making for future work.

5 Importance to Data Deletion

The ability to decrease MI accuracy, i.e. the ability for an arbitrator to attribute a data point to a model, has consequences for machine unlearning and data deletion.

Background on unlearning.

Having bounds on MI is particularly relevant to machine unlearning for DNNs. Machine unlearning was first introduced by Cao and Yang (2015), who described a setting where it is important for the model to be able to "forget" certain training data points. The authors focused on the cases where there exist efficient analytic solutions to this problem. The topic of machine unlearning was then extended to DNNs by Bourtoule et al. (2019) with the definition that a model has unlearned a data point if, after the unlearning, the distribution of models returned is identical to the one that would result from not training with the data point at all. This definition was also stated earlier by Ginart et al. (2019) for other classes of machine learning models.

Given that unlearning is interested in removing the impact a data point had on the model, further work employed MI accuracy on the data point to be unlearned as a metric for how well the model had unlearned it after using some proposed unlearning method (Baumhauer et al., 2020; Graves et al., 2020; Golatkar et al., 2020b, a). Yet, empirical estimates of the membership status of a datapoint depend on the concrete MI attacks employed. Indeed, it may be possible that there exists a stronger practical attack.

Applying our result to unlearning.

Analytic bounds to MI attacks, on the other hand, resolve the subjectivity issue of MI as a metric for unlearning as they bound the success of any adversary. In particular one could give the following definition of an unlearning guarantee from a formal MI positive accuracy bound:

Definition 2 (ζ-MI Unlearning Guarantee).

An algorithm is ζ-MI unlearnt for $x^*$ if $\mathbb{P}(x^* \notin D \mid S) \geq \zeta$, i.e. the probability of $x^*$ not being in the training dataset is at least ζ.

Therefore, our result bounding positive MI accuracy has direct consequences on the field of machine unlearning.

In particular, if $\mathbb{P}(x^* \in D \mid S)$ is sufficiently low, that is, the likelihood of $S$ coming from a dataset containing $x^*$ is low, then an entity could claim that they do not need to delete the user's data since their model is most likely independent of that data point, as it most likely came from training without it; this leads to plausible deniability. Note that we defined this type of unlearning as a ζ-MI unlearning guarantee. This is similar to the logic presented by Sekhari et al. (2021), where unlearning is presented probabilistically in terms of (ϵ, δ)-unlearning.

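We also observe an analogous result to Sekhari et al. (2021), where we can only handle a maximum number of deletion requests before no longer having sufficiently low probability. To be exact, let us say we do not need to undergo any unlearning process given a set of data deletion requests $\hat{x}$ if $\mathbb{P}(\hat{x} \notin D \mid S) \geq \zeta$ for some ζ (i.e. we are working with the probability of not having that set of data in our training set, which we want to be high). Note that we sampled data independently; thus, if $\hat{x} = \{x_1^*, x_2^*, \dots, x_m^*\}$, then $\mathbb{P}(\hat{x} \notin D \mid S) = \prod_{i=1}^{m} \mathbb{P}(x_i^* \notin D \mid S)$.

Now, for simplicity, assume the probabilities of drawing all points into the datasets are the same, so that for all $x^*$ we have the same bound given by Corollary 1, that is $\mathbb{P}(x^* \notin D \mid S) \geq L$ for some $L$. Then we have $\mathbb{P}(\hat{x} \notin D \mid S) \geq L^m$, and so an entity does not need to unlearn if $L^m \geq \zeta$, i.e. if $m \leq \frac{\ln \zeta}{\ln L}$. This gives a bound on how many deletion requests the entity can avoid in terms of the lower bound given in Corollary 1. In particular, note that if $\{x_1, x_2, \dots, x_N\}$ is the larger set of data points an entity is sampling from, and $\mathbb{P}_{x}(1) = c/N$ for all $x$, then the lower bound given by Corollary 1 is $\left(1 + e^{\epsilon}\, \frac{c}{N - c}\right)^{-1}$.

Corollary 2 (ζ-MI Unlearning Capacity).

If $\{x_1, x_2, \dots, x_N\}$ is the larger set of data points an entity is sampling from, and $\mathbb{P}_{x}(1) = c/N$ for all $x$, then one can delete $m \leq \frac{\ln \zeta}{\ln\left(\left(1 + e^{\epsilon}\, \frac{c}{N - c}\right)^{-1}\right)}$ data points with ϵ-DP and satisfy ζ-MI unlearning.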
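The capacity argument can be sketched numerically. The code below assumes the lower bound on negative accuracy has the form $(1 + e^{\epsilon} c/(N - c))^{-1}$, where $c$ is the expected training set size and $N$ the size of the larger set, and that the number of tolerable deletion requests is the largest $m$ for which this lower bound raised to the power $m$ stays above the unlearning threshold. The function names and the example values are illustrative assumptions, not values from the paper.

```python
import math


def negative_accuracy_lower_bound(epsilon, expected_size, superset_size):
    """Assumed Corollary 1 lower bound when every point has P_x(1) = c / N."""
    odds_in = expected_size / (superset_size - expected_size)
    return 1.0 / (1.0 + math.exp(epsilon) * odds_in)


def deletion_capacity(epsilon, expected_size, superset_size, threshold):
    """Largest m with lower_bound ** m >= threshold (threshold in (0, 1))."""
    lower = negative_accuracy_lower_bound(epsilon, expected_size, superset_size)
    # Both logarithms are negative, so dividing flips the inequality direction.
    return math.floor(math.log(threshold) / math.log(lower))


# Illustrative values: epsilon = 1, expected training set size c = 100,
# larger set size N = 100000, unlearning threshold 0.5.
epsilon, c, n_total, threshold = 1.0, 100, 100_000, 0.5
lower = negative_accuracy_lower_bound(epsilon, c, n_total)
capacity = deletion_capacity(epsilon, c, n_total, threshold)
print(f"per-point lower bound: {lower:.6f}")
print(f"deletion requests tolerated: {capacity}")
```

The capacity shrinks as ϵ or the expected training set size grows, since both push the per-point lower bound away from 1.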

Sekhari et al. (2021) showed that with typical DP the number of deletion requests grows linearly with the size of the training set (in the above case $c$ represents the expected training set size). We thus compare a linear trend w.r.t. $c$ to the capacity given by Corollary 2 in Figure 7 (see Appendix B) to observe their respective magnitudes; we fix the remaining parameters, as we are interested in general trends. We observe that our deletion capacity is significantly higher for low expected training dataset sizes and is marginally lower than a linear trend for larger training set sizes.

6 Discussion

In the following we discuss: how our bound is tighter than previous results, how it also informs on the generalization gap when training with ϵ-DP, how (ϵ, δ)-DP does not give similar bounds (by studying logistic regression), and the connection to MI advantage.

6.1 Our Bound is Tighter than Earlier Results

We first compare our main technical result, the bound on an MI adversary’s success, with the two key baselines we identified earlier (Sablayrolles et al., 2019; Erlingsson et al., 2019), noting that Sablayrolles et al. (2019) made assumptions on the conditional weight distribution which we and Erlingsson et al. (2019) did not. The setting described in Section 3.1 is equivalent to the MI experiment defined by Yeom et al. (2018) when $\mathbb{P}_{x}(1) = 0.5$ for all $x$. That is, if the training dataset was constructed by sampling data points from a larger dataset by a coin flip, then when the adversary is given any data point from the larger dataset to test, there is a 50% chance it was in the training set. Furthermore, recall that when $\mathbb{P}_{x^*}(1) = 0.5$, Theorem 1 reduces to bounds on the overall accuracy of any MI attack (as the bounds on positive and negative accuracy are the same). We can thus compare the upper bound on MI accuracy that we achieved with the current tightest bounds given by Erlingsson et al. (2019) and Sablayrolles et al. (2019) for ϵ-DP; we stated these bounds earlier in Section 2.3, where for the latter bound we also set $\mathbb{P}_{x^*}(1) = 0.5$.

Our bound, and these two previous bounds, are depicted in Figure 2, where we see our bound given in Theorem 1 is always tighter than both of the previous bounds. In particular, we see that it is closer to the one introduced by Sablayrolles et al. (2019) for small and closer to the one defined by Erlingsson et al. (2019) for large . Notably, for , we bound the accuracy of an MI attack by whereas Erlingsson et al. (2019) bound it by , and Sablayrolles et al. (2019) by . For , we bound MI accuracy by whereas Erlingsson et al. (2019) bound it by and Sablayrolles et al. (2019) by . Do note our bound generalizes for when the sampling probability is not ; we compare our bound with Sablayrolles et al. (2019) in this setting in Figure 5, though recall once again Sablayrolles et al. (2019) made assumptions on the weight distribution we did not.
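The comparison can also be reproduced numerically under stated assumptions. The sketch below takes the sampling probability to be 0.5 and assumes the following closed forms for the accuracy bounds: $(1 + e^{-\epsilon})^{-1}$ for Theorem 1, $e^{\epsilon}/2$ for Yeom et al. (2018), $1 - e^{-\epsilon}/2$ for Erlingsson et al. (2019), and $0.5 + \epsilon/4$ for Sablayrolles et al. (2019). These forms are reconstructions and should be checked against the cited papers; bounds above 1 are clipped to 1 since accuracy cannot exceed it. The ϵ values are illustrative.

```python
import math


def bound_this_work(epsilon):
    """Assumed Theorem 1 upper bound at sampling probability 0.5."""
    return 1.0 / (1.0 + math.exp(-epsilon))


def bound_yeom(epsilon):
    """Assumed Yeom et al. (2018) accuracy bound, clipped to 1."""
    return min(1.0, math.exp(epsilon) / 2.0)


def bound_erlingsson(epsilon):
    """Assumed Erlingsson et al. (2019) accuracy bound with delta = 0."""
    return 1.0 - math.exp(-epsilon) / 2.0


def bound_sablayrolles(epsilon):
    """Assumed Sablayrolles et al. (2019) bound at probability 0.5, clipped to 1."""
    return min(1.0, 0.5 + epsilon / 4.0)


print("epsilon  this_work  yeom    erlingsson  sablayrolles")
for epsilon in (0.1, 0.5, 1.0, 2.0, 4.0):
    print(
        f"{epsilon:7.1f}  {bound_this_work(epsilon):9.4f}  "
        f"{bound_yeom(epsilon):6.4f}  {bound_erlingsson(epsilon):10.4f}  "
        f"{bound_sablayrolles(epsilon):12.4f}"
    )
```

Under these assumed forms the Theorem 1 bound is never larger than the others: it approaches the linear bound for small ϵ and the exponential-decay bound for large ϵ, consistent with the qualitative description of Figure 2.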

6.2 We can Bound Generalization Error

Yeom et al. (2018) studied adversaries that took advantage of a generalization gap between training datapoints and non-training datapoints. In particular for Adversary 1 in Yeom et al. (2018), they showed its MI accuracy is (Theorem 2 in Yeom et al. (2018)). That is, we have an adversary whose accuracy is completely determined by the generalization gap of the model. However, when training with -DP the accuracy of all MI adversaries is bounded by our lower and upper-bound in Theorem 1 (taking as we are looking at accuracy). Thus we have the following corollary.

Corollary 3 (MI bounds Generalization Error).

For loss function upper-bounded by and -DP training algorithm, we have where is the generalization gap (Definition 3 in Yeom et al. (2018), where their is our larger set, and their is the specific training set sampled from the larger set).

In particular this tells how well the sampled dataset generalizes to having trained on the entire larger set. We believe this general procedure of using specific adversaries with accuracies dependent on values of interest (e.g. the generalization gap here) and applying more general bounds to get a range for those values of interest is something future work can expand on.

6.3 Limitations of (ϵ, δ)-DP

We now motivate future work on MI bounds by illustrating how our approach of bounding the positive accuracy of an MI adversary for ϵ-DP does not immediately extend to (ϵ, δ)-DP. Such an extension would be desirable, as it would capture the success of MI adversaries against training algorithms that only provide relaxed DP guarantees, as opposed to the ϵ-DP guarantees we studied.

To do so, we give a practical counter-example showing that, for a specific MI attack on an (ϵ, δ)-DP logistic regression, there is no bound on the positive accuracy of the adversary, unlike what follows from our bound for ϵ-DP. Nevertheless, we demonstrate how one can still tailor bounds on general MI accuracy for this specific attack, and remark that this bound is much tighter than what our theorem states for any MI adversary under the tighter ϵ-DP conditions.

6.3.1 Positive Accuracy is not bounded

Consider a set of two (scalar) points which are drawn into the training set with and ; that is is always in the training set, and has a chance of being in the training set. Let model be a single dimensional logistic regression without bias defined as initialized such that for cross-entropy loss , (i.e. set and the weights so that and thus the softmax output of the model is approximately on and thus gradient is approximately ). Conversely set such that the gradient on it is less than (i.e. for the above setting set ).

Now, train the model to -DP following Abadi et al. (2016) with , sampling rate of , a maximum gradient norm of , for one step. Note that these are just parameters which we are allowed to change under the DP analysis, and we found the noise we would need for -DP is . Then consider running an MI attack on the above setup that, for some threshold , classifies the final weights as having come from the dataset with if , and as not having come from it otherwise. Note that here we use for the final weights as opposed to to emphasize that we are now talking about a random variable. The intuition for this attack is that if the dataset only contains then the weights do not change, but if the dataset contains we know the resulting gradient is negative (by construction) and thus decreases the weights (before noise).

By the earlier setting on how training datasets are constructed note that or , and we will denote these and respectively. Note that if trained on following the suggested data points and initial weights, we have the distribution of final weight where denotes the noise needed for -DP as stated earlier. Similarly , since the maximum gradient norm is set to 1.

For the above MI attack we can then bound the positive accuracy as a function of by:


where is the (Gaussian) cumulative distribution function of random variable up to .

We plot this in Figure 3(a), and unlike Theorem 1, note how it is not bounded by anything less than and goes to as the threshold decreases (i.e.  s.t yields positive accuracy greater than ).

6.3.2 MI accuracy is bounded

The previous section showed that for (ϵ, δ)-DP the positive accuracy of our adversary is not bounded. However, as we show in this section, this does not mean that the overall accuracy is unbounded. Specifically, for the same attack and setup as in Section 6.3.1, we have a bound on the general accuracy of this specific attack given by:


We illustrate this in Figure 3(b) and observe that it is bounded by for . Do note that is significantly less than which is what our bound gives for -DP, and -DP is a tighter DP condition than -DP depicted in Figure 3(b). This illustrates how the bound can be further tightened with a better understanding of the (worst case scenario) weight distribution and the nature of the attack. We leave this to future work.
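The contrast between positive accuracy and overall accuracy for the threshold attack of Section 6.3.1 can be illustrated with a small sketch. It assumes that the final weight is Gaussian with a shared standard deviation under both candidate datasets, and that the attack flags membership when the weight falls at or below the threshold. The specific numbers (means 0 and -1, standard deviation 1, inclusion probability 0.5) and the function names are hypothetical choices for illustration, not the values used in the experiment.

```python
import math


def gaussian_cdf(t, mean, sigma):
    # Gaussian CDF via erfc, which stays accurate far into the lower tail.
    return 0.5 * math.erfc(-(t - mean) / (sigma * math.sqrt(2.0)))


def positive_accuracy(t, mean_in, mean_out, sigma, prior_in):
    # P(point was in the training set | final weight <= t), by Bayes' rule.
    hit = prior_in * gaussian_cdf(t, mean_in, sigma)
    false_alarm = (1.0 - prior_in) * gaussian_cdf(t, mean_out, sigma)
    return hit / (hit + false_alarm)


def accuracy(t, mean_in, mean_out, sigma, prior_in):
    # P(attack is correct): flag "in" when weight <= t, "out" otherwise.
    correct_in = prior_in * gaussian_cdf(t, mean_in, sigma)
    correct_out = (1.0 - prior_in) * (1.0 - gaussian_cdf(t, mean_out, sigma))
    return correct_in + correct_out


# HYPOTHETICAL parameters: weights unchanged without the point (mean 0),
# shifted down by one clipped gradient step with it (mean -1).
PARAMS = dict(mean_in=-1.0, mean_out=0.0, sigma=1.0, prior_in=0.5)

if __name__ == "__main__":
    for t in (-5.0, -2.0, -0.5, 0.0, 2.0):
        print(t, positive_accuracy(t, **PARAMS), accuracy(t, **PARAMS))
```

With these hypothetical parameters, the positive accuracy climbs toward 1 as the threshold moves into the lower tail, while the overall accuracy peaks at the midpoint between the two means and stays well below 1.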

6.4 MI Accuracy vs. MI Advantage

Previous work, particularly Yeom et al. (2018) and Erlingsson et al. (2019), focused on membership advantage, which is essentially an improvement in accuracy of over the random guess of of drawing a point in the training dataset. More specifically, if we let denote the accuracy of , then membership advantage is computed as . We can generalize this to ask what the membership advantage of the positive accuracy of compared to the baseline is for values other than only .

Theorem 1 gives us an upper bound on the positive accuracy and thus an upper bound on the positive advantage of denoted as :


We plotted this advantage as a function of for different fixed in Figure 6 (see Appendix B). We observe that the advantage clearly depends on , and in fact for different , the resulting in the maximum advantage changes. In particular, is not close to the maximum advantage for large , which shows how the fixed experiment proposed by Yeom et al. (2018) does not necessarily give the maximum advantage an adversary could have, whereas our result allows us to.
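How the maximizing inclusion probability shifts with ϵ can be explored with a short sketch. It rests on two assumptions that should be checked against the definitions above: that the Theorem 1 upper bound on positive accuracy has the form 1/(1 + e^{-ϵ}(1 - p)/p) for inclusion probability p, and that the positive advantage is this bound minus the baseline p (a differently normalized advantage would give a different curve). The function names and the grid search are illustrative only.

```python
import math


def positive_accuracy_bound(p, eps):
    # ASSUMED form of the Theorem 1 upper bound for inclusion probability p.
    return 1.0 / (1.0 + math.exp(-eps) * (1.0 - p) / p)


def positive_advantage_bound(p, eps):
    # ASSUMED definition: bound on positive accuracy minus the baseline p.
    return positive_accuracy_bound(p, eps) - p


def argmax_prior(eps, steps=999):
    # Grid search over p in (0, 1) for the largest advantage bound.
    grid = [k / (steps + 1) for k in range(1, steps + 1)]
    return max(grid, key=lambda p: positive_advantage_bound(p, eps))


if __name__ == "__main__":
    for eps in (0.5, 1.0, 2.0, 4.0):
        p_star = argmax_prior(eps)
        print(eps, p_star, positive_advantage_bound(p_star, eps))
```

Under these assumptions the maximizer moves away from 0.5 toward smaller p as ϵ grows, even though the accuracy bound itself increases monotonically in p.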

However, it should be noted that higher advantage here does not mean a higher upper bound on MI accuracy; as we already saw in Figure 3, the upper bound on accuracy increases monotonically with , in contrast to the bump observed with membership advantage. This serves to differentiate the study of MI advantage and the study of MI accuracy for future work.

7 Conclusion

In this work, we provide a tighter bound on MI accuracy against ML models trained with ϵ-DP. Our bound highlights that the intricacies of dataset construction are of great importance for model vulnerability to MI attacks. Indeed, based on our findings, we develop a privacy amplification scheme that just requires one to sub-sample their training dataset from a larger pool of possible data points.

Based on our results, entities training their ML models with DP can employ looser privacy guarantees (and thereby preserve their models’ accuracy better) while still limiting the success of MI attacks. Finally, our bound, and more generally bounds on positive MI accuracy, can also be applied to characterize a model’s ability to process unlearning requests when defining machine unlearning as achieving a model with low probability of having been derived from a particular data point to be unlearned.


We would like to acknowledge our sponsors, who support our research with financial and in-kind contributions: CIFAR through the Canada CIFAR AI Chair, DARPA through the GARD project, Intel, Meta, NFRF through an Exploration grant, and NSERC through the COHESA Strategic Alliance. Resources used in preparing this research were provided, in part, by the Province of Ontario, the Government of Canada through CIFAR, and companies sponsoring the Vector Institute. We would like to thank members of the CleverHans Lab for their feedback.


  • M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mironov, K. Talwar, and L. Zhang (2016) Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC conference on computer and communications security, pp. 308–318. Cited by: §1, §1, §2.1, §6.3.1.
  • E. Bagdasaryan, O. Poursaeed, and V. Shmatikov (2019) Differential privacy has disparate impact on model accuracy. Advances in Neural Information Processing Systems 32, pp. 15479–15488. Cited by: §2.2.
  • B. Balle and Y. Wang (2018) Improving the gaussian mechanism for differential privacy: analytical calibration and optimal denoising. In International Conference on Machine Learning, pp. 394–403. Cited by: §2.1.
  • R. Bassily, A. Smith, and A. Thakurta (2014) Private empirical risk minimization: efficient algorithms and tight error bounds. In 2014 IEEE 55th Annual Symposium on Foundations of Computer Science, pp. 464–473. Cited by: §2.1.
  • T. Baumhauer, P. Schöttle, and M. Zeppelzauer (2020) Machine unlearning: linear filtration for logit-based classifiers. arXiv preprint arXiv:2002.02730. Cited by: §1, §5.
  • L. Bourtoule, V. Chandrasekaran, C. A. Choquette-Choo, H. Jia, A. Travers, B. Zhang, D. Lie, and N. Papernot (2019) Machine unlearning. arXiv preprint arXiv:1912.03817. Cited by: §5.
  • Y. Cao and J. Yang (2015) Towards making systems forget with machine unlearning. In 2015 IEEE Symposium on Security and Privacy, pp. 463–480. Cited by: §1, §5.
  • C. A. Choquette-Choo, F. Tramer, N. Carlini, and N. Papernot (2021) Label-only membership inference attacks. In International Conference on Machine Learning, pp. 1964–1974. Cited by: §1, §2.2.
  • C. Dwork, F. McSherry, K. Nissim, and A. Smith (2006) Calibrating noise to sensitivity in private data analysis. In Theory of cryptography conference, pp. 265–284. Cited by: §1, §2.1.
  • C. Dwork, G. N. Rothblum, and S. Vadhan (2010) Boosting and differential privacy. In 2010 IEEE 51st Annual Symposium on Foundations of Computer Science, pp. 51–60. Cited by: §2.1.
  • C. Dwork (2008) Differential privacy: a survey of results. In International conference on theory and applications of models of computation, pp. 1–19. Cited by: §1.
  • Ú. Erlingsson, I. Mironov, A. Raghunathan, and S. Song (2019) That which we call private. arXiv preprint arXiv:1908.03566. Cited by: Figure 2, Table 1, §1, §2.3, §2.3, §2.3, §2.3, §6.1, §6.1, §6.4.
  • A. Ginart, M. Y. Guan, G. Valiant, and J. Zou (2019) Making ai forget you: data deletion in machine learning. arXiv preprint arXiv:1907.05012. Cited by: §5.
  • A. Golatkar, A. Achille, and S. Soatto (2020a) Eternal sunshine of the spotless net: selective forgetting in deep networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9304–9312. Cited by: §1, §5.
  • A. Golatkar, A. Achille, and S. Soatto (2020b) Forgetting outside the box: scrubbing deep networks of information accessible from input-output observations. In European Conference on Computer Vision, pp. 383–398. Cited by: §1, §5.
  • L. Graves, V. Nagisetty, and V. Ganesh (2020) Amnesiac machine learning. arXiv preprint arXiv:2010.10981. Cited by: §1, §5.
  • R. Hall, A. Rinaldo, and L. Wasserman (2013) Differential privacy for functions and functional data. The Journal of Machine Learning Research 14 (1), pp. 703–727. Cited by: §2.3, §2.3.
  • H. Hu, Z. Salcic, G. Dobbie, and X. Zhang (2021) Membership inference attacks on machine learning: a survey. arXiv preprint arXiv:2103.07853. Cited by: §1.
  • M. Jagielski, J. Ullman, and A. Oprea (2020) Auditing differentially private machine learning: how private is private sgd?. arXiv preprint arXiv:2006.07709. Cited by: §2.1, §2.1.
  • B. Jayaraman, L. Wang, K. Knipmeyer, Q. Gu, and D. Evans (2020) Revisiting membership inference under realistic assumptions. arXiv preprint arXiv:2005.10881. Cited by: §1, §2.2, §2.3, §3.1.
  • P. Kairouz, S. Oh, and P. Viswanath (2015) The composition theorem for differential privacy. In International conference on machine learning, pp. 1376–1385. Cited by: §2.1.
  • P. Maini, M. Yaghini, and N. Papernot (2021) Dataset inference: ownership resolution in machine learning. arXiv preprint arXiv:2104.10706. Cited by: §1, §2.2.
  • M. Nasr, S. Song, A. Thakurta, N. Papernot, and N. Carlini (2021) Adversary instantiation: lower bounds for differentially private machine learning. arXiv preprint arXiv:2101.04535. Cited by: §2.1, §2.2, footnote 1.
  • A. Sablayrolles, M. Douze, C. Schmid, Y. Ollivier, and H. Jégou (2019) White-box vs black-box: bayes optimal strategies for membership inference. In International Conference on Machine Learning, pp. 5558–5567. Cited by: Figure 2, Figure 5, Table 1, §1, §2.2, §2.3, §6.1, §6.1.
  • A. Salem, Y. Zhang, M. Humbert, P. Berrang, M. Fritz, and M. Backes (2018) Ml-leaks: model and data independent membership inference attacks and defenses on machine learning models. arXiv preprint arXiv:1806.01246. Cited by: §2.2.
  • A. Sekhari, J. Acharya, G. Kamath, and A. T. Suresh (2021) Remember what you want to forget: algorithms for machine unlearning. arXiv preprint arXiv:2103.03279. Cited by: Figure 7, §5, §5, §5.
  • R. Shokri, M. Stronati, C. Song, and V. Shmatikov (2017) Membership inference attacks against machine learning models. In 2017 IEEE Symposium on Security and Privacy (SP), pp. 3–18. Cited by: §1, §2.2.
  • S. Song, K. Chaudhuri, and A. D. Sarwate (2013) Stochastic gradient descent with differentially private updates. In 2013 IEEE Global Conference on Signal and Information Processing, pp. 245–248. Cited by: §2.1.
  • S. Truex, L. Liu, M. E. Gursoy, L. Yu, and W. Wei (2019) Demystifying membership inference attacks in machine learning as a service. IEEE Transactions on Services Computing. Cited by: §2.2.
  • S. Yeom, I. Giacomelli, M. Fredrikson, and S. Jha (2018) Privacy risk in machine learning: analyzing the connection to overfitting. In 2018 IEEE 31st Computer Security Foundations Symposium (CSF), pp. 268–282. Cited by: Table 1, §1, §2.2, §2.3, §2.3, §2.3, §2.3, §2.3, §3.1, §6.1, §6.2, §6.4, §6.4, Corollary 3.

Appendix A Proofs

Lemma 1 Note that for a given , is unique (i.e. the map by removing is injective) and similarly for a given is unique (i.e. the map by adding is injective). Thus, we have injective maps running both ways which are the inverses of each other. As a consequence, we have and are in bijective correspondence.

Now if the larger set of datapoints is letting and be any pair of datasets that map to each other by the above bijective map, then note and . In particular we have .

Theorem 1 The positive accuracy of is:


By the observation we have that the denominator can be split into .

By Lemma 1, we can replace the in the second sum by and replace by . For note by being -DP we have and so with the previous replacements we have that the denominator is greater than .

Thus, the accuracy of is (i.e. the upper bound).

If instead we used the fact that , we would find that the accuracy of is (i.e. the lower bound).
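The chain of steps in this proof can be written out as follows. This is a hedged reconstruction: the symbols ($x^*$ for the target point, $S$ for the set of models the adversary flags, $T$ for the training algorithm, $D$ for datasets containing $x^*$, $D' = D \setminus \{x^*\}$, and $p$ for the probability that $x^*$ is included) are chosen here for illustration and may differ from the notation of Theorem 1.

```latex
\begin{align*}
\Pr[x^* \in D \mid T(D) \in S]
  &= \frac{\sum_{D \ni x^*} \Pr[T(D) \in S]\,\Pr[D]}
          {\sum_{D \ni x^*} \Pr[T(D) \in S]\,\Pr[D]
           + \sum_{D' \not\ni x^*} \Pr[T(D') \in S]\,\Pr[D']} \\
% Lemma 1: D' = D \setminus \{x^*\} pairs the two sums, with
% \Pr[D'] = \Pr[D]\,(1-p)/p.
% DP: \Pr[T(D') \in S] \ge e^{-\epsilon}\,\Pr[T(D) \in S].
  &\le \frac{\sum_{D \ni x^*} \Pr[T(D) \in S]\,\Pr[D]}
            {\left(1 + e^{-\epsilon}\,\frac{1-p}{p}\right)
             \sum_{D \ni x^*} \Pr[T(D) \in S]\,\Pr[D]}
   = \left(1 + e^{-\epsilon}\,\frac{1-p}{p}\right)^{-1}, \\
\Pr[x^* \in D \mid T(D) \in S]
  &\ge \left(1 + e^{\epsilon}\,\frac{1-p}{p}\right)^{-1}
  \quad \text{using } \Pr[T(D') \in S] \le e^{\epsilon}\,\Pr[T(D) \in S].
\end{align*}
```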

Corollary 1 Immediately follows from Theorem 1 and Definition 1, as if then .

Similarly, we get

Appendix B Figures

Figure 2: Comparing the upper bound to MI performance we achieved to that given by Erlingsson et al. [2019] and Sablayrolles et al. [2019] (note here). In particular note we are tighter for all .
Figure 3: Our upper bound on MI positive accuracy as a function of
(a) Positive accuracy as a function of the threshold
(b) Accuracy as a function of the threshold
Figure 4: Impact of threshold on positive accuracy and accuracy.
Figure 5: Sablayrolles et al. [2019] upper bound on MI positive accuracy as a function of compared to our bound. Note that we are still tighter for all probabilities.
Figure 6: MI advantage
Figure 7: Comparing our deletion capacity trend to the trend Sekhari et al. [2021] describe. In particular, our number of deletions degrades with training size while theirs increases.

Appendix C Tables

Paper Analytic Form Type
Yeom et al. [2018] General
Erlingsson et al. [2019] General
Sablayrolles et al. [2019] Positive Accuracy
Our Work Positive Accuracy
Table 1: Bounds found in prior work.