The Unintended Consequences of Overfitting: Training Data Inference Attacks

09/05/2017 ∙ by Samuel Yeom, et al. ∙ University of Wisconsin-Madison, Carnegie Mellon University

Machine learning algorithms that are applied to sensitive data pose a distinct threat to privacy. A growing body of prior work demonstrates that models produced by these algorithms may leak specific private information in the training data to an attacker, either through their structure or their observable behavior. However, the underlying cause of this privacy risk is not well understood beyond a handful of anecdotal accounts that suggest overfitting and influence might play a role. This paper examines the effect that overfitting and influence have on the ability of an attacker to learn information about training data from machine learning models, either through training set membership inference or model inversion attacks. Using both formal and empirical analyses, we illustrate a clear relationship between these factors and the privacy risk that arises in several popular machine learning algorithms. We find that overfitting is sufficient to allow an attacker to perform membership inference, and when certain conditions on the influence of certain features are present, model inversion attacks. Interestingly, our formal analysis also shows that overfitting is not necessary for these attacks, and begins to shed light on what other factors may be in play. Finally, we explore the connection between two types of attack, membership inference and model inversion, and show that there are deep connections between the two that lead to effective new attacks.




1 Introduction

Machine learning has emerged as an important technology, enabling a wide range of applications including computer vision, machine translation, health analytics, and advertising, among others. The fact that many compelling applications of this technology involve the collection and processing of sensitive personal data has given rise to concerns about privacy [AtenieseFMSVV13, cormode-bayes, mi2015, fredrikson2014privacy, mitheory2016, Li2013, ShokriSS17, dptheory2016, brickell-utility]. In particular, when machine learning algorithms are applied to private training data, the resulting models might unwittingly leak information about that data through either their behavior (i.e., black-box attack) or the details of their structure (i.e., white-box attack).

Although there has been a significant amount of work aimed at developing machine learning algorithms that satisfy definitions such as differential privacy [dptheory2016, m-est, functional-mech, Guha13, Dwork2015-2, Dwork2015], the factors that bring about specific types of privacy risk in applications of standard machine learning algorithms are not well understood. Following the connection between differential privacy and stability from statistical learning theory [Guha13, Wang2016ERM, Dwork2015-2, Dwork2015, Bassily2014, Chaudhuri2011], one such factor that has started to emerge [ShokriSS17, fredrikson2014privacy] as a likely culprit is overfitting. A machine learning model is said to overfit to its training data when its performance on unseen test data diverges from the performance observed during training, i.e., its generalization error is large. The relationship between privacy risk and overfitting is further supported by recent results that suggest the contrapositive, i.e., under certain reasonable assumptions, differential privacy [Dwork2015-2] and related notions of privacy [Bassily2016, Wang2016KL] imply good generalization. However, a precise account of the connection between overfitting and the risk posed by different types of attack remains unknown.

A second factor identified as relevant to privacy risk is influence [mitheory2016], a quantity that arises often in the study of Boolean functions [OD14]. Influence measures the extent to which a particular input to a function is able to cause changes to its output. In the context of machine learning privacy, the influential features of a model may give an active attacker the ability to extract information by observing the changes they cause.

In this paper, we characterize the effect that overfitting and influence have on the advantage of adversaries who attempt to infer specific facts about the data used to train machine learning models. We formalize quantitative advantage measures that capture the privacy risk to training data posed by two types of attack, namely membership inference [Li2013, ShokriSS17] and attribute inference [fredrikson2014privacy, dptheory2016, mitheory2016, mi2015]. For each type of attack, we analyze the advantage in terms of generalization error (overfitting) and influence for several concrete black-box adversaries. While our analysis necessarily makes formal assumptions about the learning setting, we show that our analytic results hold on several real-world datasets by controlling for overfitting through regularization and model structure.

Membership inference

Training data membership inference attacks aim to determine whether a given data point was present in the training data used to build a model. Although this may not at first seem to pose a serious privacy risk, the threat is clear in settings such as health analytics where the distinction between case and control groups could reveal an individual’s sensitive conditions. This type of attack has been extensively studied in the adjacent area of genomics [homer08resolving, sankararaman2009genomic], and more recently in the context of machine learning [Li2013, ShokriSS17].

Our analysis shows a clear dependence of membership advantage on generalization error (Section 3.2), and in some cases the relationship is directly proportional (Theorem 2). Our experiments on real data confirm that this connection matters in practice (Section 6.2), even for models that do not conform to the formal assumptions of our analysis. In one set of experiments, we apply a particularly straightforward attack to deep convolutional neural networks (CNNs) using several datasets examined in prior work on membership inference. Despite requiring significantly less computation and adversarial background knowledge, our attack performs almost as well as a recently published attack [ShokriSS17].

Our results illustrate that overfitting is a sufficient condition for membership vulnerability in popular machine learning algorithms. However, it is not a necessary condition (Theorem 4). In fact, under certain assumptions that are commonly satisfied in practice, we show that a stable training algorithm (i.e., one that does not overfit) can be subverted so that the resulting model is nearly as stable but reveals exact membership information through its black-box behavior. This attack is suggestive of algorithm substitution attacks from cryptography [BPR14] and makes adversarial assumptions similar to those of other recent ML privacy attacks [SRS17]. We implement this construction to train deep CNNs (Section 6.4) and observe that, regardless of the model’s generalization behavior, the attacker can recover membership information while incurring very little penalty to predictive accuracy.

Attribute inference

In an attribute inference attack, the adversary uses a machine learning model and incomplete information about a data point to infer the missing information for that point. For example, in work by Fredrikson et al. [fredrikson2014privacy], the adversary is given partial information about an individual’s medical record and attempts to infer the individual’s genotype by using a model trained on similar medical records.

We formally characterize the advantage of an attribute inference adversary as its ability to infer a target feature given an incomplete point from the training data, relative to its ability to do so for points from the general population (Section 4). This approach is distinct from the way that attribute advantage has largely been characterized in prior work [fredrikson2014privacy, mi2015, mitheory2016], which prioritized empirically measuring advantage relative to a simulator who is not given access to the model. We offer an alternative definition of attribute advantage (Definition 6) that corresponds to this characterization and argue that it does not isolate the risk that the model poses specifically to individuals in the training data.

Our formal analysis shows that attribute inference, like membership inference, is indeed sensitive to overfitting. However, we find that influence must be factored in as well to understand when overfitting will lead to privacy risk (Section 4.1). Interestingly, the risk to individuals in the training data is greatest when these two factors are “in balance”. Regardless of how large the generalization error becomes, the attacker’s ability to learn more about the training data than the general population vanishes as influence increases.

Connection between membership and attribute inference

The two types of attack that we examine are deeply related. We build reductions between the two by assuming oracle access to either type of adversary. Then, we characterize each reduction’s advantage in terms of the oracle’s assumed advantage. Our results suggest that attribute inference may be “harder” than membership inference: attribute advantage implies membership advantage (Theorem 6), but there is currently no similar result in the opposite direction.

Our reductions are not merely of theoretical interest. Rather, they function as practical attacks as well. We implemented a reduction for attribute inference and evaluated it on real data (Section 6.3). Our results show that when generalization error is high, the reduction adversary can outperform an attribute inference attack given in [fredrikson2014privacy] by a significant margin.


This paper explores the relationships between privacy, overfitting, and influence in machine learning models. We present new formalizations of membership and attribute inference attacks that enable an analysis of the privacy risk that black-box variants of these attacks pose to individuals in the training data. We give analytic quantities for the attacker’s performance in terms of generalization error and influence, which allow us to conclude that certain configurations imply privacy risk. By introducing a new type of membership inference attack in which a stable training algorithm is replaced by a malicious variant, we find that the converse does not hold: machine learning models can pose immediate threats to privacy without overfitting. Finally, we study the underlying connections between membership and attribute inference attacks, finding surprising relationships that give insight into the relative difficulty of the attacks and lead to new attacks that work well on real data.

2 Background

Throughout the paper we focus on privacy risks related to machine learning algorithms. We begin by introducing basic notation and concepts from learning theory.

2.1 Notation and preliminaries

Let $z = (x, y) \in \mathbf{X} \times \mathbf{Y}$ be a data point, where $x$ represents a set of features or attributes and $y$ a response. In a typical machine learning setting, and thus throughout this paper, it is assumed that the features $x$ are given as input to the model, and the response $y$ is returned. Let $\mathcal{D}$ represent a distribution of data points, and let $S \sim \mathcal{D}^n$ be an ordered list of $n$ points, which we will refer to as a dataset, training set, or training data interchangeably, sampled i.i.d. from $\mathcal{D}$. We will frequently make use of the following methods of sampling a data point $z$:

  • $z \sim S$: $i$ is picked uniformly at random from $[n]$, and $z$ is set equal to the $i$-th element of $S$.

  • $z \sim \mathcal{D}$: $z$ is chosen according to the distribution $\mathcal{D}$.

When it is clear from the context, we will refer to these sampling methods as sampling from the dataset and sampling from the distribution, respectively.

Unless stated otherwise, our results pertain to the standard machine learning setting, wherein a model $A_S$ is obtained by applying a machine learning algorithm $A$ to a dataset $S$. Models reside in the set $\mathbf{X} \to \mathbf{Y}$ and are assumed to approximately minimize the expected value of a loss function $\ell$ over $S$. If $z = (x, y)$, the loss function $\ell(A_S, z)$ measures how much $A_S(x)$ differs from $y$. When the response domain is discrete, it is common to use the 0-1 loss function, which satisfies $\ell(A_S, z) = 0$ if $y = A_S(x)$ and $\ell(A_S, z) = 1$ otherwise. When the response is continuous, we use the squared-error loss $\ell(A_S, z) = (y - A_S(x))^2$. Additionally, it is common for many types of models to assume that $y$ is normally distributed in some way. For example, linear regression assumes that $y$ is normally distributed given $x$ [Murphy2012]. To analyze these cases, we use the error function $\mathrm{erf}$, which is defined in Equation 1.

$$\mathrm{erf}(x) = \frac{1}{\sqrt{\pi}} \int_{-x}^{x} e^{-t^2} \, dt \tag{1}$$

Intuitively, if a random variable $\epsilon$ is normally distributed and $x \ge 0$, then $\mathrm{erf}(x / \sqrt{2})$ represents the probability that $\epsilon$ is within $x$ standard deviations of the mean.
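As a quick numerical check of the intuition above, the following sketch compares the error function against the normal CDF. The function names are illustrative and not part of the paper; the mean and standard deviation passed to the cross-check are arbitrary.

```python
import math
from statistics import NormalDist

def prob_within(x: float) -> float:
    """P(|eps - mean| < x * sigma) for a normally distributed eps, via erf."""
    return math.erf(x / math.sqrt(2))

def prob_within_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """The same probability computed from the normal CDF, as a cross-check."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + x * sigma) - dist.cdf(mu - x * sigma)

# The result does not depend on the (arbitrary) mean and standard deviation.
for x in (1.0, 2.0, 3.0):
    print(x, round(prob_within(x), 4), round(prob_within_cdf(x, mu=5.0, sigma=2.0), 4))
```

The two columns agree, which is the familiar 68/95/99.7 rule expressed through $\mathrm{erf}$.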

2.2 Stability and generalization

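An algorithm is stable if a small change to its input causes limited change in its output. In the context of machine learning, the algorithm in question is typically a training algorithm $A$, and the “small change” corresponds to the replacement of a single data point in $S$. This is made precise in Definition 1.

Definition 1 (On-Average-Replace-One (ARO) Stability).

Given $S = (z_1, \ldots, z_n) \sim \mathcal{D}^n$ and an additional point $z' \sim \mathcal{D}$, define $S^{(i)} = (z_1, \ldots, z_{i-1}, z', z_{i+1}, \ldots, z_n)$. Let $\epsilon_{\mathrm{stable}} : \mathbb{N} \to \mathbb{R}$ be a monotonically decreasing function. Then a training algorithm $A$ is on-average-replace-one-stable (or ARO-stable) on loss function $\ell$ with rate $\epsilon_{\mathrm{stable}}(n)$ if

$$\mathbb{E}_{S \sim \mathcal{D}^n,\, z' \sim \mathcal{D},\, i \sim U(n),\, A}\left[\ell(A_{S^{(i)}}, z_i) - \ell(A_S, z_i)\right] \le \epsilon_{\mathrm{stable}}(n),$$

where $A$ in the expectation refers to the randomness used by the training algorithm.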

Stability is closely related to the popular notion of differential privacy [dwork06] given in Definition 2.

Definition 2 (Differential privacy).

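An algorithm $A : \mathbf{X}^n \to \mathbf{Y}$ satisfies $\epsilon$-differential privacy if for all $S, S' \in \mathbf{X}^n$ that differ in the value at a single index $i \in [n]$ and all $Y \subseteq \mathbf{Y}$, the following holds:

$$\Pr[A(S) \in Y] \le e^{\epsilon} \Pr[A(S') \in Y].$$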

When a learning algorithm is not stable, the models that it produces might overfit to the training data. Overfitting is characterized by large generalization error, which is defined below.

Definition 3 (Average generalization error).

The average generalization error of a machine learning algorithm $A$ on $\mathcal{D}$ is defined as

$$R_{\mathrm{gen}}(A, n, \mathcal{D}, \ell) = \mathbb{E}_{S \sim \mathcal{D}^n,\, z \sim \mathcal{D}}\left[\ell(A_S, z)\right] - \mathbb{E}_{S \sim \mathcal{D}^n,\, z \sim S}\left[\ell(A_S, z)\right].$$

In other words, $A_S$ overfits if its expected loss on samples drawn from $\mathcal{D}$ is much greater than its expected loss on its training set. For brevity, when $n$, $\mathcal{D}$, and $\ell$ are unambiguous from the context, we will write $R_{\mathrm{gen}}(A)$ instead.

It is important to note that Definition 3 describes the average generalization error over all training sets, as contrasted with another common definition of generalization error $\mathbb{E}_{z \sim \mathcal{D}}[\ell(A_S, z)] - \frac{1}{n} \sum_{z \in S} \ell(A_S, z)$, which holds the training set fixed. The connection between average generalization and stability is formalized by Shalev-Shwartz et al. [ShalevShwartz10], who show that an algorithm’s ability to achieve a given generalization error (as a function of $n$) is equivalent to its ARO-stability rate.
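The average generalization error can be estimated by Monte Carlo: repeatedly sample a training set, train, and compare the loss on fresh points with the loss on the training points. The sketch below does this for two hypothetical toy learners on a made-up distribution (an ID-like feature and a fair-coin response); none of these names come from the paper.

```python
import random

def zero_one_loss(model, z):
    """0-1 loss: 0 if the model predicts the response correctly, 1 otherwise."""
    x, y = z
    return 0.0 if model(x) == y else 1.0

def average_generalization_error(train, sample_point, n, loss,
                                 trials=200, test_points=200, seed=0):
    """Estimate E[loss on fresh points] - E[loss on training points],
    averaged over freshly sampled training sets of size n."""
    rng = random.Random(seed)
    total_gap = 0.0
    for _ in range(trials):
        S = [sample_point(rng) for _ in range(n)]
        model = train(S)
        train_loss = sum(loss(model, z) for z in S) / n
        fresh = [sample_point(rng) for _ in range(test_points)]
        test_loss = sum(loss(model, z) for z in fresh) / test_points
        total_gap += test_loss - train_loss
    return total_gap / trials

def sample_point(rng):
    """Assumed toy distribution: x is an ID from a huge domain, y is a fair coin."""
    return (rng.randrange(10**9), rng.randrange(2))

def memorizer(S):
    """Overfits completely: recalls training labels, predicts 0 elsewhere."""
    table = dict(S)
    return lambda x: table.get(x, 0)

def constant(S):
    """Ignores the data, so training and test loss coincide on average."""
    return lambda x: 0

print(average_generalization_error(memorizer, sample_point, 20, zero_one_loss))
print(average_generalization_error(constant, sample_point, 20, zero_one_loss))
```

The memorizing learner shows a gap near 0.5 (zero training loss, chance-level test loss), while the constant learner shows a gap near 0.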

3 Membership Inference Attacks

In a membership inference attack, the adversary attempts to infer whether a specific point was included in the dataset used to train a given model. The adversary is given a data point $z = (x, y)$, access to a model $A_S$, the size of the model’s training set $|S| = n$, and the distribution $\mathcal{D}$ that the training set was drawn from. With this information the adversary must decide whether $z \in S$. For the purposes of this discussion, we do not distinguish whether the adversary $\mathcal{A}$’s access to $A_S$ is “black-box”, i.e., consisting only of input/output queries, or “white-box”, i.e., involving the internal structure of the model itself. However, all of the attacks presented in this section assume black-box access.

Experiment 1 below formalizes membership inference attacks. The experiment first samples a fresh dataset from $\mathcal{D}$ and then flips a coin $b$ to decide whether to draw the adversary’s challenge point $z$ from the training set or the original distribution. $\mathcal{A}$ is then given the challenge, along with the additional information described above, and must guess the value of $b$.

Experiment 1 (Membership experiment $\mathsf{Exp}^{M}(\mathcal{A}, A, n, \mathcal{D})$).

Let $\mathcal{A}$ be an adversary, $A$ be a learning algorithm, $n$ be a positive integer, and $\mathcal{D}$ be a distribution over data points $(x, y)$. The membership experiment proceeds as follows:

  1. Sample $S \sim \mathcal{D}^n$, and let $A_S = A(S)$.

  2. Choose $b \leftarrow \{0, 1\}$ uniformly at random.

  3. Draw $z \sim S$ if $b = 0$, or $z \sim \mathcal{D}$ if $b = 1$.

  4. $\mathsf{Exp}^{M}(\mathcal{A}, A, n, \mathcal{D})$ is 1 if $\mathcal{A}(z, A_S, n, \mathcal{D}) = b$ and 0 otherwise. $\mathcal{A}$ must output either 0 or 1.

Definition 4 (Membership advantage).

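The membership advantage of $\mathcal{A}$ is defined as

$$\mathsf{Adv}^{M}(\mathcal{A}, A, n, \mathcal{D}) = 2 \Pr[\mathsf{Exp}^{M}(\mathcal{A}, A, n, \mathcal{D}) = 1] - 1,$$

where the probabilities are taken over the coin flips of $\mathcal{A}$, the random choices of $S$ and $b$, and the random data point $z \sim S$ or $z \sim \mathcal{D}$.

Equivalently, the right-hand side can be expressed as the difference between $\mathcal{A}$’s true and false positive rates

$$\mathsf{Adv}^{M} = \Pr[\mathcal{A} = 0 \mid b = 0] - \Pr[\mathcal{A} = 0 \mid b = 1], \tag{2}$$

where $\mathsf{Adv}^{M}$ is a shortcut for $\mathsf{Adv}^{M}(\mathcal{A}, A, n, \mathcal{D})$.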
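The membership experiment and its advantage measure can be simulated directly. The sketch below assumes the encoding in which the coin value 0 means "challenge drawn from the training set" and the adversary outputs 0 to claim membership; the toy distribution, learner, and adversaries are hypothetical stand-ins used only to exercise the definition.

```python
import random

def membership_experiment(adversary, train, sample_point, n, rng):
    """One run of the experiment; returns (b, adversary's guess)."""
    S = [sample_point(rng) for _ in range(n)]            # sample a fresh dataset
    model = train(S)                                     # train on it
    b = rng.randrange(2)                                 # fair coin
    z = rng.choice(S) if b == 0 else sample_point(rng)   # assumed: b = 0 means "from S"
    return b, adversary(z, model)

def membership_advantage(adversary, train, sample_point, n, trials=5000, seed=0):
    """Estimate Pr[guess = 0 | b = 0] - Pr[guess = 0 | b = 1]."""
    rng = random.Random(seed)
    said_zero, runs = [0, 0], [0, 0]
    for _ in range(trials):
        b, guess = membership_experiment(adversary, train, sample_point, n, rng)
        runs[b] += 1
        said_zero[b] += (guess == 0)
    return said_zero[0] / runs[0] - said_zero[1] / runs[1]

def sample_point(rng):
    """Assumed toy distribution: ID-like feature, fair-coin response."""
    return (rng.randrange(10**9), rng.randrange(2))

def memorizer(S):
    """Toy learner that recalls training labels and predicts 0 elsewhere."""
    table = dict(S)
    return lambda x: table.get(x, 0)

def label_check_adversary(z, model):
    """Claims membership (0) exactly when the model reproduces the label."""
    x, y = z
    return 0 if model(x) == y else 1

def always_member_adversary(z, model):
    """Ignores the model; by construction it has no advantage."""
    return 0

print(membership_advantage(label_check_adversary, memorizer, sample_point, 20))
print(membership_advantage(always_member_adversary, memorizer, sample_point, 20))
```

On this toy setup the label-checking adversary reaches an advantage near 0.5, while the adversary that ignores the model has advantage exactly 0.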

Using Experiment 1, Definition 4 gives an advantage measure that characterizes how well an adversary can distinguish between $z \sim S$ and $z \sim \mathcal{D}$ after being given the model. This is slightly different from the sort of membership inference described in some prior work [ShokriSS17, Li2013], which distinguishes between $z \sim S$ and $z \sim \mathcal{D} \setminus S$. We are interested in measuring the degree to which $A_S$ reveals membership to $\mathcal{A}$, and not in the degree to which any background knowledge of $S$ or $\mathcal{D}$ does. If we sample $z$ from $\mathcal{D} \setminus S$ instead of $\mathcal{D}$, the adversary could gain advantage by noting which data points are more likely to have been sampled into $S$. This does not reflect how leaky the model is, and Definition 4 rules it out.

In fact, the only way to gain advantage is through access to the model. In the membership experiment $\mathsf{Exp}^{M}(\mathcal{A}, A, n, \mathcal{D})$, the adversary $\mathcal{A}$ must determine the value of $b$ by using $z$, $A_S$, $n$, and $\mathcal{D}$. Of these inputs, $n$ and $\mathcal{D}$ do not depend on $b$, and we have the following for all $z$:

$$\Pr[b = 0 \mid z] = \frac{\Pr_{S \sim \mathcal{D}^n,\, z \sim S}[z] \, \Pr[b = 0]}{\Pr[z]} = \frac{\Pr_{z \sim \mathcal{D}}[z] \, \Pr[b = 1]}{\Pr[z]} = \Pr[b = 1 \mid z].$$

We note that Definition 4 does not give the adversary credit for predicting that a point drawn from $\mathcal{D}$ (i.e., when $b = 1$), which also happens to be in $S$, is a member of $S$. As a result, the maximum advantage that an adversary can hope to achieve is $1 - \mu(n, \mathcal{D})$, where $\mu(n, \mathcal{D}) = \Pr_{S \sim \mathcal{D}^n,\, z \sim \mathcal{D}}[z \in S]$ is the probability of re-sampling an individual from the training set into the general population. In real settings $\mu(n, \mathcal{D})$ is likely to be exceedingly small, so this is not an issue in practice.

3.1 Bounds from differential privacy

Our first result (Theorem 1) bounds the advantage of an adversary who attempts a membership attack on a differentially private model [dwork06]. Differential privacy imposes strict limits on the degree to which any point in the training data can affect the outcome of a computation, and it is commonly understood that differential privacy will limit membership inference attacks. Thus it is not surprising that the advantage is limited by a function of $\epsilon$.

Theorem 1.

Let $A$ be an $\epsilon$-differentially private learning algorithm and $\mathcal{A}$ be a membership adversary. Then we have:

$$\mathsf{Adv}^{M}(\mathcal{A}, A, n, \mathcal{D}) \le e^{\epsilon} - 1.$$


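Proof. Given $S = (z_1, \ldots, z_n) \sim \mathcal{D}^n$ and an additional point $z' \sim \mathcal{D}$, define $S^{(i)} = (z_1, \ldots, z_{i-1}, z', z_{i+1}, \ldots, z_n)$. Then, $\mathcal{A}(z', A_S, n, \mathcal{D})$ and $\mathcal{A}(z_i, A_{S^{(i)}}, n, \mathcal{D})$ have identical distributions for all $i \in [n]$, so we can write:

$$\Pr[\mathcal{A} = 0 \mid b = 0] = 1 - \mathbb{E}\left[\frac{1}{n} \sum_{i=1}^{n} \mathcal{A}(z_i, A_S, n, \mathcal{D})\right], \qquad \Pr[\mathcal{A} = 0 \mid b = 1] = 1 - \mathbb{E}\left[\frac{1}{n} \sum_{i=1}^{n} \mathcal{A}(z_i, A_{S^{(i)}}, n, \mathcal{D})\right].$$

The above two equalities, combined with Equation 2, give:

$$\mathsf{Adv}^{M} = \mathbb{E}\left[\frac{1}{n} \sum_{i=1}^{n} \left(\mathcal{A}(z_i, A_{S^{(i)}}, n, \mathcal{D}) - \mathcal{A}(z_i, A_S, n, \mathcal{D})\right)\right]. \tag{3}$$

Without loss of generality for the case where models reside in an infinite domain, assume that the models produced by $A$ come from the set $\{A^1, \ldots, A^k\}$. Differential privacy guarantees that for all $j \in [k]$,

$$\Pr[A_{S^{(i)}} = A^j] \le e^{\epsilon} \Pr[A_S = A^j].$$

Using this inequality, we can rewrite and bound the right-hand side of Equation 3 as

$$\sum_{j=1}^{k} \mathbb{E}\left[\frac{1}{n} \sum_{i=1}^{n} \left(\Pr[A_{S^{(i)}} = A^j] - \Pr[A_S = A^j]\right) \mathcal{A}(z_i, A^j, n, \mathcal{D})\right] \le \sum_{j=1}^{k} \mathbb{E}\left[\frac{1}{n} \sum_{i=1}^{n} (e^{\epsilon} - 1) \Pr[A_S = A^j] \, \mathcal{A}(z_i, A^j, n, \mathcal{D})\right],$$

which is at most $e^{\epsilon} - 1$ since $\mathcal{A}(z, A^j, n, \mathcal{D}) \le 1$ for any $z$, $A^j$, $n$, and $\mathcal{D}$. ∎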

Wu et al. [dptheory2016, Section 3.2] present an algorithm that is differentially private as long as the loss function is strongly convex and Lipschitz. Moreover, they prove that the performance of the resulting model is close to the optimal. Combined with Theorem 1, this provides us with a bound on membership advantage when the loss function is strongly convex and Lipschitz.

3.2 Membership attacks and generalization

In this section, we consider several membership attacks that make few, common assumptions about the model $A_S$ or the distribution $\mathcal{D}$. Importantly, these assumptions are consistent with many natural learning techniques widely used in practice.

For each attack, we express the advantage of the attacker as a function of the extent of the overfitting, thereby showing that the generalization behavior of the model is a strong predictor for vulnerability to membership inference attacks. In Section 6.2, we demonstrate that these relationships often hold in practice on real data, even when the assumptions used in our analysis do not hold.

Bounded loss function

We begin with a straightforward attack that makes only one simple assumption: the loss function is bounded by some constant $B$. Then, with probability proportional to the model’s loss at the query point $z$, the adversary predicts that $z$ is not in the training set. The attack is formalized in Adversary 1.

Adversary 1 (Bounded loss function).

Suppose $\ell(A_S, z) \le B$ for some constant $B$, all $S \sim \mathcal{D}^n$, and all $z$ sampled from $S$ or $\mathcal{D}$. Then, on input $z = (x, y)$, $A_S$, $n$, and $\mathcal{D}$, the membership adversary $\mathcal{A}$ proceeds as follows:

  1. Query the model to get $A_S(x)$.

  2. Output 1 with probability $\ell(A_S, z) / B$. Else, output 0.

Theorem 2 states that the membership advantage of this approach is proportional to the generalization error of $A$, showing that advantage and generalization error are closely related in many common learning settings. In particular, classification settings, where the 0-1 loss function is commonly used, yield membership advantage equal to the generalization error. Simply put, high generalization error necessarily results in privacy loss for classification models.

Theorem 2.

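The advantage of Adversary 1 is $R_{\mathrm{gen}}(A) / B$.


The proof is as follows:

$$\begin{aligned} \mathsf{Adv}^{M}(\mathcal{A}, A, n, \mathcal{D}) &= \Pr[\mathcal{A} = 0 \mid b = 0] - \Pr[\mathcal{A} = 0 \mid b = 1] \\ &= \Pr[\mathcal{A} = 1 \mid b = 1] - \Pr[\mathcal{A} = 1 \mid b = 0] \\ &= \mathbb{E}\left[\frac{\ell(A_S, z)}{B} \,\middle|\, b = 1\right] - \mathbb{E}\left[\frac{\ell(A_S, z)}{B} \,\middle|\, b = 0\right] \\ &= \frac{1}{B}\left(\mathbb{E}_{S \sim \mathcal{D}^n,\, z \sim \mathcal{D}}[\ell(A_S, z)] - \mathbb{E}_{S \sim \mathcal{D}^n,\, z \sim S}[\ell(A_S, z)]\right) \\ &= R_{\mathrm{gen}}(A) / B. \qquad ∎ \end{aligned}$$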
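The proportionality between the advantage of Adversary 1 and the generalization error can be checked empirically. The sketch below uses a made-up regression task and a hypothetical learner that half-memorizes its training responses; it assumes the adversary outputs 1 to mean "not a training point" and that the coin value 0 means the challenge came from the training set.

```python
import random

B = 1.0  # assumed bound on the loss

def sample_point(rng):
    """Assumed toy distribution: ID-like feature, uniform response in [0, 1)."""
    return (rng.randrange(10**9), rng.random())

def train(S):
    """Toy learner that half-memorizes training responses, predicts 0.5 elsewhere."""
    table = {x: 0.5 + 0.5 * (y - 0.5) for x, y in S}
    return lambda x: table.get(x, 0.5)

def loss(model, z):
    """Squared error; at most 0.25 here, so it respects the bound B."""
    x, y = z
    return (y - model(x)) ** 2

def bounded_loss_adversary(z, model, rng):
    """Output 1 (guess: not a training point) with probability loss / B, else 0."""
    return 1 if rng.random() < loss(model, z) / B else 0

def compare(n=10, trials=40000, seed=0):
    """Return (empirical advantage, empirical generalization error)."""
    rng = random.Random(seed)
    runs, said_one, total_loss = [0, 0], [0, 0], [0.0, 0.0]
    for _ in range(trials):
        S = [sample_point(rng) for _ in range(n)]
        model = train(S)
        b = rng.randrange(2)
        z = rng.choice(S) if b == 0 else sample_point(rng)
        runs[b] += 1
        said_one[b] += bounded_loss_adversary(z, model, rng)
        total_loss[b] += loss(model, z)
    advantage = said_one[1] / runs[1] - said_one[0] / runs[0]
    gen_error = total_loss[1] / runs[1] - total_loss[0] / runs[0]
    return advantage, gen_error

advantage, gen_error = compare()
print(round(advantage, 3), round(gen_error / B, 3))
```

For this toy learner the expected test loss is 1/12 and the expected training loss is 1/48, so both printed numbers should be close to 1/16.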

Gaussian error

Whenever the adversary knows the exact error distribution, it can simply compute which value of $b$ is more likely given the error of the model on $z$. This adversary is described formally in Adversary 2. While it may seem far-fetched to assume that the adversary knows the exact error distribution, linear regression models implicitly assume that the error of the model is normally distributed. In addition, the standard errors $\sigma_S$, $\sigma_{\mathcal{D}}$ of the model on $S$ and $\mathcal{D}$, respectively, are often published with the model, giving the adversary full knowledge of the error distribution. We will describe in Section 3.3 how the adversary can proceed if it does not know one or both of these values.

Adversary 2 (Threshold).

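Suppose $f(\epsilon \mid b = 0)$ and $f(\epsilon \mid b = 1)$, the conditional probability density functions of the error, are known in advance. Then, on input $z = (x, y)$, $A_S$, $n$, and $\mathcal{D}$, the membership adversary $\mathcal{A}$ proceeds as follows:

  1. Query the model to get $A_S(x)$.

  2. Let $\epsilon = y - A_S(x)$. Output $\arg\max_{b \in \{0, 1\}} f(\epsilon \mid b)$.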
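A minimal sketch of the threshold adversary for the Gaussian case follows. It assumes zero-mean normal errors with known standard deviations (called `sigma_S` for training points and `sigma_D` for fresh points here), and that output 0 corresponds to "training point"; the model and the query points are made up for illustration.

```python
import math

def normal_pdf(eps: float, sigma: float) -> float:
    """Density of a zero-mean normal distribution with standard deviation sigma."""
    return math.exp(-eps * eps / (2.0 * sigma * sigma)) / (sigma * math.sqrt(2.0 * math.pi))

def threshold_adversary(z, model, sigma_S: float, sigma_D: float) -> int:
    """Return the value of b whose error density is larger at the observed error."""
    x, y = z
    eps = y - model(x)                      # query the model, compute the error
    density_if_member = normal_pdf(eps, sigma_S)
    density_if_fresh = normal_pdf(eps, sigma_D)
    return 0 if density_if_member >= density_if_fresh else 1

# Hypothetical model and query points.
model = lambda x: 2.0 * x
print(threshold_adversary((1.0, 2.1), model, sigma_S=0.5, sigma_D=2.0))  # small error
print(threshold_adversary((1.0, 5.0), model, sigma_S=0.5, sigma_D=2.0))  # large error
```

A small error is more plausible under the narrower training-error distribution, so the first query is labeled a member and the second is not.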

In regression problems that use squared-error loss, the magnitude of the generalization error depends on the scale of the response $y$. For this reason, in the following we use the ratio $\sigma_{\mathcal{D}} / \sigma_S$ to measure generalization error. Theorem 3 characterizes the advantage of this adversary in the case of Gaussian error in terms of $\sigma_{\mathcal{D}} / \sigma_S$. As one might expect, this advantage is 0 when $\sigma_S = \sigma_{\mathcal{D}}$ and approaches 1 as $\sigma_{\mathcal{D}} / \sigma_S \to \infty$. The dotted line in Figure 1(a) shows the graph of the advantage as a function of $\sigma_{\mathcal{D}} / \sigma_S$.

Theorem 3.

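Suppose $\sigma_S$ and $\sigma_{\mathcal{D}}$ are known in advance such that $\epsilon \sim N(0, \sigma_S^2)$ when $b = 0$ and $\epsilon \sim N(0, \sigma_{\mathcal{D}}^2)$ when $b = 1$. Then, the advantage of Membership Adversary 2 is

$$\mathrm{erf}\left(\frac{\sigma_{\mathcal{D}}}{\sigma_S} \sqrt{\frac{\ln(\sigma_{\mathcal{D}} / \sigma_S)}{(\sigma_{\mathcal{D}} / \sigma_S)^2 - 1}}\right) - \mathrm{erf}\left(\sqrt{\frac{\ln(\sigma_{\mathcal{D}} / \sigma_S)}{(\sigma_{\mathcal{D}} / \sigma_S)^2 - 1}}\right).$$

Proof. We have

$$f(\epsilon \mid b = 0) = \frac{1}{\sqrt{2\pi}\,\sigma_S} e^{-\epsilon^2 / 2\sigma_S^2}, \qquad f(\epsilon \mid b = 1) = \frac{1}{\sqrt{2\pi}\,\sigma_{\mathcal{D}}} e^{-\epsilon^2 / 2\sigma_{\mathcal{D}}^2}.$$

Let $\pm\epsilon_{\mathrm{eq}}$ be the points at which these two probability density functions are equal. Some algebraic manipulation shows that

$$\epsilon_{\mathrm{eq}} = \sigma_{\mathcal{D}} \sqrt{\frac{2 \ln(\sigma_{\mathcal{D}} / \sigma_S)}{(\sigma_{\mathcal{D}} / \sigma_S)^2 - 1}}. \tag{4}$$

Moreover, if $\sigma_S < \sigma_{\mathcal{D}}$, $f(\epsilon \mid b = 0) > f(\epsilon \mid b = 1)$ if and only if $|\epsilon| < \epsilon_{\mathrm{eq}}$. Therefore, the membership advantage is

$$\begin{aligned} \mathsf{Adv}^{M} &= \Pr[|\epsilon| < \epsilon_{\mathrm{eq}} \mid b = 0] - \Pr[|\epsilon| < \epsilon_{\mathrm{eq}} \mid b = 1] \\ &= \mathrm{erf}\left(\frac{\epsilon_{\mathrm{eq}}}{\sqrt{2}\,\sigma_S}\right) - \mathrm{erf}\left(\frac{\epsilon_{\mathrm{eq}}}{\sqrt{2}\,\sigma_{\mathcal{D}}}\right) \\ &= \mathrm{erf}\left(\frac{\sigma_{\mathcal{D}}}{\sigma_S} \sqrt{\frac{\ln(\sigma_{\mathcal{D}} / \sigma_S)}{(\sigma_{\mathcal{D}} / \sigma_S)^2 - 1}}\right) - \mathrm{erf}\left(\sqrt{\frac{\ln(\sigma_{\mathcal{D}} / \sigma_S)}{(\sigma_{\mathcal{D}} / \sigma_S)^2 - 1}}\right). \qquad ∎ \end{aligned}$$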
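The Gaussian-error advantage can be evaluated numerically. The closed form in the sketch below is derived here from the stated assumptions (zero-mean normal errors with standard deviation `sigma_S` on training points and a larger `sigma_D` on fresh points, and a membership guess whenever the training density is the larger one); it is offered as an illustration and is cross-checked against a simulation rather than taken as authoritative.

```python
import math
import random

def equal_density_point(sigma_S: float, sigma_D: float) -> float:
    """Positive error magnitude at which the two normal densities coincide."""
    r = sigma_D / sigma_S
    return sigma_D * math.sqrt(2.0 * math.log(r) / (r * r - 1.0))

def gaussian_membership_advantage(sigma_S: float, sigma_D: float) -> float:
    """P(|eps| < threshold | training point) - P(|eps| < threshold | fresh point)."""
    if sigma_D <= sigma_S:
        raise ValueError("this sketch assumes sigma_D > sigma_S")
    t = equal_density_point(sigma_S, sigma_D)
    return math.erf(t / (math.sqrt(2.0) * sigma_S)) - math.erf(t / (math.sqrt(2.0) * sigma_D))

def simulated_advantage(sigma_S: float, sigma_D: float, trials: int = 100000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity, as a sanity check."""
    rng = random.Random(seed)
    t = equal_density_point(sigma_S, sigma_D)
    hits_member = sum(abs(rng.gauss(0.0, sigma_S)) < t for _ in range(trials))
    hits_fresh = sum(abs(rng.gauss(0.0, sigma_D)) < t for _ in range(trials))
    return (hits_member - hits_fresh) / trials

for ratio in (1.5, 2.0, 4.0, 10.0):
    print(ratio, round(gaussian_membership_advantage(1.0, ratio), 4))
```

The printed values grow with the ratio of the two standard deviations, staying near 0 when the ratio is close to 1 and approaching 1 for very large ratios.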

3.3 Unknown standard error

In practice, models are often published with just one value of standard error, so the adversary often does not know how $\sigma_{\mathcal{D}}$ compares to $\sigma_S$. One solution to this issue is to assume that $\sigma_S \approx \sigma_{\mathcal{D}}$, i.e., that the model does not terribly overfit. Then, the threshold is set at $|\epsilon| = \sigma_S$, which is the limit of the right-hand side of Equation 4 as $\sigma_{\mathcal{D}}$ approaches $\sigma_S$. Then, the membership advantage is $\mathrm{erf}(1 / \sqrt{2}) - \mathrm{erf}(\sigma_S / \sqrt{2}\,\sigma_{\mathcal{D}})$. This expression is graphed in Figure 1(b) as a function of $\sigma_{\mathcal{D}} / \sigma_S$.

Alternatively, if the adversary knows which machine learning algorithm was used, it can repeatedly sample $S \sim \mathcal{D}^n$, train the model $A_S$ using the sampled $S$, and measure the error of the model to arrive at reasonably close approximations of $\sigma_S$ and $\sigma_{\mathcal{D}}$.
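For the first option above (a single published standard error), the advantage can be computed in a few lines. This sketch assumes zero-mean normal errors and an adversary that guesses "member" exactly when the absolute error is below the training standard error `sigma_S`; the parameter names are illustrative.

```python
import math

def advantage_fixed_threshold(sigma_S: float, sigma_D: float) -> float:
    """Advantage when the adversary guesses 'member' iff |error| < sigma_S."""
    p_member = math.erf(1.0 / math.sqrt(2.0))                     # P(|eps| < sigma_S | training point)
    p_fresh = math.erf(sigma_S / (math.sqrt(2.0) * sigma_D))      # same event for a fresh point
    return p_member - p_fresh

for ratio in (1.0, 1.5, 2.0, 4.0):
    print(ratio, round(advantage_fixed_threshold(1.0, ratio), 4))
```

The advantage is 0 when the two standard errors coincide and increases as the error on fresh points grows relative to the training error.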

3.4 Other sources of membership advantage

The results in the preceding sections show that overfitting is sufficient for membership advantage. However, models can leak information about the training set in other ways, and thus overfitting is not necessary for membership advantage. For example, the learning rule can produce models that simply output a lossless encoding of the training dataset. This example may seem unconvincing for several reasons: the leakage is obvious, and the “encoded” dataset may not function well as a model. In the rest of this section, we present a pair of colluding training algorithm and adversary that does not have the above issues but still allows the attacker to learn the training set almost perfectly. This is in the framework of an algorithm substitution attack (ASA) [BPR14], where the target algorithm, which is implemented by closed-source software, is subverted to allow a colluding adversary to violate the privacy of the users of the algorithm. All the while, this subversion remains impossible to detect. Algorithm 1 and Adversary 3 represent a similar security threat for learning rules with bounded loss function. While the attack presented here is not impossible to detect, on points drawn from $\mathcal{D}$, the black-box behavior of the subverted model is similar to that of an unsubverted model.

The main result is given in Theorem 4, which shows that any ARO-stable learning rule $A$, with a bounded loss function operating on a finite domain, can be modified into a vulnerable learning rule $A^k$, where $k$ is a parameter. Moreover, subject to our assumption from before that $\mu(n, \mathcal{D})$ is very small, the stability rate of the vulnerable model $A^k$ is not far from that of $A$, and for each $A^k$ there exists a membership adversary whose advantage is negligibly far (in $k$) from the maximum advantage possible on $\mathcal{D}$. Simply put, it is often possible to find a suitably leaky version of an ARO-stable learning rule whose generalization behavior is close to that of the original.

Theorem 4.

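Let $d = \log |\mathbf{X}|$, $m = \log |\mathbf{Y}|$, $\ell$ be a loss function bounded by some constant $B$, $A$ be an ARO-stable learning rule with rate $\epsilon_{\mathrm{stable}}(n)$, and suppose that $x$ uniquely determines the point $(x, y)$ in $\mathcal{D}$. Then for any integer $k > 0$, there exists an ARO-stable learning rule $A^k$ with rate at most $\epsilon_{\mathrm{stable}}(n) + k n B 2^{-d} + \mu(n, \mathcal{D})$ and adversary $\mathcal{A}$ such that:

$$\mathsf{Adv}^{M}(\mathcal{A}, A^k, n, \mathcal{D}) = 1 - \mu(n, \mathcal{D}) - 2^{-mk}.$$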

The proof of Theorem 4 involves constructing a learning rule $A^k$ that leaks precise membership information when queried in a particular way but is otherwise identical to $A$. $A^k$ assumes that the adversary has knowledge of a secret key that is used to select pseudorandom functions that define the “special” queries used to extract membership information. In this way, the normal behavior of the model remains largely unchanged, making $A^k$ approximately as stable as $A$, but the learning algorithm and adversary “collude” to leak information through the model. We require the features $x$ to fully determine $y$ to avoid collisions when the adversary queries the model, which would result in false positives. In practice, many learning problems satisfy this criterion. Algorithm 1 and Adversary 3 illustrate the key ideas in this construction informally.

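Algorithm 1 (Colluding training algorithm).

Let $F_K : \mathbf{X} \to \mathbf{X}$ and $G_K : \mathbf{X} \to \mathbf{Y}$ be keyed pseudorandom functions, $K_1, \ldots, K_k$ be uniformly chosen keys, and $A$ be a training algorithm. On receiving a training set $S$, the colluding training algorithm proceeds as follows:

  1. Supplement $S$ using $F$, $G$: for all $(x_i, y_i) \in S$ and $j \in [k]$, let $z'_{i,j} = (F_{K_j}(x_i), G_{K_j}(x_i))$, and set $S' = S \cup \{z'_{i,j} \mid i \in [n], j \in [k]\}$.

  2. Return $A_{S'} = A(S')$.

Adversary 3 (Colluding adversary).

Let $F_K : \mathbf{X} \to \mathbf{X}$, $G_K : \mathbf{X} \to \mathbf{Y}$ and $K_1, \ldots, K_k$ be the functions and keys used by Algorithm 1, and $A_{S'}$ be the product of training with Algorithm 1 with those keys. On input $z = (x, y)$, the adversary proceeds as follows:

  1. For $j \in [k]$, let $y'_j \leftarrow A_{S'}(F_{K_j}(x))$.

  2. Output 0 if $y'_j = G_{K_j}(x)$ for all $j \in [k]$. Else, output 1.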
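The colluding pair can be sketched in a few lines. This sketch makes several assumptions that are not in the text: HMAC-SHA256 stands in for the keyed pseudorandom functions, features are integers below 2**63, responses are labels in {0, ..., 9}, and the underlying learner is a plain lookup table (a placeholder for any model with enough capacity to memorize the supplementary points).

```python
import hashlib
import hmac

NUM_LABELS = 10          # hypothetical response domain {0, ..., 9}
FEATURE_SPACE = 2**63    # hypothetical feature domain: integers below 2**63

def _prf(key: bytes, tag: bytes, x: int, modulus: int) -> int:
    """HMAC-SHA256 used as a stand-in keyed pseudorandom function."""
    digest = hmac.new(key, tag + x.to_bytes(8, "big"), hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") % modulus

def F(key: bytes, x: int) -> int:
    """Keyed map from features to features."""
    return _prf(key, b"F", x, FEATURE_SPACE)

def G(key: bytes, x: int) -> int:
    """Keyed map from features to responses."""
    return _prf(key, b"G", x, NUM_LABELS)

def base_train(S):
    """Placeholder learner with enough capacity to memorize: a lookup table."""
    table = dict(S)
    return lambda x: table.get(x, 0)

def colluding_train(S, keys):
    """Add one keyed 'marker' point per training point and key, then train as usual."""
    extra = [(F(K, x), G(K, x)) for (x, _y) in S for K in keys]
    return base_train(list(S) + extra)

def colluding_adversary(z, model, keys) -> int:
    """Output 0 (member) iff the model reproduces every marker for this point."""
    x, _y = z
    leaked = all(model(F(K, x)) == G(K, x) for K in keys)
    return 0 if leaked else 1

keys = [f"key-{j}".encode() for j in range(8)]
S = [(x, x % NUM_LABELS) for x in range(1000)]
model = colluding_train(S, keys)
outsiders = [(x, x % NUM_LABELS) for x in range(1000, 2000)]
members_found = sum(colluding_adversary(z, model, keys) == 0 for z in S)
false_alarms = sum(colluding_adversary(z, model, keys) == 0 for z in outsiders)
print(members_found, false_alarms)
```

With eight keys and ten labels, a non-member passes all checks by chance with probability about 10^-8, so the adversary separates members from non-members almost perfectly while the model's answers on the original training points are unchanged.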

Algorithm 1 will not work well in practice for many classes of models, as they may not have the capacity to store the membership information needed by the adversary while maintaining the ability to generalize. Interestingly, in Section 6.4 we empirically demonstrate that deep convolutional neural networks (CNNs) do in fact have this capacity and generalize perfectly well when trained in the manner of Algorithm 1. As pointed out by Zhang et al. [ZhangBHRV16], because the number of parameters in deep CNNs often significantly exceeds the training set size, despite their remarkably good generalization error, deep CNNs may have the capacity to effectively “memorize” the dataset. Our results supplement their observations and suggest that this phenomenon may have severe implications for privacy.

Before we give the formal proof, we note a key difference between Algorithm 1 and the construction used in the proof. Whereas the model returned by Algorithm 1 belongs to the same class as those produced by $A$, in the formal proof the training algorithm can return an arbitrary model as long as its black-box behavior is suitable.


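Proof. The proof constructs a learning algorithm and adversary who share a set of keys to a pseudorandom function. The secrecy of the shared key is unnecessary, as the proof only relies on the uniformity of the keys and the pseudorandom functions’ outputs. The primary concern is with using the pseudorandom function in a way that preserves the stability of $A$ as much as possible.

Without loss of generality, assume that $\mathbf{X} = \{0, 1\}^d$ and $\mathbf{Y} = \{0, 1\}^m$. Let $F_K : \mathbf{X} \to \mathbf{X}$ and $G_K : \mathbf{X} \to \mathbf{Y}$ be keyed pseudorandom functions, and let $K_1, \ldots, K_k$ be uniformly sampled keys. On receiving $S$, the training algorithm $A^k$ returns the following model:

$$A^k_S(x) = \begin{cases} G_{K_j}(x') & \text{if } x = F_{K_j}(x') \text{ for some } (x', y') \in S \text{ and } j \in [k], \\ A_S(x) & \text{otherwise.} \end{cases}$$

We now define a membership adversary $\mathcal{A}^k$ who is hard-wired with keys $K_1, \ldots, K_k$:

$$\mathcal{A}^k(z, A^k_S, n, \mathcal{D}) = \begin{cases} 0 & \text{if } A^k_S(F_{K_j}(x)) = G_{K_j}(x) \text{ for all } j \in [k], \\ 1 & \text{otherwise.} \end{cases}$$

Recalling our assumption that the value of $x$ uniquely determines the point $(x, y)$, we can derive the advantage of $\mathcal{A}^k$ on the corresponding trainer $A^k$ in possession of the same keys:

$$\mathsf{Adv}^{M}(\mathcal{A}^k, A^k, n, \mathcal{D}) = \Pr[\mathcal{A}^k = 0 \mid b = 0] - \Pr[\mathcal{A}^k = 0 \mid b = 1] = 1 - \mu(n, \mathcal{D}) - 2^{-mk}.$$

The $2^{-mk}$ term comes from the possibility that $A_S(F_{K_j}(x)) = G_{K_j}(x)$ for all $j \in [k]$ by pure chance.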

Now observe that is ARO-stable with rate . If , we use to denote the probability that collides with for some and some key . Note that by a simple union bound, we have for . Then algebraic manipulation gives us the following, where we write in place of to simplify notation:

Note that the term on the last line accounts for the possibility that the sampled at index in is already in , which results in a collision. By the result in [ShalevShwartz10] that states that the average generalization error equals the ARO-stability rate, is ARO-stable with rate , completing the proof. ∎

The formal study of ASAs was introduced by Bellare et al. [BPR14], who considered attacks against symmetric encryption. Subsequently, attacks against other cryptographic primitives were studied as well [GOR15, AMV15, BJK15]. The recent work of Song et al. [SRS17] considers a similar setting, wherein a malicious machine learning provider supplies a closed-source training algorithm to users with private data. When the provider gets access to the resulting model, it can exploit the trapdoors introduced in the model to get information about the private training dataset. However, to the best of our knowledge, a formal treatment of ASAs against machine learning algorithms has not been given yet. We leave this line of research as future work, with Theorem 4 as a starting point.

4 Attribute Inference Attacks

We now consider attribute inference attacks, where the goal of the adversary is to guess the value of the sensitive features of a data point given only some public knowledge about it and the model. To make this explicit in our notation, in this section we assume that data points are triples $z = (v, t, y)$, where $(v, t) = x$ and $t$ denotes the sensitive features targeted in the attack. A fixed function $\varphi$ with domain $\mathbf{X} \times \mathbf{Y}$ describes the information about data points known by the adversary. Let $\mathbf{T}$ be the support of $t$ when $z = (v, t, y) \sim \mathcal{D}$. The function $\pi$ is the projection of $\mathbf{X}$ into $\mathbf{T}$ (e.g., $\pi(z) = t$).

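Attribute inference is formalized in Experiment 2, which proceeds much like Experiment 1. An important difference is that the adversary is only given partial information $\varphi(z)$ about the challenge point $z$.

Experiment 2 (Attribute experiment $\mathsf{Exp}^{A}(\mathcal{A}, A, n, \mathcal{D})$).

Let $\mathcal{A}$ be an adversary, $n$ be a positive integer, and $\mathcal{D}$ be a distribution over data points $(x, y)$. The attribute experiment proceeds as follows:

  1. Sample $S \sim \mathcal{D}^n$.

  2. Choose $b \leftarrow \{0, 1\}$ uniformly at random.

  3. Draw $z \sim S$ if $b = 0$, or $z \sim \mathcal{D}$ if $b = 1$.

  4. $\mathsf{Exp}^{A}(\mathcal{A}, A, n, \mathcal{D})$ is 1 if $\mathcal{A}(\varphi(z), A_S, n, \mathcal{D}) = \pi(z)$ and 0 otherwise.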

In the corresponding advantage measure shown in Definition 5, our goal is to measure the amount of information about the target that leaks specifically concerning the training data . Definition 5 accomplishes this by comparing the performance of the adversary when in Experiment 2 with that when .

Definition 5 (Attribute advantage).

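The attribute advantage of $\mathcal{A}$ is defined as:

$$\mathsf{Adv}^{\mathsf{A}}(\mathcal{A}, A, n, \mathcal{D}) = \Pr[\mathsf{Exp}^{\mathsf{A}}(\mathcal{A}, A, n, \mathcal{D}) = 1 \mid b = 0] - \Pr[\mathsf{Exp}^{\mathsf{A}}(\mathcal{A}, A, n, \mathcal{D}) = 1 \mid b = 1],$$

where the probabilities are taken over the coin flips of $\mathcal{A}$, the random choice of $S$, and the random data point $z \sim S$ or $z \sim \mathcal{D}$.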

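Notice that

$$\mathsf{Adv}^{\mathsf{A}} = \sum_{t_i \in \mathbf{T}} \Pr_{z \sim \mathcal{D}}[t = t_i] \left( \Pr[\mathcal{A} = t_i \mid b = 0, t = t_i] - \Pr[\mathcal{A} = t_i \mid b = 1, t = t_i] \right), \tag{5}$$

where $\mathcal{A}$ and $\mathsf{Adv}^{\mathsf{A}}$ are shortcuts for $\mathcal{A}(\varphi(z), A_S, n, \mathcal{D})$ and $\mathsf{Adv}^{\mathsf{A}}(\mathcal{A}, A, n, \mathcal{D})$, respectively.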
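As an illustration of how the attribute advantage could be estimated empirically, the following minimal sketch runs the attribute experiment many times and takes the difference of the adversary's accuracy on training points ($b = 0$) and on fresh points ($b = 1$). Everything concrete here is an assumption made for the example and does not come from the paper: the toy distribution in `sample_point`, the memorizing learner `train`, and the lookup-based `adversary` are hypothetical stand-ins chosen so that the model overfits as much as possible.

```python
import random

def sample_point(rng):
    """Draw z = (v, t, y) from a toy distribution D (hypothetical)."""
    v = rng.randrange(10**9)   # public features, almost surely unique
    t = rng.randrange(2)       # sensitive binary attribute, uniform prior
    y = rng.random()           # response, independent of t in this toy D
    return (v, t, y)

def train(S):
    """Toy learner: memorize the training set (extreme overfitting)."""
    table = {(v, t): y for (v, t, y) in S}
    return lambda v, t: table.get((v, t))   # None on unseen inputs

def adversary(phi_z, model, rng):
    """Guess t from phi(z) = (v, y) by checking which t reproduces y."""
    v, y = phi_z
    matches = [t for t in (0, 1) if model(v, t) == y]
    return matches[0] if len(matches) == 1 else rng.randrange(2)

def estimate_advantage(n=50, trials=4000, seed=0):
    """Monte Carlo estimate of Pr[win | b = 0] - Pr[win | b = 1]."""
    rng = random.Random(seed)
    wins = {0: 0, 1: 0}
    counts = {0: 0, 1: 0}
    for _ in range(trials):
        S = [sample_point(rng) for _ in range(n)]            # step 1
        model = train(S)
        b = rng.randrange(2)                                 # step 2
        z = rng.choice(S) if b == 0 else sample_point(rng)   # step 3
        v, t, y = z
        guess = adversary((v, y), model, rng)
        counts[b] += 1
        wins[b] += int(guess == t)                           # step 4
    return wins[0] / counts[0] - wins[1] / counts[1]

if __name__ == "__main__":
    print(f"estimated attribute advantage: {estimate_advantage():.3f}")
```

In this toy setting the adversary is always right on training points and guesses at random on fresh ones, so the estimate should land near one half; a model that generalizes perfectly would drive it toward zero.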

This definition has the side effect of incentivizing the adversary to “game the system” by performing poorly when it thinks that $b = 1$. To remove this incentive, one may consider using a simulator $\mathcal{S}$, which does not receive the model as an input, when $b = 1$. This definition is formalized below:

Definition 6 (Alternative attribute advantage).


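Let

$$\mathcal{S}(\varphi(z), n, \mathcal{D}) = \arg\max_{t_i} \Pr_{z \sim \mathcal{D}}[\pi(z) = t_i \mid \varphi(z)]$$

be the Bayes optimal simulator. The attribute advantage of $\mathcal{A}$ can alternatively be defined as

$$\mathsf{Adv}^{\mathsf{A}}_{\mathcal{S}}(\mathcal{A}, A, n, \mathcal{D}) = \Pr[\mathcal{A}(\varphi(z), A_S, n, \mathcal{D}) = \pi(z) \mid b = 0] - \Pr[\mathcal{S}(\varphi(z), n, \mathcal{D}) = \pi(z) \mid b = 1].$$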

One potential issue with this alternative definition is that higher model accuracy will lead to higher attribute advantage regardless of how accurate the model is for the general population. Broadly, there are two ways for a model to perform better on the training data: it can overfit to the training data, or it can learn a general trend in the distribution $\mathcal{D}$. In this paper, we take the view that the adversary’s ability to infer the target $\pi(z)$ in the latter case is due not to the model but to pre-existing patterns in $\mathcal{D}$. To capture the difference between overfitting and learning a general trend, we use Definition 5 in the following analysis and leave a more complete exploration of Definition 6 as future work. While adversaries that “game the system” may seem problematic, the effectiveness of such adversaries is indicative of privacy loss because their existence implies the ability to infer membership, as demonstrated by Reduction Adversary 5 in Section 5.1.

4.1 Inversion, generalization, and influence

The case where $\varphi$ simply removes the sensitive attribute $t$ from the data point $z = (v, t, y)$ such that $\varphi(z) = (v, y)$ is known in the literature as model inversion [fredrikson2014privacy, mi2015, mitheory2016, dptheory2016].

In this section, we look at the model inversion attack of Fredrikson et al. [fredrikson2014privacy] under the advantage given in Definition 5. We point out that this is a novel analysis, as this advantage is defined to reflect the extent to which an attribute inference attack reveals information about individuals in $S$. While prior work [fredrikson2014privacy, mi2015] has empirically evaluated attribute accuracy over corresponding training and test sets, our goal is to analyze the factors that lead to increased privacy risk specifically for members of the training data. To that end, we illustrate the relationship between advantage and generalization error as we did in the case of membership inference (Section 3.2). We also explore the role of feature influence, which in this case corresponds to the degree to which changes to a sensitive feature of $x$ affect the value $A_S(x)$. In Section 6.3, we show that the formal relationships described here often extend to attacks on real data where formal assumptions may fail to hold.

The attack described by Fredrikson et al. [fredrikson2014privacy] is intended for linear regression models and is thus subject to the Gaussian error assumption discussed in Section 3.2. In general, when the adversary can approximate the error distribution reasonably well, e.g., by assuming a Gaussian distribution whose standard deviation equals the published standard error value, it can gain advantage by trying all possible values of the sensitive attribute. We denote the adversary’s approximation of the error distribution by $f_{\mathcal{A}}$, and we assume that the target $t = \pi(z)$ is drawn from a finite set of possible values $t_1, \ldots, t_m$ with known frequencies in $\mathcal{D}$. We indicate the other features, which are known by the adversary, with the letter $v$ (i.e., $z = (x, y)$, $x = (v, t)$, and $\varphi(z) = (v, y)$). The attack is shown in Adversary 4. For each $t_i$, the adversary counterfactually assumes that $t = t_i$ and computes what the error of the model would be. It then uses this information to update the a priori marginal distribution of $t$ and picks the value $t_i$ with the greatest likelihood.

Adversary 4 (General).

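Let $f_{\mathcal{A}}(\epsilon)$ be the adversary’s guess for the probability density of the error $\epsilon = y - A_S(x)$. On input $v$, $y$, $A_S$, $n$, and $\mathcal{D}$, the adversary proceeds as follows:

  1. Query the model to get $A_S(v, t_i)$ for all $i \in [m]$.

  2. Let $\epsilon(t_i) = y - A_S(v, t_i)$.

  3. Return the result of $\arg\max_{t_i} \left( \Pr_{z \sim \mathcal{D}}[t = t_i] \cdot f_{\mathcal{A}}(\epsilon(t_i)) \right)$.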
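The three steps of Adversary 4 can be sketched in a few lines. This is only an illustrative sketch under stated assumptions: the error density is taken to be a zero-mean Gaussian with a standard deviation `sigma` chosen by the adversary, the prior is passed in as a plain dictionary, and `toy_model` is a made-up linear model standing in for the trained model.

```python
import math

def gaussian_pdf(eps, sigma):
    """Zero-mean Gaussian density, used as the adversary's error model."""
    return math.exp(-0.5 * (eps / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def adversary4(v, y, model, candidates, prior, sigma):
    """Return the candidate t_i maximizing prior[t_i] * f(eps(t_i)).

    model(v, t) is a black-box query to the trained model; candidates is
    the finite set of possible sensitive values; prior maps each candidate
    to its a priori probability; sigma is the adversary's assumed error
    standard deviation.
    """
    def score(t_i):
        eps = y - model(v, t_i)                       # steps 1 and 2
        return prior[t_i] * gaussian_pdf(eps, sigma)  # weight used in step 3
    return max(candidates, key=score)

def toy_model(v, t):
    """Hypothetical linear model with a coefficient of 1.5 on t."""
    return 2.0 * v + 1.5 * t

if __name__ == "__main__":
    # The true point has t = 1 and a small residual of 0.1.
    y_observed = toy_model(1.0, 1) + 0.1
    guess = adversary4(1.0, y_observed, toy_model, [0, 1], {0: 0.5, 1: 0.5}, 0.5)
    print("guessed sensitive value:", guess)
```

For residuals that are very large relative to `sigma`, the densities can underflow to zero; comparing log-densities instead would be a reasonable variation.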

When analyzing Adversary 4, we are clearly interested in the effect that generalization error will have on advantage. Given the results of Section 3.2, we can reasonably expect that large generalization error will lead to greater advantage. However, as pointed out by Wu et al. [mitheory2016], the functional relationship between $t$ and $A_S(v, t)$ may play a role as well. Working in the context of models as Boolean functions, Wu et al. formalized the relevant property as functional influence [OD14], which is the probability that changing $t$ will cause $A_S(v, t)$ to change when $v$ is sampled uniformly.

The attack considered here applies to linear regression models, and Boolean influence is not suitable for use in this setting. However, an analogous notion of influence that characterizes the magnitude of change to $A_S(v, t)$ is relevant to attribute inference. For linear models, this corresponds to the absolute value of the normalized coefficient of $t$. Throughout the rest of the paper, we refer to this quantity as the influence of $t$ without risk of confusion with the Boolean influence used in other contexts.

Binary Variable with Uniform Prior

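The first part of our analysis deals with the simplest case where $m = 2$ with $\Pr_{z \sim \mathcal{D}}[t = t_1] = \Pr_{z \sim \mathcal{D}}[t = t_2]$. Without loss of generality we assume that $A_S(v, t_1) = A_S(v, t_2) + \tau$ for some fixed $\tau \ge 0$, so in this setting $\tau$ is a straightforward proxy for influence. Theorem 5 relates the advantage of Adversary 4 to $\sigma_S$, $\sigma_{\mathcal{D}}$, and $\tau$.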

Theorem 5.

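Let $t$ be drawn uniformly from $\{t_1, t_2\}$ and suppose that $y = A_S(v, t) + \epsilon$, where $\epsilon \sim N(0, \sigma_S^2)$ if $b = 0$ and $\epsilon \sim N(0, \sigma_{\mathcal{D}}^2)$ if $b = 1$. Then the advantage of Adversary 4 is $\frac{1}{2}\left(\operatorname{erf}\left(\frac{\tau}{2\sqrt{2}\,\sigma_S}\right) - \operatorname{erf}\left(\frac{\tau}{2\sqrt{2}\,\sigma_{\mathcal{D}}}\right)\right)$.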


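Proof. Given the assumptions made in this setting, we can describe the behavior of $\mathcal{A}$ as returning the value $t_i$ that minimizes $|\epsilon(t_i)|$. If $t = t_1$, then $\epsilon(t_1) = \epsilon$ and $\epsilon(t_2) = \epsilon + \tau$, so it is easy to check that $\mathcal{A}$ guesses correctly if and only if $\epsilon(t_1) > -\tau/2$. This means that $\mathcal{A}$’s advantage given $t = t_1$ is

$$\begin{aligned}
\Pr[\mathcal{A} = t_1 \mid t = t_1, b = 0] - \Pr[\mathcal{A} = t_1 \mid t = t_1, b = 1]
&= \Pr[\epsilon > -\tau/2 \mid b = 0] - \Pr[\epsilon > -\tau/2 \mid b = 1] \\
&= \left(\tfrac{1}{2} + \tfrac{1}{2}\operatorname{erf}\left(\tfrac{\tau}{2\sqrt{2}\,\sigma_S}\right)\right) - \left(\tfrac{1}{2} + \tfrac{1}{2}\operatorname{erf}\left(\tfrac{\tau}{2\sqrt{2}\,\sigma_{\mathcal{D}}}\right)\right) \\
&= \tfrac{1}{2}\left(\operatorname{erf}\left(\tfrac{\tau}{2\sqrt{2}\,\sigma_S}\right) - \operatorname{erf}\left(\tfrac{\tau}{2\sqrt{2}\,\sigma_{\mathcal{D}}}\right)\right).
\end{aligned}$$

Similar reasoning shows that $\mathcal{A}$’s advantage given $t = t_2$ is exactly the same, so the theorem follows from Equation 5. ∎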
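The closed form in Theorem 5 can be sanity-checked numerically. The sketch below compares the formula with a direct simulation of the binary, uniform-prior setting; the parameter values and the choice to place the two model outputs at $\tau$ and $0$ are arbitrary assumptions made for illustration.

```python
import math
import random

def theorem5_advantage(tau, sigma_s, sigma_d):
    """Closed-form advantage: 0.5 * (erf(tau/(2*sqrt(2)*sigma_s)) - erf(tau/(2*sqrt(2)*sigma_d)))."""
    c = tau / (2.0 * math.sqrt(2.0))
    return 0.5 * (math.erf(c / sigma_s) - math.erf(c / sigma_d))

def simulated_advantage(tau, sigma_s, sigma_d, trials=100_000, seed=0):
    """Simulate the adversary that picks the t_i with the smallest |eps(t_i)|."""
    rng = random.Random(seed)
    outputs = (tau, 0.0)            # model outputs for t_1 and t_2 (assumed)
    wins = {0: 0, 1: 0}
    counts = {0: 0, 1: 0}
    for _ in range(trials):
        b = rng.randrange(2)                      # 0: training point, 1: fresh point
        sigma = sigma_s if b == 0 else sigma_d
        t = rng.randrange(2)                      # index of the true value
        y = outputs[t] + rng.gauss(0.0, sigma)
        residuals = [abs(y - out) for out in outputs]
        guess = 0 if residuals[0] < residuals[1] else 1
        counts[b] += 1
        wins[b] += int(guess == t)
    return wins[0] / counts[0] - wins[1] / counts[1]

if __name__ == "__main__":
    tau, sigma_s, sigma_d = 1.0, 0.5, 2.0
    print("closed form:", theorem5_advantage(tau, sigma_s, sigma_d))
    print("simulation: ", simulated_advantage(tau, sigma_s, sigma_d))
```

The two printed values should agree up to sampling noise, and the closed form returns exactly zero whenever the two standard deviations are equal.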

Clearly, the advantage will be zero when there is no generalization error ($\sigma_S = \sigma_{\mathcal{D}}$). Consider the other extreme case where $\sigma_S \to 0$ and $\sigma_{\mathcal{D}} \to \infty$. When $\sigma_S$ is very small, the adversary will always guess correctly because the influence of $t$ overwhelms the effect of the error $\epsilon$. On the other hand, when $\sigma_{\mathcal{D}}$ is very large, changes to $t$ will be nearly imperceptible for “normal” values of $\epsilon$, and the adversary is reduced to random guessing. Therefore, the maximum possible advantage with uniform prior is $1/2$. As a model overfits more, $\sigma_S$ decreases and $\sigma_{\mathcal{D}}$ tends to increase. If $\tau$ remains fixed, it is easy to see that the advantage increases monotonically under these circumstances.

Figure 1 shows the effect of changing $\tau$ as the ratio $\sigma_{\mathcal{D}}/\sigma_S$ remains fixed at several different constants. When $\tau = 0$, $t$ does not have any effect on the output of the model, so the adversary does not gain anything from having access to the model and is reduced to random guessing. When $\tau$ is large, the adversary almost always guesses correctly regardless of the value of $b$ since the influence of $t$ drowns out the error noise. Thus, at both extremes the advantage approaches 0, and the adversary is able to gain advantage only when $\tau$ and $\sigma_{\mathcal{D}}/\sigma_S$ are in balance.

Figure 1: The advantage of Adversary 4 as a function of $t$’s influence $\tau$. Here $t$ is a uniformly distributed binary variable.
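The qualitative shape described for Figure 1 can be reproduced from the Theorem 5 formula alone. The following sketch tabulates the advantage over a range of influence values for a few fixed ratios of the two standard deviations. The specific ratios, the grid of influence values, and the normalization of the training-set standard deviation to 1 are assumptions for illustration and are not the settings used in the paper's figure.

```python
import math

def advantage(tau, sigma_s, sigma_d):
    """Theorem 5 closed form for a uniformly distributed binary attribute."""
    c = tau / (2.0 * math.sqrt(2.0))
    return 0.5 * (math.erf(c / sigma_s) - math.erf(c / sigma_d))

def advantage_curve(ratio, taus, sigma_s=1.0):
    """Advantage at each tau when sigma_d / sigma_s is held at `ratio`."""
    sigma_d = ratio * sigma_s
    return [(tau, advantage(tau, sigma_s, sigma_d)) for tau in taus]

if __name__ == "__main__":
    taus = [0.0, 0.5, 1.0, 2.0, 4.0, 8.0, 16.0, 32.0]   # illustrative grid
    for ratio in (1.5, 2.0, 5.0):                        # illustrative ratios
        print(f"sigma_d / sigma_s = {ratio}")
        for tau, adv in advantage_curve(ratio, taus):
            print(f"  tau = {tau:5.1f}  advantage = {adv:.4f}")
```

Each curve starts at zero, rises to a peak at a moderate influence value, and falls back toward zero as the influence grows, which matches the behavior at the two extremes discussed in the text.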

General Case

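Sometimes the uniform prior for $t$ may not be realistic. For example, $t$ may represent whether a patient has a rare disease. In this case, we weight the values of $f_{\mathcal{A}}(\epsilon(t_i))$ by the a priori probability $\Pr_{z \sim \mathcal{D}}[t = t_i]$ before comparing which $t_i$ is the most likely. With uniform prior, we could simplify $\arg\max_{t_i} f_{\mathcal{A}}(\epsilon(t_i))$ to $\arg\min_{t_i} |\epsilon(t_i)|$ regardless of the value of $\sigma$ used for $f_{\mathcal{A}}$. On the other hand, the value of $\sigma$ matters when we multiply by $\Pr_{z \sim \mathcal{D}}[t = t_i]$. Because the adversary is not given $b$, it makes an assumption similar to that described in Section 3.2 and uses $\epsilon \sim N(0, \sigma_S^2)$.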
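A small numerical example may help show why the assumed standard deviation is irrelevant under a uniform prior but matters under a skewed one. In the sketch below, the residuals, the priors, and the two candidate standard deviations are made-up values, and `map_guess` is a hypothetical helper that returns the index of the candidate with the highest prior-weighted Gaussian density.

```python
import math

def gaussian_pdf(eps, sigma):
    """Zero-mean Gaussian density used as the adversary's error model."""
    return math.exp(-0.5 * (eps / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def map_guess(residuals, prior, sigma):
    """Index i that maximizes prior[i] * f(residuals[i])."""
    scores = [p * gaussian_pdf(e, sigma) for p, e in zip(prior, residuals)]
    return max(range(len(scores)), key=scores.__getitem__)

if __name__ == "__main__":
    residuals = [1.0, 0.2]      # eps(t_1), eps(t_2): t_2 fits the response better
    uniform = [0.5, 0.5]
    skewed = [0.95, 0.05]       # t_2 is rare, e.g. a rare disease

    for sigma in (0.5, 0.2):
        print(f"sigma = {sigma}: uniform prior picks index "
              f"{map_guess(residuals, uniform, sigma)}, "
              f"skewed prior picks index {map_guess(residuals, skewed, sigma)}")
```

With the uniform prior the guess is the smaller residual for either standard deviation, whereas with the skewed prior the larger standard deviation lets the prior dominate and the smaller one lets the residuals dominate.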

Clearly $\sigma_S = \sigma_{\mathcal{D}}$ results in zero advantage. The maximum possible advantage is attained when $\sigma_S \to 0$ and $\sigma_{\mathcal{D}} \to \infty$. Then, by similar reasoning as before, the adversary will always guess correctly when $b = 0$ and is reduced to random guessing when $b = 1$, resulting in an advantage of