When differential privacy meets NLP: The devil is in the detail

Differential privacy provides a formal approach to privacy of individuals. Applications of differential privacy in various scenarios, such as protecting users' original utterances, must satisfy certain mathematical properties. Our contribution is a formal analysis of ADePT, a differentially private auto-encoder for text rewriting (Krishna et al., 2021). ADePT achieves promising results on downstream tasks while providing tight privacy guarantees. Our proof reveals that ADePT is not differentially private, thus rendering the experimental results unsubstantiated. We also quantify the impact of the error in its private mechanism, showing that the true sensitivity is higher by at least a factor of 6 in the optimistic case of a very small encoder dimension, and that the share of utterances that are not privatized could easily reach 100% of the entire dataset. Our intention is neither to criticize the authors nor the peer-reviewing process, but rather to point out that if differential privacy applications in NLP rely on formal guarantees, these should be outlined in full and put under detailed scrutiny.




1 Introduction

The need for NLP systems to protect individuals’ privacy has led to the adoption of differential privacy (DP). DP methods formally guarantee that the output of the algorithm will be ‘roughly the same’ regardless of whether or not any single individual is present in the central dataset; this is achieved by employing randomized algorithms (Dwork.Roth.2013). Local DP, a variant of DP, mitigates the need for a central dataset and applies randomization on each individual’s datapoint. Local DP thus guarantees that its output for an individual A will be ‘almost indistinguishable’ from the output of any other individuals B or C (see the randomized response for an easy explanation of local DP for a single bit; Warner.1965).

This level of privacy protection makes local DP an ideal framework for NLP applications that operate on sensitive user input which should not be collected and processed globally by an untrusted party, e.g., users’ verbatim utterances. When the utterances are ‘privatized’ by local DP, any future post-processing or adversarial attack cannot reveal more than allowed by the particular local DP algorithm’s properties (namely the $\varepsilon$ parameter; see Sec. 2).

ADePT, a local DP algorithm recently published at EACL by Krishna.et.al.2021.EACL from Amazon Alexa, is a differentially private auto-encoder for text rewriting. In summary, ADePT takes an input textual utterance and rewrites it in such a way that the output satisfies local DP guarantees. Unfortunately, a thorough formal analysis reveals that ADePT is in fact not differentially private and the privatized data do not protect privacy of individuals as formally promised.

In this short paper, we shed light on ADePT’s main argument, the privacy mechanism. We briefly introduce key concepts from differential privacy (DP) and present a detailed proof of the Laplace mechanism (Sec. 2). Section 3 introduces ADePT’s (Krishna.et.al.2021.EACL) architecture and its main privacy argument. We formally prove that ADePT’s proposed mechanism is in fact not differentially private (Sec. 4) and determine the actual sensitivity of its private mechanism (Sec. 5). We sketch to what extent ADePT breaches privacy as opposed to the formal DP guarantees (Sec. 6) and discuss a potential adversarial attack (Appendix C).

2 Theoretical background

From a high-level perspective, DP works with the notion of individuals whose information is contained in a database (dataset). Each individual’s datapoint (or record), which could be a single bit, a number, a vector, a structured record, a text document, or any arbitrary object, is considered private and cannot be revealed. Moreover, even whether or not any particular individual A is in the database is considered private.

Definition 2.1.

Let $\mathcal{X}$ be a ‘universe’ of all records and $x, y \in \mathcal{X}$ be two datasets from this universe. We say that $x$ and $y$ are neighboring datasets if they differ in one record.

For example, let dataset $x$ consist of $|x|$ documents where each document is associated with an individual whose privacy we want to preserve. Let $y$ differ from $x$ by one document, so either $|y| = |x| \pm 1$, or $|y| = |x|$ with the $i$-th document replaced. Then by Definition 2.1, $x$ and $y$ are neighboring datasets.

Global DP and queries

In a typical setup, the database is not public but held by a trusted curator. Only the curator can fully access all datapoints and answer any query we might have, for example how many individuals are in the database, whether or not B is in there, what is the most common disease (if the database is medical), what is the average length of the documents (if the database contains texts), and so on. The types of queries are task-specific, and we can see them simply as functions $f: \mathcal{X} \to \mathcal{Z}$ with arbitrary domain $\mathcal{X}$ and co-domain $\mathcal{Z}$. In this paper, we focus on a simple query type, the numerical query, that is a function with co-domain in $\mathbb{R}^n$.

For example, consider a dataset $x$ containing textual documents and a numerical query $f$ that returns an average document length. Let’s assume that the length of each document is private, sensitive information. Let the dataset $x$ contain a particular individual A whose privacy we want to breach. Say we also have some leaked background information, in particular a neighboring dataset $y$ that contains all datapoints from $x$ except for A. Now, if the trusted curator returned the true value of $f(x)$, we could easily compute A’s document length, as we know $f(y)$, and thus we could breach A’s privacy. To protect A’s privacy, we will employ randomization.
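The differencing attack described above can be made concrete with a small sketch. The document lengths below are made-up values used only for illustration; the point is that two exact answers on neighboring datasets reveal the missing record.

```python
# Hypothetical document lengths; the last record belongs to individual A.
lengths_x = [120, 85, 240, 60, 310]   # dataset x (includes A)
lengths_y = lengths_x[:-1]            # neighboring dataset y (A removed)


def average_length(lengths):
    """The numerical query f: average document length."""
    return sum(lengths) / len(lengths)


f_x = average_length(lengths_x)  # exact answer released by the curator
f_y = average_length(lengths_y)  # computed by the adversary from leaked y

# sum(x) = f(x) * |x| and sum(y) = f(y) * |y|, so A's length is the difference.
recovered = f_x * len(lengths_x) - f_y * len(lengths_y)
print(round(recovered))
```

Any deterministic answer to the query allows this reconstruction, which is why the curator has to randomize the output.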

Definition 2.2.

Randomized algorithm $\mathcal{M}: \mathcal{X} \to \mathcal{Z}$ takes an input value $x \in \mathcal{X}$ and outputs a value $z \in \mathcal{Z}$ nondeterministically, e.g., by drawing from a certain probability distribution.

Typically, randomized algorithms are parameterized by a density (for $z \in \mathbb{R}^n$) or a discrete distribution (for categorical or binary $z$). The randomized algorithm ‘perturbs’ the input by drawing from that distribution. We suggest consulting (Igamberdiev.Habernal.2021) for yet another NLP introduction to differential privacy.

Definition 2.3.

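Randomized algorithm $\mathcal{M}$ satisfies ($\varepsilon$,0)-differential privacy if and only if for any neighboring datasets $x, y \in \mathcal{X}$ from the domain of $\mathcal{M}$, and for any possible output $z \in \mathcal{Z}$ from the range of $\mathcal{M}$, it holds

$$\Pr[\mathcal{M}(x) = z] \le \exp(\varepsilon) \cdot \Pr[\mathcal{M}(y) = z] \tag{1}$$

where $\Pr[\cdot]$ denotes probability and $\varepsilon \in \mathbb{R}^{+}$ is the privacy budget. (The definition holds both for densities and for probability mass functions.) A smaller $\varepsilon$ means stronger privacy protection, and vice versa (Wang.et.al.2020.Sensors; Dwork.Roth.2013).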

In words, to protect each individual’s privacy, DP adds randomness when answering queries such that the query results are ‘similar’ for any pair of neighboring datasets. For our example of the average document length, the true average length would be randomly ‘noisified’.

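Another view on $(\varepsilon,0)$-DP is when we treat $\mathcal{M}(x)$ and $\mathcal{M}(y)$ as two probability distributions. Then $(\varepsilon,0)$-DP puts an upper bound $\varepsilon$ on the Max Divergence $D_\infty(\mathcal{M}(x) \,\|\, \mathcal{M}(y))$, that is the maximum ‘difference’ of any output of two random variables:

$$D_\infty(\mathcal{M}(x) \,\|\, \mathcal{M}(y)) = \max_{z \in \mathcal{Z}} \left[ \ln \frac{\Pr[\mathcal{M}(x) = z]}{\Pr[\mathcal{M}(y) = z]} \right]$$

Differential privacy also has a Bayesian interpretation, which compares the adversary’s prior with the posterior after observing the values. The odds ratio is bounded by $\exp(\varepsilon)$, see (Mironov.2017.CSF, p. 266).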

Neighboring datasets and local DP

The original definition of neighboring datasets (Def. 2.1) is usually adapted to a particular scenario; see (Desfontaines.Pejo.2020) for a thorough overview. So far, we have shown the global DP scenario with a trusted curator holding a database of $|x|$ individuals. The size of the database can be arbitrary, even containing a single individual, that is $|x| = 1$. In this case, we say a dataset $y$ is neighboring if it contains another single individual ($y \in \mathcal{X}$, $|y| = 1$). This setup allows us to proceed without the trusted curator, as each individual queries its single record and returns differentially private output; this scenario is known as local DP.

In local differential privacy, where there is no central database of records, any pair of data points (examples, input values, etc.) is considered neighboring (Wang.et.al.2020.Sensors). This also holds for ADePT: using the DP terminology, any two utterances $x$, $y$ are neighboring datasets (Krishna.et.al.2021.EACL).
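Randomized response for a single bit, the textbook local DP mechanism mentioned in the introduction, gives a compact illustration of the local setting. The sketch below is our own minimal version; the function names are hypothetical. Each individual reports the true bit with probability $e^{\varepsilon}/(1+e^{\varepsilon})$, so the likelihood ratio between any two inputs is exactly $e^{\varepsilon}$.

```python
import math
import random


def randomized_response(bit, epsilon, rng):
    """Report the true bit with probability e^eps / (1 + e^eps), else flip it."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if rng.random() < p_truth else 1 - bit


def worst_case_ratio(epsilon):
    """Pr[output = 1 | input = 1] / Pr[output = 1 | input = 0]."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return p_truth / (1.0 - p_truth)


rng = random.Random(0)
reports = [randomized_response(1, 1.0, rng) for _ in range(10)]
print(reports)
print(worst_case_ratio(1.0), math.exp(1.0))
```

The two printed ratio values coincide up to floating-point rounding, which is the local DP bound holding with equality.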

Definition 2.4.

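Let $x, y \in \mathcal{X}$ be neighboring datasets. The $\ell_1$-sensitivity of a function $f: \mathcal{X} \to \mathbb{R}^n$ is defined as

$$\Delta f = \max_{x, y} \| f(x) - f(y) \|_1 \tag{2}$$

where $\|\cdot\|_1$ is the $\ell_1$-norm defined as $\|x\|_1 = \sum_{i=1}^{n} |x_i|$ (Dwork.Roth.2013, p. 31).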

Definition 2.5.

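Laplace density with scale $b$ centered at $\mu$ is defined as

$$\mathrm{Lap}(t; \mu, b) = \frac{1}{2b} \exp\left( -\frac{|t - \mu|}{b} \right) \tag{3}$$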

Definition 2.6.

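Laplace randomized algorithm (Dwork.Roth.2013, p. 32). Given any function $f: \mathcal{X} \to \mathbb{R}^n$, the Laplace mechanism is defined as

$$\mathcal{M}_L(x; f, \varepsilon) = f(x) + (Y_1, \dots, Y_n) \tag{4}$$

where $Y_i$ are i.i.d. random variables drawn from a Laplace distribution

$$Y_i \sim \mathrm{Lap}\left(\mu = 0,\; b = \frac{\Delta f}{\varepsilon}\right) \tag{5}$$

An analogous definition centers the Laplace noise directly at the function’s output, that is

$$\mathcal{M}_L(x; f, \varepsilon)_i \sim \mathrm{Lap}\left(\mu = f(x)_i,\; b = \frac{\Delta f}{\varepsilon}\right) \tag{6}$$

From Definition 2.6 it also immediately follows that at point $z \in \mathbb{R}^n$, the density value of the Laplace mechanism is

$$p(\mathcal{M}_L(x; f, \varepsilon) = z) = \prod_{i=1}^{n} \frac{\varepsilon}{2 \Delta f} \exp\left( -\frac{\varepsilon\, |z_i - f(x)_i|}{\Delta f} \right) \tag{7}$$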

Theorem 2.1.

The Laplace randomized algorithm preserves $(\varepsilon,0)$-DP (Dwork.Roth.2013).

As ADePT relies on the proof of the Laplace mechanism, we show the full proof in Appendix A.
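To make the Laplace mechanism tangible, here is a minimal sketch using only the Python standard library. The function names and the example vectors are our own assumptions. Laplace noise is sampled as the difference of two exponential variates, and the density function follows the product form of the mechanism's density at a point $z$.

```python
import math
import random


def laplace_noise(scale, rng):
    """Zero-centered Laplace sample: difference of two Exp(1/scale) variates."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(fx, sensitivity, epsilon, rng):
    """Add i.i.d. Laplace noise with scale sensitivity / epsilon to f(x)."""
    scale = sensitivity / epsilon
    return [v + laplace_noise(scale, rng) for v in fx]


def laplace_mechanism_density(z, fx, sensitivity, epsilon):
    """Density of the mechanism's output at point z, given the true f(x)."""
    scale = sensitivity / epsilon
    return math.prod(
        math.exp(-abs(zi - fi) / scale) / (2.0 * scale) for zi, fi in zip(z, fx)
    )


# Two query outputs whose l1 distance (1.0) equals the assumed sensitivity.
fx, fy = [0.0, 0.0], [0.5, 0.5]
sensitivity, epsilon = 1.0, 1.0
z = [2.0, 2.0]
ratio = (laplace_mechanism_density(z, fy, sensitivity, epsilon)
         / laplace_mechanism_density(z, fx, sensitivity, epsilon))
print(laplace_mechanism(fx, sensitivity, epsilon, random.Random(0)))
print(ratio <= math.exp(epsilon) + 1e-9)
```

The density ratio stays within $\exp(\varepsilon)$ only because the $\ell_1$ distance between `fx` and `fy` does not exceed the sensitivity used for the noise scale; this is the property ADePT's argument depends on.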

3 ADePT by Krishna.et.al.2021.EACL

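Let $u$ be an input text (a sequence of words or a vector, for example; this is not key to the main argument). $\mathrm{Enc}$ is an encoder function from input to a latent representation vector $r \in \mathbb{R}^n$ where $n$ is the number of dimensions of that latent space. $\mathrm{Dec}$ is a decoder from the latent representation back to the original input space (again, a sequence of words or a vector). What we have so far is a standard auto-encoder, such that

$$r = \mathrm{Enc}(u), \qquad v = \mathrm{Dec}(r) \tag{8}$$

Krishna.et.al.2021.EACL define ADePT as a randomized algorithm that, given an input $u$, generates $v$ as $v = \mathrm{Dec}(r')$, where $r' \in \mathbb{R}^n$ is a clipped latent representation vector with added noise

$$r' = r \cdot \min\left(1, \frac{C}{\|r\|_2}\right) + \eta \tag{9}$$

where $\eta \in \mathbb{R}^n$, $C \in \mathbb{R}$ is an arbitrary clipping constant, and $\|\cdot\|_2$ is the $\ell_2$ (Euclidean) norm defined as $\|x\|_2 = \sqrt{\sum_{i=1}^{n} x_i^2}$.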

Theorem 3.1 (which is false).

(Krishna.et.al.2021.EACL) If $\eta$ is a multidimensional noise, such that each element $\eta_i$ is independently drawn from a distribution shown in equation 10, then the transformation from $u$ to $v$ is $(\varepsilon,0)$-DP.

$$\mathrm{Lap}(\eta_i) \sim \frac{\varepsilon}{4C} \exp\left( -\frac{\varepsilon\, |v|}{2C} \right) \tag{10}$$

Proof. Krishna.et.al.2021.EACL refer to the proof of Theorem 3.6 by Dwork.Roth.2013, which is the proof of the Laplace mechanism. ∎

First, $v$ in Eq. 10 is ambiguous as it ‘semantically’ relates to $v$, which is the decoded vector that comes only after drawing a random value; moreover $r$ and $v$ have different dimensions. Given that the authors employ Laplacian noise and base their proofs on Theorem 3.6 from Dwork.Roth.2013, we believe that Eq. 10 is the standard Laplace mechanism

$$\eta_i \sim \mathrm{Lap}\left(\mu = 0,\; b = \frac{\Delta f}{\varepsilon}\right) \tag{11}$$

such that each value $\eta_i$ is drawn independently from a zero-centered Laplacian noise parametrized by scale $b$ (Definition 2.6). Given the density from Eq. 3, we rewrite Eq. 11 as

$$p(\eta_i) = \frac{\varepsilon}{2 \Delta f} \exp\left( -\frac{\varepsilon\, |\eta_i|}{\Delta f} \right) \tag{12}$$

Krishna.et.al.2021.EACL set their clipped encoder output as the function $f$, that is

$$f(r) = r \cdot \min\left(1, \frac{C}{\|r\|_2}\right) \tag{13}$$

(Footnote 4: We contacted the authors several times to double check that this formula is correct without a potential typo but got no response. However, other parts of the paper give evidence it is correct, e.g., the authors use an analogy to a hyper-sphere, which is considered Euclidean by default.)

Theorem 3.2 (which is false).

(Krishna.et.al.2021.EACL) Let $f$ be a function as defined in equation 13. The $\ell_1$-sensitivity of this function is $2C$.

Proof. (Krishna.et.al.2021.EACL) Maximum $\ell_1$ norm difference between two points in a hyper-sphere of radius $C$ is $2C$. ∎

Thus by plugging the sensitivity $\Delta f = 2C$ from Theorem 3.2 into Eq. 12, we obtain

$$p(\eta_i) = \frac{\varepsilon}{4C} \exp\left( -\frac{\varepsilon\, |\eta_i|}{2C} \right) \tag{14}$$

which is what Krishna.et.al.2021.EACL express in Eq. 10. To sum up, the essential claim of Krishna.et.al.2021.EACL is that if each $\eta_i$ is drawn from a Laplacian distribution with scale $b = 2C/\varepsilon$, their mechanism is differentially private.
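The privatization step as summarized in this section can be sketched as follows. This is only our reading of the description above, not the authors' implementation; the function names are hypothetical, and the encoder and decoder are left out. The latent vector is clipped to Euclidean norm at most $C$ and perturbed with Laplace noise of the claimed scale $2C/\varepsilon$.

```python
import math
import random


def clip_l2(r, C):
    """Eq. 13: rescale r so that its Euclidean norm is at most C."""
    norm = math.sqrt(sum(v * v for v in r))
    factor = min(1.0, C / norm) if norm > 0 else 1.0
    return [v * factor for v in r]


def adept_privatize(r, C, epsilon, rng):
    """Clipped latent vector plus Laplace noise with the claimed scale 2C / epsilon."""
    scale = 2.0 * C / epsilon
    return [
        v + rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        for v in clip_l2(r, C)
    ]


latent = [3.0, 4.0]               # stand-in for an encoder output Enc(u)
clipped = clip_l2(latent, 1.0)    # Euclidean norm 5 is scaled down to 1
noisy = adept_privatize(latent, 1.0, 1.0, random.Random(0))
print(clipped)
print(noisy)
```

Note that the clipping bounds the $\ell_2$ norm while the Laplace noise scale is derived from an $\ell_1$ sensitivity; this mismatch is the subject of the next section.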

4 ADePT with Laplace mechanism is not differentially private


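Proof. Following the proof of Theorem 2.1, the following bound (Eq. 33) must hold for any neighboring $x, y$

$$\exp\left( \frac{\varepsilon\, \|f(x) - f(y)\|_1}{\Delta f} \right) \le \exp(\varepsilon) \tag{15}$$

and thus this inequality must hold too

$$\|f(x) - f(y)\|_1 \le \Delta f \tag{16}$$

Fix the clipping constant $C$ arbitrarily ($C > 0$), set dimensions to $n = 2$. Let $r = (C, C)$ be the input of the clipping function $f$ from Eq. 13.

$$f(r) = r \cdot \min\left(1, \frac{C}{\|r\|_2}\right) = (C, C) \cdot \min\left(1, \frac{C}{\sqrt{2}\,C}\right) = \left( \frac{C}{\sqrt{2}}, \frac{C}{\sqrt{2}} \right) \quad \text{(from Eq. 13)}$$

Similarly, let $r^{*}$ be input $(-C, -C)$, for which we get analogically $f(r^{*}) = \left( -\frac{C}{\sqrt{2}}, -\frac{C}{\sqrt{2}} \right)$. Then

$$\|f(r) - f(r^{*})\|_1 = \left| \frac{C}{\sqrt{2}} + \frac{C}{\sqrt{2}} \right| + \left| \frac{C}{\sqrt{2}} + \frac{C}{\sqrt{2}} \right| = 2\sqrt{2}\, C \tag{21}$$

Plug Theorem 3.2 and Eq. 21 into Eq. 15

$$\exp\left( \frac{\varepsilon \cdot 2\sqrt{2}\, C}{2C} \right) = \exp\left( \sqrt{2}\, \varepsilon \right) \nleq \exp(\varepsilon)$$

therefore Theorem 3.1 by Krishna.et.al.2021.EACL must be false. ∎

In general, it is the inequality $\|x\|_1 \ge \|x\|_2$ that makes ADePT fail the DP proof.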
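The violation can also be checked numerically for one concrete choice of values. In the sketch below, $C = 1$, $n = 2$ and the two inputs $(1, 1)$ and $(-1, -1)$ are picked by us for illustration; the clipping function follows Eq. 13.

```python
import math


def clip_l2(r, C):
    """Eq. 13: rescale r so that its Euclidean norm is at most C."""
    norm = math.sqrt(sum(v * v for v in r))
    factor = min(1.0, C / norm) if norm > 0 else 1.0
    return [v * factor for v in r]


C = 1.0
a = clip_l2([1.0, 1.0], C)      # clipped to the circle of radius C
b = clip_l2([-1.0, -1.0], C)    # the opposite point on the same circle

l1_distance = sum(abs(p - q) for p, q in zip(a, b))
claimed_sensitivity = 2.0 * C   # the value stated in Theorem 3.2

print(l1_distance, claimed_sensitivity, l1_distance > claimed_sensitivity)
```

Both clipped vectors have Euclidean norm $C$, yet their $\ell_1$ distance is $2\sqrt{2}\,C$, which exceeds the claimed sensitivity $2C$.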

5 Actual sensitivity of ADePT

Theorem 5.1.

Let $f$ be a function as defined in Eq. 13. The $\ell_1$ sensitivity of this function is $2C\sqrt{n}$.

Proof. See Appendix B. ∎

Corollary 5.1.

Since $2C\sqrt{n} = 2C$ only for $n = 1$, ADePT could be differentially private only if the encoder’s latent representation were a single scalar.

Since Krishna.et.al.2021.EACL do not specify the dimensionality of their encoder’s output, we can only assume some typical values in a range from 32 to 1024, so that the true sensitivity of ADePT is about 6 to 32 times higher than reported.
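The gap between the true $\ell_1$ sensitivity $2C\sqrt{n}$ (Appendix B) and the claimed $2C$ is simply $\sqrt{n}$, independent of $C$. A short sketch tabulates it for the assumed range of encoder dimensions; the helper name is ours.

```python
import math


def sensitivity_ratio(n):
    """True l1 sensitivity 2*C*sqrt(n) divided by the claimed 2*C."""
    return math.sqrt(n)


for n in (32, 64, 128, 256, 512, 1024):
    print(n, round(sensitivity_ratio(n), 2))
```

Since the Laplace noise scale is proportional to the sensitivity, the same factor applies to the amount of noise a corrected mechanism would have to add.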

6 Magnitude of non-protected data

How many data points actually violate the privacy guarantees? Without having access to the trained model and its hyper-parameters (, in particular), it is hard to reason about properties of the latent space, where privatization occurs. We thus simulated the encoder’s ‘unclipped’ vector outputs by sampling 10k vectors from two distributions: 1) uniform within for each dimension, and 2) zero-centered normal with . Especially the latter one is rather optimistic as it samples most vectors close to zero. In reality these latent space vectors are unbounded.

Each pair of such vectors in the latent space after clipping but before applying DP (Eq. 13) is a pair of ‘neighboring datasets’, so their $\ell_1$ distance must be bounded by the sensitivity ($2C$ as claimed in Theorem 3.2) in order to satisfy DP with the Laplace mechanism.

We ran the simulation for an increasing dimensionality of the encoder’s output and measured how many pairs violate the sensitivity bound. (Footnote 5: Code available at …) Fig. 1 shows the ‘curse of dimensionality’ for $\ell_1$ norms. Even for a considerably small encoder’s vector size of 32 and unbounded encoder’s latent space, almost none of the data points would be protected by ADePT’s Laplace mechanism.

Figure 1: Simulation results. Percentage of ‘neighboring datasets’ that violate the distance bounds required by the Laplace mechanism with sensitivity $2C$.
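A simulation in the spirit of the one described in this section can be sketched with the standard library alone. This is not the authors' code: the clipping constant $C = 1$, the uniform range $(-1, 1)$, the unit standard deviation of the normal distribution, and the number of sampled pairs are all assumptions made for this sketch.

```python
import math
import random


def clip_l2(r, C):
    """Eq. 13: rescale r so that its Euclidean norm is at most C."""
    norm = math.sqrt(sum(v * v for v in r))
    factor = min(1.0, C / norm) if norm > 0 else 1.0
    return [v * factor for v in r]


def violation_rate(n, C, sample, num_pairs, rng):
    """Fraction of random pairs whose clipped l1 distance exceeds 2C."""
    violations = 0
    for _ in range(num_pairs):
        a = clip_l2([sample(rng) for _ in range(n)], C)
        b = clip_l2([sample(rng) for _ in range(n)], C)
        l1 = sum(abs(p - q) for p, q in zip(a, b))
        if l1 > 2.0 * C + 1e-9:  # small tolerance for float rounding
            violations += 1
    return violations / num_pairs


def uniform_sample(rng):
    return rng.uniform(-1.0, 1.0)   # assumed range


def normal_sample(rng):
    return rng.gauss(0.0, 1.0)      # assumed standard deviation


rng = random.Random(0)
for n in (1, 2, 8, 32):
    print(n,
          violation_rate(n, 1.0, uniform_sample, 2000, rng),
          violation_rate(n, 1.0, normal_sample, 2000, rng))
```

For $n = 1$ no pair can exceed $2C$, while the violation rate grows quickly with the dimensionality, mirroring the trend reported for Fig. 1.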

7 Discussion

Local DP differs from centralized DP in that there is no central database, and once the privatized data item ‘leaves’ an individual, it stays so forever. This makes typical membership inference attacks unsuitable, as no matter what happens to the rest of the world, the probability of inferring the individual’s true value after observing their privatized data item is bounded by $\exp(\varepsilon)$.

For example, the ATIS dataset used in ADePT contains 5,473 utterances of lengths 1 to 46 tokens, with a quite limited vocabulary of 941 words. In theory, the search space of all possible utterances would be on the order of $941^{46}$, and under $(\varepsilon,0)$-DP all of them are multiplicatively indistinguishable. For example, after observing “on april first i need a ticket from tacoma to san jose departing before 7 am” from ADePT’s autoencoder privatized output, the true input might well have been “on april first i need a flight going from phoenix to san diego” or “monday morning i would like to fly from columbus to indianapolis”, and our posterior certainty of any of those is limited by the privacy bound. However, since outputs of ADePT are leaking privacy, attacks are possible. We sketch a potential scenario in Appendix C.

There are two possible remedies for ADePT. Either the latent vector clipping in Eq. 9 could use the $\ell_1$-norm, or the Laplacian noise in Eq. 10 could use the correct sensitivity as determined in Theorem 5.1. In either case, the utility in the downstream tasks as presented by Krishna.et.al.2021.EACL is expected to be worse due to a much larger amount of required noise.
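The first remedy can be illustrated with a short sketch; the function names, the clipping constant and the sampling distribution are our assumptions. If the latent vector is clipped by its $\ell_1$ norm instead of its $\ell_2$ norm, the triangle inequality guarantees that any two clipped vectors are at most $2C$ apart in $\ell_1$ distance, so a sensitivity of $2C$ would then be valid.

```python
import math
import random


def clip_l1(r, C):
    """Rescale r so that its l1 norm is at most C."""
    norm = sum(abs(v) for v in r)
    factor = min(1.0, C / norm) if norm > 0 else 1.0
    return [v * factor for v in r]


def max_l1_distance(n, C, num_pairs, rng):
    """Largest l1 distance observed between l1-clipped random vectors."""
    worst = 0.0
    for _ in range(num_pairs):
        a = clip_l1([rng.gauss(0.0, 5.0) for _ in range(n)], C)
        b = clip_l1([rng.gauss(0.0, 5.0) for _ in range(n)], C)
        worst = max(worst, sum(abs(p - q) for p, q in zip(a, b)))
    return worst


worst = max_l1_distance(32, 1.0, 2000, random.Random(0))
print(worst, worst <= 2.0 + 1e-9)
```

The price is that an $\ell_1$ ball of radius $C$ is much smaller than the $\ell_2$ ball of the same radius in high dimensions, so the clipped vectors carry less signal relative to the noise.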

8 Conclusion

This paper revealed a potential trap for NLP researchers when adopting a local DP approach. We believe it contributes to a better understanding of the exact modeling choices involved in determining the sensitivity of local DP algorithms. We hope that DP will become a widely accessible and well-understood framework within the NLP community.


Acknowledgements

The independent research group TrustHLT is supported by the Hessian Ministry of Higher Education, Research, Science and the Arts. Thanks to Max Glockner, Timour Igamberdiev, Jorge Cordona, Jan-Christoph Klie, and the anonymous reviewers for their helpful feedback.


Appendix A Proof of Laplace mechanism

Theorem A.1.

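Negative triangle inequality for absolute values. For $a, b \in \mathbb{R}$,

$$|a| - |b| \le |a - b|$$

Proof. It follows directly from the triangle inequality: $|a| = |(a - b) + b| \le |a - b| + |b|$. ∎

Corollary A.1.

Definition 2.4 implies that $\Delta f$ is an upper bound on the $\ell_1$ norm of the difference of the function outputs for any neighboring $x$ and $y$. In other words

$$\|f(x) - f(y)\|_1 \le \Delta f$$

The actual proof (Dwork.Roth.2013).

We will prove that for any $z$ the following ratio

$$\frac{p(\mathcal{M}_L(x; f, \varepsilon) = z)}{p(\mathcal{M}_L(y; f, \varepsilon) = z)} \tag{27}$$

is bounded by $\exp(\varepsilon)$ and thus satisfies Definition 2.3. Fix $z \in \mathbb{R}^n$ arbitrarily. By plugging Eq. 7 into Eq. 27, we get

$$\begin{aligned}
\frac{p(\mathcal{M}_L(x; f, \varepsilon) = z)}{p(\mathcal{M}_L(y; f, \varepsilon) = z)}
&= \prod_{i=1}^{n} \frac{\frac{\varepsilon}{2 \Delta f} \exp\left( -\frac{\varepsilon\, |z_i - f(x)_i|}{\Delta f} \right)}{\frac{\varepsilon}{2 \Delta f} \exp\left( -\frac{\varepsilon\, |z_i - f(y)_i|}{\Delta f} \right)} \\
&= \prod_{i=1}^{n} \exp\left( \frac{\varepsilon \left( |z_i - f(y)_i| - |z_i - f(x)_i| \right)}{\Delta f} \right) \\
&\le \prod_{i=1}^{n} \exp\left( \frac{\varepsilon\, |f(x)_i - f(y)_i|}{\Delta f} \right) \qquad \text{(Theorem A.1)} \\
&= \exp\left( \frac{\varepsilon\, \|f(x) - f(y)\|_1}{\Delta f} \right)
\end{aligned}$$

and, by Corollary A.1,

$$\exp\left( \frac{\varepsilon\, \|f(x) - f(y)\|_1}{\Delta f} \right) \le \exp(\varepsilon) \tag{33}$$

which is what we wanted. By symmetry we get the proof for the inverse ratio $\frac{p(\mathcal{M}_L(y; f, \varepsilon) = z)}{p(\mathcal{M}_L(x; f, \varepsilon) = z)}$.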

Appendix B Proof of Theorem 5.1


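The definition of sensitivity corresponds to the maximum $\ell_1$ distance of any two vectors from the range of $f$. As Eq. 13 bounds all vectors by their $\ell_2$ (Euclidean) norm, we want to find the $\ell_1$ distance between two opposing points on an $n$-dimensional sphere that have maximal $\ell_1$ distance.

Let $n$ be the number of dimensions and $C$ a positive constant. We solve the following optimization problem

$$\max_{x_1, \dots, x_n} f(x_1, \dots, x_n) = x_1 + \dots + x_n \quad \text{subject to} \quad \sqrt{x_1^2 + \dots + x_n^2} = C$$

First, we can get rid of the absolute values in $\|x\|_1$ as the maximums will be symmetric, i.e., it suffices to consider $x_i \ge 0$.

Using Lagrange multipliers, we define the constraint as

$$g(x_1, \dots, x_n) = x_1^2 + \dots + x_n^2 - C^2 = 0$$

and the Lagrangian as $\mathcal{L}(x_1, \dots, x_n, \lambda) = f(x_1, \dots, x_n) - \lambda\, g(x_1, \dots, x_n)$. The gradient is

$$\nabla \mathcal{L} = \left( 1 - 2 \lambda x_1,\; \dots,\; 1 - 2 \lambda x_n,\; -(x_1^2 + \dots + x_n^2 - C^2) \right)$$

Solve by the following system of equations

$$1 - 2 \lambda x_i = 0 \quad (i = 1, \dots, n), \qquad x_1^2 + \dots + x_n^2 = C^2$$

From the first $n$ expressions we get

$$x_i = \frac{1}{2 \lambda}$$

hence $x_1 = x_2 = \dots = x_n$. Plugging into the last term we obtain

$$n\, x_1^2 = C^2 \;\Longrightarrow\; x_1 = \dots = x_n = \frac{C}{\sqrt{n}}, \qquad \max \|x\|_1 = n \cdot \frac{C}{\sqrt{n}} = C \sqrt{n} \tag{36}$$

Geometrically, $\frac{C}{\sqrt{n}}$ corresponds to half the edge length of a hypercube inscribed in a hypersphere of radius $C$.

Now let $x, y \in \mathbb{R}^n$ such that they have maximum $\ell_1$ norm (Eq. 36) and their $\ell_2$ norm is $C$ (that is the output of function $f$ after clipping in Eq. 13)

$$x = \left( \frac{C}{\sqrt{n}}, \dots, \frac{C}{\sqrt{n}} \right), \qquad y = \left( -\frac{C}{\sqrt{n}}, \dots, -\frac{C}{\sqrt{n}} \right)$$

Then their $\ell_1$ distance is

$$\|x - y\|_1 = \sum_{i=1}^{n} \left| \frac{C}{\sqrt{n}} + \frac{C}{\sqrt{n}} \right| = n \cdot \frac{2C}{\sqrt{n}} = 2 C \sqrt{n}$$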
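The result of the optimization can be sanity-checked numerically. The sketch below is our own; the dimension, the constant $C$ and the sample size are arbitrary choices. It samples random points on the sphere of radius $C$, confirms that none exceeds the $\ell_1$ norm $C\sqrt{n}$, and evaluates the equal-coordinate point and its opposite.

```python
import math
import random


def l1_norm(x):
    return sum(abs(v) for v in x)


def random_point_on_sphere(n, C, rng):
    """Normalize a Gaussian vector to Euclidean norm C."""
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(v * v for v in g))
    return [C * v / norm for v in g]


n, C = 16, 1.0
rng = random.Random(0)
best_sampled = max(l1_norm(random_point_on_sphere(n, C, rng)) for _ in range(5000))

corner = [C / math.sqrt(n)] * n      # the maximizer found via Lagrange multipliers
opposite = [-v for v in corner]
distance = sum(abs(p - q) for p, q in zip(corner, opposite))

print(best_sampled <= C * math.sqrt(n) + 1e-9)
print(l1_norm(corner), distance)
```

With $n = 16$ and $C = 1$ the equal-coordinate point has $\ell_1$ norm $C\sqrt{n} = 4$ and the two opposite points are $2C\sqrt{n} = 8$ apart, in line with Theorem 5.1.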


Appendix C Potential attacks

Here we only sketch a potential attack on a single individual’s privatized output $v$. We do not speculate on the actual feasibility, as differential privacy operates with the worst case scenario, that is the theoretical possibility that the adversary has unlimited compute power and unlimited background knowledge. However, real life examples show that anything less protective than DP can be attacked and it is mostly a matter of resources. (Footnote 6: Diffix, an EU-based company, claimed their system is a better alternative to DP but did not provide formal guarantees for such claims. A paper from Gadotti.et.al.2019.USENIX was a bitter lesson for Diffix, as it shows a successful attack. The bottom line is that without formal guarantees, it is impossible to prevent any future attacks.)

We expect to have access to the trained ADePT autoencoder as well as the ATIS corpus (without the single individual whose value we try to infer, to be fair). First, we would need to find the privatized latent vector of $v$, that is $r'$, which could be possible by exploiting and probing the model. Second, by employing a brute-force attack, we can train an LM on ATIS to generate a feasible search space of input utterances, project them to the latent space, and explore the neighborhood of $r'$. This would drastically reduce the search space. Then, depending on the geometric properties of that latent space, it might be the case that ‘similar’ utterances are closer to each other, increasing the probability of finding a similar utterance which might be a ‘just good enough’ approximation for the adversary.