Differentially Private Representation for NLP: Formal Guarantee and An Empirical Study on Privacy and Fairness

10/03/2020 ∙ by Lingjuan Lyu, et al. ∙ National University of Singapore, Monash University, The University of Melbourne

It has been demonstrated that hidden representation learned by a deep model can encode private information of the input, hence can be exploited to recover such information with reasonable accuracy. To address this issue, we propose a novel approach called Differentially Private Neural Representation (DPNR) to preserve the privacy of the extracted representation from text. DPNR utilises Differential Privacy (DP) to provide a formal privacy guarantee. Further, we show that masking words via dropout can further enhance privacy. To maintain utility of the learned representation, we integrate DP-noisy representation into a robust training process to derive a robust target model, which also helps for model fairness over various demographic variables. Experimental results on benchmark datasets under various parameter settings demonstrate that DPNR largely reduces privacy leakage without significantly sacrificing the main task performance.




1 Introduction

Many language applications have involved deep learning techniques to learn text representation through neural models bengio2003neural; mikolov2013distributed; devlin2019bert, performing composition over the learned representation for downstream tasks collobert2011natural; socher-etal-2013-recursive. However, the input text often provides sufficient clues to portray the author, such as gender, age, and other important attributes. For example, sentiment analysis tasks often have privacy implications for authors whose text is used to train models. Many user attributes have been shown to be easily detectable from online review data, which is used extensively in sentiment analysis hovy2015user; potthast2017overview. Private information can take the form of key phrases explicitly contained in the text. However, it can also be implicit. For example, demographic information about the author of a text can be predicted with above-chance accuracy from linguistic cues in the text itself preoctiuc2015analysis.

On the other hand, even the learned representation, rather than the text itself, may still contain sensitive information and incur significant privacy leakage. One might argue that sensitive information like gender, age, location and password should not be leaked and should have been removed from the representation. However, the intermediate representation, which is trained from the input text to contain useful features for the prediction task, can at the same time encode personal information that might be exploited for adversarial purposes, especially since modern deep learning models have vastly more capacity than they need to perform well on their tasks. Moreover, it has been shown that an attacker can recover private variables with higher-than-chance accuracy using only the hidden representation li2018towards; coavoux2018privacy. Therefore, the fact that representations appear to be abstract real-valued vectors should not be misconstrued as meaning that they are safe.

The naive solution of removing protected attributes is insufficient: other features may be highly correlated with, and thus predictive of, the protected attributes pedreshi2008discrimination. To tackle these privacy issues, li2018towards proposed to train deep models with adversarial learning, which explicitly obscures individuals’ private information while improving the robustness and privacy of neural representations in part-of-speech tagging and sentiment analysis tasks. In a parallel study, coavoux2018privacy proposed defence methods based on modifications of the training objective of the main model. However, both works provide only empirical improvements in privacy, without any formal guarantees. Prior works have approached a formal differential privacy guarantee by training differentially private deep models abadi2016deep; mcmahan2018learning; yu2019differentially. However, these works generally considered only the privacy of the training data rather than that of the test data. While cryptographic methods can be used for privacy protection, they can be resource-hungry or overly complex for the user.

To alleviate the above limitations, we take inspiration from differential privacy dwork2014algorithmic to provide a formal privacy guarantee for the representation extracted from user-authored text. Meanwhile, we propose a robust training algorithm to derive a robust target model that maintains utility and also offers fairness as a by-product. To the best of our knowledge, our work is the only work to date that can provide a formal differential privacy guarantee for the extracted representation while also improving fairness.

Our main contributions include:

  • For the first time, the privacy of the extracted neural representation from text is formally quantified in the context of differential privacy. A novel approach called Differentially Private Neural Representation (DPNR) is proposed to perturb the extracted representation. Also, we prove that masking words via dropout can further enhance privacy.

  • To maintain utility, we propose a robust training algorithm that incorporates the noisy training representation in the training process to derive a robust target model, which also reduces model discrimination in most cases.

  • On benchmark datasets across various domains and multiple tasks, we empirically demonstrate that our approach yields comparable accuracy to the non-private baseline on the main task, while significantly outperforming the non-private baseline and adversarial learning on the privacy task. (Code and preprocessed datasets are available at: https://github.com/xlhex/dpnlp.git.)

2 Preliminary: Differential Privacy

Differential privacy (DP) dwork2014algorithmic provides a mathematically rigorous definition of privacy and has become a de facto standard for privacy analysis. Within the DP framework, there are two general settings: central DP (CDP) and local DP (LDP).

In CDP, a trusted data curator answers queries or releases differentially private models by using randomisation mechanisms dwork2014algorithmic; abadi2016deep; yu2019differentially. For scenarios where data are sourced from end users, and end users do not trust any third parties, DP should be enforced in a “local” manner to enable end users to perturb their data before publication, which is termed as LDP dwork2014algorithmic; duchi2013local. Compared with CDP, LDP offers a stronger level of protection.

In our system, we aim to protect the test-phase privacy of the neural representations extracted from end users; we therefore adopt LDP. LDP has the advantage that the data is randomised before individuals disclose their personal information, so the server and any eavesdropper in the middle never see or receive the raw data. In terms of LDP mechanisms, randomised response warner1965randomized; duchi2013local and its variants have been widely used for aggregating statistics, such as frequency estimation and heavy hitter estimation.


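Definition 2.1.

Let $\mathcal{A}: D \rightarrow S$ be a randomised algorithm mapping a data entry in $D$ to $S$. The algorithm $\mathcal{A}$ is $(\epsilon, \delta)$-local differentially private if for all data entries $x, x' \in D$ and all outputs $s \in S$, we have

$$\Pr[\mathcal{A}(x) = s] \le e^{\epsilon} \Pr[\mathcal{A}(x') = s] + \delta.$$

If $\delta = 0$, $\mathcal{A}$ is said to be $\epsilon$-local differentially private.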

A formal definition of LDP is provided in Definition 2.1. The privacy parameter $\epsilon$ captures the privacy loss consumed by the output of the algorithm: $\epsilon = 0$ ensures perfect privacy in which the output is independent of its input, while $\epsilon \rightarrow \infty$ gives no privacy guarantee.
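To make the definition concrete, the following minimal sketch checks the $\epsilon$-LDP inequality for binary randomised response, the classic mechanism cited above. The function names are illustrative and not taken from the paper; the mechanism shown (report the true bit with probability $e^{\epsilon}/(1+e^{\epsilon})$) is the standard textbook form, not the mechanism used by DPNR.

```python
import math


def randomized_response_probs(epsilon: float) -> dict:
    """Return Pr[output | input] for binary randomised response.

    Keys are (input_bit, output_bit). The true bit is reported with
    probability e^eps / (1 + e^eps) and flipped otherwise.
    """
    p_keep = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return {
        (0, 0): p_keep, (0, 1): 1.0 - p_keep,
        (1, 1): p_keep, (1, 0): 1.0 - p_keep,
    }


def satisfies_ldp(probs: dict, epsilon: float, delta: float = 0.0) -> bool:
    """Check Pr[A(x)=s] <= e^eps * Pr[A(x')=s] + delta for all x, x', s."""
    inputs = {x for x, _ in probs}
    outputs = {s for _, s in probs}
    bound = math.exp(epsilon)
    tol = 1e-12  # tolerance for floating-point equality at the boundary
    return all(
        probs[(x, s)] <= bound * probs[(x2, s)] + delta + tol
        for x in inputs for x2 in inputs for s in outputs
    )


probs = randomized_response_probs(1.0)
print(satisfies_ldp(probs, epsilon=1.0))  # mechanism meets its own budget
print(satisfies_ldp(probs, epsilon=0.5))  # but not a tighter one
```

The check enumerates every pair of inputs and every output, which is only feasible for tiny discrete domains; for continuous mechanisms such as the Laplace mechanism the guarantee is established analytically.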

For every pair of adjacent inputs $x$ and $x'$, differential privacy requires that the distributions of $\mathcal{A}(x)$ and $\mathcal{A}(x')$ are “close” to each other, where closeness is measured by the privacy parameters $\epsilon$ and $\delta$. Typically, the inputs $x$ and $x'$ are adjacent when all the attributes of one record are modified. In real scenarios, adjacency is an application-specific notion. For example, a sentence can be divided into items of 5 words each, and two sentences are considered adjacent if they differ by at most 5 consecutive words wang2018not. In this work, we consider word-level DP, i.e., two inputs are considered adjacent if they differ by at most 1 word. For brevity, we use $\epsilon$-DP to represent $\epsilon$-LDP for the rest of the paper. We remark that all the randomisation mechanisms used for CDP, including the Laplace mechanism and the Gaussian mechanism dwork2014algorithmic, can be used individually by each party to inject noise into local data to ensure LDP before release lyu2020towards; yang2020local; lyu2020democratise; sun2020federated. In particular, we adopt the Laplace mechanism, which ensures $\epsilon$-DP with $\delta = 0$, throughout the paper.

In a nutshell, the data universe can conveniently be partitioned into insensitive and sensitive attributes jagielski2018differentially. Given one person’s record $x$, we can write it as a pair $x = (x_i, x_s)$, where $x_i$ represents the insensitive attributes and $x_s$ represents the sensitive attributes. Our main goal is to promise differential privacy only with respect to the sensitive attributes. Write $x_s \sim x_s'$ to denote that $x_s$ and $x_s'$ differ in exactly one coordinate (i.e. one word/token in the NLP domain). An algorithm $\mathcal{A}$ is $(\epsilon, \delta)$-differentially private in the sensitive attributes if for all $x_s \sim x_s'$, for all $x_i$ and for all outputs $s \in S$, we have:

$$\Pr[\mathcal{A}(x_i, x_s) = s] \le e^{\epsilon} \Pr[\mathcal{A}(x_i, x_s') = s] + \delta.$$

Post-processing. DP enjoys a well-known post-processing property dwork2014algorithmic: any computation applied to the output of an $(\epsilon, \delta)$-DP algorithm remains $(\epsilon, \delta)$-DP. This property means that an attacker may apply any sophisticated post-processing function to the privatised representation from the user without weakening the DP guarantee.

3 Main Framework

3.1 Attack Scenario

As indicated in Section 1, uploading raw input or representations to a server risks revealing sensitive information to an eavesdropper who intercepts the hidden representation and tries to recover private information of the input text. Hence, similar to coavoux2018privacy, we consider an attack scenario during the inference phase in Figure 1, which consists of three parts: (i) a feature extractor $f$ to extract the latent representation of any test input $x$; (ii) a main classifier $C$ to predict the label $y$ from the extracted latent representation; and (iii) an attacker (eavesdropper) who aims to infer some private information $z$ contained in $x$ from the latent representation of $x$ used by the main classifier. In this scenario, each example consists of a triple $(x, y, z)$, where $x$ is an input text, $y$ is a single label (e.g. topic or sentiment), and $z$ is a vector of private information contained in $x$. Such an attack would occur in scenarios where the computation of a neural network is shared across multiple devices. For example, phone users send their learned representations to the cloud for grammar correction or translation li2018towards, or to obtain the classification result, e.g., the topic of the text or its sentiment li2017privynet.

Figure 1: Attack scenario during inference phase.

3.2 Methodology

To defend against the middle eavesdropper, we aim to design an approach that can preserve privacy of the extracted test representation from the user without significantly degrading the main task performance. To achieve this goal, we introduce a DP noise layer after a predefined feature extractor (determined by the server), which results in differentially private representation that can be transferred to the server for classification (the topic of the text or its sentiment), as shown in Figure 2.

Figure 2: Illustration of the proposed main framework.

In terms of model training on the server, theoretically, one could remove the noise layer and conduct non-private training by following Equation 1:

$$\mathcal{L}(x, y) = \mathcal{X}(C(f(x)), y) \qquad (1)$$

where $f$ is the feature extractor, $C$ is the classifier, $y$ is the true label, and $\mathcal{X}$ denotes the cross entropy loss function.

However, doing so may deteriorate test performance, due to the noise injected into the test representation. To improve model robustness to the noisy representation, we put forward a robust training algorithm that incorporates a noise layer into the training process as well, adding the same level of noise as in the test phase. Therefore, the robust training objective can be re-written as:

$$\mathcal{L}(x, y) = \mathcal{X}(C(f(x) + r), y) \qquad (2)$$

where $r$ is the injected noise vector.
The detailed robust training process on the server is given in Algorithm 1. After the robust target model is built, server then provides a feature extractor to the user, as illustrated in Figure 2.

3.3 Privacy Guarantee

Let $x_r = f(x)$ be the representation extracted from $x$ by the feature extractor $f$. To apply $\epsilon$-DP to the extracted neural representation, we inject Laplace noise into $x_r$ as follows:

$$\hat{x}_r = x_r + r,$$

where the coordinates $r = \{r_1, \ldots, r_k\}$ are i.i.d. random variables drawn from the Laplace distribution $Lap(b)$, where the noise scale $b = \Delta f / \epsilon$, $\epsilon$ is the privacy budget and $\Delta f$ is the sensitivity of the extracted representation.

3.3.1 Formal Privacy Guarantee

Algorithm 2 outlines how to derive a differentially private neural representation from the feature extractor $f$. Each user first feeds its masked sensitive record $\tilde{x}_s$ into the feature extractor $f$ to extract the representation $x_r$.

Note that to apply an additive noise mechanism, the sensitivity of the output representation needs to be determined. Estimating the true sensitivity of $f$ is challenging. Instead, we follow shokri2015privacy and use input-independent bounds by enforcing a [0,1] range on the extracted representation, hence bounding the sensitivity of each element of the extracted representation by 1, i.e., $\Delta f = 1$. Limiting the range of the extracted representation can also improve the training process by helping to avoid overfitting.
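As a quick piece of arithmetic on the paragraph above, the sketch below computes the Laplace scale and the resulting noise standard deviation for a range of privacy budgets. It assumes the standard Laplace-mechanism scale $b = \Delta f / \epsilon$ with the sensitivity bounded by 1; the $\epsilon$ grid is the one reported in Table 2, and the helper names are illustrative.

```python
import math


def laplace_scale(epsilon: float, sensitivity: float = 1.0) -> float:
    """Scale b of the Laplace mechanism: b = sensitivity / epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return sensitivity / epsilon


def laplace_std(scale: float) -> float:
    """Standard deviation of Laplace(0, b), which equals b * sqrt(2)."""
    return scale * math.sqrt(2.0)


# Privacy budgets taken from Table 2; sensitivity assumed to be 1.
for eps in (0.05, 0.1, 0.5, 1.0, 5.0):
    b = laplace_scale(eps)
    print(f"epsilon={eps:<4}  b={b:6.2f}  std={laplace_std(b):6.2f}")
```

For small budgets the per-coordinate noise is far larger than the [0,1] range of the normalised representation, which is why training the classifier on noisy representations matters for utility.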

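Input: Training record $x$; Feature extractor $f$; Classifier $C$.
1: Extraction: $x_r \leftarrow f(x)$;
2: Normalisation: $x_r \leftarrow (x_r - \min(x_r)) / (\max(x_r) - \min(x_r))$;
3: Perturbation: $\hat{x}_r \leftarrow x_r + r$, $r_i \sim Lap(b)$;
4: Calculate loss $\mathcal{L} = \mathcal{X}(C(\hat{x}_r), y)$ and do backpropagation to update $f$ and $C$.
Algorithm 1 Robust Training on the Server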
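The steps of Algorithm 1 can be sketched in plain Python as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the feature extractor is treated as frozen (the inputs are already feature vectors), the classifier is a logistic regression trained by SGD rather than a softmax layer on BERT, min-max scaling is assumed for the normalisation step, and all function names are hypothetical.

```python
import math
import random


def min_max_normalise(vec):
    """Rescale a vector to the [0, 1] range (constant vectors map to zeros)."""
    lo, hi = min(vec), max(vec)
    if hi == lo:
        return [0.0 for _ in vec]
    return [(v - lo) / (hi - lo) for v in vec]


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def robust_train(features, labels, epsilon, epochs=50, lr=0.1, seed=0):
    """Train a binary logistic-regression classifier on noisy features.

    Fresh Laplace noise is drawn for every example at every epoch, at the
    same scale that would be used at test time (sensitivity assumed 1).
    """
    rng = random.Random(seed)
    dim = len(features[0])
    weights, bias = [0.0] * dim, 0.0
    scale = 1.0 / epsilon
    for _ in range(epochs):
        for x_r, y in zip(features, labels):
            x_r = min_max_normalise(x_r)                          # normalisation
            x_hat = [v + laplace_noise(scale, rng) for v in x_r]  # perturbation
            logit = sum(w * v for w, v in zip(weights, x_hat)) + bias
            grad = sigmoid(logit) - y  # gradient of cross entropy w.r.t. logit
            weights = [w - lr * grad * v for w, v in zip(weights, x_hat)]
            bias -= lr * grad
    return weights, bias


# Toy usage: two hand-made "representations" with binary labels.
w, b = robust_train([[0.0, 1.0, 0.2], [1.0, 0.0, 0.3]], [0, 1], epsilon=5.0)
print(w, b)
```

Drawing fresh noise at every step, rather than perturbing the training set once, exposes the classifier to many noisy views of each example; whether the authors resample per step is not stated in the text and is an assumption of this sketch.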

A formal statement for the privacy guarantees of Algorithm 2 is provided in Theorem 1.

Theorem 1.

Let the entries of the noise vector $r$ be drawn from $Lap(b)$ with $b = \Delta f / \epsilon$. Then Algorithm 2 is $\epsilon$-differentially private.

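Input: Each sensitive record $x_s$; Feature extractor $f$.
Parameters: Dropout vector $I_n$;
1: Word Dropout: $\tilde{x}_s \leftarrow x_s \odot I_n$;
2: Extraction: $x_r \leftarrow f(\tilde{x}_s)$;
3: Normalisation: $x_r \leftarrow (x_r - \min(x_r)) / (\max(x_r) - \min(x_r))$;
4: Perturbation: $\hat{x}_r \leftarrow x_r + r$, $r_i \sim Lap(b)$;
Output: Perturbed representation $\hat{x}_r$.
Algorithm 2 Differentially Private Neural Representation (DPNR)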
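The four steps of Algorithm 2 (word dropout, extraction, normalisation, perturbation) can be sketched end to end as below. Several pieces are assumptions made for illustration only: `toy_extractor` is a hypothetical hashed bag-of-words stand-in for the real feature extractor (the paper uses BERT), masked words are replaced by a hypothetical `NULL_TOKEN` placeholder, the number of masked words is taken as the ceiling of $n \cdot \mu$, and min-max scaling is assumed for the normalisation step.

```python
import math
import random

NULL_TOKEN = "<null>"  # hypothetical placeholder for a masked word


def word_dropout(tokens, mu, rng):
    """Mask ceil(n * mu) positions chosen uniformly at random."""
    n = len(tokens)
    # Subtract a tiny tolerance so that e.g. 10 * 0.3 rounds to 3, not 4.
    n_drop = max(0, min(n, math.ceil(n * mu - 1e-9)))
    dropped = set(rng.sample(range(n), n_drop))
    return [NULL_TOKEN if i in dropped else t for i, t in enumerate(tokens)]


def toy_extractor(tokens, dim=8):
    """Hypothetical feature extractor: hashed bag-of-words counts."""
    vec = [0.0] * dim
    for t in tokens:
        if t != NULL_TOKEN:
            vec[sum(map(ord, t)) % dim] += 1.0
    return vec


def min_max_normalise(vec):
    """Rescale to [0, 1] so each coordinate has sensitivity at most 1."""
    lo, hi = min(vec), max(vec)
    if hi == lo:
        return [0.0 for _ in vec]
    return [(v - lo) / (hi - lo) for v in vec]


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dpnr(tokens, epsilon, mu, extractor=toy_extractor, seed=None):
    """Return a Laplace-perturbed representation of a token sequence."""
    rng = random.Random(seed)
    masked = word_dropout(tokens, mu, rng)      # 1: word dropout
    x_r = extractor(masked)                     # 2: extraction
    x_r = min_max_normalise(x_r)                # 3: normalisation
    scale = 1.0 / epsilon                       # b = sensitivity / epsilon
    return [v + laplace_noise(scale, rng) for v in x_r]  # 4: perturbation


print(dpnr("the cat sat on the mat".split(), epsilon=1.0, mu=0.5, seed=0))
```

Only the perturbed vector leaves the user's device in this sketch; the raw tokens, the mask and the clean representation stay local, which is what the local DP setting requires.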

3.3.2 Word Dropout Enhances Privacy

In NLP, each input $x$ is a sequence composed of $n$ words/tokens. Under word-level DP, two sentences are considered to be adjacent inputs if they differ by at most 1 word (i.e., 1 edit distance). In this scenario, to lower the privacy budget without significantly degrading the inference performance, we borrow the idea of nullification wang2018not and apply it to word dropout.

For each sensitive record $x_s$, words are masked by a dropout operation before DP perturbation. Given a sensitive input $x_s$ that consists of $n$ words, dropout performs word-wise multiplication of $x_s$ with $I_n$, i.e., $\tilde{x}_s = x_s \odot I_n$, where $I_n \in \{0, 1\}^n$. $I_n$ can be either specified by users to mask the highly sensitive words or generated randomly. The number of zeros in $I_n$ is determined by $\lceil n \cdot \mu \rceil$, where $\mu$ is the dropout rate. The zeros are located in $I_n$ conforming to the uniform distribution.

As stated in Theorem 2, word dropout in combination with any $\epsilon$-differentially private mechanism provides a tighter privacy bound in the context of word-level DP. A detailed proof follows.

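Theorem 2.

Given an input $x \in D$, suppose $\mathcal{A}(x) = f(x) + r$ is $\epsilon$-differentially private, and let $I_n$ with dropout rate $\mu$ be applied to $x$, i.e., $\tilde{x} = x \odot I_n$. Then $\mathcal{A}(\tilde{x})$ is $\epsilon'$-differentially private, where $\epsilon' = \ln[(1 - \mu)\exp(\epsilon) + \mu]$.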


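Proof. Suppose there are two adjacent inputs $x_1$ and $x_2$ that differ only in the $i$-th coordinate (word), say $x_{1i} = v$, $x_{2i} \neq v$. For an arbitrary binary vector $I_n$, after dropout, $\tilde{x}_1 = x_1 \odot I_n$, $\tilde{x}_2 = x_2 \odot I_n$, there are two possible cases, i.e., $I_{ni} = 0$ and $I_{ni} = 1$.

Case 1: $I_{ni} = 0$. Since $x_1$ and $x_2$ differ only in the $i$-th coordinate, after dropout, $\tilde{x}_{1i} = \tilde{x}_{2i} = 0$, hence $\tilde{x}_1 = \tilde{x}_2$. It then follows that

$$\Pr\{\mathcal{A}(x_1 \odot I_n) = S\} = \Pr\{\mathcal{A}(x_2 \odot I_n) = S\}.$$

Case 2: $I_{ni} = 1$. Since $x_1$ and $x_2$ differ only in the value of their $i$-th coordinate, after dropout, $\tilde{x}_{1i} = v$, $\tilde{x}_{2i} \neq v$, hence $\tilde{x}_1$ and $\tilde{x}_2$ remain adjacent inputs that differ only in the $i$-th coordinate. Because $\mathcal{A}$ is $\epsilon$-differentially private, it then follows that

$$\Pr\{\mathcal{A}(x_1 \odot I_n) = S\} \le \exp(\epsilon) \Pr\{\mathcal{A}(x_2 \odot I_n) = S\}.$$

Combining these two cases, and using the fact that $\Pr[I_{ni} = 0] = \mu$, we have:

$$\Pr\{\mathcal{A}(x_1 \odot I_n) = S\} = \mu \Pr\{\mathcal{A}(x_1 \odot I_n) = S\} + (1 - \mu) \Pr\{\mathcal{A}(x_1 \odot I_n) = S\}$$
$$\le \mu \Pr\{\mathcal{A}(x_2 \odot I_n) = S\} + (1 - \mu) \exp(\epsilon) \Pr\{\mathcal{A}(x_2 \odot I_n) = S\}$$
$$= \exp(\ln[(1 - \mu)\exp(\epsilon) + \mu]) \Pr\{\mathcal{A}(x_2 \odot I_n) = S\}.$$

Therefore, after dropout, the privacy budget is lowered to $\epsilon' = \ln[(1 - \mu)\exp(\epsilon) + \mu]$.∎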

Since the perturbed representation is $\epsilon$-differentially private, applying dropout beforehand lowers the privacy budget to $\epsilon'$, hence improving the privacy guarantee. A high value of $\mu$ has a positive impact on privacy but a potentially negative impact on utility. In particular, when $\mu = 1$, all words are masked, which gives the highest privacy, i.e., $\epsilon' = 0$, but totally destroys inference performance. Hence, a smaller value of $\mu$ is preferred to trade off privacy and accuracy.
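The effect of the dropout rate on the privacy budget can be tabulated with a few lines of Python. The sketch assumes the bound takes the nullification form $\epsilon' = \ln[(1-\mu)\exp(\epsilon) + \mu]$, which agrees with the two boundary cases discussed above ($\mu = 0$ leaves $\epsilon$ unchanged and $\mu = 1$ gives $\epsilon' = 0$). The function name and the choice $\epsilon = 1$ are illustrative; the $\mu$ grid extends the one in Table 3 with the two endpoints.

```python
import math


def dropout_privacy_budget(epsilon: float, mu: float) -> float:
    """Privacy budget after word dropout: ln[(1 - mu) * exp(eps) + mu]."""
    if not 0.0 <= mu <= 1.0:
        raise ValueError("dropout rate mu must lie in [0, 1]")
    return math.log((1.0 - mu) * math.exp(epsilon) + mu)


# Illustrative budget epsilon = 1 across a range of dropout rates.
for mu in (0.0, 0.1, 0.3, 0.5, 0.8, 1.0):
    print(f"mu={mu:.1f}  epsilon'={dropout_privacy_budget(1.0, mu):.4f}")
```

Because $(1-\mu)e^{\epsilon} + \mu$ is a convex combination of $e^{\epsilon}$ and 1, the resulting budget always lies between 0 and $\epsilon$ and decreases monotonically as $\mu$ grows.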

4 Experiments

In this section, we conduct comprehensive studies over different tasks and datasets to examine the efficacy of the proposed algorithm from three facets: 1) main task performance, 2) privacy and 3) target model fairness.

4.1 Task and Dataset

We use two natural language processing tasks: 1) sentiment analysis and 2) topic classification, with a range of benchmark datasets across various domains. Table 1 summarises the statistics of the datasets used.

4.1.1 Sentiment Analysis

The Trustpilot Sentiment dataset hovy2015user contains reviews associated with a sentiment score on a five-point scale, and each review is associated with 3 attributes: gender, age and location, which are self-reported by users. The original dataset comprises reviews from different locations; in this paper, however, we only use the tp-us subset. Following coavoux2018privacy, we extract examples containing information on both gender and age, and treat these as the private information. We categorise “age” into two groups: “under 34” (u34) and “over 45” (o45).

4.1.2 Topic Classification

For topic classification, we focus on two genres of documents: news articles and blog posts.

News article

We use the ag news corpus del2005ranking. To ensure a fair comparison, we use the corpus preprocessed by coavoux2018privacy (https://github.com/mcoavoux/pnet/tree/master/datasets). We use both the “title” and “description” fields as the input document, and the task is to predict the topic label of the document, with four different topics in total.

Regarding the private information in ag, named entities appearing in the text are vulnerable to inference by attackers. In order to simulate the attack, we first adopt the NLTK NER system bird2009natural to recognise all “Person” entities in the corpus. Then we retain the five most frequent person entities and use them as the private information. Due to the sparsity of named entities, each target entity appears in very few articles. Hence we select the examples containing at least one of these named entities to mitigate the imbalance and data scarcity. Thus, the attacker aims to identify these five entities as five independent binary classification tasks.

Blog posts

We derive a blog posts dataset (blog) from the blog authorship corpus presented by schler2006effects. The original dataset only contains a collection of blog posts associated with authors’ age and gender attributes and does not provide topic annotations. Thus we follow coavoux2018privacy and run the LDA algorithm blei2003latent with 10 topics on the whole collection to identify the topic label of each document. Afterwards, we select posts with a single dominating topic () and discard the rest, which results in a dataset with 10 different topics. Similar to tp-us, the private variables comprise the age and gender of the author. The age attribute is binned into two categories, “under 20” (u20) and “over 30” (o30).

For all three datasets, we randomly split the preprocessed corpus into training, development and test by 8:1:1.

Dataset   Private Variable   #Train   #Dev    #Test
tp-us     age, gender        22,142   2,767   2,767
ag        entity             11,657   1,457   1,457
blog      age, gender         7,098     887     887
Table 1: Summary of three pre-processed datasets.
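The random 8:1:1 split can be sketched as follows. The rounding rule is an assumption: giving a tenth of the examples (rounded down) to each of the development and test sets and the remainder to training happens to reproduce the sizes in Table 1, but the text does not state how the split was rounded, and the function name and seed are illustrative.

```python
import random


def split_811(examples, seed=0):
    """Shuffle and split examples into train/dev/test with an 8:1:1 ratio."""
    items = list(examples)
    random.Random(seed).shuffle(items)
    n_eval = len(items) // 10          # size of dev and of test
    dev = items[:n_eval]
    test = items[n_eval:2 * n_eval]
    train = items[2 * n_eval:]         # everything that is left
    return train, dev, test


# Corpus sizes are the row sums of Table 1.
for name, size in (("tp-us", 27676), ("ag", 14571), ("blog", 8872)):
    train, dev, test = split_811(range(size))
    print(name, len(train), len(dev), len(test))
```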

4.2 Evaluation Metrics

Similar to coavoux2018privacy, we define sentiment analysis and topic classification as the main tasks, whereas the inference of private information is considered as the auxiliary tasks of attackers. Each auxiliary task is eavesdropped by one attacker.

We use accuracy to assess the performance for both main tasks. The auxiliary tasks are evaluated via the following metrics:

  • For demographic variables (i.e., gender and age): $1 - X$, where $X$ is the average over the accuracies of the prediction by the attacker on these variables.

  • For named entities: $1 - F$, where $F$ is the F1 score between the ground truths and the prediction by the attacker on the presence of all named entities.

We denote the value of $1 - X$ or $1 - F$ as empirical privacy, i.e., the complement of the attacker's accuracy or F1 score. Higher values mean better empirical privacy, i.e., lower attack performance.
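The two empirical privacy metrics above can be computed as in the following sketch. The helper names are illustrative, and the aggregation of F1 over entities is an assumption: here all entity-presence decisions are pooled into a single binary F1, since the text does not specify whether scores are pooled or averaged per entity.

```python
def accuracy(gold, pred):
    """Fraction of positions where the prediction matches the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def f1_binary(gold, pred, positive=1):
    """F1 score of the positive class for binary labels."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def empirical_privacy_demographic(gold_by_var, pred_by_var):
    """One minus the attacker accuracy averaged over private variables."""
    accs = [accuracy(gold_by_var[k], pred_by_var[k]) for k in gold_by_var]
    return 1.0 - sum(accs) / len(accs)


def empirical_privacy_entities(gold, pred):
    """One minus the attacker F1 on pooled entity-presence decisions."""
    return 1.0 - f1_binary(gold, pred)


# Toy attacker predictions for two private variables.
gold = {"gender": [0, 1, 1, 0], "age": [1, 1, 0, 0]}
pred = {"gender": [0, 1, 0, 0], "age": [1, 0, 0, 1]}
print(empirical_privacy_demographic(gold, pred))
```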

Method     ε      tp-us   ag      blog
non-priv   -      85.53   78.75   97.07
dpnr       0.05   85.65   80.87   96.69
dpnr       0.1    85.52   80.78   96.39
dpnr       0.5    85.52   79.71   96.84
dpnr       1      85.36   79.36   96.39
dpnr       5      85.87   79.59   96.66
Table 2: Main task accuracy [%] of non-priv and dpnr over 3 datasets with varying $\epsilon$ and fixed $\mu$.

4.3 Model Selection

Model and Parameters. For implementation, owing to its success across multiple NLP tasks, we apply BERT base devlin2019bert to the classification tasks. Specifically, BERT takes a text input, then generates a representation which embeds holistic information. We apply dropout to this representation before a softmax layer, which is responsible for label classification. We run 4 epochs on the training set, and choose the checkpoint with the best loss on the dev set.

After we obtain a well-trained target model, we partition it into two parts: the BERT model acts as the feature extractor in Figure 2, which could be deployed on users’ devices, while the remaining layers act as the classifier on the server. In our implementation, privacy is enforced in the hidden representation extracted by the feature extractor as shown by Algorithm 2. For the attack classifier, we utilise a 2-layer MLP with 512 hidden units and ReLU activation trained over the target model, which delivers the best attack performance on the dev set in our preliminary experiments.

We report the averaged results over 5 independent runs for all experiments.

4.4 Performance Analysis of Target Model

Firstly, we study how the privacy parameters ($\epsilon$, $\mu$) in Theorems 1 and 2 affect the accuracy of the main tasks. We investigate this using different parameter settings, varying one parameter while fixing the other.

4.4.1 Impact of Privacy Budget

To analyse the impact of different privacy budgets on accuracy, we choose $\epsilon \in \{0.05, 0.1, 0.5, 1, 5\}$ with fixed $\mu$. Note that to provide a reasonable privacy guarantee, $\epsilon$ should be set below 10 hamm2015crowd; abadi2016deep. Moreover, a smaller $\epsilon$ means a tighter privacy guarantee. Surprisingly, there is no obvious relationship between accuracy and $\epsilon$. We speculate that the denoising training procedure of BERT and layernorm ba2016layer make BERT resistant to the injected noise, which helps maintain the performance of the main tasks. We will conduct an in-depth study on this in the future.

Table 2 shows that in most cases, our method achieves performance comparable to the non-private baseline across all $\epsilon$, even when the noise level is high ($\epsilon = 0.05$), which validates the robustness of our method to DP noise. It also implies that the DP-noised representation not only preserves privacy, but also retains general information for the main task.

4.4.2 Impact of Dropout Rate

Similarly, we study how the word dropout rate $\mu$ affects the accuracy-privacy trade-off. Table 3 reports the performance of different models under different $\mu$ with fixed $\epsilon$. In most cases, as $\mu$ becomes larger, accuracy starts to degrade as expected. However, as indicated in Theorem 2, a higher $\mu$ results in better privacy as well. Moreover, a moderate $\mu$ can still provide a relatively high accuracy, while the privacy budget is reduced to $\epsilon' = \ln[(1 - \mu)\exp(\epsilon) + \mu]$.

Method     μ     tp-us   ag      blog
non-priv   -     85.53   78.75   97.07
dpnr       0.1   85.53   80.71   96.05
dpnr       0.3   84.85   79.18   93.76
dpnr       0.5   83.51   77.42   90.98
dpnr       0.8   80.70   69.57   82.94
Table 3: Main task accuracy [%] of non-priv and dpnr over 3 datasets with varying $\mu$ and fixed $\epsilon$.

Overall, both results demonstrate that our dpnr can protect privacy of the extracted representations of user-authored text, without significantly affecting the main task performance.

4.5 Attack Model

Apart from the formal privacy guarantee from DP, we use the performance of the attackers' diagnostic classifier to measure empirical privacy. To compare fairly with the standard training and adversarial training in previous work coavoux2018privacy, we train an attack model that tries to predict private variables from the representation. We measure the empirical privacy of a hidden representation by the ability of an attacker to accurately predict specific private information from it. If its empirical privacy (cf. Section 4.2) is low, then an eavesdropper can easily recover information about the input. In contrast, a higher empirical privacy (close to that of a most-frequent label baseline) suggests that the representation mainly contains useful information for the main task, while other private information is erased.

Figure 3: Results of privacy protection over tp-us, ag and blog datasets across different differential privacy budgets. X-axis is the differential privacy budget $\epsilon$, while Y-axis indicates the empirical privacy (see Section 4.2).

To study the relationship between DP and empirical privacy, we numerically investigate the impact of different differential privacy budgets on empirical privacy. Recall that empirical privacy is measured by $1 - X$ (or $1 - F$ for named entities), and higher is better. Figure 3 shows that as the budget increases, empirical privacy across all datasets demonstrates a decreasing trend, especially for ag, which aligns well with DP, where a higher value of $\epsilon$ implies a weaker formal privacy guarantee. Since $\epsilon = 0.05$ provides the best privacy guarantee, we fix $\epsilon = 0.05$, together with a fixed $\mu$, as the default setting in the rest of this section, unless otherwise mentioned.

How private are the noisy neural representations?

For empirical privacy, we investigate whether our dpnr can provide better attack resistance compared with the adversarial learning (adv) coavoux2018privacy and non-private training method (non-priv), which indicates a lower bound. We also report the majority class prediction (majority) as an upper bound.

Table 4 shows that the attack model can indeed recover private information with reasonable accuracy when targeting the non-private representations, showing that representations inadvertently capture sensitive information about users, apart from the useful information for the main task. By contrast, our dpnr significantly reduces the amount of private information encoded in the extracted representation, as validated by the substantially higher empirical privacy than non-priv across all datasets. We also observe that our dpnr achieves comparable empirical privacy to the majority class (majority), and consistently outperforms the adversarial learning (adv) from coavoux2018privacy, which confirms the argument of elazar2018adversarial that adversarial learning cannot fully remove sensitive demographic traits from the data representations. In contrast, the post-processing property of DP ensures that the privacy loss of the extracted representation cannot be increased even by the most sophisticated attacker.

This claim is further confirmed by Table 5, which reports the accuracy of the attacker in classifying whether a named entity is absent from or present in the document over ag. (Due to space limitations, we only report 3 of the 5 entities; the results for the other two are similar.) Generally, both adv and dpnr can reduce attack accuracy, misleading the attack classifier into predicting most of the shared representations as the majority class (A), while our dpnr significantly outperforms both non-priv and adv, corroborating our analysis above.

           tp-us             ag                blog
           Main     Priv.    Main     Priv.    Main     Priv.
majority   79.40    36.39    57.79    49.34    34.16    46.96
non-priv   85.53    34.71    78.75    23.24    97.07    33.88
adv        -0.25    +0.67    -21.71   +26.43   -2.44    +1.16
dpnr       +0.12    +3.66    +2.12    +31.13   -0.38    +15.86
Table 4: Results of the main task and the privacy-protected task on the test sets over different datasets. The relative values are based on non-priv method and bold indicates our dpnr achieves better performance than other methods. (See Section 4.2 for details of the metrics.)
            Entity 3          Entity 4          Entity 5
            A        P        A        P        A        P
ratio [%]   82       18       90       10       91       9
non-priv    96.71    81.99    99.43    47.20    96.93    68.29
adv         98.57    39.71    100.00   0.00     99.87    12.96
dpnr        90.86    8.46     100.00   0.00     100.00   0.00
Table 5: Accuracy of attack classifier on absence (A) and presence (P) classification of 3 entities over ag.

4.6 Target Model Fairness

Recently, fairness concerns have gained a lot of attention in the NLP community bolukbasi2016man; zhao2017men; chang2019bias; lu2018gender; sun2019mitigating. Depending on the literature, fairness can have different interpretations. In this section, we further consider the relation between differential privacy and fairness, and ask whether differential privacy noise can help enhance model fairness. We focus on a particular notion of fairness: given a specific demographic variable (e.g. gender), a fair model should deliver equal or similar performance over the subgroups (e.g. male vs. female) rudinger-etal-2018-gender; zhao-etal-2018-gender.

To empirically evaluate fairness, we take inspiration from rudinger-etal-2018-gender; zhao-etal-2018-gender; li2018towards and partition the test data into subgroups by the demographic variables, i.e., age, gender and five person entities. In contrast to the attacker's prediction of demographic variables (Section 4.5), here we measure the difference in main task accuracy among subgroups of the demographic variables.
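The subgroup comparison described above can be sketched as follows: main task accuracy is computed separately per subgroup, and the gap is reported relative to a reference subgroup, matching the relative values shown in the fairness tables. The function names and toy data are illustrative.

```python
from collections import defaultdict


def subgroup_accuracy(gold, pred, groups):
    """Main task accuracy computed separately for each subgroup."""
    correct, total = defaultdict(int), defaultdict(int)
    for g, p, grp in zip(gold, pred, groups):
        total[grp] += 1
        correct[grp] += int(g == p)
    return {grp: correct[grp] / total[grp] for grp in total}


def accuracy_gap(gold, pred, groups, reference, other):
    """Accuracy of `other` minus accuracy of `reference` (0 means parity)."""
    acc = subgroup_accuracy(gold, pred, groups)
    return acc[other] - acc[reference]


# Toy example: six test items split evenly between subgroups F and M.
gold = [1, 0, 1, 1, 0, 0]
pred = [1, 0, 0, 1, 0, 0]
groups = ["F", "F", "F", "M", "M", "M"]
print(subgroup_accuracy(gold, pred, groups))
print(accuracy_gap(gold, pred, groups, reference="F", other="M"))
```

A gap close to zero indicates similar performance across the two subgroups; the sign shows which subgroup the model favours.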

In fact, we noticed dpnr can also help mitigate the bias in the representations with respect to the specific demographic or identity attributes, such that the decisions made by our robust target model are able to improve the fairness among the concerned demographic groups.

                    Gender            Age
                    F        M        U        O
tp-us  ratio [%]    37       63       64       36
       non-priv     83.69    +1.57    84.63    +0.02
       Adv.         84.95    +0.19    85.38    -0.46
       dpnr         85.90    +0.49    86.08    +0.31
blog   ratio [%]    52       48       46       54
       non-priv     98.07    -2.18    97.05    -1.49
       Adv.         93.84    -7.54    91.91    -3.57
       dpnr         98.00    -2.34    97.09    -0.11
Table 6: The accuracy of main tasks among different demographic groups (age and gender) on tp-us and blog. “Ratio” means the ratio between the two subgroups of the demographic variable. The values for the right subgroups (M and O) are given relative to the left subgroups (F and U).
Figure 6: t-SNE plots of the extracted representations over two age subgroups (u20 and o30) of blog using (a) the non-priv method and (b) the proposed dpnr.

First of all, as the distribution of the demographic groups in the tp-us and blog datasets is relatively even, there is no significant deviation on the main tasks (see Table 6). However, we still observe a noticeable difference for the age group in blog and the gender attribute in tp-us. To better understand the phenomenon, we perform further analysis by plotting the non-private and differentially private representations of age on blog in Figure 6. It can be clearly observed that the two subgroups are much easier to distinguish in the non-private representations, while the differentially private representations mostly mix the representations of “under 20” and “over 30”. We speculate that this is a consequence of the regularising effect of DP.

            Entity 3          Entity 4          Entity 5
            A        P        A        P        A        P
ratio [%]   82       18       90       10       91       9
non-priv    82.67    -18.82   80.86    -15.44   80.22    -10.78
Adv.        48.92    +7.82    49.64    +6.67    49.43    +9.6
dpnr        83.60    -16.93   81.79    -12.22   81.04    -6.04
Table 7: The accuracy of main tasks among three named entities on ag. A denotes that the entity is absent, while P denotes that the entity is present. The values for P are given relative to subgroup A. Bold means a statistically significant (p<0.0001) fairness improvement.

Table 7 shows the fairness results on ag, where we observe that the entity distributions are skewed and the prediction of the non-priv model on the dominant groups is significantly superior to that on the minority groups, which constitutes a severe violation of fairness. Even under such circumstances, our dpnr method can mitigate this skewed bias, achieving fairer predictions than the non-priv baseline.

5 Discussion

Privacy and fairness are two emerging but important areas in the NLP community. Prior efforts predominantly focus on either privacy or fairness li2018towards; coavoux2018privacy; rudinger-etal-2018-gender; zhao-etal-2018-gender; lyu2020towards, but there is no systematic study on how privacy and fairness are related. This work fills this gap and examines the impact of differential privacy on model fairness. We empirically show that privacy and fairness can be simultaneously achieved through differential privacy.

We hope that this work highlights the need for more research in the development of effective countermeasures to defend against privacy leakage via model representation and mitigate model bias in a general sense, and not only specific to a particular attack. More generally, we hope that our work spurs future interest into developing a better understanding of why differential privacy works.

Meanwhile, differential privacy may incur a reduction in the model’s accuracy. It is worthwhile to explore how to get a better trade-off between privacy, fairness and accuracy.

6 Conclusion and Future Work

In this paper, we make the first effort to build differential privacy into the extracted neural representation of text during the inference phase. In particular, we prove that masking the words in a sentence via dropout can further enhance privacy. To maintain utility, we propose a novel robust training algorithm that incorporates a noise layer into the training process to produce the noisy training representation. Experimental results on benchmark datasets across various tasks and parameter settings demonstrate that our approach ensures representation privacy without significantly degrading accuracy. Meanwhile, our DP method helps reduce the effects of model discrimination in most cases, achieving better fairness than the non-private baseline. Our work makes a first step towards understanding the connection between privacy and fairness in NLP, which were previously thought of as distinct classes. Moving forward, we believe that our results justify a larger study on various NLP applications and models, which will be our immediate future work.


We would like to thank the anonymous reviewers for their valuable feedback.