We study the basic operation of set union in the global model of differential privacy. In this problem, we are given a universe U of items, possibly of infinite size, and a database D of users. Each user i contributes a subset W_i ⊆ U of items. We want an (ϵ,δ)-differentially private algorithm which outputs a subset S ⊆ ∪_i W_i such that the size of S is as large as possible. The problem arises in countless real-world applications; it is particularly ubiquitous in natural language processing (NLP) applications such as vocabulary extraction. For example, discovering words, sentences, n-grams, etc., from private text data belonging to users is an instance of the set union problem. Known algorithms for this problem proceed by collecting a subset of items from each user, taking the union of such subsets, and disclosing the items whose noisy counts fall above a certain threshold. Crucially, in the above process, the contribution of each individual user is always independent of the items held by other users, resulting in a wasteful aggregation process, where some item counts happen to be way above the threshold. We deviate from the above paradigm by allowing users to contribute their items in a dependent fashion, guided by a policy. In this new setting, ensuring privacy is significantly more delicate. We prove that any policy which has certain contractive properties results in a differentially private algorithm. We design two new algorithms, one using Laplace noise and the other Gaussian noise, as specific instances of policies satisfying the contractive properties. Our experiments show that the new algorithms significantly outperform previously known mechanisms for the problem.
Natural language models for applications such as suggested replies for e-mails and dialog systems rely on the discovery of n-grams and sentences Hu et al. (2014); Kannan et al. (2016); Chen et al. (2019); Deb et al. (2019). Words and phrases used for training come from individuals, who may be left vulnerable if personal information is revealed. For example, a model could generate a sentence or predict a word that can potentially reveal personal information of the users in the training set Carlini et al. (2019). Therefore, algorithms that allow the public release of the words, n-grams, and sentences obtained from users’ text while preserving privacy are desirable. Additional applications of this problem include the release of search queries and keys in SQL queries Korolova et al. (2009); Wilson et al. (2020). While other privacy definitions are common in practice, guaranteeing differential privacy, introduced in the seminal work of Dwork et al. (2006), ensures users the strongest preservation of privacy. In this paper we consider user-level privacy.
A randomized algorithm A is (ϵ,δ)-differentially private if for any two neighboring databases D and D′, which differ in exactly the data pertaining to a single user, and for all sets S of possible outputs: Pr[A(D) ∈ S] ≤ e^ϵ · Pr[A(D′) ∈ S] + δ.
An algorithm satisfying differential privacy (DP) guarantees that its output does not change by much if a single user is either added or removed from the dataset. Moreover, the guarantee holds regardless of how the output of the algorithm is used downstream. Therefore, items (e.g., n-grams) produced using a DP algorithm can be used in other applications without any privacy concerns. Since its introduction a decade ago Dwork et al. (2006), differential privacy has become the de facto notion of privacy in statistical analysis and machine learning, with a vast body of research work (see Dwork and Roth (2014) and Vadhan (2017) for surveys) and growing acceptance in industry. Differential privacy is deployed in many industries, including Apple Apple (2017), Google Erlingsson et al. (2014); Bittau et al. (2017), Microsoft Ding et al. (2017), Mozilla Avent et al. (2017), and the US Census Bureau Abowd (2016); Kuo et al. (2018).
The vocabulary extraction and n-gram discovery problems mentioned above, as well as many commonly studied problems Korolova et al. (2009); Wilson et al. (2020), can be abstracted as a set union problem, leading to the following formulation.
Let U be some universe of items, possibly of unbounded size. Suppose we are given a database D of users where each user i has a subset W_i ⊆ U. We want an (ϵ,δ)-differentially private algorithm which outputs a subset S ⊆ ∪_i W_i such that the size of S is as large as possible.
Since the universe of items can be unbounded, as in our motivating examples, it is not clear how to apply the exponential mechanism McSherry and Talwar (2007) to DPSU. Furthermore, even in the cases when U is bounded, implementing the exponential mechanism can be very inefficient. Existing algorithms for this problem Korolova et al. (2009); Wilson et al. (2020) (which do not study the DPSU problem as defined in this paper; their goal is to output approximate counts of as many items as possible in ∪_i W_i) collect a bounded number of items from each user, build a histogram of these items, and disclose the items whose noisy counts fall above a certain threshold. In these algorithms, the contribution of each user is always independent of the identity of items held by other users, resulting in a wasteful aggregation process, where some items’ counts could be far above the threshold. Since the goal is to release as large a set as possible rather than to release accurate counts of each item, there could be more efficient ways to allocate the weight to users’ items.
We deviate from the previous methods by allowing users to contribute their items in a dependent fashion, guided by an update policy. In our algorithms, proving privacy is more delicate, as some update policies can result in histograms with unbounded sensitivity. We prove a meta-theorem showing that update policies with certain contractive properties result in differentially private algorithms. The main contributions of the paper are:
Guided by our meta-theorems, we introduce two new algorithms, called Policy Laplace and Policy Gaussian, for the DPSU problem. Both run in linear time and require only a single pass over the users’ data.
Using a Reddit dataset, we demonstrate that our algorithms significantly improve the size of DP set union even when compared to natural generalizations of the existing mechanisms for this problem (see Figure 1).
Our algorithms are being productized in industry to make a basic subroutine in an NLP application differentially private.
To understand the DPSU problem better, let us start with the simplest case we can solve by known techniques. Define Δ0 = max_i |W_i|. Suppose Δ0 = 1. This special case can be solved using the algorithms in Korolova et al. (2009); Wilson et al. (2020). Their algorithm works as follows: Construct a histogram on ∪_i W_i (the set of items in a database D) where the count of each item is the number of sets it belongs to. Then add Laplace noise or Gaussian noise to the count of each item. Finally, release only those items whose noisy histogram counts are above a certain threshold ρ. It is not hard to prove that if the threshold ρ is set sufficiently high, then the algorithm is (ϵ,δ)-DP.
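As a concrete illustration, here is a minimal Python sketch of this count-and-threshold baseline for the Δ0 = 1 case. The function names and the specific threshold expression ρ = 1 + λ ln(1/(2δ)) are our choices for illustration, not taken verbatim from the cited works.

```python
import math
import random

def laplace_noise(scale: float) -> float:
    # Inverse-CDF sampling of a centered Laplace(scale) variable.
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def noisy_histogram_union(user_sets, epsilon, delta):
    """Baseline for the special case max_i |W_i| = 1: count each item,
    add Laplace noise, and release items whose noisy count clears a
    threshold chosen so that singleton items rarely leak."""
    counts = {}
    for w in user_sets:                # each w is a set with |w| <= 1
        for item in w:
            counts[item] = counts.get(item, 0) + 1
    scale = 1.0 / epsilon              # l1-sensitivity is 1 here
    rho = 1.0 + scale * math.log(1.0 / (2.0 * delta))
    return {u for u, c in counts.items() if c + laplace_noise(scale) > rho}
```

Items held by many users sit far above ρ and survive the noise almost surely, while an item held by a single user is released with probability at most roughly δ.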
A straightforward extension of the histogram algorithm to Δ0 > 1 is to upper bound the ℓ1-sensitivity by Δ0 (and the ℓ2-sensitivity by √Δ0), and then add an appropriate amount of Laplace noise (or Gaussian noise) based on this sensitivity. The threshold has to be set based on Δ0. The Laplace-noise-based algorithm was also the approach considered in Korolova et al. (2009); Wilson et al. (2020). This approach has the following drawback. Suppose a significant fraction of users have sets of size smaller than Δ0. Then constructing a histogram based on raw counts of the items results in wastage of sensitivity budget: a user i with |W_i| < Δ0 can increment the counts of items in W_i by any vector, as long as one can ensure that the ℓ1-sensitivity is bounded by Δ0 (or the ℓ2-sensitivity is bounded by √Δ0 if adding Gaussian noise). Consider the following natural generalization of the Laplace and Gaussian mechanisms to create a weighted histogram of elements. A weighted histogram over a domain U is any map H: U → R≥0. For an item u ∈ U, H(u) is called the weight of u. In the rest of the paper, the term histogram should be interpreted as weighted histogram. Each user i updates the weight of each item u ∈ W_i using the rule: H(u) ← H(u) + Δ0/|W_i| (or H(u) ← H(u) + √(Δ0/|W_i|) in the Gaussian case). It is not hard to see that the ℓ1-sensitivity of this weighted histogram is still Δ0. Adding Laplace noise (calibrated to ℓ1-sensitivity) or Gaussian noise (calibrated to ℓ2-sensitivity) to each item of the weighted histogram, and releasing only those items above an appropriately calibrated threshold, leads to a differentially private output. We call these algorithms Weighted Laplace and Weighted Gaussian; they will be used as benchmarks to compare against our new algorithms.
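A short Python sketch of these benchmark histograms may help. The per-item weights Δ0/|W_i| (for the ℓ1 case) and √(Δ0/|W_i|) (for the ℓ2 case) are one natural way to meet the stated sensitivity bounds; the function name is ours.

```python
import math

def weighted_histogram(user_sets, delta0, norm="l1"):
    """Weighted Laplace / Weighted Gaussian benchmark histograms.
    Each user splits a fixed budget equally among their items,
    independently of what other users contributed."""
    hist = {}
    for w in user_sets:
        if not w:
            continue
        if norm == "l1":
            weight = delta0 / len(w)            # l1 total per user = delta0
        else:
            weight = math.sqrt(delta0 / len(w))  # l2 total per user = sqrt(delta0)
        for item in w:
            hist[item] = hist.get(item, 0.0) + weight
    return hist
```

A user with fewer than Δ0 items thus contributes more weight per item, instead of wasting the unused part of the sensitivity budget.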
The Weighted Laplace and Weighted Gaussian mechanisms described above can be thought of as trying to solve the following variant of a knapsack problem. Here each item is a bin, and we gain a profit of 1 if the total weight of the item in the constructed weighted histogram exceeds the threshold. Each user can increment the weights of elements using an update policy, which is defined as follows.
An update policy is a map φ: 2^U × R^U≥0 → R^U≥0 such that φ(W, H)(u) = H(u) for every u ∉ W, i.e., φ can only update the weights of items in W. And user i updates H to φ(W_i, H). Since W_i is typically understood from context, we will write φ(H) instead of φ(W_i, H) for simplicity.
In this framework, the main technical challenge is the following:
How to design update policies such that the sensitivity of the resulting weighted histogram is small while maximizing the number of bins that are full?
Note that bounding the ℓp-sensitivity requires that ‖φ(H) − H‖_p ≤ c for some constant c, i.e., each user has an ℓp-budget of c and can increase the weights of items in their set by an ℓp-distance of at most c. By scaling, WLOG we can assume that c = 1. Note that having a larger value of p should help in filling more bins, as users have more choice in how they can use their budget to increment the weight of items.
In this paper, we consider algorithms which iteratively construct the weighted histogram. That is, in our algorithms, we consider users in a random order, and each user updates the weighted histogram using the update policy φ. Algorithm 1 is a meta-algorithm for DP set union, and all our subsequent algorithms follow this framework.
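The meta-algorithm can be sketched as follows in Python. This is a simplified rendering under our own naming, with the user order derived from a fixed hash as described; the paper's Algorithm 1 may differ in details.

```python
import hashlib

def dp_set_union(user_sets, policy, noise_fn, rho):
    """Meta-algorithm sketch: process users in a canonical (hashed)
    order, let each user update the shared weighted histogram via
    `policy`, then release items whose noisy weight clears rho."""
    def h(w):
        # A fixed hash gives a global order independent of how the
        # input list happens to be arranged.
        return hashlib.sha256(repr(sorted(w)).encode()).hexdigest()
    hist = {}
    for w in sorted(user_sets, key=h):
        policy(w, hist)            # in-place update; user's budget is 1
    return {u for u, wt in hist.items() if wt + noise_fn() > rho}
```

Any update policy with the contractive properties defined below can be plugged in as `policy`, together with suitably calibrated `noise_fn` and threshold `rho`.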
If the update policy is such that it increments the weights of items independently of other users (as done in Weighted Laplace and Weighted Gaussian), then it is not hard to see that the ℓp-sensitivity of H can be bounded by 1; that is, by the budget of each user. However, if some item is already way above the threshold, then it does not make much sense to waste the limited budget on that item. Ideally, users can choose a clever update policy to distribute their budget among the items based on the current weights.
Note that if a policy is such that the updates of a user depend on other users, it can be quite tricky to bound the sensitivity of the resulting weighted histogram. To illustrate this, consider for example the greedy update policy. Each user uses their budget of 1 to fill the bin that is closest to the threshold among the bins in W_i. If an item already reached the threshold, the user spends the remaining budget incrementing the weight of the next bin that is closest to the threshold, and so on. From our knapsack analogy this seems to be a good way to maximize the number of bins filled. However, such a greedy policy can have very large sensitivity (see appendix for an example), and hence won’t lead to any reasonable DP algorithm. So the main contribution of the paper is in showing policies which help maximize the number of item bins that are filled while keeping the sensitivity low. In particular, we define a general class of ℓp-contractive update policies and show that they produce weighted histograms with bounded ℓp-sensitivity.
We say that an update policy φ is ℓp-contractive if there exists a subset S (called the invariant subset for φ) of pairs of weighted histograms which are at an ℓp-distance of at most 1, i.e.,

S ⊆ {(H1, H2) : ‖H1 − H2‖_p ≤ 1},

such that the following conditions hold.

(1) (Invariance) If (H1, H2) ∈ S, then (φ(H1), φ(H2)) ∈ S. (Note that property (1) is a slightly weaker requirement than the usual notion of ℓp-contractivity, which requires ‖φ(H1) − φ(H2)‖_p ≤ ‖H1 − H2‖_p for all H1, H2. Instead we require contraction only for (H1, H2) ∈ S.)

(2) (H, φ(H)) ∈ S for all H.
Property (2) of Definition 1.3 requires that the update policy can change the histogram by an ℓp-distance of at most 1 (the budget of a user).
Suppose φ is an update policy which is ℓp-contractive over some invariant subset S. Then the histogram output by Algorithm 2 has ℓp-sensitivity bounded by 1.
The main contribution of the paper is two new algorithms guided by Theorem 1.1. The first algorithm, which we call Policy Laplace, uses an update policy that is ℓ1-contractive. The second algorithm, which we call Policy Gaussian, uses an update policy that is ℓ2-contractive. Finally, we show that our algorithms significantly outperform the weighted update policies.
At a very high level, the role of contractivity in our algorithms is indeed similar to its role in the recent elegant work of Feldman et al. (2018). They show that if an iterative algorithm is contractive in each step, then adding Gaussian noise in each iteration will lead to strong privacy amplification. In particular, users who make updates early on will enjoy much better privacy guarantees. However, their framework is not applicable in our setting, because their algorithm requires adding noise to the count of every item in every iteration; this will lead to unbounded growth of counts, and items which belong to only a single user could also get output, which violates privacy.
Let 𝒟 denote the collection of all databases. We say that D, D′ ∈ 𝒟 are neighboring databases, denoted by D ∼ D′, if they differ in exactly one user.
For p ≥ 1, the ℓp-sensitivity of a function f: 𝒟 → R^U is defined as Δ_p f = sup ‖f(D) − f(D′)‖_p, where the supremum is over all pairs of neighboring databases D ∼ D′.
Given any function f: 𝒟 → R^k, the Laplace mechanism is defined as M(D) = f(D) + (Y_1, …, Y_k), where Δ_1 f is the ℓ1-sensitivity and Y_1, …, Y_k are i.i.d. random variables drawn from Lap(Δ_1 f / ϵ).
Let f be a function with ℓ2-sensitivity Δ2. For any ϵ ≥ 0 and δ ∈ [0, 1], the Gaussian output perturbation mechanism with noise N(0, σ²) is (ϵ,δ)-DP if and only if Φ(Δ2/(2σ) − ϵσ/Δ2) − e^ϵ · Φ(−Δ2/(2σ) − ϵσ/Δ2) ≤ δ.
We say that two distributions μ, ν on a domain Ω are (ϵ,δ)-close to each other, denoted by μ ≈(ϵ,δ) ν, if for every S ⊆ Ω, we have μ(S) ≤ e^ϵ ν(S) + δ and ν(S) ≤ e^ϵ μ(S) + δ.
We say that two random variables X, Y are (ϵ,δ)-close to each other, denoted by X ≈(ϵ,δ) Y, if their distributions are (ϵ,δ)-close to each other.
We will need the following lemmas, which are useful to prove (ϵ,δ)-DP.
Let μ, ν be probability distributions over a domain Ω. If there exists an event E ⊆ Ω s.t. μ(E) ≥ 1 − δ′ and ν(E) ≥ 1 − δ′, and the conditional distributions satisfy μ|E ≈(ϵ,δ) ν|E, then μ ≈(ϵ, δ+δ′) ν.
Fix some subset S ⊆ Ω. Since μ(E) ≥ 1 − δ′, we have μ(S) ≤ μ(S ∩ E) + δ′, and conditioning on E together with μ|E ≈(ϵ,δ) ν|E bounds the first term by e^ϵ ν(S) + δ, giving μ(S) ≤ e^ϵ ν(S) + δ + δ′. We now prove the other direction; it follows by the symmetric argument with the roles of μ and ν exchanged.
We will also need the fact that if two random variables are (ϵ,δ)-close, then after post-processing they also remain (ϵ,δ)-close.
If two random variables X, Y are (ϵ,δ)-close and A is any randomized algorithm, then A(X) ≈(ϵ,δ) A(Y).
Let A(X) = g(X, R) for some function g, where R denotes the random bits used by A. For any subset S of the possible outputs of A, Pr[A(X) ∈ S] = E_R[Pr[g(X, R) ∈ S]] ≤ E_R[e^ϵ Pr[g(Y, R) ∈ S] + δ] = e^ϵ Pr[A(Y) ∈ S] + δ, where the inequality applies X ≈(ϵ,δ) Y, for each fixed value of R, to the event {x : g(x, R) ∈ S}.
The other direction holds by symmetry. ∎
In this section, we show that if an update policy satisfies the contractive property of Definition 1.3, we can use it to develop a DPSU algorithm. First we show that contractivity of the update policy implies bounded sensitivity (Theorem 1.1), which in turn implies a DPSU algorithm by Theorem 1.2. We will first define sensitivity and update policy formally. Let 𝒟 denote the collection of all databases. We say that D, D′ ∈ 𝒟 are neighboring databases, denoted by D ∼ D′, if they differ in exactly one user.
For p ≥ 1, the ℓp-sensitivity of a function f: 𝒟 → R^U is defined as Δ_p f = sup ‖f(D) − f(D′)‖_p, where the supremum is over all pairs of neighboring databases D ∼ D′.
Let φ be an ℓp-contractive update policy with invariant subset S. Consider two neighboring databases D and D′, where D′ has one extra user compared to D. Let H and H′ denote the histograms built by Algorithm 1 using the update policy φ when the databases are D and D′ respectively.
Say the extra user in D′ has position t in the global ordering given by the hash function. Let H_{t−1} and H′_{t−1} be the histograms after the data of the first t − 1 users (according to the global order given by the hash function) is added to the histogram. Therefore H_{t−1} = H′_{t−1}. And the new user updates H′_{t−1} to φ(H′_{t−1}). By property (2) in Definition 1.3 of an ℓp-contractive policy, (H′_{t−1}, φ(H′_{t−1})) ∈ S. Since H_{t−1} = H′_{t−1}, we have (H_{t−1}, φ(H′_{t−1})) ∈ S. The remaining users are now added to both histograms in the same order. Note that we are using the fact that the users are sorted according to some hash function and they contribute in that order (this is also needed to claim that H_{t−1} = H′_{t−1}). Therefore, by property (1) in Definition 1.3 of an ℓp-contractive policy, we get (H, H′) ∈ S. Since S only contains pairs with ℓp-distance at most 1, we have ‖H − H′‖_p ≤ 1. Therefore the histogram built by Algorithm 2 using φ has ℓp-sensitivity of at most 1. ∎
The above theorem implies that once we have a contractive update policy, we can appeal to Theorem 1.2 to design an algorithm for DPSU.
The policy is described in Algorithm 3. We will set some cutoff Γ above the threshold ρ to use in the update policy. Once the weight H(u) of an item u crosses the cutoff, we do not want to increase it further. In this policy, each user starts with a budget of 1. The user uniformly increases H(u) for each u ∈ W_i s.t. H(u) < Γ. Once some item’s weight reaches Γ, the user stops increasing that item and keeps increasing the rest of the items until the budget of 1 is expended. To implement this efficiently, the items from each user are sorted based on distance to the cutoff. Beginning with the item whose weight is closest to the cutoff (but still below it), we add that item’s gap to the cutoff to each of the items below the cutoff. This repeats until the user’s budget of 1 has been expended.
This policy can also be interpreted as gradient descent to minimize the ℓ1-distance between the current weighted histogram and the point (Γ, …, Γ), hence the name ℓ1-descent. Since the gradient vector is 1 in coordinates where the weight is below the cutoff and 0 in coordinates where the weight is Γ, the ℓ1-descent policy is moving in the direction of the gradient until it has moved a total ℓ1-distance of at most 1.
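In code, the water-filling behavior described above might look like the following sketch. This is our own implementation of the ℓ1-descent rule; variable names and the explicit budget parameter are ours.

```python
def l1_descent_update(user_items, hist, gamma, budget=1.0):
    """Sketch of the l1-descent update policy: spend an l1-budget of 1
    raising the user's below-cutoff items uniformly toward the cutoff
    gamma, freezing each item once it reaches gamma."""
    # Items still below the cutoff, nearest to the cutoff first.
    below = sorted((u for u in user_items if hist.get(u, 0.0) < gamma),
                   key=lambda u: -hist.get(u, 0.0))
    while below and budget > 1e-12:
        k = len(below)
        gap = gamma - hist.get(below[0], 0.0)
        # Raising all k remaining items by `step` costs k * step budget.
        step = min(gap, budget / k)
        for u in below:
            hist[u] = hist.get(u, 0.0) + step
        budget -= k * step
        if step >= gap:            # nearest item reached the cutoff
            below.pop(0)
    return hist
```

Each pass either exhausts the budget or parks the nearest item exactly at the cutoff, so the user's total ℓ1 change never exceeds 1.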
The Policy Laplace algorithm (Algorithm 4) for DPSU uses the framework of the meta-algorithm in Algorithm 1 with the update policy in Algorithm 3. Since the added noise is Lap(λ), which is centered at 0, we want to set the cutoff Γ in the update policy to be sufficiently above the threshold ρ. Thus we pick Γ = ρ + αλ for some α > 0; the best-performing value of α was chosen empirically in our experiments. The parameters are set so as to achieve (ϵ,δ)-DP as shown in Theorem 4.1.
In this section, we will prove that the Policy Laplace algorithm (Algorithm 4) is (ϵ,δ)-DP. By Theorem 1.2 and Theorem 1.1, it is enough to show that the ℓ1-descent policy (Algorithm 3) is ℓ1-contractive. For two histograms H1, H2, we write H1 ≤ H2 if H1(u) ≤ H2(u) for every item u. H1 ≥ H2 is defined similarly.
Let Γ > 0. Then the ℓ1-descent update policy is ℓ1-contractive over the invariant subset S = {(H1, H2) : H1 ≤ H2 and ‖H1 − H2‖_1 ≤ 1}.
Let φ denote the ℓ1-descent update policy.
We will first show property (2) of Definition 1.3. Let H be any weighted histogram and let H′ = φ(H). Clearly H ≤ H′, as the new user will never decrease the weight of any item. Moreover, the total change to the histogram is at most 1 in ℓ1-distance. Therefore ‖H − H′‖_1 ≤ 1, and therefore (H, φ(H)) ∈ S.
We will now prove property (1) of Definition 1.3. Let (H1, H2) ∈ S, i.e., H1 ≤ H2 and ‖H1 − H2‖_1 ≤ 1. A new user can increase H1 and H2 by at most 1 in ℓ1-distance. Let Γ be the cutoff parameter in Algorithm 3, and let W be the set of items of the new user; therefore only the items in W will change in H1, H2. WLOG, we can assume that the user changes both H1 and H2 by exactly a total ℓ1-distance of 1. Otherwise, in at least one of them, all the items in W should reach the cutoff Γ. If this happens with H2, then clearly φ(H2)(u) ≥ Γ for all u ∈ W. But it is easy to see that if this happens with H1, then it should also happen with H2 (as H1 ≤ H2), in which case Γ = φ(H1)(u) ≤ φ(H2)(u) for u ∈ W.
Imagine that at time t = 0, the user starts pushing mass continuously at a rate of 1 into both H1 and H2, until the entire mass of 1 is sent, which happens at time t = 1. The mass flow is equally split among all the items which haven’t yet crossed the cutoff. Let H1(t) and H2(t) be the histograms at time t. Therefore H1(0) = H1 and H1(1) = φ(H1), and similarly for H2. We claim that H1 ≤ H2 implies that H1(t)(u) ≤ H2(t)(u) for all t and all u ∈ W. This is because the flow is split equally among items which didn’t cross the cutoff, and there can only be more items in H1 which didn’t cross the cutoff when compared to H2. And at time t = 1, we have H1(1) = φ(H1) and H2(1) = φ(H2). Therefore, we have φ(H1)(u) ≤ φ(H2)(u) for all u, and so φ(H1) ≤ φ(H2).
We will now prove ℓ1-contraction. Let m1 and m2 denote the total mass sent into H1 and H2 respectively. By the discussion above, m2 ≤ m1 (either the total mass flow is equal to 1 for both, or all items in W will reach the cutoff in H2 before this happens in H1).
Therefore ‖φ(H1) − φ(H2)‖_1 = Σ_u (φ(H2)(u) − φ(H1)(u)) = ‖H1 − H2‖_1 + m2 − m1 ≤ ‖H1 − H2‖_1 ≤ 1, which, together with φ(H1) ≤ φ(H2), proves property (1) of Definition 1.3. ∎
Suppose D and D′ are neighboring databases where D′ has one extra user compared to D. Let H and H′ denote the histograms built by the Policy Laplace algorithm (Algorithm 4) when the database is D and D′ respectively. Then H ≤ H′ and ‖H − H′‖_1 ≤ 1.
Say the extra user in D′ has position t in the global ordering given by the hash function. Let H_{t−1} and H′_{t−1} be the histograms after the data of the first t − 1 users (according to the global order) is added to the histogram. Therefore H_{t−1} = H′_{t−1}. And the new user updates H′_{t−1} to φ(H′_{t−1}). Since the total change by a user is at most 1 in ℓ1-distance, (H_{t−1}, φ(H′_{t−1})) ∈ S. The remaining users are now added to both histograms in the same order. Note that we are using the fact that the users are sorted according to some hash function and they contribute in that order (this is also needed to claim that H_{t−1} = H′_{t−1}). Therefore, by the ℓ1-contraction property shown in Lemma 4.1, H ≤ H′ and ‖H − H′‖_1 ≤ 1. ∎
We now state a formal theorem which proves the privacy of the Policy Laplace algorithm.
The Policy Laplace algorithm (Algorithm 4) is (ϵ,δ)-DP when λ ≥ 1/ϵ and ρ ≥ 1 + λ ln(1/(2δ)).
Suppose D and D′ are neighboring databases where D′ has one extra user compared to D. Let μ and μ′ denote the distribution of the output of the algorithm when the database is D and D′ respectively. We want to show that μ ≈(ϵ,δ) μ′. Let E be the event that no item held only by the extra user is released.
Let H and H′ be the histograms generated by the algorithm from databases D and D′ respectively, and let H̃ and H̃′ be the histograms obtained by adding noise to each entry of H and H′ respectively. For any possible output of Algorithm 4, the released set is determined by thresholding the noisy histogram: S = {u : H̃(u) > ρ} (respectively, S = {u : H̃′(u) > ρ}).
So μ is obtained by post-processing H̃, and μ′ is obtained by post-processing H̃′. Since post-processing only makes two distributions closer (Lemma 2.2), it is enough to show that the distributions of H̃ and H̃′ are close to each other. By Lemma 4.2, H and H′ differ in ℓ1-distance by at most 1. Therefore, on their common support, H̃ ≈(ϵ,0) H̃′ by the properties of the Laplace mechanism (see Theorem 3.6 in Dwork and Roth (2014)). ∎
By Lemma 2.1, it is enough to show that the event E occurs with probability at least 1 − δ under both μ and μ′. Under μ this is immediate, since items held only by the extra user do not appear in H. Under μ′, every such item has weight at most 1 in H′, and for ρ ≥ 1 + λ ln(1/(2δ)) we have Pr[1 + Lap(λ) > ρ] ≤ δ. Therefore the Policy Laplace algorithm (Algorithm 4) is (ϵ,δ)-DP. ∎
Similar to the Laplace update policy, we will set some cutoff Γ above the threshold ρ, and once an item’s weight H(u) crosses the cutoff, we don’t want to increase it further. In this policy, each user starts with a budget of 1. But now, the total change a user can make to the histogram can be at most 1 when measured in the ℓ2-norm (whereas in the Laplace update policy we used the ℓ1-norm to measure change). In other words, the sum of the squares of the changes that the user makes is at most 1. Since we want to get as close to the cutoff Γ as possible, the user moves the weight vector (restricted to the set of items the user has) in the direction of the point (Γ, …, Γ) by an ℓ2-distance of at most 1. This update policy is presented in Algorithm 5.
This policy can also be interpreted as gradient descent to minimize the ℓ2-distance between the current weighted histogram and the point (Γ, …, Γ), hence the name ℓ2-descent. Since the gradient vector is in the direction of the line joining the current point and (Γ, …, Γ), the ℓ2-descent policy is moving the current histogram towards (Γ, …, Γ) by an ℓ2-distance of at most 1.
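A compact Python sketch of this ℓ2-descent step follows. It is our rendering of the rule described above; in particular, clamping items already at or above the cutoff (zero gap) is our reading of the policy.

```python
import math

def l2_descent_update(user_items, hist, gamma, budget=1.0):
    """Sketch of the l2-descent update policy: move the user's
    coordinates of the histogram straight toward the all-gamma point
    by an l2-distance of at most 1 (stopping at that point if it is
    closer than 1)."""
    gaps = {u: max(gamma - hist.get(u, 0.0), 0.0) for u in user_items}
    dist = math.sqrt(sum(g * g for g in gaps.values()))
    if dist == 0.0:
        return hist                   # already at the cutoff everywhere
    frac = min(1.0, budget / dist)    # never overshoot the cutoff point
    for u, g in gaps.items():
        hist[u] = hist.get(u, 0.0) + frac * g
    return hist
```

Note how the single step either lands exactly on the cutoff point (when it is within distance 1) or moves exactly one unit of ℓ2-distance toward it.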
The Policy Gaussian algorithm (Algorithm 6) for DPSU uses the framework of the meta-algorithm in Algorithm 1 with the Gaussian update policy (Algorithm 5). Since the added noise is N(0, σ²), which is centered at 0, we want to set the cutoff Γ in the update policy to be sufficiently above (but not too high above) the threshold ρ. Thus we pick Γ = ρ + ασ for some α > 0; the value of α was chosen empirically to yield the best results in our experiments. The parameters are set so as to achieve (ϵ,δ)-DP as shown in Theorem 5.1.
Here Φ is the cumulative distribution function of the standard Gaussian distribution and Φ⁻¹ is its inverse.
To find σ, one can use binary search, because the achievable δ is a decreasing function of σ. An efficient and robust implementation of this binary search can be found in Balle and Wang (2018).
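Such a binary search can be sketched directly from the condition of the analytic Gaussian mechanism, using the error function for Φ. The code below is our own illustrative implementation, not the reference code of Balle and Wang (2018).

```python
import math

def phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gaussian_delta(sigma, eps, sens=1.0):
    """delta achieved by N(0, sigma^2) noise at a given eps, per the
    analytic Gaussian mechanism condition of Balle and Wang (2018)."""
    return (phi(sens / (2 * sigma) - eps * sigma / sens)
            - math.exp(eps) * phi(-sens / (2 * sigma) - eps * sigma / sens))

def min_sigma(eps, delta, sens=1.0, lo=1e-6, hi=1e6, iters=200):
    """Binary search for the smallest sigma meeting (eps, delta)-DP,
    using the fact that gaussian_delta is decreasing in sigma."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gaussian_delta(mid, eps, sens) > delta:
            lo = mid
        else:
            hi = mid
    return hi
```

The resulting σ is smaller than the classical bound σ = √(2 ln(1.25/δ))/ϵ, which is why the analytic calibration is preferable here.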
In this section we will prove that the Policy Gaussian algorithm (Algorithm 6) is (ϵ,δ)-DP. By Theorem 1.2 and Theorem 1.1, it is enough to show the ℓ2-contractivity of the ℓ2-descent update policy. We will need a simple plane geometry lemma for this.
Let X, Y, Z denote the vertices of a triangle in the Euclidean plane. If ‖XZ‖ ≥ 1, let X′ be the point on the side XZ which is at a distance of 1 from X, and if ‖XZ‖ < 1, define X′ = Z. Y′ is defined similarly. Then ‖X′Y′‖ ≤ ‖XY‖.
Let us first assume that both ‖XZ‖, ‖YZ‖ ≥ 1. Let θ be the angle at Z, and let a = ‖XZ‖, b = ‖YZ‖ as shown in Figure 2. Then by the cosine formula, ‖XY‖² = a² + b² − 2ab cos θ and ‖X′Y′‖² = (a−1)² + (b−1)² − 2(a−1)(b−1) cos θ. Since the function t ↦ (a−t)² + (b−t)² − 2(a−t)(b−t) cos θ has derivative −2(1 − cos θ)((a−t) + (b−t)) ≤ 0 for t ≤ min(a, b), we get ‖X′Y′‖ ≤ ‖XY‖.
If ‖XZ‖ < 1 as well, then X′ = Y′ = Z and the claim is trivially true. Suppose ‖XZ‖ ≥ 1 > ‖YZ‖. Now Y′ = Z. Let a = ‖XZ‖, b = ‖YZ‖ and let θ be the angle at Z as shown in Figure 3. Then by the cosine formula, ‖XY‖² = a² + b² − 2ab cos θ ≥ (a − b)² ≥ (a − 1)² = ‖X′Y′‖².
By symmetry, the claim is also true when ‖YZ‖ ≥ 1 > ‖XZ‖. ∎
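The lemma is easy to sanity-check numerically. The following Python snippet (our illustration, not part of the proof) moves two random points one unit toward a common third point and verifies that their distance does not increase.

```python
import math
import random

def step_toward(p, z, dist=1.0):
    """Move point p toward point z by at most `dist` (to z if closer)."""
    d = math.dist(p, z)
    if d <= dist:
        return z
    t = dist / d
    return (p[0] + t * (z[0] - p[0]), p[1] + t * (z[1] - p[1]))

# Randomized check of the lemma: moving both X and Y one unit toward a
# common point Z never increases their distance.
random.seed(1)
for _ in range(1000):
    x = (random.uniform(-5, 5), random.uniform(-5, 5))
    y = (random.uniform(-5, 5), random.uniform(-5, 5))
    z = (random.uniform(-5, 5), random.uniform(-5, 5))
    assert math.dist(step_toward(x, z), step_toward(y, z)) \
        <= math.dist(x, y) + 1e-9
```

This is exactly the geometric content used next: two histograms moved by the ℓ2-descent step toward the common point (Γ, …, Γ) cannot drift apart.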
Let Γ > 0. Then the ℓ2-descent update policy is ℓ2-contractive over the invariant set S = {(H1, H2) : ‖H1 − H2‖_2 ≤ 1}.