1 Introduction
Top-$k$ is the problem of identifying the ordered sequence of $k$ items with the highest counts from a data domain of $d$ items. This basic problem arises in machine learning tasks such as recommender systems, market basket analysis, and language learning. To solve these problems while guaranteeing privacy to the individuals contributing data, several works have studied top-$k$ under the additional constraint of differential privacy (DP) [12]. Differential privacy guarantees that publishing the $k$ identified elements reveals only a controlled amount of information about the users who contributed data to the item counts.
The best known DP algorithm for top-$k$ is the “peeling” mechanism [3, 11], which applies a DP subroutine for selecting the highest count item from a set, removes it, and repeats $k$ times. One possible such subroutine is the exponential mechanism, a general DP algorithm for choosing high-utility (here, high-count) elements from a data universe given some utility function (see Section 2). Another similar subroutine is the permute-and-flip mechanism [25]. This leads to the baseline peeling mechanisms in our experiments, each using the best known composition methods: the approximate DP baseline uses the exponential mechanism analyzed via concentrated differential privacy (CDP) composition [14, 6], and the pure DP baseline uses the permute-and-flip mechanism with basic composition. The approximate DP variant takes time $O(d + k \log k)$ and the pure variant takes time $O(dk)$. Both require space $O(d)$.
1.1 Our Contributions
We construct an instance of the exponential mechanism that chooses directly from sequences of $k$ items. Unlike the peeling mechanism, this approach does not use composition. Past work has used this style of “joint” exponential mechanism to privately and efficiently estimate 1-way marginals under $\ell_\infty$ error [30], password frequency lists [5], and quantiles [16]. However, it is not obvious how to extend any of these to top-$k$ selection.
A naive implementation of a joint exponential mechanism, whose output space is all sequences of $k$ items, requires enumerating all such sequences. This is impractical even for modest values of $k$ and $d$. As with previous work on joint exponential mechanisms, our main contribution is an equivalent efficient sampling method.
Theorem 1.1 (Informal version of Theorem 3.9).
There is a joint exponential mechanism for $\varepsilon$-DP top-$k$ that takes time $O(dk \log(k) + d \log(d))$ and space $O(dk)$.
While it is straightforward to prove a utility guarantee for this mechanism (Theorem 3.3), our main argument for this joint approach is empirical, as asymptotic guarantees often obscure markedly different performance in practice. Experiments show that the joint exponential mechanism offers the strongest performance among pure differential privacy mechanisms and even outperforms approximate differential privacy mechanisms when $k$ is not large (Section 4).
1.2 Related Work
Private top-$k$ was first studied in the context of frequent itemset mining, where each user contributes a set of items, and the goal is to find common subsets [3, 23, 33, 22]. Bhaskar et al. [3] introduced the first version of the peeling mechanism. Our main points of comparison will be variants of the peeling mechanism developed by Durfee and Rogers [11] and McKenna and Sheldon [25].
Past work has also studied several other algorithms for DP top-$k$. Laplace noise has been used for pure and approximate DP [10, 28]. Additionally, for pure DP, the Gamma mechanism for releasing private counts (Theorem 4.1, [30]) can be applied. Our experiments found that peeling mechanisms dominate these approaches, so we omit them, but implementations of all three appear in our public code repository [17]. Finally, we note that Durfee and Rogers [11] study the problem of private top-$k$ when the number of items $d$ is prohibitively large and runtime linear in $d$ is impractical. We instead focus on the setting where such runtime is acceptable.
A few lower bounds are relevant to private top-$k$. One naive approach is to simply privately estimate all $d$ item counts and return the top $k$ from those noisy estimates. However, each count then has expected error that grows with $d$ for pure DP (Theorem 1.1, [18]). Similarly, Bun et al. [7] (Corollary 3.4) construct a distribution over databases such that, over the randomness of the database and the mechanism, each count has expected error that grows with $d$ for approximate DP. Top-$k$ approaches aim to replace this dependence on $d$ with a dependence on $k$. Bafna and Ullman [2] and Steinke and Ullman [31] prove lower bounds for what we call relative error, the maximum amount by which the true $k$-th count exceeds any of the estimated top-$k$ counts (see Section 2 for details). They prove sample complexity lower bounds for “small” and “large” relative error, respectively. In both cases, the peeling mechanism provides a tight upper bound. We upper bound signed maximum error (Theorem 3.3), but our paper more generally departs from these works by focusing on empirical performance.
2 Preliminaries
Notation.
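For a positive integer $m$, $[m] = \{1, \ldots, m\}$. Given a vector $v$ in dimension $d$ and indices $i < j$, $v_{i:j}$ denotes coordinates $i, \ldots, j$ of $v$; given a sequence $S$, $v_S$ denotes coordinates $v_s$ for $s \in S$.
We start by formally describing the top-$k$ problem.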
Definition 2.1.
Let $\mathcal{X} = \{0, 1\}^d$ be a data domain of $d$ items. In an instance of the top-$k$ problem, there are $n$ users and each user $i$ has an associated vector $x_i \in \mathcal{X}$. For dataset $D = \{x_i\}_{i=1}^n$, let $c_j = \sum_{i=1}^n x_{i,j}$ denote the count of item $j$, and, relabeling items if necessary, let $c_1 \geq \cdots \geq c_d$ denote the counts in nonincreasing order. Given sequence of indices $s = (s_1, \ldots, s_k)$, let $c_s = (c_{s_1}, \ldots, c_{s_k})$ denote the corresponding sequence of counts. Given sequence loss function $\ell$, the goal is to output a sequence of items $s$ with small loss $\ell$.
Note that each user contributes at most one to each item count but may contribute to arbitrarily many items. This captures many natural settings. For example, a user is unlikely to review the same movie more than once, but they are likely to review multiple movies. In general, we describe dataset $D$ by the vector of counts for its domain, $(c_1, \ldots, c_d)$.
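As a small illustration of this setup, the sketch below sums binary user vectors into item counts and reads off the true (non-private) top-$k$ sequence. The helper names `item_counts` and `true_top_k`, the 0-indexed items, and the tie-breaking rule (lower index first) are assumptions made for the example and do not come from the text above.

```python
def item_counts(user_vectors):
    """Sum binary user vectors (one per user) into per-item counts."""
    if not user_vectors:
        return []
    d = len(user_vectors[0])
    counts = [0] * d
    for x in user_vectors:
        if len(x) != d or any(b not in (0, 1) for b in x):
            raise ValueError("each user vector must lie in {0,1}^d")
        for j, b in enumerate(x):
            counts[j] += b  # each user adds at most one to each count
    return counts


def true_top_k(counts, k):
    """Indices of the k largest counts (ties broken by lower index)."""
    order = sorted(range(len(counts)), key=lambda j: (-counts[j], j))
    return order[:k]


# Three users, four items: a user may contribute to many items,
# but at most once to each.
users = [[1, 0, 1, 0], [1, 1, 0, 0], [1, 0, 1, 0]]
counts = item_counts(users)
top2 = true_top_k(counts, 2)
```

Removing any one user changes each count by at most one, which is the property the sensitivity arguments rely on.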
Our experiments will use $\ell_\infty$ error, $\ell_1$ error, and, in keeping with past work on private top-$k$ [2, 11], what we call relative error.
Definition 2.2.
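Using the notation from Definition 2.1, we consider sequence error functions
1. $\ell_\infty$ error $\|c_{1:k} - c_s\|_\infty$,
2. $\ell_1$ error $\|c_{1:k} - c_s\|_1$, and
3. relative error $\max_{i \in [k]} (c_k - c_{s_i})$.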
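The three error functions can be computed directly from a count vector and a candidate sequence. The sketch below is only illustrative: it assumes the definitions are the $\ell_\infty$ and $\ell_1$ norms of the difference between the true top-$k$ counts and the selected counts, and that relative error is the largest amount by which the true $k$-th count exceeds a selected count. The function name `top_k_errors` and the example counts are hypothetical.

```python
def top_k_errors(counts, seq):
    """Return the three sequence errors for a 0-indexed item sequence."""
    k = len(seq)
    top = sorted(counts, reverse=True)[:k]      # true top-k counts
    chosen = [counts[j] for j in seq]           # counts of selected items
    diffs = [t - c for t, c in zip(top, chosen)]
    return {
        "linf": max(abs(x) for x in diffs),
        "l1": sum(abs(x) for x in diffs),
        # amount by which the true k-th count exceeds a selected count
        "relative": max(top[-1] - c for c in chosen),
    }


counts = [100, 90, 80, 70, 60, 50]
swapped = top_k_errors(counts, [0, 2, 1])   # right items, wrong order
dropped = top_k_errors(counts, [0, 1, 5])   # replaces item 2 with item 5
```

In this example the swapped sequence has zero relative error even though its $\ell_\infty$ and $\ell_1$ errors are positive, which reflects how lenient the relative error metric is.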
The specific choice of error may be tailored to the data analyst’s goals: $\ell_\infty$ error suits an analyst who wishes to minimize the worst error of any of the top-$k$ counts; $\ell_1$ error is appropriate for an analyst who views a sequence of slightly inaccurate counts as equivalent to one highly inaccurate count; and relative error may be best when the analyst prioritizes a “sound” sequence where no count is much lower than the true $k$-th count. Note that, while relative error has been featured in past theoretical results on private top-$k$ selection, it is the most lenient error metric. For example, there are datasets on which any sequence of $k$ items obtains optimal relative error, and in general relative error $\leq$ $\ell_\infty$ error $\leq$ $\ell_1$ error.
Next, we cover privacy prerequisites. Differential privacy guarantees that adding or removing a single input data point can only change an algorithm’s output distribution by a carefully controlled amount.
Definition 2.3 (Dwork et al. [12]).
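Datasets $D, D' \in \mathcal{X}^*$ are neighbors (denoted $D \sim D'$) if $D'$ can be obtained from $D$ by adding or removing a data point $x$. Mechanism $M \colon \mathcal{X}^* \to \mathcal{Y}$ is $(\varepsilon, \delta)$-differentially private (DP) if, for any two neighboring datasets $D \sim D'$ in $\mathcal{X}^*$, and any $S \subseteq \mathcal{Y}$, it holds that $\mathbb{P}[M(D) \in S] \leq e^{\varepsilon} \, \mathbb{P}[M(D') \in S] + \delta$. If $\delta = 0$, it is $\varepsilon$-DP.
One especially flexible differentially private algorithm is the exponential mechanism. Given some utility function over outputs, the exponential mechanism samples high-utility outputs with higher probability than low-utility outputs.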
Definition 2.4 (McSherry and Talwar [26], Dwork and Roth [13]).
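Given utility function $u \colon \mathcal{X}^* \times \mathcal{O} \to \mathbb{R}$ with sensitivity $\Delta_u = \max_{D \sim D', \, o \in \mathcal{O}} |u(D, o) - u(D', o)|$, the exponential mechanism $M$ has output distribution
$$\mathbb{P}[M(D) = o] \propto \exp\left(\frac{\varepsilon \, u(D, o)}{2 \Delta_u}\right),$$
where $\propto$ elides the normalization factor.
Note that this distribution places relatively more mass on outputs with higher scores when the sensitivity $\Delta_u$ is small and the privacy parameter $\varepsilon$ is large.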
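For a finite output space that is small enough to enumerate, the exponential mechanism can be sampled directly. The sketch below assumes the standard form in which output $o$ has probability proportional to $\exp(\varepsilon u(D, o) / (2\Delta_u))$; the function names are hypothetical, and subtracting the largest exponent before exponentiating is a common numerical-stability device rather than something prescribed by the text.

```python
import math
import random


def exponential_mechanism_probabilities(utilities, epsilon, sensitivity):
    """Exact output distribution over a small, enumerable output space."""
    exponents = [epsilon * u / (2.0 * sensitivity) for u in utilities]
    shift = max(exponents)  # avoids overflow; cancels in normalization
    weights = [math.exp(e - shift) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


def exponential_mechanism_sample(utilities, epsilon, sensitivity, rng=random):
    """Draw one output index from the exponential mechanism."""
    probs = exponential_mechanism_probabilities(utilities, epsilon, sensitivity)
    return rng.choices(range(len(utilities)), weights=probs, k=1)[0]


# Two outputs whose utilities differ by one, with epsilon = 2 and
# sensitivity 1: the better output is exactly e times more likely.
probs = exponential_mechanism_probabilities([0.0, -1.0], 2.0, 1.0)
draw = exponential_mechanism_sample([0.0, -1.0], 2.0, 1.0, random.Random(0))
```

This direct approach costs time linear in the number of outputs, which is what becomes infeasible when the outputs are all length-$k$ sequences.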
Lemma 2.5 (McSherry and Talwar [26]).
The exponential mechanism is $\varepsilon$-DP.
A tighter analysis is possible for certain utility functions.
Definition 2.6.
A utility function $u$ is monotonic (in the dataset) if, for every dataset $D$ and output $o$, for any neighboring dataset $D'$ that results from adding some data point to $D$, $u(D, o) \leq u(D', o)$.
When the utility function is monotonic, the factor of 2 in the exponential mechanism’s output distribution can be removed. This is because the factor of 2 is only necessary when scores for different outputs move in opposite directions between neighboring datasets.
Lemma 2.7 (McSherry and Talwar [26]).
Given monotonic utility function $u$, the exponential mechanism is $\varepsilon/2$-DP.
One cost of the exponential mechanism’s generality is that its definition provides no guidance for efficiently executing the sampling step. As subsequent sections demonstrate, this is sometimes the main technical hurdle to applying it.
3 A Joint Exponential Mechanism for Top-$k$
Our application of the exponential mechanism employs a utility function $u^*$ measuring the largest difference in counts between the true top-$k$ counts and candidate counts. Let $c_1 \geq \cdots \geq c_d$ be the item counts in nonincreasing order. For candidate sequence of items $s = (s_1, \ldots, s_k)$, we define
$$u^*(D, s) = \begin{cases} -\max_{i \in [k]} (c_i - c_{s_i}) & \text{if } s_1, \ldots, s_k \text{ are distinct,} \\ -\infty & \text{otherwise.} \end{cases}$$
$u^*$ thus assigns the highest possible score, 0, to the true sequence of top-$k$ counts, and increasingly negative scores to sequences with smaller counts. Sequences with repeated items have score $-\infty$ and are never output.
Discussion of $u^*$.
A natural alternative to $u^*$ would replace $c_i - c_{s_i}$ with $|c_i - c_{s_i}|$. Call this alternative $u'$. In addition to being expressible as a simple norm, $u'$ also corresponds exactly to the number of user additions or removals sufficient to make $s$ the true top-$k$ sequence. (Footnote 1: The idea of a utility function based on dataset distances has appeared in the DP literature under several names [20, 1, 27] but has not been applied to top-$k$ selection.) However, $u^*$ has two key advantages over $u'$. First, $u^*$ admits an efficient sampling mechanism. Second, $u'$ favors sequences that omit high-count items entirely over sequences that include them in the wrong order. For example, two candidate sequences $s$ and $s'$ can have identical value according to $u^*$ while, according to $u'$, $s$ scores much worse than $s'$, even though $s$ contains the high-count item 2 and $s'$ replaces it with the lower-count item 6. This conflicts with the ultimate goal of identifying the highest-count items. (Footnote 2: The standard $\ell_\infty$ loss metric shares this flaw; $-u^*$ may therefore be a reasonable loss metric for future top-$k$ work. Nonetheless, past work uses $\ell_\infty$ error, and we did not observe large differences between the two empirically, so we use $\ell_\infty$ in our experiments as well.) We now show that $u^*$ also has low sensitivity.
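The contrast between the two utilities can be checked on a small instance. The counts and sequences below are hypothetical and chosen only for illustration; they are not the example from the text. The sketch assumes the signed utility is $-\max_i (c_i - c_{s_i})$ and the alternative is $-\max_i |c_i - c_{s_i}|$, with counts sorted in nonincreasing order and items 0-indexed (so "item 2" in 1-indexed terms is index 1).

```python
def u_star(counts, seq):
    """Signed utility: only shortfalls relative to the true counts hurt."""
    if len(set(seq)) < len(seq):
        return float("-inf")  # repeated items are never output
    return -max(counts[i] - counts[j] for i, j in enumerate(seq))


def u_prime(counts, seq):
    """Alternative utility: absolute differences, i.e. a negated l_inf norm."""
    if len(set(seq)) < len(seq):
        return float("-inf")
    return -max(abs(counts[i] - counts[j]) for i, j in enumerate(seq))


counts = [100, 90, 80, 70, 60, 50]   # hypothetical, already sorted
s = (0, 2, 3, 4, 1)        # contains index 1 (count 90), but placed last
s_alt = (0, 2, 3, 4, 5)    # replaces index 1 with index 5 (count 50)

same_under_star = (u_star(counts, s), u_star(counts, s_alt))
differ_under_prime = (u_prime(counts, s), u_prime(counts, s_alt))
```

Here both sequences score $-10$ under the signed utility, while the absolute-value alternative scores the sequence that keeps the count-90 item at $-30$ and the one that drops it at $-10$.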
Lemma 3.1.
$\Delta_{u^*} = 1$.
Proof.
First, any sequence with utility $-\infty$ has that utility on every dataset. Turning to sequences of distinct elements $s = (s_1, \ldots, s_k)$, adding a user does not decrease any count, and increases a count by at most one. Furthermore, while the top-$k$ items may change, none of the top-$k$ counts decrease, and each increases by at most one. It follows that each $c_i - c_{s_i}$ either stays the same, decreases by one, or increases by one. A similar analysis holds when a user is removed. ∎
We call the instance of the exponential mechanism with utility $u^*$ Joint. Its privacy is immediate from Lemma 2.5.
Theorem 3.2.
Joint is $\varepsilon$-DP.
A utility guarantee for Joint is also immediate from the generic utility guarantee for the exponential mechanism. The (short) proof appears in Appendix A.
Theorem 3.3.
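Let $c_1, \ldots, c_k$ denote the true top-$k$ counts for dataset $D$, and let $\tilde{c}_1, \ldots, \tilde{c}_k$ denote those output by Joint. With probability at least $99/100$,
$$\max_{i \in [k]} (c_i - \tilde{c}_i) \leq \frac{2[k \ln(d) + 5]}{\varepsilon}.$$
Naive sampling of Joint requires computing $O(d^k)$ output probabilities. The next subsection describes a sampling algorithm that only takes time $O(dk \log(k) + d \log(d))$.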
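To make the cost of naive sampling concrete, the sketch below enumerates every sequence of distinct items and computes its exponential mechanism probability. It assumes the utility is $-\max_i (c_i - c_{s_i})$ with sensitivity 1, counts sorted in nonincreasing order, and 0-indexed items; the function name is hypothetical. Sequences with repeated items have probability zero, so enumerating permutations is enough.

```python
import itertools
import math


def naive_joint_distribution(counts, k, epsilon):
    """Exact output distribution by brute force; feasible only for tiny d, k."""
    d = len(counts)
    weights = {}
    for seq in itertools.permutations(range(d), k):
        utility = -max(counts[i] - counts[j] for i, j in enumerate(seq))
        weights[seq] = math.exp(epsilon * utility / 2.0)
    total = sum(weights.values())
    return {seq: w / total for seq, w in weights.items()}


dist = naive_joint_distribution([5, 3, 1], k=2, epsilon=1.0)
most_likely = max(dist, key=dist.get)
```

The dictionary has $d!/(d-k)!$ entries, so even $d = 1000$ and $k = 5$ would require roughly $10^{15}$ evaluations.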
3.1 Efficiently Sampling Joint
The key observation is that, while there are $O(d^k)$ possible output sequences, a given instance $u^*(D, \cdot)$ has only $O(dk)$ possible values. This is because each score takes the form $-(c_i - c_j)$ for some $i \in [k]$ and $j \in [d]$. Our algorithm will therefore proceed as follows:
1. For each of the $O(dk)$ utilities $U_{ij} = -(c_i - c_j)$, count the number $m(U_{ij})$ of sequences $s$ with score $U_{ij}$.
2. Sample a utility $U_{ij}$ from the distribution defined by
$$\mathbb{P}[U_{ij}] \propto m(U_{ij}) \exp\left(\frac{\varepsilon U_{ij}}{2}\right). \quad (1)$$
3. From the space of all sequences that have the selected utility $U_{ij}$, sample a sequence uniformly at random.
This outline makes one oversimplification: instead of counting the number of sequences for each of $O(dk)$ (possibly non-distinct) integral utility values, the actual sampling algorithm will instead work with exactly $dk$ distinct non-integral utility values. Nonetheless, the output distribution will be exactly that of the exponential mechanism described at the beginning of the section.
3.1.1 Counting the Number of Sequences
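Define matrix $\tilde{U} \in \mathbb{R}^{k \times d}$ by $\tilde{U}_{ij} = -(c_i - c_j) - z_{ij}$, where $z_{ij}$ is a small term in $(0, 1/2]$ that ensures distinctness,
$$z_{ij} = \frac{d(k - i) + j}{2dk}.$$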
Several useful properties of are stated in Lemma 3.4.
Lemma 3.4.
Given $\tilde{U}$ defined above, 1) each row of $\tilde{U}$ is decreasing, 2) each column of $\tilde{U}$ is increasing, and 3) the elements of $\tilde{U}$ are distinct.
Proof.
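Fix some row $i$. By definition, $c_1 \geq \cdots \geq c_d$, so $-(c_i - c_1) \geq \cdots \geq -(c_i - c_d)$. The $z_{ij}$ terms also increase with $j$, so each row of $\tilde{U}$ is decreasing. By similar logic, each column of $\tilde{U}$ is increasing.
Finally, note that since $i \in [k]$ and $j \in [d]$, the $z_{ij}$ terms are
$$z_{ij} \in \left\{ \frac{1}{2dk}, \frac{2}{2dk}, \ldots, \frac{dk}{2dk} \right\}$$
and thus are $dk$ distinct values in $(0, 1/2]$. Since any two count differences $c_i - c_j$ and $c_{i'} - c_{j'}$ are either identical or at least 1 apart, claim 3) follows. ∎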
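The three properties can be checked numerically on a small instance. The sketch below assumes the perturbed matrix has entries $-(c_i - c_j) - (d(k-i) + j)/(2dk)$ for 1-indexed $i \in [k]$, $j \in [d]$ (a reconstruction consistent with the proof above), translated to 0-indexed Python. The function name and the example counts, which deliberately include a tie, are hypothetical.

```python
import math


def build_u_tilde(counts, k):
    """k x d matrix of perturbed utilities; counts sorted nonincreasing."""
    d = len(counts)
    return [
        [
            -(counts[i] - counts[j]) - (d * (k - i - 1) + j + 1) / (2.0 * d * k)
            for j in range(d)
        ]
        for i in range(k)
    ]


counts = [5, 3, 3, 1]   # tie between the second and third items
k = 2
u_tilde = build_u_tilde(counts, k)
flat = [v for row in u_tilde for v in row]

rows_decreasing = all(
    row[j] > row[j + 1] for row in u_tilde for j in range(len(row) - 1)
)
cols_increasing = all(
    u_tilde[i][j] < u_tilde[i + 1][j]
    for i in range(k - 1)
    for j in range(len(counts))
)
all_distinct = len(set(flat)) == len(flat)
# Rounding up removes the perturbation and recovers the integer utility.
ceil_recovers = all(
    math.ceil(u_tilde[i][j]) == -(counts[i] - counts[j])
    for i in range(k)
    for j in range(len(counts))
)
```

Because every perturbation lies in $(0, 1/2]$ and counts are integers, the ceiling of each entry is the original integer utility, which is what lets the sampler work with distinct values without changing the output distribution.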
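We now count “sequences through $\tilde{U}$”. Each sequence $s$ consists of $k$ values from $\tilde{U}$, one from each row of $\tilde{U}$, namely $\tilde{U}_{1 s_1}, \ldots, \tilde{U}_{k s_k}$, and its score is $\min_{i \in [k]} \tilde{U}_{i s_i}$, or $-\infty$ if the $s_i$ are not distinct. For each $i \in [k]$ and $j \in [d]$, define $\tilde{m}(\tilde{U}_{ij})$ to be the number of sequences through $\tilde{U}$ with distinct elements and score exactly $\tilde{U}_{ij}$. $\tilde{U}$ and $\tilde{m}$ are useful because of the following connection to $m$, the quantities necessary to sample from the distribution in Equation 1: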
Lemma 3.5.
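For any $i \in [k]$ and $j \in [d]$, let $U_{ij} = -(c_i - c_j)$. Then $m(U_{ij}) = \sum_{(i', j') \,:\, \lceil \tilde{U}_{i'j'} \rceil = U_{ij}} \tilde{m}(\tilde{U}_{i'j'})$.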
Proof.
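Each $\lceil \tilde{U}_{i'j'} \rceil = U_{i'j'}$, so the collection of sequences with score $U_{ij}$ is exactly the collection of sequences through $\tilde{U}$ with score $\tilde{U}_{i'j'}$ where $\lceil \tilde{U}_{i'j'} \rceil = U_{ij}$. ∎
The problem thus reduces to computing the $\tilde{m}$ values. For each row $r \in [k]$ and utility $u$, define
$$t_r(u) = |\{ j \in [d] : \tilde{U}_{rj} \geq u \}|.$$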
Useful simple properties of these values appear below.
Lemma 3.6.
Fix some $u$. Then 1) $t_r(u)$ is nondecreasing in $r$, 2) if sequence $s$ has score $u$, then $s_r \leq t_r(u)$ for all $r \in [k]$, and 3) there exists a sequence of distinct elements with score $u = \tilde{U}_{ij}$ if and only if $t_r(u) \geq r$ for all $r \in [k]$.
Proof.
The first two properties follow directly from Lemma 3.4. For property 3) let satisfy the conditions of the lemma and assume for some . By properties 1) and 2), for all , . But that implies contains distinct numbers less than , which is a contradiction. In the other direction, suppose for all . Define where for and . Since , . Therefore contains at least one option for , contains at least one option for , and so on. The resulting has distinct elements and score . ∎
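The following lemma connects the $\tilde{m}$ and $t$ values.
Lemma 3.7.
Given entry $\tilde{U}_{ij}$ of $\tilde{U}$, define $t_r = t_r(\tilde{U}_{ij})$ for each row $r \in [k]$. Then $\tilde{m}(\tilde{U}_{ij}) = \prod_{r \neq i} (t_r - (r - 1))$.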
Proof.
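By Lemma 3.6 we know the statement is true for $\tilde{U}_{ij}$ such that $\tilde{m}(\tilde{U}_{ij}) = 0$. If $\tilde{m}(\tilde{U}_{ij}) > 0$ then any sequence $s$ with score $\tilde{U}_{ij}$ consists of distinct elements such that $s_i = j$ and for all $r \neq i$, $s_r \leq t_r$ (otherwise $s$ has score less than $\tilde{U}_{ij}$). Choosing the entries row by row, row $r \neq i$ has $t_r$ admissible columns, of which $r - 1$ are already taken, so the number of such sequences is $\prod_{r \neq i} (t_r - (r - 1))$. ∎
A naive solution thus computes all $dk$ of the $t$ vectors, then uses them to compute the $\tilde{m}$ values. We can avoid this by observing that, if we sort the values of $\tilde{U}$, then adjacent values in the sorted order have almost identical $t$.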
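The counting formula can be sanity-checked against brute-force enumeration on a tiny instance. The sketch below assumes the reconstruction used in this section: $t_r(u)$ counts the entries of row $r$ that are at least $u$, and the number of distinct-item sequences whose minimum entry is $\tilde{U}_{ij}$ is $\prod_{r \neq i} (t_r - (r-1))$, read as zero whenever a factor is not positive. With 0-indexed rows the factor becomes `t[r] - r`. All function names and the example counts are hypothetical.

```python
import itertools


def build_u_tilde(counts, k):
    """Perturbed utility matrix (assumed form; counts sorted nonincreasing)."""
    d = len(counts)
    return [[-(counts[i] - counts[j]) - (d * (k - i - 1) + j + 1) / (2.0 * d * k)
             for j in range(d)] for i in range(k)]


def t_values(u_tilde, u):
    """t_r(u): number of entries in each row that are at least u."""
    return [sum(1 for v in row if v >= u) for row in u_tilde]


def count_by_formula(u_tilde, i, j):
    """Product formula for the number of sequences with score u_tilde[i][j]."""
    t = t_values(u_tilde, u_tilde[i][j])
    total = 1
    for r, t_r in enumerate(t):
        if r != i:
            total *= max(t_r - r, 0)  # 0-indexed r equals (row number - 1)
    return total


def count_by_brute_force(u_tilde, i, j):
    """Enumerate all distinct-item sequences and count those with this score."""
    k, d = len(u_tilde), len(u_tilde[0])
    target = u_tilde[i][j]
    return sum(
        1
        for seq in itertools.permutations(range(d), k)
        if min(u_tilde[r][seq[r]] for r in range(k)) == target
    )


u_tilde = build_u_tilde([5, 3, 3, 1], k=2)
formula = [[count_by_formula(u_tilde, i, j) for j in range(4)] for i in range(2)]
brute = [[count_by_brute_force(u_tilde, i, j) for j in range(4)] for i in range(2)]
```

The counts sum to the total number of distinct-item sequences, $4 \cdot 3 = 12$ here, since every such sequence has exactly one minimum entry.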
Lemma 3.8.
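Let $\tilde{U}_{(1)}, \ldots, \tilde{U}_{(dk)}$ denote the entries of $\tilde{U}$ sorted in decreasing order. For each $\tilde{U}_{(a)}$, let $r(a)$ denote its row index in $\tilde{U}$. Then: 1) for each $a \in [dk - 1]$, $t_{r(a+1)}(\tilde{U}_{(a+1)}) = t_{r(a+1)}(\tilde{U}_{(a)}) + 1$, and 2) for $r \neq r(a+1)$, $t_r(\tilde{U}_{(a+1)}) = t_r(\tilde{U}_{(a)})$.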
Proof.
For 1), assume . Then we can define such that
which implies that . This contradicts and being adjacent in sorted order. For 2), assume there exists such that . (If we instead assumed , this would contradict the sorting order of and the definition of .) This implies that , which again contradicts and being adjacent in sorted order. ∎
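According to the above lemma, we can compute all of the $\tilde{m}(\tilde{U}_{(a)})$ as follows. First, sort the entries of $\tilde{U}$, recording the row and column indices of each $\tilde{U}_{(a)}$ as $r(a)$ and $c(a)$. Then, create a vector storing the values $t_r(\tilde{U}_{(1)})$ for $r \in [k]$. These can be combined into $\tilde{m}(\tilde{U}_{(1)})$ using Lemma 3.7. If $\tilde{m}(\tilde{U}_{(a)})$ is nonzero, then we can get $\tilde{m}(\tilde{U}_{(a+1)})$ simply by rescaling; according to Lemma 3.8, adding one to entry $r(a+1)$ gives the new vector of $t$’s, so only one term in the formula for $\tilde{m}$ changes in going from $a$ to $a + 1$. We can thus compute each $\tilde{m}(\tilde{U}_{(a)})$ in constant time, and compute all $\tilde{m}$ values in time $O(dk)$.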
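The incremental computation described above can be sketched as follows. This is an illustration under assumptions, not the paper's Algorithm 1: it uses `heapq.merge` for the $k$-way merge of the already-sorted rows, exact Python integers for the running product (arbitrary precision, so individual updates are not strictly constant time), and an explicit counter of non-positive factors so that zero counts never require a division by zero. The perturbed matrix form and all names are hypothetical reconstructions.

```python
import heapq


def build_u_tilde(counts, k):
    """Perturbed utility matrix (assumed form; counts sorted nonincreasing)."""
    d = len(counts)
    return [[-(counts[i] - counts[j]) - (d * (k - i - 1) + j + 1) / (2.0 * d * k)
             for j in range(d)] for i in range(k)]


def sequence_counts_incremental(counts, k):
    """Map (row, col) to the number of sequences whose score is that entry."""
    d = len(counts)
    u_tilde = build_u_tilde(counts, k)
    # Each row is already decreasing, so a k-way merge yields global order.
    rows = [[(u_tilde[i][j], i, j) for j in range(d)] for i in range(k)]
    t = [0] * k                 # t[r] = entries of row r seen so far
    positive_product = 1        # product of the positive factors t[r] - r
    nonpositive = k             # number of rows whose factor is <= 0
    result = {}
    for _, i, j in heapq.merge(*rows, reverse=True):
        old = t[i] - i
        t[i] += 1               # only row i's t value changes
        new = old + 1
        if old > 0:
            positive_product //= old
        else:
            nonpositive -= 1
        if new > 0:
            positive_product *= new
        else:
            nonpositive += 1
        # The count excludes row i's own factor.
        if new > 0:
            others_nonpositive = nonpositive
            others_product = positive_product // new
        else:
            others_nonpositive = nonpositive - 1
            others_product = positive_product
        result[(i, j)] = others_product if others_nonpositive == 0 else 0
    return result


m_tilde = sequence_counts_incremental([5, 3, 3, 1], k=2)
```

Each step touches a single entry of `t`, mirroring the observation that adjacent values in sorted order have almost identical $t$ vectors.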
3.1.2 Sampling a Utility
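Given the $\tilde{m}$ values above, we sample from a slightly different distribution than the one defined in Equation 1. The new distribution is, for $i \in [k]$ and $j \in [d]$,
$$\mathbb{P}[\tilde{U}_{ij}] \propto \tilde{m}(\tilde{U}_{ij}) \exp\left(\frac{\varepsilon \lceil \tilde{U}_{ij} \rceil}{2}\right). \quad (2)$$
When the $\tilde{m}$ are large, sampling can be done in a numerically stable manner using logarithmic quantities (see, e.g., Appendix A.6 of Medina and Gillenwater [27]).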
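One common way to sample from unnormalized weights that are only available as logarithms is to subtract the largest log-weight before exponentiating. The sketch below illustrates that generic technique; it is not necessarily the method of the cited appendix, and the function name is hypothetical. A log-weight of $-\infty$ stands for an entry whose sequence count is zero.

```python
import math
import random


def sample_from_log_weights(log_weights, rng=random):
    """Sample an index with probability proportional to exp(log_weight)."""
    finite = [lw for lw in log_weights if lw != float("-inf")]
    if not finite:
        raise ValueError("at least one weight must be positive")
    shift = max(finite)
    # exp(-inf) is 0.0, so zero-count entries are never selected.
    weights = [math.exp(lw - shift) for lw in log_weights]
    return rng.choices(range(len(log_weights)), weights=weights, k=1)[0]


# A log-weight is log(count) + epsilon * ceil(utility) / 2. Python's
# math.log accepts arbitrarily large integers, so huge counts are fine.
epsilon = 1.0
huge_count = 10 ** 400          # far beyond float range
log_weights = [
    float("-inf"),                              # count 0
    math.log(huge_count) + epsilon * (-3) / 2,
    math.log(huge_count) + epsilon * (-3000) / 2,
]
draws = [sample_from_log_weights(log_weights, random.Random(seed))
         for seed in range(20)]
```

Working with `math.log(huge_count)` avoids ever converting the count itself to a float, which would overflow.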
3.1.3 Sampling a Sequence
After sampling $\tilde{U}_{ij}$ from Equation 2, we sample a sequence of item indices $s$ uniformly at random from the collection of sequences with score $\tilde{U}_{ij}$. The sample $\tilde{U}_{ij}$ fixes $s_i = j$. To sample the remaining items, we sample $s_1$ uniformly at random from $[t_1(\tilde{U}_{ij})]$, $s_2$ from $[t_2(\tilde{U}_{ij})] \setminus \{s_1\}$, and so on. Lemma 3.6 guarantees that this process never attempts to sample from an empty set.
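The row-by-row sampling step can be sketched as below, assuming a sampled entry at row `i` and column `j` (0-indexed) whose sequence count is positive. For each other row, the admissible columns are the first $t_r$ columns minus those already used. The perturbed matrix form and all names are hypothetical; this simple version rescans each row and so does not match the stated asymptotic runtime.

```python
import random


def build_u_tilde(counts, k):
    """Perturbed utility matrix (assumed form; counts sorted nonincreasing)."""
    d = len(counts)
    return [[-(counts[i] - counts[j]) - (d * (k - i - 1) + j + 1) / (2.0 * d * k)
             for j in range(d)] for i in range(k)]


def sample_sequence(u_tilde, i, j, rng=random):
    """Uniform sample among distinct-item sequences with score u_tilde[i][j]."""
    u = u_tilde[i][j]
    used = set()
    seq = []
    for r, row in enumerate(u_tilde):
        if r == i:
            choice = j  # the sampled entry fixes this position
        else:
            t_r = sum(1 for v in row if v >= u)
            options = [c for c in range(t_r) if c not in used]
            # Raises IndexError if the entry's sequence count is zero.
            choice = rng.choice(options)
        used.add(choice)
        seq.append(choice)
    return seq


u_tilde = build_u_tilde([5, 3, 3, 1], k=2)
samples = [tuple(sample_sequence(u_tilde, 0, 0, random.Random(seed)))
           for seed in range(30)]
```

The number of options in each row does not depend on the earlier choices, which is why sampling each position uniformly yields a uniform sample over the whole collection.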
3.1.4 Overall Algorithm
Joint’s overall guarantees and pseudocode appear below.
Theorem 3.9.
Joint samples a sequence from the exponential mechanism with utility $u^*$ in time $O(dk \log(k) + d \log(d))$ and space $O(dk)$.
Proof.
We sketch here, deferring details to Appendix B.
Privacy: By Lemma 3.5, to get $m$ it suffices to compute $\tilde{m}$ (Footnote 3: Note that even if items $j$ and $j'$ have identical counts, they may have differing sequence counts, $\tilde{m}(\tilde{U}_{ij}) \neq \tilde{m}(\tilde{U}_{ij'})$. The sampling in the loop on Line 22 implicitly makes up for this difference. See the full privacy proof in Appendix B.), and by Lemma 3.7 it suffices to compute the $t$ values. A score $\tilde{U}_{ij}$ sampled from Equation 2 may be non-integral; taking its ceiling produces a utility $U_{ij} = \lceil \tilde{U}_{ij} \rceil$ with the desired distribution from Equation 1.
Runtime and space: Referring to Algorithm 1, line 2 takes time $O(d \log(d))$ and space $O(d)$. Line 3 takes time and space $O(dk)$. Line 4 takes time $O(dk \log(k))$ and space $O(dk)$; since each row of $\tilde{U}$ is already decreasing, we can use $k$-way merging [21] instead of naive sorting. All remaining lines require $O(dk)$ time and space. ∎
Joint has the same guarantees (Theorem 3.2, Theorem 3.3) as the exponential mechanism described at the beginning of this section, since its output distribution is identical.
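The claim that the two-stage sampler reproduces the exponential mechanism exactly can be verified by computing both distributions in closed form on a tiny instance. The sketch below does this under the assumptions used throughout this section (perturbed matrix entries $-(c_i - c_j) - (d(k-i)+j)/(2dk)$, counts via the product formula, stage-one weights $\tilde{m} \cdot \exp(\varepsilon \lceil \tilde{U} \rceil / 2)$). It is a check written for illustration with hypothetical names, not the paper's Algorithm 1, and it uses brute-force loops rather than the efficient updates.

```python
import itertools
import math


def build_u_tilde(counts, k):
    d = len(counts)
    return [[-(counts[i] - counts[j]) - (d * (k - i - 1) + j + 1) / (2.0 * d * k)
             for j in range(d)] for i in range(k)]


def two_stage_probabilities(counts, k, epsilon):
    """P[seq] implied by: sample an entry, then a uniform sequence."""
    d = len(counts)
    u_tilde = build_u_tilde(counts, k)
    entry = {}
    for i in range(k):
        for j in range(d):
            u = u_tilde[i][j]
            t = [sum(v >= u for v in row) for row in u_tilde]
            m = 1
            for r in range(k):
                if r != i:
                    m *= max(t[r] - r, 0)
            entry[(i, j)] = (m, m * math.exp(epsilon * math.ceil(u) / 2.0))
    total = sum(w for _, w in entry.values())
    probs = {}
    for seq in itertools.permutations(range(d), k):
        i = min(range(k), key=lambda r: u_tilde[r][seq[r]])  # score position
        m, w = entry[(i, seq[i])]
        probs[seq] = (w / total) / m   # uniform among the m sequences
    return probs


def naive_probabilities(counts, k, epsilon):
    """Exponential mechanism over all distinct-item sequences."""
    weights = {
        seq: math.exp(-epsilon * max(counts[i] - counts[j]
                                     for i, j in enumerate(seq)) / 2.0)
        for seq in itertools.permutations(range(len(counts)), k)
    }
    total = sum(weights.values())
    return {seq: w / total for seq, w in weights.items()}


efficient = two_stage_probabilities([5, 3, 3, 1], k=2, epsilon=1.0)
naive = naive_probabilities([5, 3, 3, 1], k=2, epsilon=1.0)
max_gap = max(abs(efficient[s] - naive[s]) for s in naive)
```

Agreement up to floating-point error on small instances is only a sanity check; the formal argument is the proof deferred to Appendix B.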
4 Experiments
Our experiments compare the peeling and joint mechanisms across several real-world datasets using the error metrics from Definition 2.2. All datasets are public, and all experiment code is available on GitHub [17]. As described in Section 1.2, we only present the best pure and approximate DP baselines. Other methods are available in the experiment code. For completeness, example error plots featuring all methods appear in Section C.3.
4.1 Comparison Methods
4.1.1 Pure DP Peeling Mechanism
We start with the pure DP variant, denoted PNFPeel. It uses $k$ $\varepsilon/k$-DP applications of the permute-and-flip mechanism, which dominates the exponential mechanism under basic composition (Theorem 2 [25]). We use the equivalent exponential noise formulation [9]
, where the exponential distribution
is defined over by(3) 
Its pseudocode appears in Algorithm 2. We omit the factor of 2 in the exponential distribution scale because the count utility function is monotonic (see Definition 2.6 and Remark 1 of McKenna and Sheldon [25]).
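A peeling loop with exponential noise might look like the sketch below. It is not the paper's Algorithm 2, which is not reproduced here. The assumptions are: counts have sensitivity 1, each of the $k$ rounds receives privacy budget $\varepsilon / k$, the factor of 2 is dropped because counts are monotonic, and each round adds fresh exponential noise with rate $\varepsilon / k$ (mean $k / \varepsilon$) to the remaining counts. The function name is hypothetical.

```python
import random


def pnf_peel(counts, k, epsilon, rng=random):
    """Peel off k items, each chosen by report-noisy-max with fresh noise."""
    rate = epsilon / k  # assumed per-round rate; noise mean is k / epsilon
    remaining = list(range(len(counts)))
    selected = []
    for _ in range(k):
        # Fresh exponential noise for every remaining item in every round.
        best = max(remaining, key=lambda j: counts[j] + rng.expovariate(rate))
        selected.append(best)
        remaining.remove(best)
    return selected


# With large gaps and a generous budget the true order is recovered
# with overwhelming probability.
easy = pnf_peel([1000, 500, 10, 1], k=2, epsilon=100.0, rng=random.Random(0))
noisy = pnf_peel([3, 3, 2, 2, 1], k=3, epsilon=0.1, rng=random.Random(0))
```

Each round scans all remaining items, which gives the $O(dk)$ overall runtime discussed in the time comparison.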
Lemma 4.1.
PNFPeel is $\varepsilon$-DP.
4.1.2 Approximate DP Peeling Mechanism
The approximate DP variant instead uses $k$ DP applications of the exponential mechanism. We do this because the exponential mechanism admits a CDP analysis that takes advantage of its bounded-range property for stronger composition; a similar analysis for permute-and-flip is not known.
We use the Gumbel-noise variant of the peeling mechanism [11]. This adds Gumbel noise to each raw count and outputs the sequence of item indices with the $k$ highest noisy counts. The Gumbel distribution is defined over $\mathbb{R}$ by
(4) 
and the resulting pseudocode appears in Algorithm 3.
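The one-shot Gumbel formulation can be sketched as below. This is not the paper's Algorithm 3. The noise `scale` is deliberately left as a parameter because its calibration to the privacy budget depends on the composition analysis, which is not reproduced here. The sampler uses the standard fact that $-b \log E$ is Gumbel-distributed with scale $b$ when $E$ is a unit-rate exponential. All names are hypothetical.

```python
import heapq
import math
import random


def gumbel_noise(scale, rng):
    """Draw Gumbel(0, scale) noise as -scale * log(E) with E ~ Exp(1)."""
    e = rng.expovariate(1.0)
    while e == 0.0:  # guard against log(0); essentially never triggers
        e = rng.expovariate(1.0)
    return -scale * math.log(e)


def gumbel_top_k(counts, k, scale, rng=random):
    """Add Gumbel noise once to every count and return the top-k indices."""
    noisy = [c + gumbel_noise(scale, rng) for c in counts]
    # nlargest returns indices ordered from highest to lowest noisy count.
    return heapq.nlargest(k, range(len(counts)), key=noisy.__getitem__)


easy = gumbel_top_k([1000, 500, 10, 1], k=2, scale=0.01, rng=random.Random(0))
noisy_pick = gumbel_top_k([3, 3, 2, 2, 1], k=3, scale=50.0, rng=random.Random(0))
```

Unlike the exponential-noise peeling loop, noise is drawn only once per item, so the cost is dominated by the top-$k$ selection rather than by $k$ passes over the data.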
By Lemma 4.2 in Durfee and Rogers [11], CDPPeel has the same output distribution as repeatedly applying the exponential mechanism and is DP. A tighter analysis is possible using CDP. While an $\varepsilon$-DP algorithm is always $\frac{\varepsilon^2}{2}$-CDP, an $\varepsilon$-DP invocation of the exponential mechanism satisfies a stronger $\frac{\varepsilon^2}{8}$-CDP guarantee (Lemmas 3.2 and 3.4 [8]). Combining this with a generic conversion from CDP to approximate DP (Proposition 1.3 [6]) yields the following privacy guarantee:
Lemma 4.2.
CDPPeel is DP for any and
All of our approximate DP guarantees for CDPPeel use Lemma 4.2.
4.2 Datasets
We use six datasets: Books [29] (11,000+ Goodreads books with review counts), Foods [24] (166,000+ Amazon foods with review counts), Games [32] (5,000+ Steam games with purchase counts), Movies [19] (62,000+ Movies with rating counts), News [15] (40,000+ Mashable articles with share counts), and Tweets [4] (52,000+ Tweets with like counts). For each dataset, it is reasonable to assume that one person contributes at most one to each count, but may also contribute to many counts. Histograms of item counts appear in Section C.1. A more relevant quantity here is the gaps between counts of the top $k$ items (Figure 1, leftmost column). As we’ll see, Joint performs best on datasets where gaps are relatively large (Books, Movies, News, and Tweets).
4.3 Results
The experiments evaluate error across the three mechanisms, six datasets, and three error metrics. For each mechanism, the center line plots the median error from 50 trials (padded by 1 to avoid discontinuities on the logarithmic axis), and the shaded region spans a percentile band around the median. We use 1-DP instances of Joint and PNFPeel and $(1, \delta)$-DP instances of CDPPeel. Due to the weakness of the relative error metric, and for the sake of space in the figure, we relegate its discussion to Section C.2.
4.3.1 $\ell_\infty$ error
Joint’s performance is strongest for $\ell_\infty$ error (Figure 1, center column). This effect is particularly pronounced on the Books, Movies, News, and Tweets datasets. This is because these datasets have large gaps between the top $k$ counts (Figure 1, leftmost column), which results in large gaps between the scores that Joint assigns to optimal and suboptimal sequences. These large gaps enable Joint to obtain much stronger performance than the baseline pure DP algorithm, PNFPeel, and to beat even the approximate DP CDPPeel for a wide range of $k$. In contrast, small gaps reduce this effect on Foods and Games. On these datasets, Joint slightly improves on PNFPeel overall, and only improves on CDPPeel for smaller values of $k$.



The $\ell_\infty$ metric also features plateaus in Joint’s error on Foods and Games. This is because Joint’s error is gap-dependent while PNFPeel and CDPPeel’s errors are more $k$-dependent: as $k$ grows, Joint’s maximum error may change as it ranks more items, but the item index where that error occurs changes monotonically. The reason is that Joint’s error ultimately depends on the count gaps under consideration. In contrast, the item index where PNFPeel and CDPPeel incur maximum error may increase and then decrease. This is because PNFPeel and (to a lesser extent) CDPPeel must divide their privacy budget across the $k$ rounds, and thus are increasingly likely to err on (and incur the larger penalties for) top items as $k$ becomes large. Figure 2 plots the maximum error item index and illustrates this effect.
4.3.2 $\ell_1$ error
A similar trend holds for $\ell_1$ error (Figure 1, rightmost column). Joint again largely obtains the best performance for the Books, Movies, News, and Tweets datasets, with relatively worse error on Foods and Games. $\ell_1$ error is a slightly more awkward fit for Joint because Joint’s utility function relies on maximum count differences; Joint thus applies the same score to sequences where a single item count has error $c$ and sequences where every item count has error $c$. This means that Joint selects sequences that have relatively low maximum (and $\ell_\infty$) error but may have high $\ell_1$ error. Nonetheless, we again see that Joint always obtains the strongest performance for small $k$; it matches PNFPeel for small datasets and outperforms it for large ones; and it often outperforms CDPPeel, particularly for large datasets and moderate $k$.
4.3.3 Time comparison
We conclude with a time comparison using the largest dataset (Foods, with 166,000+ items) and 5 trials for each $k$. PNFPeel uses $k$ instances of the permute-and-flip mechanism for an overall runtime of $O(dk)$. CDPPeel’s runtime is dominated by finding the top $k$ values from a set of $d$ unordered values, which can be done in time $O(d + k \log(k))$. As seen in Figure 3, and as expected from their asymptotic runtimes, Joint is slower than PNFPeel, and PNFPeel is slower than CDPPeel. Nonetheless, Joint still primarily runs in seconds or, for the largest $k$, slightly over 1 minute.


5 Conclusion
We defined a joint exponential mechanism for the problem of differentially private top-$k$ selection and derived an algorithm for efficiently sampling from its distribution. We provided code and experiments demonstrating that our approach almost always improves on existing pure DP methods and often improves on existing approximate DP methods when $k$ is not large. We focused on the standard setting where an individual user can contribute to all item counts. However, if users are restricted to contributing to a single item, then algorithms that modify item counts via Laplace noise [10, 28] are superior to Joint and peeling mechanisms. The best approach for the case where users can contribute to some number of items larger than 1 but less than $d$ is potentially a topic for future work.
Acknowledgements
We thank Ryan Rogers for helpful discussion of the peeling mechanism.
References
 [1] (2020) Near instanceoptimality in differential privacy. arXiv preprint arXiv:2005.10630. Cited by: footnote 1.
 [2] (2017) The price of selection in differential privacy. In Conference on Learning Theory (COLT), Cited by: §1.2, §2.
 [3] (2010) Discovering frequent patterns in sensitive data. In Knowledge Discovery and Data Mining (KDD), Cited by: §1.2, §1.
 [4] (2017) Tweets Dataset  Top 20 most followed users in Twitter social platform. Harvard Dataverse. Note: https://doi.org/10.7910/DVN/JBXKFD/F4FULO Cited by: §4.2.
 [5] (2016) Differentially Private Password Frequency Lists. In Network and Distributed System Security (NDSS), Cited by: §1.1.
 [6] (2016) Concentrated differential privacy: Simplifications, extensions, and lower bounds. In Theory of Cryptography Conference (TCC), Cited by: §1, §4.1.2.

 [7] (2014) Fingerprinting codes and the price of approximate differential privacy. In Symposium on the Theory of Computing (STOC), Cited by: §1.2.
 [8] (2021) Bounding, Concentrating, and Truncating: Unifying Privacy Loss Composition for Data Analytics. In Algorithmic Learning Theory (ALT), Cited by: §4.1.2.
 [9] (2021) The Permute-and-Flip Mechanism is Identical to Report-Noisy-Max with Exponential Noise. arXiv preprint arXiv:2105.07260. Cited by: §4.1.1.
 [10] (2019) Free gap information from the differentially private sparse vector and noisy max mechanisms. In Very Large Databases (VLDB), Cited by: §C.3, §1.2, §5.
 [11] (2019) Practical Differentially Private Top-k Selection with Pay-what-you-get Composition. In Advances in Neural Information Processing Systems (NeurIPS), Cited by: §1.2, §1.2, §1, §2, §4.1.2, §4.1.2.
 [12] (2006) Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography Conference (TCC), Cited by: §1, Definition 2.3.
 [13] (2014) The algorithmic foundations of differential privacy.. Foundations and Trends in Theoretical Computer Science. Cited by: Lemma A.1, Definition 2.4.
 [14] (2016) Concentrated differential privacy. arXiv preprint arXiv:1603.01887. Cited by: §1.

 [15] (2015) A proactive intelligent decision support system for predicting the popularity of online news. In Portuguese Conference on Artificial Intelligence, Cited by: §4.2.
 [16] (2021) Differentially Private Quantiles. In International Conference on Machine Learning (ICML), Cited by: §1.1.
 [17] (2022) Dp_topk. Note: https://github.com/google-research/google-research/tree/master/dp_topk Cited by: §1.2, §4.
 [18] (2010) On the geometry of differential privacy. In Symposium on the Theory of Computing (STOC), Cited by: §1.2.
 [19] (2015) The MovieLens Datasets: History and Context. Transactions on Interactive Intelligent Systems (TiiS). Cited by: §4.2.
 [20] (2013) Privacypreserving data exploration in genomewide association studies. In Knowledge Discovery and Data Mining (KDD), Cited by: footnote 1.
 [21] (1997) The art of computer programming. Vol. 3, pp. 252–255. Cited by: Appendix B, §3.1.4.
 [22] (2014) Topk frequent itemsets via differentially private fptrees. In Knowledge Discovery and Data Mining (KDD), Cited by: §1.2.
 [23] (2012) PrivBasis: Frequent Itemset Mining with Differential Privacy. In Very Large Databases (VLDB), Cited by: §1.2.
 [24] (2014) Amazon product data, Grocery and Gourmet Food. Note: https://jmcauley.ucsd.edu/data/amazon/ Cited by: §4.2.
 [25] (2020) PermuteandFlip: A new mechanism for differentially private selection. In Neural Information Processing Systems (NeurIPS), Cited by: §1.2, §1, §4.1.1.
 [26] (2007) Mechanism design via differential privacy. In Foundations of Computer Science (FOCS), Cited by: Lemma A.1, Definition 2.4, Lemma 2.5, Lemma 2.7.
 [27] (2020) Duff: A DatasetDistanceBased Utility Function Family for the Exponential Mechanism. arXiv preprint arXiv:2010.04235. Cited by: §3.1.2, footnote 1.
 [28] (2021) Oneshot Differentially Private Topk Selection. In International Conference on Machine Learning (ICML), Cited by: §C.3, §1.2, §5.
 [29] (2019) Goodreadsbooks dataset. Note: https://www.kaggle.com/jealousleopard/goodreadsbooksAccessed: 20201227 Cited by: §4.2.
 [30] (2015) Between pure and approximate differential privacy. Journal of Privacy and Confidentiality (JPC). Cited by: §C.3, §1.1, §1.2.
 [31] (2017) Tight lower bounds for differentially private selection. In Foundations of Computer Science (FOCS), Cited by: §1.2.
 [32] (2016) Steam video games dataset. Note: https://www.kaggle.com/tamber/steamvideogames/dataAccessed: 20211123 Cited by: §4.2.
 [33] (2012) On differentially private frequent itemset mining. Very Large Databases (VLDB). Cited by: §1.2.
Appendix A Proof of Joint Utility Guarantee (Theorem 3.3)
Proof of Theorem 3.3.
The following is a basic utility guarantee for the exponential mechanism.
Lemma A.1 (McSherry and Talwar [26], Dwork and Roth [13]).
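Let $u$ be the utility value produced by an instance of the exponential mechanism with score function $u$, output space $\mathcal{O}$, dataset $D$, and optimal utility value $OPT$. Then, for any $t > 0$,
$$\mathbb{P}\left[ u \leq OPT - \frac{2 \Delta_u}{\varepsilon} \left( \ln |\mathcal{O}| + t \right) \right] \leq e^{-t}.$$
Taking $t = 5$, and using the facts that $|\mathcal{O}| = d^k$ and $\Delta_{u^*} = 1$ for Joint’s utility function $u^*$, completes the result. ∎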
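To get a feel for the size of this guarantee, the sketch below evaluates the generic exponential mechanism bound $\frac{2\Delta}{\varepsilon}(\ln|\mathcal{O}| + t)$ with $|\mathcal{O}| = d^k$ and $\Delta = 1$, together with the failure probability $e^{-t}$. The function name and the example parameter values are hypothetical, and the choice $t = 5$ is an assumption about the constant used in the proof.

```python
import math


def joint_error_bound(d, k, epsilon, t=5.0):
    """Return (error bound, failure probability) for the generic guarantee."""
    log_outputs = k * math.log(d)           # ln(d^k), avoiding a huge power
    bound = 2.0 * (log_outputs + t) / epsilon
    return bound, math.exp(-t)


bound, failure = joint_error_bound(d=1000, k=10, epsilon=1.0)
halved_budget, _ = joint_error_bound(d=1000, k=10, epsilon=0.5)
```

The bound grows linearly in $k$ and only logarithmically in $d$, and halving $\varepsilon$ doubles it; as the main text notes, such worst-case bounds can differ markedly from observed error.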
Appendix B Full Privacy, Runtime, and Storage Space Proof For Joint (Theorem 3.9)
Proof of Theorem 3.9.
Recall that Joint refers to the algorithm that uses the efficient sampling mechanism. Here, we first prove that Joint samples a sequence from the exponential mechanism with utility $u^*$.
Let EM refer to the naive original construction of the exponential mechanism with utility . It suffices to show that Joint and EM have identical output distributions. Fix some sequence of indices from .
If are not distinct, then Joint never outputs . This agrees with the original definition of the exponential mechanism with utility function , which assigns score to any sequence of item indices with repetitions. Thus, for any with nondistinct elements, .
If instead are distinct, by Lemma 3.7, . Let be its score in , so . Let denote the set of possible values for ; note that this a set of integers and does not have repeated elements. Then
by Lemma 3.7. Then we continue the chain of equalities as
where the second equality uses Lemma 3.5.
Having established the privacy of Joint, we now turn to proving that its runtime and storage space costs are $O(dk \log(k) + d \log(d))$ and $O(dk)$, respectively.
Referring to Algorithm 1, line 2 takes time $O(d \log(d))$ and space $O(d)$. Line 3 takes time and space $O(dk)$. Line 4 takes time $O(dk \log(k))$ and space $O(dk)$; since each row of $\tilde{U}$ is already decreasing, we can use $k$-way merging [21] instead of naive sorting.
The loop on Line 8 handles the that are zero. Its variable setup on Lines 57 takes time and space . Lines internal to the loop each take time and space. So, overall, this block of code requires time and space .
The loop on Line 15 handles the nonzero . Its variable setup on Lines 1314 takes time and space . Lines internal to the loop each take time and space. So, overall, this block of code requires time and space .
Sampling a utility (Line 20) requires time and space . The remaining loop (Line 22) iterates for steps, and each step requires time and space.
Overall, this yields runtime and storage space costs of $O(dk \log(k) + d \log(d))$ and $O(dk)$, respectively. ∎