Assessing Centrality Without Knowing Connections

05/28/2020 ∙ by Leyla Roohi, et al. ∙ The University of Melbourne

We consider the privacy-preserving computation of node influence in distributed social networks, as measured by egocentric betweenness centrality (EBC). Motivated by modern communication networks spanning multiple providers, we show for the first time how multiple mutually-distrusting parties can successfully compute node EBC while revealing only differentially-private information about their internal network connections. A theoretical utility analysis upper bounds a primary source of private EBC error—private release of ego networks—with high probability. Empirical results demonstrate practical applicability with a low 1.07 relative error achievable at strong privacy budget ϵ=0.1 on a Facebook graph, and insignificant performance degradation as the number of network provider parties grows.


1 Introduction

This paper concerns the measurement of node importance in communication networks with egocentric betweenness centrality (EBC) [7], representing how much a node’s neighbours depend on the node for inter-connection. EBC has emerged as a popular centrality measure that is used widely in practice [12].

EBC computation has many applications. In conjunction with methods for identifying fake news [11, 15], EBC can be used to limit its propagation by targeting interventions at those individuals who are most critical in spreading information. EBC computation is straightforward when all communication network information is available to one trusted party. In reality, however, modern telecommunications involve competing network providers, even within a single country. Moreover, many people communicate between countries with completely different networks, where no central authority can view all their connections. While recent work [18] considered the case of two mutually-distrusting networks, handling multiple networks is essential for understanding one person's communication and presents non-trivial technical challenges.

Here we present a protocol that preserves the privacy of the internal connections of each of arbitrarily-many networks while they collaborate on the computation of EBC. By carefully structuring information flow, we achieve highly accurate results and strong privacy protection. We produce a private output that can be safely published. We assume the complete list of nodes (i.e., people) is public, while individual connections are private. Each service provider knows the connections within its own network, plus the connections between one of its members and the outside (e.g., from when they contact someone in a different network). Connections internal to other networks are unknown. We prove that our protocol preserves edge differential privacy [8] in which the existence or non-existence of an edge must be protected. We present:

  1. A protocol for multi-party private EBC computation;

  2. A strengthened adversarial model in comparison to prior work: all participating networks are protected by edge-DP, even when the final output is published;

  3. A high-probability utility bound on a core component of our protocol: private distributed release of ego networks;

  4. Comprehensive empirical validation on open Facebook, Enron and PGP datasets. This demonstrates applicability of our algorithm with modest 1.07 relative error at strong DP, practical runtimes, and insignificant degradation with increasingly many parties.

Near-constant accuracy with increasing numbers of parties is both surprising and significant, as is our innovation over past work [18] by preventing leakage at final EBC release. Our protocol is substantially more efficient than a naïve extension of two-party techniques, which would require total communication of where is the set of vertices and the parties. We achieve total communication of . All participants are equal—there is no centralisation; and all parties are protected by edge DP. While we reuse the [18] subset release two-stage sampler, we offer new analysis. We prove in Proposition 1 that it can be distributed without privacy/accuracy loss, and establish a new high-probability utility bound on EBC based on this mechanism’s release (Theorem 6.1).

2 Related Work

In most prior work on differentially-private graph processing, the computation is performed by a trusted authority with complete knowledge of the whole network. Only the output, not the intermediate computations, must preserve privacy [8, 1, 9, 20, 3, 16, 19, 10, 17]. There is considerable work on distributed differential privacy [4, 2, 14], where queries are distributed or decomposed into sub-queries. However there is far less work in our distributed privacy model, in which even the intermediate communication should preserve differential privacy. This privacy model is mostly related to distributed graphs where parties seek joint computation of statistics of the graph. The most closely related work is [18], which derives an edge-DP algorithm for EBC, but only for two networks. The algorithm allows two mutually-distrusting parties to jointly compute the EBC privately using the exponential and Laplace mechanisms to maintain differentially-private intermediate computations and communications. However their work assumes the first party acts as a centraliser that does not share the final result. It is thus not directly applicable to our setting.

3 Preliminaries

3.1 Egocentric Betweenness Centrality

Egocentric betweenness centrality was proposed by [6] as a way of measuring a node's importance by looking only at its ego network, the set of nodes directly connected to it. To compute the EBC of node $a$, we count, for each pair of $a$-neighbouring nodes $i, j$, what fraction of shortest paths between them pass through $a$. Since we consider only paths within $a$'s ego network, only paths of length two are relevant. We count zero for any pair that is directly connected, because these two nodes do not rely on $a$ at all.

Definition 1 ([6])

Egocentric betweenness centrality of node $a$ in simple undirected graph $(V, E)$ is defined as

$$\mathrm{EBC}(a) = \sum_{i, j \in N_a :\ A_{ij} = 0,\ j > i} \frac{1}{A^2_{ij}},$$

where $N_a$ denotes the neighbourhood or ego network of $a$, $A$ denotes the adjacency matrix induced by $N_a \cup \{a\}$ with $A_{ij} = 1$ if $\{i, j\} \in E$ and $0$ otherwise; $A^2_{ij}$ denotes the $ij$-th entry of the matrix square, guaranteed positive for all $i, j \in N_a$ since all such nodes are connected through $a$.
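
As a concrete illustration of Definition 1, the following minimal sketch computes non-private EBC directly from an adjacency structure. The function name `egocentric_betweenness` and the dictionary-of-sets graph representation are assumptions made for this example, not part of the paper.

```python
from itertools import combinations


def egocentric_betweenness(adj, ego):
    """Non-private EBC of `ego`.

    adj: dict mapping each node to the set of its neighbours
         (simple undirected graph, an assumed representation).
    """
    neighbours = adj[ego]
    members = neighbours | {ego}  # nodes of the induced ego subgraph
    total = 0.0
    for i, j in combinations(sorted(neighbours), 2):
        if j in adj[i]:
            continue  # directly connected pairs contribute zero
        # Entry (i, j) of the squared adjacency matrix: number of
        # 2-paths i-k-j whose midpoint k lies in the ego subgraph.
        two_paths = sum(1 for k in members if k in adj[i] and k in adj[j])
        total += 1.0 / two_paths  # at least 1, via the ego itself
    return total


# Star graph: every pair of neighbours relies entirely on the ego.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
# Ego 0 with neighbours 1, 2, 3 and extra edges 1-2 and 2-3.
chain = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2}}
print(egocentric_betweenness(star, 0), egocentric_betweenness(chain, 0))
```

In the second graph only the pair (1, 3) is disconnected, and it has two 2-paths (through the ego and through node 2), so the ego receives half of the credit for that pair.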

3.2 Differential Privacy on Graphs

We wish to protect the privacy of connections in networks, which are the edges in the network graph. Differential privacy (DP) [5] limits the differences in an algorithm’s output distribution on neighbouring databases, thus quantifying the information leakage about the presence or absence of a particular record. We use edge privacy [8], because we wish to control the attacker’s ability to make inferences about the presence of individual edges. As graphs on identical node sets, two databases are neighbours if they differ by exactly one edge.

Our databases are simply adjacency matrices in which an element $A_{ij}$ is 1 if there is an edge from $i$ to $j$ and zero otherwise. Equivalently, these can be considered as sequences of bits: elements of $\{0,1\}^n$ (where $n$ is the number of nodes in the network choose two). Formally, two databases $D, D' \in \{0,1\}^n$ are termed neighbouring (denoted $D \sim D'$) if there exists exactly one $i$ such that $D_i \neq D'_i$ and $D_j = D'_j$ for all $j \neq i$. In other words, $\|D - D'\|_1 = 1$.
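
The bit-vector view of a graph database and the neighbouring relation can be sketched as follows. The helper names `edge_bits` and `are_neighbours` are hypothetical and chosen only for this illustration.

```python
from itertools import combinations


def edge_bits(nodes, edges):
    """Flatten an undirected edge set into a 0/1 tuple with one
    position per unordered node pair (in sorted pair order)."""
    edge_set = {frozenset(e) for e in edges}
    pairs = combinations(sorted(nodes), 2)
    return tuple(int(frozenset(p) in edge_set) for p in pairs)


def are_neighbours(d1, d2):
    """True when the two databases differ in exactly one edge bit."""
    return len(d1) == len(d2) and sum(a != b for a, b in zip(d1, d2)) == 1


nodes = [1, 2, 3]
g1 = edge_bits(nodes, [(1, 2)])          # pairs (1,2), (1,3), (2,3)
g2 = edge_bits(nodes, [(1, 2), (2, 3)])  # one extra edge
print(g1, g2, are_neighbours(g1, g2))
```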

Definition 2

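For $\epsilon > 0$, a randomised algorithm $M$ on databases, or mechanism, is said to preserve $\epsilon$-differential privacy if for any two neighbouring databases $D \sim D'$, and for any measurable set $R$ of outputs,

$$\Pr[M(D) \in R] \le \exp(\epsilon) \cdot \Pr[M(D') \in R].$$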

In this paper, we employ several generic mechanisms from the differential privacy literature, including the Laplace mechanism [5] for releasing numeric vectors and the exponential mechanism [13] for private optimisation. We also use a recent subset release mechanism [18], which leverages the exponential mechanism, and develop its theoretical utility analysis.
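
For readers unfamiliar with the Laplace mechanism, here is a minimal sketch of the standard construction: independent Laplace noise with scale equal to the L1 sensitivity divided by the privacy budget is added to each coordinate. The function name and signature are assumptions for this example; it uses only the Python standard library and relies on the fact that the difference of two independent unit exponentials is a standard Laplace variable.

```python
import random


def laplace_mechanism(values, sensitivity, epsilon, rng=None):
    """Release `values` with Laplace noise of scale sensitivity / epsilon.

    `sensitivity` is the L1 global sensitivity of the query producing
    `values`; it must be supplied by the caller.
    """
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    noisy = []
    for v in values:
        # Exp(1) - Exp(1) is distributed as Laplace(0, 1).
        noise = rng.expovariate(1.0) - rng.expovariate(1.0)
        noisy.append(v + scale * noise)
    return noisy


counts = [3.0, 0.0, 5.0]
print(laplace_mechanism(counts, sensitivity=1.0, epsilon=0.5,
                        rng=random.Random(0)))
```

Smaller budgets give larger noise scales, which is the privacy and utility trade-off examined in the experiments.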

Lemma 1 ([18])

Consider a publicly-known set and a privacy-sensitive subset . The exponential mechanism run with quality function and preserves -DP. Algorithm 4 (see Appendix) implements this mechanism, running in space and time.
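
To make the idea of subset release via the exponential mechanism concrete, the sketch below enumerates every candidate subset of a tiny public set and weights it by the exponential of its quality. This is a brute-force illustration only, not the efficient two-stage sampler of Algorithm 4. The quality function used here (the number of public elements whose membership agrees between candidate and private subset, with sensitivity 1) is an assumption for the example, since the exact quality function of [18] is not reproduced in this text.

```python
import math
from itertools import combinations


def all_subsets(public):
    items = sorted(public)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def agreement_quality(candidate, private, public):
    # ASSUMED quality: elements on which candidate and private agree.
    return len(public) - len(candidate ^ private)


def subset_release_distribution(public, private, epsilon, sensitivity=1.0):
    """Exact exponential-mechanism distribution over all subsets."""
    weights = {
        s: math.exp(epsilon * agreement_quality(s, private, public)
                    / (2.0 * sensitivity))
        for s in all_subsets(public)
    }
    z = sum(weights.values())
    return {s: w / z for s, w in weights.items()}


public = frozenset({1, 2, 3})
private = frozenset({1, 2})
dist = subset_release_distribution(public, private, epsilon=1.0)
best = max(dist, key=dist.get)
print(sorted(best), round(dist[best], 4))
```

The true subset is the single most likely release, and every other subset retains non-zero probability, which is what provides deniability for each membership bit.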

4 Problem Statement

We have participating parties , each representing a telecommunications service provider. They control a global communication graph whose nodes are partitioned into (disjoint) sets one per service provider s.t. contains the nodes of party . Every customer is represented as a node that belongs to one and only one service provider; pairs of customers who have had some communication (e.g., a phone call, SMS or email) are edges.

We will often equivalently represent edge sets as adjacency matrices (or flattened vectors) with elements in .

We write for the set of edges in between nodes in —these are communications that happened entirely within . Similarly, are the edges with one node in and the other in —these represent communications between two service providers. Set is the disjoint union of all such edge sets.

We assume that all nodes are known to all parties, but that each party learns only about the edges that are incident to a node in its network, including edges within its network.
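
The knowledge assumption above can be sketched as a simple filter: a party sees an edge exactly when at least one endpoint belongs to it. The names `party_view` and `owner` are hypothetical and used only for illustration.

```python
def party_view(edges, owner, party):
    """Edges visible to `party`: those with at least one endpoint
    among the nodes it owns (internal and cross-network edges)."""
    return {
        frozenset(e) for e in edges
        if owner[e[0]] == party or owner[e[1]] == party
    }


owner = {1: "A", 2: "A", 3: "B", 4: "B"}
edges = [(1, 2), (2, 3), (3, 4)]
view_a = party_view(edges, owner, "A")
view_b = party_view(edges, owner, "B")
print(sorted(map(sorted, view_a)), sorted(map(sorted, view_b)))
```

The cross-network edge (2, 3) appears in both views, while each internal edge is hidden from the other party.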

We wish to enable all parties to learn and publicly release the EBC of any chosen node , while maintaining edge privacy between all parties. Without loss of generality we assume . We also denote by . Before detailing a protocol for accomplishing this task, we must be precise about a privacy model.

Problem 1 (Private Multi-Party EBC)

Consider a simple undirected graph(, ) partitioned by parties as above, and an arbitrary node . The problem of private multi-party egocentric betweenness centrality is for the parties to collaboratively approximate under assumptions that:

  1. All parties know the entire node set ;

  2. Each party knows every edge incident to nodes within its own network, i.e.

  3. The computed approximate needs to be available to all of parties.

The intermediate computation must protect -differential edge privacy of each party from the others. We seek solutions under a fully adversarial privacy model: irrespective of whether other parties follow the protocol, the releases by party protect its edge differential privacy. (Of course a cheating participant can always release information about edges it already knows, which may join another network.)

Furthermore, the output must protect -differential privacy of the edges. In [18] the final EBC could be revealed only to the party who made the query. In this paper, the final EBC is -differentially private and can be released safely to anyone.

5 Multi-Party Private EBC

We describe three algorithms: SubsetRelease, PrivatePathCount, and PrivateReciprocateandSum, which are privacy-preserving versions of Steps i to iii of Protocol 0.A.1 (see Appendix). These then combine to produce PrivateEBC, a differentially-private version of the whole protocol.

5.1 Private Ego Network Broadcast

Each party runs SubsetRelease, Algorithm 4 (see Appendix) with its share of ’s ego network. It broadcasts the output —the approximation of .

SubsetRelease uses the exponential mechanism to privately optimise a particular quality function (Lemma 1) that encourages a large intersection between and release , along with a minimal symmetric set difference. As each party runs this mechanism relative to its own node set, it operates its own quality function defined relative to (see Proposition 1 for the formal definition). We observe a convenient property of the quality functions run by each party: they sum up to the overall quality function that the ego party would use if it were to run SubsetRelease in totality. This permits a proof (see Appendix 0.C) that this simple distributed protocol for private ego network approximation exactly implements a centralised approximation. There is no loss to privacy or accuracy due to decentralisation.

Proposition 1

Consider parties running SubsetRelease with identical budgets and quality functions , on their disjoint shares to produce disjoint private responses . Then is distributed as SubsetRelease run with , quality function , on the combined in . Consequently the individual and the combined , each preserve -DP simultaneously.
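
Proposition 1 can be checked numerically on a toy instance. The sketch below assumes an additive quality function (the count of elements whose membership agrees), which is an assumption for this illustration rather than the exact function of [18]. Because that quality sums over disjoint shares, the product of the per-party release probabilities equals the centralised release probability.

```python
import math
from itertools import combinations


def subsets(items):
    items = sorted(items)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def quality(candidate, private, public):
    # ASSUMED additive quality: number of agreeing membership bits.
    return len(public) - len(candidate ^ private)


def release_probs(public, private, eps):
    w = {s: math.exp(eps * quality(s, private, public) / 2.0)
         for s in subsets(public)}
    z = sum(w.values())
    return {s: v / z for s, v in w.items()}


shares = [frozenset({1, 2}), frozenset({3, 4, 5})]  # disjoint node sets
ego_net = frozenset({1, 3, 4})                      # true ego network
eps = 0.7

central = release_probs(frozenset().union(*shares), ego_net, eps)
per_party = [release_probs(s, ego_net & s, eps) for s in shares]

# Largest gap between centralised and product-of-parties probabilities.
max_gap = max(
    abs(central[a | b] - per_party[0][a] * per_party[1][b])
    for a in per_party[0] for b in per_party[1]
)
print(max_gap)
```

The gap is zero up to floating-point rounding, matching the claim that decentralisation costs neither privacy nor accuracy.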

5.2 Private Path Count

Input: ego node (remember, by assumption, contains ); execution party ; true node set ; for each , edge set and private node set ;
Output: A vector of noisy counts, indexed by endpoints with , of the total number of nodes in that are connected to both and .
1:  if  then
2:     
3:  end if
4:  
5:  for  do
6:     for  with  do
7:        
8:        
9:     end for
10:  end for
11:  return  
Algorithm 1 PrivatePathCount

Each party runs Algorithm 1, using the ’s received from each other party in the previous step. Party counts all the 2-paths where the intermediate node is in . For each node pair with , will send the 2-path count to the party that contains node , just like the non-private version of the protocol. But first, in order to privatise this vector of counts, Laplace noise is added to the two-path counts according to the sensitivity in the following lemma proved in Appendix 0.D, thereby preserving -DP in this stage’s release.
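
The counting and noising steps described above can be sketched as follows. This is a simplified illustration, not Algorithm 1 itself: the function names are hypothetical, the `intermediates` argument stands for the executing party's own nodes that lie in the (released) ego network plus the ego where applicable, and the sensitivity is left as a caller-supplied parameter because the bound of Lemma 2 is not reproduced here.

```python
import random
from itertools import combinations


def two_path_counts(adj, ego_network, intermediates):
    """For each pair i < j in the ego network, count midpoints k in
    `intermediates` that are adjacent to both i and j."""
    counts = {}
    for i, j in combinations(sorted(ego_network), 2):
        counts[(i, j)] = sum(
            1 for k in intermediates if k in adj[i] and k in adj[j]
        )
    return counts


def add_laplace(counts, sensitivity, epsilon, rng):
    """Privatise the whole count vector with Laplace noise."""
    scale = sensitivity / epsilon
    return {
        pair: c + scale * (rng.expovariate(1.0) - rng.expovariate(1.0))
        for pair, c in counts.items()
    }


# Ego 0 with neighbours 1, 2, 3; party A owns {0, 1}, party B owns {2, 3}.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2}}
exact_a = two_path_counts(adj, {1, 2, 3}, {0, 1})
exact_b = two_path_counts(adj, {1, 2, 3}, {2, 3})
noisy_a = add_laplace(exact_a, sensitivity=1.0, epsilon=1.0,
                      rng=random.Random(0))
print(exact_a, exact_b)
```

Each party only inspects edges incident to its own intermediate nodes, which is exactly the information it is assumed to hold.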

Input: ego node ; execution party ; for each , edge set and private node set ; for each , noisy counts ;
1:  
2:  
3:  
4:  for  do
5:     for  with  do
6:        if  then
7:           
8:           
9:        end if
10:     end for
11:  end for
12:  
13:  return  
Algorithm 2 PrivateReciprocateandSum

Input (public): ego node ; ordered set of parties ; node sets for ; parameter vectors .
Input (private): for each , edges , nodes ;
1:  for  in parallel do
2:     Party does:
3:     if  then
4:        
5:     else
6:        
7:     end if
8:     
9:     Broadcast
10:     
11:     for all  with  do
12:        Send to the Party s.t.
13:     end for
14:     . {Party reciprocates and sums only paths with .}
15:     Broadcast
16:     
17:     Return
18:  end for
Algorithm 3 PrivateEBC

Lemma 2

Let query denote the vector-valued non-private response of party in Algorithm 1. The -global sensitivity of is upper-bounded by .

5.3 Private Reciprocate and Sum

Every party receives noisy counts from PrivatePathCount and, for any pairs where and that are believed by to be disconnected, increments the received by the number of incident 2-paths. Each party then reciprocates the summation of the counts. In this algorithm, each party may replace noisy with true . This optimises utility at no cost to privacy: counts for are discarded. This is safe to do, since the Laplace mechanism already accounts for changes in . Laplace noise is used to privatise the reciprocated sum to -DP, calibrated by the sensitivity bounded in the next lemma, whose proof can be found in Appendix 0.E.

Lemma 3

Let query denote the reciprocate and sum over 2-paths with intermediate point in while the nodes are not connected and and . Then the -global sensitivity of is upper-bounded by irrespective of party.
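
A rough sketch of the reciprocate-and-sum step follows. Several details are assumptions made for illustration: pairs are keyed as (i, j) with i < j and handled by the party owning i, noisy totals are clipped below at 1 (a 2-path through the ego always exists) before taking reciprocals, and the noise uses sensitivity 1 as stated in the proof in Appendix 0.E. The function names are hypothetical.

```python
import random


def reciprocate_and_sum(total_counts, adj, my_nodes):
    """Sum 1/count over pairs (i, j) with i owned by this party and
    i, j believed disconnected."""
    total = 0.0
    for (i, j), count in total_counts.items():
        if i in my_nodes and j not in adj[i]:
            total += 1.0 / max(count, 1.0)  # ASSUMED clipping at 1
    return total


def privatise(value, epsilon, rng, sensitivity=1.0):
    scale = sensitivity / epsilon
    return value + scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2}}
totals = {(1, 2): 2.0, (1, 3): 2.0, (2, 3): 1.0}  # summed 2-path counts
share_a = reciprocate_and_sum(totals, adj, my_nodes={1})
share_b = reciprocate_and_sum(totals, adj, my_nodes={2, 3})
print(share_a, share_b, privatise(share_a, 1.0, random.Random(0)))
```

Only the disconnected pair (1, 3) contributes, so the first party's exact share is 0.5 and the second party's is 0.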

Communication complexity

Ego Network Broadcast requires each party to send to each other party bits of length 1 that shows the node is present or not, hence a total of . Private Path Count sends, for each node , up to messages from each party to the owner of node . The pathcounts are at most , so the total size is . Finally, Reciprocate and Sum requires every participant to send each other one message: . Hence the total communication complexity is .

5.4 PrivateEBC: Putting it All Together

After the parties have run the protocol phases, namely SubsetRelease, PrivatePathCount and PrivateReciprocateandSum, they must finally complete the computation of the private EBC. Algorithm 3 depicts PrivateEBC orchestrating the high-level protocol thus far, and then adding the received to compute final EBC.

Theorem 5.1

PrivateEBC preserves -DP for each party.

Remark 1

While we have used uniform privacy budgets across parties, our analysis immediately extends to custom party budgets.

Figure 3: Utility of Private EBC for the Facebook, Enron and PGP data sets. (a) Median relative error over 60 random nodes for privacy budgets from 0.1 to 7, on the Facebook, Enron and PGP data sets, for three parties. (b) Median relative error over 120 nodes at a fixed privacy budget, for different numbers of parties on the PGP data set.

6 Utility Bound

In this section we develop a utility analysis of privacy-preserving betweenness centrality, noting that no previous theoretical analysis has been performed, even in the two-party case [18]. Our analysis focuses on a utility bound on EBC resulting from the subset release mechanism run to privatise the ego network. We abuse notation, with referring to the quality function of the SubsetRelease mechanism of Lemma 1 and its dependence on the private made implicit; likewise for the quality functions run by each party in the decentralised setting. The technical challenge is in leveraging the following well-known utility bound on the exponential mechanism, which only establishes high-probability near-optimal quality.

Corollary 1

Consider parties each running SubsetRelease concurrently with budgets and quality functions on their disjoint shares to produce disjoint responses . Then the consequent high-probability quality bound of Lemma 4 (see Appendix) holds for random combined response .

Our first step is to relate EBC error on a released to the quality . We organise differences in EBC by reciprocal 2-path count terms, enumerating shared and unshared such terms between private and non-private EBCs. As these terms and their differences are bounded by one, the task reduces to measuring differences in ego network cardinalities, Lemma 5 (see Appendix). Conveniently, this is also the goal of our quality score function. We now use Lemma 5 with our lifted exponential mechanism utility bound, Corollary 1, to bound EBC error. Lemma 5 is agnostic to the number of parties: the released might be produced in its entirety by the ego node's party through a single call to SubsetRelease, or it could be the disjoint union of multiple calls to SubsetRelease by each party. Likewise our lifted bound on high-probability quality holds as if is produced centrally. As such, the proof of the following high-probability utility bound, found in Appendix 0.I, may proceed as if this is the case.

Theorem 6.1

Consider privacy budget , true ego network and . And suppose that each party runs SubsetRelease with budget , quality function , on their disjoint share to produce disjoint private response . Then produced from incurs error relative to non-private EBC run on non-private , upper bounded as with probability at least:

Remark 2

The bound of Theorem 6.1 can make meaningful predictions (i.e., is non-vacuous). For example a modest privacy budget of 2.1 is sufficient to guarantee reasonable relative error 3 w.h.p 0.999 for a large ego network spanning half an (otherwise sparse) graph. Similar relative error (for end-to-end private EBC) at similar privacy budgets occurs in experiments on real, non-sparse networks below. Further analysis can be found in Appendix 0.J.

Figure 10: Timing results and effect of degree for the Facebook, Enron and PGP data sets. (a) Time of computing 60 random nodes for privacy budgets from 0.1 to 7, Facebook data set, for three parties. (b) As (a), for the Enron data set. (c) As (a), for the PGP data set. (d) Relative error of 60 nodes with different degrees at privacy budget 1, Facebook data set. (e) As (d), for the Enron data set. (f) As (d), for the PGP data set.

7 Experimental Setup

In order to validate the utility and privacy of PrivateEBC, we experimented with three different graphs: Facebook friendships with 63,731 nodes and 817,035 edges (from the Koblenz network collection, Institute of Web Science and Technologies at the University of Koblenz–Landau, 2018), the Enron email network with 36,692 nodes and 183,831 edges (from the Stanford large network dataset collection, Stanford University, 2009), and Pretty Good Privacy (PGP) with 10,680 nodes and 24,316 edges (also from the Koblenz network collection). We employ uniform random sampling in order to partition the graphs into multiple disjoint parties while keeping the structure of the graph intact. In addition to evaluations on three parties across datasets, we also validated utility across 2, 3, 5, 7 and 10 parties on the PGP data set. The experiments were run on a server with core Xeon's (112 threads with hyper threading) and 1.5 TB RAM, using Python 3.7 without parallel computations. We employed the Mpmath arbitrary precision library for implementing inverse transform sampling (Algorithm 5 in the Appendix) and set the precision to 300 bits. We use relative error between true EBC and private EBC: the lower the relative error, the higher the utility. Any errors around 1 or 2 are considered practical as they signify EBCs within the same order of magnitude. We ran the experiment 60 times for each chosen value by choosing the target ego nodes randomly and robustly aggregating the relative error by median. Throughout we set .
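
The evaluation pipeline described above can be sketched with two small helpers. The precise definition of relative error is assumed here to be the absolute difference divided by the true EBC, and the helper names are hypothetical; the partitioning helper assigns each node to a party uniformly at random.

```python
import random
from statistics import median


def relative_error(true_ebc, private_ebc):
    # ASSUMED definition: |private - true| / true.
    return abs(private_ebc - true_ebc) / true_ebc


def median_relative_error(pairs):
    """Robustly aggregate (true, private) EBC pairs by the median."""
    return median(relative_error(t, p) for t, p in pairs)


def random_partition(nodes, n_parties, rng):
    """Assign every node to one of n_parties uniformly at random."""
    return {v: rng.randrange(n_parties) for v in nodes}


runs = [(10.0, 12.0), (4.0, 2.0), (5.0, 10.0)]  # toy (true, private) pairs
owner = random_partition(range(10), 3, random.Random(1))
print(median_relative_error(runs), len(owner))
```

The median is insensitive to the occasional run in which the released ego network is far from the truth, which is why it is preferred to the mean here.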

8 Results

First, we demonstrate how PrivateEBC utility varies with increasing privacy budget from 0.1 to 7, for three parties across each of three different graph datasets. The median relative error between real and private EBC represents utility. Figure 3(a) displays the results for the Facebook, Enron and PGP data sets, where median relative error decreases significantly when is increased to a strong guarantee of 1, and remains small for larger . For strong privacy guarantee of , median relative error is usually for all three data sets. These results demonstrate that PrivateEBC achieves practical utility across a range of graph sizes and privacy levels. Next we report utility at privacy for the number of parties ranging over 2, 3, 5, 7 and 10. Every point in Figure 3(b) shows the median relative error between private and real EBC across 120 randomly chosen nodes in the PGP data set. We find insignificant degradation in accuracy or privacy as the number of parties grows.

Remark 3

While more parties means more calls to ReciprocateandSum and PrivatePathCount, such that the scale of the second and third mechanisms' Laplace noise increases moderately, the major source of error, SubsetRelease, is not affected by the number of parties, as proved in Proposition 1.

We report on timing analysis for PrivateEBC as a function of privacy. Median computation time of 60 random ego nodes for budget from 0.1 to 7 is reported in Figures 10(a), 10(b) and 10(c), on the Facebook, Enron and PGP data sets respectively. Here total time overall decreases as privacy decreases (increasing ), while a small increase in runtime can be seen at very high levels of privacy (low but increasing ). For Enron this is likely due to different behaviours in the protocol with increasing . When the set difference of and is small, the two-stage sampler generates small numbers of nodes in faster time. However, faster runtime with lower privacy dominates behaviour overall.

Figures 10(d), 10(e) and 10(f) show how the median relative error changes with ego node degree. We report results on privacy budget , which do not show significant dependence: in Facebook the median relative error is almost constant for different node degrees, and in Enron and PGP, for node degrees up to , deviations are approximately 1% and 0.5% of the maximum relative error respectively.

9 Conclusion and Future Work

This paper develops a new protocol for multi-party computation of egocentric betweenness centrality (EBC) under per-party edge differential privacy. We significantly improve on past work by extending to multiple parties, achieving very low communication complexity, providing a theoretical utility analysis, and enabling release of the private EBC to all parties. Experimental results demonstrate the practical accuracy and runtime of our protocol at strong levels of privacy.

For future work we hope to allocate differential privacy budgets per stage, by optimising utility bounds. We also intend to develop a network model that reflects a person’s use of multiple media, so that the node set need not be disjointly partitioned, while the privacy of edges remains paramount.

Appendix

Appendix 0.A Non-Private Multi-Party Protocol

We first show how different parties can compute EBC without preserving privacy, but with special attention to efficiency (so as to improve on a naïve application of the two-party protocol of [18]). A party that contains a node can always count the number of 2-step paths through , but it doesn’t know which of the nodes adjacent to are in the ego-network of (except in some special cases). So in order for to count the number of 2-paths in ’s ego network that pass through , we require each other party to tell which of its nodes are neighbours of . This is denoted by .

Recall that denotes the ego network of anywhere in the graph (not including since the graph has no self-loops). Figure 11 summarises the following protocol.

Protocol 0.A.1

All parties execute in parallel, waiting until they have received all messages from one step before commencing the next step. Party proceeds as follows:

  1. [EgoNetwork] broadcasts to every party the set of neighbours of contained within ;

  2. [PathCount] For all nodes s.t. , party computes , the number of 2-paths from to where the intermediate point (irrespective of whether are directly connected). It sends to the party for which ;

  3. [ReciprocateAndSum] For every and all , computes the total number of 2-paths between provided these nodes are disconnected: it sums for all . It then sets to be the reciprocal of this sum and broadcasts this value to all parties;

  4. completes the computation of as .

Participants can easily tell when to move on to the next step. At the end of Step ii, party should have received , from each other party , for each node , and each with . By the end of Step iii it should have received a broadcast value from all other parties.
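
The four steps of Protocol 0.A.1 can be simulated in a single process to check that the distributed computation agrees with centralised EBC. This is an illustrative sketch under stated assumptions: the function name is hypothetical, pairs are ordered as i < j with the count sent to the owner of i, and a shared adjacency dictionary stands in for the parties' local views (each step only reads edges incident to the acting party's own nodes).

```python
from itertools import combinations


def multi_party_ebc(adj, owner, ego):
    parties = sorted(set(owner.values()))
    nodes_of = {p: {v for v in owner if owner[v] == p} for p in parties}

    # Step i: each party broadcasts its share of the ego network.
    broadcasts = {p: adj[ego] & nodes_of[p] for p in parties}
    ego_net = set().union(*broadcasts.values())

    # Step ii: each party counts 2-paths through its own midpoints
    # and sends each count to the owner of the smaller endpoint.
    inbox = {p: {} for p in parties}
    for p in parties:
        mids = nodes_of[p] & (ego_net | {ego})
        for i, j in combinations(sorted(ego_net), 2):
            t = sum(1 for k in mids if k in adj[i] and k in adj[j])
            inbox[owner[i]].setdefault((i, j), []).append(t)

    # Step iii: each party sums reciprocals over disconnected pairs.
    shares = {
        p: sum(1.0 / sum(ts) for (i, j), ts in inbox[p].items()
               if j not in adj[i])
        for p in parties
    }

    # Step iv: any party adds the broadcast shares.
    return sum(shares.values())


adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2}}
owner = {0: "A", 1: "A", 2: "B", 3: "B"}
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
star_owner = {0: "A", 1: "B", 2: "C", 3: "C"}
print(multi_party_ebc(adj, owner, 0), multi_party_ebc(star, star_owner, 0))
```

In the first graph the only disconnected pair (1, 3) has one 2-path through each party, giving a total of 0.5; the star graph, split over three parties, gives 3.0.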

Appendix 0.B Privacy Disclosure

Now consider how the privacy of edges can be compromised in the first three steps of Protocol 0.A.1.

  1. When is broadcast in Step i, other parties learn directly of all edges incident to in .

  2. When is sent to in Step ii, it reveals information about edges outside . A worst case occurs for node when there is only one node connected to it. Then reveals the existence of edge for all .

  3. When broadcasts , it reveals the connection status of edges within . In the worst case, when there are just two nodes and in , an edge between them can change from a non-zero value to zero.

[Message sequence chart for three parties, showing three rounds of messages: Ego network, Path count, and Reciprocate and Sum.]

Figure 11: EBC multi-party protocol for party , comprising three messages per party. Visualised for three parties.

Appendix 0.C Proof of Proposition 1

Observe that since the form a disjoint partition of , that for any we have that

Combining this with the independence of the concurrent executions of SubsetRelease, we have that the joint density over their releases corresponds to

Due to uniqueness of probability density normalisation, this proves the result.

Appendix 0.D Proof of Lemma 2

Algorithm PrivatePathCount is executed by all parties; the output of these computations is broadcast to other parties. As we need to preserve the privacy of each party's edges individually, we consider one (arbitrary) party . The output of is a vector of the counts of all -paths connecting with intermediate node . Adding or removing an edge from can, in the worst case, change elements of 's counts by one each. To see this, consider a very highly-connected node and the deletion of for any : this reduces the counts for paths joining for all . Likewise if were also within party and were highly connected then the edge removal would also reduce counts for paths joining for all . This proves that the sensitivity for party running PrivatePathCount is irrespective of party .

Appendix 0.E Proof of Lemma 3

Once again we apply post-processing to previously sanitized outputs . Consider any party and the effect of removing/adding an edge incident to a node of . At worst this will result in the condition of line 6 of Algorithm 2 evaluating differently, for at most one pair . That is, the non-private sum of reciprocals can be affected by the addition/removal of a single term . Such a term is upper bounded by the reciprocal of a lower bound on . That is, the sensitivity, irrespective of executing party , is 1.

Appendix 0.F Lemma 4

Lemma 4 (Lemma 7 of [13])

Consider a centralised party running SubsetRelease with budget , quality function on . For let , and let be the uniform probability mass function on . Then .

Appendix 0.G Lemma 5

Lemma 5

For any we may bound the additive EBC error from using instead of non-private according to the quality function applied to the random release:

0.G.1 Proof of Lemma 5

Let denote the number of 2-paths connecting within node-sets and respectively. Our goal is to upper bound the quantity:

following from the triangle inequality and collection of terms with shared end-point nodes . By cases: only when both end points are elements of the intersection is there a pair of matching EBC terms. Otherwise at least one (or both) end-point nodes sit outside or respectively—in which case there is only one EBC term for the corresponding node set.

We can see that both types of summand are bounded above by unity. For in the second case, for any ,

since there is always a path through ego node . The same holds for the case of by definition. And in the first case of , we have that

Therefore it follows that

where the first equality follows by enumerating the pairs being counted as all those within or provided that both nodes do not reside in separate set differences and .

This completes the result.

Appendix 0.H Proof of Corollary 1

The claim follows from Lemma 4 combined with Proposition 1.

Appendix 0.I Proof of Theorem 6.1

We begin by defining two events of interest on for any to be chosen later: that our (centralised by Proposition 1) exponential mechanism achieves near-optimality, and that the resulting EBC is near optimal. Rewriting the event defined in Lemma 4, using and , we have that

which holds w.p. at least . Next define

By Lemma 5 we have that . And since is increasing in for , event implies then that and so and . Provided that , taking this completes the result.

Appendix 0.J Analysis for Remark 2

Consider a very sparse graph on nodes , and consider a true ego network with cardinality represented as a fraction of i.e., . Let us now invoke the theorem with error bound taken to be a multiplier of EBC, yielding a relative error bound of . For very sparse graphs, EBC can arbitrarily approach for bounded away from zero and large . Thus setting corresponds to relative error which is meaningful for not much larger than 1 (corresponding to private EBC within the same order of magnitude as non-private EBC). Next we wish to set the confidence to be very close to unity e.g., corresponds to confidence exceeding 99.9%. With these choices, we set the confidence in Theorem 6.1 to and solve for the privacy budget required for SubsetRelease:

Appendix 0.K Precise algorithms

We rephrase the forward-pass algorithm from [18] as a generic subset release mechanism through Algorithms 4, 5 and 6.

Input: public set , private subset ;
1:  
2:  
3:  return  
Algorithm 4 SubsetRelease Two-Stage Sampler

Input: cardinality ; // Compute log-space PDF of
1:  
2:  for  do
3:     
4:  end for
5:  
6:  
7:  for  do
8:     if   then
9:        return  
10:     end if
11:     
12:  end for
13:  return  
Algorithm 5 InverseTransformSampler

Input: public set ; private subset ;
1:  
2:   without replacement
3:  for  do
4:     if  then
5:        
6:     else
7:        
8:     end if
9:  end for
10:  return  
Algorithm 6 PickAndFlipSampler
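
The idea behind log-space inverse transform sampling can be sketched generically as follows. This is not a reconstruction of Algorithm 5: the function name is hypothetical, the input is an arbitrary list of unnormalised log-weights, and ordinary floating point with the log-sum-exp trick is used in place of the 300-bit Mpmath arithmetic mentioned in the experimental setup.

```python
import math


def inverse_transform_sample(log_weights, u):
    """Return index k drawn with probability proportional to
    exp(log_weights[k]), given a uniform draw u in [0, 1)."""
    m = max(log_weights)
    # log-sum-exp: normaliser computed without overflow or underflow.
    log_z = m + math.log(sum(math.exp(w - m) for w in log_weights))
    cumulative = 0.0
    for k, w in enumerate(log_weights):
        cumulative += math.exp(w - log_z)
        if u < cumulative:
            return k
    return len(log_weights) - 1  # guard against rounding at the tail


weights = [math.log(1.0), math.log(1.0), math.log(2.0)]  # probs 1/4, 1/4, 1/2
picks = [inverse_transform_sample(weights, u) for u in (0.1, 0.3, 0.6)]
tiny = inverse_transform_sample([-1000.0, -1000.0], 0.7)
print(picks, tiny)
```

Working in log space matters because exponential-mechanism weights can differ by many orders of magnitude, so direct exponentiation would underflow for most cardinalities.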

References