How Private Is Your Voting? A Framework for Comparing the Privacy of Voting Mechanisms

05/15/2018
by Ao Liu, et al.
Rensselaer Polytechnic Institute

Voting privacy has received a lot of attention across several research communities. Traditionally, cryptographic literature has focused on how to privately implement a voting mechanism. Yet, a number of recent works attempt to minimize the amount of information one can infer from the output (rather than the implementation) of the voting mechanism. These works apply differential privacy (DP) techniques which noise the outcome to achieve privacy. This approach intrinsically compromises accuracy, rendering such a voting mechanism unsuitable for most realistic scenarios. In this work we investigate the inherent "noiseless" privacy that different voting rules achieve. To this end we utilize the well-accepted notion of Distributional Differential Privacy (DDP). We prove that under standard assumptions in the voting literature about the distribution of votes, most natural mechanisms achieve a satisfactory level of DDP, indicating that noising--and its negative side effects for voting--is unnecessary in most cases. We then put forth a systematic study of the noiseless privacy of commonly studied voting rules, and compare these rules with respect to their privacy. Note that both DP and DDP induce (possibly loose) upper bounds on information leakage, which makes them insufficient for such a task. To circumvent this, we extend the definitions to require the bound to be exact (i.e., optimal) in a well-defined manner. Although motivated by voting, our definitions and techniques can be generically applied to address the optimality (with respect to privacy) of general mechanisms for privacy-preserving data release.


1 Introduction

Privacy-ensuring voting has been extensively studied in the cryptographic literature. The scope of these works is to enable voters to cast their votes (possibly online) in a manner that ensures that someone with access to their messages cannot link them to their votes. However, another, equally important question about voting privacy has received much less attention: How much can someone infer about your vote from the outcome information that different mechanisms release?

Suppose, for example, you cast a vote in a presidential election indicating your preference over the candidates. After the election, the winner is announced. What can an (adversarial) observer infer about your vote if a count of votes that indicate the same preference—i.e., the histogram of votes—is announced? How about if the number of votes for each candidate in your city is announced? How about in your building or in your family? Arguably, answering the above privacy-related questions is highly relevant in practice. For example, minimizing the amount of information leaked by the outcome announced by the voting mechanism can help protect against censorship and coercion and prevent vote buying.

The above questions are closely related to questions about releasing information on a dataset in a privacy-preserving manner. Indeed, a voting system can be cast as a two-step process: first the votes (data) are collected by a trusted authority into a database and then a voting rule is applied on the collected data to announce the outcome. Casting the problem in such a way indicates that standard methods for protecting the privacy of released data could also be applied to voting for addressing the above questions.

The most widely accepted such method is differential privacy (in short, DP) [8]. At a high level, DP bounds the amount of information that a function of a dataset reveals about any single record of the data. Informally, for any given database $D$ and any given query $f$ on $D$, DP requires that no observer can distinguish the output $f(D)$ from the output of $f$ applied on any neighboring database $D'$—i.e., $D'$ is derived from $D$ by removing (or modifying) any given record. The main method for achieving DP for a given query is by means of a mechanism which noises the outcome of the query, so that one cannot decide whether or not any individual record was included in computing the query's outcome—i.e., the response distribution should be almost the same with or without any given record.
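As a concrete illustration of the noising approach (a standard textbook construction, not a mechanism proposed in this work), the following Python sketch releases a counting query under $\epsilon$-DP by adding Laplace noise calibrated to the query's sensitivity of 1; the vote data is hypothetical.

```python
import numpy as np

def laplace_count(db, predicate, epsilon, rng):
    """Release a counting query under epsilon-DP by adding Laplace noise.

    A count changes by at most 1 when a single record is modified
    (sensitivity 1), so Laplace noise with scale 1/epsilon suffices.
    """
    true_count = sum(1 for row in db if predicate(row))
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

rng = np.random.default_rng(0)
votes = rng.choice(["A", "B"], size=1000)              # hypothetical vote data
noisy = laplace_count(votes, lambda v: v == "A", 0.1, rng)
print(noisy)  # smaller epsilon: more noise, better privacy, worse accuracy
```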

The privacy level of a DP mechanism is indicated by a scalar $\epsilon$—resulting in what is known as $\epsilon$-DP—which upper bounds the ability of any distinguisher to distinguish between the (noised) output applied on $D$ and on $D'$. In order to allow for better DP mechanisms, the above traditional definition of DP is often extended by another parameter $\delta$ which, informally, bounds the probability that the $\epsilon$ privacy guarantee is compromised. This yields the notion of $(\epsilon, \delta)$-DP [9]. ($(\epsilon, \delta)$-DP is identical to $\epsilon$-DP for $\delta = 0$.) In both these notions, the smaller the $\epsilon$ (and $\delta$), the better the corresponding privacy guarantee.

Given its wide adoption as a method for making data release private, a number of works have recently applied DP to voting privacy as a tool for limiting the inference from the output of voting mechanisms [25, 17, 20, 4]. Alas, as natural as the above idea might seem at first, it has two critical issues.

First, noising the outcome inherently reduces the accuracy of the output, i.e., introduces a probability of error, which is unacceptable in many high-stakes situations. To see this, imagine a situation where candidate A wins by majority, but the mechanism outputs candidate B as the winner, arguing that this is to make the votes more private. And, of course, mechanisms that apply more noise are more private (i.e., achieve better DP), but they are also less accurate. Thus, these solutions result in voting mechanisms which might, with some probability that gets higher with more privacy, announce a wrong winner; this makes such solutions unusable for most applications. To resolve this problem, in this work we employ one ([1]) of several distributional notions of differential privacy that allow for noiseless mechanisms [3, 21, 7, 1].

The second, and more delicate, issue is that differential privacy puts only an upper bound on the privacy leakage of a mechanism. In other words, given two $(\epsilon, \delta)$-DP mechanisms, one cannot directly answer which one is more private, as one is often not able to directly use the mechanisms' DP parameters to compare voting rules—or any other type of mechanisms—with respect to their privacy. To rectify this, in this work we introduce the notion of exact (distributional) differential privacy. We note in passing that the question of comparing DP mechanisms with respect to their utility has been extensively studied in the related literature [23, 5, 16, 2, 12]. However, these works introduce utility as a function of a mechanism's accuracy (according to some proposed metric), thus making them insufficient to answer our original question: Given two $(\epsilon, \delta)$-DP mechanisms, which one is the more private?

1.1 Contributions

Motivated by the above voting-privacy questions, we propose a framework for comparing (voting) mechanisms according to the privacy they provide. To this end, we put forth the notion of exact privacy. Intuitively, a mechanism is exact private with respect to a set of privacy parameters if there are no strictly better parameters that the mechanism can achieve. More concretely, we start by devising an exact version of $\epsilon$-differential privacy ($\epsilon$-DP) [8] which we term $\epsilon$-exact Differential Privacy (in short, $\epsilon$-eDP). Our notion renders a mechanism $\epsilon$-eDP if and only if it satisfies the following conditions: (1) it is $\epsilon$-DP and (2) there exists no $\epsilon' < \epsilon$ such that the mechanism is $\epsilon'$-DP. This definition of exactness is trivially extended to $(\epsilon, \delta)$-eDP by requiring exactness, as in Condition 2 above, with respect to both $\epsilon$ and $\delta$.
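To make Condition 2 concrete, here is a small sketch (our own illustration, not code from the paper) that computes the smallest achievable $\epsilon$ for a finite mechanism given as a table of output probabilities; for randomized response on one bit with truth probability 3/4, the exact parameter is $\ln 3$, so that mechanism is $\ln 3$-eDP.

```python
import math

def exact_epsilon(p_out_given_db):
    """Smallest epsilon for which a finite mechanism is epsilon-DP.

    p_out_given_db[d][o] = Pr[M(d) = o]; all pairs of databases in the dict
    are treated as neighbors (enough for the one-bit example below).
    The exact epsilon is the largest log-ratio over neighbors and outputs.
    """
    eps = 0.0
    dbs = list(p_out_given_db)
    for d1 in dbs:
        for d2 in dbs:
            if d1 == d2:
                continue
            for o, p1 in p_out_given_db[d1].items():
                p2 = p_out_given_db[d2].get(o, 0.0)
                if p1 > 0 and p2 == 0:
                    return math.inf      # not epsilon-DP for any finite epsilon
                if p1 > 0:
                    eps = max(eps, math.log(p1 / p2))
    return eps

# Randomized response on one bit: report the true bit w.p. 0.75, flip otherwise.
rr = {0: {0: 0.75, 1: 0.25}, 1: {0: 0.25, 1: 0.75}}
print(exact_epsilon(rr))  # ln(3) ~= 1.0986; no smaller epsilon works
```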

Next, we use exact privacy to answer questions of the type: Given two $(\epsilon, \delta)$-DP mechanisms, which one is preferred? Exact privacy gives a way to answer this question: prefer the mechanism which is inherently more private, i.e., achieves better (smaller) exact privacy parameters. The above natural question has been approached in the past by introducing explicit utility functions and using them as a measure of mechanism quality (e.g., [23, 5, 12, 16, 2]). In fact, exact privacy and utility turn out not to be completely unrelated quantities. Indeed, we show that an $\epsilon$-DP mechanism is $\epsilon$-eDP if it is optimal for some utility function that monotonically increases as privacy deteriorates (Theorem 4.1). We exemplify this relationship between exactness and utility by showing that the truncated geometric mechanism—a provably optimal DP mechanism for a natural class of utility functions [12]—is exact DP; to prove this we show that one of the above class of utilities is monotonically increasing and apply our theorem. Given that our notion of monotonicity implies exactness, one might ask whether the converse direction is true: given an $\epsilon$-exact DP mechanism and a monotonically increasing utility function, is the mechanism optimal for this utility? We show that this is not the case by analyzing the well-known Laplace mechanism [10], which, as we prove, is exact DP but not optimal for a monotonically increasing utility function.
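For reference, the following sketch shows one way to implement the truncated geometric mechanism of [12] for a count in {0, ..., n}; the sampling route via a difference of two geometric variables is our own choice and is only meant to convey the idea.

```python
import numpy as np

def truncated_geometric(true_count, n, epsilon, rng):
    """Two-sided geometric noise with parameter alpha = exp(-epsilon),
    truncated (clamped) to the valid output range {0, ..., n}."""
    alpha = np.exp(-epsilon)
    # A difference of two i.i.d. geometric variables has the two-sided geometric
    # (discrete Laplace) distribution, Pr[z] proportional to alpha**abs(z).
    noise = rng.geometric(1 - alpha) - rng.geometric(1 - alpha)
    return int(np.clip(true_count + noise, 0, n))

rng = np.random.default_rng(1)
print([truncated_geometric(42, 100, 0.5, rng) for _ in range(5)])
```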

Having defined exact privacy as a notion for comparing differentially private mechanisms, we next turn to using this notion in voting, and to comparing the privacy of different voting rules. We first observe that, by its nature, noising the outcome of a voting mechanism renders the mechanism unsuitable for most voting applications. For this reason we resort to distributional differential privacy (DDP) [1] and extend the notion of exactness to this definition as well, resulting in $(\epsilon, \delta)$-eDDP.

Avoiding noising the output does not come for free, as DDP—and any other noiseless privacy notion—requires assumptions on (1) the distribution from which the data is generated and (2) the auxiliary information the observer/distinguisher has on it. We tackle the first issue by looking back into our problem, i.e., voting, and employing a standard assumption about the distribution of votes in the related literature: concretely, that each row of the database (i.e., each vote) is independently and identically distributed (i.i.d.). In social choice, this is a natural assumption called impartial culture.

The second issue, i.e., auxiliary information, is more subtle, as different auxiliary information trivially affects the privacy of different mechanisms. Hence, to order DDP mechanisms with respect to their privacy it is necessary to consider them all under the same auxiliary information. The most objective such information is naturally one which is not correlated with the input distribution or the output of the mechanism. For this reason, in this work we take the auxiliary information to be the empty set.

We remark that, to our knowledge, no prior work has considered applying the idea of DDP to voting privacy. Therefore, as a sanity check we demonstrate that reasonable DDP parameters can be proven for (noiseless) voting rules under our assumptions above (Theorem 3.1). Concretely, we prove that when database rows are i.i.d., the histogram mechanism (e.g., outputting the profile, which is a histogram of votes) is $(\epsilon, \delta)$-DDP with negligible $\delta$, for every $\epsilon > 0$ and every database size $n$. (When the database distribution and auxiliary information—which is the empty set for our results—are clear from context, we sometimes omit them.) As DDP is immune to post-processing, voting rules which depend only on the profile also satisfy the same parameters as above.

Having demonstrated that DDP is a well-behaved privacy notion in the context of voting, we dive into exactness, which will allow us to compare different voting rules. Here we are faced with a multi-parametric problem: recall that DDP (hence also exact DDP) has two parameters $\epsilon$ and $\delta$, which means that the induced ordering, if both parameters are left free, is only a partial order. For example, the following question has no natural answer: is a mechanism with parameters $(\epsilon_1, \delta_1)$ more, or less, private than one with $(\epsilon_2, \delta_2)$ when neither pair dominates the other?

To avoid incomparable results, we fix one parameter to its optimal value (i.e., $\epsilon = 0$) and compare the other. We observe that fixing $\delta = 0$ instead can yield a notion of (distributional) differential privacy which in many cases cannot be instantiated. (For example, consider a simple mechanism which counts the number of 1's in the database, and a database distribution where one row is fixed to 1. Then, it is not possible for the mechanism to output a count of 0. When $\delta$ is fixed to zero in the DDP definition, $e^{\epsilon}$ must bound the ratio of the probabilities of the mechanism's and the simulator's output, both probabilities conditioned on a database row being fixed to some value. In our example one of these probabilities is zero, so no finite $\epsilon$ can satisfy this ratio; thus, this simple mechanism does not satisfy $(\epsilon, 0)$-DDP for any $\epsilon$.) Therefore we choose to fix $\epsilon = 0$ and compare mechanisms with respect to how good a $\delta$ they achieve. We remark that if we instead fixed some $\epsilon > 0$, then Theorem 3.1 would give us a negligible $\delta$ for outputting the histogram of votes—which means that any mechanism which depends on this histogram (like voting rules) has at most negligible $\delta$. Hence, by choosing $\epsilon = 0$, we get a much more interesting asymptotic separation of $\delta$ values (for example, $\Theta(1/\sqrt{n})$ versus negligible in $n$, as we see in Theorem 5.2). In addition, the choice $\epsilon = 0$ yields a nice interpretation of exactness, since $\delta$—which is now the distinguishing advantage in the DDP experiment—becomes the absolute value of the difference between the probability of the mechanism's output and the simulator's output, in other words, their statistical distance.
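The following sketch makes this statistical-distance reading of $\delta$ concrete on a small example. It assumes the alternative view used later in the paper (i.i.d. rows, no auxiliary information, $\epsilon = 0$, and a comparison between fixing one row to two different values); the three-candidate plurality rule and the uniform vote distribution are just illustrative choices.

```python
from itertools import product
from math import factorial

def multinomial_pmf(counts, probs):
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)
    p = float(coef)
    for c, q in zip(counts, probs):
        p *= q ** c
    return p

def delta_at_eps0(mechanism, probs, n):
    """Largest statistical distance between the mechanism's output
    distributions when one i.i.d. row is fixed to value t versus t'."""
    k = len(probs)
    out_dist = []
    for t in range(k):
        dist = {}
        # Enumerate histograms of the other n-1 rows, then add the fixed row t.
        for rest in product(range(n), repeat=k):
            if sum(rest) != n - 1:
                continue
            hist = list(rest)
            hist[t] += 1
            outcome = mechanism(tuple(hist))
            dist[outcome] = dist.get(outcome, 0.0) + multinomial_pmf(rest, probs)
        out_dist.append(dist)
    outcomes = set().union(*out_dist)
    return max(
        0.5 * sum(abs(out_dist[t].get(o, 0.0) - out_dist[u].get(o, 0.0)) for o in outcomes)
        for t in range(k) for u in range(k)
    )

# Plurality over 3 candidates, ties broken towards the smaller index.
plurality = lambda hist: max(range(len(hist)), key=lambda c: (hist[c], -c))
print(delta_at_eps0(plurality, probs=(1/3, 1/3, 1/3), n=9))
```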

We use eDDP to order standard voting rules, namely, those in the class of Generalized Scoring Rules (GSR) [28]. We study the privacy achieved by these mechanisms for two or more ($m \geq 2$) candidates, as the number of votes ($n$) increases.

We will first study the case where the number of candidates is $m = 2$. We consider the class of $p$-majority rules (also called $p$-biased plurality). For two candidates A and B, the $p$-majority winner is A if at least a $p$ fraction of the voters voted for A; otherwise, the winner is B. We note that $p$-majority rules cover a wide range of commonly used voting rules for two candidates, including simple majority/plurality ($p = 1/2$), supermajority ($p > 1/2$), and submajority ($p < 1/2$).

Theorem 1.1 (Informal, see Theorem 5.2)

Let each vote be i.i.d., let $q$ be the probability that a voter chooses the first candidate, and let $1 - q$ be the probability that a voter chooses the second. The mechanism outputting the $p$-biased plurality winner is $(0, \delta)$-eDDP.

The above result highlights a separation in the privacy of the same mechanism based on the database distribution. When the bias $p$ is equal to the probability $q$, then $\delta = \Theta(1/\sqrt{n})$. Otherwise, $\delta$ is negligible in the database size (number of votes) $n$.

The analysis of the above uses a new technical tool we introduce, termed the trails technique. The trails technique can be used to compute $(0, \delta)$-eDDP for mechanisms which depend only on the histogram (such as voting rules) and database distributions where rows are independent. To apply this technique we introduce an alternative definition of DDP which is equivalent to DDP in our setting, where database rows are independently distributed and the observer has no auxiliary information. A mechanism is $(\epsilon, \delta)$-DDP in the alternative definition if, for every pair of row values, the output distributions obtained by fixing one row to either value are close.

The idea of the trails technique is as follows. Since the mechanism depends only on the histogram, we can view any subset $S$ of the mechanism's range as a subset of histograms that are mapped into $S$. We can thus split the computation of this distance over any such $S$ into computations over disjoint trails. A trail is a set of histograms formed in the following way: starting with a histogram $h$, subtract one from bin $a$ and add one to bin $b$ of the previous histogram, and repeat $L$ times; in other words, a set of the form $\{h,\; h - e_a + e_b,\; \ldots,\; h - L e_a + L e_b\}$. The main observation is that when each database row is independently distributed, computing the distance over any trail cancels all but the probabilities of the first and last histograms of the trail. Then, computing $(0, \delta)$-exact DDP amounts to computing the probabilities of the first and last histograms of a set of trails. We detail this technique in Section 5. We stress that the trail technique is not restricted to the two-candidate case; in fact the technique is described for the multi-candidate setting and is also used in later sections. We believe that this technical tool is of independent interest, i.e., relevant for the analysis of the exact privacy of other mechanisms beyond voting.
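A minimal numeric check of the trail idea for the two-candidate case (our own sketch; the threshold convention ceil(p*n) and the conditioning of one vote on A versus B are assumptions): summing the differences over the trail of winning histograms leaves a single binomial term at the threshold, which matches the brute-force sum and exhibits the $\Theta(1/\sqrt{n})$ versus exponentially-small behaviour.

```python
import math
from math import comb

def binom_pmf(k, n, q):
    return comb(n, k) * q**k * (1 - q)**(n - k) if 0 <= k <= n else 0.0

def delta_bruteforce(n, p, q):
    """Sum of Pr[hist | one vote = A] - Pr[hist | one vote = B] over the trail
    of histograms (k, n - k) on which A wins, i.e. k >= ceil(p*n)."""
    threshold = math.ceil(p * n)          # assumed threshold convention
    total = 0.0
    for k in range(threshold, n + 1):
        pr_given_a = binom_pmf(k - 1, n - 1, q)   # other n-1 votes supply k-1 A's
        pr_given_b = binom_pmf(k, n - 1, q)       # other n-1 votes supply k A's
        total += pr_given_a - pr_given_b
    return abs(total)

def delta_trail(n, p, q):
    """Trail shortcut: interior terms cancel pairwise; the exit term is zero
    (a vote fixed to B rules out the all-A histogram), leaving one entry term."""
    threshold = math.ceil(p * n)
    return binom_pmf(threshold - 1, n - 1, q)

for n in (50, 200, 800):
    assert math.isclose(delta_bruteforce(n, 0.5, 0.5), delta_trail(n, 0.5, 0.5))
    print(n, delta_trail(n, 0.5, 0.5),   # ~ Theta(1/sqrt(n)) when p = q
             delta_trail(n, 0.5, 0.3))   # exponentially small when p != q
```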

Next, we consider an arbitrary number of candidates $m \geq 2$. We begin by studying the exact privacy of outputting an entire histogram of votes, and the privacy of outputting a quantity related to plurality, the plurality score, which is a histogram of the top-ranked candidate in each vote. To do so, we show that outputting a histogram with any constant number of bins $k$ satisfies $(0, \Theta(1/\sqrt{n}))$-eDDP (Theorem 6.1). By setting $k = m!$ for the histogram of votes (where $m!$ is the number of linear orders on $m$ candidates and is constant when $m$ is constant) and $k = m$ for the plurality score, we show that both these mechanisms satisfy $(0, \Theta(1/\sqrt{n}))$-eDDP.

We proceed to present our two main results, which prove the exact privacy of (a large subset of) GSRs [28]. The subset of GSRs we consider are the voting rules which satisfy the properties of monotonicity (we note this is not related to monotonically increasing utilities) and canceling-out, and for which there exists a locally stable profile/histogram of votes. In the simple case of two candidates A and B, these are voting rules whose winner does not change even if one were to add both a vote for A and a vote for B, or if one were to raise the ranking of the winner in some votes. Moreover, a locally stable profile is one where, if we only replace a few of the votes, the winner does not change.

Theorem 1.2 (Informal, see Theorem 6.2)

For any Generalized Scoring Rule which satisfies the properties of monotonicity and canceling-out, and for which there exists a locally stable profile, when each vote is i.i.d.:

  1. The mechanism that outputs its winner is $(0, \Theta(1/\sqrt{n}))$-exact DDP, when each vote is also uniform over the set of linear orders over the candidates.

  2. There exist database distributions, whose Lebesgue measure can be arbitrarily close to 1, where the mechanism that outputs its winner is $(0, \delta)$-DDP with $\delta$ negligible in $n$.

The above leads to the somewhat surprising conclusion that, when each vote is uniformly distributed, common voting rules have asymptotically comparable exact privacy $\delta = \Theta(1/\sqrt{n})$. As this is the privacy of the histogram mechanism, this means that the uniform distribution induces the worst-case asymptotic privacy for voting rules. Moreover, it shows that for these GSR voting rules, the privacy differs asymptotically depending on the database distribution.

Finally, we apply our main result by showing a ranking of different voting rules based on concrete $\delta$ values. For each voting rule, we compute the $\delta$ values for the mechanism which outputs the winner, under the distribution where each vote is uniformly distributed. We use linear regression (on the inverse square of these values) to fit them to $1/\sqrt{cn}$ (where $\Theta(1/\sqrt{n})$ is our asymptotic theoretical bound). We find that the fit is reasonable, with mean square errors of 0.038 to 0.057, even for relatively small numbers of votes. Based on the constant $c$ in the fitted result (where the larger the $c$, the smaller the $\delta$ and the more private), the ranking from least to most private is: Borda, STV, Maximin, Plurality, and 2-approval. This ranking confirms our intuition that, for example, a voting rule which does not use the whole ranking in the vote (e.g., Plurality, 2-approval) is more private than those which do. Yet it also informs us of the comparative information leakage between rules like Borda, STV, and Maximin, all of which use the entire ranking of the vote. We remark that STV is not in the subset of GSRs for which we have proven the asymptotic bound; however, its concrete $\delta$'s still fit well to $1/\sqrt{cn}$.

1.2 Organization

The remainder of the paper is organized as follows. In Section 3 we compute the DDP parameters for the histogram mechanism. In Section 4 we describe our definition of exactness and its relation with utility-based optimality. Then, in Sections 5 and 6 we prove our main asymptotic results about eDDP for the case of two and an arbitrary number of candidates, respectively. Finally, Section 7 includes our empirical estimations of the exact privacy parameters for the various rules used in this work. For space reasons several details and proofs have been moved to a clearly marked appendix. We have already discussed most relevant literature; for completeness, in Appendix 0.A we include a more detailed review of these works.

2 Preliminaries and Notation

In this section we present our notation, and give an overview of Generalized Scoring Rules. We use standard Differential Privacy (DP) [8] and Distributional Differential Privacy (DDP) definitions [1, 14]. For self-containment, we refer to Appendix 0.B for these definitions.

We call the set of values a row in the database can take the universe, denoted by $\mathcal{X}$. The set of all databases of any size (i.e., number of rows) is denoted $\mathcal{X}^*$, and the set of all databases of size $n$ is $\mathcal{X}^n$. For a universe $\mathcal{X} = \{x_1, \ldots, x_k\}$ of finitely many values and constant probabilities $p_1, \ldots, p_k$, we denote by $\mathcal{D}^n_{\vec{p}}$ the distribution on databases whose support is only in $\mathcal{X}^n$, where each of the $n$ rows is an independent and identically distributed (i.i.d.) random variable which takes the value $x_j$ with probability $p_j$.

Let $\mathcal{A}$ denote a set of $m$ alternatives (also called candidates). Let $\mathcal{L}(\mathcal{A})$ denote the set of all linear orders over $\mathcal{A}$, that is, the set of all antisymmetric, transitive, and total binary relations. Each vote is a linear order $V \in \mathcal{L}(\mathcal{A})$. Let $n$ denote the number of votes. In general, the profile is the database of votes in the set $\mathcal{L}(\mathcal{A})^n$, but for the voting rules we consider, the profile is the histogram of all votes. We consider a voting rule $r$ to be a mapping that takes a profile as input and outputs a unique winner in $\mathcal{A}$.

For example, a positional scoring rule is characterized by a scoring vector $\vec{s} = (s_1, \ldots, s_m)$ with $s_1 \geq s_2 \geq \cdots \geq s_m$. For any candidate $c$ and any linear order $V$, we let $s(V, c) = s_{\mathrm{rank}(V, c)}$, where $\mathrm{rank}(V, c)$ is the rank of $c$ in $V$. Given a profile $P$, the positional scoring rule chooses a candidate that maximizes $\sum_{V \in P} s(V, c)$, breaking ties when multiple candidates have the highest score. Plurality, $k$-approval (for any $k < m$), and Borda are positional scoring rules, with scoring vectors $(1, 0, \ldots, 0)$, $(1, \ldots, 1, 0, \ldots, 0)$ (with $k$ ones), and $(m - 1, m - 2, \ldots, 0)$, respectively.
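For concreteness, a small sketch (illustrative only, with an assumed alphabetical tie-break) of how the scoring vectors above translate into winners; on the toy profile below, Plurality and Borda already disagree.

```python
CANDIDATES = ("a", "b", "c")

def positional_winner(profile, scores):
    """Winner under a positional scoring rule.

    profile: list of votes, each a tuple ranking all candidates (best first).
    scores:  scoring vector s_1 >= ... >= s_m; ties broken alphabetically.
    """
    total = {c: 0 for c in CANDIDATES}
    for vote in profile:
        for rank, cand in enumerate(vote):
            total[cand] += scores[rank]
    return min(CANDIDATES, key=lambda c: (-total[c], c))

profile = [("a", "b", "c")] * 4 + [("b", "c", "a")] * 3 + [("c", "b", "a")] * 2
print(positional_winner(profile, (1, 0, 0)))  # Plurality:  "a"
print(positional_winner(profile, (1, 1, 0)))  # 2-approval: "b"
print(positional_winner(profile, (2, 1, 0)))  # Borda:      "b"
```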

Another commonly used voting rule is the single transferable vote (STV) rule, which determines the winner in $m - 1$ steps: in each step, the alternative ranked at the top (among the remaining alternatives) least often is eliminated from the profile (breaking ties when necessary), and the winner is the last remaining candidate.
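A matching sketch for STV on the same toy profile as above (again with an assumed alphabetical tie-break for eliminations):

```python
def stv_winner(profile, candidates):
    """Single transferable vote: repeatedly eliminate the candidate that is
    ranked first (among the remaining candidates) least often."""
    remaining = set(candidates)
    while len(remaining) > 1:
        top_counts = {c: 0 for c in remaining}
        for vote in profile:
            top = next(c for c in vote if c in remaining)
            top_counts[top] += 1
        loser = min(remaining, key=lambda c: (top_counts[c], c))
        remaining.remove(loser)
    return remaining.pop()

profile = [("a", "b", "c")] * 4 + [("b", "c", "a")] * 3 + [("c", "b", "a")] * 2
print(stv_winner(profile, ("a", "b", "c")))  # "b": c is eliminated, its votes move to b
```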

It turns out that many commonly studied voting rules belong to the class of Generalized Scoring Rules (GSRs).

Definition 1 (Generalized Scoring Rules (GSR))

A Generalized Scoring Rule (GSR) is defined by a number $K$ and two functions $f$ and $g$, where $f$ maps any vote to a vector in $\mathbb{R}^K$ and $g$ maps any weak order over the $K$ components to a candidate in $\mathcal{A}$. Given a vote $V$, $f(V)$ is the generalized score vector of $V$. Given a profile $P$, we let $f(P) = \sum_{V \in P} f(V)$, called the score. Then, the winning candidate is given by $g(\mathrm{Ord}(f(P)))$, where $\mathrm{Ord}$ is the function which outputs the weak order of the components in $f(P)$.

We say that a rule is in GSR if it can be described by some $(K, f, g)$ as above. Examples of GSR rules can be found in Appendix 0.B.1.

3 Noiseless Privacy in Voting

In this section, we demonstrate that under standard assumptions in the voting literature, noising the outcome is not necessary for achieving a reasonable notion of privacy. Concretely, we show that the histogram mechanism satisfies Distributional Differential Privacy (DDP) with good parameters (Theorem 3.1).

Theorem 3.1 (DDP of the histogram mechanism)

Let $\mathcal{X}$ be a finite universe and let each database row be i.i.d. with constant nonzero probabilities over $\mathcal{X}$. For all $\epsilon > 0$, there is a negligible function $\delta(\cdot)$ such that for all $n$, the histogram mechanism is $(\epsilon, \delta(n))$-DDP.

Proof

(sketch) Let $D$ be drawn from the database distribution and consider the distribution in which the $i$th row has been set to some value $t$. Since every row of $D$ is independent, the distribution of the remaining rows is unchanged by this conditioning, and since every row is identically distributed, the choice of the row $i$ is not important. On input a database missing the $i$th row, we let the simulator guess the missing row as some value $t' \in \mathcal{X}$ and apply the histogram mechanism to the resulting database. Since the distribution of the other rows is independent of the value of the missing row, the distribution of the simulator's output is that of the histogram with the $i$th row fixed to $t'$. The main idea is that, outside a suitably chosen set of "bad" histograms, the probability of any histogram conditioned on $D_i = t$ is within an $e^{\epsilon}$ factor of its probability conditioned on $D_i = t'$, for any value $t'$ which the simulator guesses to be the value of the missing row. Then, we show by a Chernoff bound that the probability of the bad set is exponentially small in $n$, and let this be $\delta$.
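The following numeric sketch (our own, restricted to a two-value universe) illustrates the two steps of this argument: away from an exponentially unlikely set of histograms, the likelihood ratio between fixing a row to one value or the other stays below $e^{\epsilon}$, and the leftover "bad" probability mass, which plays the role of $\delta$, shrinks rapidly with $n$.

```python
import math

def binom_pmf(k, n, q):
    if not 0 <= k <= n:
        return 0.0
    log_p = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
             + k * math.log(q) + (n - k) * math.log(1 - q))
    return math.exp(log_p)

def bad_mass(n, q, eps):
    """Upper bound on the probability, under either conditioning of one row,
    of histograms whose likelihood ratio between fixing that row to A or to B
    falls outside [exp(-eps), exp(eps)] (two bins, i.i.d. Bernoulli(q) votes)."""
    bad = 0.0
    for k in range(n + 1):                   # k = total number of A's
        pr_a = binom_pmf(k - 1, n - 1, q)    # row fixed to A
        pr_b = binom_pmf(k, n - 1, q)        # row fixed to B
        good = pr_a > 0 and pr_b > 0 and abs(math.log(pr_a / pr_b)) <= eps
        if not good:
            bad += max(pr_a, pr_b)
    return bad

for n in (100, 400, 1600):
    print(n, bad_mass(n, q=0.5, eps=0.2))  # decays rapidly as n grows
```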

Like differential privacy, DDP is immune to post-processing (this is formally argued in Lemma 3 in Appendix 0.C); note that post-processing is not a property of all privacy definitions, such as the exact privacy defined in the next section. Thus, the result for histograms directly implies the same parameters for any mechanism that depends only on the histogram of the database—for example, the voting rules we consider, which depend on the histogram of votes. This is formally stated (and proved) in Appendix 0.C (see Corollary 2).

4 Exact Privacy for Comparing Mechanisms

In this section we introduce a natural extension to privacy definitions, which we call exact privacy. We first study this notion with respect to Differential Privacy (DP), by presenting its relationship to the notion of utility. Then, we apply exact privacy to distributional differential privacy (DDP) so that we can study noiseless mechanisms.

Intuitively, a mechanism has exact privacy with parameters and if the mechanism cannot satisfy the privacy definition with strictly better parameters.

Definition 2 (Exact Distributional Differential Privacy (eDDP))

A mechanism $M$ satisfies $(\epsilon, \delta)$-Exact Distributional Differential Privacy (eDDP) if it is $(\epsilon, \delta)$-DDP and there does not exist $\epsilon' < \epsilon$ nor $\delta' < \delta$ such that $M$ is $(\epsilon', \delta)$-DDP or $(\epsilon, \delta')$-DDP.

The above definition can easily be altered to define $(\epsilon, \delta)$-exact differential privacy (eDP), by replacing each instance of DDP in the definition with DP.

In order to better understand the use of exact privacy for comparing mechanisms it is useful to investigate the relationship between utility and exact DP. Informally, Theorem 4.1 says that when the utility is monotonically increasing with $\epsilon$, then any optimal $\epsilon$-DP mechanism is also $\epsilon$-exact DP. We show an example of an exact DP mechanism by applying this theorem (Corollary 3). However, the converse of the theorem is not true—there exists an $\epsilon$-exact DP mechanism that is not optimal for a monotonically increasing utility function (Lemma 5). We refer to Appendix 0.D for definitions and proofs.

Theorem 4.1 (Utility optimality implies Exact Privacy)

Let $\mathcal{M}$ be any set of mechanisms and let $u$ be any utility function that is monotonically increasing in $\epsilon$. Then, an optimal (over $\mathcal{M}$) $\epsilon$-differentially private mechanism is $\epsilon$-exact differentially private.

5 Comparing Voting Mechanisms: The Two-Candidate Case

In this section, we completely characterize exact distributional differential privacy (exact DDP) for two candidates under any biased majority rule w.r.t. any i.i.d. distribution. To this end, we first introduce an alternative definition of DDP which is equivalent to DDP in our setting (independent database rows, no auxiliary information, and $\epsilon = 0$), and then a technique called "trails" to bound the exact DDP under the alternative definition.

Definition 3 (Alternative Definition of DDP)

A mechanism $M$ is $(\epsilon, \delta)$-DDP if for all $i$, the following inequality is satisfied for any $t \in \mathcal{X}$, any $t' \in \mathcal{X}$, and any set $S$ of outputs:
$$\Pr[M(D) \in S \mid D_i = t] \;\leq\; e^{\epsilon} \cdot \Pr[M(D) \in S \mid D_i = t'] + \delta.$$

The next lemma shows that when there is no auxiliary information, and when database rows are independently distributed, Definition 3 above is equivalent to (the simulation-based) DDP ([1], see Definition 8) up to parameter changes. In particular, when $\epsilon = 0$, the $\delta$ of the two definitions differs by at most a constant factor of two, and since our results are asymptotic, our results for eDDP in the alternative definition also hold for the simulation-based one.

Lemma 1

Suppose a mechanism $M$ is $(\epsilon, \delta)$-(simulation-based) DDP; then $M$ is DDP for Definition 3 with parameters that are worse by at most a factor of two. Conversely, if $M$ is $(\epsilon, \delta)$-DDP for Definition 3, then $M$ satisfies $(\epsilon, \delta)$-(simulation-based) DDP.

In light of Definition 3, the $(0, \delta)$-eDDP of a mechanism $M$ can be characterized by

$$\delta \;=\; \max_{i,\; t, t' \in \mathcal{X},\; S} \Big( \Pr[M(D) \in S \mid D_i = t] - \Pr[M(D) \in S \mid D_i = t'] \Big). \qquad (1)$$

However, the RHS of the equation is hard to bound. We address this challenge by breaking the RHS into a summation over multiple sets, each of which contains consecutive histograms and is called a trail. For any histogram $h$ and any pair of values $a \neq b$, we let $h - e_a + e_b$ denote the histogram obtained from $h$ by moving one item from bin $a$ to bin $b$.

Definition 4 (Trail)

Given a pair of data entries $a, b \in \mathcal{X}$ with $a \neq b$, a histogram $h$, and a length $L$, we define the trail $T = \{h,\; h - e_a + e_b,\; \ldots,\; h - L e_a + L e_b\}$, where $(a, b)$ is called the direction of the trail, $h$ is called the entry of the trail, also denoted by $\mathrm{entry}(T)$, and $h - L e_a + L e_b$ is called the exit of the trail, denoted by $\mathrm{exit}(T)$.

Alternatively, a trail can be defined by its direction, entry, and exit.

Example 1

Figure 1 illustrates a trail, indicating its direction, entry, exit, and length.

Figure 1: Example of a trail.

To bound the RHS of (1), we will divide the histograms which map to $S$ into trails, and use the following theorem about trails to simplify the RHS.

Theorem 5.1 (Trails)

Let $T$ be a trail with direction $(a, b)$, and let the database distribution be one where each row is independently distributed. For any $i$,
$$\sum_{h \in T} \Big( \Pr[\mathrm{hist}(D) = h \mid D_i = a] - \Pr[\mathrm{hist}(D) = h \mid D_i = b] \Big) \;=\; \Pr[\mathrm{hist}(D) = \mathrm{exit}(T) \mid D_i = a] \;-\; \Pr[\mathrm{hist}(D) = \mathrm{entry}(T) \mid D_i = b],$$
where $\mathrm{hist}(D)$ denotes the histogram of $D$; that is, all interior terms cancel.

We are now ready to characterize exact DDP for any majority rule for two candidates. Let $\mathcal{X} = \{a, b\}$. For any $p \in (0, 1)$, let $\mathrm{Maj}_p$ denote the biased majority rule that outputs $a$ when at least $p \cdot n$ entries have value $a$, and otherwise outputs $b$. For any $q \in (0, 1)$, let $\mathcal{D}_q^n$ denote the distribution over databases with $n$ entries, where each entry is $a$ with probability $q$ independently.

Theorem 5.2 (Exact DDP for Majority Rules)

For any $p \in (0, 1)$ and any $q \in (0, 1)$, $\mathrm{Maj}_p$ under $\mathcal{D}_q^n$ is $(0, \delta)$-eDDP, where $\delta$ equals a single binomial probability at the decision threshold of $\mathrm{Maj}_p$ (derived in the proof below).

In particular, $\delta = \Theta(1/\sqrt{n})$ if and only if $p = q$; otherwise $\delta$ is exponentially small.

Proof

Let $t = a$ and $t' = b$ (the only choice, up to renaming). Since there is no auxiliary information and $\epsilon = 0$, according to Definition 3, to bound $\delta$ we just need to bound the RHS of (1). We first give an equivalent description of the relevant sets using trails. Let $T_a$ be the trail with direction $(b, a)$, entry $(\lceil pn \rceil, n - \lceil pn \rceil)$, and exit $(n, 0)$, and let $T_b$ be the set of the remaining histograms, which also forms a trail. It follows that the winner is $a$ exactly on the histograms in $T_a$ and $b$ exactly on those in $T_b$; since probabilities sum to one under either conditioning, the differences for $S = T_a$ and $S = T_b$ have the same absolute value, so it suffices to consider $S = T_a$. Therefore, continuing Equation (1), we have

$$\delta \;=\; \Big| \sum_{h \in T_a} \big( \Pr[\mathrm{hist}(D) = h \mid D_i = a] - \Pr[\mathrm{hist}(D) = h \mid D_i = b] \big) \Big|. \qquad (2)$$

By Theorem 5.1, all interior terms in (2) cancel, leaving only the entry and exit terms. The exit term vanishes: when one row is fixed to $b$, the probability of the histogram being $(n, 0)$ is zero. The entry term is the probability that the remaining $n - 1$ i.i.d. votes contain exactly $\lceil pn \rceil - 1$ votes for $a$, i.e., a single binomial probability at the threshold. When $p = q$ this threshold sits at the mode of the binomial distribution and, by Stirling's formula, the term is $\Theta(1/\sqrt{n})$; when $p \neq q$ the threshold is a constant fraction of $n$ away from the mean $qn$, and the term is exponentially small. Exactness holds because, for a mechanism with a binary range, the set $T_a$ attains the maximum in Equation (1).

6 Comparing Voting Mechanisms: Two Candidates or More

In this section we characterize $(0, \delta)$-exact DDP for an arbitrary number of candidates $m$. We start with the general case of the histogram mechanism with any constant number of bins $k$. This immediately implies the same results for the histogram mechanism in the voting setting by letting $k = m!$ (the number of possible linear orders on $m$ candidates).

Theorem 6.1 (Exact DDP of Histogram Mechanism)

For any constant number of bins $k$, let $(p_1, \ldots, p_k)$ be a fixed probability distribution with all entries nonzero, and let each database row be i.i.d. according to it. There exists a $\delta(n) = \Theta(1/\sqrt{n})$ such that for all $n$, the histogram mechanism is $(0, \delta(n))$-eDDP.

Proof (Sketch)

First we present the case $k = 2$.

Lemma 2 (Exact DDP for Histogram, when $k = 2$)

Let $k = 2$ and let each row be i.i.d., taking value $x_1$ with constant probability $p_1 \in (0, 1)$. The histogram mechanism is $(0, \Theta(1/\sqrt{n}))$-eDDP.

Proof

Let $\mathcal{X} = \{x_1, x_2\}$. Without loss of generality, we can let $t = x_1$ and $t' = x_2$ (otherwise, rename them). Then, the maximizing set in Equation (1) is exactly the set of histograms whose probability conditioned on $D_i = x_1$ is at least their probability conditioned on $D_i = x_2$. Since our database distribution has i.i.d. rows, with support on the set of size-$n$ databases, the histogram follows the binomial distribution (with $n$ trials). Below we find that the maximizing set is the set of histograms where the first bin holds at least $T$ items, for a threshold $T$ close to $p_1 n$.

The histograms whose first bin holds at least $T$ items form a trail which starts from $(T, n - T)$ and exits at $(n, 0)$. Thus, by Theorem 5.1, the sum over this set collapses to the entry and exit terms. The exit term is zero (when one row is fixed to $x_2$, the probability of the histogram being $(n, 0)$ is zero), so $\delta$ equals the probability that the remaining $n - 1$ rows contain exactly $T - 1$ values $x_1$. By applying Stirling's formula to this binomial probability at $T \approx p_1 n$, we get $\delta = \Theta(1/\sqrt{n})$.

We can generalize the result to $k > 2$ by using the trail technique for an arbitrary number of bins. Again we assume WLOG that $t = x_1$ and $t' = x_2$. Let $h_j$ denote the number of items in bin $j$, for $j = 1, \ldots, k$. We observe that, when rows are i.i.d., the bins $3, \ldots, k$ are independent of $(h_1, h_2)$ conditioned on the sum $s = h_1 + h_2$. This means that we can compute $\delta$ for a general number of bins $k$ as a sum
$$\delta \;=\; \sum_{s} \Pr[h_1 + h_2 = s] \cdot \delta_2(s),$$
where $\delta_2(s)$ is the $\delta$-value for 2 bins with database size $s$. Using a Chernoff bound we see that $h_1 + h_2$ is concentrated at its mean $(p_1 + p_2) n$. Using the result for $k = 2$, we get $\delta = \Theta(1/\sqrt{n})$.

We now define a set of properties for GSRs to characterize their eDDP.

Definition 5 (Canceling-out, Monotonicity and Local Stability)

A voting rule $r$ satisfies canceling-out if, for any profile $P$, adding one copy of every ranking does not change the winner. More precisely, $r(P) = r(P \cup \mathcal{L}(\mathcal{A}))$.

A voting rule satisfies monotonicity if it is not possible to prevent a candidate from winning by raising its ranking in a vote while keeping the order of other candidates the same.

A profile $P^*$ is locally stable if there exist an alternative $c$, a ranking $V$, and another ranking $V'$ that is obtained from $V$ by raising the position of $c$ without changing the order of the other alternatives, such that for any profile $P$ in the neighborhood of $P^*$ (in terms of the $\ell_1$ norm), we have (1) the winner at $P$ is the same as at $P^*$, and (2) the winner does not change when all $V$ votes in $P$ become $V'$ votes.

To present the result, we first introduce an equivalent definition of GSR that is similar to the ones used in [29, 24].

Definition 6 (The hyperplane definition of GSR)

A GSR over $m$ alternatives is defined by a set of hyperplanes $\mathcal{H} = \{H_1, \ldots, H_K\}$ and a function $g$. For any anonymous profile $P$ (viewed as a histogram of votes), we let $\mathrm{Sign}_{\mathcal{H}}(P) = (\mathrm{sign}(H_1 \cdot P), \ldots, \mathrm{sign}(H_K \cdot P))$, where $\mathrm{sign}(x)$ is the sign ($+$, $-$, or $0$) of a number $x$. We let the winner be $g(\mathrm{Sign}_{\mathcal{H}}(P))$.

That is, to determine the winner, we first use each hyperplane in $\mathcal{H}$ to classify the profile $P$, to decide whether $P$ is on the positive side ($+$), the negative side ($-$), or is contained in the hyperplane ($0$). Then $g$ is used to choose the winner from $\mathrm{Sign}_{\mathcal{H}}(P)$. We refer to this definition as the hyperplane definition.

Claim 1

The hyperplane definition of GSR is equivalent to the definition of GSR in Definition 1.

We are now ready to present our theorem on GSRs that satisfy canceling-out, monotonicity, and local stability. We will characterize exact DDP under the uniform distribution and give an exponential upper bound on DDP under some other distributions. For any pair of vectors $\vec{x}$ and $\vec{y}$, we let $\vec{x} \cdot \vec{y}$ denote their inner product.

Theorem 6.2 ((Exact) DDP for GSR)

Fix $m$. Any mechanism outputting the winner of a GSR rule that satisfies canceling-out, monotonicity, and local stability is $(0, \Theta(1/\sqrt{n}))$-eDDP when each vote is i.i.d. and uniform over all linear orders on the $m$ candidates. Moreover, for any vector $\vec{p}$ of constant probabilities, where component $p_j$ is the probability of the $j$th (type of) vote, the mechanism is $(0, \delta)$-DDP, where $\delta$ is bounded by an explicit expression that is exponentially small in $n$ whenever $\vec{p}$ does not lie on any of the GSR's hyperplanes.

The exponential upper bound in Theorem 6.2 applies to any distribution characterized by a $\vec{p}$ that is not on any hyperplane of the GSR. Notice that the Lebesgue measure of the distributions that are contained in any hyperplane is 0. It follows that the upper bound on $\delta$ is exponentially small for any distribution in a closed set that does not intersect any hyperplane of the GSR, whose Lebesgue measure can be arbitrarily close to 1.

Proposition 1

All positional scoring rules and all Condorcet consistent and monotonic rules satisfy the conditions described in Theorem 6.2.

Corollary 1

Plurality, veto, $k$-approval, Borda, Maximin, Copeland, Bucklin, Ranked Pairs, and Schulze are $(0, \Theta(1/\sqrt{n}))$-eDDP when only the winner is announced, where each vote is i.i.d. and uniform over linear orders on a constant number of candidates.

STV is not one of them. However, empirical results (Section 7) suggest that STV is likely also $(0, \Theta(1/\sqrt{n}))$-eDDP for this distribution.

7 Concrete Estimation of the Privacy Parameters

In this section we compute concrete $(0, \delta)$-exact DDP values for several voting rules in the class of Generalized Scoring Rules (GSR). Recall that all voting rules in GSR are characterized by functions $f$ and $g$. On any profile (histogram of votes) $P$, the vector $f(P)$ is the score, and the output of $g$ on the score is the winner. We refer to [28] for the $f$ and $g$ functions of different rules in GSR. The table below shows concrete $(0, \delta)$-exact distributional differential privacy (exact DDP) values, using the database distribution where each row is i.i.d. and uniform over the set of linear orders on three candidates. We compute concrete $\delta$ values for a range of database sizes $n$. Then, we fit them to $1/\sqrt{cn}$ (our theoretical bound is $\Theta(1/\sqrt{n})$), using linear regression to find the specific $c$ value for each voting rule. We rank the rules from least to most private by the $c$ value for outputting the winner: the larger the $c$, the smaller the $\delta$ values and the more private the rule. The resulting ranking is: Borda, STV, Maximin, Plurality, and 2-approval.

Rule \ Observable Winner Score
Borda
STV
Maximin
Plurality
2-approval
Table 1: Concrete $(0, \delta)$-exact DDP values for outputting the score and the winner of various voting rules. Here each row of the database is independently and uniformly distributed over linear orders on three candidates.

In Figure 2 below, we show the comparison between the Plurality, Borda, and STV voting rules. As expected, outputting just the winner is more private than outputting the score (more easily seen in Figure 3). Moreover, Plurality, a rule which only depends on the top-ranked candidate in each vote, is more private (leaks less information) than a rule like Borda, which uses the entire ranking. We remark that when we use the GSR winner to rank the privacy of voting rules, then STV is more private than Borda, but the opposite is true if we instead use the GSR score. We do not use the GSR score to compare the privacy of voting rules. This is because for rules like STV and Maximin, the score has the same privacy as simply outputting the entire histogram of votes, and 2-approval would have the same privacy as Plurality. Thus, the mechanism outputting the winner allows for a better comparison.

Figure 2: $(0, \delta)$-exact DDP for the GSR winner and score, as a function of the size of the database (the number of votes).
Figure 3: Plurality score vs winner

In Table 2 we show the mean square error between the concrete $\delta$ values for the mechanism outputting the winner and the fitted curve from Table 1.

Rule \ Error Mean Square Error
Borda 0.0566844201243
STV 0.0542992943035
Maximin 0.0377631805983
Plurality 0.0477175838906
2-approval 0.0454223047191
Table 2: Mean square error between the concrete privacy parameters ($\delta$) for outputting the winner and the fitted curve in Table 1.
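A sketch of the fitting procedure described above (the $(n, \delta)$ pairs below are hypothetical placeholders, not the paper's measured values): if $\delta \approx 1/\sqrt{cn}$, then $1/\delta^2$ is linear in $n$, so $c$ can be obtained by least squares through the origin.

```python
import numpy as np

# Hypothetical (n, delta) pairs for one rule; the paper's concrete values are
# not reproduced here.
ns = np.array([20, 40, 60, 80, 100])
deltas = np.array([0.110, 0.079, 0.065, 0.056, 0.050])

# If delta ~ 1/sqrt(c*n), then 1/delta^2 ~ c*n: fit c by least squares
# through the origin on (n, 1/delta^2).
y = 1.0 / deltas**2
c = float(np.dot(ns, y) / np.dot(ns, ns))

fitted = 1.0 / np.sqrt(c * ns)
mse = float(np.mean((deltas - fitted) ** 2))
print(f"fitted c = {c:.3f}, mean square error = {mse:.5f}")
```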

References

  • [1] Raef Bassily, Adam Groce, Jonathan Katz, and Adam Smith. Coupled-worlds privacy: Exploiting adversarial uncertainty in statistical data privacy. FOCS, pages 439–448, 2013.
  • [2] Raef Bassily and Adam Smith. Local, Private, Efficient Protocols for Succinct Histograms. STOC, pages 127–135, 2015.
  • [3] Raghav Bhaskar, Abhishek Bhowmick, Vipul Goyal, Srivatsan Laxman, and Abhradeep Thakurta. Noiseless Database Privacy. Asiacrypt, 7073:215–232, 2011.
  • [4] Eleanor Birrell and Rafael Pass. Approximately strategy-proof voting. IJCAI International Joint Conference on Artificial Intelligence, pages 67–72, 2011.
  • [5] Avrim Blum, Katrina Ligett, and Aaron Roth. A learning theory approach to noninteractive database privacy. STOC, 60(2):1–25, 2008.
  • [6] Ioannis Caragiannis, Ariel D. Procaccia, and Nisarg Shah. Modal Ranking: A Uniquely Robust Voting Rule. In Proceedings of the 28th AAAI Conference on Artificial Intelligence, 2014.
  • [7] Yitao Duan. Privacy without Noise. Cikm, pages 1517–1520, 2009.
  • [8] Cynthia Dwork. Differential Privacy. Proceedings of the International Colloquium on Automata, Languages and Programming, Part II (ICALP), pages 1–12, 2006.
  • [9] Cynthia Dwork, Krishnaram Kenthapadi, Frank McSherry, Ilya Mironov, and Moni Naor. Our Data, Ourselves: Privacy Via Distributed Noise Generation. Proceedings of EUROCRYPT’06, 4004:486–503, 2006.
  • [10] Cynthia Dwork, F. McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. TCC, pages 265–284, 2006.
  • [11] Cynthia Dwork and Aaron Roth. The Algorithmic Foundations of Differential Privacy. Foundations and Trends in Theoretical Computer Science, 9(2013):211–407, 2014.
  • [12] Arpita Ghosh, Tim Roughgarden, and Mukund Sundararajan. Universally utility-maximizing privacy mechanisms. STOC, 41(6):351, 2009.
  • [13] William H Greene. Econometric analysis. Pearson Education India, 2003.
  • [14] Adam Groce. New Notions and Mechanisms for Statistical Privacy. 2014.
  • [15] Rob Hall, Alessandro Rinaldo, and Larry Wasserman. Random Differential Privacy. Journal of Privacy and Confidentiality, 4(2):43–59, 2012.
  • [16] Moritz Hardt and Kunal Talwar. On the Geometry of Differential Privacy. STOC 10, pages 705–714, 2010.
  • [17] M. Hay, L. Elagina, and G. Miklau. Differentially private rank aggregation. Proceedings of the 17th SIAM International Conference on Data Mining, SDM 2017, 2017.
  • [18] Roger A Horn and Charles R Johnson. Matrix analysis. Cambridge university press, 1990.
  • [19] SP Kasiviswanathan and Adam Smith. A note on differential privacy: Defining resistance to arbitrary side information. (x):1–11, 2008.
  • [20] David T. Lee. Efficient, private, and e-strategy proof elicitation of tournament voting rules. IJCAI International Joint Conference on Artificial Intelligence, 2015-Janua:2026–2032, 2015.
  • [21] Samantha Leung and Edward Lui. Bayesian mechanism design with efficiency, privacy, and approximate truthfulness. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 7695 LNCS(Wine):58–71, 2012.
  • [22] Samantha Leung and Edward Lui. Bayesian mechanism design with efficiency, privacy, and approximate truthfulness. International Workshop on Internet and Network Economics, 7695 LNCS:58–71, 2012.
  • [23] Frank McSherry and Kunal Talwar. Mechanism Design via Differential Privacy. FOCS, pages 94–103, 2007.
  • [24] Elchanan Mossel, Ariel D. Procaccia, and Miklos Z. Racz. A Smooth Transition From Powerlessness to Absolute Power. Journal of Artificial Intelligence Research, 48(1):923–951, 2013.
  • [25] Shang Shang, Tiance Wang, Paul Cuff, and Sanjeev Kulkarni. The Application of Differential Privacy for Rank Aggregation: Privacy and Accuracy. Information Fusion (FUSION), 2014, (1):1–7, 2014.
  • [26] James M Varah. A lower bound for the smallest singular value of a matrix. Linear Algebra and its Applications, 11(1):3–5, 1975.
  • [27] Lirong Xia. Generalized Decision Scoring Rules : Statistical , Computational , and Axiomatic Properties. ACM, pages 661–678, 2015.
  • [28] Lirong Xia and Vincent Conitzer. Generalized scoring rules and the frequency of coalitional manipulability. Proceedings of the 9th ACM conference on Electronic commerce - EC ’08, page 109, 2008.
  • [29] Lirong Xia and Vincent Conitzer. Finite Local Consistency Characterizes Generalized Scoring Rules. IJCAI, pages 336–341, 2009.

Appendix 0.A Related Literature

The first works on DP described how one can create mechanisms for answering standard statistical queries on a database (e.g., the number of records with some property, or histograms) in a way that satisfies the DP definition. This ignited a vast and rapidly evolving line of research, ranging from extending the set of mechanisms and achieving different DP guarantees—we refer the reader to [11] for an (already outdated) survey—to a rich literature of relaxations of the definition, e.g., [3, 22, 7, 1], that capture, among others, noiseless versions of privacy, as well as works studying the trade-offs between privacy and utility of various mechanisms [23, 5, 16, 2, 12].

Generalized Scoring Rules (GSRs) are a class of voting rules that includes many commonly studied voting rules, such as Plurality, Borda, Copeland, Maximin, and STV [28]. It has been shown that for any GSR the probability that a group of manipulators is able to change the winner exhibits a phase transition [28, 24]. An axiomatic characterization of GSRs is given in [29]. The most robust GSR with respect to a large class of statistical models has been characterized [6]. Recently GSRs have been extended to an arbitrary decision space, for example to choose a set of winners or rankings over candidates [27].

Differential privacy and applications to voting

Differential privacy [8] was recently used to add privacy to voting. Shang et al. [25] applied Gaussian noise to the histogram of linear orders, while Hay et al. [17] used Laplace and Exponential mechanisms applied to specific voting rules. Lee [20] also developed a method of random selection of votes to achieve differential privacy. [25, 17] also address the accuracy/privacy tradeoff of their DP mechanisms. An aspect of noising the output that was observed in [4, 20] is that it enables approximate strategy-proofness; the idea here is that the added noise dilutes the effect of any individual deviation, thereby making strategies which would only slightly perturb the outcome irrelevant. We remark that if one wishes to achieve DP for a large number of voting rules, well-known DP mechanisms (like adding Laplace noise [10]) can be applied to rules in GSR in a straightforward way, by adding noise to each component of the score vector and outputting the winner based on the noised score vector. However, here we focus on the exact privacy of noiseless mechanisms.

Utility of mechanisms

A number of works have explored the utility of mechanisms that achieve a particular level of differential privacy. In [23], the utility is an arbitrary user-defined function, used in the exponential mechanism. The works of [5, 16, 2] define utility in terms of error, where the closer (by some metric) the output of the mechanism is to the query's true answer, the higher the utility. The definition of [12] in addition allows the user to define a prior distribution on the query output.

Relaxations to Differential Privacy and Noiseless mechanisms

Relaxations of differential privacy have been proposed to allow mechanisms with less or no noise to achieve a DP-style notion of privacy. Kasiviswanathan and Smith [19] formally proved that differential privacy holds in the presence of arbitrary adversarial information, and formulated a Bayesian definition of differential privacy which makes adversarial information explicit. Hall et al. [15] suggested noising only certain values (such as low-count components in histograms) to achieve a relaxed notion of Random Differential Privacy with higher accuracy. Taking advantage of (assumed) inherent randomness in the database, several works have also put forward DP-style definitions which allow for noiseless mechanisms. Duan [7] showed that sums of databases with i.i.d. rows can be output without noise. Bhaskar et al. [3] introduced Noiseless Privacy for database distributions with i.i.d. rows, whose parameters depend on how far the query is from a function which only depends on a subset of the database. Motivated by Bayesian mechanism design, Leung and Lui [21] suggested noiseless sum queries and introduced Bayesian differential privacy for database distributions with independent rows, where the auxiliary information is some number of revealed rows. This idea was generalized and extended by Bassily et al., who introduced distributional differential privacy (DDP) [1, 14], which captures both the randomness of the database and possible adversarial auxiliary information. Informally, given a distribution on both databases and the auxiliary information leaked, we say a mechanism is DDP if its output distribution can be simulated by a simulator that is given the database missing one row. In these works, noiseless mechanisms which have been shown to satisfy DDP are exact sums, truncated histograms, and stable functions where, with large probability, the output is the same given neighboring databases.

Appendix 0.B Preliminaries (Cont’d)

Two databases are neighbors if they differ in exactly one row. ("Differ in one row" can mean two things: 1. (Unbounded differential privacy) Database $D'$ can be obtained from database $D$ by adding or removing one row, or 2. (Bounded differential privacy) $D'$ can be obtained from $D$ by changing the value in exactly one row.) Informally, a mechanism $M$'s differential privacy measures the maximal distance between the distributions $M(D)$ and $M(D')$, for any two neighbors $D$ and $D'$. Intuitively, a row in the database represents one individual's data, so differential privacy describes how much the output is perturbed if one person changes (or removes/adds) his or her information.

Definition 7 (Differential Privacy (DP) [11])

A mechanism $M$ is $(\epsilon, \delta)$-differentially private (DP) if for all neighboring databases $D$, $D'$ and all sets $S$ of outputs,
$$\Pr[M(D) \in S] \;\leq\; e^{\epsilon} \cdot \Pr[M(D') \in S] + \delta.$$

Since the databases $D$ and $D'$ have no randomness, a mechanism must be randomized to satisfy differential privacy for any non-trivial parameters. In contrast, relaxations of DP like Distributional Differential Privacy (DDP) below have allowed for noiseless mechanisms by letting the database $D$ be a random variable. In this definition, the two neighboring databases are replaced by two "neighboring distributions", one with all the database rows ($D$) and one where the $i$th row is removed ($D_{-i}$). Both these distributions are conditioned on the $i$th row being fixed to some value $t$. Intuitively, the goal of the simulator is to take the database missing the $i$th row and emulate what $M$ would output on the database including its $i$th row.

Definition 8 (Distributional Differential Privacy (DDP) [1])

A mechanism $M$ is $(\epsilon, \delta)$-distributionally differentially private (DDP) with respect to a distribution over databases $D$ and auxiliary information $z$ if there is a simulator $\mathrm{Sim}$ such that for all $i$, all $t \in \mathrm{Supp}(D_i)$ (where $\mathrm{Supp}$ denotes the support of a distribution), and all sets $S$,
$$\Pr[M(D) \in S \mid D_i = t, z] \;\leq\; e^{\epsilon} \cdot \Pr[\mathrm{Sim}(D_{-i}) \in S \mid D_i = t, z] + \delta$$
and
$$\Pr[\mathrm{Sim}(D_{-i}) \in S \mid D_i = t, z] \;\leq\; e^{\epsilon} \cdot \Pr[M(D) \in S \mid D_i = t, z] + \delta.$$

As we will see in Lemma 1, in the case without auxiliary information, and when each row in the database is independently distributed, this definition is equivalent to the case where $\mathrm{Sim}$ simply guesses the missing row and applies the mechanism to the resulting database (up to a factor of two in the privacy parameter; see Lemma 1).

0.B.1 Examples of GSR

For example, any positional scoring rule with scoring vector $\vec{s}$ is a GSR with $K = m$, where the corresponding $f$ and $g$ are defined as follows. For any vote $V$ and any candidate $c$, the component of $f(V)$ corresponding to $c$ is the score of candidate $c$ in $V$, that is, $s_{\mathrm{rank}(V, c)}$. $g$ selects the candidate that corresponds to the largest component of the total score vector (and uses a tie-breaking mechanism when necessary).
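A simplified sketch of this decomposition for Borda (our own illustration; here g is applied directly to the summed score vector rather than to its induced weak order, which is enough to convey the idea):

```python
CANDIDATES = ("a", "b", "c")

def f(vote):
    """Generalized score vector of a single vote for a positional scoring rule
    (here Borda): component c holds the score of candidate c in this vote."""
    borda = tuple(range(len(CANDIDATES) - 1, -1, -1))   # (2, 1, 0)
    return {c: borda[vote.index(c)] for c in CANDIDATES}

def g(total_score):
    """Pick the candidate with the largest component (alphabetical tie-break)."""
    return min(CANDIDATES, key=lambda c: (-total_score[c], c))

def gsr_winner(profile):
    total = {c: 0 for c in CANDIDATES}
    for vote in profile:
        for c, s in f(vote).items():
            total[c] += s
    return g(total)

profile = [("a", "b", "c")] * 4 + [("b", "c", "a")] * 3 + [("c", "b", "a")] * 2
print(gsr_winner(profile))  # "b", matching Borda on this profile
```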

STV is also a GSR with exponentially large $K$. For every proper subset $B$ of $\mathcal{A}$ and every candidate $c \notin B$, there is a component in the generalized score vector that contains the number of times that $c$ is ranked first if all alternatives in $B$ are removed. The coordinates are indexed by pairs $(B, c)$, where $B$ is a proper subset of $\mathcal{A}$ and $c \notin B$. We let the $(B, c)$ component of $f(V)$ be $1$ if, after removing $B$ from $V$, $c$ is at the top of the modified $V$; otherwise, it is $0$. Then, $g$ mimics the process of STV to select a winner.

Appendix 0.C Noiseless Privacy in Voting (Cont’d)

Proof (Theorem 3.1, The histogram mechanism satisfies DDP)

The proof is similar to Theorem 8 of [21].

Let . Since database rows are i.i.d., we simplify