How can one learn while ensuring that a spying adversary does not? Enabled by rapid advancements in the Internet, surveillance technologies, and machine learning, companies and governments alike have become increasingly capable of monitoring the behavior of individuals, consumers, and competitors, and of using such data for inference and prediction. Motivated by these developments, the present paper investigates the extent to which it is possible for a learner to protect her knowledge from an adversary who observes, completely or partially, her actions. Furthermore, we are interested in how much additional effort is required for the learner to retain such privacy, and conversely, from the adversary's perspective, what kinds of statistical inference algorithms are most effective against a potentially privacy-conscious subject.
Concretely, we approach these questions by studying the query complexity of Bayesian Private Learning, a framework proposed by xu2017private and tsixuxu2017 to investigate the privacy-efficiency trade-off in active learning. Our main result is a tight lower bound on query complexity, showing that there will be a price to pay for the learner in exchange for improved privacy, whose magnitude scales multiplicatively with respect to the level of privacy desired. In addition, we provide a family of inference algorithms for the adversary, based on proportional sampling, which is provably effective in estimating the target against any learner who does not employ a large number of queries.
Among our chief motivations are applications that involve protecting the privacy of consumers or firms against increasingly powerful surveillance and data analysis technologies; these applications are discussed in Section 3.
1.1 The Model: Bayesian Private Learning
We begin by describing the Bayesian Private Learning model formulated by xu2017private and tsixuxu2017. A learner is trying to accurately identify the location of a random target, $X^*$, up to some constant additive error, $\epsilon/2$, where $X^*$ is uniformly distributed in the unit interval, $[0,1]$. The learner gathers information about $X^*$ by submitting $N$ queries, $q_1, \dots, q_N \in [0,1]$, for some $N \in \mathbb{N}$. For each query, $q_i$, she receives a binary response, $r_i$, indicating the target's location relative to the query:
$$ r_i = \mathbb{I}(X^* \le q_i), \quad i = 1, \dots, N, $$
where $\mathbb{I}(\cdot)$ denotes the indicator function. Viewed from the angle of optimization, the target can be thought of as the minimizer of a convex function with unknown parameters, and the responses correspond to the signs of the function's gradients at the queried points.
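For concreteness, the response model can be simulated in a few lines of Python (an illustrative sketch; the function names are ours, and we assume the response convention $r = \mathbb{I}(X^* \le q)$):

```python
def make_oracle(x_star):
    """Return a response oracle for a hidden target x_star in [0, 1].

    The response to a query q is 1 if the target lies at or below q,
    and 0 otherwise -- equivalently, the sign of the gradient of, e.g.,
    f(x) = (x - x_star)**2 evaluated at q.
    """
    def respond(q):
        return 1 if x_star <= q else 0
    return respond

# Example: a target at 0.3 answers 1 to any query at or above it.
oracle = make_oracle(0.3)
print([oracle(q) for q in (0.1, 0.25, 0.5, 0.9)])  # [0, 0, 1, 1]
```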
The learner submits the queries in a sequential manner, and subsequent queries may depend on previous responses. Once all queries are submitted, the learner produces an estimator, $\hat{X}$, for the target. The learner's behavior is formally captured by a learner strategy, defined as follows.
[Learner Strategy] Fix $N \in \mathbb{N}$. Let $Y$ be a uniform random variable over $[0,1]$, independent from other parts of the system; $Y$ will be referred to as the random seed. A learner strategy, $\phi$, consists of two components:
Querying mechanism: $(g_1, \dots, g_N)$ is a sequence of deterministic functions, where $g_i$ takes as input the past responses and the random seed, $Y$, and generates the next query, i.e.,222Note that the query $q_i$ does not explicitly depend on the previous queries, $q_1, \dots, q_{i-1}$, but only on their responses. This is without loss of generality, since for a given value of $Y$ it is easy to see that $q_1, \dots, q_{i-1}$ can be reconstructed once we know their responses and the functions $g_1, \dots, g_{i-1}$.
$$ q_i = g_i(r_{1:i-1}, Y), \quad i = 1, \dots, N, $$
where $r_{1:i-1}$ denotes the responses to the first $i-1$ queries: $r_{1:i-1} = (r_1, \dots, r_{i-1})$, and $r_{1:0} = \emptyset$.
Estimator: $f$ is a deterministic function that maps all responses, $r_{1:N}$, and the seed, $Y$, to a point in the unit interval that serves as a "guess" for $X^*$:
$$ \hat{X} = f(r_{1:N}, Y). $$
$\hat{X}$ will be referred to as the learner estimator.
We will use $\Phi_N$ to denote the family of learner strategies that submit $N$ queries.
The first objective of the learner is to accurately estimate the target, as is formalized in the following definition. [$\epsilon$-Accuracy] Fix $\epsilon > 0$. A learner strategy, $\phi$, is $\epsilon$-accurate if its estimator approximates the target within an absolute error of $\epsilon/2$ almost surely, i.e.,
$$ \mathbb{P}\left( |\hat{X} - X^*| \le \epsilon/2 \right) = 1, $$
where the probability is measured with respect to the randomness in the target, $X^*$, and the random seed, $Y$.
We now introduce the notion of privacy: in addition to estimating $X^*$, the learner would like to simultaneously conceal it from an eavesdropping adversary. Specifically, there is an adversary who knows the learner's query strategy and observes all of the queries, but not the responses. The adversary then uses the query locations to generate her own adversary estimator for $X^*$, denoted by $\hat{X}_a$, which depends on the queries, $q_1, \dots, q_N$, and any internal, idiosyncratic randomness.
With the adversary’s presence in mind, we define the notion of a private learner strategy.
[$(\delta, L)$-Privacy] Fix $\delta > 0$ and $L \ge 1$. A learner strategy, $\phi$, is $(\delta, L)$-private if, for any adversary estimator, $\hat{X}_a$,
$$ \mathbb{P}\left( |\hat{X}_a - X^*| \le \delta/2 \right) \le 1/L, $$
where the probability is measured with respect to the randomness in the target, $X^*$, and any randomness employed by the learner strategy and the adversary estimator.333This definition of privacy is reminiscent of the error metric used in Probably Approximately Correct (PAC) learning (valiant1984theory), if we view the adversary as trying to learn a (trivial) constant function to within an error of $\delta/2$ with a probability greater than $1/L$.
In particular, if a learner employs a $(\delta, L)$-private strategy, then no adversary estimator can be close to the target within an absolute error of $\delta/2$ with a probability greater than $1/L$. Therefore, for any fixed $\delta$, the parameter $L$ can be interpreted as the level of desired privacy.
We are now ready to define the main quantity of interest in this paper: query complexity. Fix $\epsilon$ and $\delta$ in $(0,1)$, and $L \ge 1$. The query complexity, $N(\epsilon, \delta, L)$, is the least number of queries needed for an $\epsilon$-accurate learner strategy to be $(\delta, L)$-private:
$$ N(\epsilon, \delta, L) = \min \left\{ N : \Phi_N \text{ contains a strategy that is both } \epsilon\text{-accurate and } (\delta, L)\text{-private} \right\}. $$
[Connections to Active Learning] Cast in the terminology of active learning, the problem facing the learner is equivalent to that of learning a threshold function, $h_{X^*}$, with the threshold $X^*$: $h_{X^*}(x) = \mathbb{I}(x \ge X^*)$, $x \in [0,1]$. The response $r_i$ is simply the value of the function evaluated at the query: $r_i = h_{X^*}(q_i)$. A learner strategy is $\epsilon$-accurate if the learner is able to produce a threshold function $h_{\hat{X}}$ such that $|\hat{X} - X^*| \le \epsilon/2$ almost surely.
1.2 Notation and Convention
We will use the asymptotic notation $f \sim g$ to mean that $f$ is on the order of $g$: $f(x)/g(x) \to 1$ as $x$ approaches a certain limit. All logarithmic functions used in this paper will be with base $2$. To avoid excessive use of floors and ceilings, we will assume the values of $1/\epsilon$, $1/\delta$, and $L$ are integral powers of $2$, so that their corresponding logarithmic expressions always take an integral value. When referring to an interval that belongs to a partition of $[0,1]$, we will use the term "sub-interval" to distinguish it from the unit interval itself; the same is true of the term "sub-cube" in higher dimensions. We use $x \wedge y$ and $x \vee y$ as short-hand for $\min\{x, y\}$ and $\max\{x, y\}$, respectively.
2 Main Result
The main objective of the paper is to understand how $N(\epsilon, \delta, L)$ varies as a function of the input parameters, $\epsilon$, $\delta$, and $L$. Our result will focus on the regime of parameters where
$$ \epsilon \le \delta/4 \quad \text{and} \quad \delta < 2/L. $$
Having $\epsilon \le \delta/4$ corresponds to a setting where the learner would like to identify the target with high accuracy, while the adversary is aiming for a coarser estimate; the specific constant $4$ is likely an artifact of our analysis and could potentially be improved to being closer to $1$. Note that the regime where $\epsilon > \delta$ is arguably much less interesting, because it is not natural to expect the adversary, who is not engaged in the querying process, to have a higher accuracy requirement than the learner. The requirement that $\delta < 2/L$ stems from the following argument. If $\delta \ge 2/L$, then the adversary can simply draw a point uniformly at random in $[0,1]$ and be guaranteed that the target will be within $\delta/2$ with a probability of at least $1/L$. Thus, the privacy constraint is automatically violated, and no private learner strategy exists. To obtain a nontrivial problem, we therefore need only to consider the case where $\delta < 2/L$.
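The blind-guessing argument can be checked numerically: when both the target and the guess are uniform on $[0,1]$, the guess lands within $\delta/2$ of the target with probability $\delta - \delta^2/4$, which is at least $1/L$ whenever $\delta \ge 2/L$. A quick Monte Carlo sanity check (the code and constants are ours):

```python
import random

def blind_guess_success(delta, trials=200_000, seed=0):
    """Estimate P(|U - X*| <= delta/2) when both are uniform on [0, 1]."""
    rng = random.Random(seed)
    hits = sum(
        abs(rng.random() - rng.random()) <= delta / 2
        for _ in range(trials)
    )
    return hits / trials

delta = 0.2
est = blind_guess_success(delta)
exact = delta - delta ** 2 / 4  # = 1 - (1 - delta/2)**2
print(round(est, 3), round(exact, 3))
```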
The following theorem is our main result. The upper bound has appeared in xu2017private and tsixuxu2017 and is included for completeness; the lower bound is the contribution of the present paper.
[Query Complexity of Bayesian Private Learning] Fix $\epsilon$ and $\delta$ in $(0,1)$ and $L \in \mathbb{N}$, such that $\epsilon \le \delta/4$ and $\delta < 2/L$. The following is true.
Both the upper and lower bounds in Theorem 2 are constructive, in the sense that we will describe a concrete learner strategy that achieves the upper bound, and an adversary estimator that forces any learner strategy to employ at least as many queries as prescribed by the lower bound.
If we apply Theorem 2 in the regime where $\delta$ and $L$ stay fixed while the learner's error tolerance, $\epsilon$, tends to zero, we obtain the following corollary, in which the upper and lower bounds on query complexity coincide.
Fix $\delta \in (0,1)$ and $L \in \mathbb{N}$, such that $\delta < 2/L$. Then,
$$ \lim_{\epsilon \downarrow 0} \frac{N(\epsilon, \delta, L)}{\log(1/\epsilon)} = L. $$
Note that the special case of $L = 1$ corresponds to when the learner is not privacy-constrained and aims solely to minimize the number of queries. Theorem 2 and Corollary 2 thus demonstrate that there is a hefty price to pay in exchange for privacy, as the query complexity depends multiplicatively on the level of privacy, $L$.
Deriving the query complexity formulae in Theorem 2 is not the only objective of our inquiry. As will become clear, the proof of Theorem 2 will lead us to discover the surprising efficacy of certain, seemingly naive, adversary estimators based on proportional sampling, and through them, we will obtain new insights into the problem's strategic dynamics. Consequently, we will be able to better answer questions such as: what are the key features that make it hard to conceal (learned) information in a sequential learning problem? How should the adversary take advantage of these features when designing inference algorithms? A key concept along this direction, that of action-information proximity, will be further explored in Section 8.1.
[Noisy vs. Noiseless Responses] Our model assumes that the responses, $r_i$, are exact (Eq. (2)), in contrast to some of the noisy response models in the literature (e.g., rivest1980coping; ben2008bayesian; waeber2013bisection), where, for instance, the true responses are flipped with a positive probability. We observe that the query complexity lower bound in Theorem 2 automatically applies to the noisy setting (because a learner's strategy can simulate noisy query responses by artificially perturbing the true responses), while the upper bound does not (because it would require a different set of private learner strategies). We focus on the noiseless model because it is an important baseline that allows us to streamline the analysis and bring to the fore key insights. That being said, generalizing our results to a noisy query model is an interesting and practically relevant direction for future research; it is discussed further in Section 9.
3 Motivating Applications
While our model is stylized, it is designed to capture fundamental privacy vs. complexity tradeoffs inherent in applications of sequential learning. We examine below two such motivating applications:
Example 1: Protecting consumer privacy. Consider a consumer (the learner) who is browsing an online retail site in search of an item with an "ideal" one-dimensional feature value (the target), such as its size, brightness, or color tone. While the consumer does not directly know what the ideal value is, when presented with an item, she is able to articulate an "impression" as to whether the current item's feature value is too large or too small. Guided by these impressions, the consumer browses different items in a sequential manner to eventually narrow down the ideal item. During this process, the online retailer observes all of the items viewed by the consumer, along with their feature values, but does not know the impressions perceived by the consumer. Can the consumer conduct the search in a way that does not reveal to the retailer the ideal item that she ultimately intends to purchase? (One can imagine that such information would put the consumer in a disadvantageous position for various reasons.) How many additional items should the consumer browse, and in what manner, should she wish to achieve successful obfuscation against the retailer's inference algorithms?
Example 2: Privacy-aware price learning (tsixuxu2017). Consider a firm (the learner) that is in the process of launching a new product and would like to price the product in a way that maximizes total profit. The profit is a concave function of the price, and since the parameters of the profit function are unknown, the firm believes the profit-maximizing price, $p^*$, to be uniformly distributed in a certain interval. To identify $p^*$ (corresponding to the target), the firm proceeds to test the sensitivity of the profit function at a sequence of price points using costly surveys or experiments; the test prices correspond to the queries, and the resulting sensitivities to the responses. Because the test prices may be easily obtained or even public, the firm is concerned that a competitor (the adversary) who observes them will be able to predict $p^*$ and price their competing products accordingly, which would be detrimental to the firm. The firm would thus like to minimize the number of test prices, while ensuring that $p^*$ remains unpredictable to its competitors.
Finally, let us be reminded that Bayesian Private Learning is a more general model that contains, as a special case ($L = 1$), the classical problem of sequential learning with binary feedback. The latter has a wide range of applications in statistics (robbins1951stochastic), information theory (horstein1963sequential), and optimization (waeber2013bisection), and as a more general model, Bayesian Private Learning inherits these applications as well.
4 Related Literature
Bayesian Private Learning is a variant of the so-called Private Sequential Learning problem. Both models were formulated in xu2017private and tsixuxu2017, and the main distinction between the two is that the target is drawn randomly in Bayesian Private Learning, while it is chosen in a worst-case fashion (against the adversary) in the original Private Sequential Learning model. xu2017private and tsixuxu2017 establish matching upper and lower bounds on query complexity for Private Sequential Learning. They also propose the Replicated Bisection algorithm as a learner strategy for the Bayesian variant, but without a matching query complexity lower bound. The present paper closes this gap.
The two formulations indeed differ in crucial ways. The worst-case assumption in the original model imposes a more stringent criterion for the adversary: she has to ensure a certain probability of correct estimation for any realization of the target (more risk-averse). In contrast, the adversary in the Bayesian version only has to do so on average (more risk-neutral). This distinction further leads to very different query complexities: the query complexity in the worst-case formulation was shown to be approximately $\log(1/\epsilon) + 2L$ (xu2017private; tsixuxu2017), whereas we show that the query complexity in the Bayesian setting is approximately $L \log(1/\epsilon)$ (Theorem 2). That is, the privacy requirement demands significantly greater effort from the learner in the Bayesian version, with the dependence on the level of privacy, $L$, going from additive to multiplicative. Finally, the proof techniques for establishing lower bounds in the two settings also diverge: the arguments employed by xu2017private and tsixuxu2017 are combinatorial in nature, whereas our proof relies on information-theoretic tools.
At a higher level, our work is connected to a growing body of literature on privacy-preserving mechanisms in computer science (dwork2014algorithmic; lindell2009secure; fanti2015spy), operations research (cummings2016empirical; tsixu2018), and machine learning and statistics (chaudhuri2011differentially; jain2012differentially; wainwright2012privacy). Beyond the more obvious divergence in models and applications, we highlight below some conceptual differences between Bayesian Private Learning and the extant literature.
1. Goal-Oriented vs. Universal: this is the most distinguishing feature of our privacy framework. Our notion of privacy is goal-oriented, defined with respect to the adversary's (in)ability to perform a specific statistical inference task. Notably, this is in contrast to the (much more stringent) universal privacy criteria, among which Differential Privacy (e.g., dwork2014algorithmic) is a well-known paradigm, where the output distribution of a mechanism is supposed to be insensitive to any perturbation in the input, thereby preventing an adversary from performing any inference task. A key consequence of the goal-oriented formulation is that the decision strategy can be tailored to the adversary's inference task, and hence be more efficient; in contrast, universal privacy requires the system designer to defend against a much wider range of possible adversaries, and thus restricts the designer to conservative and inefficient strategies. The difference between goal-oriented and universal privacy is further explored in Appendix B: we show that a $(\delta, L)$-private Replicated Bisection strategy is never differentially private, thus demonstrating that the latter is a strictly more restrictive privacy criterion.
2. Concealing the Unknown vs. the Known: the object to be concealed in our model is initially unknown even to the decision maker herself, and must be actively learned. This is in contrast to privacy models where the decision maker knows in advance the information to be concealed: examples of such information include the rumor source in the anonymous rumor spreading problem of fanti2015spy, or the goal vertex in the Goal Prediction game of tsixu2018.
3. Sequential vs. One-shot: we focus on sequential and dynamic learning, as opposed to one-shot or static problems (e.g., gupta2012iterative).
4. Decision-centric vs. Data-centric: we focus on the behavior and actions of a privacy-aware decision maker, as opposed to the anonymization of a data set (gasarch2004survey; dwork2008differential; gupta2012iterative). The obfuscation measures available in a decision problem can be significantly more limited than those available when privatizing a data set: one is only able to modify which actions are chosen, not how they are observed, whereas a data release algorithm could inject richer noise or fudge data entries. This is an inherent byproduct of the fact that the decision maker often has to use the actions as a means to acquire information or achieve a goal, and sometimes reveals the actions directly to the adversary (e.g., to a search engine or data provider), rendering the injection of arbitrary noise virtually impossible.
On the methodological front, our proof uses Fano's inequality, an essential tool for deriving lower bounds in statistics, information theory, and active learning (cover2012elements). The proportional-sampling estimators that we analyze are reminiscent of the reward-matching policy studied by xu2016reinforcement and, more broadly, the so-called Luce's rule in reinforcement learning (luce1959individual), where actions are chosen with probabilities proportional to the amount of associated rewards; such policies are known to perform well in repeated games (erev1998predicting). However, to the best of our knowledge, both proportional-sampling estimators and the use of Fano's inequality have received relatively little attention in the context of private sequential learning.
Finally, we remark that recently, building on the conference version of the present manuscript (xu2018query), the authors of xu2019optimal were able to use a more intricate analysis and a variant of the proportional-sampling estimator proposed in this paper, which they term the truncated proportional-sampling estimator, to establish a pair of improved query complexity upper and lower bounds that are tight up to an additive term.
5 Proof Overview
The next two sections are devoted to the proof of Theorem 2. We first give an overview of the main ideas. Let us begin by considering the special case of $L = 1$, where the learner is solely interested in finding the target, $X^*$, and not at all concerned with concealing it from the adversary. Here, the problem reduces to the classical setting, where it is well-known that the bisection strategy achieves the optimal query complexity (waeber2013bisection). The bisection strategy recursively queries the mid-point of the interval that the learner knows to contain $X^*$. For instance, the learner would set $q_1 = 1/2$, and if the response is $r_1 = 1$, then she will know that $X^*$ lies in the interval $[0, 1/2]$, and set $q_2$ to $1/4$; otherwise, $q_2$ will be set to $3/4$. This process repeats for $\log(1/\epsilon)$ steps. Because the size of the smallest interval known to contain $X^*$ is halved with each additional query, this yields the query complexity
$$ N(\epsilon, \delta, 1) = \log(1/\epsilon). $$
Unfortunately, once the level of privacy increases above $1$, the bisection strategy is almost never private: it is easy to verify that if the adversary sets $\hat{X}_a$ to be the learner's last query, $q_N$, then the target is sure to be within a distance of at most $\epsilon$. That is, the bisection strategy is not $(\delta, L)$-private for any $L > 1$, whenever $\delta \ge 2\epsilon$. This is hardly surprising: in its quest for efficiency, the bisection strategy submits queries that become progressively closer to the target, thus rendering its location obvious to the adversary.
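The bisection strategy and the trivial last-query attack against it can be sketched together (our own illustration, assuming the response convention $r = \mathbb{I}(X^* \le q)$):

```python
import math

def bisection(x_star, eps):
    """Plain bisection: query midpoints for log2(1/eps) steps.

    Returns the list of queries and the final estimate.
    """
    lo, hi = 0.0, 1.0
    queries = []
    for _ in range(int(math.log2(1 / eps))):
        q = (lo + hi) / 2
        queries.append(q)
        if x_star <= q:   # response r = 1: target lies in the lower half
            hi = q
        else:             # response r = 0: target lies in the upper half
            lo = q
    return queries, (lo + hi) / 2

x_star, eps = 0.3141, 2 ** -10
queries, x_hat = bisection(x_star, eps)
assert abs(x_hat - x_star) <= eps / 2    # epsilon-accurate
# The eavesdropper's attack: simply guess the last query.
assert abs(queries[-1] - x_star) <= eps  # privacy is lost
```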
Building on the bisection strategy, we arrive at a natural compromise: instead of a single bisection search over the entire unit interval, we could create $L$ identical copies of a bisection search across $L$ disjoint sub-intervals of $[0,1]$ that are chosen ahead of time, in a manner that makes it impossible to distinguish which search is truly looking for the target. This is the main idea behind the Replicated Bisection strategy, first proposed and analyzed in tsixuxu2017. We examine this strategy in Section 6, which will yield the query complexity upper bound, on the order of $L \log(1/\epsilon)$.
Proving the lower bound turns out to be more challenging. To show that the query complexity is at least some number $N_0$, we will have to demonstrate that none of the learner strategies using fewer than $N_0$ queries can be simultaneously private and accurate. Because the sufficient statistic for the adversary's estimation is the posterior distribution of the target given the observed queries, a frontal assault on the problem would require that we characterize the resulting target posterior distribution for all strategies, a daunting task given the richness of $\Phi_N$, which grows rapidly as $N$ increases.
Our proof takes an indirect approach. The key idea is that, instead of allowing the adversary to use the entire posterior distribution of the target, we may restrict her to a seemingly much weaker class of proportional-sampling estimators, where the estimate is sampled from a distribution proportional to the empirical density of the queries. A proportional-sampling estimator would, for instance, completely ignore the order in which the queries are submitted, which may contain useful information about the target. We will show that, perhaps surprisingly, the proportional-sampling estimators are so powerful that they leave the learner no option but to use a large number of queries. This forms the core of the lower bound argument.
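A proportional-sampling estimator is simple to state in code: bucket the observed queries into sub-intervals and sample a bucket with probability proportional to its query count (a sketch under our own naming; the version analyzed later reports a sub-interval index, as in Section 7):

```python
import random

def proportional_sampling_estimate(queries, delta, seed=None):
    """Sample a delta-length sub-interval of [0,1] with probability
    proportional to the number of queries it contains; return its midpoint."""
    rng = random.Random(seed)
    n_bins = int(round(1 / delta))
    counts = [0] * n_bins
    for q in queries:
        counts[min(int(q / delta), n_bins - 1)] += 1
    # Choose a bin index with probability proportional to its count.
    idx = rng.choices(range(n_bins), weights=counts, k=1)[0]
    return (idx + 0.5) * delta

# Against bisection-like query patterns, most queries crowd near the
# target, so the sampled sub-interval tends to contain it.
queries = [0.5, 0.25, 0.375, 0.3125, 0.28125, 0.296875]
print(proportional_sampling_estimate(queries, delta=0.25, seed=1))
```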
Studying the proportional-sampling estimators has additional benefits. From a practical perspective, they are constructive estimators that are extremely easy to implement and yet guaranteed to provide good estimation accuracy against any learner strategy that uses few queries. More importantly, their structure avails us of deeper insights into a fundamental dilemma the learner faces: in order to acquire a sufficient amount of information to locate the target accurately, a significant portion of the learner's actions (queries) must be spatially close to the said target. It is precisely this action-information proximity that the proportional-sampling estimator exploits. The concept of action-information proximity will be explored in more depth in Section 8.1.
The proof of the lower bound will be presented in Section 7, consisting of the following steps.
1. Discrete Private Learning (Section 7.1). We formulate a discrete version of the original problem where both the learner and adversary estimate the discrete index associated with a certain sub-interval that contains the target, instead of the continuous target value. The discrete framework is conceptually clearer, and will allow us to deploy information-theoretic tools with greater ease.
2. Localized Query Complexity (Section 7.2). Within the discrete version, we prove a localized query complexity result: conditional on the target being in a coarse sub-interval of $[0,1]$, any accurate learner still needs to submit a large number of queries within the said sub-interval. The main argument hinges on Fano's inequality and a characterization of the conditional entropy of the queries and the target.
3. Proportional-Sampling Estimator (Section 7.3). We use the localized query complexity result of the previous step to prove a query complexity lower bound for the discrete version of Bayesian Private Learning. This is accomplished by analyzing the performance of the family of proportional-sampling estimators, where the adversary reports the index of a sub-interval sampled randomly with probabilities proportional to the number of learner queries each sub-interval contains. We will show that the proportional-sampling estimator succeeds with overwhelming probability whenever an accurate learner strategy submits too few queries, thus obtaining the desired lower bound. In fact, we will prove a more general lower bound, in which the learner can make mistakes with a positive probability.
4. From Discrete to Continuous (Section 7.4). We complete the proof by connecting the discrete version back to the original, continuous problem. Via a reduction argument, we show that the original query complexity is always bounded from below by its discrete counterpart with suitably modified learner error parameters, and the final lower bound is obtained by optimizing over these parameters. The main difficulty in this portion of the proof stems from the fact that an accurate continuous learner estimator is insufficient for generating a discrete estimator that is correct almost surely. We resolve this problem by carefully bounding the learner's probability of estimation error, and applying the discrete query complexity lower bound developed in the previous step, in which the learner is allowed to make mistakes.
6 The Upper Bound
We prove the upper bound of Theorem 2. The bound has appeared in xu2017private and tsixuxu2017, which proposed, without a formal proof, the Replicated Bisection learner strategy that achieves $(\delta, L)$-privacy with $L \log(1/(L\epsilon)) + L - 1$ queries. For completeness, we first review the Replicated Bisection strategy and subsequently give a formal proof of its privacy and accuracy. The main idea behind Replicated Bisection is to create $L$ identical copies of a bisection search in a strictly symmetrical manner, so that the adversary is not able to tell which one of the searches is associated with the target. The strategy takes as initial inputs $\epsilon$ and $L$, and proceeds in two phases:
Phase 1 - Non-adaptive Partitioning. The learner submits $L - 1$ (non-adaptive) queries:
$$ q_i = i/L, \quad i = 1, \dots, L-1. $$
Adjacent queries are separated by a distance of $1/L$, and together they partition the unit interval into $L$ disjoint sub-intervals of length $1/L$ each. We will refer to the interval $[(i-1)/L, i/L)$ as the $i$th sub-interval. Because the queries in this phase are non-adaptive, after the first $L - 1$ queries, while the learner knows which sub-interval contains the target, $X^*$, the adversary has gained no information about $X^*$. We will denote by $I^*$ the sub-interval that contains $X^*$.
Phase 2 - Replicated Bisection. The second phase further consists of a sequence of $\log(1/(L\epsilon))$ rounds. In each round, the learner submits one query in each of the $L$ sub-intervals, and the location of the said query relative to the left end of its sub-interval is the same across all sub-intervals. Crucially, in the $j$th round, the query submitted in $I^*$ corresponds to the $j$th step of a bisection search carried out within $I^*$, the sub-interval that contains the target. The rounds continue until the learner has identified the location of $X^*$ with sufficient accuracy within $I^*$. The queries outside of $I^*$ serve only the purpose of obfuscation by maintaining a strict symmetry. Figure LABEL:fig:pseuRBS contains the pseudo-code for Phase 2.
Denote by $q^*$ the last query that the learner submits in the sub-interval $I^*$ in Phase 2, and by $r^*$ its response. It follows by construction that either $r^* = 1$ and $X^* \in [q^* - \epsilon, q^*]$, or $r^* = 0$ and $X^* \in (q^*, q^* + \epsilon]$. Therefore, the learner can produce the estimator $\hat{X}$ by setting it to the mid-point of either $[q^* - \epsilon, q^*]$ or $(q^*, q^* + \epsilon]$, depending on the value of $r^*$, and this guarantees an additive error of at most $\epsilon/2$. We have thus shown that the Replicated Bisection strategy is $\epsilon$-accurate. The following result shows that it is also private; the proof is given in Appendix A.1.
Fix $\epsilon$ and $\delta$ in $(0,1)$ and $L \in \mathbb{N}$, such that $\epsilon \le \delta/4$ and $\delta < 2/L$. The Replicated Bisection strategy is $(\delta, L)$-private.
Finally, we verify the number of queries used by Replicated Bisection: the first phase employs $L - 1$ queries, and the second phase uses $L$ queries per round, across $\log(1/(L\epsilon))$ rounds, leading to a total of $L \log(1/(L\epsilon)) + L - 1$ queries. This completes the proof of the query complexity upper bound in Theorem 2.
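The two phases can be sketched as follows (our own rendering of the strategy described above; variable names are ours):

```python
import math

def replicated_bisection(x_star, eps, L):
    """Replicated Bisection sketch: returns (queries, estimate).

    Phase 1 partitions [0,1] with L-1 grid queries; Phase 2 runs a real
    bisection inside the sub-interval containing x_star while submitting
    the mirror-image query, at the same offset, in every other sub-interval.
    """
    queries = [i / L for i in range(1, L)]   # Phase 1: L-1 queries
    star = int(x_star * L)                   # index of the true sub-interval
    lo, hi = star / L, (star + 1) / L
    rounds = int(math.log2(1 / (L * eps)))
    for _ in range(rounds):                  # Phase 2
        offset = (lo + hi) / 2 - star / L    # position within each sub-interval
        for j in range(L):                   # one query per sub-interval
            queries.append(j / L + offset)
        q = star / L + offset                # the one query that matters
        if x_star <= q:
            hi = q
        else:
            lo = q
    return queries, (lo + hi) / 2

x_star, eps, L = 0.6180, 2 ** -8, 4
queries, x_hat = replicated_bisection(x_star, eps, L)
assert abs(x_hat - x_star) <= eps / 2
assert len(queries) == (L - 1) + L * int(math.log2(1 / (L * eps)))
```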
7 The Lower Bound
7.1 Discrete Bayesian Private Learning
We begin by formulating a discrete version of the original problem, in which the goal of both the learner and the adversary is to recover a discrete index associated with the target, as opposed to generating a continuous estimate. We first create two nested partitions of the unit interval consisting of equal-length sub-intervals, where one partition is coarser than the other. The objective of the learner is to recover the index of the sub-interval containing $X^*$ in the finer partition, whereas that of the adversary is to recover the target's index with respect to the coarser partition (an easier task!). We consider this discrete formulation because it allows for a simpler analysis using Fano's inequality, setting the stage for the localized query complexity lower bound in the next section.
Formally, fix $\Delta \in (0,1)$ such that $1/\Delta$ is an integer. Define $I^\Delta_i$ to be the sub-interval
$$ I^\Delta_i = [(i-1)\Delta, \; i\Delta), \quad i = 1, \dots, 1/\Delta. $$
In particular, the set $\{I^\Delta_i : i = 1, \dots, 1/\Delta\}$ is a partition of $[0,1]$ into $1/\Delta$ sub-intervals of length $\Delta$ each. We will refer to it as the $\Delta$-uniform partition. Define
$$ I^*(\Delta) = i, \quad \text{if } X^* \in I^\Delta_i. $$
That is, $I^*(\Delta)$ denotes the index of the sub-interval containing $X^*$ in the $\Delta$-uniform partition. A visualization of the index is given in Figure 2.
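The index map is straightforward to compute (half-open sub-intervals assumed, matching the partition above; the 1-based indexing is ours):

```python
def partition_index(x, delta):
    """1-based index of the sub-interval [(i-1)*delta, i*delta) containing x."""
    n = int(round(1 / delta))
    return min(int(x / delta) + 1, n)  # clamp x = 1.0 into the last sub-interval

print(partition_index(0.3141, 0.25))     # 2
print(partition_index(0.3141, 2 ** -4))  # 6
```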
We now formulate analogous, and slightly more general, definitions of accuracy and privacy for the discrete problem. We will use the super-script $d$ to distinguish them from their counterparts in the original, continuous formulation. Just like the learner strategy in Definition 1.1, a discrete learner strategy, $\phi^d$, is allowed to submit queries at any point along $[0,1]$, and has access to the random seed, $Y$. The only difference is that, instead of generating a continuous estimate, a discrete learner strategy produces an estimator for the index of the sub-interval containing the target in the $\epsilon$-uniform partition, $I^*(\epsilon)$. [$(\epsilon, \eta)$-accuracy - Discrete Version] Fix $\epsilon \in (0,1)$ and $\eta \in [0,1)$. A discrete learner strategy, $\phi^d$, is $(\epsilon, \eta)$-accurate if it produces an estimator, $\hat{I}$, such that
$$ \mathbb{P}\left( \hat{I} \ne I^*(\epsilon) \right) \le \eta. $$
Importantly, in contrast to its continuous counterpart in Definition 1.1, where the estimator must satisfy the error criterion with probability one, the discrete learner strategy is allowed to make mistakes with probability up to $\eta$.
The role of the adversary is similarly defined in the discrete formulation: upon observing all queries, the adversary generates an estimator, $\hat{I}_a$, for the index of the sub-interval containing $X^*$ in the (coarser) $\delta$-uniform partition, $I^*(\delta)$. The notion of $(\delta, L)$-privacy for a discrete learner strategy is defined in terms of the adversary's (in)ability to estimate the index $I^*(\delta)$.
[$(\delta, L)$-privacy - Discrete Version] Fix $\delta \in (0,1)$ and $L \ge 1$. A discrete learner strategy, $\phi^d$, is $(\delta, L)$-private if, under any adversary estimator $\hat{I}_a$, we have that
$$ \mathbb{P}\left( \hat{I}_a = I^*(\delta) \right) \le 1/L. $$
We will denote by $\Phi^d_N$ the family of discrete learner strategies that employ at most $N$ queries. We are now ready to define the query complexity of the discrete formulation, as follows:
$$ N^d(\epsilon, \eta, \delta, L) = \min \left\{ N : \Phi^d_N \text{ contains a strategy that is both } (\epsilon, \eta)\text{-accurate and } (\delta, L)\text{-private} \right\}. $$
A main result of this subsection is the following lower bound on $N^d(\epsilon, \eta, \delta, L)$, which we will convert into one for the original problem in Section 7.4.
[Query Complexity Lower Bound for Discrete Learner Strategies] Fix $\epsilon$, $\eta$, and $\delta$ in $(0,1)$ and $L \in \mathbb{N}$, such that $\epsilon \le \delta/4$ and $\delta < 2/L$. We have that
where $h(\cdot)$ is the Shannon entropy of a Bernoulli random variable with mean $p$: $h(p) = -p \log p - (1-p) \log(1-p)$ for $p \in (0,1)$, and $h(0) = h(1) = 0$.
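For reference, the binary entropy term can be computed as follows (base-2 logarithm, consistent with our convention):

```python
import math

def binary_entropy(p):
    """Shannon entropy h(p) of a Bernoulli(p) random variable, in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(binary_entropy(0.5))  # 1.0
```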
7.2 Localized Query Complexity Lower Bound
We prove Proposition 2 in the next two subsections. The first step, accomplished in the present subsection, is to use Fano’s inequality to establish a query complexity lower bound localized to a sub-interval: conditional on the target belonging to a sub-interval in the -partition, any discrete learner strategy must devote a non-trivial number of queries to that sub-interval if it wishes to be reasonably accurate. Since all learner strategies considered in the next two subsections will be for the discrete problem, we will refer to them simply as learner strategies when there is no ambiguity.
Fix , and a learner strategy . Because the strategy will submit at most queries, without loss of generality, we may assume that if the learner wishes to terminate the process after the first queries, then she will simply set to for all , and the responses for those queries will be trivially equal to almost surely. Denote by the set of queries that lie within the sub-interval :
and by its cardinality. Denote by the set of responses for those queries in . Define to be the learner’s (conditional) probability of error:
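As a concrete sketch, the set of queries landing in a given sub-interval of the uniform partition can be computed as follows; the function name, and the representation of queries as a plain list of points in the unit interval, are our own illustrative choices:

```python
def queries_in_subinterval(queries, i, delta):
    """Return the queries that fall in the i-th sub-interval
    [i*delta, (i+1)*delta) of the delta-uniform partition of [0, 1).

    The localized lower bound concerns the cardinality of this set.
    """
    return [q for q in queries if i * delta <= q < (i + 1) * delta]
```

The cardinality of the returned list plays the role of the localized query count in the lemma below.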
We have the following lemma. The proof is based on Fano’s inequality.
[Localized Query Complexity] Fix , and , , and an -accurate discrete learner strategy. We have that
for all , and , and
Denote by the event:
Because the random seed, , is uniformly distributed over , Eq. (22) follows directly from the learner strategy’s being -accurate:
We now show Eq. (21). Fix , and . We begin by making the simple observation that, conditional on , the subset of queries together with their responses is sufficient for generating the learner’s estimator, , because under this conditioning, any query that lies outside the sub-interval provides no additional information about the location of beyond what is already known. Furthermore, since the random seed is fixed to , the th query, , is a deterministic function of the first responses. We conclude that the set of responses alone is sufficient for generating .
For an event, , we will denote by the conditional entropy under the probability law :
where and are the alphabets for random variables and , respectively. Similarly, define
to be the vector representation of:
and for all . The conditional entropy of given satisfies:
where the inequality follows from the fact that, conditional on there being responses in , we know that only the first bits of can be random, and hence the entropy of cannot exceed , which is the entropy of a length- vector where each entry is an independent Bernoulli random variable with mean . We now invoke the following lemma, due to Fano (cf. Section 2.1 of cover2012elements). [Fano’s Inequality] Let and be two random variables, where takes values in a finite set, . Let
be a discrete random variable taking values in , such that , where is a deterministic function, and a random variable independent of both and . Let . We have that
where is the conditional entropy of given .
We apply Fano’s inequality with the substitutions: , , and . Eq. (29) yields
where we have used the fact that, conditional on the event , the index can take at most
values. By the chain rule of conditional entropy, we have that
This proves Lemma 7.2.
7.3 Proportional-Sampling Estimator
We now use the local complexity result in Lemma 7.2 to complete the proof of Proposition 2. The lemma states that if the target were to lie in a given sub-interval in the -uniform partition, then an accurate learner strategy would have to place at least queries within that sub-interval on average. A naive approach to using the lemma would let the adversary’s estimator select a sub-interval among those in that contain approximately queries, with the hope that, in order to ensure privacy, the learner would have to induce many such sub-intervals. However, for this estimator to be a credible threat, the local complexity lower bound would need to hold with near certainty, instead of merely on average. Unfortunately, it may not be possible to strengthen the lower bound in this way in general. For instance, it is easy to see that the learner could “guess” the target’s location with only two queries, at the expense of possibly increasing the average number of queries overall. This suggests that forcing the number of queries to concentrate at, or above, under an arbitrary learner strategy is likely very difficult, if not impossible.
Nevertheless, we are still able to take advantage of the average local complexity in Lemma 7.2, but with a more robust adversary estimator. We will focus on a family of proportional-sampling estimators for the adversary. Despite their simplicity, these estimators prove to be sufficiently powerful against any learner strategy that uses a “small” number of queries. A proportional-sampling estimator, , is generated according to the distribution:
That is, an index is sampled with a probability proportional to the number of queries that fall within the corresponding sub-interval in the -uniform partition.
We next bound the probability of correct estimation when the adversary employs a proportional-sampling estimator: for all , we have that
where step follows from the fact that , and hence , and step from Lemma 7.2. Recall that the learner strategy is -private, and the random seed has a probability density of in and zero everywhere else. Since Eq. (34) holds for all and , we can integrate and obtain the adversary’s overall probability of correct estimation:
Step follows from Jensen’s inequality and the concavity of the Bernoulli entropy function, .
7.4 From Discrete to Continuous Strategies
We now connect Proposition 2 to the original continuous estimation problem. The next proposition is the main result of this subsection. The core of the proof is a reduction that constructs a -accurate and -private discrete learner strategy from an -accurate and -private continuous learner strategy. Fix and in and , such that and . Fix . (To avoid the use of rounding in our notation, we will assume that is an integer multiple of .) We have that
Fix and a continuous learner strategy, , such that is both -accurate and -private. Let denote the estimator of . It suffices to show that there exists a function , such that by using the same queries as , and setting , we obtain a -accurate and -private discrete learner strategy. Specifically, let be the discrete learner strategy that submits the same queries as , and produces the estimator
That is, reports the index of the sub-interval in the -uniform partition that contains the continuous estimator, .
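The discretization map just described can be sketched as follows (the function name is our own; the clamping of boundary values is an implementation detail not spelled out in the text):

```python
def discretize_estimator(x_hat: float, delta: float) -> int:
    """Map a continuous estimate x_hat in [0, 1] to the index of the
    sub-interval of the delta-uniform partition that contains it.

    Values at the right endpoint are clamped into the last sub-interval.
    """
    num_subintervals = int(round(1.0 / delta))
    return min(int(x_hat // delta), num_subintervals - 1)
```

This is the only modification the reduction makes: the queries themselves are left untouched, so the query count, and hence the adversary's view, is unchanged.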
We first show that the induced discrete learner strategy is -accurate. The intuition is that if the target is sufficiently far away from the endpoints of the sub-interval in the -uniform partition to which it belongs, then both and will belong to the same sub-interval, and we will have . To make this precise, denote by the set of endpoints of the sub-intervals in the -uniform partition. Let be the set of all points in whose distance to is greater than :
It is not difficult to show that the Lebesgue measure of satisfies where is the length of the intersection of with each of the sub-intervals in a -partition. Since is -accurate, we know that must be no more than away from , and hence whenever , which implies
This shows that is -accurate.
We next show that is also -private. For the sake of contradiction, suppose otherwise. Then, there exists an estimator for the adversary, , such that
We now use to construct a “good” adversary estimator for the continuous version: let be the midpoint of the sub-interval , where is the th sub-interval in the -uniform partition. If , then contains , and since the length of is , we must have , and from Eq. (42), this implies
We therefore conclude that if an estimator satisfying Eq. (42) did exist, then the original continuous learner strategy, , could not have been -private, which leads to a contradiction. We have thus shown that is -accurate and -private. Because uses the same sequence of queries as , we conclude that . This proves Proposition 7.4.
7.4.1 Completing the Proof of the Lower Bound
where the last step follows from Proposition 2 by substituting with and with . Letting , the above inequality can be rearranged to become
where step follows from the assumption that , and the fact that for all . Consider the choice:
To verify still belongs to the range , note that the assumption that ensures