Exploring Differential Obliviousness

05/03/2019 · Amos Beimel et al.

In a recent paper, Chan et al. [SODA '19] proposed a relaxation of the notion of (full) memory obliviousness, which was introduced by Goldreich and Ostrovsky [J. ACM '96] and extensively researched by cryptographers. The new notion, differential obliviousness, requires that any two neighboring inputs exhibit similar memory access patterns, where the similarity requirement is that of differential privacy. Chan et al. demonstrated that differential obliviousness allows achieving improved efficiency for several algorithmic tasks, including sorting, merging of sorted lists, and range query data structures. In this work, we continue the exploration and mapping of differential obliviousness, focusing on algorithms that do not necessarily examine all their input. This choice is motivated by the fact that the existence of logarithmic overhead ORAM protocols implies that differential obliviousness can yield at most a logarithmic improvement in efficiency for computations that need to examine all their input. In particular, we explore property testing, where we show that differential obliviousness yields an almost linear improvement in overhead in the dense graph model, and at most quadratic improvement in the bounded degree model. We also explore tasks where a non-oblivious algorithm would need to explore different portions of the input, where the latter would depend on the input itself, and where we show that such a behavior can be maintained under differential obliviousness, but not under full obliviousness. Our examples suggest that there would be benefits in further exploring which classes of computational tasks are amenable to differential obliviousness.


1 Introduction

A program’s memory access pattern can leak significant information about the private information used by the program even if the memory content is encrypted. Such leakage can turn into a data protection problem in various settings. In particular, when data is outsourced to be stored on an external server, it has been shown that access pattern leakage can be exploited in practical attacks and lead to the compromise of the underlying data [19, 4, 27, 20, 22]. Such leakage can also be exploited when a program is executed in a secure enclave environment but needs to access memory that is external to the enclave.

Memory access pattern leakage can be avoided by employing a strategy that makes the sequence of memory accesses (computationally or statistically) independent of the content being processed. Beginning with the seminal work of Goldreich and Ostrovsky, it is well known how to transform any program running on a random access memory (RAM) machine to one with an oblivious memory access pattern while retaining efficiency by using an Oblivious RAM protocol (ORAM) [10, 28, 13]. Current state-of-the-art ORAM protocols achieve logarithmic overhead [2], matching a recent lower bound by Larsen and Nielsen [23], and protocols with constant overhead exist when large blocks are retrieved [6, 26]. To further reduce the overhead, oblivious memory access pattern protocols have been devised for specific tasks, including graph algorithms [3, 17], geometric algorithms [8] and sorting [16, 24]. The latter is motivated by sorting being a fundamental and well-researched computational task as well as by its ubiquity in data processing.

1.1 Differential Obliviousness

Full obliviousness is a rather strong requirement: any two possible inputs (of the same size) should exhibit identical or indistinguishable sequences of memory accesses. Achieving full obliviousness via a generic use of ORAM protocols requires a setup phase with running time (at least) linear in the memory size and then a logarithmic overhead per memory access.

A recent work by Chan, Chung, Maggs, and Shi [5] put forward a relaxation of the obliviousness requirement where indistinguishability is replaced with differential privacy. Intuitively, this means that any two possible neighboring inputs should exhibit memory access patterns that are similar enough to satisfy differential privacy, but may still be too dissimilar to be “cryptographically” indistinguishable. It is not a priori clear whether differential obliviousness can be achieved without resorting to full obliviousness. However, the recent work of Chan et al. showed that differential obliviousness does allow achieving improved efficiency for several algorithmic tasks, including sorting (over very small domains), merging of sorted lists, and range query data structures.

Furthermore, even the use of ORAM protocols may be insufficient for preventing leakage in cases where the number of memory probes is input dependent. In fact, Kellaris et al. [20] show that such leakage can result in a complete reconstruction in the case of retrieving elements specified by range queries, as the number of records returned depends on the contents of the data structure. Full obliviousness would require the sequence of memory accesses to be padded to a maximal one to avoid such leakage, a solution that would have a dire effect on the efficiency of many algorithms. Differential obliviousness may in some cases allow achieving meaningful privacy while maintaining efficiency. Examples of such protocols include the combination of ORAM with differentially private sanitization by Kellaris et al. [21] and the more recent work of Chan et al. [5] on range query data structures, which avoids using ORAM.

1.2 This Work: Exploring Differential Obliviousness

Noting that the existence of logarithmic overhead ORAM protocols implies that differential obliviousness can yield at most a logarithmic improvement in efficiency for computations that need to examine all their input, we explore tasks where this is not the case. In particular, we focus on property testing and on tasks where the number of memory accesses can depend on the input.

Property testing.

As evidence that differential obliviousness can provide a significant improvement over full obliviousness, we show in Section 3 that property testers in the dense graph model, where the input is in the adjacency matrix representation [12], can be made differentially oblivious. This result captures a large set of testable graph properties [12, 1] including, e.g., graph bipartiteness and having a large clique. Testers in this class probe a uniformly random subgraph and hence are fully oblivious without any modification, as their access pattern does not depend on the input graph. However, this is not the case if the tester reveals its output to the adversary, as this allows learning information about the specific probed subgraph. A fully oblivious tester would need to access a linear-sized subgraph, whereas we show that a differentially oblivious tester only needs to apply the original tester a constant number of times (we omit dependencies on privacy and accuracy parameters from this introductory description).

We also consider property testing in the bounded degree model, where the input is in the incidence lists representation [14]. In this model we provide negative results, demonstrating that adaptive testers cannot, generally, be made differentially oblivious without a significant loss in efficiency. In particular, in Section 4 we consider differentially oblivious property testers for connectivity in graphs of degree at most two. For non-oblivious testers, it is known that a constant number of probes suffices when the tester is adaptive [14] (in an adaptive tester, at least one choice of a node to probe depends on information gathered from incidence lists of previously probed nodes). It is also known that any non-adaptive tester for this task requires probing $\Omega(\sqrt{n})$ nodes [29]. We show that this lower bound extends to differentially oblivious testers, i.e., any differentially oblivious tester for connectivity in graphs of maximal degree two requires $\Omega(\sqrt{n})$ probes. While this still improves over full obliviousness, the gap between full and differential obliviousness is in this case diminished.

Locating an Object Satisfying a Property.

Here, our goal is to check whether a given data set of objects includes an object that satisfies a specified property. Without obliviousness requirements, a natural approach is to probe elements in a random order until an element satisfying the property is found or all elements were probed. If a $p$ fraction of the elements satisfy the property, then the expected number of probes is $O(1/p)$. This algorithm is in fact instance optimal when the data set is randomly permuted (our treatment of instance optimality is rather informal; the concept was originally presented in [9]).

A fully oblivious algorithm would require $\Omega(n)$ probes on any dataset, even when $p = 1$. In contrast, we demonstrate in Section 5 that with differential obliviousness instance optimality can, to a large extent, be preserved. Our differentially oblivious algorithm always returns a correct answer and, with high probability, makes $O(1/p)$ probes (up to factors depending on the privacy and accuracy parameters).

Prefix Sum.

Our last example considers a sorted dataset (possibly, the result of an earlier phase in the computation). Our goal is to compute the sum of all records in the (sorted) dataset that are less than or equal to a given value $a$ (see Section 6 for the definition of privacy in this setting).

Without obliviousness requirements, one can find the greatest record less than or equal to the value $a$, say, using binary search, and then compute the prefix sum by a quick scan through all records appearing before that record. This algorithm is in fact nearly instance optimal, as it can be shown that any algorithm which returns the correct exact answer with non-negligible probability must probe all entries that are at most $a$. However, fully oblivious algorithms would have to probe the entire dataset.

In Section 6 we give our nearly instance optimal differentially oblivious prefix sum algorithm. As the probes of a binary search would leak information about the memory content, we introduce a differentially oblivious “simulation” of the binary search. Our differentially oblivious binary search runs in time polylogarithmic in $n$ (for constant privacy and accuracy parameters).

We also address the scenario where there are multiple prefix sum queries to the same database. If the number of queries is bounded by some integer $k$, then each differentially oblivious binary search runs with a correspondingly larger overhead (as we need to run the search algorithm with a smaller privacy parameter $\varepsilon/k$). Using ORAM, one can answer such queries with $O(n \log n)$ pre-processing time and $O(\log^2 n)$ time per query. Combining our algorithm and ORAM, we can amortize the pre-processing time over queries: without any pre-processing, the first queries are answered by the differentially oblivious search while their answers are gradually inserted into the ORAM, and once the entire dataset has been processed, any further query is answered directly by the ORAM.

1.3 Background Work

The work by Chan, Chung, Maggs, and Shi [5] mentioned above is the most relevant to this article. Goldreich, Goldwasser, and Ron [12] initiated the research on graph property testing. Goldreich’s book on property testing [11] gives sufficient background for our discussion. Dwork, McSherry, Nissim, and Smith [7] introduced differential privacy. For more details on ORAM and a list of relevant papers, the reader can consult [2].

2 Definitions

2.1 Model of Computation

We consider the standard Random Access Memory (RAM) model of computation that consists of a CPU and a memory. The CPU executes a program and is allowed to perform two types of memory operations: read a value from a specified physical address, and write a value to a specified physical address. We assume that the CPU has a private cache where it can store $O(1)$ values (and/or a polylogarithmic number of bits). As an example, in the setting of a client storing its data on the cloud, the client plays the role of the CPU and the cloud server plays the role of the memory.

We assume that a program’s sequence of read and write operations may be visible to an adversary. We will call this sequence the program’s access pattern. We will further assume that the memory content is encrypted so that no other information is leaked about the content read from and stored in memory. The program’s access pattern may depend on the program’s input, and may hence leak information about it.
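To make this concrete, here is a minimal Python sketch (ours, not from the paper) of a memory whose access pattern is exposed: the adversary sees the sequence of (operation, address) pairs but not the (encrypted) contents.

```python
class TracedMemory:
    """A RAM whose access pattern, i.e., the ordered sequence of
    (operation, address) pairs, is recorded. This models exactly what
    the adversary observes; values are assumed encrypted, so only
    addresses leak."""

    def __init__(self, size):
        self.cells = [0] * size
        self.trace = []  # the access pattern: visible to the adversary

    def read(self, addr):
        self.trace.append(("read", addr))
        return self.cells[addr]

    def write(self, addr, value):
        self.trace.append(("write", addr))
        self.cells[addr] = value


mem = TracedMemory(8)
mem.write(5, 1)
_ = mem.read(5)
print(mem.trace)  # [('write', 5), ('read', 5)], input-dependent, hence leaky
```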

2.2 Oblivious Algorithms

There are various works focused on oblivious algorithms [8, 15, 25] and Oblivious RAM (ORAM) constructions [13]. These works adopt “full obliviousness” as a privacy notion. Suppose that $A$ is an algorithm that takes two inputs, a security parameter $\lambda$ and an input dataset denoted $x$. We denote by $\mathsf{Accesses}^A(\lambda, x)$ the ordered sequence of memory accesses the algorithm $A$ makes on the inputs $\lambda$ and $x$.

Definition 2.1 (Fully Oblivious Algorithms).

Let $\delta(\cdot)$ be a function in a security parameter $\lambda$. We say that algorithm $A$ is $\delta$-statistically oblivious iff for all inputs $x$ and $x'$ of equal length, and for all $\lambda$, it holds that $\mathsf{Accesses}^A(\lambda, x) \overset{\delta(\lambda)}{\equiv} \mathsf{Accesses}^A(\lambda, x')$, where $\overset{\delta(\lambda)}{\equiv}$ denotes that the two distributions have at most $\delta(\lambda)$ statistical distance. We say that $A$ is perfectly oblivious when $\delta = 0$.

2.3 Differentially Oblivious Algorithms

Suppose that $A$ is a (possibly stateful) algorithm that takes three inputs: a security parameter $\lambda$, an input dataset denoted by $x$, and a value $q$. We slightly change the definition of differentially oblivious algorithms given in [5]:

(The figure defines, for $b \in \{0, 1\}$, an experiment $\mathsf{Exp}^b_{A,\mathcal{A}}(\lambda)$: the adversary $\mathcal{A}$ outputs a pair of neighboring datasets $x_0, x_1$ and then adaptively issues values $q$; an oracle runs $A$ on $x_b$ and each issued $q$, the adversary observes the resulting access patterns, and the experiment returns the adversary’s output bit.)

Figure 1: An experiment for defining differential obliviousness.
Definition 2.2 (Neighbor-respecting).

We say that two input datasets $x$ and $x'$ are neighboring iff they are of the same length and differ in exactly one entry. We say that $\mathcal{A}$ is a neighbor-respecting adversary iff for every $\lambda$ and every execution of the experiment, $\mathcal{A}$ outputs neighboring datasets $x_0, x_1$ with probability 1.

Definition 2.3.

Let $\varepsilon, \delta \geq 0$ be privacy parameters. Let $A$ be a (possibly stateful) algorithm described as above. To an adversary $\mathcal{A}$ we associate the experiment in Figure 1, for every $\lambda \in \mathbb{N}$. We say that $A$ is $(\varepsilon, \delta)$-adaptively differentially oblivious if for any (unbounded) stateful neighbor-respecting adversary $\mathcal{A}$ we have
$\Pr[\mathsf{Exp}^0_{A,\mathcal{A}}(\lambda) = 1] \leq e^{\varepsilon} \cdot \Pr[\mathsf{Exp}^1_{A,\mathcal{A}}(\lambda) = 1] + \delta.$

In Figure 1, $\mathsf{Accesses}^A(\lambda, x, q)$ denotes the ordered sequence of memory accesses the algorithm $A$ makes on the inputs $\lambda$, $x$, and $q$.

Remark 2.4.

The notion of adaptivity here is different from the one defined in [5]. We require that the dataset remains the same throughout the experiment, whereas in [5] the adaptive adversary can add or remove entries of the dataset.

As with differential privacy, we usually think about $\varepsilon$ as a small constant and require that $\delta$ be negligible in the size of the dataset [7]. Observe that if an algorithm is $\delta$-statistically oblivious then it is also $(0, \delta)$-differentially oblivious.

The following simple lemma will be useful to analyze our algorithms. The proof of the lemma appears in Appendix A.

Lemma 2.5.

Let $A$ be an $(\varepsilon, \delta)$-differentially oblivious algorithm and $B$ be an algorithm such that for every dataset $x$ the statistical distance between $\mathsf{Accesses}^A(\lambda, x)$ and $\mathsf{Accesses}^B(\lambda, x)$ is at most $\delta'$. Then, $B$ is an $(\varepsilon, \delta + (1 + e^{\varepsilon})\delta')$-differentially oblivious algorithm.

3 Differentially Oblivious Property Testing of Dense Graph Properties

In this section, we present a differentially oblivious property tester for dense graph properties in the adjacency matrix representation model. A property tester is an algorithm that decides whether a given object has a predetermined property or is far from any object having this property by examining a small random sample of its input. The correctness requirement of property testers ignores objects that neither have the property nor are far from having the property. However, the privacy requirement is “worst case” and should hold for any two neighboring graphs. For the definition of privacy, we say that two graphs $G$ and $G'$ on $n$ nodes are neighbors if one can obtain $G'$ from $G$ by changing the edges incident to exactly one node of $G$.

Property testing of graph properties in the adjacency matrix representation was introduced in [12]. A graph $G = (V, E)$ is represented by a predicate $g: V \times V \to \{0, 1\}$ such that $g(u, v) = 1$ if and only if $u$ and $v$ are adjacent in $G$. The notion of distance between graphs is defined to be the number of differing matrix entries over $n^2$. This model is most suitable for dense graphs, where the number of edges is $\Theta(n^2)$. We define a property of graphs to be a subset of the graphs. We write $G \in P$ to indicate that graph $G$ has the property $P$. For example, we can define the bipartiteness property, where $P$ is the set of all bipartite graphs (recall that an undirected graph is bipartite, or 2-colorable, if its vertices can be partitioned into two parts, $V_1$ and $V_2$, such that each part is an independent set). We say that an $n$-vertex graph $G$ is $\rho$-far from $P$ if for every $n$-vertex graph $H \in P$ it holds that the symmetric difference between the edge sets of $G$ and $H$ is greater than $\rho n^2$. We define property testing in this model as follows:

Definition 3.1 ([12]).

A $(\rho, \beta)$-tester for a graph property $P$ is a probabilistic algorithm that, on inputs $n$, $\rho$, $\beta$, and an adjacency matrix of an $n$-vertex graph $G$:

  1. Outputs 1 with probability at least $1 - \beta$, if $G \in P$.

  2. Outputs 0 with probability at least $1 - \beta$, if $G$ is $\rho$-far from $P$.

We say a tester has one-sided error if it accepts every graph in $P$ with probability 1. We say a tester is non-adaptive if it determines all its queries to the adjacency matrix based only on $n$, $\rho$, $\beta$, and its randomness; otherwise, we say it is adaptive.

Example 3.2 ([12]).

Consider the following -tester for bipartiteness: Choose a random subset of size

with uniform distribution and output 1 iff the graph induced by

is bipartite. Clearly, if is bipartite, then the tester will always return 1. Goldreich et al. [12] proved that if is -far from a bipartite graph, then the probability that the algorithm returns 1 is at most .
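A minimal Python sketch of this tester (our illustration; `sample_size` stands in for the expression from [12], which depends on $\rho$ and $\beta$):

```python
import random


def induced_is_bipartite(adj, nodes):
    """2-color the subgraph induced by `nodes`, probing only the
    adjacency-matrix entries within the sample."""
    color = {}
    for s in nodes:
        if s in color:
            continue
        color[s] = 0
        stack = [s]
        while stack:
            u = stack.pop()
            for v in nodes:
                if adj[u][v]:
                    if v not in color:
                        color[v] = 1 - color[u]
                        stack.append(v)
                    elif color[v] == color[u]:
                        return False  # odd cycle found in the sample
    return True


def bipartiteness_tester(adj, sample_size):
    """One-sided tester: probes a uniformly random vertex subset,
    so its access pattern is independent of the input graph."""
    n = len(adj)
    sample = random.sample(range(n), min(sample_size, n))
    return 1 if induced_is_bipartite(adj, sample) else 0
```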

Recall that in graph property testing, the tester chooses a random subset of the graph’s vertices with uniform distribution to test the property $P$. Given the access pattern of the tester, an adversary will learn nothing, since the pattern is uniformly random. Thus, the access pattern by itself does not reveal any information about the input graph. However, we assume that the adversary also learns the tester’s output and can hence learn some information about the input graph based on that output. To protect this information, we run the tester a number of times that is constant in the graph size and output 1 iff the number of times it outputs 1 exceeds a (randomly chosen) threshold.

Let $T$ be a $(\rho, \beta)$-tester for a graph property $P$. We write $s$ for the number of nodes that $T$ samples. Note that $s$ is constant in the graph size and a function of $\rho$ and $\beta$. For simplicity, we only consider property testers with one-sided error. In Figure 2, we describe a tester that outputs 1 with probability 1 if $G \in P$ and outputs 0 with probability at least $1 - \beta'$ if $G$ is $\rho$-far from $P$, where $\beta'$ is a function of $\beta$, $\varepsilon$, and $\delta$.

Algorithm
Input: a graph $G$.
  1. Let $k$ be the number of repetitions (a function of $\varepsilon$, $\delta$, and $\beta$) and $count \leftarrow 0$.
  2. For $i \leftarrow 1$ to $k$ do:
     (a) Run tester $T$ on $G$; if $T$ outputs 1 then $count \leftarrow count + 1$.
     (b) Let $S_i$ be the subset of vertices chosen by tester $T$.
     (c) Update graph $G$ to be the induced sub-graph on $V(G) \setminus S_i$.
  3. Let $\hat{T} \leftarrow$ a noisy threshold below $k$, set using Laplace noise with parameter $1/\varepsilon$.
  4. If $count \geq \min(\hat{T}, k)$ then output 1, else output 0.

Figure 2: A Differentially Oblivious Property Tester for Dense Graphs.
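The following Python sketch illustrates the repetition-plus-noisy-threshold structure of the algorithm in Figure 2. The interface of `base_tester` (returning its verdict together with the set of vertices it probed), the placement of the threshold just below $k$, and the noise scale are our assumptions; the paper sets these as functions of $\varepsilon$, $\delta$, and $\beta$.

```python
import random


def laplace(scale):
    # Lap(scale) sampled as a difference of two exponentials.
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)


def do_dense_tester(adj, base_tester, k, eps):
    """Run a one-sided tester k times, each time on vertices not probed
    before, and accept iff the number of accepting runs clears a
    Laplace-noised threshold capped at k."""
    remaining = set(range(len(adj)))
    count = 0
    for _ in range(k):
        verdict, probed = base_tester(adj, remaining)  # assumed interface
        count += verdict
        remaining -= probed  # Step 2c: discard already-probed vertices
    t_hat = (k - 1) + laplace(1 / eps)  # noisy threshold (assumed center)
    return 1 if count >= min(t_hat, k) else 0
```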
Theorem 3.3.

Let $\varepsilon > 0$ and $\delta \in (0, 1)$. The algorithm of Figure 2 is an $(\varepsilon, \delta)$-differentially oblivious algorithm that outputs 1 with probability 1 if $G \in P$, and outputs 0 with probability at least $1 - \beta'$ if $G$ is $\rho$-far from $P$ (for $\beta'$ as above).

The proof of Theorem 3.3 appears in Section A.2.

4 Lower Bounds on Testing Connectivity in the Incidence Lists Model

We now consider differentially oblivious testing of connectivity in the incidence lists model [14]. In this model, a graph has degree bounded by $d$ and is represented as a function $g: V \times [d] \to V \cup \{0\}$, where $g(v, i)$ is the $i$-th neighbor of $v$ (if no such neighbor exists, then $g(v, i) = 0$). In this model, the relative distance between graphs is normalized by $dn$ – the maximal number of edges in the graph. Formally, for two graphs $G_1, G_2$ with $n$ vertices,
$\mathrm{dist}(G_1, G_2) = \frac{|\{(v, i) : g_1(v, i) \neq g_2(v, i)\}|}{dn}.$

A $(\rho, \beta)$-tester in the incidence lists model is defined as in Definition 3.1, where a property $P$ is a set of graphs whose maximal degree is $d$ and the distance to a property is defined with respect to $\mathrm{dist}$.

Goldreich and Ron [14] showed how to test if a graph is connected in the incidence lists model in time independent of the size of the graph. Raskhodnikova and Smith [29] showed that a tester for connectivity (or any non-trivial property) with run-time $o(\sqrt{n})$ has to be adaptive, that is, the nodes that the algorithm probes should depend on the neighbors of nodes the algorithm has already probed (e.g., the algorithm probes some node $u$, discovers that $v$ is a neighbor of $u$, and probes $v$). We strengthen their results by showing that any tester for connectivity in graphs of maximal degree two with run-time $o(\sqrt{n})$ cannot be a differentially oblivious algorithm. We stress that adaptivity alone is not a reason for inefficiency with differential obliviousness. In fact, there exist differentially oblivious algorithms that are adaptive (e.g., our algorithm in Section 6).

Theorem 4.1.

Let $\varepsilon, \delta$ be such that $\delta = o(1/n)$. Every $(\varepsilon, \delta)$-differentially oblivious tester for connectivity in graphs with maximal degree 2 runs in time $\Omega(\sqrt{n})$ (for constant $\varepsilon$).

Proof.

Let Tester be a tester for connectivity in graphs of degree at most 2. We somewhat relax the definition of probes and assume that once the tester probes a node, it sees all edges adjacent to this node. We prove that if Tester probes fewer than $c\sqrt{n}$ nodes (for some constant $c$), then it is not $(\varepsilon, \delta)$-differentially oblivious.

Assume that $n$ is divisible by 3. Let $G_1$ be a cycle of length $n$ and $G_2$ consist of $n/3$ disjoint triangles. Clearly, $G_1$ is connected and $G_2$ is far from a connected graph. For a permutation $\sigma$, define $\sigma(G)$ to be the graph in which $\{\sigma(u), \sigma(v)\}$ is an edge iff $\{u, v\}$ is an edge of $G$, and let $\sigma(G_1)$ be a random graph isomorphic to $G_1$, that is, $\sigma(G_1)$ for a permutation $\sigma$ chosen with uniform distribution (when we permute a graph, we also permute its incidence list representation, i.e., if $\{u, v\}$ is an edge, then with probability half $\sigma(u)$ will be the first neighbor of $\sigma(v)$ and with probability half it will be the second). On the random graph $\sigma(G_1)$ Tester has to say “yes” with probability at least 3/4 and on the random graph $\sigma(G_2)$ Tester has to say “no” with probability at least 3/4.
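For concreteness, a short Python sketch (ours) of the two distributions used in the proof, in the incidence-list representation:

```python
import random


def cycle_graph(n):
    """G1: a single n-cycle, g[v] = (first neighbor, second neighbor)."""
    return {v: ((v - 1) % n, (v + 1) % n) for v in range(n)}


def triangles_graph(n):
    """G2: n/3 disjoint triangles (n assumed divisible by 3)."""
    g = {}
    for t in range(0, n, 3):
        a, b, c = t, t + 1, t + 2
        g[a], g[b], g[c] = (b, c), (a, c), (a, b)
    return g


def random_isomorphic_copy(g):
    """Relabel vertices by a uniform permutation and shuffle each
    incidence list, matching the proof's random graphs."""
    n = len(g)
    perm = list(range(n))
    random.shuffle(perm)
    h = {}
    for v, nbrs in g.items():
        image = [perm[u] for u in nbrs]
        random.shuffle(image)
        h[perm[v]] = tuple(image)
    return h
```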

Observation 4.2.

If Tester does not probe two distinct nodes whose distance is at most two, then Tester sees a collection of disjoint paths of length two and cannot know whether the graph is $\sigma(G_1)$ or $\sigma(G_2)$.

Claim 4.3.

Given the random graph $\sigma(G_1)$, the tester has to probe two distinct nodes whose distance is at most 2 with probability at least $1/2$.

Proof.

Consider Tester’s answer when it sees a collection of disjoint paths of length two. Assume first that the tester returns “No” with probability at least half in this case, and let $q$ be the probability that Tester probes two distinct nodes whose distance is at most two on the random graph $\sigma(G_1)$. The probability that Tester returns “Yes” on $\sigma(G_1)$ is then at most $q + (1 - q)/2$. Thus, $3/4 \leq q + (1 - q)/2$, i.e., $q \geq 1/2$.

If the tester returns “Yes” with probability at least half, then, by a symmetric argument, with probability at least $1/2$ Tester has to probe two nodes whose distance is at most two on the random graph $\sigma(G_2)$. For a permutation $\sigma$, if the distance between two nodes in $\sigma(G_2)$ is at most 2, then the distance between these two nodes in $\sigma(G_1)$ is at most 2. Thus, by Observation 4.2, the claim holds in this case as well. ∎

Denote the nodes of $G_1$ by $v_1, \ldots, v_n$ and define a distribution on pairs of graphs $(G, H)$, obtained by the following process:

  • Choose a permutation $\sigma$ with uniform distribution and let $G = \sigma(G_1)$.

  • Denote $u_i = \sigma(v_i)$ and $e_i = \{u_i, u_{i+1}\}$ for $i \in [n]$.

  • Choose with uniform distribution two indices $i_1, i_2$ such that the distance between $u_{i_1}$ and $u_{i_2}$ in $G$ is at least 3 (where the addition of indices is done modulo $n$).

  • Let $H = (G \setminus \{e_{i_1}, e_{i_2}\}) \cup \{\{u_{i_1}, u_{i_2}\}, \{u_{i_1+1}, u_{i_2+1}\}\}$, where edges are understood as unordered pairs.

The graphs $G$ and $H$ are described in Figure 3. Note that $H$ is also a random graph isomorphic to $G_1$; thus, given $H$, one cannot know which pair of non-adjacent nodes was used to create it.

Figure 3: The graphs $G$ and $H$.

Observe that $G$ and $H$ differ on 4 nodes. Since Tester is $(\varepsilon, \delta)$-differentially oblivious, for every algorithm $B$,
$\Pr[B(\mathsf{Accesses}(\mathrm{Tester}, H)) = 1] \leq e^{4\varepsilon} \cdot \Pr[B(\mathsf{Accesses}(\mathrm{Tester}, G)) = 1] + 4e^{4\varepsilon}\delta. \quad (4)$

Consider the following algorithm :

If $u_{i_1}$ and at least one of $u_{i_2}, u_{i_2+1}$ are probed by Tester prior to its seeing any other pair of nodes of distance at most 2 in $G$ or $H$, then return 1; otherwise return 0.

Claim 4.4.

Let $t \leq \sqrt{n}$. Suppose that Tester probes at most $t$ nodes. Pick at random with uniform distribution two nodes $u, v$ in $\sigma(G_1)$ with distance at least 3 in $\sigma(G_1)$. The probability that Tester probes both $u$ and $v$ prior to seeing any two nodes of distance at most 2 in $\sigma(G_1)$ is $O(t^2/n^2)$ (where the probability is over the random choice of $\sigma(G_1)$, $u$, and $v$, and the randomness of Tester).

Proof.

The node $u$ is a uniformly distributed node in $\sigma(G_1)$ and $v$ is a uniformly distributed node of distance at least 3 from $u$; thus there are at least $n - 5$ options for $v$. Given a collection of paths of length at most two in $\sigma(G_1)$, all options are equally likely.

Let $x_1, \ldots, x_t$ be the nodes probed in some execution of Tester. Fix some pair of indices $j < k$. The probability that $\{x_j, x_k\} = \{u, v\}$ is at most $O(1/n^2)$. Thus, the probability that $u$ and $v$ are probed is at most $\binom{t}{2} \cdot O(1/n^2) = O(t^2/n^2)$. ∎

Claim 4.5.

Assume that Tester makes at most $t$ probes. The probability that $B(\mathsf{Accesses}(\mathrm{Tester}, H)) = 1$ is at least $\Omega(1/n)$.

Proof.

By Claim 4.3, the probability that Tester probes at least one pair of nodes with distance at most 2 is at least $1/2$. Given that this event occurs, the probability that the random index $i_1$ (chosen with uniform distribution) is involved in the first such pair (i.e., the first pair is either $\{u_{i_1}, u_{i_2}\}$ or $\{u_{i_1}, u_{i_2+1}\}$) is $\Omega(1/n)$.

Clearly, given these events, no two nodes with distance at most 2 in $H$ were probed prior to probing the pair containing $u_{i_1}$. Furthermore, there are $O(1)$ pairs of nodes that are of distance at most 2 in $G$ and of distance greater than 2 in $H$. By Claim 4.4, the probability that such a pair is probed prior to Tester probing a pair of distance at most 2 in $H$ is $O(t^2/n^2)$. ∎

Claim 4.6.

Suppose that Tester probes at most $t$ nodes. The probability that $B(\mathsf{Accesses}(\mathrm{Tester}, G)) = 1$ is $O(t^2/n^2)$.

Proof.

The node $u_{i_1}$ is a uniformly distributed node in $G$. Furthermore, the node $u_{i_2}$ is a uniformly distributed node of distance at least 3 from $u_{i_1}$ in $G$; thus by Claim 4.4, the probability that Tester probes both $u_{i_1}$ and $u_{i_2}$ prior to seeing any pair of distance at most 2 in $G$ is $O(t^2/n^2)$. This probability can only decrease if we require that Tester probes both $u_{i_1}$ and $u_{i_2}$ prior to seeing any pair of distance at most 2 in $G$ and in $H$.

By the same arguments, the probability that Tester probes both $u_{i_1}$ and $u_{i_2+1}$ prior to seeing any pair of distance at most 2 in $G$ and in $H$ is $O(t^2/n^2)$. ∎

To conclude the proof of Theorem 4.1, we note that by (4) and Claims 4.5 and 4.6,
$\Omega(1/n) \leq e^{4\varepsilon} \cdot O(t^2/n^2) + 4e^{4\varepsilon}\delta.$
Since $\delta = o(1/n)$, it follows that $t = \Omega(\sqrt{n})$. ∎

5 Differentially Oblivious Algorithm for Locating an Object

Given a dataset of objects, our goal is to locate an object that satisfies a property $P$, if one exists. E.g., given a dataset consisting of employee records, find an employee with income in a given range, if at least one such employee exists in the dataset.

Absent privacy requirements, a simple approach is to probe elements of the dataset in a random order until an element satisfying the property is found or all elements were probed. If a $p$ fraction of the dataset entries satisfy $P$, then the expected number of elements sampled by the non-private algorithm is $O(1/p)$. However, a perfectly oblivious algorithm would require $\Omega(n)$ probes on any dataset, in particular on a dataset where all elements satisfy $P$, where non-privately one probe would suffice. To see why, let $P(x) = 1$ if $x = 1$ and $P(x) = 0$ otherwise, and let the dataset include exactly one 1-entry, in a uniformly random location. Observe that in expectation it requires $\Omega(n)$ memory probes to locate the 1-entry. Perfect obliviousness would hence imply $\Omega(n)$ probes on any input.

We give a nearly instance optimal differentially oblivious algorithm that always returns a correct answer. Except with small probability, the algorithm halts after $O(1/p)$ steps (up to factors depending on the privacy and accuracy parameters).

Detailed Algorithm.

Given the access pattern of the non-private algorithm, an adversary can learn that the last probed element satisfies $P$. To hide this information, we change the stopping condition to having probed at least a (randomly chosen) threshold of elements satisfying $P$. If after $n$ probes the number of elements satisfying $P$ is below the threshold, the entire dataset is scanned. Our algorithm is described in Figure 4. On a given array $x$, the algorithm outputs 1 iff there exists an element in $x$ satisfying the property $P$.

Algorithm
Input: a dataset $x = (x_1, \ldots, x_n)$.
  1. Let $count \leftarrow 0$ (the noise parameters are functions of $\varepsilon$ and $\delta$).
  2. For $i \leftarrow 1$ to $n$ do:
     (a) Choose $j \in [n]$ with uniform distribution.
     (b) If $P(x_j) = 1$ then $count \leftarrow count + 1$.
     (c) If $i$ is an integral power of 2 then:
         i. Let $\hat{T}_i \leftarrow$ a fresh noisy threshold (its value grows with $i$).
         ii. If $count \geq \max(\hat{T}_i, 1)$ then output 1.
  3. Scan the entire dataset and if there is an element satisfying $P$ then output 1, else output 0.

Figure 4: A Differentially Oblivious Locate Algorithm.

We remark that the algorithm of Figure 4 uses a mechanism similar to the sparse vector mechanism of [18]. However, in our case, instead of using a single noisy threshold across all steps, the algorithm generates a fresh noisy threshold $\hat{T}_i$ at each checkpoint. The value of $\hat{T}_i$ ensures that, with high probability, the algorithm does not halt before it has seen sufficiently many elements satisfying $P$. The proof of Theorem 5.1 is given in Section A.3.
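A Python sketch of the Figure 4 idea; the centers of the noisy thresholds below are illustrative assumptions, not the paper’s exact setting.

```python
import math
import random


def laplace(scale):
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)


def do_locate(data, pred, eps):
    """Probe uniformly at random; at each power-of-two step, compare the
    number of hits seen so far against a fresh Laplace-noised threshold,
    falling back to a full scan so the output is always correct."""
    n = len(data)
    count = 0
    for i in range(1, n + 1):
        if pred(data[random.randrange(n)]):
            count += 1
        if i & (i - 1) == 0:  # i is an integral power of 2
            center = (math.log2(i) + 1) * 2 / eps   # assumed threshold center
            t_hat = max(center + laplace(1 / eps), 1)  # truncated at 1, so an
            if count >= t_hat:                         # early 1 implies a hit
                return 1
    # Step 3: deterministic fallback keeps the answer always correct.
    return 1 if any(pred(x) for x in data) else 0
```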

Theorem 5.1.

The algorithm of Figure 4 is an $(\varepsilon, \delta)$-differentially oblivious algorithm that outputs 1 iff there exists an element in the array that satisfies property $P$. If a $p$ fraction of the elements satisfy $P$, then with high probability it halts in time $O(1/p)$, up to factors depending on the privacy and accuracy parameters.

6 Differentially Oblivious Prefix Sum

Suppose that there is a dataset consisting of sorted sensitive user records, and one would like to compute the sum of all records in the (sorted) dataset that are less than or equal to a value $a$ in a way that respects individual users’ privacy. We call this task differentially oblivious prefix sum. For the definition of privacy, we say that two datasets of size $n$ are neighbors if they agree on $n - 1$ elements (although, as sorted arrays, they can disagree on many indices). For example, the sorted datasets $(1, 2, 3, 4)$ and $(2, 3, 4, 5)$ agree on three of their four elements, so they are neighbors and should have similar access patterns, even though they disagree on every index.

Without privacy, one can find the greatest record less than or equal to the value $a$, and then compute the prefix sum by a quick scan through all records appearing before that record. Any perfectly oblivious algorithm must read the entire dataset (since it is possible that all elements are smaller than $a$). Here, we give a differentially oblivious prefix sum algorithm that for many instances is much faster than any perfectly oblivious algorithm.

Intuition.

Absent privacy requirements, using binary search, one can find the greatest element less than or equal to $a$, and then compute the prefix sum by a quick scan through all records that appear before that record. However, the binary search access pattern allows the adversary to gain sensitive information about the input. Our main idea is to approximately simulate the binary search and obfuscate the memory accesses to obtain differential obliviousness. In order to do that, we first divide the input array into chunks (where the number of chunks is polynomial in $\log n$ and in the privacy and accuracy parameters). Then, we find the chunk that contains the greatest element less than or equal to $a$ by comparing the first element (hence, the smallest element) of each chunk to $a$. Let $i$ be the index of this chunk. Next, we compute a noisy interval that contains chunk $i$ using the Laplace distribution. We iteratively repeat this process on the noisy interval, where in each step we eliminate more than a quarter of the elements of the interval. We continue until the size of the remaining interval is below a (polylogarithmic) bound. Next, we scan all elements in the remaining interval and find the index of the greatest element smaller than or equal to $a$. Let $\ell$ be the index of this element; we compute the prefix sum by scanning the array up to index $\ell$.
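A Python sketch of this noisy search (Algorithm Search, Figure 5 below). The chunk count, the noise scale, the cap on the noisy margin, and the stopping size are our illustrative assumptions; the paper instead sets them from $\varepsilon$, $\delta$, and $\beta$ and truncates the noise (cf. Remark 6.1).

```python
import random


def laplace(scale):
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)


def do_search(data, x, eps, stop_size=64, chunks=8):
    """On a sorted list, repeatedly locate the chunk whose smallest
    element is still <= x, then keep a noisy interval around it so the
    probed boundaries do not pinpoint the boundary's position."""
    lo, hi = 0, len(data)
    while hi - lo > stop_size:
        step = max(1, (hi - lo) // chunks)
        idx = list(range(lo, hi, step))  # first index of each chunk
        i = max((k for k, b in enumerate(idx) if data[b] <= x), default=-1)
        if i == -1:
            break  # only possible in the first round; fall back to a scan
        margin = step * (1 + int(abs(laplace(1.0 / eps))))
        margin = min(margin, (hi - lo) // 4)  # cap: drop >1/4 per round
        lo = max(lo, idx[i] - margin)
        hi = min(hi, idx[i] + step + margin)
    # Final scan of the small remaining interval.
    best = lo - 1
    for j in range(max(lo, 0), hi):
        if data[j] <= x:
            best = j
    return best  # largest index with data[best] <= x, or -1 if none
```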

The Search Algorithm.

We present a search algorithm in Figure 5; on inputs $x$ and $a$, this algorithm finds the largest index $\ell$ such that $x_\ell \leq a$. To compute the prefix sum, we compute a noisy upper bound on $\ell$, scan that many of the first elements of the dataset, and sum only the first $\ell$. We show in Theorem 6.2 that our search algorithm is $(\varepsilon, \delta)$-differentially oblivious.

Algorithm Search
Input: a dataset $x_1 \leq \cdots \leq x_n$ and a value $a$.
  1. Let $lo \leftarrow 1$ and $hi \leftarrow n$ (the number of chunks, the noise parameters, and the stopping size are functions of $\varepsilon$, $\delta$, and $\beta$).
  2. While the interval $[lo, hi]$ is larger than the stopping size do:
     (a) Partition the interval into chunks and let $b_1, b_2, \ldots$ be the first index of each chunk.
     (b) Probe the chunk boundaries and find the maximal index $i$ such that $x_{b_i} \leq a$; if there is no such chunk then set the interval to be empty.
     (c) Update $[lo, hi]$ to a noisy interval around chunk $i$, computed using Laplace noise.
  3. Scan the dataset between $lo$ and $hi$ and return the maximal index $\ell$ such that $x_\ell \leq a$; if there is no such element then return 0.

Figure 5: A Differentially Oblivious Search Algorithm.
Remark 6.1.

We prove that algorithm Search is an $(\varepsilon, \delta)$-differentially oblivious algorithm that returns a correct index with probability at least $1 - \beta$. We could change it to an $(\varepsilon, \delta)$-differentially oblivious algorithm that never errs. This is done by truncating the Laplace noise at an appropriate bound.

Theorem 6.2.

Let $\varepsilon, \delta, \beta > 0$. Algorithm Search is an $(\varepsilon, \delta)$-differentially oblivious algorithm that, for any input array of size $n$ and any value $a$, returns a correct index with probability at least $1 - \beta$. The running time of Algorithm Search is polylogarithmic in $n$ (for constant $\varepsilon$, $\delta$, and $\beta$).

Theorem 6.2 is proved in Section A.4.

6.1 Dealing with Multiple Queries

We extend our prefix sum algorithm to answer multiple queries. We can answer a bounded number of queries by running the differentially oblivious prefix sum algorithm multiple times. That is, when we want an $(\varepsilon, \delta)$-differentially oblivious algorithm correctly answering $k$ queries with probability at least $1 - \beta$, we execute algorithm Search $k$ times with privacy parameter $\varepsilon/k$ and error probability $\beta/k$ (each time also computing the appropriate prefix sum). Thus, the running time of each search grows by a factor polynomial in $k$ (excluding the scan time for computing the sum).

On the other hand, we can use an ORAM to answer an unbounded number of queries. That is, in a pre-processing stage we store the records and, for each record, the sum of all records up to this record. Thereafter, answering each query will require one binary search. Using the ORAM of [2], the pre-processing will take time $O(n \log n)$ and answering each query will take time $O(\log^2 n)$. Thus, the ORAM algorithm is more efficient when the number of queries is large.

We use ORAM along with our differentially oblivious prefix sum algorithm to answer an unbounded number of queries while preserving privacy, combining the advantages of both of the previous algorithms; a sketch follows.
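A Python sketch of the MultiSearch idea (described in Figure 6 below), reusing the `do_search` sketch above. The doubling growth rule and the per-execution privacy budget schedule are our assumptions, and a plain list stands in for the ORAM, whose accesses are oblivious in the real protocol.

```python
import bisect
import math


class MultiSearch:
    """Answer prefix-sum queries from a cache of already-processed
    records when possible; otherwise run the differentially oblivious
    search on the uncached suffix and grow the cache."""

    def __init__(self, data, eps):
        self.data = data   # sorted records
        self.eps = eps
        self.cached = 0    # number of leading records cached ("in the ORAM")
        self.prefix = [0]  # prefix[i] = sum of data[:i] for the cached part

    def query(self, a):
        if self.cached < len(self.data) and (
                self.cached == 0 or self.data[self.cached - 1] <= a):
            # Not covered by the cache: run the DO search on the suffix.
            eps_i = self.eps / max(1, math.ceil(math.log2(len(self.data) + 1)))
            found = do_search(self.data[self.cached:], a, eps_i)  # -1 if none
            # Doubling growth (assumed) bounds the executions by O(log n).
            grow_to = min(len(self.data),
                          max(self.cached + found + 1, 2 * self.cached + 1))
            for j in range(self.cached, grow_to):
                self.prefix.append(self.prefix[-1] + self.data[j])
            self.cached = grow_to
        # Answer with one binary search inside the cached prefix.
        i = bisect.bisect_right(self.data, a, 0, self.cached)
        return self.prefix[i]
```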

Algorithm MultiSearch
Input: a dataset $x_1 \leq \cdots \leq x_n$.
For every query $a_i$:
  1. If the greatest element in the ORAM is greater than $a_i$, or all records are in the ORAM, then answer the query using the ORAM.
  2. Otherwise, execute algorithm Search, with the privacy and accuracy parameters reserved for the $i$-th execution, on the part of the database starting at the first record not in the ORAM, and let $\ell$ be the largest index in this part such that $x_\ell \leq a_i$.
  3. Insert a prefix of this part of the database (extending past index $\ell$) into the ORAM; for each inserted element, also insert the sum of all elements in the array up to this element.

Figure 6: A Differentially Oblivious Search Algorithm for Multiple Queries.
Theorem 6.3.

Algorithm MultiSearch, described in Figure 6, is an $(\varepsilon, \delta)$-differentially oblivious algorithm which executes Algorithm Search at most $O(\log n)$ times, scans the original database at most once, and in addition answers each query in time polylogarithmic in $n$.

Proof.

First note that we only pay for privacy in the executions of algorithm Search (reading and writing to the ORAM is perfectly private). In the $i$-th execution of algorithm Search, we at least double the number of elements in the ORAM; thus after $i$ executions we have inserted at least $2^{i-1}$ elements into the ORAM, and hence there are at most $O(\log n)$ executions.

By simple composition, algorithm MultiSearch is $(\varepsilon', \delta')$-differentially oblivious, where $\varepsilon'$ (respectively, $\delta'$) is the sum of the privacy parameters used in the at most $O(\log n)$ executions of Search; the last inequality in this calculation is implied by a bound on the sum of the harmonic series. ∎
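For reference, a minimal sketch of the standard harmonic-series bound the proof appeals to (the exact composition expression is omitted in the source):

```latex
\sum_{i=1}^{k} \frac{1}{i} \;\le\; 1 + \int_{1}^{k} \frac{dx}{x} \;=\; 1 + \ln k .
```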

References

  • [1] Noga Alon, Eldar Fischer, Michael Krivelevich, and Mario Szegedy. Efficient testing of large graphs. Combinatorica, 20(4):451–476, 2000.
  • [2] Gilad Asharov, Ilan Komargodski, Wei-Kai Lin, Kartik Nayak, and Elaine Shi. Optorama: Optimal oblivious RAM. IACR Cryptology ePrint Archive, 2018:892, 2018.
  • [3] Marina Blanton, Aaron Steele, and Mehrdad Aliasgari. Data-oblivious graph algorithms for secure computation and outsourcing. In Kefei Chen, Qi Xie, Weidong Qiu, Ninghui Li, and Wen-Guey Tzeng, editors, 8th ACM Symposium on Information, Computer and Communications Security, ASIA CCS ’13, pages 207–218. ACM, 2013.
  • [4] David Cash, Paul Grubbs, Jason Perry, and Thomas Ristenpart. Leakage-abuse attacks against searchable encryption. In Indrajit Ray, Ninghui Li, and Christopher Kruegel, editors, Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, 2015, pages 668–679. ACM, 2015.
  • [5] T.-H. Hubert Chan, Kai-Min Chung, Bruce M. Maggs, and Elaine Shi. Foundations of differentially oblivious algorithms. In Timothy M. Chan, editor, Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2019, pages 2448–2467. SIAM, 2019.
  • [6] Srinivas Devadas, Marten van Dijk, Christopher W. Fletcher, Ling Ren, Elaine Shi, and Daniel Wichs. Onion ORAM: A constant bandwidth blowup oblivious RAM. In Eyal Kushilevitz and Tal Malkin, editors, Theory of Cryptography - 13th International Conference, TCC 2016-A, volume 9563 of Lecture Notes in Computer Science, pages 145–174. Springer, 2016.
  • [7] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam D. Smith. Calibrating noise to sensitivity in private data analysis. In Shai Halevi and Tal Rabin, editors, Theory of Cryptography, Third Theory of Cryptography Conference, TCC 2006, volume 3876 of Lecture Notes in Computer Science, pages 265–284. Springer, 2006.
  • [8] David Eppstein, Michael T. Goodrich, and Roberto Tamassia. Privacy-preserving data-oblivious geometric algorithms for geographic data. In Divyakant Agrawal, Pusheng Zhang, Amr El Abbadi, and Mohamed F. Mokbel, editors, 18th ACM SIGSPATIAL International Symposium on Advances in Geographic Information Systems, ACM-GIS 2010, pages 13–22. ACM, 2010.
  • [9] Ronald Fagin, Amnon Lotem, and Moni Naor. Optimal aggregation algorithms for middleware. In Peter Buneman, editor, Proceedings of the Twentieth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pages 102–113. ACM, 2001.
  • [10] Oded Goldreich. Towards a theory of software protection and simulation by oblivious rams. In Alfred V. Aho, editor, Proceedings of the 19th Annual ACM Symposium on Theory of Computing, pages 182–194. ACM, 1987.
  • [11] Oded Goldreich. Introduction to Property Testing. Cambridge University Press, 2017.
  • [12] Oded Goldreich, Shafi Goldwasser, and Dana Ron. Property testing and its connection to learning and approximation. J. ACM, 45(4):653–750, 1998.
  • [13] Oded Goldreich and Rafail Ostrovsky. Software protection and simulation on oblivious rams. J. ACM, 43(3):431–473, 1996.
  • [14] Oded Goldreich and Dana Ron. Property testing in bounded degree graphs. Algorithmica, 32(2):302–343, 2002.
  • [15] M. T. Goodrich, O. Ohrimenko, and R. Tamassia. Data-oblivious graph drawing model and algorithms. CoRR, 2012.
  • [16] Michael T. Goodrich. Zig-zag sort: a simple deterministic data-oblivious sorting algorithm running in $O(n \log n)$ time. In David B. Shmoys, editor, Symposium on Theory of Computing, STOC 2014, pages 684–693. ACM, 2014.
  • [17] Michael T. Goodrich and Joseph A. Simons. Data-oblivious graph algorithms in outsourced external memory. In Zhao Zhang, Lidong Wu, Wen Xu, and Ding-Zhu Du, editors, Combinatorial Optimization and Applications - 8th International Conference, COCOA 2014, volume 8881 of Lecture Notes in Computer Science, pages 241–257. Springer, 2014.
  • [18] Moritz Hardt and Guy N. Rothblum. A multiplicative weights mechanism for privacy-preserving data analysis. In 51th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2010, pages 61–70. IEEE Computer Society, 2010.
  • [19] Mohammad Saiful Islam, Mehmet Kuzu, and Murat Kantarcioglu. Inference attack against encrypted range queries on outsourced databases. In Elisa Bertino, Ravi S. Sandhu, and Jaehong Park, editors, Fourth ACM Conference on Data and Application Security and Privacy, CODASPY’14, pages 235–246. ACM, 2014.
  • [20] Georgios Kellaris, George Kollios, Kobbi Nissim, and Adam O’Neill. Generic attacks on secure outsourced databases. In Edgar R. Weippl, Stefan Katzenbeisser, Christopher Kruegel, Andrew C. Myers, and Shai Halevi, editors, Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pages 1329–1340. ACM, 2016.
  • [21] Georgios Kellaris, George Kollios, Kobbi Nissim, and Adam O’Neill. Accessing data while preserving privacy. CoRR, abs/1706.01552, 2017.
  • [22] Marie-Sarah Lacharité, Brice Minaud, and Kenneth G. Paterson. Improved reconstruction attacks on encrypted data using range query leakage. In 2018 IEEE Symposium on Security and Privacy, SP 2018, pages 297–314. IEEE Computer Society, 2018.
  • [23] Kasper Green Larsen and Jesper Buus Nielsen. Yes, there is an oblivious RAM lower bound! In Hovav Shacham and Alexandra Boldyreva, editors, Advances in Cryptology - CRYPTO 2018 - 38th Annual International Cryptology Conference, volume 10992 of Lecture Notes in Computer Science, pages 523–542. Springer, 2018.
  • [24] Wei-Kai Lin, Elaine Shi, and Tiancheng Xie. Can we overcome the barrier for oblivious sorting? In Timothy M. Chan, editor, Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2019, pages 2419–2438. SIAM, 2019.
  • [25] Chang Liu, Xiao Shaun Wang, Kartik Nayak, Yan Huang, and Elaine Shi. Oblivm: A programming framework for secure computation. In 2015 IEEE Symposium on Security and Privacy, SP 2015, pages 359–376. IEEE Computer Society, 2015.
  • [26] Tarik Moataz, Travis Mayberry, and Erik-Oliver Blass. Constant communication ORAM with small blocksize. In Indrajit Ray, Ninghui Li, and Christopher Kruegel, editors, Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pages 862–873. ACM, 2015.
  • [27] Muhammad Naveed, Seny Kamara, and Charles V. Wright. Inference attacks on property-preserving encrypted databases. In Indrajit Ray, Ninghui Li, and Christopher Kruegel, editors, Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security 2015, pages 644–655. ACM, 2015.
  • [28] Rafail Ostrovsky. Efficient computation on oblivious rams. In Harriet Ortiz, editor, Proceedings of the 22nd Annual ACM Symposium on Theory of Computing, pages 514–523. ACM, 1990.
  • [29] Sofya Raskhodnikova and Adam D. Smith. A note on adaptivity in testing properties of bounded degree graphs. Electronic Colloquium on Computational Complexity (ECCC), 13(089), 2006.

Appendix A Missing Proofs

A.1 Proof of Lemma 2.5

Proof.

Let $x$ and $x'$ be two neighboring datasets and $S$ be a set of access patterns. Then,
$\Pr[\mathsf{Accesses}^B(\lambda, x) \in S] \leq \Pr[\mathsf{Accesses}^A(\lambda, x) \in S] + \delta' \leq e^{\varepsilon} \cdot \Pr[\mathsf{Accesses}^A(\lambda, x') \in S] + \delta + \delta' \leq e^{\varepsilon} \cdot (\Pr[\mathsf{Accesses}^B(\lambda, x') \in S] + \delta') + \delta + \delta' = e^{\varepsilon} \cdot \Pr[\mathsf{Accesses}^B(\lambda, x') \in S] + \delta + (1 + e^{\varepsilon})\delta'. \qquad \blacksquare$

A.2 Proof of the Correctness and Privacy of the Algorithm of Figure 2

Theorem 3.3 is implied by the following lemmas.

Lemma A.1.

The algorithm of Figure 2 is $(\varepsilon, \delta)$-differentially oblivious.

Proof.

We first analyze a variant of the algorithm in which Step 4 is replaced by “If $count \geq \hat{T}$ then output 1, else output 0” (that is, the algorithm does not cap the threshold at $k$ before deciding in the positive).

Let $G$ and $G'$ be two neighboring graphs that differ on a node $v$. Fix the random choices of the subsets $S_i$ in Step 2b and observe that, after the execution of the for loop, the count can differ by at most 1 between the executions on $G$ and $G'$. Since the algorithm sets the threshold $\hat{T}$ using the Laplace mechanism with parameter $1/\varepsilon$, shifting the threshold by 1 changes the probability of any outcome by a factor of at most $e^{\varepsilon}$. Thus, the probability of any access pattern and output under $G$ is at most $e^{\varepsilon}$ times its probability under $G'$, and similarly with the roles of $G$ and $G'$ reversed. Hence, the variant is $(\varepsilon, 0)$-differentially oblivious.

We next prove that the algorithm of Figure 2 is $(\varepsilon, \delta)$-differentially oblivious using Lemma 2.5; that is, we prove that for every graph $G$, the statistical distance between the executions of the algorithm and of its variant is small. The two executions differ only when the event $\hat{T} > k$ occurs, and by the tail of the Laplace distribution (recall that $\Pr[\mathrm{Lap}(b) > t] = \frac{1}{2}e^{-t/b}$ for every $t > 0$) this event occurs with probability at most $\delta / (1 + e^{\varepsilon})$ for an appropriate setting of the threshold’s center. Thus, by Lemma 2.5, the algorithm of Figure 2 is $(\varepsilon, \delta)$-differentially oblivious. ∎

Observe that the algorithm of Figure 2 never errs when $G \in P$, as in that case $count = k$ after the for loop is executed and hence Step 4 outputs 1. The next lemma analyzes the error probability when $G$ is $\rho$-far from $P$.

Lemma A.2.

The algorithm of Figure 2 is a $(\rho, \beta')$-tester for the graph property $P$.

Proof.

Observe that in Step 2c of the algorithm we eliminate at most $s$ nodes, and thus at most $sn$ edges, per iteration; over all $k$ iterations we eliminate at most $ksn$ edges in total. Since $ks$ is constant in the graph size, when $G$ is $\rho$-far from $P$ it remains essentially $\rho$-far from $P$ after the removal of the observed nodes in each step of the for loop, for all sufficiently large $n$. We next prove that the algorithm fails with probability at most $\beta'$. Observe that if the algorithm fails on $G$, then either the noisy threshold $\hat{T}$ is unusually low or the count is unusually high. We define $Z_i$ to be the output of $T$ in the $i$-th step of the for loop, and let $Z = \sum_{i=1}^{k} Z_i$. Observe that all $Z_i$ are independent and $\Pr[Z_i = 1] \leq \beta$. Using the Chernoff bound (for any $t > 0$, $\Pr[Z \geq \mathbb{E}[Z] + t] \leq e^{-2t^2/k}$, where $\mathbb{E}[Z]$ is the expectation of $Z$), we obtain that the count is unusually high with probability at most $\beta'/2$; we also know that the threshold is unusually low with probability at most $\beta'/2$. Therefore, the algorithm fails with probability at most $\beta'$. ∎

A.3 Proof of the Correctness and Privacy of the Algorithm of Figure 4

The proof of Theorem 5.1 follows from the following claim and lemmas.

Claim A.3.

Let $m$ be an appropriately chosen bound (a function of the privacy and accuracy parameters). The probability that there exists an element that the algorithm of Figure 4 samples in Step 2a more than $m$ times is small.

Proof.

Fix an index $j$. The probability that the element $x_j$ is sampled more than $m$ times is small, by a concentration bound on the number of times a uniform probe hits index $j$. The claim follows by the union bound over the $n$ indices. ∎

Lemma A.4.

The algorithm of Figure 4 is $(\varepsilon, \delta)$-differentially oblivious.

Proof.

We first analyze a variant of the algorithm in which Step 2(c)ii is replaced by “If $count \geq \hat{T}_i$ then output 1” (that is, the algorithm does not check that the threshold is at least 1 before deciding in the positive) and in which no element is sampled more than $m$ times. We analyze the privacy of this variant similarly to the analysis of the sparse vector mechanism in [18].

Let $x$ and $x'$ be two neighboring datasets such that $P(x_j) = 1$ and $P(x'_j) = 0$ for some index $j$ (and $x_\ell = x'_\ell$ for every $\ell \neq j$). Denote by $\hat{T}_1, \hat{T}_2, \ldots$ the values of the thresholds in an execution of the variant, where each threshold is rounded up to the smallest integer greater than it. Furthermore, let $i^*$ be the index such that, on input $x$, the variant outputs 1 at checkpoint $i^*$ (if no such checkpoint exists, then $i^* = \infty$). Observe that in each execution of Step 2(c)ii the count on input $x$ is at least the count on input $x'$ and can exceed it by at most $m$ (since the element $x_j$ is sampled at most $m$ times). Thus, on input $x'$ with thresholds $\hat{T}_1 - m, \hat{T}_2 - m, \ldots$ the variant outputs 1 no later than checkpoint $i^*$. Since the algorithm draws each threshold using the Laplace mechanism, shifting a threshold by $m$ changes its probability density by a bounded factor, for every threshold. Thus,