3SUM with Preprocessing: Algorithms, Lower Bounds and Cryptographic Applications

07/19/2019 ∙ by Alexander Golovnev, et al. ∙ Harvard University, NYU, MIT

Given a set of integers A = {a_1, ..., a_N}, the 3SUM problem requires finding a_i, a_j, a_k ∈ A such that a_i + a_j = a_k. A preprocessing version of 3SUM, called 3SUM-Indexing, considers an initial offline phase where a computationally unbounded algorithm receives a_1, ..., a_N and produces a data structure with S words of w bits each, followed by an online phase where one is given the target b and needs to find a pair (i, j) such that a_i + a_j = b by probing only T memory cells of the data structure. In this paper, we study the 3SUM-Indexing problem and show the following. [New algorithms:] Goldstein et al. conjectured that there is no data structure for 3SUM-Indexing with S = N^{2-ε} and T = N^{1-ε} for any constant ε > 0. Our first contribution is to disprove this conjecture by showing a suite of algorithms with S^3 · T = Õ(N^6); for example, this achieves S = Õ(N^{1.9}) and T = Õ(N^{0.3}). [New lower bounds:] Demaine and Vadhan in 2001 showed that every 1-query algorithm for 3SUM-Indexing requires space Ω̃(N^2). Our second result generalizes their bound to show that every space-S algorithm making T non-adaptive queries satisfies S = Ω̃(N^{1+1/T}). Any asymptotic improvement to our result will result in a major breakthrough in static data structure lower bounds. [New cryptographic applications:] A natural question in cryptography is whether we can use a "backdoored" random oracle to build secure cryptography. We provide a novel formulation of this problem, modeling a random oracle whose truth table can be arbitrarily preprocessed by an unbounded adversary into an exponentially large lookup table to which the online adversary has oracle access. We construct one-way functions in this model assuming the hardness of a natural average-case variant of 3SUM-Indexing.


1 Introduction

One of the many equivalent formulations of the 3SUM problem is the following: given a set A = {a_1, ..., a_N} of N integers, output a_i, a_j, a_k ∈ A such that a_i + a_j = a_k. There is an easy O(N^2)-time deterministic algorithm for 3SUM. Conversely, the popular 3SUM conjecture states that there are no sub-quadratic algorithms for this problem [GO95, Eri99].
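For concreteness, here is a minimal sketch of the standard quadratic algorithm (our code; we follow the formulation above, in which indices need not be distinct):

```python
def three_sum(a):
    """Return (a_i, a_j, a_k) with a_i + a_j == a_k, or None. Runs in O(N^2)."""
    a = sorted(a)
    n = len(a)
    for k in range(n):
        # Two-finger scan for a pair summing to the candidate target a[k].
        lo, hi = 0, n - 1
        while lo <= hi:
            s = a[lo] + a[hi]
            if s == a[k]:
                return a[lo], a[hi], a[k]
            elif s < a[k]:
                lo += 1
            else:
                hi -= 1
    return None
```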

Conjecture 1 (The “modern 3SUM conjecture”).

There is no algorithm for 3SUM running in time O(N^{2-ε}) for any constant ε > 0.

In this paper, we focus on a preprocessing variant of 3SUM known as 3SUM-Indexing, which was first defined by Demaine and Vadhan [DV01] in an unpublished note and then by Goldstein, Kopelowitz, Lewenstein and Porat [GKLP17]. In 3SUM-Indexing, there is an offline phase where a computationally unbounded algorithm receives a_1, ..., a_N and produces a data structure with S words of w bits each; and an online phase which is given the target b and needs to find a pair (i, j) such that a_i + a_j = b by probing only T memory cells of the data structure (i.e., taking “query time” T). Note that the online phase does not receive the set {a_1, ..., a_N} directly.

There are two simple algorithms that solve 3SUM-Indexing. The first stores a sorted version of A as the data structure (so S = O(N)) and, in the online phase, solves 3SUM-Indexing in time T = Õ(N) using the standard two-finger algorithm for 3SUM. The second stores all pairwise sums of A, sorted, as the data structure (so S = O(N^2)) and, in the online phase, looks up the target in time T = Õ(1) (the notation Õ(·) suppresses poly-logarithmic factors in N). There were no other algorithms known prior to this work. This led [DV01, GKLP17] to formulate the following three conjectures.
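Both baselines fit in a few lines (our code; for brevity the queries return witness values rather than index pairs):

```python
import bisect

def preprocess_sorted(a):
    """First baseline: S = O(N), store the input sorted."""
    return sorted(a)

def query_two_finger(ds, b):
    """T = O(N): two-finger scan for a pair summing to b."""
    lo, hi = 0, len(ds) - 1
    while lo <= hi:
        s = ds[lo] + ds[hi]
        if s == b:
            return ds[lo], ds[hi]
        elif s < b:
            lo += 1
        else:
            hi -= 1
    return None

def preprocess_all_sums(a):
    """Second baseline: S = O(N^2), store all pairwise sums, sorted."""
    n = len(a)
    return sorted(a[i] + a[j] for i in range(n) for j in range(n))

def query_lookup(ds, b):
    """T = Õ(1) probes: binary search for the target among the stored sums."""
    k = bisect.bisect_left(ds, b)
    return b if k < len(ds) and ds[k] == b else None
```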

Conjecture 2 ([GKLP17]).

If there exists an algorithm which solves 3SUM-Indexing with preprocessing space S and T = Õ(1) probes, then S = Ω̃(N^2).

Conjecture 3 ([DV01]).

If there exists an algorithm which solves 3SUM-Indexing with preprocessing space S and T probes, then S = Ω̃(N^2/T).

Conjecture 4 ([GKLP17]).

If there exists an algorithm which solves 3SUM-Indexing with T = O(N^{1-ε}) probes for some constant ε > 0, then S = Ω̃(N^2).

Note that these conjectures are in ascending order of strength:

Conjecture 4 ⟹ Conjecture 3 ⟹ Conjecture 2.

In terms of lower bounds, Demaine and Vadhan [DV01] showed that any 1-probe data structure for 3SUM-Indexing requires space Ω̃(N^2). They leave the case of T > 1 open. Goldstein et al. [GKLP17] established connections between Conjectures 2 and 4 and the hardness of Set Disjointness, Set Intersection, Histogram Indexing and Forbidden Pattern Document Retrieval.

1.1 Our contributions

Our contributions are three-fold. First, we show better algorithms for 3SUM-Indexing, refuting Conjecture 4. Second, we improve the lower bound of [DV01] to arbitrary T: we show that every space-S algorithm making T non-adaptive queries satisfies S = Ω̃(N^{1+1/T}), a non-trivial (super-linear) space bound for T up to nearly logarithmic in N. As we argue later, any asymptotic improvement to our lower bound will result in a major breakthrough in static data structure lower bounds. Third and finally, we show how to use the conjectured hardness of 3SUM-Indexing to enable a new cryptographic application, namely that of removing backdoors from unkeyed cryptographic functions. Next, we give a brief overview of these results in turn.

1.1.1 Upper bound for 3SUM-Indexing

Theorem 1.

For every 0 ≤ δ ≤ 1, there is an adaptive data structure for 3SUM-Indexing with space S = Õ(N^{2-δ}) and query time T = Õ(N^{3δ}).

In particular, Theorem 1 implies that by taking δ = 0.1, we get a data structure which solves 3SUM-Indexing with space S = Õ(N^{1.9}) and T = Õ(N^{0.3}) probes, and thus refutes Conjecture 4.

In a nutshell, the upper bound starts by considering the function f(i, j) = a_i + a_j. This function has a domain of size N^2 but a potentially much larger range. In a preprocessing step, we convert f into a function g whose range also has size Õ(N^2), such that inverting g lets us invert f. Once we have such a function, we use a result of Fiat and Naor [FN00], who give a general space-time tradeoff for inverting functions. This result gives non-trivial data structures for function inversion as long as function evaluation can be done efficiently. Due to our definitions of the functions f and g, we can efficiently compute them at every input, which leads to efficient inversion of g and, therefore, an efficient solution to 3SUM-Indexing. For more details, see Section 4. We note that prior to this work the result of Fiat and Naor [FN00] was recently used by Corrigan-Gibbs and Kogan [CGK18] for other algorithmic and complexity applications. In a concurrent work, Kopelowitz and Porat [KP19] obtain a similar upper bound for 3SUM-Indexing.

1.1.2 Lower bound for 3SUM-Indexing

We show that any algorithm for 3SUM-Indexing that uses a small number of probes requires large space, as expressed formally in Theorem 2.

Theorem 2.

For every non-adaptive algorithm that uses space S and query time T and solves 3SUM-Indexing, it holds that S = Ω̃(N^{1+1/T}).

The lower bound gives us meaningful (super-linear) space bounds for T up to nearly logarithmic in N. Showing super-linear space bounds for static data structures with a super-logarithmic number of probes is a major open question with significant implications [Sie04, Pǎt11, PTW10, Lar12, DGW19]. Essentially the only known technique for proving super-linear space lower bounds in this regime is cell-sampling. While the standard cell-sampling argument does not apply to the 3SUM-Indexing problem, since this problem does not have the expansion property under any distribution, we are able to recover the corresponding lower bound for 3SUM-Indexing via a related method in the non-adaptive case.

In a nutshell, our lower bound proceeds by an incompressibility argument (introduced by Gennaro and Trevisan in [GT00], and later developed in [DTT10, DGK17]). That is, we show that any data structure with “surprising” space-time behavior can be used to compress a random set beyond its entropy. We refer the reader to Section 5 for more details on the proof.

1.1.3 Cryptographic application: “backdoored” random oracles

The power and prevalence of backdoors in cryptographic algorithms is a renewed concern given recent revelations [CMG16, CNE14, Gre13]. We consider a new model of a random oracle “backdoored” by a powerful adversary during a preprocessing phase. (A random oracle is simply a uniformly random function h : [N] → [M] to which all parties have oracle access; we use this notation throughout.)

Recent results [Unr07, DGK17, CDGS18] studied the auxiliary-input random-oracle model, in which an attacker can compute S arbitrary bits of auxiliary information about the function table of the random oracle in a pre-processing phase, and make T additional queries to the random oracle in an online phase (where, for example, the adversary receives a challenge y = h(x) that she wants to invert). However, it is easy to see that such a result cannot be true once S is as large as the function table itself, since the preprocessed auxiliary information can simply be the function table of the inverse function h^{-1}. This allows an adversary to invert y by making a single query to the S bits of preprocessed backdoor. In reality, we do wish to capture the possibility that the backdoor is the inverse function, which puts us in a conundrum.

A way out is to try to “immunize” the random oracle by designing a function F^h which we wish to show is one-way even against an adversary that has oracle access to a backdoor for h. Let S denote the size, in bits, of the preprocessed backdoor. To prevent the inverse table attack above, we require at the minimum that S be smaller than the size of a function table for (F^h)^{-1}.

This leads us to a natural static data structure problem with pre-processing. Imagine an adversary that can preprocess the table of h into S bits. Our goal is to construct a function F^h (which necessarily acts on more than log S bits) which is one-way against an adversary who can make T oracle queries to the S bits of the preprocessed random oracle. Assuming the hardness of 3SUM with preprocessing for query time T and space S, we show precisely such an immunization strategy. We refer the reader to Section 6 for more details on the construction.

We note the recent work of [BFM18] which circumvents the inverse table barrier in a different way, by assuming the existence of at least two independent (backdoored) random oracles. This allows them to use techniques from two-source extraction and communication complexity to come up with an immunization strategy.

2 Related work

2.1 Preprocessing attacks in cryptography

The power and limitation of preprocessing attacks have been studied in several contexts of cryptography, including time-space tradeoffs, non-uniform security and immunizing backdoors.

Time-space tradeoffs

Hellman studied time-space tradeoffs for inverting random functions in his seminal paper [Hel80]. He showed that a random n-to-n bit function can be inverted with online time T and S bits of precomputed information whenever S^2 · T ≥ N^2 (where N = 2^n, up to poly-logarithmic factors); for example, S = T = N^{2/3}. A series of followup works [FN00, BBS06, DTT10] studied time-space tradeoffs for inverting arbitrary one-way permutations, one-way functions and pseudorandom generators. In particular, Fiat and Naor [FN00] showed a tradeoff of S^3 · T = Õ(N^3) for inverting any n-to-n bit function at every point (e.g., S = T = N^{3/4}).
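In the notation of Section 3.1 (N = 2^n), the two tradeoff curves can be compared directly; the following display is our summary of the statements above, up to poly-logarithmic factors:

```latex
\underbrace{S^2 \, T \;\ge\; N^2}_{\text{Hellman [Hel80]: random functions, e.g. } S = T = N^{2/3}}
\qquad\qquad
\underbrace{S^3 \, T \;\ge\; N^3}_{\text{Fiat--Naor [FN00]: arbitrary functions, e.g. } S = T = N^{3/4}}
```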

Random oracles and non-uniform security

Motivated by assessing the non-uniform security of hash functions, recent works [Unr07, DGK17, CDGS18] studied the auxiliary-input random-oracle model, in which an attacker can compute S arbitrary bits of leakage before attacking the system and make T additional queries to the random oracle. Although our model is similar in that it allows preprocessed leakage of a random oracle, it differs significantly in two ways: the size of the leakage is larger, and the attacker only has oracle access to the leakage.

Specifically, their results and technical tools only apply to the setting where the leakage is smaller than the value table of the random oracle, whereas our model deals with leakage that is allowed to be larger. Furthermore, the random oracle model with auxiliary input allows the online adversary to access and depend on the leakage in an arbitrary way, while our model only allows a bounded number of oracle queries to the leakage; this is a more realistic model for online adversaries which have bounded time and cannot read the entire leakage at query time.

Immunizing backdoors and kleptography

Motivated by immunizing backdoors, a series of recent works [DGG15, BFM18, RTYZ18, FJM18] studied backdoored primitives including pseudorandom generators and hash functions. In this setting, the attacker might be given some space-bounded backdoor about a primitive, which could allow him to break the system more easily.

In particular, backdoored hash functions and random oracles are studied in [BFM18, FJM18]. Both works observe that immunizing against a backdoor for a single unkeyed hash function might be hard. For this reason, [BFM18] considers the problem of combining two random oracles (with two independent backdoors). Instead, we look at the case of a single random oracle but add a restriction on the size of the advice. [FJM18] considers the setting of keyed functions such as (weak) pseudorandom functions, which are easier to immunize than the unkeyed functions we consider in this work.

The study of backdoored primitives is also related to — and sometimes falls within the field of — kleptography, originally introduced by Young and Yung [YY97, YY96b, YY96a]. A kleptographic attack “uses cryptography against cryptography” [YY97], by changing the behavior of a cryptographic system in a fashion undetectable to an honest user with black-box access to the cryptosystem, such that the use of the modified system leaks some secret information (e.g., plaintexts or key material) to the attacker who performed the modification. An example of such an attack might be to modify the key generation algorithm of an encryption scheme such that an adversary in possession of a “back door” can derive the private key from the public key, yet an honest user finds the generated key pairs to be indistinguishable from correctly produced ones.

2.2 3SUM

The field of fine-grained complexity leverages algorithmic hardness assumptions to prove quantitative lower bounds for wide classes of problems. The standard assumptions in this field (Vassilevska Williams [VW15, VW18] gives excellent surveys of this topic) are the hardness of CNF-SAT [IP01, IPZ01], 3SUM [GO95, Eri99], Orthogonal Vectors (OV) [Wil05], All Pairs Shortest Paths (APSP) [WW10], and Online Matrix-Vector Multiplication (OMV) [HKNS15].

Implications of the 3SUM Conjecture

The 3SUM conjecture (Conjecture 1) has been helpful for understanding the precise hardness of many geometric problems [GO95, dBdGO97, BVKT98, ACH98, Eri99, BHP01, AHI01, SEO03, AEK05, EHPM06, CEHP07, AHP08, AAD12]. Starting with the works of Vassilevska and Williams [VW09], and Pǎtraşcu [Pǎt10], the 3SUM conjecture has also been used for conditional lower bounds for many combinatorial [AVW14, GKLP16, KPP16] and string search [CHC09, BCC13, AVWW14, ACLL14, AKL16, KPP16] problems.

Algorithms for 3SUM

The two standard algorithms for 3SUM on integers from a range of size C run in time Õ(N^2) and Õ(N + C log C) [CLRS09] (Exercise 30.1-7). The most efficient algorithm for this problem, due to Baran et al. [BDP08], runs in time O(N^2 (log log N)^2 / log^2 N). Kane et al. [KLM18] prove almost tight bounds on the linear decision tree complexity of 3SUM. Wang and Lincoln et al. [Wan14, LVWWW16] show that 3SUM algorithms can be implemented in subquadratic space (as low as Õ(√N)) without increasing the running time beyond Õ(N^2). Also, Carmosino et al. [CGI16] give a co-non-deterministic algorithm for 3SUM running in time Õ(N^{1.5}), which shows that proving SETH-hardness of 3SUM is out of reach of current techniques.

The “Real 3SUM” problem, where the input numbers are reals rather than integers, has also attracted a lot of attention. Starting with the breakthrough work of Grønlund and Pettie [GP14], algorithms (in the Real-RAM model) for this version of 3SUM [Fre17, Cha18] achieved running time N^2 · poly(log log N)/log^2 N, which essentially matches the running time of the algorithms for Integer 3SUM.

Data-structure versions of the conjecture

While the standard conjectures about the hardness of CNF-SAT, 3SUM, OV and APSP concern algorithms, the OMV conjecture claims a data structure lower bound for the Matrix-Vector Multiplication problem. While algorithmic conjectures help to understand time complexity of the problems, it is also natural to consider data structure analogues of the fine-grained conjectures in order to understand space complexity of the corresponding problems. Recently Goldstein et al. [GKLP17, GLP17] proposed data structure variants of many classical hardness assumptions (including 3SUM and OV). Other data structure variants of the 3SUM problem have also been studied in [DV01, BW09, CL15, CCI19]. In particular, Chan and Lewenstein [CL15] use techniques from additive combinatorics to give efficient data structures for solving 3SUM on subsets of the preprocessed sets.

3 Preliminaries

3.1 Notation

When an uppercase letter represents an integer, we use the convention that the associated lowercase letter represents its base-2 logarithm: n = log_2 N, etc. [N] denotes the set {0, 1, ..., N−1}, which we identify with Z_N. x‖y denotes the concatenation of bit strings x and y. PPT stands for probabilistic polynomial time.

3.2 3SUM-Indexing

In this paper, we focus on the variant of 3SUM known as 3SUM-Indexing, formally defined in [GKLP17], which can be thought of as a preprocessing variant of 3SUM. Our definition generalizes that of [GKLP17] in that we allow the input to be elements of an arbitrary abelian group. (This is for convenience and because our applications only involve abelian groups; our results and techniques easily generalize to the non-abelian case.)

Definition 3.

The problem 3SUM-Indexing, parametrized by an abelian group G and an integer N, is defined to be solved by a two-part algorithm (A_1, A_2) as follows.

  • Preprocessing phase. A_1 receives as input a tuple a = (a_1, ..., a_N) of elements from G and outputs a data structure D of size at most S. (The model of computation in this paper is the word RAM model with word length w = Ω(log N); we further assume that words are large enough to contain the description of an element of G, i.e., |G| ≤ 2^{cw} for some constant c. The size of a data structure is the number of words, or cells, it contains.) A_1 is computationally unbounded.

  • Query phase. Denote by A + A the set of pairwise sums of elements from a: A + A = {a_i + a_j : i, j ∈ [N], i ≠ j}. Given an arbitrary query b ∈ A + A, A_2 makes at most T oracle queries to D and must output (i, j) with i ≠ j such that a_i + a_j = b. (Without loss of generality, we can assume that D contains a copy of a, and in this case A_2 could return the pair (a_i, a_j) at the cost of two additional queries.)

We say that (A_1, A_2) is an (S, T) algorithm for 3SUM-Indexing. Furthermore, we say that (A_1, A_2) is non-adaptive if the queries made by A_2 are non-adaptive (i.e., the indices of the queried cells are only a function of b).
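The two-phase interface of Definition 3 can be captured by the following minimal skeleton (our code; the probe counter makes the query budget T explicit, and a concrete scheme would fill in the two unimplemented methods):

```python
class ThreeSumIndexing:
    """Skeleton of a two-phase (S, T) solver: unbounded preprocessing (A_1),
    then queries (A_2) that may touch at most T cells of the data structure."""

    def __init__(self, a, S, T):
        self.T = T
        self.cells = self.preprocess(list(a))   # A_1: computationally unbounded
        assert len(self.cells) <= S             # data structure has at most S words

    def preprocess(self, a):                    # to be filled by a concrete scheme
        raise NotImplementedError

    def _probe(self, index):
        """Oracle access to a single cell of D; enforces the probe budget."""
        if self._probes_left == 0:
            raise RuntimeError("query exceeded its probe budget T")
        self._probes_left -= 1
        return self.cells[index]

    def query(self, b):
        """A_2: answer query b using at most T probes (via _probe)."""
        self._probes_left = self.T
        return self._answer(b)

    def _answer(self, b):                       # to be filled by a concrete scheme
        raise NotImplementedError
```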

A few remarks about Definition 3 are in order.

Remark 1.

An alternative definition would have the query b be an arbitrary element of G (instead of being restricted to A + A) and have A_2 return the special symbol ⊥ when b ∉ A + A. Again, an algorithm for the problem as defined in Definition 3 (with undefined behavior for b ∉ A + A) can be turned into an algorithm for this seemingly more general problem at the cost of two extra queries: given output (i, j) on query b, return (i, j) if a_i + a_j = b and return ⊥ otherwise.

Remark 2.

The requirement that i ≠ j in A_2’s output is without loss of generality for the integers, but prevents the occurrence of degenerate cases in some groups. For example, if G is such that all elements are of order 2 (e.g., G = F_2^n), then finding (i, j) such that a_i + a_j = 0 has the trivial solution (i, i) for any i.

Remark 3.

In order to preprocess the elements of some group G, we assume an efficient way to enumerate its elements. More specifically, we assume a time- and space-efficient algorithm for evaluating an injective function σ : G → [|G|^c] for a constant c ≥ 1. For simplicity, we also assume that the word length is large enough that we can store σ(g) for every g ∈ G in a memory cell. For example, for the standard 3SUM-Indexing problem over the integers from 0 to C − 1, one can consider the group Z_C and the trivial function σ(g) = g. For ease of exposition, we abuse notation and write g instead of σ(g) for an element g of the group G; for example, g mod m for a group element g will always mean σ(g) mod m.

The standard 3SUM-Indexing problem (formally introduced in [GKLP17]) corresponds to the case where G = Z. In fact, it is usually assumed that the integers are upper-bounded by some polynomial in N, which is easily shown to be equivalent to the case where G = Z_C for some C = poly(N); this variant is sometimes referred to as modular 3SUM when there is no preprocessing.

Another important special case is G = F_2^n for some n. In this case, G can be thought of as the group of binary strings of length n where the group operation is bitwise XOR (exclusive or). This problem is usually referred to as 3XOR when there is no preprocessing, and we refer to its preprocessing variant as 3XOR-Indexing. In [JV16], the authors provide some evidence that the hardnesses of 3XOR and 3SUM are related and conjecture that Conjecture 1 generalizes to 3XOR. We similarly conjecture that, in the presence of preprocessing, Conjecture 3 generalizes to 3XOR-Indexing.

Following Definition 3, the results and techniques in this paper hold for arbitrary abelian groups and thus provide a unified treatment of the 3SUM-Indexing and 3XOR-Indexing problems. It is an interesting open question for future research to better understand the influence of the group on the hardness of the problem.

Open Question 1.

For which groups is 3SUM-Indexing significantly easier to solve, and for which groups does Conjecture 3 not hold?

3.2.1 Average-case hardness

This paper moreover introduces a new average-case variant of 3SUM-Indexing (Definition 4 below) that, to the authors’ knowledge, has not been stated in prior literature. (We remark that for the classical version of 3SUM, the uniformly random distribution of inputs is believed to be the hardest; see, e.g., [KPP16].) Definition 4 states an error parameter ε, as for the cryptographic applications it is useful to consider solvers for average-case 3SUM-Indexing that only output correct answers with probability ε.

Definition 4.

The average-case 3SUM-Indexing problem, parametrized by an abelian group G and an integer N, is defined to be solved by a two-part algorithm (A_1, A_2) as follows.

  • Preprocessing phase. Let a = (a_1, ..., a_N) be a tuple of elements from G drawn uniformly at random and with replacement. A_1 receives a and outputs a data structure D of size at most S. A_1 has unbounded computational power.

  • Query phase. Given a query b drawn uniformly at random from A + A, and given up to T oracle queries to D, A_2 outputs (i, j) with i ≠ j such that a_i + a_j = b.

We say that (A_1, A_2) is an (S, T, ε) solver for 3SUM-Indexing if it answers the query correctly with probability ε over the randomness of A_1, A_2, a, and the random query b. When ε = 1, we leave it implicit and write simply (S, T).

Remark 4.

Note that in the query phase of Definition 4, the query b is chosen uniformly at random in A + A and not in G. As observed in Remark 1, this is without loss of generality when ε = 1. When ε < 1, the meaningful way to measure the success probability of A_2 is as in Definition 4, since A + A could have negligible density in G and A_2 could then succeed with overwhelming probability by always outputting ⊥.

4 Upper bound

We will use the following data structure first suggested by Hellman [Hel80] and then rigorously studied by Fiat and Naor [FN00].

Theorem 5 ([FN00]).

For any function f : [M] → [M], and for any choice of values S and T such that S^3 · T ≥ M^3, there is a deterministic data structure with space Õ(S) which allows inverting f at every point, making Õ(T) queries to the memory cells and Õ(T) evaluations of f. (While the result in Theorem 1.1 of [FN00] is stated for a randomized preprocessing procedure, we remark that a less efficient deterministic procedure which brute-forces the probability space can be used instead.)

The idea of our upper bound is the following. Since we are only interested in the pairwise sums of the input elements a_1, ..., a_N, we can hash the sums down to a set of size Õ(N^2). Now we define the function g(i, j) = a_i + a_j (with the sum hashed as above) for i, j ∈ [N], and note that its domain and range are both of size Õ(N^2). We apply the generic inversion algorithm of Fiat and Naor to g with M = Õ(N^2), and obtain a data structure for 3SUM-Indexing.
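The resulting trade-off is just the Fiat–Naor curve instantiated with a quadratically larger domain; the following back-of-the-envelope calculation (ours, suppressing poly-logarithmic factors) recovers the parameters of Theorem 1:

```latex
\[
S^3 \, T \;\ge\; M^3 \quad\text{with}\quad M = \tilde{O}(N^2)
\;\;\Longrightarrow\;\; S^3 \, T = \tilde{O}(N^6);
\qquad
S = \tilde{O}(N^{2-\delta})
\;\;\Longrightarrow\;\;
T = \tilde{O}\!\big(N^{6-3(2-\delta)}\big) = \tilde{O}(N^{3\delta}).
\]
```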

First, in Lemma 6 we give an efficient data structure for the “modular” version of 3SUM-Indexing where, for an integer m and inputs a_1, ..., a_N, each query b asks to find i and j such that a_i + a_j ≡ b (mod m). (Recall from Remark 3 that this notation is shorthand for the integer encodings of the group elements.) Then, in Theorem 7 we reduce the general case of 3SUM-Indexing to the modular one.

Lemma 6.

For every 0 ≤ δ ≤ 1 and every integer m, there is an adaptive data structure which uses space Õ(N + m^{1-δ}) and query time Õ(m^{3δ}) and solves modular 3SUM-Indexing: for input a_1, ..., a_N and a query b, it outputs (i, j) such that a_i + a_j ≡ b (mod m), if such i and j exist.

Proof.

Let the input elements be a_1, ..., a_N. The data structure will store all of a_1, ..., a_N (this takes only N memory cells) along with the information needed to efficiently invert the function g defined below. For i, j ∈ [N], let g(i, j) = (a_i + a_j) mod m. Note that:

  1. g is easy to compute. Indeed, given the input, one can compute g(i, j) by looking at only two input elements.

  2. The domain of g is of size N^2, and the range of g is of size m.

  3. Inverting g at a point b allows one to check whether there exist i and j such that a_i + a_j ≡ b (mod m), which essentially solves the modular 3SUM-Indexing problem.

Now we use the data structure from Theorem 5 to invert g. This gives us a data structure with space Õ(N + m^{1-δ}) and query time Õ(m^{3δ}) for every 0 ≤ δ ≤ 1, which finishes the proof. ∎

It remains to show that the input of 3SUM-Indexing can always be hashed to a set of integers [m] for some m = Õ(N^2). While many standard hash functions would work here, we remark that it is important for our application that the hash function of choice has a time- and space-efficient implementation (for example, the data structure in [FN00] requires non-trivial implementations of hash functions). Below we present a simple hashing procedure which suffices for 3SUM-Indexing; a more general reduction can be found in Lemma 17 of [CGK18].

Theorem 7.

For every 0 ≤ δ ≤ 1, there is an adaptive data structure for 3SUM-Indexing with space Õ(N^{2-δ}) and query time Õ(N^{3δ}).

In particular, by taking δ = 0.1, we get a data structure which solves 3SUM-Indexing in space Õ(N^{1.9}) and query time Õ(N^{0.3}), and thus refutes Conjecture 4.

Proof.

Let the inputs be a_1, ..., a_N. Let P = {a_i + a_j : i, j ∈ [N]} be the set of pairwise sums of the inputs, so |P| ≤ N^2.

Let I = [R, 2R] be an interval of integers for a parameter R = Õ(N^2) fixed below. By the prime number theorem, for large enough R, I contains at least R/(2 ln R) primes. Let us pick k random primes p_1, ..., p_k from I. For two distinct numbers x, y ∈ P, we say that they have a collision modulo p if p divides x − y.

Let b be a positive query of 3SUM-Indexing, that is, b ∈ P. First, we show that with high probability (over the choice of the random primes) there exists an i such that for every y ∈ P \ {b}, b and y do not collide modulo p_i. Indeed, for every y, the difference b − y is a w-bit integer and therefore has at most w/log R prime factors from I. Since |P| ≤ N^2, at most wN^2/log R primes from I divide b − y for some y. Therefore, a random prime from I gives a collision for b and some y with probability Õ(N^2/R), which is at most 1/2 for an appropriate choice of R = Õ(N^2). Now we have that for every b ∈ P, the probability that there exists an i such that b does not collide with any y modulo p_i is at least 1 − 2^{−k}. Therefore, taking k = Θ(log N), with high probability a random set of k primes works for all b ∈ P simultaneously: for every b ∈ P there exists an i such that b does not collide with any y modulo p_i. Since such a set of primes exists, the preprocessing stage of the data structure can find it deterministically.

Now we construct k modular data structures (one for each prime p_i) and separately solve the problem modulo each of the k primes. This results in the data structure guaranteed by Lemma 6 with m = Θ(R) = Õ(N^2), with an Õ(k) = Õ(1) overhead in space and time. The data structure also stores the inputs a_1, ..., a_N. Once it sees a solution modulo some p_i, it checks whether it corresponds to a solution to the original problem. Correctness now follows from two observations. First, since the data structure checks whether a solution modulo p_i gives a solution to the original problem, it never reports false positives. Second, since for every b ∈ P there is a prime p_i such that b does not collide with any other element of P, a solution modulo that p_i corresponds to a solution of the original problem (thus, no false negatives are reported either). ∎
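The verification logic of this proof can be seen in the following toy sketch (ours). It replaces the Fiat–Naor inverter with a plain per-prime lookup table, so it uses Θ(N^2) space per prime and only illustrates the no-false-positives/no-false-negatives argument, not the space savings:

```python
import random

def _is_prime(n):
    """Trial division; adequate for toy sizes."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def preprocess(a, num_primes=None):
    """Pick random primes around N^2 and build one modular table per prime.
    A real implementation would store, instead of each table, the Fiat-Naor
    inversion structure for g(i, j) = (a_i + a_j) mod p."""
    n = len(a)
    R = 10 * n * n
    num_primes = num_primes or max(1, n.bit_length())  # k = Theta(log N)
    primes = []
    while len(primes) < num_primes:
        c = random.randrange(R, 2 * R)
        if _is_prime(c):
            primes.append(c)
    tables = []
    for p in primes:
        t = {}
        for i in range(n):
            for j in range(n):
                if i != j:
                    t[(a[i] + a[j]) % p] = (i, j)
        tables.append((p, t))
    return (list(a), tables)

def query(ds, b):
    """Try each prime in turn; candidates are verified over the integers, so
    there are no false positives, and the prime-collision argument rules out
    false negatives with high probability."""
    a, tables = ds
    for p, t in tables:
        cand = t.get(b % p)
        if cand is not None:
            i, j = cand
            if a[i] + a[j] == b:
                return (i, j)
    return None
```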

Remark 5.

A few extensions of Theorem 7 are in order.

  1. The result of Fiat and Naor [FN00] also gives an efficient randomized data structure: there is a randomized data structure with quasilinear preprocessing running time which allows inverting f at every point with high probability over the randomness of the preprocessing stage. Thus, the preprocessing phase of the randomized version of Theorem 5 runs in time Õ(N^2) in our setting (sampling random primes from a given interval can also be done in randomized quasilinear time). This, in particular, implies that the preprocessing time of the presented data structure for 3SUM-Indexing is essentially optimal under the 3SUM Conjecture (Conjecture 1). Indeed, if the preprocessing time were improved to O(N^{2-ε}) for a constant ε > 0, then one could solve 3SUM by building the data structure and querying it on each of the N input numbers, in total (randomized or expected) time O(N^{2-ε} + N · N^{3δ}), which is subquadratic for a sufficiently small constant δ > 0.

  2. We remark that for polynomially small ε = N^{−Θ(1)}, the trade-off between S and T can be further improved for solvers of 3SUM-Indexing that succeed with probability ε, using the approximate function inversion of De et al. [DTT10].

We showed how to refute the strong 3SUM-Indexing conjecture of [GKLP17] using techniques from space-time tradeoffs for function inversion [Hel80, FN00], specifically the general function inversion algorithm of Fiat and Naor [FN00]. A natural open question is whether a more specific function inversion algorithm could be designed:

Open Question 2.

Can the space-time trade-off achieved in Theorem 7 be improved by exploiting the specific structure of the 3SUM-Indexing problem?

5 Lower bound

We now present our lower bound: we prove a space-time trade-off of S = Ω̃(N^{1+1/T}) for any non-adaptive algorithm. While it is weaker than Conjecture 3, any improvement on this result would break a long-standing barrier in static data structure lower bounds: no bounds better than S = Ω̃(N^{1+1/T}) are known, even for non-adaptive cell-probe and linear models [Sie04, Pǎt11, PTW10, Lar12, DGW19].

Our proof relies on a compressibility argument similar to [GT00, DTT10], also known as cell-sampling in the data structure literature [PTW10]. Roughly speaking, we show that given an algorithm (A_1, A_2), we can recover a large subset of the input by storing a randomly sampled subset C of the cells of the preprocessed data structure and simulating A_2 on all possible queries: the simulation succeeds whenever the probes made by A_2 fall inside C. Thus, by storing C along with the remaining part of the input, we obtain an encoding of the entire input. This implies that the length of the encoding must be at least the entropy of a randomly chosen input.
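The parameter setting behind the proof can be previewed with the following heuristic calculation (our sketch, ignoring constants and poly-logarithmic factors):

```latex
\begin{align*}
&\text{Storing } s \text{ of the } S \text{ cells answers a fixed non-adaptive $T$-probe query w.p. } \approx (s/S)^T,\\
&\text{so roughly } N^2 (s/S)^T \text{ of the } \sim N^2 \text{ queries, and hence that many inputs, are recoverable.}\\
&\text{Choosing } s \approx S \cdot N^{-1/T} \text{ makes } N^2 (s/S)^T \approx N, \text{ a constant fraction of the input.}\\
&\text{An encoding cannot beat the entropy of the input, which forces } s = \Omega(N),\\
&\text{i.e. } S \cdot N^{-1/T} = \Omega(N), \text{ and therefore } S = \tilde{\Omega}\big(N^{1+1/T}\big).
\end{align*}
```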

Theorem 8.

Let N be an integer and G be an abelian group with |G| = Ω(N^2). Then any non-adaptive (S, T) algorithm for 3SUM-Indexing over G satisfies S = Ω̃(N^{1+1/T}).

Proof.

Consider a non-adaptive (S, T) algorithm (A_1, A_2) for 3SUM-Indexing. We want to use (A_1, A_2) to design encoding and decoding procedures for inputs of 3SUM-Indexing. For this, we will first sample a subset C of the data structure cells which allows us to answer many queries. Using this set, we will argue that we can recover a constant fraction of the input, which will lead to a succinct encoding of the input.

Sampling a subset of cells.

For a query b, Q(b) denotes the set of probes made by A_2 on query b (with |Q(b)| ≤ T, since A_2 makes at most T probes to the data structure). Given a subset C of cells, we denote by G_C the set of queries in A + A which can be answered by A_2 by only making probes within C: G_C = {b ∈ A + A : Q(b) ⊆ C}. Observe that for a uniformly random set C of size s ≥ 2T:

E[|G_C|] = Σ_{b ∈ A+A} Pr[Q(b) ⊆ C] ≥ |A + A| · ((s − T)/S)^T ≥ |A + A| · (s/2S)^T,

where the last inequality uses that s ≥ 2T. Hence, there exists a subset C of size s such that

|G_C| ≥ |A + A| · (s/2S)^T,

and we will henceforth consider such a set C. The size s of C will be set later so that the number of good indices (defined below) is Ω(N).

Using C to recover the input.

Consider some input a = (a_1, ..., a_N) to 3SUM-Indexing. We say that a_k is good if k appears in the output of A_2 on some query in G_C. Since queries in G_C can be answered by only storing the subset of cells of the data structure indexed by C, our decoding procedure will retrieve all the good elements of a from these cells. Note that a_k is good if

there exists i ≠ k such that a_k + a_i ∈ G_C, and no other pair of input elements sums to a_k + a_i.     (1)

Indeed, observe that:

  1. The first part of the conjunction guarantees that there exists b ∈ G_C which can be decomposed as b = a_k + a_i with i ≠ k.

  2. The second part of the conjunction guarantees that the decomposition is unique, i.e., that there is no other pair of elements of a which adds up to b.

By correctness, A_2 outputs a decomposition of its query as a sum of two elements of a if one exists. For b as in (1), the decomposition is unique, and hence the output of A_2 on b contains k.

We denote by G(a) the set of good indices, and compute its expected size when a is chosen at random according to the distribution in Definition 4, i.e., for each k, a_k is chosen independently and uniformly in G:

E[|G(a)|] = Σ_{k ∈ [N]} Pr[k is good].     (2)

Note that for fixed k and i, the query a_k + a_i has a unique decomposition unless some other pair collides with it: for {i′, j′} ≠ {i, k}, the sum a_{i′} + a_{j′} needs to be distinct from a_k + a_i. By the union bound over the at most N^2 such pairs, the collision probability is at most N^2/|G|, which is bounded away from 1 since |G| = Ω(N^2). Using the previous two derivations in (2), we get

E[|G(a)|] = Ω(min(N, N^2 · (s/2S)^T)),     (3)

where we also use the lower bound on |G_C| derived above.

Encoding and decoding.

It follows from (3) and a simple averaging argument that, with constant probability over the random choice of a, the set G(a) is of size at least a constant fraction of the bound in (3). We will henceforth focus on providing encoding and decoding procedures for such inputs a. Specifically, consider the following pair of encoding/decoding algorithms for a:

  • Encoding: given input a,

    1. use A_1 to compute the data structure, and compute the set G(a) of good indices;

    2. store the sampled cells C (contents and indices), the set G(a), and the elements a_k for k ∉ G(a).

  • Decoding: for each query b, simulate A_2 on input b:

    1. If Q(b) ⊆ C, use the cells of C (which were stored in the encoding) to simulate A_2 and obtain its output. By the definition of G(a), when b ranges over the queries such that Q(b) ⊆ C, this step recovers all the elements a_k for k ∈ G(a).

    2. Then recover the elements a_k for k ∉ G(a) directly from the encoding.

Note that the bit length of the encoding is at most

s · w + (N − |G(a)|) · w + Õ(N) ≤ s · w + (N − g) · w + Õ(N),

where w is the word length, g is the guaranteed lower bound on |G(a)|, and the second inequality holds because we restrict ourselves to inputs a such that |G(a)| ≥ g. By a standard incompressibility argument (see for example Fact 8.1 in [DTT10]), since our encoding and decoding succeed with constant probability over the random choice of a, the encoding must be able to take a constant fraction of the |G|^N possible values, hence:

s · w + (N − g) · w + Õ(N) ≥ N · log|G| − O(1).     (4)

Finally, as discussed before, we set s such that g = Ω(N), i.e., such that N^2 · (s/2S)^T = Ω(N). By the computation performed at the beginning of this proof, it is sufficient to set s = Θ(S · N^{−1/T}). Since w ≥ log|G|, inequality (4) then forces s = Ω̃(N) (otherwise the encoding would beat the entropy of a random input), and therefore S = Ω̃(N^{1+1/T}), which completes the proof. ∎

Remark 6.

3SUM-Indexing over Z_C reduces to 3SUM-Indexing over the integers, so our lower bound extends to the integer version of the problem, too. Specifically, the reduction works as follows: we choose {0, ..., C − 1} as the set of representatives of Z_C. Given some input for 3SUM-Indexing over Z_C, we treat it as a list of integers and build a data structure using our algorithm for the integers. Now, given a query b ∈ {0, ..., C − 1}, we again treat it as an integer and query the data structure at b and at b + C. The correctness of the reduction follows from the observation that a_i + a_j ≡ b (mod C) if and only if a_i + a_j = b or a_i + a_j = b + C over the integers.
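In code, the query translation of Remark 6 is one line (a sketch; integer_query stands for the query phase of a hypothetical integer 3SUM-Indexing structure built on the representatives):

```python
def query_mod_C(integer_query, b, C):
    """Answer a Z_C query with two integer queries: representatives lie in
    {0, ..., C-1}, so any pairwise sum is < 2C, and a_i + a_j ≡ b (mod C)
    iff the integer sum equals b or b + C."""
    return integer_query(b) or integer_query(b + C)
```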

As we already mentioned, no lower bound better than Ω̃(N^{1+1/T}) is known even for non-adaptive cell-probe and linear models, so Theorem 8 matches the best known lower bounds for static data structures. An ambitious goal for future research would naturally be to prove Conjecture 3. A first step in this direction would be to extend Theorem 8 to adaptive strategies that potentially err with some probability.

Open Question 3.

Must any (possibly adaptive) (S, T) algorithm for 3SUM-Indexing satisfy S = Ω̃(N^{1+1/T})?

6 Application to cryptography

6.1 Background on random oracles and preprocessing

A line of work initiated by [IR89] studies the hardness of a random oracle as a one-way function. In [IR89] it was shown that a random oracle is an exponentially hard one-way function against uniform adversaries. The case of non-uniform adversaries was later studied in [Imp96, Zim98]. Specifically, we have the following result.

Proposition 9 ([Zim98]).

With high probability over the choice of a random oracle h, every oracle circuit of sufficiently bounded size inverts h at a uniformly random point with at most exponentially small probability.

In Proposition 9, the choice of the circuit occurs after the random draw of the oracle: in other words, the description of the circuit can be seen as a non-uniform advice which depends on the random oracle. Proposition 10 is a slight generalization where the adversary is a uniform Turing machine, independent of the random oracle, with oracle access to an advice of length at most S depending on the random oracle. While the two formulations are essentially equivalent when S is comparable to the circuit size, one advantage of this reformulation is that S can be larger than the running time of the adversary.

Proposition 10 (Implicit in [DTT10]).

Let A be a uniform oracle Turing machine whose number of oracle queries is at most T. For every advice length S, with high probability over the choice of a random oracle h, the machine A, given oracle access to any S-bit advice about h, inverts h at a uniformly random point with probability at most Õ(ST/N).

In Proposition 10, the advice can be thought of as the result of a preprocessing phase involving the random oracle. Also, no assumption is made on the computational power of the preprocessing adversary; it is simply assumed that the length of the advice is bounded.

Remark 7.

Propositions 9 and 10 assume a deterministic adversary. For the regime of parameters which is the focus of this work, this assumption is without loss of generality, since a standard averaging argument shows that for a randomized adversary there exists a choice of “good” randomness for which the adversary achieves at least its expected success probability. This choice of randomness can be hard-coded into the non-uniform advice, yielding a deterministic adversary.

Note, however, that Proposition 10 provides no guarantee when the advice is as large as the function table of h. In fact, in this case, defining the advice to be the function table of an inverse mapping h^{-1} allows an adversary to invert h with probability one by making a single query to the advice. So h itself can no longer be used as a one-way function in this regime, but one can still hope to use h to define a new function F^h that is one-way against an adversary with advice of size S. This idea motivates the following definition.

Definition 11.

Let h be a random oracle. A one-way function in the random oracle model with preprocessing is an efficiently computable oracle function F^h such that for any two-part adversary (A_1, A_2), where the advice A_1(h) is a string of at most S bits and A_2 is PPT with oracle access to h and to A_1(h), the probability that A_2 inverts F^h at a uniformly random input is negligible in n. (A negligible function is one that is in o(n^{-c}) for all constants c.)

The adversary model in Definition 11 is almost identical to the BRO model of [BFM18], differing only in having a restriction on the output size of A_1. As was noted in [BFM18], without this restriction (and, in fact, as soon as the advice can encode an inverse table of F^h, by the same argument as above), no function can achieve the property given in Definition 11. [BFM18] bypasses this impossibility by considering the restricted case of two independent oracles with two independent preprocessed advices (of unrestricted sizes). Our work bypasses it in a different and incomparable way, by considering the case of a single random oracle with bounded advice.

6.2 Candidate construction

Our main candidate construction of a OWF (Construction 13) relies on the hardness of average-case 3SUM-Indexing. First, we define what hardness means, and then give the construction and proofs.

Definition 12.

Average-case 3SUM-Indexing is (S, T, ε)-hard if the success probability of any (S, T) algorithm in answering average-case queries is at most ε. (The probability is over the randomness of A_1, A_2, the input a, and the average-case 3SUM-Indexing query.)

Construction 13.

Let N be an integer, let G be an abelian group with |G| ≥ N^{4+ε} for some constant ε > 0, and let h : [N] → G be a random oracle.

Our candidate OWF construction has two components:

  • the function f^h : [N] × [N] → G defined as follows, where + denotes G’s operation: f^h(x_1, x_2) = h(x_1) + h(x_2);

  • the input distribution (x_1, x_2), where x_1 is uniformly random in [N] and x_2 is uniformly random in [N] \ {x_1} (a toy instantiation follows below).
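The following toy instantiation (ours; the concrete group Z_C and the lookup-table oracle are illustrative assumptions) makes the connection to 3SUM-Indexing explicit:

```python
import random

N = 2 ** 10
C = N ** 5                 # |G| = C comfortably above N^4, as the analysis requires
G_add = lambda u, v: (u + v) % C

# The "random oracle" h : [N] -> G, here just a table of uniform group elements.
h = [random.randrange(C) for _ in range(N)]

def f(x1, x2):
    """Candidate OWF of Construction 13: f^h(x1, x2) = h(x1) + h(x2) in G."""
    return G_add(h[x1], h[x2])

# Sampling the input distribution of Construction 13.
x1 = random.randrange(N)
x2 = random.choice([x for x in range(N) if x != x1])
y = f(x1, x2)
# Inverting f on y is exactly answering the 3SUM-Indexing query y on the input
# (h(0), ..., h(N-1)); a backdoor for h is a preprocessed data structure for it.
```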

Remark 8 (Approximate sampling).

For convenience, our candidate OWF is defined with respect to an input distribution which is not uniform over the domain of f^h. While this may seem restrictive, as long as the input distribution is efficiently samplable, a standard construction shows how to transform a one-way function of this type into a one-way function which operates on uniformly random bit strings (see [Gol01, Section 2.4.2]).

In our case, since the support size of the input distribution is not a power of two (for general N), the distribution in Construction 13 cannot be sampled exactly from uniform bits in time polynomial in n. However, using rejection sampling, it is easy to construct a sampler taking poly(n) random bits as input whose output distribution is negligibly close to the target distribution in statistical distance. It is easy to propagate this (negligibly) small sampling error without affecting the conclusion of Theorem 14.

This is similar to what happens with one-way functions based on the hardness of factoring, which require sampling integers from a range whose size is not necessarily a power of two.
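A minimal rejection sampler of the kind Remark 8 appeals to (our sketch; truncating the loop after n failed draws leaves only a 2^{-n} statistical error):

```python
import secrets

def sample_uniform(C):
    """Sample uniformly from {0, ..., C-1} using unbiased random bits.
    Each k-bit draw lands in range with probability > 1/2, so the expected
    number of iterations is less than 2."""
    k = (C - 1).bit_length()
    while True:
        r = secrets.randbits(k)
        if r < C:
            return r
```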

Remark 9.

Similarly, the oracle h used in the construction is not a random oracle in the traditional sense, since its domain and co-domain are not sets of bit strings. If N and |G| are powers of two, then h can be implemented exactly by a standard random oracle mapping bit strings to bit strings. If not, then using a standard random oracle together with rejection sampling, it is possible to implement an oracle which is negligibly close to h in statistical distance. We can similarly propagate this sampling error without affecting the conclusion of Theorem 14.

Theorem 14.

Consider a sequence of abelian groups G(n) with |G(n)| ≥ N^{4+ε} for some constant ε > 0 and all n (where N = 2^n), and a space bound S = S(n). Assume that for every polynomial T there exists a negligible function ν(n) such that average-case 3SUM-Indexing over G(n) is (S, T, ν(n))-hard (recall that n = log N). Then the function f^h defined in Construction 13 is a one-way function in the random oracle model with preprocessing.

The function f^h in Construction 13 is designed precisely so that inverting f^h is equivalent to solving 3SUM-Indexing on the input (h(1), ..., h(N)). However, the one-way function inversion game and average-case 3SUM-Indexing are not exactly identical. Indeed, in the one-way function inversion game, the input to the adversary is the random variable f^h(x_1, x_2), which is distributed as the sum of a uniformly random pair of input elements; in contrast, in average-case 3SUM-Indexing, the input to the adversary is uniform over A + A. These two input distributions are not identical if there are collisions: pairs {i, j} ≠ {k, l} such that a_i + a_j = a_k + a_l. The following two lemmas show that whenever |G| ≥ N^{4+ε} for some ε > 0, there are sufficiently few collisions that the two input distributions are negligibly close in statistical distance, which is sufficient to prove Theorem 14.

Lemma 15.

Let N be an integer and let G be an abelian group with |G| ≥ N^{4+ε} for some ε > 0. Let a = (a_1, ..., a_N) be a tuple of elements drawn uniformly at random and with replacement from G. Define the following two random variables:

  • X = a_i + a_j, where {i, j} is a uniformly random set of size 2;

  • Y: uniformly random in A + A.

Then the statistical distance satisfies Δ(X, Y) = O(N^{−ε}).

Proof.

First, by conditioning on the realization of the set of pairwise sums A + A:

Δ(X, Y) ≤ Σ_A Pr[A + A = A] · Δ(X | A + A = A, Y | A + A = A),     (5)

where X | A + A = A denotes the distribution of X conditioned on the event A + A = A, and similarly for Y.

We now focus on a single term, corresponding to a realization A of the set of pairwise sums, and for b ∈ A we let r_b denote the number of pairs {i, j} whose sum is b. This yields an explicit expression for the conditional distance in terms of the multiplicities r_b.

Observe that the term corresponding to b vanishes whenever r_b = 1. We now assume that most multiplicities equal 1 (we will later only use the following derivation under this assumption). Splitting the sum according to whether r_b = 1:

where we used the trivial upper bound