The Element Extraction Problem and the Cost of Determinism and Limited Adaptivity in Linear Queries

07/13/2021 ∙ by Amit Chakrabarti, et al. ∙ Dartmouth College 0

Two widely-used computational paradigms for sublinear algorithms are using linear measurements to perform computations on a high dimensional input and using structured queries to access a massive input. Typically, algorithms in the former paradigm are non-adaptive whereas those in the latter are highly adaptive. This work studies the fundamental search problem of element-extraction in a query model that combines both: linear measurements with bounded adaptivity. In the element-extraction problem, one is given a nonzero vector 𝐳 = (z_1,…,z_n) ∈{0,1}^n and must report an index i where z_i = 1. The input can be accessed using arbitrary linear functions of it with coefficients in some ring. This problem admits an efficient nonadaptive randomized solution (through the well known technique of ℓ_0-sampling) and an efficient fully adaptive deterministic solution (through binary search). We prove that when confined to only k rounds of adaptivity, a deterministic element-extraction algorithm must spend Ω(k (n^1/k -1)) queries, when working in the ring of integers modulo some fixed q. This matches the corresponding upper bound. For queries using integer arithmetic, we prove a 2-round Ω(√(n)) lower bound, also tight up to polylogarithmic factors. Our proofs reduce to classic problems in combinatorics, and take advantage of established results on the zero-sum problem as well as recent improvements to the sunflower lemma.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

page 1

page 2

page 3

page 4

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Determinism versus randomization in algorithm design is a fundamental concern in computer science and is the topic of a great many works in complexity theory. In “space-constrained” models such as communication complexity and data streaming, basic results show that derandomization can entail an exponential or worse blow-up in cost. For instance, in the two-party communication setting, the very basic -bit equality problem admits a bounded-error randomized protocol with only communication ( if restricted to private coins), whereas its deterministic communication complexity is as large as it gets, namely . In the data streaming setting, the similarly basic distinct-elements problem admits a one-pass bounded-error randomized algorithm that uses space to provide a -approximation [KNW10], whereas a deterministic algorithm would require space, even if multiple passes and large approximation factors are allowed [CK16]. In this work, we explore such a price-of-determinism phenomenon in the query complexity world, for a similarly basic search problem.

The focus of our study is a search problem that we call element-extraction (henceforth, elemx), where the input is a set , promised to be nonempty, and the goal is to extract any element from . Formally, this is a total search problem given by the relation , where

(1)

As is often the case, the natural correspondence between sets in and vectors in will be useful. Indeed, we shall freely switch between these two viewpoints, using the notational convention that uppercase letters denote sets and their corresponding lowercase boldface variants denote characteristic vectors. Thus, we can also formalize elemx as

(2)

The goal of an algorithm solving elemx is to produce an output such that

: with certainty in the deterministic setting, and with probability

(say) in the randomized setting. In other words, the algorithm must produce a witness of the nonemptiness of . To do so, the algorithm may access (equivalently, ) using linear queries, as we shall now explain.

In a Boolean decision tree model, an algorithm may only access the input vector by querying its individual bits. In such a setting, there is not much to say about

elemx: even randomized algorithms are easily seen to require queries. But things get interesting if we allow more powerful queries: specifically, linear ones. Let us define a linear query protocol over domain (a -LQP, for short) to be a query protocol wherein each query is an evaluation of a linear form , where each . The domain should be thought of a “reasonable” subset of a ring containing —e.g., a finite field, or integers with bounded absolute value—and the linear functions will be evaluated in the underlying ring. The cost of an LQP is the number of linear form evaluations used.333Note that this is somewhat lower than the number of bits needed to encode the output of the queries. In this work we particularly care about the amount of adaptivity in an LQP, which quantifies the extent to which each query depends on the outcomes of previous queries.

To set the stage, we recall the problem of -sampling [FIS08, CF14], from the world of sketching and streaming algorithms. The goal of -sampling is to sample a pair from a nonzero input vector (say), so that and is distributed nearly uniformly on the support of . This is a fundamental primitive, used as a low-level subroutine in a wide range of applications in streaming and other “big data” algorithms. There are several solutions to this problem [CF14], most of which provide a linear sketching scheme, wherein one computes for a certain random matrix and then runs a recovery algorithm on the low-dimensional vector to produce the desired sample. Notice that if the input is a vector , such a scheme provides a randomized LQP for (allowing a small probability of error). In particular, using the optimal -sampling sketch of Jowhari, Sağlam, and Tardos [JST11], we obtain a -LQP that makes queries, using coefficients in , and has the pleasing property of being non-adaptive. We can also obtain a -LQP that makes queries;444Throughout this paper, “” denotes the base- logarithm. details in Section 6.

Turning to the deterministic setting—our main focus in this paper—it is easy to show that a non-adaptive -LQP for must make queries, for basic information-theoretic reasons. For completeness, we give the proof in Proposition 1.5. However, this heavy determinism penalty disappears upon moving to general deterministic LQPs, where we can use adaptivity. Indeed, a simple binary search strategy leads to a -LQP that makes queries, using coefficients in . We can refine this observation to trade off the query complexity for amount of adaptivity. This brings us to our central concept.

Define a -round LQP to be one where the queries are made in batches that we call rounds: the collection of linear forms defining the queries in round depend only on the results of queries made in rounds (a formal definition appears in Section 2). Then, a natural generalization of the binary search strategy provides a -round -LQP for elemx, using coefficients in , making at most queries in total. When we are additionally promised that , where addition is performed in the ring , then this algorithm also works as a -LQP; details in Section 6. Notice that

-round LQPs naturally interpolate between linear sketches at one extreme (when

) and linear decision trees at the other (when ).

The most important message of this paper is that the above rounds-versus-queries tradeoff is asymptotically tight for deterministic linear query protocols for elemx, in several natural settings. We state our results informally for now, with formal statements given after the necessary definitions and preliminaries.

1.1 Our Results and Techniques

We shall study -LQPs for the domains , the ring of integers modulo (with ) as well as , but with coefficients of small magnitude (at most , say). Such restrictions on the coefficients are necessary, because allowing arbitrary integer coefficients makes it possible to recover the entire input with the single query .

When , for small , solving elemx without the promise that is hard, regardless of the number of rounds. Intuitively, there is no cheap way to deterministically verify that a subset indeed contains an index where . Defining the “cost” of an LQP to be the number of queries it makes in the worst case (formally defined in Section 2), we obtain the following not-too-hard results.

Proposition 1.1.

Every deterministic -LQP for has cost , which is optimal.

Proposition 1.2.

For , every deterministic -LQP for has cost .

As noted earlier, adding the promise that permits a more efficient -round deterministic algorithm. For each integer , define to be the version of where we are given the stronger promise that under arithmetic in . Equivalently, using set notation, we are promised that . We prove the following results, using similar round-elimination arguments.

Theorem 1.3.

Every deterministic -round -LQP for has cost .

Theorem 1.4.

Every deterministic -round -LQP for has cost .

Although Theorem 1.4 subsumes Theorem 1.3 in the asymptotic sense, we find it useful to present the former result in full, first, to lay the groundwork for our subsequent lower bound proofs. As we shall see, the fact that is a field leads to an especially clean execution of the round elimination strategy. Note also that a weaker form of Theorem 1.3 follows from existing work on formula size-depth tradeoffs (see Section 7); however, the resulting proof, once fully unrolled, is considerably more complex than our direct argument.

At a high level, a lower bound proof based on round elimination works as follows. We consider a hypothetical -round protocol for -dimensional instances of some problem that does not incur much cost in its first round. Based on this low cost, we extract a -round protocol for -dimensional instances of by “lifting” these smaller instances to special -dimensional instances on which the -round protocol essentially “wastes” its first round. If we can carry out this argument while ensuring that the shrinkage from to is not too drastic, then a too-cheap -round protocol will eventually give us a -round protocol for a nontrivial instance dimension, leading to a contradiction.

In the proofs of the above two theorems, this strategy is executed by identifying a large collection of pairwise disjoint sets that are treated identically in the protocol’s first round. Viewing these sets as blocks of indices within , we consider block-structured instances of and proceed to lift general instances of into these block-structured ones. In Theorem 1.3, these blocks arise from elementary linear algebraic considerations. In Theorem 1.4, the fact that inputs are in instead of necessitates a brief excursion into additive combinatorics.

Finally, we consider LQPs over , the ring of all integers, but with bounds on the magnitude of coefficients (which, as we noted earlier, is necessary in order to have nontrivial results). To be precise, we consider domains of the form . While we are unable to prove a full tradeoff lower bound in this case, we do obtain a near-optimal result for rounds.

Proposition 1.5.

Every deterministic -round -LQP for costs .

Theorem 1.6.

Every deterministic -round -LQP for costs .

The former result is straightforward, based on the simple observation that such an LQP can extract the entire input followed by basic information theoretic considerations. Incidentally, the problem of extracting all of using -LQPs has a long history as the coin weighing problem, for which a 1-round algorithm exists; see Section 1.2.

The significant result here is the latter. It again uses a round elimination strategy and, as before, the bird’s-eye view is that we identify disjoint blocks of indices to engineer a suitable lifting. This time, the blocks arise out of extremal combinatorics considerations, specifically the sunflower lemma, in its recently strengthened form [Rao20]. Furthermore, upon carrying out this round elimination, we are left with a -round LQP that solves elemx only under a cardinality constraint on the input set. To finish the proof, we must demonstrate hardness even for this special case. This is not as straightforward as Proposition 1.5: our argument to handle this hinges on the Frankl–Wilson theorem [FW81] on set systems with forbidden intersection sizes.

Attempts to extend the above proof outline to handle more than two rounds runs into technical issues of integer divisibility. We suspect that this is an artifact of our proof machinery and not inherent to the problem. We conjecture that every deterministic -round -LQP requires cost , suppressing polylogarithmic factors. Indeed, we believe that much more is true, and that a communication complexity analogue of such a tradeoff also holds. We shall take this up after a discussion of related work.

1.2 Related Work and Connections

Our work touches upon several themes with long histories of study in computer science: determinism versus randomization, adaptivity versus non-adaptivity, sublinear algorithms, and input access through structured queries. With these connections in mind, we recall a small number of works that are either close in spirit to ours or shed light on some aspect of this work.

The most basic query model is the Boolean decision tree. In this setting, deterministic and randomized complexities are polynomially related for total Boolean functions [BdW02, ABB17], whereas arbitrarily large gaps are possible for search problems [LNNW95]. Parity decision trees—equivalent to our -LQPs—have been studied in several works (e.g., [ZS10, HHL18] and the references therein), usually for Boolean functions and focusing on connections with communication complexity of XOR-composed functions. Beyond the Boolean—or indeed the discrete—setting lie linear decision trees, where the input is a real vector and one can query the sign of a linear form [DL78, KLM19]. All such “decision tree” models are fully adaptive and the vast majority of works using them do not focus on amount of adaptivity as a resource.

At the other extreme is the (nonadaptive) linear sketching model, where a high-dimensional input is accessed through one batch of linear queries (equivalently, through a low-dimensional sketch of it produced by a sketching matrix). This paradigm is ubiquitous in data streaming algorithms and compressed sensing [Mut05, Don06, Woo14, CY20] and has connections to dimension reduction and metric embeddings. Some recent work carries the message that linear sketching might be a complete paradigm for a large class of data streaming algorithms [LNW14] and certain communication protocols [KMSY18, HLY19]. Most work on linear sketching considers randomized sketches, since determinism often precludes sublinear cost.

Turning to determinism, the well-studied coin weighing problem, put in our terms, asks for a -LQP that retrieves the entire input . It has long been known that nonadaptive queries are necessary and sufficient. Special cases and variants of this problem have been studied over the years; see [ER63] for some early history and [Bsh09, MK13] for recent history. While some of these works consider adaptive LQPs, there is no strong rounds-vs-queries tradeoff for this problem, which is harder than elemx.

The body of work on round complexity under linear queries is much smaller. There is one recent work very close to ours: Assadi, Chakrabarty, and Khanna [ACK20] studied a problem very similar to elemx that they called single-element-recovery, where the input is a vector , and by applying -linear queries one wishes to recovery an arbitrary element from the support of . While their query model is much stronger than our -linear or -linear queries, it is balanced by the -valued inputs that prevent tricks to recover the entire input in one query. Their main theorem implies that the deterministic -round search algorithm making roughly queries in total—very similar to Algorithm 1—has cost exactly matching the lower bound. Linear queries and adaptivity are also featured together in some work on sparse recovery problems. One such problem is to find an approximately closest -sparse vector to an input , using -linear queries to the input and rounds of adaptivity. For this, [PW13] have proven near optimal lower bounds of when and [KP19] have extended them to small , proving queries are needed when .

A number of works consider rounds of adaptivity in query models beyond linear queries. Recent examples include works on maximizing submodular functions through adaptive oracle queries [BS18]; on adaptivity hierarchy theorems in property testing [CG18]; on identifying biased coins through pairwise comparisons or in multi-armed bandit settings [AAAK17]; and on finding approximately maximum bipartite matchings through demand queries and OR-queries [Nis21]. Other works have studied adaptivity in the massive parallel communication/computation (MPC) model [BKS17] and in various graph query models [AB19, BHPSR20].

A rich body of work on cost/adaptivity tradeoffs is found in communication complexity, where adaptivity manifests as rounds of interaction. An early work [NW93] gave exponential separations between and rounds for all and introduced a round elimination paradigm that remains ubiquitous to this day. This work also explains how an earlier result [KW90] connecting circuit and communication complexities can be used to relate bounded-round communication complexity for a specific problem to the size of bounded-depth, unbounded fan-in formulas. More work has been spurred by applications of bounded-round communication lower bounds in data structures, where they provide lower bounds in the cell-probe model [MNSW98, Sen03, CR04, PT07, LPY16]; and in streaming algorithms, where they translate naturally to tradeoffs between the number of passes made over the input stream and the working memory required [GM09, ER14, GO16, CCM16, CW16]. In much of this body of work, round elimination is performed using information theoretic arguments that naturally provide lower bounds against randomized algorithms.

In contrast, it is rare to see deterministic tradeoffs where corresponding randomized ones do not hold because randomization makes the problem “too easy.” This is exactly the situation with elemx, as shown by this work in the context of the randomized upper bounds (Section 6) via -sampling [JST11]. In light of the preceding discussion, our instantiations of round elimination must use techniques beyond Shannon-style information theory. They indeed do. Our techniques therefore have the potential for further use in separating determinism from randomization in this fine-grained (round aware) sense.

Our query complexity results on elemx suggest a tantalizing communication complexity analogue. Let denote555The notation, due to Nelson and Yu [NY19], is to be read as “universal relation with a subset constraint.” the communication complexity problem where Alice and Bob receive sets respectively with the promise that , and their goal is to produce an element in . Clearly, a -round query protocol for elemx making queries, with each answer lying in a set of size , provides a -round communication protocol for using at most bits. Therefore, our results here would be subsumed, in an asymptotic sense, if one could resolve the following conjecture positively.

Conjecture 1.7.

Every deterministic -round communication protocol for costs bits, suppressing polylogarithmic factors.

We find the above conjecture compelling because it would demonstrate a new phenomenon in communication complexity, where a problem is easy for one-round randomized and for interactive deterministic protocols, but exhibits a nontrivial tradeoff for bounded-round deterministic ones.

In passing, we note that the problem was introduced in [KNP17] where its randomized communication complexity was studied. The randomized lower bound was subsequently used by Nelson and Yu [NY19] to prove the optimality of Ahn, Guha, and McGregor’s graph sketching algorithm for graph connectivity [AGM12]. An outstanding open question about the latter problem (viewed as a communication problem where players, each holding a vertex neighborhood, talk to a coordinator who determines whether the graph is connected) is whether it admits a deterministic algorithm with sublinear communication. A better understanding of in the deterministic setting could be key to addressing this question.

There are also two problems similar to for which lower bounds have already been proven. The universal relation problem ur gives Alice and Bob unequal sets and asks them to produce an element . This has deterministic communication complexity  [TZ97]. The Karchmer-Wigderson game for is the problem ur with the additional constraints that be even and

be odd; existing circuit complexity results 

[Has86, Ros15] imply, as briefly explained in Section 7, that -round deterministic communication protocols for this require bits of communication.

2 Preliminaries

Throughout the paper, we shall freely switch between the equivalent viewpoints of sets in and vectors in , using the notational convention that when an uppercase letter (e.g., ) denotes a set, the corresponding lowercase boldface letter (e.g., ) denotes the characteristic vector of that set and vice versa.

2.1 Various Definitions

The search problem was already formally defined in eq. 1. We shall also work with special cases of this problem, where the cardinality of the input set is further restricted in some way. These are formalized as follows: we define

(3)
(4)
(5)
Definition 2.1 (Protocol).

Let be a search problem with input space and output space . A deterministic -round -linear query protocol (-LQP), , on this input space is a rooted tree of depth where each internal node is labeled with a matrix ; each leaf node with an output ; and the edges from a node to its children are labeled with the elements of bijectively. The quantity of node is the cost of the node, sometimes also denoted . Given an input , the measurement at internal node is . The transcript of on —denoted —is the unique root-to-leaf path obtained by walking along the edges determined by these measurements; the th measurement is the label of the th edge on this path; and the output is the label of the leaf reached by this path. We say that solves if for every input .

Since this paper is largely focused on deterministic complexity, henceforth we shall assume that all LQPs are deterministic unless stated otherwise.

Definition 2.2 (Cost).

The query cost of a protocol is:

which is, informally, the number of linear queries performed when executes on . While we do not focus on bit complexity in this paper, it is worth noting that to make an information-theoretically fair comparison between different domains, one should consider the number of bits returned in response to all the queries. This number may be larger than , though only by an factor for with , and not at all for .

Definition 2.3 (Complexity).

The -linear query complexity and -round -linear query complexity of a search problem are defined, respectively, to be

2.2 Useful Results from Combinatorics

In the course of this paper, we will use several important theorems from combinatorics. For results on -LQPs (proved in Section 4), we use the following result of van Emde Boas and Kruyswijk [vEBK69] on zero sumsets, slightly reworded to use modern notation.

Theorem 2.4 ([vEBK69]).

Let be a finite abelian group with exponent666The exponent of a group is the least common multiple of the orders of its elements. and order . Let be the minimal positive integer for which any sequence of elements from has a nonempty subsequence which sums to zero. Then . ∎

A stronger result that applies when and is a prime power [Ols69]; it is conjectured that the prime-power constraint is unnecessary [GG06, conjecture 3.5].

When working over (in Section 5), we use the well-known notion of a sunflower and the following recent result of Rao [Rao20], which refines the noted result of Alweiss, Lovett, Wu, and Zhang [ALWZ20] that improved the classic sunflower lemma of Erdős and Rado [ER60]. The note [BCW20] further improves Rao’s bound by replacing the factor with , but this will not affect our proof. Tao [Tao20] gives an alternative presentation of Rao’s result which may be simpler to follow.

In a different part of our argument, we will need a well known theorem of Frankl and Wilson [FW81].

Theorem 2.5 (Rao).

There is a universal constant such that every family of more than sets, each of cardinality , must contain a -sunflower, defined as a family of distinct sets whose pairwise intersections are identical. ∎

Theorem 2.6 (Frankl–Wilson).

Let be the largest size of a collection of subsets of for which no two elements have intersection size . Then, if is a prime power:

2.3 Our Round Elimination Framework

We now describe a framework for our round elimination arguments. For this section, we shall work over a general ring (with unity), , and “LQP” will mean a -LQP where . Fix this ring .

Definition 2.7 (Homomorphism and shadowing).

A protocol homomorphism is a map from a protocol to a protocol such that (i) for any two nodes in , the node is a child of iff is a child of , and (ii)  maps leaves of to leaves of . We say that is cost-preserving for each internal node of , . We say that shadows through if is injective, cost-preserving, and maps the root of to a child of the root of . Notice that when this is the case, is one round shorter than .

Suppose we have an LQP that operates on inputs in and produces outputs in . Further, suppose is a collection of pairwise disjoint nonempty sets. We then define a certain LQP operating on inputs in and producing outputs in . To aid intuition, we describe the construction procedurally in Algorithm 1.

1:Lift our input to (this step is only conceptual).
2:Mimic by simulating the queries it would have made to its input . Emulate each such query by making the corresponding query to our own input . This is indeed possible using linear queries to .
3:Suppose wants to output . If , then output that index (which must be unique); otherwise, output an arbitrary index.
Algorithm 1  Outline of protocol

To define formally, we first define the lifting matrix

(6)

whose entries lie in and which maps the input space of to the input space of according to creftype 1, thanks to the pairwise disjointness of the sets . At a given node of , labeled with , the simulation in creftype 2 would retrieve the measurement . The protocol can get the same result by making the query .

Thus, the protocol tree for is formed as follows. Prepare a copy of and let be the natural bijection between their nodes. Label each internal node of with . Copy over all edge labels from to . For each leaf of , if , then assign label . If no such exists, assign (say). This labeling is well defined because of the pairwise disjointness of the sets .

In the sequel, to perform round elimination, we shall use the construction of in a special way that we record in the lemma below. We also record a definition that will be relevant when invoking the lemma.

Lemma 2.8.

Suppose that correctly solves on inputs in . Let be pairwise disjoint and let be defined by eq. 6. Let be the root node of and, for , let and . Then, there is a protocol that shadows and correctly solves on each input in .

Proof.

Using the above setup and terminology, construct as in Algorithm 1. The given conditions imply that on all inputs in , the first measurement of is always and thus leads an execution of to a particular child, , of its root node. Thus, we can shrink to the subprotocol rooted at . Notice that the bijection is a cost-preserving protocol homomorphism and so shadows through .

By construction, on input simulates on , an input on which correctly solves . Therefore, if outputs , then . By the disjointness guarantee, there exists a unique for which . As reports precisely this , it correctly solves on . ∎

Definition 2.9 (Uniform family).

Fix a matrix . An -uniform family of size is a collection of pairwise disjoint sets such that , for some vector .

3 Linear Queries Modulo 2

We begin our study of the element-extraction problem by considering -linear queries. As noted in Section 1.1, we shall later generalize the results to , but we feel it is worth seeing our framework in action in the especially clean setting of . We begin by showing that the additional promise of odd cardinality on the input set is crucial, or else there is no interesting rounds-vs-queries tradeoff to be had.

Proposition 3.1 (Restatement of Proposition 1.1).

.

Proof.

The upper bound is achieved by the trivial -round LQP (i.e., a sketch) that queries all but one of the individual bits of the input.

Now assume to the contrary that there is a -LQP with that solves . Let be the matrix whose rows represent all queries along the path . Then , whence there exist distinct nonzero vectors such that . Setting , we also have . Thus, the three nonzero inputs lead to the same leaf, namely , and produce the same output , say. By the correctness of , we have , which contradicts . ∎

Accordingly, for the rest of this section, we focus on the problem , as defined in eq. 3. We shall prove Theorem 1.3 using a round elimination technique. As discussed in Section 2, this round elimination will be enabled by identifying a certain -uniform family. The next lemma, establishing a useful fact about matrices over , will provide us this family.

Lemma 3.2.

Every matrix , admits an -uniform family of size such that each cardinality is odd.

Proof.

Let be the (nonzero) column vectors of the matrix

formed by appending the all-ones row to . For each , let be the collection of column vectors and let be the linear subspace of spanned by the vectors in .

Partition into nonempty disjoint sets iteratively, as follows. For each , let be a maximal subset of such that the vectors in are linearly independent. Since these vectors live in , it follows that . We stop when , implying .

We claim that, for each , we have . Indeed, if there exists an element , then there is a set for which . Since is closed under linear combinations and does not contain , there exists with . By construction, , so was not included in despite being available. This contradicts the maximality of .

Let be an index in . Then , so there must exist subsets of for which . The sets are pairwise disjoint because the sets are. Let be the first coordinates of ; then for all , . Therefore, is -uniform. Finally, since the last coordinate of is and the last row of is , for each , , so is odd. ∎

Lemma 3.3 (Round elimination lemma).

Let be a deterministic -round -LQP for , where . Then there exists a deterministic -round -LQP for , such that

shadows through a (cost-preserving, injective) protocol homomorphism ;

, where is the cost of the root of .

Proof.

Let be the label of the root of . Let be an -uniform family of size with each odd, as guaranteed by Lemma 3.2. Let the lifting matrix be as given by eq. 6 and let . We know that correctly solves on inputs in odd. Invoking Lemma 2.8, we obtain a -round -LQP that shadows as required.

It remains to show that solves . The guarantee of Lemma 2.8 is that correctly solves on the input set defined there. Thus, it suffices to show that if an input satisfies the promise of —i.e., is odd—then . We reason as follows:

and

This completes the proof, by definition of . ∎

The next step of the proof is to repeatedly invoke the above round elimination lemma and carefully control parameters. To perform a sharp analysis, we introduce the following concept.

Definition 3.4.

A division sequence for is a finite sequence of positive integers for which

(7)
Lemma 3.5.

Let be a division sequence for minimizing . Then

Proof.

For the upper bound, let . For the lower bound, remove the ceiling operations in eq. 7 to get

By the AM-GM inequality,

This brings us to the main result of this section: a rounds-vs-queries tradeoff.

Theorem 3.6 (Restatement of Theorem 1.3).

.

Proof.

Suppose that is a deterministic -round -LQP for . Repeatedly applying Lemma 3.3, we obtain a sequence of protocols , which solve on progressively smaller input sizes, until is a degenerate depth-0 protocol (in which no queries occur).

Let be the cost of the root of , for . As Lemma 3.3 gives protocol homomorphisms , we find the the roots of each correspond to nodes in . In fact, the vertices form a path from the root of to the leaf . The inputs of lift to inputs of which reach . Lower bounding the query cost of using this branch gives

(8)

Using lemma 3.3 repeatedly, must solve , for some integer

However, as solves without performing any queries, there must be a fixed index which is a valid output for all inputs of odd size. This is only possible when ; for any larger , the inputs and must produce different outputs.

Therefore, the integers form a division sequence for . Applying Lemma 3.5 to eq. 8,

where the last inequality follows from the fact that for all . ∎

4 Linear Queries Modulo q

First, we use Theorem 2.4 to show that is hard for -LQPs.

Proposition 4.1 (Restatement of Proposition 1.2).

For every , we have .

Proof.

This is proven with the same strategy as for Proposition 1.1. Assume for sake of contradiction that . Let be the leaf . Let be the matrix containing all queries along the path from the root of to .

By Theorem 2.4, since the group has order , and exponent , any sequence of elements in has a nontrivial subsequence summing to . As , . Thus, since , picking disjoint subsets and of sizes each, and applying the theorem implies there exist disjoint nonempty subsets and of for which the corresponding columns of sum to . In other words, reaches the same leaf given and , but the leaf cannot be assigned an output consistent with both. ∎

A similar strategy proves a lemma analogous to Lemma 3.2:

Lemma 4.2.

Every matrix , admits an -uniform family where

, and

each cardinality .

Proof.

To be able to enforce constraints on the values , we define , and let be its column vectors. We partition the columns of the matrix into disjoint subsets of by the following iterative procedure. In the procedure, let be the set of indices of not yet chosen. Each set starts out as ; then beginning with , each set is expanded by picking an index from for which no subset has the property that ; adding to and removing from ; until no more such indices can be found. When is done, start filling , etc.

When , each corresponds to a basis of a subspace of , so . For , we apply Theorem 2.4, using the fact that the group has order and exponent . The maximum possible size of each set is then . The upper bound also holds here. Consequently, the number of sets formed is . Pick some ; for any , since was not picked when was constructed, it must be the case that there is a subset for which . This implies . Since the first row of is , we have , so all the sets have size . Let be the last entries of ; then for all , . There are sets in total. ∎

Compared to , there is a slight weakening of the main round elimination lemma, which is a direct consequence of the weakened Lemma 4.2. Instead of directly lower bounding the cost of , we prove separate lower bounds for each , for all , and take their maximum. The search problem is with the additional promise that the input set has size .

Lemma 4.3 (Round elimination lemma).

Let be a -round -LQP for , where and . Then there exists a -round -LQP for , such that

shadows through a protocol homomorphism ;

, where is the cost of the root of .

Proof.

Let be the label of the root of . Lemma 4.2 guarantees that there exists an -uniform family of size , where satisfies lemma 4.3, and , and . Let be the lifting matrix from eq. 6, and . Applying Lemma 2.8 to , and , we obtain a -round -LQP that shadows , and solves on all inputs for which and . If fulfills the promise of , that , then: