Hard properties with (very) short PCPPs and their applications

We show that there exist properties that are maximally hard for testing, while still admitting PCPPs with a proof size very close to linear. Specifically, for every fixed ℓ, we construct a property P^(ℓ)⊆{0,1}^n satisfying the following: Any testing algorithm for P^(ℓ) requires Ω(n) many queries, and yet P^(ℓ) has a constant-query PCPP whose proof size is O(n·log^(ℓ)n), where log^(ℓ) denotes the ℓ times iterated log function (e.g., log^(2)n = loglog n). The best previously known upper bound on the PCPP proof size for a maximally hard to test property was O(n·polylog n). As an immediate application, we obtain stronger separations between the standard testing model and both the tolerant testing model and the erasure-resilient testing model: for every fixed ℓ, we construct a property that has a constant-query tester, but requires Ω(n/log^(ℓ)(n)) queries for every tolerant or erasure-resilient tester.


1 Introduction

Probabilistically checkable proofs (PCPs) are one of the landmark achievements in theoretical computer science. Loosely speaking, PCPs are proofs that can be verified by reading only a very small (i.e., constant) number of bits. Beyond the construction of highly efficient proof systems, PCPs have myriad applications, most notably within the field of hardness of approximation.

A closely related variant of PCPs, called probabilistically checkable proofs of proximity (PCPPs), was introduced independently by Ben-Sasson et al. [BGH06] and Dinur and Reingold [DR06]. In the PCPP setting, a verifier is given oracle access to both an input x and a proof π. It should make a small (e.g., constant) number of queries to both oracles to ascertain whether x belongs to a property P. Since the verifier can only read a few of the input bits, we only require that it rejects inputs that are far (in Hamming distance) from P, no matter what proof is provided. PCPPs are highly instrumental in the construction of standard PCPs. Indeed, using modern terminology, both the original algebraic construction of PCPs [ALM98] (see also [BGH06]) as well as Dinur's [Din07] combinatorial proof utilize PCPPs.

By combining the seminal works of Ben-Sasson and Sudan [BSS08] and Dinur [Din07], one can obtain PCPs and PCPPs with only a poly-logarithmic (multiplicative) overhead. More specifically, the usual benchmark for PCPPs is the Circuit Value problem, in which the verifier is given explicit access to a circuit C and oracle access to both an input x and a proof π, and needs to verify that x is close to the set of satisfying assignments of C. The works of [BSS08, Din07] yield a PCPP whose length is quasilinear in the size of the circuit C. (Note that a PCPP for this problem can easily be used to construct a PCP for circuit satisfiability with similar overhead; see [BGH06, Proposition 2.4].)

Given the important connections both to constructions of efficient proof systems and to hardness of approximation, a central question in the area is whether this result can be improved: Do PCPPs with only a constant overhead exist? In a recent work, Ben-Sasson et al. [BKK16] construct PCPs with constant overhead, albeit with very large query complexity (as well as a non-uniform verification procedure): to verify a statement of length n, the verifier needs to make n^ε queries, where ε > 0 can be any fixed constant. (Although it is not stated in [BKK16], we believe that their techniques can also yield PCPPs with similar parameters.)

Given the lack of success (despite significant interest) in constructing constant-query PCPPs with constant overhead, it may be the case that there exist languages that do not have such efficient PCPPs. A natural class of candidate languages are those for which it is maximally hard to test, without a proof, whether an input is in the language or far from it; in other words, languages (or rather properties) that do not admit sub-linear query testers. Thus, we investigate the following question:

Supposing that a property P requires Ω(n) queries for every (property) tester, must any constant-query PCPP for P have proof length n·polylog(n)?

1.1 Our Results

Our first main result answers the above question negatively, by constructing a property that is maximally hard for testing, while admitting a very short PCPP. For the exact theorem statement, we let log^(ℓ) denote the ℓ times iterated log function. That is, log^(1) n = log n and log^(ℓ) n = log(log^(ℓ−1) n) for ℓ > 1.

Theorem 1.1 (informal restatement of Theorem 5.2).

For every constant integer ℓ, there exists a property P^(ℓ) ⊆ {0,1}^n such that any testing algorithm for P^(ℓ) requires Ω(n) many queries, while P^(ℓ) admits a (constant-query) PCPP system with proof length O(n·log^(ℓ) n).

We remark that all such maximally hard properties cannot have constant-query PCPP proof systems with a sub-linear length proof string (see Proposition 2.10), leaving only a small multiplicative gap of log^(ℓ) n on the proof length in Theorem 1.1.

Beyond demonstrating that PCPPs with extremely short proofs exist for some hard properties, we use Theorem 1.1 to derive several applications. We proceed to describe these applications next.

Tolerant Testing.

Recall that property testing (very much like PCPPs) deals with solving approximate decision problems. A tester for a property P is an algorithm that, given a sublinear number of queries to its input x, should accept (with high probability) if x ∈ P, and reject if x is far from P (where, unlike with PCPPs, the tester is not provided with any proof).

The standard setting of property testing is arguably fragile, since the testing algorithm is only guaranteed to accept all inputs that exactly satisfy the property. In various settings and applications, accepting only inputs that exactly have a certain property is too restrictive, and it is more beneficial to distinguish between inputs that are close to having the property and those that are far from it. To address this, Parnas, Ron and Rubinfeld [PRR06] introduced a natural generalization of property testing, in which the algorithm is required to accept inputs that are close to the property. Namely, for parameters 0 ≤ ε₁ < ε₂ ≤ 1, an (ε₁, ε₂)-tolerant testing algorithm is given oracle access to the input, and is required to determine (with high probability) whether a given input is ε₁-close to the property or whether it is ε₂-far from it. As observed in [PRR06], any standard testing algorithm whose queries are uniformly (but not necessarily independently) distributed is inherently tolerant to some extent. Nevertheless, for many problems, strengthening the tolerance requires applying advanced methods and devising new algorithms (see e.g., [FN07, KS09, CGR13, BMR16, BCE18]). As a toy illustration of the definition, consider the sketch below.
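To make the tolerant-testing definition concrete, here is a minimal Python sketch, not taken from the paper, of an (ε₁, ε₂)-tolerant tester for the toy property {0^n}; the distance from this property is just the fraction of 1s, and the sample bound and acceptance threshold below are our own illustrative choices.

    import random

    def tolerant_zero_tester(x, eps1, eps2):
        """Toy (eps1, eps2)-tolerant tester for the property {0^n}.

        The distance of x from 0^n is the fraction of 1s, so we estimate it
        by sampling and compare it against the midpoint (eps1 + eps2) / 2.
        A Chernoff bound shows O(1/(eps2 - eps1)^2) samples suffice for
        constant confidence; the query count is independent of len(x).
        """
        gap = eps2 - eps1
        num_queries = int(12 / gap ** 2) + 1  # crude constant, for illustration
        ones = sum(x[random.randrange(len(x))] for _ in range(num_queries))
        return ones / num_queries <= (eps1 + eps2) / 2  # True = accept

    # A string that is 0.05-close to 0^n should usually be accepted by a
    # (0.1, 0.3)-tolerant tester.
    n = 10_000
    x = [1 if random.random() < 0.05 else 0 for _ in range(n)]
    print(tolerant_zero_tester(x, 0.1, 0.3))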

It is natural to ask whether tolerant testing is strictly harder than standard testing. This question was explicitly studied by Fischer and Fortnow [FF06], who used PCPPs with polynomial size proofs to show that there exists a property that admits a tester with constant query complexity, but such that every tolerant tester for it has query complexity n^α for some α > 0. Using modern quasilinear PCPPs [BSS08, Din07] in combination with the techniques of [FF06], it is possible to construct a property demonstrating a better separation: constant query complexity for standard testing versus Ω(n/polylog n) for tolerant testing.

Using Theorem 1.1 we can obtain an improved separation between testing and tolerant testing:

Theorem 1.2 (informal restatement of Theorem 6.1).

For any constant integer ℓ, there exist a property P^(ℓ) of boolean strings and a constant c > 0 such that P^(ℓ) is ε-testable for any ε > 0 with a number of queries independent of n, but for any 0 < ε₁ < ε₂ ≤ c, every (ε₁, ε₂)-tolerant tester for P^(ℓ) requires Ω(n/log^(ℓ) n) many queries.

Erasure-Resilient Testing.

Another variant of the property testing model is the erasure-resilient testing model. This model was defined by Dixit et al. [DRTV18] to address cases where data cannot be accessed at some domain points due to privacy concerns, or when some of the values were adversarially erased. More precisely, an α-erasure-resilient ε-tester gets as input parameters α, ε ∈ (0,1), as well as oracle access to a function f, at most an α fraction of whose values have been erased. The tester has to accept with high probability if there is a way to assign values to the erased points of f such that the resulting function satisfies the desired property. The tester has to reject with high probability if for every assignment of values to the erased points, the resulting function is still ε-far from the desired property.

Similarly to the tolerant testing scenario, PCPPs were also used in [DRTV18] to show that there exists a property of boolean strings of length n that has a tester with query complexity independent of n, but for any constant α > 0, every α-erasure-resilient tester is required to query n^β many bits for some β > 0, thereby establishing a separation between the models. Later, in [RRV19], PCPP constructions were used to provide a separation between the erasure-resilient testing model and the tolerant testing model.

Similarly to the tolerant testing case, we use Theorem 1.1 to prove a stronger separation between the erasure-resilient testing model and the standard testing model.

Theorem 1.3 (informal restatement of Theorem 6.2).

For any constant integer ℓ, there exist a property P^(ℓ) of boolean strings and a constant c > 0 such that P^(ℓ) is ε-testable for any ε > 0 with a number of queries independent of n, but for any α and ε such that α + ε ≤ c, any α-erasure-resilient ε-tester for P^(ℓ) is required to query Ω(n/log^(ℓ) n) many bits.

Secret Sharing applications.

As an additional application of our techniques, we also obtain a new type of secret sharing scheme. Recall that in a secret sharing scheme, a secret value is shared between n parties in such a way that only an authorized subset of the parties can recover the secret. We construct a secret sharing scheme in which no sufficiently small subset of parties can recover the secret, and yet it is possible for each one of the parties to recover the secret if given access to a PCPP-like proof, with the guarantee that no matter what proof-string is given, most parties will either recover the correct secret or reject.

We obtain such a secret sharing scheme through a notion called Probabilistically Checkable Unveiling of a Shared Secret (PCUSS), which will be central in our work. This notion is loosely described in Subsection 1.2 and formally defined in Section 4.

1.2 Techniques

Central to our construction are (univariate) polynomials over a finite field F. A basic fact is that a random polynomial of degree (say) |F|/2, evaluated at any set of at most |F|/2 points, looks exactly the same as a totally random function from F to F. This is despite the fact that a random function is very far (in Hamming distance) from the set of low degree polynomials. Indeed, this is the basic fact utilized by Shamir's secret sharing scheme [Sha79].
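The following self-contained Python snippet verifies this basic fact by brute force over a toy prime field (the paper works over binary fields; GF(5) is used here only to keep the enumeration small): restricted to any set of d + 1 = 3 points, the evaluations of a uniformly random polynomial of degree at most 2 realize every value pattern equally often.

    from itertools import product

    p, d = 5, 2          # field GF(5), polynomials of degree at most 2
    S = [0, 2, 3]        # any set of at most d + 1 = 3 evaluation points

    counts = {}
    for coeffs in product(range(p), repeat=d + 1):   # all p^(d+1) polynomials
        values = tuple(sum(c * pow(x, i, p) for i, c in enumerate(coeffs)) % p
                       for x in S)
        counts[values] = counts.get(values, 0) + 1

    # Every pattern in GF(5)^3 appears equally often, i.e. the restriction
    # to S is distributed exactly like a uniformly random function on S.
    print(len(counts), set(counts.values()))  # 125 patterns, all equally frequent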

Thus, the property of being a low degree polynomial is a hard problem to decide for any tester, in the sense that such a tester must make a linear number of queries to the truth table of the function in order to decide. Given that, it seems natural to start with this property in order to prove Theorem 1.1. Here we run into two difficulties. First, the property of being a low degree polynomial is defined over a large alphabet, whereas we seek a property over boolean strings. Second, the best known PCPPs for this property have quasi-linear length [BSS08], which falls short of our goal.

To cope with these difficulties, our approach is to use composition, or more accurately, an iterated construction. The main technical contribution of this paper lies in the mechanism enabling this iteration. More specifically, rather than having the property contain the explicit truth table of the low degree polynomial, we would like to use a more redundant representation for encoding each of its values. This encoding should have several properties:

  • It must be the case that one needs to read (almost) the entire encoding to be able to decode the encoded value. This feature of the encoding, which we view as a secret-sharing type of property, lets us obtain a hard to test property over boolean strings.

  • The encoding need not be efficient, and in fact it will be made long enough to eventually subsume the typical length of a proof-string for the low degree property, when calculated with respect to an unencoded input string.

  • Last but not least, we need the encoded value to be decodable using very few queries, when given access to an auxiliary PCPP-like proof string. This would allow us to “propagate” the verification of the property across iterations.

In more detail, we would like to devise a (randomized) encoding of strings in {0,1}^k by strings in {0,1}^n. The third requirement listed above can be interpreted as saying that given oracle access to an encoding x and explicit access to a value w, it will be possible to verify that x indeed encodes w using a PCPP-like scheme, i.e., by providing a proof that can be verified with a constant number of queries. We refer to this property as a probabilistically checkable unveiling (PCU); in fact, we will use a stronger variant where the access to w is also restricted. Note that in our setting a single value may (and usually will) have more than one valid encoding.

Going back to the first requirement of the encoding, we demand that without a proof, one must query linearly many bits of x to obtain any information about the encoded w, or even to discern that x is indeed a valid encoding of some value. Given this combination of requirements, we refer to the verification procedure as a Probabilistically Checkable Unveiling of a Shared Secret (PCUSS).

Low degree polynomials can be used to obtain a PCUSS based on Shamir's secret sharing scheme. More specifically, to encode a bit string w, we take a random polynomial whose values on a subset H ⊂ F are exactly equal to the bits of w. However, we provide the values of this polynomial only over the sub-domain F \ H. Then, the encoded value is represented by the (interpolated) values of the polynomial over H, which admit a PCU scheme. On the other hand, the “large independence” feature of polynomials makes the encoded value indiscernible without a supplied proof string, unless too many of the values of the polynomial over F \ H are read, thus allowing for a PCUSS. The sketch below illustrates this encoding.
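Here is a minimal Python sketch of this Shamir-style encoding, over a toy prime field rather than the binary fields used in the actual construction; all parameter choices (the field size 97, the number of random anchor points) are our own illustrative assumptions. The secret bits are placed on H = {0,…,k−1}, only the values outside H are published, and decoding interpolates back onto H.

    import random

    P = 97  # toy prime field GF(97); the real construction uses binary fields

    def interpolate(points, x):
        """Evaluate at x the unique polynomial through the (xi, yi) pairs."""
        total = 0
        for i, (xi, yi) in enumerate(points):
            num, den = 1, 1
            for j, (xj, _) in enumerate(points):
                if i != j:
                    num = num * (x - xj) % P
                    den = den * (xi - xj) % P
            total = (total + yi * num * pow(den, P - 2, P)) % P
        return total

    def encode(secret_bits):
        """Pick a random polynomial whose values on H = {0..k-1} equal the
        secret bits, and reveal its values only outside H."""
        k = len(secret_bits)
        H = list(range(k))
        rest = list(range(k, P))
        # The polynomial is fixed by its values on H together with uniformly
        # random values on roughly half of the remaining points.
        anchors = list(zip(H, secret_bits)) + \
                  [(x, random.randrange(P)) for x in rest[:P // 2]]
        return [(x, interpolate(anchors, x)) for x in rest]

    def decode(shares, k):
        """Recover the secret by interpolating back onto H = {0..k-1}."""
        return [interpolate(shares, x) for x in range(k)]

    shares = encode([1, 0, 1, 1])
    print(decode(shares, 4))  # [1, 0, 1, 1]

Note that any sufficiently small subset of the published values is uniformly distributed regardless of the secret (by the bounded independence of random polynomials), while all of them together determine the secret, mirroring the two requirements above.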

This construction can now be improved via iteration. Rather than explicitly providing the values of the polynomial, they will be provided by a PCUSS scheme. Note that the scheme that we now need is for strings of a (roughly) exponentially smaller size. The high level idea is to iterate this construction ℓ times to obtain the ℓ times iterated log function in our theorems.

At the end of the recursion, i.e., for the smallest blocks at the bottom, we utilize a linear code featuring both high distance and high dual distance, with the encoded value polynomial in the block size. This is the only “non-constructive” part in our construction, but since the relevant block size will eventually be less than log n, the constructed property will still be uniform with polynomial calculation time (the exponential time in the block size, needed to construct the linear-code matrix by brute force, becomes negligible).

Our PCUSS in particular provides a property that is hard to test (due to its shared secret feature), and yet has a near-linear PCPP through its unveiling, thereby establishing Theorem 1.1. We utilize this property for the separation results in a similar manner to [FF06] and [DRTV18], by considering a weighted version of a “PCUSS with proof” property, where the proof part holds only a small portion of the total weight. The proof part enables a constant query test, whereas if the proof part is erased or corrupted, the remaining property is still hard to test, which yields the lower bounds for tolerant and erasure-resilient testers.

1.3 Related work

Short PCPPs.

For properties that can be verified using a circuit of size n, [BGH06] gave PCPP constructions with proofs of nearly linear length and a small (but super-constant) query complexity, as well as slightly longer proofs with constant query complexity. Later, Ben-Sasson and Sudan [BSS08] gave constructions with quasilinear size proofs, but with slightly higher query complexity. The state of the art construction is due to Dinur [Din07] who, building on [BSS08], showed a construction with proof length that is quasilinear in the circuit size and with constant query complexity. In a recent work, Ben-Sasson et al. [BCG17] constructed an interactive version of PCPPs (an interactive oracle proof of proximity [BCS16, RRR16]) of strictly linear length and constant query complexity.

Tolerant Testing.

The tolerant testing framework has received significant attention in the past decade. Property testing of dense graphs, initiated by [GGR98], is inherently tolerant by the canonical tests of Goldreich and Trevisan [GT03]. Later, Fischer and Newman [FN07] (see also [BEF18]) showed that every testable (dense) graph property admits a tolerant testing algorithm for every 0 ≤ ε₁ < ε₂ ≤ 1, which implies that constant-query testability is equivalent to distance approximation in the dense graph model. Some properties of boolean functions were also studied recently in the tolerant testing setting; in particular, the properties of being a k-junta (i.e., a function that depends on at most k variables) and being unate (i.e., a function where each direction is either monotone increasing or monotone decreasing) [BCE18, LW19, DMN19].

Erasure-resilient Testing.

For the erasure-resilient model, in addition to the separation between that model and the standard testing model, [DRTV18] designed efficient erasure-resilient testers for important properties, such as monotonicity and convexity. Shortly after, in [RRV19], a separation between the erasure-resilient testing model and the tolerant testing model was established. The last separation requires an additional construction (outside PCPPs), which remains an obstacle to obtaining better than polynomial separations.

2 Preliminaries

We start with some notation and central definitions. For a set A, we let 2^A denote the power-set of A. For two strings x and y, we use x∘y to denote their concatenation.

For an integer t, a field F of size 2^t and x ∈ F, we let bin(x) ∈ {0,1}^t denote the binary representation of x in some canonical way.

For two sets of strings A and B, we use A∘B to denote the set {a∘b : a ∈ A, b ∈ B}. For a collection of sets A₁,…,Aₘ, we use A₁∘⋯∘Aₘ to denote the set of all possible concatenations a₁∘⋯∘aₘ, where aᵢ ∈ Aᵢ for every i ∈ [m].

Throughout this paper we use boldface letters to denote random variables, and assume a fixed canonical ordering over the elements in all the sets we define. For a set A, we write a ∼ A to denote a random variable resulting from a uniformly random choice of an element a ∈ A.

2.1 Error correcting codes and polynomials over finite fields

The relative Hamming distance of two strings x, y ∈ Σ^n is defined as δ(x, y) = |{i ∈ [n] : xᵢ ≠ yᵢ}| / n. For a string x ∈ Σ^n and a non-empty set S ⊆ Σ^n, we define δ(x, S) = min_{y∈S} δ(x, y). The following notion plays a central role in many complexity-related works, including ours.

Definition 2.1.

A code is an injective function C: Σ^k → Σ^n. If Σ is a finite field and C is a linear function (over Σ), then we say that C is a linear code. The rate of C is defined as k/n, whereas the minimum relative distance is defined as the minimum of δ(C(x), C(y)) over all distinct x, y ∈ Σ^k.

A δ-distance code is a code whose minimum relative distance is at least δ. When for a fixed δ we have a family of δ-distance codes (for different values of n), we refer to its members as error correcting codes.

In this work we use the fact that efficient codes with constant rate and constant relative distance exist. Moreover, there exist such codes in which membership can be decided by a quasi-linear size Boolean circuit.

Theorem 2.2 (see e.g., [Spi96]).

There exists a linear code C: {0,1}^k → {0,1}^n with n = O(k) and constant relative distance, for which membership can be decided by a quasilinear size Boolean circuit.

Actually, the rate of the code in [Spi96] is significantly better, but since we do not try to optimize constants, we use a crude constant solely for readability. In addition, the code described in [Spi96] is linear time decodable, but we do not make use of this feature in this work.

We slightly abuse notation, and for a finite field F of size 2^t, view the encoding given in Theorem 2.2 as C: F → {0,1}^n, by associating F with {0,1}^t in the natural way. Note that for x, y ∈ F with x ≠ y, it holds that bin(x) ≠ bin(y), and therefore δ(C(x), C(y)) is at least the relative distance of the code. We further abuse notation, and for a function f: D → F we write C(f) to denote the length n·|D| bit string obtained by concatenating the encodings C(f(x)) of the values of f (where we use the canonical ordering over D).

Definition 2.3.

Let P_d denote the set of polynomials p: F → F of degree at most d.

The following lemma of [Hor72], providing a fast univariate interpolation, will be an important tool in this work.

Lemma 2.4 ([Hor72]).

Given a set of m pairs {(xᵢ, yᵢ)} with all xᵢ distinct, we can output the coefficients of the unique polynomial p of degree at most m − 1 satisfying p(xᵢ) = yᵢ for all i ∈ [m], using O(m·polylog m) additions and multiplications in F.
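For concreteness, the following Python snippet recovers the coefficients by the textbook quadratic-time Lagrange method; the point of [Hor72] is that the same task can be done with only O(m·polylog m) field operations, which we do not reproduce here.

    def interpolate_coeffs(points, p):
        """Coefficients (lowest degree first) of the unique polynomial of
        degree < len(points) through the given (xi, yi) pairs, over GF(p).
        Textbook quadratic-time method, for illustration only."""
        m = len(points)
        coeffs = [0] * m
        for i, (xi, yi) in enumerate(points):
            # Build the numerator polynomial prod_{j != i} (X - xj).
            basis = [1]
            den = 1
            for j, (xj, _) in enumerate(points):
                if i == j:
                    continue
                basis = [(a - xj * b) % p
                         for a, b in zip([0] + basis, basis + [0])]
                den = den * (xi - xj) % p
            scale = yi * pow(den, p - 2, p) % p  # p prime: inverse by Fermat
            for deg, b in enumerate(basis):
                coeffs[deg] = (coeffs[deg] + scale * b) % p
        return coeffs

    # p(X) = 3X^2 + 1 over GF(7), recovered from three evaluations:
    pts = [(x, (3 * x * x + 1) % 7) for x in (1, 2, 5)]
    print(interpolate_coeffs(pts, 7))  # [1, 0, 3]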

The next lemma states that a randomly chosen function is far from any low degree polynomial with very high probability.

Lemma 2.5.

For all large enough fields F, with probability at least 1 − 2^{−|F|}, a uniformly random function f: F → F is 1/4-far from P_{|F|/2}.

Proof: Let q = |F|, and consider the ball of relative radius 1/4 around some fixed function in the space of functions from F to itself. The number of points (i.e., functions from F to F) contained in this ball is at most

Σ_{i ≤ q/4} (q choose i)·q^i ≤ 2^q·q^{q/4}.

By the fact that the size of P_{q/2} is q^{q/2+1}, the size of the set of points that are at relative distance at most 1/4 from any point of P_{q/2} is at most

q^{q/2+1}·2^q·q^{q/4}.

The lemma follows by observing that there are q^q functions from F to itself, so the fraction of functions that are 1/4-close to P_{q/2} is at most 2^q·q^{1−q/4}, which is smaller than 2^{−q} for all large enough q.

2.1.1 Dual distance of linear codes

We focus here specifically on a linear code C: F^k → F^n, and consider the linear subspace given by its image, V = C(F^k) ⊆ F^n. We define the distance of a linear space V as the minimum relative Hamming weight of a non-zero vector in V, and note that in the case of V being the image of a code C, this is identical to the minimum relative distance of C. For a linear code, it helps to also investigate dual distances.

Definition 2.6.

Given two vectors u, v ∈ F^n, we define their scalar product as ⟨u, v⟩ = Σ_{i∈[n]} uᵢvᵢ, where multiplication and addition are calculated in the field F. Given a linear space V ⊆ F^n, its dual space is the linear space V^⊥ = {u ∈ F^n : ⟨u, v⟩ = 0 for all v ∈ V}. In other words, it is the space of vectors that are orthogonal to all members of V. The dual distance of the space V is simply defined as the distance of V^⊥.

For a code C, we define its dual distance as the dual distance of its image C(F^k). We call C a δ-dual-distance code if this dual distance is at least δ. The following well-known lemma is essential to us, as it relates to the “secret-sharing” property that we define later.

Lemma 2.7 (See e.g., [MS77]).

Suppose that C: F^k → F^n is a linear δ-dual distance code, let Q ⊂ [n] be any set of size less than δn, and consider the following random process for picking a function f: Q → F: let x ∈ F^k be drawn uniformly at random, and set f to be the restriction of C(x) to the set Q. Then, the distribution of f is identical to the uniform distribution over the set of all functions from Q to F.
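The lemma can be checked directly on a small example. The [7,4] Hamming code has dual distance 4 (its dual is the simplex code, all of whose non-zero codewords have weight 4), so the restriction of a uniformly random codeword to any 3 coordinates must be exactly uniform; the Python check below confirms this by enumeration (the particular generator matrix is one standard choice).

    from itertools import combinations, product
    from collections import Counter

    # Generator matrix of a [7,4] Hamming code; its dual is the simplex
    # code, whose nonzero codewords all have weight 4, so dual distance = 4.
    G = [
        [1, 0, 0, 0, 0, 1, 1],
        [0, 1, 0, 0, 1, 0, 1],
        [0, 0, 1, 0, 1, 1, 0],
        [0, 0, 0, 1, 1, 1, 1],
    ]

    codewords = []
    for msg in product([0, 1], repeat=4):
        codewords.append(tuple(sum(m * g for m, g in zip(msg, col)) % 2
                               for col in zip(*G)))

    # Lemma 2.7: since the dual distance is 4, the restriction of a uniform
    # random codeword to ANY set of fewer than 4 coordinates is uniform.
    for Q in combinations(range(7), 3):
        patterns = Counter(tuple(c[i] for i in Q) for c in codewords)
        assert set(patterns.values()) == {len(codewords) // 8}  # each pattern twice
    print("all 3-coordinate restrictions are exactly uniform")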

2.2 Probabilistically checkable proofs of proximity (PCPPs)

As described briefly in the introduction, a PCPP verifier for a property P is given access to an input x and a proof π, as well as a detection radius ε > 0 and a soundness error δ > 0. The verifier should make a constant number of queries (depending only on ε and δ) to the input x and the proof π, and satisfy the following. If x ∈ P, then there exists a proof π for which the verifier should always accept. If x is ε-far from P, the verifier should reject with probability greater than 1 − δ, regardless of the contents of π. More formally, we define the following.

Definition 2.8 (PCPP).

For n ∈ N, let P ⊆ {0,1}^n be a property of n-bit Boolean strings, and let t ∈ N. We say that P has a q-query, length-t Probabilistically Checkable Proof of Proximity (PCPP) system if the following holds: There exists a verification algorithm V that takes as input ε > 0 and δ > 0, makes a total of q(ε, δ) queries on the strings x ∈ {0,1}^n and π ∈ {0,1}^t, and satisfies the following:

  1. (Completeness) If x ∈ P, then there exists a proof π such that for every ε and δ, the verifier accepts with probability 1.

  2. (Soundness) If x is ε-far from P, then for every alleged proof π, the verifier rejects with probability greater than 1 − δ.

The following lemma, establishing the existence of a quasilinear PCPP for any property that is verifiable by quasilinear size circuits, will be an important tool throughout this work.

Lemma 2.9 (Corollary 8.4 in [Din07], see also [GM07]).

Let P be a property of Boolean strings which is verifiable by a size-s Boolean circuit. Then, there exists a length-t PCPP system for P, with t = s·polylog(s), that for every ε and δ makes at most q(ε, δ) queries, where q(ε, δ) depends only on ε and δ.

As described briefly in the introduction, maximally hard properties cannot have a constant-query PCPP proof system with a sublinear length proof string.

Proposition 2.10.

Let P ⊆ {0,1}^n and ε > 0 be such that any ε-tester for P has to make Ω(n) many queries. Then, any constant-query PCPP system for P (with detection radius ε and, e.g., soundness error δ = 1/3) must have proof length of size Ω(n).

Proof:  Suppose that there exists a PCPP for P with q queries and proof length t = o(n). Since the verifier has constant query complexity, we may assume that it is non-adaptive, at the cost of increasing its number of queries to at most 2^q. By a standard amplification argument, we can construct an amplified verifier that makes O(t·2^q) queries, with soundness error at most 2^{−2t}. By the fact that the verifier is non-adaptive, it has the same query distribution regardless of the proof string. Therefore, we can run 2^t amplified verifiers in parallel while reusing queries, one verifier for each of the 2^t possible proof strings. If any of the amplified verifiers accepts, we accept the input. If the input belongs to P, one of the above verifiers will accept (the one that used the correct proof). If the input is ε-far from P, then by a union bound, the probability that there was any accepting amplified verifier is at most 2^t·2^{−2t} = 2^{−t}. This yields an ε-tester for P with query complexity O(t·2^q) = o(n), which contradicts our assumption.

2.3 Testing, tolerant testing and erasure-resilient testing

In this subsection we define notions related to the property testing framework. We also formally define a few variants of the original testing model that will be addressed in this work.

A property P of n-bit boolean strings is a subset of all those strings, and we say that a string x has the property if x ∈ P.

Given ε ∈ (0,1) and a property P, we say that a string x is ε-far from P if δ(x, P) > ε, and otherwise it is ε-close to P. We next define the notion of a tolerant tester, of which standard (i.e., intolerant) testers are a special case.

Definition 2.11 (Intolerant and tolerant testing).

Given 0 ≤ ε₁ < ε₂ ≤ 1, a q-query (ε₁, ε₂)-testing algorithm A for a property P is a probabilistic algorithm (possibly adaptive), making q queries to an input x, that outputs a binary verdict satisfying the following two conditions.

  1. If x is ε₁-close to P, then A accepts with probability at least 2/3.

  2. If x is ε₂-far from P, then A rejects with probability at least 2/3.

When ε₁ = 0, we say that A is an ε₂-testing algorithm for P, and otherwise we say that A is an (ε₁, ε₂)-tolerant testing algorithm for P.

Next, we define the erasure-resilient testing model. We start with some terminology. A string x ∈ {0, 1, ⊥}^n is α-erased if x is equal to ⊥ on at most αn coordinates. A string x′ ∈ {0,1}^n that differs from x only on coordinates on which x is ⊥ is called a completion of x. The (pseudo-)distance of an α-erased string x from a property P is the minimum, over all completions x′ of x, of the relative Hamming distance of x′ from P (a brute-force computation of this pseudo-distance is sketched after Definition 2.12). Note that for a string with no erasures, this is simply the Hamming distance of x from P. As before, x is ε-far from P if its distance from P is greater than ε, and ε-close otherwise.

Definition 2.12 (Erasure-resilient tester).

Let α ∈ [0, 1) and ε ∈ (0, 1) be parameters satisfying α + ε < 1. A q-query α-erasure-resilient ε-tester A for P is a probabilistic algorithm making q queries to an α-erased string x, that outputs a binary verdict satisfying the following two conditions.

  1. If the distance of x from P is 0 (i.e., if there exists a completion x′ of x such that x′ ∈ P), then A accepts with probability at least 2/3.

  2. If x is ε-far from P (i.e., if every completion x′ of x is ε-far from P), then A rejects with probability at least 2/3.
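As promised above, the following Python sketch makes the quantification over completions in the pseudo-distance explicit. It is illustrative only (exponential in the number of erasures), and the property of binary palindromes is our own toy example.

    from itertools import product

    def pseudo_distance(x, prop):
        """Pseudo-distance of a ⊥-erased string x (None = erased) from the
        property `prop` (a set of equal-length tuples): the minimum, over
        all completions of x, of the relative Hamming distance to the
        property.  Brute force -- exponential in the number of erasures."""
        erased = [i for i, b in enumerate(x) if b is None]
        best = 1.0
        for bits in product([0, 1], repeat=len(erased)):
            y = list(x)
            for i, b in zip(erased, bits):
                y[i] = b
            dist = min(sum(a != b for a, b in zip(y, z)) / len(x)
                       for z in prop)
            best = min(best, dist)
        return best

    PALINDROMES = {t for t in product([0, 1], repeat=4) if t == t[::-1]}
    print(pseudo_distance((1, None, 0, 1), PALINDROMES))  # 0.0: complete to 1001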

The next lemma will be useful to prove that some properties are hard to test. The lemma states that if we have two distributions over functions whose restrictions to any set of queries of size at most q are identical, then no (possibly adaptive) algorithm making at most q queries can distinguish between them.

Definition 2.13 (Restriction).

Given a distribution D over functions f: A → B and a subset Q ⊆ A, we define the restriction D|_Q of D to Q to be the distribution over functions g: Q → B that results from choosing a function f according to D, and setting g to be f|_Q, the restriction of f to Q.

Lemma 2.14 ([Fns04], special case).

Let D₁ and D₂ be two distributions of functions over some domain A. Suppose that for any set Q ⊆ A of size at most q, the restricted distributions D₁|_Q and D₂|_Q are identically distributed. Then, any (possibly adaptive) algorithm making at most q queries cannot distinguish D₁ from D₂ with any positive probability.

3 Code Ensembles

It will be necessary for us to think of a generalized definition of an encoding, in which each encoded value has multiple legal encodings.

Definition 3.1 (Code ensemble).

A code ensemble is a function E: Σ^k → 2^(Σ^n). Namely, every w ∈ Σ^k has a set E(w) of its valid encodings from Σ^n. We define the distance of the code ensemble as

dist(E) = min_{w ≠ w′} min_{x ∈ E(w), y ∈ E(w′)} δ(x, y).

It is useful to think of a code ensemble as a randomized mapping that, given w, outputs a uniformly random element from the set of encodings E(w). Using the above, we can define a shared secret property. In particular, we use a strong information theoretic definition of a shared secret, in which any small enough set of bits does not give any information at all about the encoded value. Later on, we construct code ensembles with a shared secret property.

Definition 3.2 (Shared Secret).

For n, k ∈ N and a constant ρ > 0, we say that a code ensemble E has a ρ-shared secret property if it satisfies the following. For any Q ⊂ [n] of size |Q| ≤ ρn, any w, w′ ∈ Σ^k such that w ≠ w′, and any assignment σ: Q → Σ, it holds that

Pr_{x ∼ E(w)}[x|_Q = σ] = Pr_{y ∼ E(w′)}[y|_Q = σ].

Namely, for any w and w′ and any Q of size at most ρn, the distribution obtained by choosing a uniformly random member of E(w) and considering its restriction to Q is identical to the distribution obtained by choosing a uniformly random member of E(w′) and considering its restriction to Q.

3.1 A construction of a hard code ensemble

We describe a construction of a code ensemble for which a linear number of queries is necessary to verify membership or to decode the encoded value. This code will be our base code in the iterative construction. The existence of such a code ensemble is proved probabilistically, relying on the following simple lemma.

Lemma 3.3.

Fix constants δ ∈ (0, 1/2) and c ∈ (0, 1 − H(δ)), where H is the binary entropy function. Let m, n ∈ N be such that m ≤ cn. Then, with probability 1 − 2^{−Ω(n)}, a sequence of m uniformly random vectors from {0,1}^n is linearly independent, and spans a δ-distance linear code.

Proof:  The proof follows from a straightforward counting argument. If we draw m uniformly random vectors v₁,…,v_m ∈ {0,1}^n, then each non-trivial linear combination of them is in itself a uniformly random vector from {0,1}^n, and hence has Hamming weight less than δn with probability at most

Σ_{i < δn} (n choose i)·2^{−n} ≤ 2^{(H(δ)−1)n},

where we set H(δ) = −δ·log δ − (1−δ)·log(1−δ).

By a union bound over all 2^m possible combinations, the probability that there exists a non-trivial linear combination with weight less than δn is at most 2^{m+(H(δ)−1)n} ≤ 2^{(c+H(δ)−1)n} = 2^{−Ω(n)}. If this is not the case, then v₁,…,v_m are linearly independent, and moreover, they span a δ-distance linear code (where we use the fact that the distance of a linear code is equal to the minimal Hamming weight of a non-zero codeword).

Our construction makes use of a sequence of vectors that correspond to a high-distance and high-dual distance code, as described below.

Definition 3.4 (Hard code ensemble E_hard).

Let k ∈ N be suitably small with respect to n (so that Lemma 3.3 applies with m = k + n/2), and let v₁,…,v_k, u₁,…,u_{n/2} be a sequence of vectors in {0,1}^n such that their span is a δ-distance code, and such that the span of u₁,…,u_{n/2} is a ρ-dual distance code. We define the code ensemble E_hard: {0,1}^k → 2^({0,1}^n) as

E_hard(w) = { Σ_{i∈[k]} wᵢ·vᵢ + Σ_{j∈[n/2]} rⱼ·uⱼ : r ∈ {0,1}^{n/2} },

where all operations are over GF(2).

The next lemma states that a collection of random vectors in {0,1}^n satisfies the basic requirements of the code ensemble with high probability (that is, with probability tending to one as n grows), and hence such a code ensemble exists.

Lemma 3.5.

A set of k + n/2 random vectors in {0,1}^n satisfies with high probability the following two conditions: the span of all of them is a δ-distance code, and the span of the last n/2 vectors is a ρ-dual distance code. In particular, for all large enough n the code ensemble E_hard exists.

Proof:  We apply Lemma 3.3 multiple times. First, applying it with m = k + n/2 (and suitable constants δ and c), we conclude that with high probability v₁,…,v_k, u₁,…,u_{n/2} span a δ-distance code.

To show that with high probability the code spanned by the last n/2 vectors has high dual distance, we compare the following two processes, whose output is a linear subspace of {0,1}^n, that we view as a code: (i) Choose n/2 uniformly random vectors and return their span. (ii) Choose n/2 uniformly random vectors and return the dual of their span. Conditioned on the chosen vectors being linearly independent, the output distributions of these two processes are identical. Indeed, by a symmetry argument it is not hard to see that under the conditioning, the linear subspace generated by Process (i) is uniformly distributed among all rank-n/2 subspaces of {0,1}^n. Now, since we can uniquely couple each such subspace with its dual (also a rank-n/2 subspace, as (V^⊥)^⊥ = V), this means that the output distribution of Process (ii) is uniform as well.

However, it follows again from Lemma 3.3 (with m = n/2 and any constants satisfying the conditions of the lemma) that the chosen vectors are linearly independent with high probability. This means that (without the conditioning) the output distributions of Process (i) and Process (ii) are o(1)-close in variation distance. Applying Lemma 3.3 once more, with m = n/2 and a suitable constant ρ, we get that the distance of the code generated by Process (i) is at least ρ with high probability. However, the latter distance equals by definition the dual distance of the code generated by Process (ii). By the closeness of the two distributions, we conclude that the dual distance of the code generated by Process (i) is also at least ρ with high probability.

We next state a simple but important observation regarding membership verification.

Observation 3.6.

Once a matrix with the desired properties is constructed (which may take 2^{O(n)} time if we use brute force), given w and x, the membership of x in E_hard(w) can be verified in poly(n) time (by solving a system of linear equations over GF(2)); see the sketch below.
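In our setting, checking x ∈ E_hard(w) amounts to deciding whether x minus the fixed combination Σᵢ wᵢ·vᵢ lies in the span of u₁,…,u_{n/2}. A minimal Python sketch of such a span test over GF(2), with vectors represented as bit-integers (an implementation choice of ours), is given below.

    def gf2_membership(target, vectors, nbits):
        """Decide whether `target` (an nbits-bit integer, one bit per
        coordinate) lies in the F_2-span of `vectors`, by Gaussian
        elimination."""
        basis = [0] * nbits  # basis[i] = stored vector with leading bit i

        def reduce(v):
            for i in reversed(range(nbits)):
                if (v >> i) & 1 and basis[i]:
                    v ^= basis[i]
            return v

        for v in vectors:
            v = reduce(v)
            if v:  # new independent direction; store by its leading bit
                basis[v.bit_length() - 1] = v
        return reduce(target) == 0

    v1, v2 = 0b1100, 0b0110
    print(gf2_membership(0b1010, [v1, v2], 4))  # True:  1100 ^ 0110 = 1010
    print(gf2_membership(0b0001, [v1, v2], 4))  # False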

4 PCUs and PCUSSs

Next, we define the notion of Probabilistically Checkable Unveiling (PCU). This notion is similar to that of a PCPP, but here instead of requiring our input to satisfy a given property, we require our input to encode a value w (typically using a large distance code ensemble). We then require that given the encoded value w, it will be possible to prove in a PCPP-like fashion that the input is indeed a valid encoding of w.

Definition 4.1 (PCU).

Fix n, k ∈ N, and let E: {0,1}^k → 2^({0,1}^n) be a code ensemble. We say that E has a q-query, length-t PCU if the following holds. There exists a verification algorithm that takes as inputs w ∈ {0,1}^k, ε > 0 and δ > 0, makes at most q(ε, δ) queries to the strings x ∈ {0,1}^n and π ∈ {0,1}^t, and satisfies the following:

  1. If x ∈ E(w), then there exists a proof π such that for every ε and δ, the verifier accepts with probability 1.

  2. If x is ε-far from E(w), then for every alleged proof π, the verifier rejects with probability greater than 1 − δ.

In order to facilitate the proof of the main theorem, we utilize a more stringent variant of the above definition. Instead of supplying w explicitly to the algorithm, we supply oracle access to a string W that is supposed to represent w (namely, an encoding of w under the error correcting code C of Theorem 2.2), along with the proof π, and the algorithm only makes queries to the proof string π, the original encoding x and the string W. For the case where W is supposed to equal C(w), we use C-PCU to denote the resulting notion.

Definition 4.2 (C-PCU).

Fix n, k ∈ N, and let E: {0,1}^k → 2^({0,1}^n) be a code ensemble. We say that E has a q-query, length-t C-PCU if the following holds. There exists a verification algorithm that takes as inputs ε > 0 and δ > 0, makes at most q(ε, δ) queries to the strings x, W and π, and satisfies the following:

  1. If there exists w for which W = C(w) and x ∈ E(w), then there exists a proof π such that for every ε and δ, the verifier accepts with probability 1.

  2. If for every w, either W is ε-far from C(w) or x is ε-far from E(w), then for every alleged proof π, the verifier rejects with probability greater than 1 − δ.

Note that a code ensemble admitting a C-PCU automatically admits a PCU. Indeed, given the value w explicitly, an oracle for W = C(w) can be simulated.

The following lemma states the existence of a C-PCU for efficiently computable code ensembles, and will be used throughout this work. The proof follows from Lemma 2.9 together with a simple concatenation argument.

Lemma 4.3.

Let k, n ∈ N be such that k ≤ n, and let E: {0,1}^k → 2^({0,1}^n) be a code ensemble. If, given w and x, it is possible to verify membership of x in E(w) using a circuit of size s, then there is a q-query, length-t C-PCU for E, where t = O(s·polylog s).

Proof:  Assume without loss of generality that s ≥ n. Let r be the smallest integer for which r·|C(w)| ≥ n (note that r·|C(w)| = O(n)), and define

P = { C(w)^r ∘ x : w ∈ {0,1}^k, x ∈ E(w) },

where C(w)^r denotes the r-times concatenation of C(w).

For any string of the form W′ ∘ x it is possible to check, using a quasilinear size circuit (see [Spi96]), that the substring W′ that corresponds to the repeated encoding is an r-times repetition of C(w) for some w. After doing so, we decode w using a quasilinear size circuit (as in [Spi96]), and then, by the premise of the lemma, we can verify membership of x in E(w) using a circuit of size s. Therefore, membership in P can be decided using a size O(s·polylog s) boolean circuit, and therefore by Lemma 2.9, P admits a PCPP system whose proof length is quasilinear in s.

Given an input (x, W, π) to the C-PCU verifier, let z = W^r ∘ x and use the PCPP system for P, with a suitably smaller detection radius ε′ = Θ(ε) and soundness error δ, where each query to z is emulated by a corresponding query to W or x. Note that if W = C(w) and x ∈ E(w) for some w, then z ∈ P, so the PCPP system for P will accept with probability 1.

Next, suppose that for every w, either W is ε-far from C(w) or x is ε-far from E(w), and observe that this implies that z is at least ε′-far from P. Thus, by the soundness property of the PCPP for P, the verifier rejects with probability greater than 1 − δ, regardless of the contents of the alleged proof it is supplied with.

Next we define Probabilistically Checkable Unveiling of a Shared Secret (PCUSS).

Definition 4.4.

For a constant ρ > 0, we say that a code ensemble E has a q-query, length-t PCUSS, if E has a ρ-shared secret property, as well as a q-query, length-t PCU. Similarly, when E has a ρ-shared secret property (for constant ρ > 0), as well as a q-query, length-t C-PCU, we say that E has a q-query, length-t C-PCUSS.

Note that a code ensemble admitting a C-PCUSS directly implies that it admits a PCUSS with similar parameters.

The following lemma establishes the existence of a C-PCUSS for E_hard, where E_hard is the code ensemble from Definition 3.4.

Lemma 4.5.

For any n ∈ N, E_hard has a q-query, length-t C-PCUSS, where t = poly(n).

Proof:  By Observation 3.6, given w and x, membership of x in E_hard(w) can be checked in poly(n) time, which means that there exists a polynomial size circuit that decides membership in E_hard(w). Combining the above with Lemma 4.3 implies a q-query, length-t C-PCU where t = poly(n). By Lemma 2.7, the large dual distance property of E_hard implies its shared secret property for some constant ρ > 0, which concludes the proof of the lemma.

5 PCUSS construction

In this section we give a construction of code ensembles that admit a PCUSS. First we show that our code ensemble has a PCU with a short proof. Specifically:

Lemma 5.1.

For any fixed ℓ ∈ N and any constant δ > 0, there exist n₀ ∈ N and a code ensemble E^(ℓ), such that for all n ≥ n₀, the code ensemble E^(ℓ) has a constant-query, length-t PCU, for t = O(n·log^(ℓ) n).

Later, we prove that our code ensemble has a shared secret property, which implies that it has a PCUSS (which implies Theorem 1.1, as we shall show).

Theorem 5.2.

For any fixed ℓ ∈ N and any constant δ > 0, there exist n₀ ∈ N and a code ensemble E^(ℓ), such that for all n ≥ n₀, the code ensemble E^(ℓ) has a constant-query, length-t PCUSS, for t = O(n·log^(ℓ) n).

5.1 The iterated construction

Our iterative construction uses polynomials over a binary finite field F of size 2^t. In our proof we will need to be able to implement arithmetic operations over this field efficiently (i.e., in poly(t) time). This can be easily done given a suitable representation of the field: namely, a degree t irreducible polynomial over GF(2). It is unclear in general whether such a polynomial can be found in poly(t) time. Fortunately though, for t of the form t = 2·3^j where j ∈ N, it is known that the polynomial x^t + x^{t/2} + 1 is irreducible over GF(2) (see, e.g., [Gol08, Appendix G]). We will therefore restrict our attention to fields of this form. At first glance this seems to give us a property that is defined only on a sparse set of input lengths. However, towards the end of this section, we briefly describe how to bypass this restriction.
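As a concrete illustration, the following Python sketch multiplies two field elements represented as t-bit integers, reducing modulo x^t + x^{t/2} + 1; the representation and the schoolbook carry-less multiplication are our own simplifications (a real implementation would use faster polynomial arithmetic).

    def gf_mul(a, b, t):
        """Multiply two elements of GF(2^t), represented as t-bit integers
        (bit i = coefficient of x^i), for t of the form 2 * 3^j, using the
        irreducible polynomial x^t + x^(t/2) + 1.  The total running time is
        polynomial in t, i.e. polylogarithmic in the field size."""
        # Carry-less (polynomial) multiplication over GF(2).
        result = 0
        while b:
            if b & 1:
                result ^= a
            a <<= 1
            b >>= 1
        # Reduce modulo x^t + x^(t/2) + 1: replace each monomial x^k with
        # k >= t by x^(k-t) * (x^(t/2) + 1).
        for k in range(2 * t - 2, t - 1, -1):
            if (result >> k) & 1:
                result ^= (1 << k) | (1 << (k - t + t // 2)) | (1 << (k - t))
        return result

    # Example in GF(2^6) (t = 6 = 2 * 3): x * x^5 = x^6 = x^3 + 1.
    t = 6
    print(bin(gf_mul(0b000010, 0b100000, t)))  # 0b1001, i.e. x^3 + 1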

We next formally define our iterated construction, starting with the “level-0” construction as a base case. The constants in the definition will be explicitly given in the proof of Lemma 5.8. Additionally, for any ℓ, we shall pick a large enough constant that satisfies several requirements for the “level-ℓ” iteration of the construction.

Definition 5.3 (Iterated coding ensemble).

For k ∈ N and w ∈ {0,1}^k, we define the base code ensemble of w (i.e., the level-ℓ code ensemble of w for ℓ = 0) as E^(0)(w) = E_hard(w).

Let c₁ and c₂ be large enough global constants, fix ℓ ≥ 1, let n be large enough, and let F be a finite field of the form described above whose size is suitably tied to n.

We define the level-ℓ code ensemble over F as follows. Let t be the smallest integer of the permitted form 2·3^j for which the resulting field is large enough, and set the block parameters accordingly. Note that these satisfy the recursive requirements of a level-