Locally decodable codes (LDCs) are error-correcting codes that admit a decoding algorithm that recovers each specific symbol of the message by reading a small number of locations in a possibly corrupted codeword. More precisely, a locally decodable code with local decoding radius is an error-correcting code that admits a local decoding algorithm , such that given an index and a corrupted word which is -close to an encoding of some message , reads a small number of symbols from , and outputs with high probability. Similarly, we have the notion of locally correctable codes (LCCs), which are error-correcting codes that not only admit a local algorithm that decodes each symbol of the message, but are also required to correct an arbitrary symbol from the entire codeword. Locally decodable and locally correctable codes have many applications in different areas of theoretical computer science, such as complexity theory, coding theory, property testing, cryptography, and construction of probabilistically checkable proof systems. For details, see the surveys [Yek12, KS17] and the references within.
Despite the importance of LDCs and LCCs, and the extensive amount of research studying these objects, the best known construction of constant query LDCs has super-polynomial length , which is achieved by the highly non-trivial constructions of [Yek08] and [Efr12]. For constant query LCCs, the best known constructions are of exponential length, which can be achieved by some parameterization of Reed-Muller codes. It is important to note that there is a huge gap between the best known lower bounds for the length of constant query LDCs and the length of the best known constructions. Currently, the best known lower bound on the length of LDCs says that for it must be at least , where stands for the query complexity of the local decoder. See [KT00, Woo07] for the best general lower bounds for constant query LDCs.
Motivated by applications to probabilistically checkable proofs (PCPs), Ben-Sasson, Goldreich, Harsha, Sudan, and Vadhan introduced in [BGH06] the notion of relaxed locally decodable codes (RLDCs). Informally speaking, a relaxed locally decodable code is an error-correcting code which allows the local decoding algorithm to abort if the input codeword is corrupt, but does not allow it to err with high probability. In particular, the decoding algorithm should always output the correct symbol if the given word is not corrupted. Formally, a code is an RLDC with decoding radius if it admits a relaxed local decoding algorithm which, given an index and a possibly corrupted codeword , makes a small number of queries to , and satisfies the following properties.
- Completeness:
If for some , then should output .
- Relaxed decoding:
If is -close to some codeword , then should output either or a special abort symbol with probability at least 2/3.
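To make the two conditions concrete, here is a minimal Python sketch of a decoder that is allowed to abort. It is not the construction of this paper or of [BGH06]: as an assumption for illustration it uses the classical Hadamard code (which is already a genuine 2-query LDC with exponential block length), runs its standard 2-query decoder twice with independent randomness, and aborts when the two runs disagree. The names `hadamard_encode`, `relaxed_decode`, and `ABORT` are hypothetical.

```python
import random

ABORT = None  # stands in for the special abort symbol


def hadamard_encode(msg):
    """Encode a list of bits as the table of all parities <msg, r> over GF(2)."""
    k = len(msg)
    return [sum(msg[j] & (r >> j) & 1 for j in range(k)) % 2
            for r in range(2 ** k)]


def relaxed_decode(word, k, i, rng=random):
    """Try to recover message bit i from `word` with 4 queries; may abort."""
    guesses = []
    for _ in range(2):
        r = rng.randrange(2 ** k)
        # On an uncorrupted codeword, <msg, r> xor <msg, r ^ e_i> equals msg[i].
        guesses.append(word[r] ^ word[r ^ (1 << i)])
    # Output a symbol only when both independent runs agree; otherwise abort.
    return guesses[0] if guesses[0] == guesses[1] else ABORT
```

On an uncorrupted codeword both runs return the correct bit, so the decoder never aborts (completeness). On a word that is close to a codeword, a wrong bit is output only if both runs hit a corrupted location, so the output is either the correct bit or the abort symbol with high probability.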
This relaxation turns out to be very helpful in terms of constructing RLDCs with better block length. Indeed, [BGH06] constructed a -query RLDC with block length .
The notion of relaxed LCCs (RLCCs), recently introduced in [GRR18], naturally extends the notion of RLDCs. These are error-correcting codes that admit a correcting algorithm that is required to correct every symbol of the codeword, but is allowed to abort if noticing that the given word is corrupt. More formally, the local correcting algorithm gets an index , and a (possibly corrupted) word , makes a small number of queries to , and satisfies the following properties.
- Completeness:
If , then should output .
- Relaxed correcting:
If is -close to some codeword , then should output either or a special abort symbol with probability at least 2/3.
Note that if the code is systematic, i.e., the encoding of any message contains in its first symbols, then the notion of RLCC is stronger than RLDC.
Recently, building on the ideas from [GRR18], [CGS20] constructed RLCCs whose block length matches the RLDC construction of [BGH06]. For the lower bounds, the only result we are aware of is the work of Gur and Lachish [GL20], who proved that for any RLDC the block length must be at least .
Given the gap between the best constructions and the known lower bounds, it is natural to ask the following question:
In particular, [BGH06] asked whether it is possible to obtain a -query RLDC whose block length is strictly smaller than the best known lower bound on the length of LDCs. A positive answer to their question would show a separation between the two notions, thus proving that the relaxation is strict. See the paragraph "Open Problem" at the end of Section 4.2 of [BGH06].
In this work we make progress on this problem by constructing a relaxed locally decodable code with query complexity and block length . In fact, our construction gives the stronger notion of a relaxed locally correctable code.
Theorem 1 (Main Theorem).
For every there exists an -query relaxed locally correctable code with constant relative distance and constant decoding radius, such that the block length of is
Therefore, our construction improves the parameters of the -query RLDC construction of [BGH06] with block length , and matches (up to a multiplicative factor in ) the lower bound of for the block length of -query LDCs [KT00, Woo07].
Using the techniques from [CGS20] it is not difficult to obtain an RLCC over the binary alphabet with almost the same block length. Indeed, this can be done by concatenating our code over a large alphabet with an arbitrary binary code with constant rate and constant relative distance. See Section 7 for details.
1.1 Related works
RLDC and RLCC constructions: Relaxed locally decodable codes were first introduced by [BGH06], motivated by applications to constructing short PCPs. Their construction has a block length equal to . Since that work, there were no constructions with better block length in the constant query complexity regime . Recently, [GRR18] introduced the related notion of relaxed locally correctable codes (RLCCs), and constructed -query RLCCs with block length . Then, [CGS20] constructed relaxed locally correctable codes with block length matching that of [BGH06] (up to a multiplicative constant factor ). The construction of [CGS20] had two main components that we also use in the current work.
- Consistency test using random walk (CTRW):
Informally, given a word , and a coordinate we wish to correct, the test samples a sequence of constraints on , such that the domains of and intersect, with the guarantee that if is close to some codeword , but , then with high probability will be far from satisfying at least one of the constraints. In other words, the test performs a random walk on the constraints graph and checks if is consistent with in the ’th coordinate. We introduce this notion in detail in Section 2.1, and prove that the Reed-Muller code admits such a test in Section 4.
- Correctable canonical PCPPs (ccPCPP):
These are PCPP systems for some specified language satisfying the following properties: (i) for each there is a unique proof that satisfies the verifier with probability 1, (ii) the verifier accepts with high probability only pairs that are close to some for some , i.e., only the pairs where is close to some , and is close to , and (iii) the set is an RLCC. Canonical proofs of proximity have been studied in [DGG18, Par20]. We elaborate on these constructions in Section 5.
Lower bounds: For lower bounds, the only bound we are aware of is that of [GL20], who proved that any -query relaxed locally decodable code must have a block length .
For the strict notion of locally decodable codes, it is known by [KT00, Woo07] that for any -query LDC must have block length . For a slightly stronger bound of is known, and furthermore, for -query linear LDC the block length must be [Woo07]. For [KdW03] proved an exponential lower bound of . See also [DJK02, GKST02, Oba02, WdW05, Woo10] for more related work on lower bounds for LDCs.
The rest of the paper is organized as follows. In Section 2, we informally discuss the construction and the correcting algorithm. In this discussion we focus on decoding the symbols corresponding to the message, i.e., on showing that the code is an RLDC. Section 3 introduces the formal definitions and notations we will use in the proof of Theorem 1. We present the notion of consistency test using random walk in Section 4, and prove that the Reed-Muller code admits such a test. In Section 5 we present the PCPPs we will use in our construction, and state the properties needed for the correcting algorithm. In Section 6 we prove Theorem 1 by proving a composition theorem, which combines the instantiation of the Reed-Muller code with PCPPs from the previous sections.
2 Proof overview
In this section we informally describe our code construction. Roughly speaking, our construction consists of two parts:
- The Reed-Muller encoding:
Given a message , its Reed-Muller encoding is the evaluation of an -variate polynomial of degree at most over , whose coefficients are determined by the message we wish to encode.
- Proofs of proximity:
The second part of the encoding consists of the concatenation of PCPPs, each claiming that a certain restriction of the first part agrees with some Reed-Muller codeword.
Specifically, given a message , we first encode it using the Reed-Muller encoding , where roughly corresponds to the query complexity of our RLDC, and the field is large enough so that the distance of the Reed-Muller code, which is equal to , is some constant, say . That is, the first part of the encoding corresponds to an evaluation of some polynomial of degree at most . The second part of the encoding consists of a sequence of PCPPs claiming that the restrictions of the Reed-Muller part to some carefully chosen planes in are evaluations of some low-degree polynomial.
The planes we choose are of the form , where , and for some subfield of . We will call such planes -planes. In order to obtain the RLDC with the desired parameters, we choose the field so that is the extension of of degree . It will be convenient to think of as a field and think of as a vector space of of dimension (augmented with the multiplicative structure on ). Indeed, the saving in the block length of the RLDC we obtain crucially relies on the fact that we ask for PCPPs for only a small collection of planes, and not all planes in . The actual constraints required to be certified by the PCPPs are slightly more complicated, and we describe them next.
The constraints of the first type correspond to -planes and points . For each such pair the code will contain a PCPP certifying that (i) the restriction of the Reed-Muller part to is close to an evaluation of some polynomial of total degree at most , (ii) and furthermore, this polynomial agrees with the value of the Reed-Muller part on . In order to define it formally, we introduce the following notation.
Let be a finite field of size . Fix , a plane in , and a point . Denote . That is, the length of is , and it consists of concatenated with repetitions of .
Given the notation above, if is the first part of the codeword, corresponding to the Reed-Muller encoding of the message, then the PCPP for the pair is expected to be the proof of proximity claiming that is close to the language
Note that by repeating the symbol for times, the definition indeed puts weight 1/2 on the constraint that the input is close to some low-degree polynomial , and puts weight 1/2 on the constraint . In particular, if is -close to some bivariate low degree polynomial for some small , but , then is at least -far from any bivariate low degree polynomial on .
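The padding described above can be sketched in a few lines of Python. This is only an illustration under assumptions that are not taken from the paper: the field is a prime field F_p (so arithmetic is plain arithmetic mod p), the word is given as a function `f` on tuples, and the number of repetitions equals the number of points of the plane, which is what gives each half weight 1/2. The names `plane_points` and `padded_word` are hypothetical.

```python
from itertools import product


def plane_points(x, h1, h2, p):
    """All points x + s*h1 + t*h2 with s, t in F_p (coordinates taken mod p)."""
    return [tuple((xi + s * a + t * b) % p for xi, a, b in zip(x, h1, h2))
            for s, t in product(range(p), repeat=2)]


def padded_word(f, x, h1, h2, p):
    """Restriction of f to the plane, followed by as many copies of f(x)
    as the plane has points (assumption: this is the repetition count)."""
    restriction = [f(pt) for pt in plane_points(x, h1, h2, p)]
    return restriction + [f(x)] * len(restriction)
```

With this padding, changing only the value at the point x changes the single occurrence of x in the first half and every symbol of the second half, so the two padded words differ on more than half of their coordinates.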
The constraints of the second type correspond to -planes and lines . For each such pair the code will contain a PCPP certifying that (i) the restriction of the Reed-Muller part to is close to an evaluation of some polynomial of total degree at most , (ii) and furthermore, this polynomial is close to . (In particular, this implies that is close to some low-degree polynomial.)
Next, we introduce notation analogous to 2.1, replacing the points with lines.
Let be a finite field of size . Fix , a plane in , and a line . Denote by . That is, the length of is , and it consists of concatenated with repetitions of .
If is the Reed-Muller part of the codeword, corresponding to the Reed-Muller encoding of the message, then the PCPP for the pair is expected to be the proof of proximity claiming that is close to the language
Again, similarly to the first part, repeating the evaluation of for times puts weight 1/2 on the constraint that the input is close to some low-degree polynomial , and puts weight 1/2 on the constraint that is close to .
With the proofs specified above, we now sketch the local correcting algorithm for the code. Below we only focus on correcting symbols from the Reed-Muller part. Correcting the symbols from the PCPP part follows a rather straightforward adaptation of the techniques from [CGS20], and we omit them from the overview.
Given a word and an index of corresponding to the Reed-Muller part of the codeword, let be the Reed-Muller part of , and let be the input to corresponding to the index . The local decoder works in two steps.
- Consistency test using random walk:
In the first step the correcting algorithm invokes a procedure we call consistency test using random walk (CTRW) for the Reed-Muller code. This step creates a sequence of -planes of length , where each plane defines a constraint checking that the restriction of to the plane is low-degree. Hence, we get constraints, each depending on symbols.
- Composition using proofs of proximity:
Then, instead of reading the entire plane for each constraint, we use the PCPPs from the second part of the codeword to reduce the arity of each constraint to , thus reducing the total query complexity of the correcting algorithm to . That is, for each constraint we invoke the corresponding PCPP verifier to check that the restrictions of to each of these planes is (close to) a low-degree polynomial. If at least one of the verifiers rejects, then the word must be corrupt, and hence the correcting algorithm returns . Otherwise, if all the PCPP verifiers accept, the correcting algorithm returns .
In particular, if is a correct Reed-Muller encoding, then the algorithm will always return , and the main part of the analysis is to show that if is close to some , but , then the correcting algorithm catches an inconsistency, and returns with some constant probability. See Section 6.3 for details.
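The control flow of the two steps can be summarized by the following Python sketch. Everything is passed in as a callable because the actual walk and PCPP verifiers are defined later in the paper; `sample_walk`, `verify_point`, `verify_line`, and `read_symbol` are hypothetical names introduced only for this illustration. The sketch assumes the walk returns planes P_0, ..., P_t and lines l_1, ..., l_t, with l_i contained in both P_{i-1} and P_i.

```python
ABORT = None  # stands in for the special abort symbol


def relaxed_correct(read_symbol, x, sample_walk, verify_point, verify_line):
    """Return the symbol at x, or ABORT if some proximity check rejects."""
    planes, lines = sample_walk(x)
    # First-type constraint: the starting plane together with the point x.
    if not verify_point(planes[0], x):
        return ABORT
    # Second-type constraints: each later plane together with the line
    # it shares with the previous plane.
    for plane, line in zip(planes[1:], lines):
        if not verify_line(plane, line):
            return ABORT
    # All verifiers accepted: output the symbol as it appears in the word.
    return read_symbol(x)
```

The design point this makes visible is that the corrector never tries to fix the symbol itself: it either certifies the symbol it reads or aborts.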
The key step in the analysis says that if is close to some codeword but , then with high probability will be far from a low degree polynomial on at least one of these planes, where “far” corresponds to the notion of distances defined by the languages and . In particular, if on one of the planes is far from the corresponding language, then the PCPP verifier will catch this with constant probability, thus causing the correcting algorithm to return . We discuss this part in detail below.
It is important to emphasize that the main focus of this work is constructing a correcting algorithm for the Reed-Muller part. Using the techniques developed in [CGS20], it is rather straightforward to design the algorithm for correcting symbols from the PCPPs part of the code. See Section 6.4 for details.
2.1 Consistency test using random walk on Reed-Muller codes
Below we define the notion of consistency test using random walk (CTRW) for the Reed-Muller code. This notion is a slight modification of the notion originally defined in [CGS20] for general codes. In this paper we define it only for the Reed-Muller code. Given a word and some , the goal of the test is to make sure that is consistent with the codeword of the Reed-Muller code closest to . [CGS20] describe such a test for the tensor power of an arbitrary code with good distance (e.g., Reed-Solomon). The test they describe works by starting from the point we wish to correct, and choosing an axis-parallel line containing the starting point. The test continues by choosing a sequence of random axis-parallel lines , such that each intersects the previous one, , until reaching a uniformly random coordinate of the tensor code. That is, the length of the sequence denotes the mixing time of the corresponding random walk. The predicates are defined in the natural way; namely, the test expects to see a codeword of on each line it reads.
In this work, we present such a test for the Reed-Muller code, which is a variant of the test described above. The main differences compared to the description above are that (i) the test chooses a sequence of planes (and not lines), and (ii) every two consecutive planes intersect on a line (and not on a point). Roughly speaking, the algorithm works as follows.
Given a point the test picks a uniformly random -plane containing .
Given , the test chooses a random line , and then chooses another random -plane containing .
Given , the test chooses a random line , and then chooses another random -plane containing .
The algorithm continues for some predefined number of iterations, choosing . Roughly speaking, the number of iterations is equal to the mixing time of the corresponding Markov chain. More specifically, the process continues until a uniformly random point in is close to a uniform point in .
The constraints defined for each are the natural constraints; namely checking that the restriction of to is a polynomial of degree at most .
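The walk described above can be sketched as follows. This is a simplified illustration, not Algorithm 1 of the paper: it assumes a prime field F_p and takes the subfield to be the whole field, so that plane directions are arbitrary vectors in F_p^m, and it takes the number of steps as an input rather than deriving it from a mixing-time bound. Planes are stored as triples (base, h1, h2) and lines as pairs (base, direction); all function names are hypothetical.

```python
import random


def random_vec(m, p, rng):
    return tuple(rng.randrange(p) for _ in range(m))


def plane_point(base, h1, h2, s, t, p):
    """The point base + s*h1 + t*h2 over F_p."""
    return tuple((b + s * u + t * v) % p for b, u, v in zip(base, h1, h2))


def plane_line_walk(x, m, p, steps, rng=random):
    """Return planes P_0..P_steps and lines l_1..l_steps, where l_i lies
    in both P_{i-1} and P_i, and P_0 contains the starting point x."""
    planes = [(x, random_vec(m, p, rng), random_vec(m, p, rng))]
    lines = []
    for _ in range(steps):
        base, h1, h2 = planes[-1]
        # Random point of the current plane ...
        y = plane_point(base, h1, h2, rng.randrange(p), rng.randrange(p), p)
        # ... and a random direction a*h1 + b*h2 inside it, (a, b) != (0, 0).
        a, b = 0, 0
        while a == 0 and b == 0:
            a, b = rng.randrange(p), rng.randrange(p)
        d = tuple((a * u + b * v) % p for u, v in zip(h1, h2))
        lines.append((y, d))
        # New plane through the line: keep its direction, add a fresh one.
        planes.append((y, d, random_vec(m, p, rng)))
    return planes, lines
```

Degenerate choices (for example linearly dependent directions) are not filtered out here; a faithful implementation would follow the sampling rules of the formal definition.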
One of the important parameters directly affecting the query complexity of our construction is the mixing time of the random walk. Indeed, as explained above, the query complexity of our RLDC is proportional to the mixing time of the random walk. We prove that if , then the mixing time is upper bounded by . In order to prove this we use the following claim, saying that if is the field extension of of degree , and and are sampled uniformly, independently from each other, then is close to a uniformly random point in . See Claim 3.5 for the exact statement.
As explained above, the key step of the analysis is to prove that if is close to some codeword but , then with high probability at least one of the predicates defined will be violated. Specifically, we prove that with high probability the violation will be in the following strong sense.
Theorem 2.3 (informal, see Theorem 4.3).
If is close to some codeword but , then with high probability
either is -far from ,
or is -far from for some .
Indeed, this strong notion of violation allows us to use the proofs of proximity in order to reduce the query complexity to queries for each . We discuss proofs of proximity next.
2.2 PCPs of proximity and composition
The second building block we use in this work is the notion of probabilistically checkable proofs of proximity (PCPPs). PCPPs were first introduced in [BGH06] and [DR04]. Informally speaking, a PCPP verifier for a language gets oracle access to an input and a proof claiming that is close to some element of . The verifier queries and in some small number of (random) locations, and decides whether to accept or reject. The completeness and soundness properties of a PCPP are as follows.
If , then there exists a proof causing the verifier to accept with probability 1.
If is far from , then no proof can make the verifier accept with probability more than 1/2.
In fact, we will use the slightly stronger notion of canonical PCPP (cPCPP) systems. These are PCPP systems satisfying the following completeness and soundness properties. For completeness, we demand that for each in the language there is a unique canonical proof that causes the verifier to accept with probability 1. For soundness, the demand is that the only pairs that are accepted by the verifier with high probability are those where is close to some and is close to . Such proof systems have been studied in [DGG18, Par20], who proved that such proof systems exist for every language in .
Furthermore, for our purposes we will demand a stronger notion of correctable canonical PCPP systems (ccPCPP). These are canonical PCPP systems where the set is a -query RLCC for some parameter , with denoting the canonical proof for . It was shown in [CGS20] how to construct ccPCPP by combining a cPCPP system with any systematic RLCC. Informally speaking, for every , and its canonical proof , we define by encoding using a systematic RLCC. The verifier for the new proof system is defined in a straightforward manner. See [CGS20] for details.
The PCPPs we use throughout this work are of two types, certifying that
is close to for some plane and some , and
is close to for some plane and some line .
Indeed, it is easy to see that the first type of proofs checks that (i) the restriction of to is close to an evaluation of some polynomial of total degree at most , (ii) and . Similarly, the second type proof certifies that (i) the restriction of to is close to an evaluation of some polynomial of total degree at most , (ii) and is close to .
These notions of distance go together well with the guarantees we have for the test in Theorem 2.3. This allows us to compose the test with the PCPPs to obtain a correcting algorithm with query complexity . Informally speaking, the composition theorem works as follows. We first run the test to obtain a collection of constraints on the planes . By Theorem 2.3, we have the guarantee that with high probability either is -far from , or is -far from for some . Then, instead of actually reading the values of on all these planes, we run the PCPP verifier on to check that it is close to , and run the PCPP verifier on each of the to check that they are close to . Each execution of the PCPP verifier makes queries to and to the proof, and thus the total query complexity will indeed be . As for soundness, if is -far from , or is -far from for some , then the corresponding verifier will notice an inconsistency with constant probability, causing the decoder to output .
We begin with standard notation. The relative distance between two strings is defined as
If , we say that is -close to ; otherwise we say that is -far from . For a non-empty set define the distance of from as . If , we say that is -close to ; otherwise we say that is -far from .
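For concreteness, the standard (unweighted) notions can be written out as follows. This is a minimal sketch assuming the usual conventions: the relative distance is the fraction of coordinates on which two equal-length strings differ, the distance to a set is the minimum over its elements, and "-close" means that the distance is at most the given threshold. The function names are hypothetical.

```python
def relative_distance(x, y):
    """Fraction of coordinates on which two equal-length strings differ."""
    if len(x) != len(y):
        raise ValueError("strings must have the same length")
    return sum(a != b for a, b in zip(x, y)) / len(x)


def distance_to_set(x, S):
    """Distance from x to the closest element of a non-empty set S."""
    return min(relative_distance(x, y) for y in S)


def is_close(x, S, eps):
    """True if x is eps-close to S (assumed convention: distance <= eps)."""
    return distance_to_set(x, S) <= eps
```

For example, "0011" is at relative distance 1/2 from "0110", and at distance 1/4 from the set {"0000", "0111"}.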
We will also need a more general notion of a distance, allowing different coordinates to have different weight. In particular, we will need the distance that gives constant weight to a particular subset of the coordinates, and spreads the rest of the weight uniformly between all coordinates.
Fix and an alphabet . For a set define the distance between two strings as
In particular, if differs from on coordinates in , then is at least .
We define between a string and a set as
This definition generalizes the definition of [CGS20] of for a coordinate . Indeed, the notion of for a coordinate corresponds to the singleton set .
When the set is a singleton we will write to denote , and we will write to denote .
3.1 Basic coding theory
Let be positive integers, and let be an alphabet. An error correcting code is an injective mapping from messages of length over the alphabet to codewords of length . The parameter is called the message length of the code, and is its block length (which we view as a function of ). The rate of the code is defined as , and the relative distance of the code is defined as . We sometimes abuse the notation and use to denote the set of all of its codewords, i.e., identify the code with .
Linear codes: Let be a finite field. A code is linear if it is an -linear map from to . In this case the set of codewords is a subspace of , and the message length of is also the dimension of the subspace. It is a standard fact that for any linear code , the relative distance of is equal to .
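The standard fact about linear codes can be checked by brute force on a small example. The sketch below uses the binary [7,4] Hamming code, which is chosen here only for illustration and does not appear in the paper; it compares the minimum relative distance over all pairs of distinct codewords with the minimum relative weight of a non-zero codeword.

```python
from itertools import product

# Generator matrix of the binary [7,4] Hamming code in systematic form
# (an illustrative choice, not a code used in the paper).
G = [
    (1, 0, 0, 0, 1, 1, 0),
    (0, 1, 0, 0, 1, 0, 1),
    (0, 0, 1, 0, 0, 1, 1),
    (0, 0, 0, 1, 1, 1, 1),
]


def encode(msg, gen):
    """Multiply the message row vector by the generator matrix over GF(2)."""
    n = len(gen[0])
    return tuple(sum(b * row[j] for b, row in zip(msg, gen)) % 2
                 for j in range(n))


def codewords(gen):
    return [encode(msg, gen) for msg in product((0, 1), repeat=len(gen))]


def min_relative_distance(gen):
    cws = codewords(gen)
    return min(sum(a != b for a, b in zip(u, v))
               for i, u in enumerate(cws) for v in cws[i + 1:]) / len(gen[0])


def min_relative_weight(gen):
    return min(sum(c) for c in codewords(gen) if any(c)) / len(gen[0])
```

Both quantities equal 3/7 for this code, and its rate is 4/7.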
3.2 Reed-Muller codes
Reed-Muller codes [Mul54] are among the most well studied error correcting codes, with many theoretical and practical applications in different areas of computer science and information theory. Let be a finite field of order , and let and be integers. The code is the linear code whose codewords are the evaluations of polynomials of total degree at most over . We will allow ourselves to write , since the field is fixed throughout the paper. We will also sometimes omit the parameters and , and simply write , when the parameters are clear from the context.
In this paper we consider the setting of parameters where . It is well known that for the relative distance of is . The dimension of can be computed by counting the number of monomials of total degree at most . For the number of such monomials is . Since the length of each codeword is , it follows that the rate of the code is .
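The parameter counts above are easy to verify numerically. The following sketch assumes the regime where the degree d is smaller than the field size q, in which the number of monomials of total degree at most d in m variables is the binomial coefficient C(m+d, m) and the relative distance is 1 - d/q. The function names and the symbols q, m, d are this sketch's own notation.

```python
from itertools import product
from math import comb


def rm_dimension(m, d):
    """Number of m-variate monomials of total degree at most d (for d < q)."""
    return comb(m + d, m)


def count_monomials(m, d):
    """Brute-force count of exponent vectors with total degree at most d."""
    return sum(1 for e in product(range(d + 1), repeat=m) if sum(e) <= d)


def rm_rate(q, m, d):
    """Dimension divided by the block length q**m."""
    return rm_dimension(m, d) / q ** m


def rm_relative_distance(q, d):
    """Relative distance 1 - d/q of the code when d < q."""
    return 1 - d / q
```

For instance, with m = 3 and d = 4 there are 35 monomials, so over a field of size 8 the rate is 35/512 and the relative distance is 1/2.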
For denote by the line
Also, for denote by the plane
An important property of (and multivariate low-degree polynomials, in general) that we use throughout this work is that their restrictions to lines and planes in are also polynomials of degree at most . In other words, if , and is a line ( is a plane) in , then the restriction of to (or to ) is a codeword of the Reed-Muller code of the same degree and lower dimension.
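The restriction property can be checked on small examples. The sketch below assumes a prime field F_p, represents a polynomial as a dictionary from exponent tuples to coefficients, and tests whether a table of p values has degree at most d by checking that all (d+1)-th forward differences vanish, which characterizes degree at most d as long as d is at most p - 2. All names are hypothetical.

```python
from math import comb


def evaluate(coeffs, point, p):
    """Evaluate a polynomial given as {exponent tuple: coefficient} over F_p."""
    total = 0
    for exps, c in coeffs.items():
        term = c
        for xi, e in zip(point, exps):
            term = term * pow(xi, e, p) % p
        total = (total + term) % p
    return total


def restrict_to_line(coeffs, x, h, p):
    """Values of the polynomial at x + t*h for t = 0, 1, ..., p - 1."""
    return [evaluate(coeffs, tuple((xi + t * hi) % p for xi, hi in zip(x, h)), p)
            for t in range(p)]


def has_degree_at_most(values, d, p):
    """Check that every (d+1)-th forward difference of the table is 0 mod p."""
    n = len(values)
    return all(
        sum((-1) ** (d + 1 - j) * comb(d + 1, j) * values[(t + j) % n]
            for j in range(d + 2)) % p == 0
        for t in range(n))
```

As a usage example, a trivariate polynomial of total degree 3 over F_7 restricted to any line passes `has_degree_at_most(values, 3, 7)`, whereas the indicator function of a single point of F_7 (which has degree 6) does not.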
The following lemma is a standard lemma in the PCP literature, saying that random lines sample well the space .
Let be a finite field. For any subset of density , and for any it holds that
For each , let be an indicator random variable for the event. Since each point is a uniform point in the plane, we have . Therefore, denoting , it follows that .
We are interested in bounding the deviation of from its expectation. We do it by bounding the variance of . Note first that . By the pairwise independence of the points on a line, it follows that . Therefore, by applying Chebyshev’s inequality we get
as required. ∎
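The Chebyshev argument can be sanity-checked exactly on a small instance. The sketch below is an illustration under stated assumptions rather than the lemma itself: it works over a prime field F_p, parametrizes a line by a uniform base point x and a uniform direction h in F_p^m (so distinct points of the line are pairwise independent), and compares the exact fraction of badly sampling lines with the bound mu(1 - mu)/(eps^2 p) that Chebyshev's inequality gives for this parametrization. The constants in the paper's statement may differ.

```python
from itertools import product


def line(x, h, p):
    """The points x + t*h for t in F_p."""
    return [tuple((xi + t * hi) % p for xi, hi in zip(x, h)) for t in range(p)]


def bad_line_fraction(A, p, m, eps):
    """Exact fraction of pairs (x, h) whose line meets A with density
    differing from the density of A by more than eps."""
    mu = len(A) / p ** m
    pts = list(product(range(p), repeat=m))
    bad = sum(1 for x in pts for h in pts
              if abs(sum(pt in A for pt in line(x, h, p)) / p - mu) > eps)
    return bad / len(pts) ** 2


def chebyshev_bound(mu, p, eps):
    """Variance p*mu*(1-mu) of the hit count, divided by (eps*p)**2."""
    return mu * (1 - mu) / (eps ** 2 * p)
```

Because the bound follows from the variance computation alone, the exact fraction returned by `bad_line_fraction` can never exceed `chebyshev_bound` for any set A.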
The following claim will be an important step in our analysis.
Let be a parameter, let be a finite field, and let be its extension of degree . Let and be chosen independently uniformly at random from their domains.
Then for any set of size it holds that
In order to prove the claim let us introduce some notation. We write each element in as an -dimensional row vector over . Also, we will represent an element as a matrix over , where the ’th row represents , the ’th coordinate of . Using this notation we need to prove that the random matrix corresponding to the sum is close to a random matrix with entries chosen uniformly from independently from each other.
Using the notation above, write each as a row vector . Observe that for any vector we can represent as the outer product
Therefore, the sum is represented as
where is the matrix with , and is the matrix with . That is, the sum is represented as a product of two uniformly random matrices over .
Next we show that if are chosen uniformly at random and independently, then for any collection of matrices of size it holds that . Indeed,
If is invertible, then for a uniformly random the probability that is exactly , and it is easy to check that . ∎
3.3 Relaxed locally correctable codes
Following the discussion in the introduction, we provide a formal definition of relaxed LCCs, and state some related basic facts and known results.
Definition 3.6 (Relaxed LCC).
Let be an error correcting code with relative distance , and let , , and be parameters. Let be a randomized algorithm that gets oracle access to an input and explicit access to an index . We say that is a -query relaxed local correction algorithm for with correction radius and soundness if for all inputs the algorithm reads explicitly the coordinate , reads at most (random) coordinates in , and satisfies the following conditions.
For every , and every coordinate it holds that .
For every that is -close to some codeword and every coordinate it holds that , where is a special abort symbol.
The code is said to be a -relaxed locally correctable code (RLCC) with query complexity if it admits a -query relaxed local correction algorithm with correction radius and soundness .
Note that for systematic codes it is clear from Definition 3.6 that RLCC is a stronger notion than RLDC, as it allows the local correction algorithm not only to decode each symbol of the message, but also each symbol of the codeword itself. That is, any systematic RLCC is also an RLDC with the same parameters.
Finally, we recall the following theorem of Chiesa, Gur, and Shinkar [CGS20].
Theorem 3.8 ([CGS20]).
For any finite field , and parameters , there exists an explicit construction of a systematic linear code with block length and constant relative distance, that is a -query RLCC with constant correction radius , and constant soundness .
3.4 Canonical PCPs of proximity
Next we define the notions of probabilistically checkable proofs of proximity, and the variants that we will need in this paper.
Definition 3.9 (PCP of proximity).
A -query PCP of proximity (PCPP) verifier for a language with soundness with respect to the proximity parameter , is a polynomial-time randomized algorithm that receives oracle access to an input and a proof . The verifier makes at most queries to and has the following properties:
For every there exists a proof such that .
If is -far from , then for every proof it holds that .
A canonical PCPP (cPCPP) is a PCPP in which every instance in the language has a canonical accepting proof. Formally, a canonical PCPP is defined as follows.
Definition 3.10 (Canonical PCPP).
A -query canonical PCPP verifier for a language with soundness with respect to proximity parameter , is a polynomial-time randomized algorithm that gets oracle access to an input and a proof . The verifier makes at most queries to , and satisfies the following conditions:
- Canonical completeness:
For every there exists a unique (canonical) proof for which .
- Canonical soundness:
For every and proof such that
it holds that .
Let be a proximity parameter. For every language in there exists a polynomial and a canonical PCPP verifier for satisfying the following properties.
For all of length the length of the canonical proof is .
The query complexity of the PCPP verifier is .
The PCPP verifier for has perfect completeness and soundness for proximity parameter (with respect to the uniform distance measure).
Next, we define the stronger notion of correctable canonical PCPPs (ccPCPP), originally defined in [CGS20]. A ccPCPP system is a canonical PCPP system that, in addition to allowing the verifier to locally verify the validity of the given proof, also admits a local correction algorithm that locally corrects potentially corrupted symbols of the canonical proof. Formally, the ccPCPP is defined as follows.
Definition 3.12 (Correctable canonical PCPP).
A language is said to admit a ccPCPP with query complexity and soundness with respect to the proximity parameter , and correcting soundness for correcting radius if it satisfies the following conditions.
4 Consistency test using random walk on the Reed-Muller code
Below we define the notion of consistency test using random walk (CTRW). This notion was originally defined in [CGS20] for tensor powers of general codes. In this paper we focus on this test for the Reed-Muller code.
Informally speaking, a consistency test using random walk for the Reed-Muller code is a randomized algorithm that gets a word , which is close to some codeword , and an index as an input, and its goal is to check whether . In other words, it checks whether the value of at is consistent with the nearby codeword . Below we formally describe the random process.
Definition 4.1 (Consistency test using -plane-line random walk on ).
Let be a field, and let be a field extension of . Let be the Reed-Muller code. An -step consistency test using -plane-line random walk on is a randomized algorithm that gets as input the evaluation table of some and a coordinate , and works as in Algorithm 1.