Improved decoding of Folded Reed-Solomon and Multiplicity Codes

In this work, we show new and improved error-correcting properties of folded Reed-Solomon codes and multiplicity codes. Both of these families of codes are based on polynomials over finite fields, and both have been the sources of recent advances in coding theory. Folded Reed-Solomon codes were the first explicit constructions of codes known to achieve list-decoding capacity; multivariate multiplicity codes were the first constructions of high-rate locally correctable codes; and univariate multiplicity codes are also known to achieve list-decoding capacity. However, previous analyses of the error-correction properties of these codes did not yield optimal results. In particular, in the list-decoding setting, the guarantees on the list sizes were polynomial in the block length, rather than constant; and for multivariate multiplicity codes, local list-decoding algorithms could not go beyond the Johnson bound. In this paper, we show that folded Reed-Solomon codes and multiplicity codes are in fact better than previously known in the context of list-decoding and local list-decoding. More precisely, we first show that folded RS codes achieve list-decoding capacity with constant list sizes, independent of the block length; and that high-rate univariate multiplicity codes can also be list-recovered with constant list sizes. Using our result on univariate multiplicity codes, we show that multivariate multiplicity codes are high-rate, locally list-recoverable codes. Finally, we show how to combine the above results with standard tools to obtain capacity-achieving locally list-decodable codes with query complexity significantly lower than was known before.








1 Introduction

An error correcting code is a collection C ⊆ Σ^n of codewords of length n over an alphabet Σ. The goal in designing C is to enable the recovery of a codeword c ∈ C given a corrupted version w of c, while at the same time making |C| as large as possible. In the classical unique decoding problem, the goal is to efficiently recover c from any w so that c and w differ in at most αn places; this requires the relative distance δ of the code (that is, the minimum fraction of places on which any two distinct codewords differ) to be at least 2α.
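As a toy illustration of the unique decoding radius, the following sketch decodes a hypothetical 3-fold repetition code (not a code from this paper) by brute-force nearest-codeword search; a code of minimum distance d uniquely corrects any pattern of fewer than d/2 errors.

```python
from itertools import product

def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

# A toy code: each 2-bit message is repeated three times, giving block
# length 6 and minimum distance 3, so floor((3-1)/2) = 1 error is correctable.
code = [msg * 3 for msg in product([0, 1], repeat=2)]

def unique_decode(w):
    # Return the unique closest codeword, or None on a tie.
    dists = sorted((hamming(c, w), c) for c in code)
    if len(dists) > 1 and dists[0][0] == dists[1][0]:
        return None
    return dists[0][1]

c = (1, 0, 1, 0, 1, 0)
corrupted = (1, 1, 1, 0, 1, 0)   # one error: within the unique-decoding radius
assert unique_decode(corrupted) == c
```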

Modern applications of error correcting codes, both in coding theory and theoretical computer science, have highlighted the importance of variants of the unique decoding problem, including list decoding and local decoding. In list-decoding, the amount of error is large enough that unique recovery of the codeword is impossible (that is, the fraction of errors α exceeds δ/2), and instead the goal is to return a short list L ⊆ C with the guarantee that c ∈ L. In local decoding, we still have α < δ/2, but the goal is to recover a single symbol of a codeword c, after querying not too many positions of the corrupted codeword w. In a variant known as local list-decoding, we seek local information about a symbol even when α > δ/2. List-decoding, local decoding, and local list-decoding are important primitives in error correcting codes, with applications in coding theory, complexity theory, pseudorandomness and cryptography.

Algebraic codes have been at the heart of the study of list-decoding, local-decoding and local list-decoding. One classical example of this is Reed-Solomon (RS) codes, whose codewords are comprised of evaluations of low-degree polynomials. (That is, a codeword of an RS code has the form (f(a_1), ..., f(a_n)) for some low-degree polynomial f.) In the late 1990's, Guruswami and Sudan [Sud97, GS99] gave an algorithm for efficiently list-decoding Reed-Solomon codes well beyond half the distance of the code, and this kicked off the field of algorithmic list-decoding. A second example is Reed-Muller (RM) codes, the multivariate analogue of Reed-Solomon codes. The structure of Reed-Muller codes is very amenable to local algorithms: a codeword of a Reed-Muller code corresponds to a multivariate low-degree polynomial, and considering the restriction of that polynomial to a line yields a univariate low-degree polynomial, a.k.a. a Reed-Solomon codeword. This local structure is the basis for Reed-Muller codes being locally testable [RS96] and locally decodable [Lip90, BFLS91]. Using this locality in concert with the Guruswami-Sudan algorithm leads to local list-decoding schemes [AS03, STV01] for these codes.

More recently, variants of Reed-Solomon and Reed-Muller codes have emerged to obtain improved list-decoding and local-decoding properties. Two notable examples, which are the focus of this work, are Folded Reed-Solomon (FRS) and multiplicity codes. Both of these constructions have led to recent advances in coding theory. We introduce these codes informally here, and give formal definitions in Section 2.

Folded Reed-Solomon codes, introduced by Guruswami and Rudra in [GR08], are a simple variant of Reed-Solomon codes. If a codeword of a Reed-Solomon code is (f(a_1), f(a_2), ..., f(a_n)), then the folded version (with folding parameter s) is

((f(a_1), ..., f(a_s)), (f(a_{s+1}), ..., f(a_{2s})), ..., (f(a_{n-s+1}), ..., f(a_n))).
The main property of these codes that makes them interesting is that they admit much better list-decoding algorithms [GR08] than the original Guruswami-Sudan algorithm: more precisely, they allow the error tolerance to be much larger for a code of the same rate, asymptotically obtaining the optimal trade-off. (The rate of a code C ⊆ Σ^n is defined as R = log|C| / (n log|Σ|) and quantifies how much information can be sent using the code. We always have R ≤ 1, and we would like R to be as close to 1 as possible.)
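Concretely, the folding operation just groups consecutive symbols of the Reed-Solomon codeword into blocks. A minimal sketch (the function name is ours, for illustration only):

```python
def fold(codeword, s):
    # Group consecutive symbols into blocks of s; each block becomes one
    # symbol of the folded code, so the length shrinks by a factor of s
    # while the alphabet grows from Sigma to Sigma^s.
    assert len(codeword) % s == 0
    return [tuple(codeword[i:i + s]) for i in range(0, len(codeword), s)]

rs_codeword = [3, 1, 4, 1, 5, 9]          # stand-in for (f(a_1), ..., f(a_6))
assert fold(rs_codeword, 2) == [(3, 1), (4, 1), (5, 9)]
```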

Multiplicity codes, introduced in the univariate setting by Rosenbloom and Tsfasman in [RT97] and in the multivariate setting by Kopparty, Saraf and Yekhanin in [KSY14], are variants of polynomial codes that also include evaluations of derivatives. That is, while a symbol of an RS codeword is of the form f(a) for some low-degree polynomial f and some evaluation point a, a symbol in a univariate multiplicity code codeword is of the form (f(a), f'(a), ..., f^(s-1)(a)), where s is the multiplicity parameter. Similarly, while a symbol of an RM codeword is of the form Q(a) for some low-degree multivariate polynomial Q, a symbol in a multivariate multiplicity code includes all partial derivatives of Q of order less than s. Multivariate multiplicity codes were shown in [KSY14] to have strong locality properties, and were the first known constructions of high-rate locally decodable codes. Meanwhile, univariate multiplicity codes were shown in [Kop15, GW13] to be list-decodable in the same parameter regime as folded Reed-Solomon codes (they were previously shown to be list-decodable up to the Johnson bound by Nielsen [Nie01]), also achieving the asymptotically optimal trade-off between rate and error-tolerance.
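A univariate multiplicity-code symbol can be sketched as follows. This is a toy example over F_13 with degree below the characteristic, so ordinary formal derivatives suffice; the function names are ours, not the paper's.

```python
p = 13  # field size; degrees here stay below p, so derivatives behave classically

def poly_eval(coeffs, a):
    # coeffs[i] is the coefficient of x^i; evaluate mod p by Horner's rule
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * a + c) % p
    return acc

def derivative(coeffs):
    # formal derivative: d/dx of c_i * x^i is i * c_i * x^(i-1)
    return [(i * c) % p for i, c in enumerate(coeffs)][1:]

def mult_code_symbol(coeffs, a, s):
    # The codeword symbol at evaluation point a: (f(a), f'(a), ..., f^(s-1)(a))
    out = []
    for _ in range(s):
        out.append(poly_eval(coeffs, a))
        coeffs = derivative(coeffs)
    return tuple(out)

f = [2, 0, 5]                       # f(x) = 2 + 5x^2 over F_13
assert mult_code_symbol(f, 3, 2) == (8, 4)   # f(3) = 47 mod 13, f'(3) = 30 mod 13
```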

In this work, we show that Folded Reed-Solomon codes, univariate multiplicity codes, and multivariate multiplicity codes are even more powerful than was previously known in the context of list-decoding and local list-decoding. Our motivations for this work are threefold:

  1. First, FRS codes and multiplicity codes are basic and natural algebraic codes, central to many recent results in coding theory ([GR08, KSY14, Kop15, GW13, DL12, KMRS17, GKO17], to name a few) and understanding their error-correcting properties is important in its own right.

  2. Second, by composing our new results with known techniques, we obtain capacity-achieving locally list-decodable codes with significantly lower query complexity than was previously known.

  3. Third, while there have been improved constructions of list-decodable and locally list-decodable codes building on FRS and multiplicity codes (discussed more below), those constructions involve significant additional pseudorandom ingredients. Our results give simpler constructions of capacity-achieving list-decodable and locally list-decodable codes with the best known parameters. In particular, we give the first constructions of linear capacity-achieving list-decodable codes with constant alphabet size and constant output list size. (Many codes in this paper have alphabet Σ = F_q^t, where F_q is a finite field. For such "vector alphabet" codes, we use the term "linear" to mean "F_q-linear".)

We will state our results and contributions more precisely in Section 1.2 after setting up a bit more notation and surveying related work.

1.1 Related work

List-recoverable codes.

While the discussion above focused on the more well-known problem of list-decoding, in this work we actually focus on a generalization of list-decoding known as list-recovery. Given a code C ⊆ Σ^n, an (α, ℓ, L)-list-recovery algorithm for C takes as input a sequence of lists S_1, ..., S_n ⊆ Σ, each of size at most ℓ, and returns a list of all of the codewords c ∈ C so that c_i ∈ S_i for all but an α fraction of the coordinates i; the combinatorial requirement is that this list has size at most L. List-decoding is the special case of list-recovery when ℓ = 1.
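For concreteness, here is a brute-force (exponentially slow, purely illustrative) list-recovery routine matching the definition above:

```python
def list_recover(code, lists, alpha):
    # Return every codeword agreeing with the input lists on all but an
    # alpha fraction of coordinates (brute force; only viable for toy codes).
    n = len(lists)
    out = []
    for c in code:
        mismatches = sum(1 for i in range(n) if c[i] not in lists[i])
        if mismatches <= alpha * n:
            out.append(c)
    return out

# Toy example: length-4 codewords over the alphabet {0, 1, 2}
code = [(0, 0, 0, 0), (1, 1, 1, 1), (0, 1, 2, 0)]
lists = [{0, 1}, {1}, {1, 2}, {0, 1}]
assert list_recover(code, lists, alpha=0.25) == [(1, 1, 1, 1), (0, 1, 2, 0)]
```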

Both list-recovery and list-decoding have been important in coding theory, especially in theoretical computer science, for the past several decades (see [Sud97, Vad12] for overviews). Initially, the generalization to list recovery was used as a building block towards constructions of list decodable and uniquely decodable codes [GI02, GI03, GI04, GI05, KMRS17, GKO17, HRW17], although it has since found additional applications in algorithm design [INR10, NPR12, GNP13].

The Guruswami-Sudan algorithm, mentioned above, is in fact a list-recovery algorithm as well as a list-decoding algorithm, and can efficiently list-recover Reed-Solomon codes up to radius 1 − √(ℓR), with polynomial list sizes; this trade-off is known as the Johnson bound. It is a classical result that there are codes that go beyond the Johnson bound while keeping the output list size polynomial in n, or even constant: for large alphabet sizes, the "correct" limit (called the list-decoding or list-recovering capacity) is a radius of 1 − R, provided the alphabet size is sufficiently larger than ℓ, and this is achieved by uniformly random codes. There is a big difference between 1 − √(ℓR) and 1 − R, especially when ℓ is large. In particular, the Guruswami-Sudan algorithm requires Reed-Solomon codes to have rate smaller than 1/ℓ to be list-recoverable at a nontrivial radius, while a completely random code can achieve rates arbitrarily close to 1 (of course, without efficient decoding algorithms). For a decade it was open whether or not one could construct explicit codes which efficiently achieve list-decoding capacity.
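The gap between the Johnson bound and capacity can be seen numerically; in this quick sketch (function names are ours), the Johnson radius becomes negative — i.e., vacuous — as soon as ℓR exceeds 1, while the capacity radius stays positive for any rate below 1:

```python
import math

def johnson_radius(rate, ell):
    # Decoding radius guaranteed by the Johnson bound for list-recovery
    return 1 - math.sqrt(ell * rate)

def capacity_radius(rate):
    # Information-theoretic limit for large alphabets
    return 1 - rate

for rate in (0.1, 0.5, 0.9):
    for ell in (1, 4):
        j, c = johnson_radius(rate, ell), capacity_radius(rate)
        print(f"R={rate}, l={ell}: Johnson {j:+.3f} vs capacity {c:.3f}")
```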

In a breakthrough result, Guruswami and Rudra [GR08] (building on the work of Parvaresh and Vardy [PV05]) showed that the folding operation described above can make RS codes approach capacity with polynomial list-sizes. For some time, this was the only known route to capacity-achieving codes, until it was shown in [GW13, Kop15] that univariate multiplicity codes also do the job (again, with polynomial list sizes). Since then there has been a great deal of work aimed at reducing the list size and alphabet size of these constructions, both of which were polynomial in the block length n (and both of which would ideally be independent of n). To reduce the alphabet size to constant, two high-level strategies are known to work: (1) swapping out the standard polynomial codes for Algebraic Geometry (AG) codes [GX12, GX13, GK16b], and (2) concatenation and distance amplification using expander graphs [AEL95, GI04, HW15, GKO17, HRW17]. To reduce the list size to constant, the known strategies involve carefully constructing subcodes of Folded Reed-Solomon codes and univariate multiplicity codes, via pseudorandom objects such as subspace evasive sets or subspace designs [DL12, GW13, GX12, GX13, GK16b].

In this work, we show that in fact both folded Reed-Solomon codes and univariate multiplicity codes are already list-recoverable with constant list-sizes, with no additional modification needed! The resulting codes still have large alphabet sizes, but this can be ameliorated by using the same expander-based techniques described above.

We summarize the state of affairs for list-recovery in Table 1, and discuss our contributions in more detail below in Section 1.2.

Code Alphabet size List size Explicit? Linear? Decoding time Notes
Completely random code No No -
Random linear code [RW17] No Yes -
Folded RS codes [GR08] Yes Yes
Univariate Multiplicity [Kop15] Yes Yes
Folded RS/Univariate Multiplicity [GW13] Yes Yes Output is a small subspace containing all nearby codewords.
Folded RS codes (This work, Theorem 3.1) Yes Yes
Univariate Multiplicity codes (This work, Theorem 4.1) Yes Yes For only.
Folded RS subcodes (via subspace evasive) [DL12] Yes No
Folded AG (via subspace evasive) [GX12] No No
Folded AG (via subspace designs) [GX13, GK16b] Yes Yes
Tensor products of AG subcodes, plus expander techniques [HRW17] Yes Yes
Folded RS codes, plus expander techniques (This work, Corollary 6.6) Yes Yes
Table 1: Constructions of (α, ℓ, L)-list-recoverable codes of rate R, where α is close to the list-recovering capacity 1 − R. The top part of the table focuses on "simple" algebraic constructions; the bottom part has constructions which are more involved. We assume that ℓ is constant (independent of the block length n).
Code Alphabet size List size Locality Explicit?
Tensor products of AG Subcodes, plus expander techniques [HRW17] Yes
Multivariate Multiplicity codes, plus expander techniques (This work, Theorem 6.2) Yes
Multivariate Multiplicity codes, plus expander techniques (This work, Theorem 6.1) Yes
Table 2: Constructions of (α, ℓ, L)-locally-list-recoverable codes of rate R, where α is close to the list-recovering capacity 1 − R. We assume that ℓ is constant (independent of the block length n).
Locally list-recoverable codes.

As mentioned above, local decoding has been an important theme in coding theory for the past several decades. Locality makes sense in the context of list-recovery as well. The definition of local list-recovery (given formally as Definition 2.3 below) is a bit involved, but intuitively the idea is as follows. As with list-recovery, we have input lists S_1, ..., S_n, so that each S_i is of size at most ℓ. The goal is to obtain information about a single symbol of a codeword c, given query access to the lists. More precisely, we will require that the decoder output a short list of randomized algorithms A_1, ..., A_L, each of which corresponds to a codeword that disagrees with the lists on at most an α fraction of coordinates. The requirement is that if A_j corresponds to a codeword c, then on input i, A_j outputs c_i with high probability, using no more than Q queries to the lists. If such a decoder exists, we say that the code is (Q, α, ℓ, L)-locally-list-recoverable. Local list-decoding is the special case where ℓ = 1.

This definition may seem a bit convoluted, but it turns out to be the “right” definition for a number of settings. For example, local list-decoding algorithms are at the heart of algorithms in cryptography [GL89], learning theory [KM93], and hardness amplification and derandomization [STV01]. Locally list-recoverable codes have been desirable as a step towards obtaining efficient capacity-achieving local list-decoding algorithms. In particular, high-rate locally list-recoverable codes, combined with standard techniques, yield capacity-achieving locally list-decodable and locally list-recoverable codes.

However, until recently, we did not know of any high-rate locally list-recoverable codes. The first such construction was given recently in [HRW17]. The approach of [HRW17] is as follows: it takes a folded AG subcode from [GX13, GK16b] (which uses subspace designs to find the subcode); applies tensor products many times; and concatenates the result with a locally correctable code. Finally, to obtain capacity-achieving locally list-decodable/recoverable codes, that work applies an expander-based technique of [AEL95] to pseudorandomly scramble the symbols of the codewords and amplify the amount of error tolerated.

The reason that so much machinery was used in [HRW17] is that despite a great deal of effort, the "natural" algebraic approaches did not seem to work. Perhaps the most natural algebraic approach is via Reed-Muller codes, which have a natural local structure. As discussed above, a Reed-Muller codeword corresponds to a low-degree multivariate polynomial, and restricting such a polynomial to a line yields a low-degree univariate polynomial, which corresponds to a Reed-Solomon codeword. Using this connection, along with the Guruswami-Sudan algorithm for Reed-Solomon codes, Arora and Sudan [AS03] and Sudan, Trevisan and Vadhan [STV01] gave algorithms for locally list-decoding Reed-Muller codes up to the Johnson bound. (Technically these algorithms only came within a factor of the Johnson bound. To go all the way to the Johnson bound, one needs some additional ideas [BK09]; see [GK16a, Kop15] for further variations on this.) This algorithm also extends naturally to local list-recovery up to the Johnson bound [GKO17], but this means that for large values of ℓ one cannot obtain high-rate codes.

One might hope to use a similar approach for multivariate multiplicity codes; after all, the univariate versions are list-recoverable to capacity. However, the fact that the list sizes were large was an obstacle to this approach, and again previous work on the local list-decodability of multivariate multiplicity codes also only worked up to the Johnson bound [Kop15].

In this work, we return to this approach, and—using our results on univariate multiplicity codes—show that in fact high-rate multivariate multiplicity codes are locally list-recoverable. Using our construction, combined with some expander-based techniques, we obtain capacity-achieving locally list-recoverable codes which improve on the state-of-the-art. The quantitative results are stated in Table 2, and we discuss them in more detail in the next section.

1.2 Our contributions

The main contribution of this work is improved results on the (local) list-recoverability of FRS codes and multiplicity codes. We discuss a few of the concrete outcomes below.

  • Constant list sizes for folded Reed-Solomon codes. Theorem 3.1 says that a folded RS code of rate R (with suitable folding and alphabet parameters) is list-recoverable up to radius 1 − R − ε with a list size L that depends only on ℓ and ε. This improves over the previous best-known list size for this setting, which was polynomial in the block length n. In particular, when ℓ and ε are constant, the list size improves from polynomial in n to a constant.

  • Constant list sizes for univariate multiplicity codes. Theorem 4.1 recovers the same quantitative results as Theorem 3.1 for univariate multiplicity codes with degree smaller than the characteristic of the underlying field.

    When the degree is larger than the characteristic, which is what is relevant for the application to multivariate multiplicity codes, we obtain a weaker result. We no longer have capacity-achieving codes, but we obtain high-rate list-recoverable codes with constant list sizes. More precisely, Theorem 4.4 implies that univariate multiplicity codes of rate close to 1 are efficiently list-recoverable from some small constant fraction of errors with constant list sizes. In particular, Theorem 4.4 is nontrivial even for high-rate codes, while the Johnson bound only gives results for rates below 1/ℓ.

  • High-rate multivariate multiplicity codes are locally list-recoverable. One reason to study the list-recoverability of univariate multiplicity codes is that list-recovery algorithms for univariate multiplicity codes can be used in local list-recovery algorithms for multivariate multiplicity codes. Theorems 5.1 and 5.2 show that high-rate multivariate multiplicity codes are locally list-recoverable. More precisely, in Theorem 5.1, we show that for constant ℓ, a multivariate multiplicity code of length n with rate close to 1 is efficiently locally list-recoverable from some small constant fraction of errors, with list size and query complexity that are sub-polynomial in the block length n. In Theorem 5.2, we instantiate the same argument with slightly different parameters to show a similar result where the list size is constant, but the query complexity is polynomially small, of the form n^η for a small constant η.

  • Capacity-achieving locally list-recoverable codes over constant-sized alphabets. Theorems 5.1 and 5.2 give high-rate locally-list-recoverable codes; however, these codes do not achieve capacity, and the alphabet sizes are quite large. Fortunately, following previous work, we can apply a series of by-now-standard expander-based techniques to obtain capacity-achieving locally list-recoverable codes over constant-sized alphabets. We do this in Theorems 6.1 and 6.2, respectively.

    The only previous construction of capacity-achieving locally list-recoverable codes (or even high-rate locally list-recoverable codes) is due to [HRW17], which achieved arbitrary polynomially small query complexity (and even subpolynomial query complexity n^(o(1))) with slightly superconstant list size.

    Our codes in Theorem 6.1 achieve subpolynomial query complexity and subpolynomial list size. This brings the query complexity for capacity-achieving local list-decodability close to the best known query complexity for locally decodable codes [KMRS17], achieved by the same underlying codes.

    Our codes in Theorem 6.2 have arbitrary polynomially small query complexity, and constant list-size. This improves upon the codes of [HRW17].

    The quantitative details are shown in Table 2.

  • Deterministic constructions of capacity-achieving list-recoverable codes with constant alphabet size and list size. Our result in Theorem 3.1 for Folded Reed-Solomon codes gives capacity-achieving list-recoverable codes with constant list size, but with polynomial alphabet size. By running these through some standard techniques, we obtain in Corollary 6.6 efficient deterministic constructions of F_q-linear, capacity-achieving, list-recoverable codes with constant alphabet size and list size, with a polynomial-time decoding algorithm.

    Codes with these properties do not seem to have been written down anywhere in the literature. Prior to our work, the same standard techniques could have also been applied to the codes of [DL12] (which are nonlinear subcodes of Folded Reed-Solomon codes) to construct nonlinear codes with the same behavior.

1.3 Overview of techniques

In this subsection, we give an overview of the proofs of our main results.

1.3.1 List recovery of folded Reed-Solomon and univariate multiplicity codes with constant output list size

Let C be either a folded Reed-Solomon code or a univariate multiplicity code with constant relative distance δ. Suppose that s is the "folding parameter" or "multiplicity parameter," respectively, so that the alphabet of C is Σ = F_q^s. We begin with a warm-up by describing an algorithm for zero-error list-recovery; that is, when α = 0. Here we are given "received lists" S_1, ..., S_n ⊆ Σ, each of size at most ℓ, and we want to find the list L of all codewords c ∈ C such that c_i ∈ S_i for each i. The groundbreaking work of [GR08] showed that for constant ℓ and large but constant s, L has size polynomial in n, and can be found in polynomial time. We now show that L is in fact of constant size, and can be found in polynomial time.

The starting point for our improved list-recovery algorithms for folded Reed-Solomon and univariate multiplicity codes is the linear-algebraic approach to list-recovering these codes that was taken in [GW13]. The main punchline of this approach is that the list L is contained in an affine subspace V of constant dimension, and further that this subspace can be found in polynomial time (this immediately leads to the previously known polynomial bound on |L|). Armed with this insight, we now bring the received lists back into play. How many elements c of the affine subspace V can have c_i ∈ S_i for all i? We show that there cannot be too many such c.

The proof is algorithmic: we will give a randomized algorithm A, which when given the low-dimensional affine subspace V, outputs a list of elements of V, such that every codeword c ∈ L is included in the output of A with high probability. This implies that |L| is small.

The algorithm A works as follows. For some parameter t, we pick t coordinates i_1, ..., i_t ∈ [n] uniformly at random. Then the algorithm iterates over all choices of (z_1, ..., z_t) ∈ S_{i_1} × ... × S_{i_t}. For each such choice, A checks if there is a unique element c of V such that c_{i_j} = z_j for all j. If so, we output that unique element c; otherwise (i.e., there are either zero or more than one such c's) we do nothing. Thus the algorithm outputs at most ℓ^t elements of V.

It remains to show that for any c ∈ L, the algorithm outputs c with high probability. Fix such a c. By assumption, for every i, c_i ∈ S_i. Thus there will be an iteration where the algorithm takes (z_1, ..., z_t) = (c_{i_1}, ..., c_{i_t}). In this iteration, there will be at least one element of V (namely c) which has the desired property. Could there be more? If there was another c' with this property, then the nonzero vector c − c' would vanish on all the coordinates i_1, ..., i_t. It turns out that this can only happen with very low probability. Lemma 2 from [SY11] shows that for any linear space with dimension d and relative distance at least δ, if t is a sufficiently large constant (depending on d and δ), it is very unlikely that there exists a nonzero element of the space that vanishes at t random coordinates i_1, ..., i_t. Thus with high probability, c is the unique element found in that iteration, and is thus included in the output of A. This completes the description and analysis of the algorithm A, and thus of our zero-error list-recovery algorithm.
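In code, the pruning algorithm A might look like the following sketch. The affine subspace is represented simply as an enumerated list of candidate vectors, and all names and parameters are ours, for illustration only:

```python
import random
from itertools import product

def prune(V, lists, t, trials=1):
    # V: the (small) candidate set coming from the linear-algebraic decoder,
    #    given here simply as a list of vectors.
    # lists: the received lists S_1, ..., S_n.
    # For each guess (z_1, ..., z_t) in S_{i_1} x ... x S_{i_t}, output the
    # element of V matching the guess on those coordinates, if it is unique.
    n = len(lists)
    out = set()
    for _ in range(trials):
        idx = random.sample(range(n), t)
        for guess in product(*(lists[i] for i in idx)):
            matches = [v for v in V if all(v[i] == z for i, z in zip(idx, guess))]
            if len(matches) == 1:
                out.add(matches[0])
    return out

V = [(0, 0, 0, 0), (1, 2, 1, 2), (2, 1, 2, 1)]
lists = [{0, 1}, {0, 2}, {0, 1}, {0, 2}]
# Only (0,0,0,0) and (1,2,1,2) are consistent with the lists, and any two
# elements of V differ on every coordinate, so each guess matches uniquely.
found = prune(V, lists, t=2, trials=10)
assert (0, 0, 0, 0) in found and (1, 2, 1, 2) in found
```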

One way to prove (a version of) Lemma 2 from [SY11] is as follows. First we note the following simple but important lemma:

Lemma 1.1.

Let W ⊆ F_q^n be an F_q-subspace with dim(W) = d, and suppose that W has relative minimum distance at least δ. Then, for a uniformly random coordinate i ∈ [n]:

E_i[ dim(W_{i→0}) ] ≤ d − δ,

where W_{i→0} = {w ∈ W : w_i = 0}.

Lemma 1.1 says that for any subspace of good distance, fixing a random coordinate to 0 reduces the dimension a little in expectation. Iterating this, we see that fixing many coordinates is very likely to reduce the dimension down to zero, and this proves the result that we needed above.
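The dimension-drop phenomenon is easy to check empirically over F_2. The sketch below (toy parameters, helper names ours) builds a random subspace, computes its relative distance by enumeration, and verifies that fixing a random coordinate to 0 drops the dimension by at least the relative distance on average:

```python
import random

def rank_gf2(rows):
    # Rank over F_2; rows are Python ints used as bit-vectors (xor-basis method).
    basis = []
    for row in rows:
        for b in basis:
            row = min(row, row ^ b)
        if row:
            basis.append(row)
    return len(basis)

def dim_after_fixing(gens, i):
    # dim of {w in W : w_i = 0}: drops by exactly 1 unless coordinate i
    # is identically zero on W = span(gens).
    d = rank_gf2(gens)
    return d - 1 if any((g >> i) & 1 for g in gens) else d

random.seed(7)
n, k = 24, 6
gens = [random.getrandbits(n) for _ in range(k)]

# Relative minimum distance of W, by enumerating all 2^k elements of the span.
span = {0}
for g in gens:
    span |= {w ^ g for w in span}
delta = min(bin(w).count("1") for w in span if w) / n

d = rank_gf2(gens)
avg = sum(dim_after_fixing(gens, i) for i in range(n)) / n
# Matches the lemma's conclusion: expected dimension is at most d - delta.
assert avg <= d - delta + 1e-9
```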

With our warm-up complete, we turn to our main theorem on the list-recoverability of Folded Reed-Solomon codes (Theorem 3.1), which shows that the output list size is small even in the presence of an α fraction of errors (for small α). Our approach generalizes the zero-error case described above. Let L be the list of α-close codewords. Again, the linear-algebraic list decoder of [GW13] can produce a low-dimensional affine subspace V such that L ⊆ V. Next, we show that the very same algorithm A described above (with a different setting of the parameter t) does the desired list-recovery with at least some small constant probability p. This will imply that |L| is at most a constant.

To see why this works, fix a codeword c ∈ L. First observe that if we pick i_1, ..., i_t uniformly at random, the probability that c_{i_j} ∈ S_{i_j} for all j is at least (1 − α)^t. This is small, but not too small; thus, there is some chance that at least one c (the correct one) is found by A.

Following the previous analysis, we now have to bound the probability that for random i_1, ..., i_t, the space of codewords from V that vanish on all of i_1, ..., i_t has dimension at least one. This is the probability that strictly more than one candidate is found by A. This time we will need a stronger (and much more specialized) version of Lemma 1.1, which shows that for subspaces of the Folded Reed-Solomon code, fixing a random coordinate to 0 reduces the dimension by a lot: much more than the δ that we got from Lemma 1.1. Such a lemma was proved in [GK16b], although in a different language, and for a very different purpose. This lemma roughly shows that the expected dimension of W_{i→0}, for a random i, is a constant factor smaller than the dimension of W. With t applications of this lemma, we get that the probability that the space of codewords from V that vanish on all of i_1, ..., i_t has dimension at least one decays exponentially in t. For a suitable choice of t, this probability is tiny compared to (1 − α)^t, and thus the probability that the algorithm succeeds in finding c is at least roughly (1 − α)^t, as desired.

The description above was for folded RS codes, but the same method works for univariate multiplicity codes whose degree is smaller than the characteristic of the underlying field. We state this in Theorem 4.1. The proof follows the same outline, using a different but analogous lemma from [GK16b].

For the application to local list-recovery of multivariate multiplicity codes, however, we need to deal with univariate multiplicity codes where the degree is larger than the characteristic. In Theorem 4.4, we show how to accomplish this when the fraction of errors is very small. The algorithm and the outline of the analysis described above can again do the job for this setting, although the analysis is much more involved. The proof, which we give in Section 4, gives better quantitative bounds than the previous approach, and requires us to open up the relevant lemma from [GK16b]. At the end of the day, we are able to prove a reasonable version of this lemma for the case when the degree exceeds the characteristic, and this allows the analysis to go through.

1.3.2 Local list-recovery of multivariate multiplicity codes

We now describe the high-level view of our local list-recovery algorithms. Our algorithm for local list-recovery of multivariate multiplicity codes follows the general paradigm for local list-decoding of Reed-Muller codes by Arora and Sudan [AS03] and Sudan, Trevisan and Vadhan [STV01]. In addition to generalizing various aspects of the paradigm, we need to introduce some further ideas to account for the fact that we are in the high-rate setting. (These ideas can also be used to improve the analysis of the [AS03] and [STV01] local list-decoders for Reed-Muller codes. In particular, they can remove the restriction that the degree needs to be at most the size of the field for the local list-decoder to work.)

Local list-decoding of Reed-Muller codes is the following problem: we are given a function f which is promised to be close to the evaluation table of some low-degree polynomial Q. At a high level, the local list-decoding algorithm of [STV01] for Reed-Muller codes has two phases: generating advice, and decoding with advice. To generate the advice, we pick a uniformly random point y and "guess" a value z (this guessing can be done by going over all possible values). Our hope for this guess is that z equals Q(y).

Once we have this advice, we see how to decode. We define an oracle machine M, which takes as advice (y, z), has query access to f, and given an input x, tries to compute Q(x). The algorithm first considers the line passing through x and the advice point y, and list-decodes the restriction of f to this line to obtain a list of univariate polynomials. These univariate polynomials are candidates for the restriction of Q to the line. Which of these univariate polynomials is the right one? We use our guess z (which is supposed to be Q(y)): if there is a unique univariate polynomial in the list with value z at y, then we deem that to be our candidate for the restriction of Q, and output its value at the point x as our guess for Q(x). This algorithm will be correct on the point x if (1) there are not too many errors on the line through x and y, and (2) no other polynomial in the list takes the same value at y as the restriction of Q does. The first event has low probability by standard sampling bounds, and the second has low probability by the random choice of y and the fact that the list is small. This algorithm does not succeed on all x, but one can show that for random y and a correct guess z, it does succeed on most x. Then we can run a standard local correction algorithm for Reed-Muller codes to convert it to an algorithm that succeeds on all x with high probability.
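The decode-with-advice step can be made concrete at toy parameters (field size 11, total degree 2, brute-force line decoding). At this scale the line decoding happens to be unique, but the advice check shows how a genuine list would be disambiguated; all names and parameters here are illustrative, not the paper's:

```python
from itertools import product

p = 11                      # field size (prime)
deg = 2                     # total degree of the hidden polynomial

def Q(x):                   # the "unknown" low-degree polynomial, for the demo
    return (3 * x[0] * x[0] + 5 * x[1] + 2) % p

table = {x: Q(x) for x in product(range(p), repeat=2)}
table[(4, 4)] = 9           # plant a couple of corruptions
table[(1, 7)] = 0

def univ_eval(coeffs, t):
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * t + c) % p
    return acc

def list_decode_line(values, radius):
    # brute force over all univariate polynomials of degree <= deg
    out = []
    for coeffs in product(range(p), repeat=deg + 1):
        errs = sum(1 for t in range(p) if univ_eval(coeffs, t) != values[t])
        if errs <= radius:
            out.append(coeffs)
    return out

def M(advice_y, advice_z, x):
    # decode-with-advice: restrict to the line through x and advice_y,
    # list-decode it, and use advice_z = Q(advice_y) to pick the right poly.
    line = [tuple((xi + t * (yi - xi)) % p for xi, yi in zip(x, advice_y))
            for t in range(p)]            # line(0) = x, line(1) = advice_y
    values = [table[pt] for pt in line]
    cands = [c for c in list_decode_line(values, radius=2)
             if univ_eval(c, 1) == advice_z]
    if len(cands) == 1:
        return univ_eval(cands[0], 0)     # value at t=0, i.e., at x
    return None

y = (2, 6)
assert M(y, Q(y), (4, 4)) == Q((4, 4))    # corrects the planted corruption
```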

We are trying to locally list-recover a multivariate multiplicity code; the codewords are of the form (Q^(<s)(a))_a, where Q^(<s)(a) is a tuple that consists of all partial derivatives of Q of order less than s, evaluated at a. We are given query access to a function f, where f(a) is the received list for the coordinate indexed by a. Suppose for the following discussion that Q is a low-degree multivariate polynomial so that Q^(<s)(a) ∈ f(a) for most coordinates a. We want to describe an algorithm that, with high probability, will output a randomized algorithm approximating the codeword corresponding to Q.

There are two main components to the algorithm, again: generating the advice, and decoding with advice. The advice is again a uniformly random point $z \in \mathbb{F}_q^m$, and a guess $a$ which is supposed to equal $Q^{(<s')}(z)$, a very high order evaluation of $Q$ at $z$, for some $s' \gg s$. We discuss how to generate $a$ later; let us first see how to use this advice to decode.

To decode using the advice $(z, a)$, we give an oracle machine $M^S(z, a)$ which takes advice $(z, a)$ and has query access to $S$. If $a = Q^{(<s')}(z)$, then $M^S(z, a)(x)$ will be equal to $Q^{(<s)}(x)$ with high probability over $z$ and $x$. This algorithm is discussed in Section 5.3. Briefly, the idea is to consider the line through $x$ and $z$ and again run the univariate list-recovery algorithm on the restriction of the lists $S$ to this line to obtain a list $\mathcal{L}$. We hope that the restriction of $Q$ to this line is in $\mathcal{L}$, and that it does not have the same order-$s'$ evaluation at $z$ as any other element of $\mathcal{L}$ (this is why we take $s'$ large: it is much more unlikely that there will be a collision of higher order evaluations at the random point $z$); this will allow us to identify it with the help of the advice $a$. Once we identify the restriction of $Q$, we output its value at $x$ as our guess for $Q^{(<s)}(x)$.

To generate the advice $a$, we give an algorithm $\mathcal{A}$, which takes as input a point $z \in \mathbb{F}_q^m$, has query access to $S$, and returns a short list of guesses for $Q^{(<s')}(z)$. Recall that we have $s'$ quite a bit larger than $s$. This algorithm is discussed in Section 5.2. Briefly, $\mathcal{A}$ works by choosing random lines through $z$ and running the (global) list-recovery algorithm for univariate multiplicity codes on the restriction of the lists to these lines. Then it aggregates the results to obtain the guesses for $Q^{(<s')}(z)$. This aggregation turns out to be a list-recovery problem for Reed-Muller codes evaluated on product sets. We describe this algorithm for list-recovery in Appendix D.

Summarizing, our local list-recovery algorithm works as follows. First, we run $\mathcal{A}$ on a random point $z$ to generate a short list of possibilities for $Q^{(<s')}(z)$. Then, for each guess $a$ in this list, we form the oracle machine $M^S(z, a)$. We are not quite done even if the advice is good, since $M^S(z, a)(x)$ may not be equal to $Q^{(<s)}(x)$; we know this probably holds for most $x$'s, but not necessarily for the one that we care about. Fortunately, $M^S(z, a)$ will agree with $Q^{(<s)}$ on many inputs $x$, and so we can use the fact that multivariate multiplicity codes are locally correctable to finish the job [KSY14]. As we iterate over the advice, this gives the list of randomized algorithms that the local list-recovery algorithm returns.

1.3.3 Organization

We begin in Section 2 with notation and preliminary definitions. Once these are in place, we will prove Theorem 3.1 about Folded RS codes in Section 3. In Section 4, we extend our analysis of Folded RS codes to univariate multiplicity codes, and prove Theorems 4.1 and 4.4 for small and large degrees respectively. In Section 5, we present our local list-recovery algorithm for multivariate multiplicity codes, and state Theorems 5.1 and 5.2 about high-rate local list-recovery of multivariate multiplicity codes. Finally in Section 6 we run our results through the expander-based machinery of [AEL95], to obtain Theorems 6.1 and 6.2 which give capacity-achieving locally list-recoverable codes over constant-sized alphabets.

2 Notation and Preliminaries

We begin by formally defining the coding-theoretic notions we will need, and by setting notation. We denote by $\mathbb{F}_q$ the finite field of $q$ elements. For any pair of strings $u, v \in \Sigma^n$, the relative distance between $u$ and $v$ is the fraction of coordinates on which $u$ and $v$ differ, and is denoted by $\mathrm{dist}(u, v) := |\{ i \in [n] : u_i \neq v_i \}| / n$. For a positive integer $\ell$ we denote by $\binom{\Sigma}{\ell}$ the set containing all subsets of $\Sigma$ of size $\ell$, and for any pair of strings $u \in \Sigma^n$ and $S \in \binom{\Sigma}{\ell}^n$ we denote by $\mathrm{agree}(u, S)$ the fraction of coordinates $i$ for which $u_i \in S_i$, that is, $\mathrm{agree}(u, S) := |\{ i \in [n] : u_i \in S_i \}| / n$. Throughout the paper, we use $\exp(n)$ to denote $2^{\Theta(n)}$. Whenever we use $\log$, it is to the base $2$. The notation $O_\alpha(\cdot)$ and $\Omega_\alpha(\cdot)$ means that we treat $\alpha$ as a constant; that is, the implied constants may depend on $\alpha$.
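
For concreteness, the two measures $\mathrm{dist}$ and $\mathrm{agree}$ can be computed directly. The small sketch below is illustrative only (not from the paper); it treats strings as Python sequences and input lists as sets.

```python
def rel_dist(u, v):
    """Relative Hamming distance: fraction of coordinates where u and v differ."""
    assert len(u) == len(v)
    return sum(a != b for a, b in zip(u, v)) / len(u)

def agree(u, S):
    """Fraction of coordinates i for which u[i] lies in the candidate set S[i]."""
    assert len(u) == len(S)
    return sum(a in s for a, s in zip(u, S)) / len(u)
```

Note that when every $S_i$ is a singleton ($\ell = 1$), $\mathrm{agree}(u, S) = 1 - \mathrm{dist}(u, v)$ for the string $v$ formed by the singletons.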

2.1 Error-correcting codes

Let $\Sigma$ be an alphabet and let $n$ be a positive integer (the block length). A code is simply a subset $C \subseteq \Sigma^n$. The elements of a code $C$ are called codewords. If $\mathbb{F}$ is a finite field and $\Sigma$ is a vector space over $\mathbb{F}$, we say that a code $C \subseteq \Sigma^n$ is $\mathbb{F}$-linear if it is an $\mathbb{F}$-linear subspace of the $\mathbb{F}$-vector space $\Sigma^n$. In this work most of our codes will have alphabets $\Sigma = \mathbb{F}_q^s$, and we will use linear to mean $\mathbb{F}_q$-linear. The rate of a code is the ratio $\frac{\log |C|}{\log |\Sigma|^n}$, which for $\mathbb{F}$-linear codes equals $\frac{\dim_{\mathbb{F}}(C)}{n \cdot \dim_{\mathbb{F}}(\Sigma)}$. The relative distance $\mathrm{dist}(C)$ of $C$ is the minimum $\delta > 0$ such that for every pair of distinct codewords $c_1, c_2 \in C$ it holds that $\mathrm{dist}(c_1, c_2) \geq \delta$.

Given a code $C \subseteq \Sigma^n$, we will occasionally abuse notation and think of a codeword $c \in C$ as a map $c : D \to \Sigma$, where $D$ is some domain of size $n$. With this notation, the map $c$ corresponds to the vector $(c(x))_{x \in D} \in \Sigma^n$.

For a code $C \subseteq \Sigma^n$ of relative distance $\delta$, a given parameter $\alpha < \delta/2$, and a string $w \in \Sigma^n$, the problem of decoding from an $\alpha$ fraction of errors is the task of finding the unique codeword $c \in C$ (if any) which satisfies $\mathrm{dist}(c, w) \leq \alpha$.
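
For toy parameters where the code is small enough to enumerate, this unique-decoding task can be solved by brute force. The sketch below is illustrative only (not an algorithm from the paper); uniqueness of the answer is guaranteed by $\alpha < \delta/2$.

```python
def decode_unique(code, w, alpha):
    """Brute-force decoding from an alpha fraction of errors: return the
    unique codeword within relative distance alpha of w, or None if none."""
    def rel_dist(u, v):
        return sum(a != b for a, b in zip(u, v)) / len(u)
    close = [c for c in code if rel_dist(c, w) <= alpha]
    return close[0] if len(close) == 1 else None
```

For example, for the length-4 binary repetition code and $\alpha = 1/4$, a received word with a single flipped bit decodes to the transmitted codeword.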

2.2 List-decodable and list-recoverable codes

List decoding is a paradigm that allows one to correct more than a $\mathrm{dist}(C)/2$ fraction of errors by returning a small list of close-by codewords. More formally, for $\alpha \in [0, 1]$ and an integer $L$ we say that a code $C \subseteq \Sigma^n$ is $(\alpha, L)$-list-decodable if for any $w \in \Sigma^n$ there are at most $L$ different codewords $c \in C$ which satisfy $\mathrm{dist}(c, w) \leq \alpha$.

List recovery is a more general notion where one is given as input a small list of candidate symbols for each of the coordinates and is required to output a list of codewords that are consistent with many of the input lists. Formally, we say that a code $C \subseteq \Sigma^n$ is $(\alpha, \ell, L)$-list-recoverable if for any $S \in \binom{\Sigma}{\ell}^n$ there are at most $L$ different codewords $c \in C$ which satisfy $\mathrm{agree}(c, S) \geq 1 - \alpha$. Note that list decoding corresponds to the special case of $\ell = 1$.
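
For intuition, here is a brute-force list-recovery routine for toy codes (an illustrative sketch, not from the paper). Setting every input list to a singleton recovers ordinary list decoding.

```python
def list_recover(code, S, alpha):
    """Return all codewords agreeing with the input lists S on at least a
    1 - alpha fraction of coordinates (brute force over the whole code)."""
    n = len(S)
    def agreement(c):
        return sum(ci in Si for ci, Si in zip(c, S)) / n
    return [c for c in code if agreement(c) >= 1 - alpha]
```

For example, over the four even-weight binary strings of length 3, the lists $(\{0\}, \{0,1\}, \{0,1\})$ with $\alpha = 0$ pin down exactly the two codewords whose first bit is 0.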

2.3 Locally correctable and locally list-recoverable codes

Locally correctable codes.

Intuitively, a code is said to be locally correctable [BFLS91, STV01, KT00] if, given a codeword $c \in C$ that has been corrupted by some errors, it is possible to decode any coordinate of $c$ by reading only a small part of the corrupted version of $c$. Formally, it is defined as follows.

Definition 2.1 (Locally correctable code (LCC)).

We say that a code $C \subseteq \Sigma^n$ is $(\delta, Q)$-locally correctable if there exists a randomized algorithm $A$ that satisfies the following requirements:

  • Input: $A$ takes as input a coordinate $i \in [n]$, and also gets oracle access to a string $w \in \Sigma^n$ that is $\delta$-close to a codeword $c \in C$.

  • Query complexity: $A$ makes at most $Q$ queries to the oracle $w$.

  • Output: $A$ outputs $c_i$ with probability at least $\frac{2}{3}$.

Remark 2.2.

By definition it holds that $\delta < \mathrm{dist}(C)/2$. The above success probability of $\frac{2}{3}$ can be amplified using sequential repetition, at the cost of increasing the query complexity. Specifically, amplifying the success probability to $1 - e^{-t}$ requires increasing the query complexity by a multiplicative factor of $O(t)$.
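
The repetition argument can be checked numerically: with $t$ independent runs, each correct with probability $2/3$, a majority vote fails with probability decaying exponentially in $t$. The sketch below (illustrative only) computes this failure probability exactly from the binomial distribution.

```python
from math import comb

def majority_failure(p, t):
    """Probability that the majority of t independent trials, each correct
    with probability p, is wrong (ties count as failure)."""
    return sum(comb(t, k) * p**k * (1 - p)**(t - k) for k in range(0, t // 2 + 1))
```

With $p = 2/3$, a single run fails with probability $1/3$, while around a hundred repetitions already push the failure probability below $10^{-3}$.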

Locally list-recoverable codes.

The following definition generalizes the notion of locally correctable codes to the setting of list decoding / recovery. In this setting the algorithm is required to find all the nearby codewords in an implicit sense.

Definition 2.3 (Locally list-recoverable code).

We say that a code $C \subseteq \Sigma^n$ is $(\delta, \ell, L, Q)$-locally list-recoverable if there exists a randomized algorithm $A$ that satisfies the following requirements:

  • Input: $A$ gets oracle access to a string $S \in \binom{\Sigma}{\ell}^n$.

  • Query complexity: $A$ makes at most $Q$ queries to the oracle $S$.

  • Output: $A$ outputs $L$ randomized algorithms $A_1, \dots, A_L$, where each $A_j$ takes as input a coordinate $i \in [n]$, makes at most $Q$ queries to the oracle $S$, and outputs a symbol in $\Sigma$.

  • Correctness: For every codeword $c \in C$ for which $\mathrm{agree}(c, S) \geq 1 - \delta$, with probability at least $\frac{2}{3}$ over the randomness of $A$ the following event happens: there exists some $j \in [L]$ such that for all $i \in [n]$,
    $$\Pr[A_j(i) = c_i] \geq \frac{2}{3},$$
    where the probability is over the internal randomness of $A_j$.

We say that $A$ has running time $T$ if $A$ outputs the description of the algorithms $A_1, \dots, A_L$ in time at most $T$, and each $A_j$ has running time at most $T$. We say that a code is $(\delta, L, Q)$-locally list-decodable if it is $(\delta, 1, L, Q)$-locally list-recoverable.

2.4 Polynomials and derivatives

Let $\mathbb{F}_q[X]$ be the space of univariate polynomials over $\mathbb{F}_q$. We will often be working with linear and affine subspaces of $\mathbb{F}_q[X]$. We will denote linear subspaces of $\mathbb{F}_q[X]$ by the letters $U, V$, and affine subspaces of $\mathbb{F}_q[X]$ as $v + U$, where $v \in \mathbb{F}_q[X]$ and $U \subseteq \mathbb{F}_q[X]$ is a linear subspace.

For polynomials $f_1, \dots, f_s \in \mathbb{F}_q[X]$, we define their Wronskian, $W(f_1, \dots, f_s) \in \mathbb{F}_q[X]$, by
$$W(f_1, \dots, f_s)(X) := \det \begin{pmatrix} f_1(X) & \cdots & f_s(X) \\ f_1'(X) & \cdots & f_s'(X) \\ \vdots & \ddots & \vdots \\ f_1^{(s-1)}(X) & \cdots & f_s^{(s-1)}(X) \end{pmatrix}.$$

For $f \in \mathbb{F}_q[X]$ and $i \geq 0$, we define the $i$’th (Hasse) derivative $f^{(i)}(X)$ as the coefficient of $Z^i$ in the expansion
$$f(X + Z) = \sum_{i \geq 0} f^{(i)}(X) Z^i.$$

For multivariate polynomials $Q \in \mathbb{F}_q[X_1, \dots, X_m]$, we use the notation $\mathbf{X} = (X_1, \dots, X_m)$ and $\mathbf{X}^{\mathbf{i}} = \prod_{j=1}^m X_j^{i_j}$, where $\mathbf{i} = (i_1, \dots, i_m) \in \mathbb{Z}_{\geq 0}^m$. For $\mathbf{i} \in \mathbb{Z}_{\geq 0}^m$, we define the $\mathbf{i}$’th (Hasse) derivative $Q^{(\mathbf{i})}(\mathbf{X})$ as the coefficient of $\mathbf{Z}^{\mathbf{i}}$ in the expansion
$$Q(\mathbf{X} + \mathbf{Z}) = \sum_{\mathbf{i} \in \mathbb{Z}_{\geq 0}^m} Q^{(\mathbf{i})}(\mathbf{X}) \, \mathbf{Z}^{\mathbf{i}}.$$
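
Since the Hasse derivative is defined purely in terms of coefficients ($f^{(i)}$ has coefficient $\binom{j}{i} f_j$ on $X^{j-i}$), it is easy to compute. The sketch below (not from the paper) works over a prime field $\mathbb{F}_p$ and illustrates why Hasse derivatives are preferred in small characteristic: unlike the $i$-fold formal derivative, $f^{(i)}$ need not vanish when $i \geq p$.

```python
from math import comb

def hasse_derivative(coeffs, i, p):
    """i-th Hasse derivative over F_p of f(X) = sum_j coeffs[j] * X^j:
    the coefficient of X^(j - i) in f^(i) is C(j, i) * coeffs[j] (mod p)."""
    return [(comb(j, i) * coeffs[j]) % p for j in range(i, len(coeffs))]
```

For example, over $\mathbb{F}_5$ the second Hasse derivative of $X^3$ is $\binom{3}{2} X = 3X$, and the fifth Hasse derivative of $X^5$ is $\binom{5}{5} = 1$, even though the fifth formal derivative of $X^5$ vanishes in characteristic $5$.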

2.5 Some families of polynomial codes

In this section, we formally define the families of codes we will study: folded Reed-Solomon codes [GR08], univariate multiplicity codes [RT97, KSY14, GW13], and multivariate multiplicity codes [KSY14].

Folded Reed-Solomon codes.

Let $q$ be a prime power, and let $s, d, n$ be nonnegative integers such that $sn \leq q - 1$. Let $\gamma$ be a primitive element of $\mathbb{F}_q$, and let $\alpha_1, \dots, \alpha_n$ be distinct elements in $\mathbb{F}_q$. Let $\Sigma = \mathbb{F}_q^s$.

For a polynomial $f \in \mathbb{F}_q[X]$ and $\alpha \in \mathbb{F}_q$, let $f^{[s]}(\alpha)$ denote the vector:
$$f^{[s]}(\alpha) := \left( f(\alpha), f(\gamma \alpha), \dots, f(\gamma^{s-1} \alpha) \right) \in \mathbb{F}_q^s.$$

The folded Reed-Solomon code $\mathrm{FRS}(n, d)$ is a code over alphabet $\Sigma = \mathbb{F}_q^s$. To every polynomial $f \in \mathbb{F}_q[X]$ of degree at most $d$, there corresponds a codeword $c \in \Sigma^n$:
$$c = (c_1, c_2, \dots, c_n),$$
where for each $i \in [n]$:
$$c_i = f^{[s]}(\alpha_i) = \left( f(\alpha_i), f(\gamma \alpha_i), \dots, f(\gamma^{s-1} \alpha_i) \right).$$
We denote the codeword of the folded Reed-Solomon code corresponding to the polynomial $f$ by $\mathrm{FRS}(f)$ (when the parameters are clear from the context).

Note that Reed-Solomon codes correspond to the special case of $s = 1$. The following claim summarizes the basic properties of folded Reed-Solomon codes.

Claim 2.4 ([Gr08]).

The folded Reed-Solomon code $\mathrm{FRS}(n, d)$ is an $\mathbb{F}_q$-linear code over alphabet $\Sigma = \mathbb{F}_q^s$ of block length $n$, rate $\frac{d+1}{sn}$, and relative distance at least $1 - \frac{d}{sn}$.
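
To make the definition concrete, here is a toy encoder over a prime field (an illustrative sketch, not code from the paper; the parameter choices in the example below are hypothetical). Each codeword symbol packs the $s$ values $f(\alpha_i), f(\gamma \alpha_i), \dots, f(\gamma^{s-1} \alpha_i)$.

```python
def folded_rs_encode(f, q, gamma, s, alphas):
    """Folded RS encoding over F_q (q prime): for each evaluation point a,
    emit the alphabet symbol (f(a), f(gamma*a), ..., f(gamma^(s-1)*a))."""
    def ev(x):
        # evaluate f (coefficient list, degree <= d) at x over F_q
        return sum(c * pow(x, j, q) for j, c in enumerate(f)) % q
    return [tuple(ev(a * pow(gamma, j, q) % q) for j in range(s)) for a in alphas]
```

For example, with $q = 7$, $\gamma = 3$, $s = 2$, and points $1, 2, 4$ (so that the pairs $\{\alpha_i, \gamma \alpha_i\}$ partition $\mathbb{F}_7^*$), the polynomial $f(X) = 1 + X$ encodes to $((2,4), (3,0), (5,6))$.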

Univariate multiplicity codes.

Let $q$ be a prime power, and let $s, d, n$ be nonnegative integers such that $n \leq q$. Let $\alpha_1, \dots, \alpha_n$ be distinct elements in $\mathbb{F}_q$. Let $\Sigma = \mathbb{F}_q^s$.

For a polynomial $f \in \mathbb{F}_q[X]$ and $\alpha \in \mathbb{F}_q$, let $f^{(<s)}(\alpha)$ denote the vector:
$$f^{(<s)}(\alpha) := \left( f(\alpha), f^{(1)}(\alpha), \dots, f^{(s-1)}(\alpha) \right) \in \mathbb{F}_q^s.$$

The univariate multiplicity code $\mathrm{MULT}(n, d)$ is a code over alphabet $\Sigma = \mathbb{F}_q^s$. To every polynomial $f \in \mathbb{F}_q[X]$ of degree at most $d$, there corresponds a codeword $c \in \Sigma^n$:
$$c = (c_1, c_2, \dots, c_n),$$
where for each $i \in [n]$:
$$c_i = f^{(<s)}(\alpha_i) = \left( f(\alpha_i), f^{(1)}(\alpha_i), \dots, f^{(s-1)}(\alpha_i) \right).$$
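
Analogously to the folded RS sketch above, a toy multiplicity-code encoder over a prime field can be written directly from the coefficient formula for Hasse derivatives (an illustrative sketch, not code from the paper):

```python
from math import comb

def mult_encode(f, q, s, alphas):
    """Univariate multiplicity encoding over F_q (q prime): for each point a,
    emit the symbol (f(a), f^(1)(a), ..., f^(s-1)(a)) of Hasse derivatives."""
    def hasse_eval(i, x):
        # i-th Hasse derivative of f at x: sum_j C(j, i) * f[j] * x^(j - i)
        return sum(comb(j, i) * f[j] * pow(x, j - i, q) for j in range(i, len(f))) % q
    return [tuple(hasse_eval(i, a) for i in range(s)) for a in alphas]
```

For example, over $\mathbb{F}_5$ with $s = 2$, the polynomial $f(X) = X^2$ encodes the point $\alpha$ as $(\alpha^2, 2\alpha)$; setting $s = 1$ recovers the usual Reed-Solomon encoding.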