1.1 Weighted Reed-Muller codes
Weighted Reed-Muller codes were introduced by Sørensen in 1992, as a generalisation of Reed-Muller codes in the context of weighted polynomial rings [Sør92]. Formally, given a finite field , a weight and a polynomial
the weighted degree of with respect to is
In particular, if , then we get the usual notion of total degree for multivariate polynomials.
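As a small illustration of the weighted degree, the following Python sketch computes it for a polynomial stored as a dictionary mapping exponent tuples to coefficients. The representation, the function name `weighted_degree` and the convention for the zero polynomial are assumptions made for this example only.

```python
def weighted_degree(poly, weights):
    """Weighted degree of a polynomial given as {exponent tuple: coefficient}."""
    degrees = [
        sum(w * e for w, e in zip(weights, exps))
        for exps, coeff in poly.items()
        if coeff != 0
    ]
    # Convention assumed here: the zero polynomial has degree -infinity.
    return max(degrees) if degrees else float("-inf")


# f = X^3 + 2*X*Y + Y^2 in two variables
f = {(3, 0): 1, (1, 1): 2, (0, 2): 1}
print(weighted_degree(f, (1, 1)))  # all weights equal to 1: usual total degree
print(weighted_degree(f, (1, 2)))  # weight (1, 2): each power of Y counts twice
```

With all weights equal to one the function returns the total degree, in line with the remark above.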
In order to build codes from subspaces of polynomials, we consider the evaluation map
Then, a weighted Reed-Muller code is defined as the image by of a subspace of polynomials whose weighted degree is bounded by some integer .
Definition 1.1 (Weighted Reed-Muller code).
Let , and . The weighted (affine) Reed-Muller code of order , degree and weight is:
Note that weighted Reed-Muller codes are generalised Goppa codes on the weighted projective space with evaluation points outside the line at infinity .
The dimension of weighted Reed-Muller codes, as well as bounds on the minimum distance, are given by Sørensen in his seminal paper [Sør92]. Notice that these parameters are also analysed in a recent work [ACG17] by Aubry, Castryck, Ghorpade, Lachaud, O’Sullivan, and Ram, who also describe minimum weight codewords with geometric techniques. Geil and Thomsen [GT13] finally proved that weighted Reed-Muller codes are efficiently decodable up to half their minimum distance, notably using an embedding of weighted Reed-Muller codes into Reed-Solomon codes.
1.2 Technical overview and organisation
In this work, we will only focus on the case where and is of the form where . This setting seems very restrictive, but it is the most promising in terms of parameters (see for instance [Sør92, GT13]) and it also finds a practical application in private information retrieval protocols. For simplicity, we will use the shorter notation for .
Our first observation is that, when , the evaluation map is injective. This has two major consequences: (i) the code and its parameters are easier to describe and (ii) puncturing the code on “lines of weighted degree ” leads to highly-sound local correction. More precisely, in Section 2 we prove the following result.
Theorem 1.2 (informal).
Let , be a prime power and . For a fixed small enough, the family of weighted Reed-Muller codes are -locally correctable, where .
This result is obtained thanks to the following fact. Let be a univariate polynomial of (non-weighted) degree bounded by , and let . Then for every , the restriction
of the vector to the coordinates indexed by elements of is a codeword of a Reed-Solomon code of degree . Hence, if the codeword is corrupted with a constant fraction of errors, picking at random and correcting
succeeds with constant probability. As a consequence, it allows us to retrieve some symbols of the corrupted codeword in sublinear query complexity.
However, the results described above do not improve on the related “local decoding on curves” technique, described for instance by Yekhanin in his survey [Yek12]. Fortunately, the local correctability of weighted Reed-Muller codes can be applied to private information retrieval protocols in order to resist collusion of servers. In particular, we prove that any weighted Reed-Muller code induces a private information retrieval protocol for databases of entries, requiring a minimal computation complexity for the servers, and remaining private against any collusion of servers. We refer the reader to Section 3 for more details.
One should notice that the maximal number of entries in the database is directly given by the dimension of . Unfortunately, the information rate of such codes remains bounded by as long as , a constraint which is necessary in our context. Therefore, following the seminal paper of Guo, Kopparty and Sudan [GKS13] and subsequent works [Guo16, Lav18b], we initiate the study of a weighted lifting of Reed-Solomon codes in order to produce codes with the same local properties as weighted Reed-Muller codes, but with a much larger dimension.
Definitions and essential properties of weighted lifted codes are given in Section 4. Similarly to the constructions of lifted (affine [GKS13] and projective [Lav18b]) Reed-Solomon codes and lifted Hermitian codes [Guo16], we also prove that for fixed and , weighted lifts of Reed-Solomon codes are locally correctable with (i) a non-zero asymptotic information rate in the context of errors with constant relative weight, or (ii) an information rate arbitrarily close to when errors have smaller weight.
These two results are the main technical outcomes of the paper, and we present them in Section 5. They are obtained after a precise analysis of so-called degree sets of weighted Reed-Muller and lifted codes, which represent the sets of exponents of monomials spanning the codes. We finally provide numerical computations of dimensions of weighted lifted codes, which illustrate the improvement of weighted lifted codes over weighted Reed-Muller codes, and their practical usability in private information retrieval.
2 Local correction of weighted Reed-Muller codes
2.1 Restricting Reed-Muller codes to weighted lines
The local decoding properties of Reed-Muller codes come from the restriction of their codewords on a line being Reed-Solomon codewords. Expecting similar properties on weighted Reed-Muller codes, we have to find what will play the part of the lines in .
Definition 2.1 (-line on ).
Let . We call a (non-vertical) -line on the set of zeroes of the polynomial where is homogeneous of degree .
Since we evaluate polynomials only at points outside the line , we shall define an -line on the affine plane , viewed as the domain , as the intersection of an -line on and .
Definition 2.2 (affine -line).
Let . We call a (non-vertical) -line on the set of zeroes of a bivariate polynomial , where and .
Let us remark that if defines an -line, then . The converse is not true, since we removed from the definition collections of “vertical lines” defined by , .
An -line can be parametrized by . We thus define
the set of embeddings of -lines into the affine plane . These embeddings are very useful when trying to characterise restrictions of weighted Reed-Muller codes to -lines.
Proposition 2.3.
Any polynomial whose evaluation over lies in satisfies for any .
Proof. It is sufficient to check the result on monomials. Let where . For every , the univariate polynomial has degree less than . ∎
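The restriction property above can be checked exhaustively on small parameters. The sketch below is only an illustration under stated assumptions: the field is a prime field F_p, the weight is (1, eta), an eta-line is parametrised as t -> (t, phi(t)) with deg(phi) <= eta, and the bound is read as "degree at most d". The values of `p`, `eta`, `d` and all function names are hypothetical.

```python
import itertools

p, eta, d = 5, 2, 6  # hypothetical small parameters, p prime


def poly_mul(a, b):
    """Product of two polynomials over F_p (coefficient lists, low degree first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % p
    return out


def degree(a):
    """Degree of a coefficient list, -1 for the zero polynomial."""
    return max((i for i, c in enumerate(a) if c % p), default=-1)


def restrict_monomial(i, j, phi):
    """Coefficients of t^i * phi(t)^j, i.e. X^i Y^j restricted to Y = phi(X)."""
    out = [0] * i + [1]
    for _ in range(j):
        out = poly_mul(out, phi)
    return out


# Monomials X^i Y^j of weighted degree i + eta * j <= d.
monomials = [(i, j) for i in range(d + 1) for j in range(d + 1) if i + eta * j <= d]
# All phi of degree <= eta, as tuples of eta + 1 coefficients.
lines = list(itertools.product(range(p), repeat=eta + 1))

worst = max(
    degree(restrict_monomial(i, j, list(phi))) for (i, j) in monomials for phi in lines
)
print(worst <= d)
```

The check works monomial by monomial, which mirrors the structure of the proof.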
2.2 Local correction
Local decoding was introduced by Katz and Trevisan [KT00] in order to characterise codes that allow one to (probabilistically) retrieve a message coordinate with a number of queries that is sublinear in the code length . The difficulty comes from the fact that the retrieval must succeed with non-negligible probability for every codeword which is corrupted by any possible error whose weight is bounded by a linear function in . Local correction is very similar to local decoding, the only difference being that one requires that any coordinate of the codeword can be retrieved.
Before giving a formal definition of this notion, let us introduce some notation. We denote the Hamming distance between two vectors by . The weight of is . An erasure is a symbol of a word that one knows to be erroneous. Finally, we denote the full-length Reed-Solomon code as follows (take care that this notation, with instead of , is not the most commonly used, but it remains very convenient for our work):
and we recall that can correct efficiently erasure and up to errors.
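For reference, the standard guarantee for a maximum distance separable code can be written as a one-line test. The sketch below uses the generic parameters [n, k] of a Reed-Solomon code, which are not the notation of this paper, and the function name is hypothetical: unique decoding is guaranteed as soon as twice the number of errors plus the number of erasures is at most n - k.

```python
def rs_can_correct(n, k, errors, erasures):
    """True if an [n, k] Reed-Solomon code (minimum distance n - k + 1)
    is guaranteed to correct this many errors together with this many erasures."""
    return 2 * errors + erasures <= n - k


n, k = 16, 8  # hypothetical length and dimension
print(rs_can_correct(n, k, errors=3, erasures=2))
print(rs_can_correct(n, k, errors=4, erasures=1))
```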
Definition 2.4 (locally correctable code).
Let , and . A code is said to be -locally correctable if there exists a probabilistic algorithm such that the following holds. For every and for every such that for some , we have:
the probability (taken over the internal randomness of the decoder) that outputs is larger than ;
reads at most coordinates of .
Similarly to the case of classical Reed-Muller codes and codes derived from those, weighted Reed-Muller codes can be locally corrected using their restrictions to “lines”. For simplicity, we see a vector as a map , using the bijection between and given by the evaluation map. Similarly, is seen as a map . One obtains the local correction procedure described in Algorithm 1.
According to Katz and Trevisan’s terminology [KT00], Algorithm 1 is not perfectly smooth, in the sense that the coordinate is never queried. Nevertheless, it can be made smooth following techniques described in [Lav18a, Chapter ].
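To make the procedure more concrete, here is a minimal Python sketch of local correction along a random eta-line, restricted to the error-free case: the coordinate to recover is treated as a single erasure and filled in by Lagrange interpolation. This is not Algorithm 1 itself, which also corrects errors. The sketch assumes a prime field F_p, weight (1, eta), lines parametrised by t -> (t, phi(t)) and d <= p - 2; all names and parameter values are hypothetical.

```python
import random

p, eta, d = 7, 2, 5  # hypothetical parameters: p prime and d <= p - 2


def evaluate(coeffs, t):
    """Evaluate a univariate polynomial (low degree first) at t over F_p."""
    return sum(c * pow(t, k, p) for k, c in enumerate(coeffs)) % p


def interpolate_at(points, x0):
    """Value at x0 of the Lagrange interpolant through the given points over F_p."""
    total = 0
    for xi, yi in points:
        num = den = 1
        for xj, _ in points:
            if xj != xi:
                num = num * (x0 - xj) % p
                den = den * (xi - xj) % p
        total = (total + yi * num * pow(den, -1, p)) % p
    return total


def random_line_through(x0, y0):
    """Random phi of degree <= eta such that phi(x0) = y0."""
    phi = [0] + [random.randrange(p) for _ in range(eta)]
    phi[0] = (y0 - evaluate(phi, x0)) % p
    return phi


def locally_correct(word, x0, y0):
    """Recover word[(x0, y0)] from the p - 1 other points of a random eta-line."""
    phi = random_line_through(x0, y0)
    queries = [(t, evaluate(phi, t)) for t in range(p) if t != x0]
    answers = [(t, word[(t, y)]) for t, y in queries]  # (x0, y0) is never read
    return interpolate_at(answers, x0)


# A codeword: evaluations of a random polynomial of weighted degree <= d.
f = {(i, j): random.randrange(p)
     for i in range(d + 1) for j in range(d + 1) if i + eta * j <= d}
word = {(x, y): sum(c * pow(x, i, p) * pow(y, j, p) for (i, j), c in f.items()) % p
        for x in range(p) for y in range(p)}

print(all(locally_correct(word, x, y) == word[(x, y)] for (x, y) in word))
```

Handling corrupted symbols would require replacing `interpolate_at` with an error-and-erasure Reed-Solomon decoder, which is omitted here.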
Let , be a prime power, and such that is even. For every , the weighted Reed-Muller code is -locally correctable where .
Let be a corrupted codeword, where and . We define the support of
. The random variable representing the set of queries addressed by the local decoder is denoted by . It is clear that the algorithm succeeds if , where , since a Reed-Solomon code of dimension can decode up to erasure and errors. Using Markov’s inequality, the probability of success of Algorithm 1 satisfies:
Moreover, for every , we have . Hence,
Finally we get
3 Application to private information retrieval
Private information retrieval (PIR) protocols are cryptographic protocols ensuring that a user can retrieve an entry of a remote database , without revealing any information on the index to the holder of the database. Additionally, it is also required that the communication cost (number of bits exchanged during the retrieval process) is sublinear in the size of the database.
Since its introduction by Chor, Goldreich, Kushilevitz and Sudan in 1995 [CGKS95], various kinds of PIR schemes have been designed according to the system constraints. In earliest PIR schemes, one assumes that the database is replicated over non-communicating honest-but-curious servers . In this context the seminal result of Katz and Trevisan [KT00] — which relates PIR protocols to the existence of so-called smooth locally decodable codes — induced many new constructions of PIR schemes, notably in [BIKR02, Yek08, Efr12, DG16]. These constructions eventually achieved bits of communication for a -entry database replicated on servers.
Motivated by the use of storage codes in distributed storage systems, a large amount of recent works focused on the case where the database is encoded on the servers. In this context, entries of the database are usually very large (e.g. movies), so that we can assume that the download communication cost prevails over the upload one. Several works aimed at minimizing this cost depending on the storage system: Shah, Rashmi and Ramchandran [SRR14] considered the replication code as the storage code; Tajeddine, Gnilke and El Rouayheb [TGR18] MDS codes; Kumar, Rosnes and Graell i Amat [KRGiA17] arbitrary codes.
It is worth noticing that, following e.g. Beimel and Stahl [BS02], a few works also considered the more restrictive setting of colluding servers (i.e. servers communicating with each other so as to collect information about the required item), byzantine servers (i.e. servers able to produce wrong answers to the user’s queries) or unresponsive servers (i.e. servers unable to give an answer to the user’s queries).
Finally, one should emphasise that families of PIR schemes referenced above mostly focus on decreasing the communication cost during the retrieval process. This is done at the expense of other crucial parameters, such as the computation complexity of the recovery, or the servers’ storage overhead.
In this section, we show how the local properties of weighted Reed-Muller codes lead to very natural PIR protocols resisting any set of byzantine, unresponsive and colluding servers (provided that ), with moderate communication complexity but optimal computation complexity.
Definition 3.1 (private information retrieval).
Let be a remote database distributed on servers , in such a way that each server stores a vector (we make no other assumption on the way the database is stored on the servers, e.g. replication or encoding; we only require that the encoding map is injective). A private information retrieval (PIR) protocol for is a tuple of algorithms such that:
is a probabilistic algorithm taking as input a coordinate , and providing a random tuple of queries for some finite set ;
is a deterministic algorithm taking as input a server index , a query and the vector stored by server , and outputs an answer , where is a finite set;
is a deterministic algorithm taking as input a coordinate , a tuple of queries and a tuple of answers , and which outputs a symbol satisfying the following requirement. If and , then:
We also say that a PIR protocol
is -private (or resists any collusion of servers) if for every , , we have
where denotes the mutual information between random variables;
is robust against byzantine and unresponsive servers if (1) holds when up to symbols of differ from the expected ones, and up to symbols of are missing.
Let us now define some of the most studied parameters of PIR protocols.
Let be a PIR protocol. We define:
its communication complexity as ;
its server computation complexity, denoted as the maximal number of operations over necessary to compute ;
its storage rate as the ratio .
We finally say that a PIR protocol is computationally optimal for the servers if .
3.2 The PIR protocol
We present in this section a PIR protocol based on weighted Reed-Muller codes. The protocol relies on a well-suited splitting of the encoded database over the servers, as was originally done by Augot, Levy-dit-Vehel and Shikfa in [ALS14].
Let , and denote its dimension by . Recall that a codeword can be seen as a map . Let us also consider servers indexed by elements of .
Initialisation. The database is encoded into a codeword . For every , the server receives the part of the codeword . Notice that consists of symbols over .
Queries. Assume one wants to retrieve , for . One can always assume that the encoding map is systematic, hence for some . To define a vector of queries:
Pick at random an -line such that for some .
The server receives a random element .
Server receives such that .
Answers. Upon receipt of , every server reads the entry and sends it back to the user.
Recovery. The user collects and runs an error-and-erasure correcting algorithm for with input . Then, the user returns the corrected symbol .
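The four steps above can be sketched in Python as follows. This is a toy model under explicit assumptions, not the protocol as analysed in this section: the field is a prime field F_p, there is one server per x in F_p, the weight is (1, eta), all servers are honest and responsive, and recovery uses plain Lagrange interpolation instead of an error-and-erasure decoder. The function names `make_queries`, `answer` and `recover` and all parameter values are hypothetical.

```python
import random

p, eta, d = 7, 2, 5  # hypothetical: p prime, one server per x in F_p, d <= p - 2


def evaluate(coeffs, t):
    """Evaluate a univariate polynomial (low degree first) at t over F_p."""
    return sum(c * pow(t, k, p) for k, c in enumerate(coeffs)) % p


def make_queries(x0, y0):
    """One query per server: phi(t) for t != x0, a uniform element for server x0."""
    phi = [0] + [random.randrange(p) for _ in range(eta)]
    phi[0] = (y0 - evaluate(phi, x0)) % p  # forces phi(x0) = y0
    return [random.randrange(p) if t == x0 else evaluate(phi, t) for t in range(p)]


def answer(column, query):
    """Server side: a single table lookup, no arithmetic."""
    return column[query]


def recover(x0, answers):
    """Interpolate the answers of servers t != x0 and evaluate at x0."""
    points = [(t, a) for t, a in enumerate(answers) if t != x0]
    total = 0
    for xi, yi in points:
        num = den = 1
        for xj, _ in points:
            if xj != xi:
                num = num * (x0 - xj) % p
                den = den * (xi - xj) % p
        total = (total + yi * num * pow(den, -1, p)) % p
    return total


# Initialisation: server x stores the column (c(x, y)) for y in F_p.
f = {(i, j): random.randrange(p)
     for i in range(d + 1) for j in range(d + 1) if i + eta * j <= d}
servers = [[sum(c * pow(x, i, p) * pow(y, j, p) for (i, j), c in f.items()) % p
            for y in range(p)] for x in range(p)]


def retrieve(x0, y0):
    queries = make_queries(x0, y0)
    answers = [answer(servers[t], queries[t]) for t in range(p)]
    return recover(x0, answers)


print(all(retrieve(x, y) == servers[x][y] for x in range(p) for y in range(p)))
```

The answer of server x0 is deliberately ignored in `recover`, since its query is a decoy. Tolerating byzantine or unresponsive servers would require a genuine Reed-Solomon decoder at the recovery step.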
Let be a prime power, , and . Set . Then, Protocol 3.3 equipped with is -private and robust against byzantine and unresponsive servers. Moreover, it is computationally optimal for the servers, its storage rate approaches when , and its communication complexity is .
The correctness of the PIR scheme, under byzantine and unresponsive servers, comes from Proposition 2.3 and from the fact that corrects errors and erasures if . Moreover, the scheme is -private since any subset of points of an -line gives no information about the other points. Finally, the parameters of the scheme can be easily checked. ∎
4 Towards higher information rate: the lifting process
In previous sections, we have proved that weighted Reed-Muller codes admit local properties that can be used in practical applications such as private information retrieval. However, such constructions are only moderately efficient in terms of storage, since the information rate of is bounded by if .
In this section, we show how to construct codes with the same local properties as weighted Reed-Muller codes, but admitting a much larger dimension. As a practical consequence, these new codes can replace weighted Reed-Muller codes in Protocol 3.3, leading to storage-efficient PIR schemes.
The techniques involved in the construction of these codes directly follow the lifting process initiated by Guo, Kopparty and Sudan [GKS13]. More precisely, the authors introduce so-called lifted Reed-Solomon codes as codes containing (classical) Reed-Muller codes, and such that the restriction of any codeword to any affine line lies in a Reed-Solomon code. The purpose of this section is to extend this notion to -lines.
We thus naturally introduce the -lifting of a Reed-Solomon code as follows.
Definition 4.1 (-lifting of a Reed-Solomon code).
Let be a prime power and . The -lifting of the Reed-Solomon code is the code of length defined as follows:
Notice that if , the -lifted code is the trivial full space . Hence, from now on we assume .
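For very small parameters, membership of a monomial in an eta-lifted code can be tested by brute force. The sketch below is an illustration only and makes several assumptions: the field is a prime field F_p, eta-lines are parametrised as t -> (t, phi(t)) with deg(phi) <= eta, restrictions are reduced modulo t^p - t, and the Reed-Solomon condition is read as "degree at most d". All names and parameter values are hypothetical.

```python
import itertools

p, eta, d = 5, 2, 3  # hypothetical small parameters, p prime, eta + 1 <= p


def reduce_exponent(e):
    """Exponent of t^e after reduction modulo t^p - t."""
    return e if e < p else (e - 1) % (p - 1) + 1


def mul_mod(a, b):
    """Product of two polynomials over F_p modulo t^p - t (lists of length p)."""
    out = [0] * p
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = reduce_exponent(i + j)
            out[k] = (out[k] + ai * bj) % p
    return out


def monomial_in_lift(i, j):
    """Check that t^i * phi(t)^j mod (t^p - t) has degree <= d for every phi."""
    for phi in itertools.product(range(p), repeat=eta + 1):
        g = [0] * p
        g[reduce_exponent(i)] = 1
        base = list(phi) + [0] * (p - eta - 1)
        for _ in range(j):
            g = mul_mod(g, base)
        if any(g[k] for k in range(d + 1, p)):
            return False
    return True


in_lift = {(i, j) for i in range(p) for j in range(p) if monomial_in_lift(i, j)}
low_weighted_degree = {(i, j) for i in range(p) for j in range(p) if i + eta * j <= d}
print(low_weighted_degree <= in_lift)  # monomials of weighted degree <= d pass the test
print((0, 2) in in_lift)  # Y^2 restricted to Y = X^2 gives t^4, of degree 4 > d
```

The search enumerates all p^(eta+1) lines, so it is only practical for toy sizes.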
It is clear that since the constraints that define -lifted codes are satisfied by each codeword of a comparable weighted Reed-Muller code. But quite surprisingly, the code is sometimes much larger than . Let us highlight this claim with an example.
Let , and . The associated weighted Reed-Muller code is generated by the evaluation vectors of monomials , where lies in
Let us now consider the monomial and an -line , where . We see that for every , we have:
Hence, for every . Since , we get
Given a polynomial , we define its degree set as
By extension, the degree set of a subset is the union of degree sets of polynomials lying in . Similarly, if , then we set .
Since for every , one can consider degree sets as subsets of . This precisely corresponds to considering polynomials modulo the ideal .
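As an illustration of this reduction, the following sketch computes the degree set of a bivariate polynomial after reducing its exponents, under the assumption that the field is a prime field F_p and that the ideal in question is generated by X^p - X and Y^p - Y. The dictionary representation and the function names are hypothetical.

```python
p = 7  # hypothetical prime field size


def reduce_exponent(e):
    """Exponent of t^e modulo t^p - t: 0 stays 0, e >= 1 is sent into [1, p - 1]."""
    return e if e < p else (e - 1) % (p - 1) + 1


def reduced_degree_set(poly):
    """Degree set of poly modulo (X^p - X, Y^p - Y); poly maps (i, j) -> coefficient."""
    reduced = {}
    for (i, j), c in poly.items():
        key = (reduce_exponent(i), reduce_exponent(j))
        reduced[key] = (reduced.get(key, 0) + c) % p
    return {key for key, c in reduced.items() if c}


# X^7 * Y and X * Y define the same function on F_7 x F_7, and Y^8 reduces to Y^2.
f = {(7, 1): 1, (1, 1): 1, (0, 8): 3}
print(reduced_degree_set(f))

# Here the two terms cancel after reduction, so the degree set is empty.
g = {(7, 1): 1, (1, 1): 6}
print(reduced_degree_set(g))
```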
Lemma 4.4.
Let such that , and let . Assume that for every , we have (respectively, ). Then, there exists an -line such that (respectively, ).
Proof. If for every , then lies in , and the degree of is thus . The proof is similar for . ∎
Let . Then,
A pair would contradict Lemma 4.4. ∎
We say that a linear code is monomial if there exists a set of monomials, such that . Monomial codes are convenient since they admit a simple description.
Let us define monomial transformations , for .
Lemma 4.6.
Let be a subspace of such that:
for every and every , the polynomial also lies in .
Then is spanned by monomials.
Let where . It is sufficient to prove that for all , the monomial lies in .
For , let us define
Since is a vector space invariant under , we have . Moreover,
Since , . ∎
Let . The linear code is monomial.
The code is the full space ; hence it is trivially a monomial code. For , let us define
Proposition 4.5 ensures that . Let . For every and every we have
Let us now define . One can easily check that . Since , we also know that . Moreover, is invariant under affine transformations, hence . Let us now remark that
Consequently, . Therefore we can use Lemma 4.6, and our result follows immediately. ∎
4.3 The degree set of -lifted Reed-Solomon codes
Previous discussions ensure that, given a tuple , the code is fully determined by its degree set . Let us now seek characterisations of .
For this purpose, we need to introduce some notation:
denotes the inner product between vectors, or tuples.
We set .
Given and a prime number , we denote by the digit in the representation of in base , i.e. .
For , we write if and only if for every .
For and , we also write .
We will also make use of Lucas theorem [Luc78] which gives the reduction of binomial coefficients modulo primes.
Theorem 4.8 (Lucas theorem [Luc78]).
Let and be a prime number. Recall that is the representation of in base . Then,
In particular, in any field of characteristic , the binomial coefficient is non-zero if and only if .
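The last statement is easy to check numerically. The sketch below compares, for small integers, the non-vanishing of a binomial coefficient modulo a prime with the digit-wise comparison of base-p expansions. The function names `digits` and `dominated` are hypothetical.

```python
from math import comb


def digits(n, p):
    """Base-p digits of n, least significant first."""
    out = []
    while n:
        n, r = divmod(n, p)
        out.append(r)
    return out


def dominated(b, a, p):
    """True if every base-p digit of b is at most the matching digit of a."""
    da, db = digits(a, p), digits(b, p)
    if len(db) > len(da):
        return False
    return all(y <= x for x, y in zip(da, db))


lucas_holds = all(
    (comb(a, b) % p != 0) == dominated(b, a, p)
    for p in (2, 3, 5)
    for a in range(60)
    for b in range(60)
)
print(lucas_holds)
```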
In the next lemma, we characterise univariate polynomials arising from the restriction of to -lines.
Lemma 4.9.
Let and and let us define . We have:
Given a polynomial , the well-known multinomial theorem entails that:
where is a coefficient which only depends on and , and where
The coefficient of the term in is therefore:
where . We claim that for every if and only if for every . Indeed, can be seen as the evaluation of a homogeneous polynomial of degree at the point corresponding to . Since , the polynomial vanishes over if and only if it is the zero polynomial, which proves our claim.
Now, notice that
Hence, using Lucas theorem [Luc78] on every binomial coefficient in the above product, we see that if and only if there exists such that .
In other words, the monomial appears as a term of if and only if there exists such that and
Let us now give some properties on the set defined in Lemma 4.9.
We have . Moreover, an integer belongs to if and only if