Currently, the amount of data stored in a single data center can run into hundreds of petabytes and thus necessarily needs to be stored in several different servers. A central problem for such large amounts of data is that of reliable distributed storage, meaning that the data should be recoverable whenever there is a simultaneous failure of some percentage of the storage servers. A natural solution to this problem is to introduce redundancy by encoding the data via an error-correcting code. The length of this code determines the storage space used, while its minimum distance controls the number of simultaneous erasures that can be recovered. However, the recovery of any one erasure may require knowing the values of most other components, imposing an excessive communication cost among servers. The theory of locally recoverable codes gives us a way to reduce the communication cost by making the recovery local (see e.g. [papailiopoulos2014locally, tamo2014family]).
More precisely, we think of a file as a vector which we encode and store over several storage nodes (servers) via a codeword where is a (linear) code of dimension . For simplicity we assume that each of our storage nodes stores exactly one coordinate of . In case of (multiple) node failure, we want to be able to recover the lost information as quickly and efficiently as possible. In this regard the locality of a code plays an important role: it denotes the number of nodes one has to contact for repairing a lost node. We call the set of nodes one has to contact if a given node fails the locality group of that node, and we call the collection of locality groups a locality configuration.
Furthermore the code is called a partial maximum distance separable (PMDS) code if its distinct locality groups are disjoint and it is maximally recoverable in the sense that any erasure pattern that is information theoretically correctable is effectively correctable with such a code. PMDS codes are thus objects of great practical importance.
More concretely, a locally repairable code is PMDS with global parameter if its locality configuration is a partition of the components and satisfies:
The restrictions of all codewords to the components indexed by locality set are an MDS code (with length and dimension ) for .
Any word can be recovered uniquely after erasure of a set of its components of size , when consists of
any components of the locality set for and
an additional set of any components.
has dimension .
It is known that PMDS codes exist for any locality configuration if the field size is large enough [ch07, ht19]. Furthermore, some explicit constructions of PMDS codes are known, e.g. [bl13, bl14, bl16, ca17, ch15, gabrys2018constructions, go14, martinez2019universal]. However, our knowledge of the structure of PMDS codes remains far from being complete.
Algebraic geometric (AG) codes are a basic source [goppa1977codes, Pellikaan, MR1186841] of linear codes with interesting structures. Given a (typically irreducible and nonsingular) variety over and sections of a line bundle on , such codes are built by evaluating the sections on a given finite subset .
In this article we extend algebraic geometric codes to reducible curves over and use them to define evaluation codes endowed with locality sets defined by the irreducible components of . We then ask for conditions under which such constructions yield PMDS codes. Our approaches combine techniques from different areas of mathematics, using geometric, combinatorial and probabilistic methods. The main results in this article are:
(Geometry of AG PMDS codes) In Theorem 3.7 we give a characterization of the PMDS property for algebraic geometric codes in the language of classical projective geometry.
(Explicit constructions of AG PMDS codes with global parameter ) We use our geometric interpretation of the PMDS property to give simple explicit constructions of PMDS codes with global parameter for all localities. Moreover, in Theorem 4.4 we provide a new construction for and , which improves the smallest field size obtained by previous explicit constructions.
(Existence of AG PMDS codes) In Theorem 5.3 we prove that there exist geometric PMDS codes for all locality configurations and all global parameters for all sufficiently large field sizes. This method is nonconstructive but leads to an explicit bound on the field size.
(Randomized construction of AG PMDS codes) In Sections 6 and 7 we address the lack of explicitness in the previous result. More specifically, we specialize our curve to be an arrangement of lines and analyze the probability that evaluation at a suitably randomized set of points on leads to a geometric PMDS code. Our main result is Theorem 7.6, which guarantees that such codes exist whenever . Crucially, the probabilistic approach is nearly constructive in the sense that for of our choosing it provides us with a probability distribution on for which independent sampling leads to a PMDS code with the desired parameters with probability at least .
(Improved probabilistic estimates of field size) Finally, in Theorem 7.13 we use the probabilistic method with alterations to obtain estimates for the sizes of fields over which PMDS codes (with localities ) must exist, improving them to , a bound which compares favorably with most PMDS existence results in the literature (see Remark 7.15).
The material in the article is organized as follows: Section 2.1 contains some basic material on coding theory and algebraic geometry over finite fields. Section 3 contains the basic construction of codes from reducible curves and a characterization of PMDS codes in the language of projective geometry. Section 4 contains an explicit construction of PMDS codes with global parameter . Section 5 proves the existence of algebraic geometric PMDS codes for all localities and all for sufficiently large fields. Section 6 focuses on codes constructed from unions of lines and describes the possible obstructions for such codes to be PMDS. Section 7 summarizes the key ideas of the probabilistic method in combinatorics and applies it to prove the probabilistic results described in the last two items above.
Acknowledgements. We wish to thank Juan Sebastián Diaz for useful conversations during the completion of this work. A. Neri is funded by Swiss National Science Foundation, through grant no. 187711. M. Velasco is partially supported by research funds from Universidad de los Andes, Facultad de Ciencias, Proyecto INV-2018-50-1392.
We will use the notation throughout the paper. For a prime power we let denote the finite field with elements.
2.1. Coding theory preliminaries
We begin with a brief introduction to the theory of error-correcting codes in the Hamming metric. For a more detailed treatment the reader should refer to [van_lint].
Let be positive integers. By an code we mean a linear subspace of dimension . If such a code has minimum Hamming distance we will call it an code. By the Singleton bound any code satisfies . The codes achieving the equality are called maximum distance separable (MDS) codes. Such codes are capable of correcting erasures.
Recall that the group of linear isometries of (i.e., of linear maps that preserve the Hamming distance) consists of componentwise scalings and permutations. More precisely, this group corresponds to , which acts on as
Most properties of interest in coding theory are invariant under this group and it is therefore reasonable to introduce the following equivalence relation.
We say that two codes and are equivalent if there exists a linear isometry of which maps one onto the other.
2.2. PMDS codes
For we let be the image of the projection of on the coordinates labeled by the indices in .
Let be a linear code of dimension and let be positive integers. We say that has block-locality with global parameter if we can write as a disjoint union
of subsets of cardinalities such that:
each projection is an MDS code, and
for any set such that and , the projection map is an isomorphism.
We will say that is a PMDS code, if . For such a code, we call any set of coordinates as above a maximal correctable erasure pattern.
Equivalently, a code is PMDS if we can correct any erasures locally in the code , and any additional global erasures. One often wishes to restrict to the homogeneous case wherein and for all and some , but we do not necessarily make this assumption. Note that the PMDS property is invariant under code equivalence since it is obviously invariant under coordinate scalings and permutations.
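The erasure-correction formulation above lends itself to a brute-force check on small examples. The following Python sketch is illustrative only and is not one of the constructions of this paper. It assumes a prime field (so that field arithmetic is integer arithmetic modulo a prime), a code given by a generator matrix, and the usual reading of the definition in which the dimension equals the sum of the local dimensions minus the global parameter and a maximal erasure pattern leaves at most as many survivors in each block as the local dimension. The names `rank_mod_p`, `columns`, `is_pmds`, `G_good` and `G_bad` are hypothetical.

```python
from itertools import combinations

def rank_mod_p(rows, p):
    """Rank of a matrix (list of rows) over the prime field F_p."""
    m = [[x % p for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(rank, len(m)) if m[i][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, p)  # modular inverse (Python 3.8+)
        m[rank] = [(x * inv) % p for x in m[rank]]
        for i in range(len(m)):
            if i != rank and m[i][col]:
                f = m[i][col]
                m[i] = [(a - f * b) % p for a, b in zip(m[i], m[rank])]
        rank += 1
    return rank

def columns(G, idx):
    """Selected columns of G, returned as a list of vectors."""
    return [[row[j] for row in G] for j in idx]

def is_pmds(G, blocks, r, h, p):
    """Brute-force PMDS test (assumed reading of the definition, see lead-in)."""
    k, n = rank_mod_p(G, p), len(G[0])
    local_dims = [len(S) - ri for S, ri in zip(blocks, r)]
    # Assumed dimension condition: k = sum(n_i - r_i) - h.
    if k != sum(local_dims) - h:
        return False
    # Each local code must be MDS of dimension n_i - r_i.
    for S, ki in zip(blocks, local_dims):
        if rank_mod_p(columns(G, S), p) != ki:
            return False
        if any(rank_mod_p(columns(G, T), p) != ki for T in combinations(S, ki)):
            return False
    # Every set of k surviving coordinates with at most n_i - r_i survivors
    # in block i must be an information set.
    for T in combinations(range(n), k):
        if all(len(set(T) & set(S)) <= ki for S, ki in zip(blocks, local_dims)):
            if rank_mod_p(columns(G, T), p) != k:
                return False
    return True

# Two hypothetical length-6 codes over F_5 with two blocks of size 3.
G_good = [[1, 0, 1, 0, 1, 1],
          [0, 1, 2, 0, 1, 1],
          [0, 0, 0, 1, 1, 2]]
G_bad = [[1, 0, 1, 0, 1, 1],
         [0, 1, 1, 0, 1, 1],
         [0, 0, 0, 1, 1, 2]]
blocks = [[0, 1, 2], [3, 4, 5]]
print(is_pmds(G_good, blocks, [1, 1], 1, 5))
print(is_pmds(G_bad, blocks, [1, 1], 1, 5))
```

Both matrices have MDS local codes, but in `G_bad` the third column is a linear combination of two columns of the second block, so one pattern of three survivors fails to be an information set. The check is exponential in the length and is meant only for toy parameters.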
For the PMDS definition to be sensible we need . Note that if equality is achieved, with for all , then we recover MDS codes as a special case (see [ht17]*Proposition 5). We will therefore assume that and in the homogeneous case that throughout.
2.3. Preliminaries on algebraic geometry
If is a variety over we let be the set of -rational points of . We denote the -dimensional affine (resp. projective) space over by (resp. by ). For a subset we denote by the projective subspace of spanned by (i.e., the subvariety defined by the set of linear forms vanishing on ). A set of points is in linearly general position if any subset of size of spans a projective space of dimension . A variety is nondegenerate if it is not contained in any hyperplane.
By a rational normal curve in we mean a curve projectively equivalent to the image of the -th Veronese morphism given in homogeneous coordinates by all monomials of degree ,
By computing a Vandermonde determinant it is easy to see that any set of distinct points lying on a rational normal curve is in linearly general position in . Over an algebraically closed field this property characterizes rational normal curves among all irreducible and non-degenerate curves in . Rational normal curves can also be characterized as the only non-degenerate irreducible curves of degree in [Harris]*Proposition 19.9.
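The Vandermonde argument can be checked numerically for small parameters. The sketch below is a hedged illustration, not part of the paper: it assumes a prime field, takes as points of the rational normal curve in projective space of dimension r the vectors (1, t, ..., t^r) for t in the field together with the point at infinity (0, ..., 0, 1), and tests whether every subset of r + 1 points is linearly independent. The function names are hypothetical.

```python
from itertools import combinations

def rank_mod_p(rows, p):
    """Rank of a matrix (list of rows) over the prime field F_p."""
    m = [[x % p for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(rank, len(m)) if m[i][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, p)  # modular inverse (Python 3.8+)
        m[rank] = [(x * inv) % p for x in m[rank]]
        for i in range(len(m)):
            if i != rank and m[i][col]:
                f = m[i][col]
                m[i] = [(a - f * b) % p for a, b in zip(m[i], m[rank])]
        rank += 1
    return rank

def rational_normal_points(r, p):
    """Affine representatives of the F_p-points of the standard rational normal curve."""
    pts = [[pow(t, i, p) for i in range(r + 1)] for t in range(p)]
    pts.append([0] * r + [1])  # image of the point at infinity
    return pts

def in_linearly_general_position(points, p):
    """True if every subset of (ambient dimension + 1) points is independent."""
    size = len(points[0])
    return all(rank_mod_p(list(s), p) == size
               for s in combinations(points, size))

curve_points = rational_normal_points(3, 7)   # twisted cubic over F_7
print(len(curve_points), in_linearly_general_position(curve_points, 7))
```

For comparison, a configuration containing three collinear points, such as (1,0,0,0), (0,1,0,0), (1,1,0,0), (0,0,1,0), (0,0,0,1), fails the same test.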
3. Codes from Reducible Projective Curves
In this section we give a procedure for constructing evaluation codes endowed with a locality structure from reducible projective curves. To begin, we construct codes from sets of points in projective space.
3.1. Constructing codes from points in projective space
Let be a finite set of points. For each point fix an affine representative and let be the evaluation map
We define the algebraic geometric (AG) code determined by to be
A point in has several distinct affine representatives and therefore the previous construction leads to many different possible codes . We claim that all these choices lead to equivalent codes. This is because if and are distinct affine representatives for the point then there exist nonzero scalars such that for every . As a result, for any the equality holds and therefore the code obtained from evaluation at the representatives is the result of scaling the code obtained from evaluation at the representatives by independently scaling the components with the vector .
Furthermore, an automorphism of acts on by a permutation of its elements, leading to a permutation of the words in the code, and hence keeps the code unchanged.
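The effect of changing affine representatives can be observed directly on a toy example. The sketch below is an illustration under assumptions that are not taken from the paper: the field is the prime field with 5 elements, the five points of the projective plane and the rescaling vector are arbitrary hypothetical choices, and the code is obtained by evaluating all linear forms at the chosen representatives. It checks that rescaling the representatives rescales every codeword componentwise, so that the two codes have the same weight distribution.

```python
from collections import Counter
from itertools import product

def codewords(reps, p):
    """Evaluate every linear form over F_p at the given affine representatives."""
    dim = len(reps[0])
    return [tuple(sum(c * x for c, x in zip(form, pt)) % p for pt in reps)
            for form in product(range(p), repeat=dim)]

def weight_distribution(words):
    """Number of codewords of each Hamming weight."""
    return Counter(sum(1 for x in w if x) for w in words)

p = 5
reps = [(1, 0, 1), (2, 0, 1), (0, 1, 1), (0, 3, 1), (1, 1, 0)]  # hypothetical points
scalars = (2, 3, 1, 4, 2)                                        # hypothetical nonzero scalars
rescaled = [tuple(s * x % p for x in pt) for s, pt in zip(scalars, reps)]

code_a = codewords(reps, p)
code_b = codewords(rescaled, p)

# Scaling each coordinate of code_a by the corresponding scalar yields code_b.
scaled_a = {tuple(s * x % p for s, x in zip(scalars, w)) for w in code_a}
print(scaled_a == set(code_b))
print(weight_distribution(code_a) == weight_distribution(code_b))
```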
The previous remark implies that, up to code equivalence, the code defined above is completely determined by the set of points up to projective automorphisms. It follows that any code property should be interpretable in the language of projective geometry (see [MR1186841]*Theorem 1.1.6 for a proof that this is in fact an equivalence). The main result of this section is Theorem 3.7 below, which recasts the PMDS property in the language of projective geometry.
As a first application of this philosophy, we begin by giving a geometric interpretation to code projections. For any set we denote by the image of the projection of onto the coordinates indexed by the points of . More precisely is the image of the composition where is the projection onto the coordinates indexed by the points of . The following simple lemma gives a geometric description of the codes .
The composition induces an isomorphism between the space of linear forms in and the code . In particular:
if and only if has dimension , and
spans the ambient space if and only if is an isomorphism.
As in Definition 3.1 let be the space of linear forms in , let be the evaluation at the chosen affine representatives of the points of and let be the projection. The code is by definition the image of . The kernel of this map consists of the linear forms vanishing identically at all points of and therefore defines an isomorphism between and . The quotient is canonically isomorphic to the space of linear forms on , proving the initial claim and part (1). For part (2) note that if and only if the kernel is trivial, and therefore this is equivalent to being an isomorphism between and which factors through . ∎
(Reed-Solomon codes) Let be the rational normal curve over . The AG code defined by is an (extended) Reed-Solomon code. This code has length . Moreover, since every set of points of is in linearly general position, Lemma 3.3 proves that whenever the code has dimension and that the projection of onto any set of coordinates is an isomorphism, proving that is an MDS code.
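As a sanity check of this example on small parameters, the following sketch builds a generator matrix of an extended Reed-Solomon code by evaluating the monomials of degree less than k at all elements of a prime field and at the point at infinity, and then computes the minimum distance by enumerating all codewords. The choice of a prime field, the parameters k = 3 and p = 5, and the function names are assumptions made for illustration.

```python
from itertools import product

def extended_rs_generator(k, p):
    """k x (p + 1) generator matrix: column t is (1, t, ..., t^(k-1)),
    and the last column (0, ..., 0, 1) corresponds to the point at infinity."""
    cols = [[pow(t, i, p) for i in range(k)] for t in range(p)]
    cols.append([0] * (k - 1) + [1])
    return [[c[i] for c in cols] for i in range(k)]

def minimum_distance(G, p):
    """Minimum Hamming weight of a nonzero codeword, by exhaustive enumeration."""
    k, n = len(G), len(G[0])
    best = n
    for msg in product(range(p), repeat=k):
        if not any(msg):
            continue  # skip the zero codeword
        word = [sum(m * g for m, g in zip(msg, col)) % p for col in zip(*G)]
        best = min(best, sum(1 for x in word if x))
    return best

G = extended_rs_generator(3, 5)
n, k = len(G[0]), len(G)
d = minimum_distance(G, 5)
print(n, k, d, d == n - k + 1)  # the last entry tests the Singleton bound with equality
```

Enumeration costs p^k evaluations, so this is only practical for very small fields and dimensions.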
3.2. Constructing codes from reducible curves
To construct AG codes with desirable properties, we often specify the sets as subsets of other varieties. The following construction, which uses reducible curves, will be our source for constructing PMDS codes.
Let be distinct irreducible curves over . Define and let be a given set of points, each lying in at most one of the , and define a partition into disjoint subsets of cardinalities , determined by the irreducible components of via . Let be the AG code defined by as in Definition 3.1. For , let denote the image of the projection of onto the coordinates in corresponding to the points of . We will refer to the as the local codes of .
Assume moreover that we are given positive integers and which satisfy the inequalities for every and the equality .
We call a set an evaluation set if for . We say that is admissible if the following two conditions hold:
The projective subspace has dimension and the points of are in linearly general position in .
Every evaluation set of size spans .
The following statements are equivalent:
The code from Definition 3.5 is a PMDS code with blocks given by the , block locality and global parameter .
is an admissible set with respect to the partition .
We will prove the claim by showing that property (1) (resp. (2)) in Definition 2.2 and property (1) (resp. (2)) of Definition 3.6 are equivalent. By construction the code has length . By Lemma 3.3 the code has dimension if and only if the space is isomorphic to . Furthermore is an MDS code if and only if the projection is an isomorphism for every set with . By Lemma 3.3 this condition is equivalent to the fact that the points of are in linearly general position in . Finally let be a set with and and let be the complement of . It follows that is admissible for and has cardinality . By Lemma 3.3 spans if and only if the projection is an isomorphism. ∎
The following two examples illustrate our construction.
Example 3.8 (Explicit construction of a simple PMDS AG code).
We will construct an AG code using components. Let and be the - and -axes, respectively, in the projective plane :
and let . Define
which contains evaluation points. The resulting AG code clearly has components, each giving an MDS local code with parameters . This last statement is immediate because any set of distinct points of cardinality at least two on a line spans it and is in linearly general position. Moreover, the whole code has dimension , since the set clearly spans all of .
Lastly, we claim that is a PMDS code with global parameter . Indeed, the complement of any maximal erasure pattern as in Definition 2.2 corresponds to three points on , with each of the lines containing at most two of these points. Such a set will always span , hence any such is an information set of and maximal erasure patterns are always correctable. This proves that is a PMDS code.
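The spanning claim used in Example 3.8 can be verified exhaustively for a small field. The sketch below is an illustration under explicit assumptions: the field is the prime field with 5 elements, and the evaluation set consists of all rational points of the two coordinate axes of the projective plane other than their common point (0 : 0 : 1); the field and point set actually used in the example may differ. It checks that every triple of points with at most two points on each line has nonzero determinant, that is, spans the plane.

```python
from itertools import combinations

def det3(a, b, c, p):
    """Determinant of the 3 x 3 matrix with rows a, b, c, reduced modulo p."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0])) % p

def axis_points(p):
    """Points of the lines {y = 0} and {x = 0}, omitting their intersection."""
    x_axis = [(a, 0, 1) for a in range(1, p)] + [(1, 0, 0)]
    y_axis = [(0, b, 1) for b in range(1, p)] + [(0, 1, 0)]
    return x_axis, y_axis

def check_triples(p):
    """Return (number of admissible triples, whether all of them span the plane)."""
    x_axis, y_axis = axis_points(p)
    labelled = [(pt, 0) for pt in x_axis] + [(pt, 1) for pt in y_axis]
    count, all_span = 0, True
    for triple in combinations(labelled, 3):
        on_first = sum(1 for _, line in triple if line == 0)
        if on_first in (0, 3):
            continue  # three points on one line: not an evaluation set
        count += 1
        a, b, c = (pt for pt, _ in triple)
        if det3(a, b, c, p) == 0:
            all_span = False
    return count, all_span

print(check_triples(5))
```

With these assumptions each line carries p points, so the code has length 2p and the check runs over all triples except the 2·C(p, 3) triples contained in a single line.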
Example 3.9 (Non-example of a PMDS AG code).
Let us show by example that a naïve generalization of Example 3.8 to other parameters results in sets of evaluation points such that and , but . We choose , , and . Thus , and the reducible curve is a union of two conics in . Choose any subset such that and . For the resulting AG code to be PMDS, it is necessary that spans all of . Suppose that are not collinear (a necessary condition for to be PMDS) and therefore span a hyperplane . The intersection of and consists of two points, one of which is and the other of which is some other point also defined over . If then only spans a hyperplane, and the code cannot correct the corresponding erasure pattern.
This last example shows that a more careful choice of is necessary to guarantee the PMDS property. More precisely, our constructions in the following sections will choose the evaluation points so that no such co-hyperplanar critical evaluation sets exist as subsets of . For example, for all subsets as in Example 3.9, we must choose so that (i) this triple is not collinear, and (ii) . A much sparser subset of the -points of is necessary to achieve this goal.
It is common in coding theory to construct evaluation codes from algebraic curves in a more abstract setting: We are given a curve (typically irreducible and non-singular), a vector space of sections of a line bundle on and a finite set . By embedding in projective space via the morphism specified by the sections in (resolving indeterminacies if necessary) we can think of or rather of its image as a curve in and apply the construction above with the set . The concrete projective approach above is therefore more general than the abstract approach.
4. Explicit Constructions of Algebraic Geometric PMDS codes with Global Parameter One and Two
In this section we focus on explicit constructions of geometric PMDS codes for . We begin with for any choice of localities . We hope its simplicity will convince the reader of the usefulness of the projective viewpoint when constructing PMDS codes. Given we let and construct a reducible curve which is a disjoint union of rational normal curves of degrees . Furthermore we will construct a finite set and show that it is admissible, concluding, via Theorem 3.7, that the corresponding code is indeed PMDS. More precisely, let and let be a set of points in linearly general position in . Split the points into disjoint subsets of sizes and let be the projective subspaces (of dimension ) spanned by the subsets. Note that for each the subspace is a projective space of dimension that intersects in exactly one point, which we denote by . For each let be a rational normal curve over of degree in passing through and through those which are contained in (such a curve exists because there are such points and they are linearly independent). Finally let , and be the AG code defined by .
The set is admissible and therefore the code is a PMDS code with localities and .
Clearly, the points in every are in general position in , since is a subset of a rational normal curve which spans . We only need to show that every evaluation set of size spans . By definition of evaluation set, we have that there exists a such that , while for every . Therefore for every . Let , and call , which has dimension . Hence, we have . In order to prove that it is equivalent to show that , by a dimension argument. Suppose by contradiction that . Since and by construction , then this implies that . Therefore, lie on the same -dimensional subspace. However, this is not possible by construction, since the points are all distinct and lie on the rational normal curve . This concludes the proof. ∎
This construction slightly generalizes the one given in [ch15], where the authors used extended Reed-Solomon codes as local codes. Their construction indeed coincides with ours when . Moreover, it is a special case of the characterization given in [ht17], where it was shown that every PMDS code with global parameter is constructed using MDS codes as local codes.
We conclude this section by providing an explicit construction of algebraic geometric PMDS codes with global parameter and localities . This is the first known explicit construction for such codes over finite fields of cardinality . We remark that, in the homogeneous case where also , the construction in [bl14] requires a field size .
We will restrict our attention to codes constructed from evaluation sets where is a reducible curve in with the property that its components are distinct generic lines. More precisely, we assume that is a given positive integer, that for and that , so the equality becomes . Assume , let be the rational normal curve in and choose distinct points in . Define the lines for and let . We will show that an appropriate choice of is admissible and conclude, via Theorem 3.7, that the corresponding code is PMDS with localities and .
The key point of the construction is understanding the structure of the obstructions to admissibility, which we can do completely explicitly for . To this end, for every , define the maps by
where the notation means that is not taken in the spanning set. Furthermore, define for every .
Note that is well-defined, since any lines and an additional single point on one of the remaining lines always span a hyperplane, which intersects the last line in exactly one point.
With the notation above, the following hold.
for every .
is bijective for every .
If or , then the statement is trivial. If and then define the space . Fix a point and call , that is . Hence , since . This implies that , and thus .
Let now be pairwise distinct. Define the space . Fix a point and call and . This means that and . We want to show that . Consider now the two spaces and let be their intersection. The two spaces are distinct hyperplanes and hence . Moreover, it is easy to see that . Consider now the space . Clearly, , so it has dimension at least . Moreover, suppose that . Then we would have , which would produce a dependence among the points. This contradicts the fact that the points are in general position. Therefore, we can deduce that has dimension . Hence . Take now the space . We have that , and we conclude that and thus
Follows immediately from with .
The maps allow us to define an equivalence relation on by saying that two points , are equivalent if and only if . By the previous lemma, the equivalence class of a point is given by the set . These sets give us a partition of into disjoint sets of size and furthermore, for every , the points in are a system of distinct representatives for the equivalence relation (i.e., the points of parametrize the equivalence classes).
Let be a set containing at least two points from each and at most one element in each equivalence class, and let . Note that exists because we are assuming . We are now in a position to prove the main result of this section.
Any set defined as above is admissible and therefore the AG code defined by is a PMDS code with localities and .
Since we chose the sets to have cardinality at least , we have . Let be an evaluation set of size . There are two possibilities for the intersection of with the ’s:
Case I: There exists such that and for every . Hence,
where the last equality follows from the fact that the points are in general position.
Case II: There exist two distinct integers such that and for every . Let be the point in and be the point in . Moreover, we define , which by genericity of the points has dimension and is contained in . Moreover, again by genericity, none of the points belongs to . Hence has dimension at least , and has dimension exactly if and only if . By definition of the map , this is true if and only if , that is, and belong to the same equivalence class. However, this is not possible, since we selected at most one point in each equivalence class when constructing the set . Thus, .
Therefore, is admissible and is a PMDS code by Theorem 3.7. ∎
We illustrate the result of Theorem 4.4 by constructing a PMDS code over , with , and for . We choose the following eight points in :
Then we define the lines , for . The first line is then given by
We can compute . Moreover, for the remaining points, define for any . We have
One can also compute , obtaining and
Finally, we have and
This gives us the equivalence classes
Now, we select a point from each set . For instance, we make the following choice: for each set , we take the point in if and only if , and we then select the point from which belongs to . This produces the code whose generator matrix is
Observe that if we have fixed and , then starting with the sets and for computed above, we can construct PMDS codes of any length , where the local codes are codes over .
It is tempting to try a similar approach for constructing PMDS codes with global parameter starting from a reducible curve composed of generic lines. However, the obstructions quickly become difficult to manage because one has to explicitly describe several distinct combinatorial types of obstructions, the number of types increasing with . In Sections 6 and 7 we abandon this approach in favor of methods from probability theory aiming to quantify the relative sizes of such obstructions in order to obtain results for all .
5. Existence of Algebraic Geometric PMDS Codes
Assume we are given positive integers , and satisfying and . In this section we prove the existence of algebraic geometric PMDS codes with arbitrary localities and global parameter over for all sufficiently large field sizes . To this end let be the rational normal curve of degree in . If then contains a set consisting of distinct points in . Split into disjoint subsets of size , . Since consists of distinct points in the rational normal curve , the points of are in linearly general position and in particular the projective subspace has dimension . Let be a rational normal curve of degree in over containing the points of and let . By construction, if . The reducible curve will be our main tool for constructing AG PMDS codes as in Definition 3.5.
Note that the set consists of points, including exactly in . Since the points of are in linearly general position. It follows that every subset is an evaluation set and that every subset of size of spans . We conclude that is admissible for every .
The number of erasures that the PMDS code of an admissible can recover is precisely and therefore we would like to be both admissible and as large as possible. We will show that, whenever the field size is sufficiently large, it is possible to add a point to an admissible set and obtain a bigger, but still admissible set.
We say that a component of is selected by an evaluation set whenever . The key to the construction is the following lemma.
If is admissible then every evaluation set of size spans a hyperplane in which does not contain any component not selected by .
Let be an evaluation set of size . Since there exists a component such that and in particular there is a point which is not in . If has codimension at least two then by adding to we obtain an evaluation set of cardinality which does not span , contradicting the admissibility of . It follows that spans a hyperplane in . Suppose that a component not selected by satisfies . Since is not selected by the strict inequality holds and therefore there exists a point