Interest in erasure codes has surged in recent years, with the demands of massive cloud storage systems raising hitherto unexplored, yet very natural and mathematically deep, questions concerning the parameters, robustness, and efficiency of the code. Distributed storage systems need to build redundancy into the stored data in order to cope with the loss or inaccessibility of the data on one or more storage nodes. Traditional erasure codes offer a natural strategy for such robust data storage, with each storage node storing a small part of the codeword, so that the data is protected against multiple node failures. In particular, MDS codes such as Reed-Solomon codes operate at the optimal storage vs. reliability trade-off: for a given amount of information to be stored and available storage space, these codes can tolerate the maximum number of erasures without losing the stored information.
Individual storage nodes in a large-scale system often fail or become unresponsive. Reconstruction (repair) of the content stored on a failed node with the help of the remaining active nodes is important to reinstate the system in the event of a permanent node failure, and to allow access to the data stored on a temporarily unavailable node. The use of erasure codes in large storage systems, therefore, brings to the fore a new requirement: the ability to very efficiently reconstruct parts of a codeword from the rest of the codeword.
Local Reconstruction Codes (LRCs), introduced in , offer an attractive way to meet this requirement. An LRC imposes local redundancies in the codewords, so that a single erased symbol (or a small number of them) can be recovered locally from fewer than other codeword symbols. (LRCs are also expanded as Locally Repairable Codes or Locally Recoverable Codes; see, e.g., [19, 20, 13].) Here is the locality parameter, which is typically much smaller than the code length . In the distributed storage context, an LRC allows for the low-latency repair of any failed node, as one only needs to wait for the response from nodes. LRCs have found spectacular practical applications with their use in the Windows Azure storage system .
The challenge in LRC design is to balance the locality requirement, which allows fast recovery from a single or a few erasures, with good global erasure resilience (via traditional, slower methods) for worst-case scenarios involving more erasures. One simple metric for global fault tolerance is the minimum distance of the code: a code of minimum distance can correct any pattern of fewer than erasures. The optimal trade-off between the distance, redundancy, and locality of an LRC was established in , and an elegant sub-code of Reed-Solomon codes meeting this bound was constructed in .
This work concerns a much stronger requirement on global fault tolerance, called Maximal Recoverability. It requires that the code simultaneously correct every erasure pattern that is information-theoretically possible to correct, given the locality conditions imposed on the codeword symbols. Let us describe it more formally in the setting of interest in this paper. Define an -LRC to be a linear code over of length whose codeword symbols are partitioned into disjoint groups, each of which includes local parity checks capable of locally correcting erasures. The codeword symbols further obey heavy (global) parity checks. With this structure of parity checks, it is not hard to see that the erasure patterns one can hope to correct are precisely those which consist of up to erasures per local group plus up to additional erasures anywhere in the codeword. A maximally recoverable (MR) LRC is a single code that is capable of simultaneously correcting all such patterns. Thus, an MR code gives the most bang for the buck for the price one pays for locality.
This notion was introduced in , motivated by applications to storage on solid-state devices, under the name partial MDS codes. The terminology maximally recoverable codes was coined in , and the concept was more systematically studied in [9, 10]. By picking the coefficients of the heavy parity checks randomly, it is not hard to show the existence of MR LRCs over very large fields, of size exponential in . An explicit construction over such large fields was also given in , which also proved that random codes need such large field sizes with high probability. (This is akin to what happens for random codes to have the MDS property. However, for MDS codes, the Vandermonde construction explicitly achieves a linear field size.)
Since encoding a linear code and decoding it from erasures involve numerous finite field arithmetic operations, it is highly desirable to have codes over small fields (preferably of characteristic 2). Obtaining MR LRCs over finite fields of minimal size has therefore emerged as a central problem in the area of codes for distributed storage. So far, no construction of MR LRCs that avoids the exponential dependence on has been found. A recent lower bound shows that, unlike MDS codes, MR LRCs cannot exist over fields of linear size for certain parameter settings. This shows that the notion of maximal recoverability is quite subtle, and pinning down the optimal field size is likely a deep question. There remains a large gap between the upper and lower bounds on the field size of MR LRCs; closing it is a challenge of both theoretical and practical importance.
In this work, we develop a novel approach to construct MR LRCs based on function fields. Our framework recovers, and in fact slightly improves, most of the previous bounds in the literature in a unified way. We note that since there are at least three quantities of significance (the locality , the local, intra-group erasure tolerance , and the number of global parity checks ), the landscape of parameters and different constructions in this area is quite complex. Also, depending on the motivation, the range of values of interest of these parameters might be different. For example, if extreme efficiency of local repair is important, should be small. On the other hand, this increases the redundancy and thus the storage requirement of the code, so from this perspective a modest (say ) might be relevant. If good global fault tolerance is required, we want a larger , but then the constructions need large fields. It is therefore of interest to study the problem treating these as independent parameters, without assumptions on their relative size. We next review the field sizes of previous constructions, and then turn to the parameters we achieve in different regimes.
1.1. Known field size bounds
For , optimal maximally recoverable local reconstruction codes (MR LRCs, for short) can be constructed by using either Reed-Solomon codes or their repetition. For , constructions of maximally recoverable LRCs over fields of size were given in . For the remaining case, and , there are quite a number of constructions in the literature [2, 1, 21, 9, 15, 10, 4, 3, 6, 12].
For the cases of and , the best known constructions of MR LRCs were given in  with field sizes of and respectively, uniformly for all . (Their field sizes were worse by factors compared to these bounds when the field is required to be of characteristic .) For most other parameter settings, the best constructions by  provide a family of MR LRCs over fields of sizes
as well as
Recently, by using maximum rank distance (MRD) codes, the paper  (specifically Corollary 14) gives a family of MR LRCs over fields of sizes
where is the dimension of the code.
On the other hand, a lower bound on the field size was presented in . Stating the bound when for simplicity, they show that the field size of an MR LRC must obey
The lower bound (5) is still quite far from the upper bounds (1) and (2). In particular, the exponent of or is to the base growing with in the known constructions, but only to the base in the above lower bound. Thus, one can conjecture that there is still room to improve both the constructions and the lower bounds. We note that under more complex structural requirements on the local groups, notably grid-like topologies and product codes, the optimal field size has been pinned down to .
Several techniques have been employed in literature for constructions of MR LRCs. One prevalent idea is to use a “linearized” version of the Vandermonde matrix, where the heavy parity check part of the matrix consists of columns where for a sufficiently high degree extension field of . This construction is combined with -wise independent spaces to get an field size in , and is also employed in . Another approach is based on rank-metric codes (see, for instance, [4, 18]). Various ad hoc methods have been employed for good constructions of MR LRCs for small , for example for in .
1.2. Our results
In this work, we develop a new approach to construct MR LRCs based on algebraic function fields. We discuss the key elements underlying our strategy in Section 1.4, but for now we state the field sizes of the MR LRCs we can construct for various regimes of parameters. Most of the existing results in the literature can be recovered through our methods in a unified way, and in most regimes the parameters of our codes beat the known ones. For easy reference, we summarize the different possible trade-offs we can achieve in one giant theorem statement below. Since this comprehensive statement may be overwhelming to parse, let us highlight just two of our significant improvements: item (i) for , where we improve the term in (2) quadratically to , and item (vi) for sufficiently large , where the exponent in bounds (1) and (2) is improved to . Also, the exponent is replaced by in the bounds (i)-(iv) that improve (2). In the bounds (vii) and (viii), the factor in the exponent is improved to ; this improvement is less significant as it only applies to the low-rate setting, but it is included for completeness and also to reflect a construction approach based on generator matrices (as opposed to parity-check matrices, which are a more potent way to reason about MR LRCs and underlie the other parts of the theorem).
One has a maximally recoverable -local reconstruction code over a field of size with parameters satisfying any of the following conditions. (Below denotes .)
(i) (see Theorem 3.8) , and
(ii) (see Theorem 3.9) and
(iii) (see Theorem 3.11) for all settings of and
(iv) (see Theorem 3.12) for all settings of and
(v) (see Theorem 4.3) and for a positive real and
(vi) (see Theorem 4.4) and for a positive real and
(vii) (see Theorem 3.3) for all settings of where is the dimension of the code;
(viii) (see Theorem 3.5) and
The first two bounds and the bounds in (vii) and (viii) of Theorem 1.1 are derived from the rational function fields . In addition, the bounds in (i) and (viii) of Theorem 1.1 are obtained via a combination with binary BCH codes. The bounds in (iii) and (iv) of Theorem 1.1 are derived from the rational function field , where is a power of . The fifth bound is obtained via Hermitian function fields, while the sixth bound is derived from the Garcia-Stichtenoth function field tower. Our codes achieving the trade-offs stated in the above theorem can in fact be explicitly specified. We note, however, that for MR codes even existence questions over small fields are interesting and non-trivial.
Each of our bounds in Theorem 1.1 beats the known results in some parameter regimes. Let us compare them one by one.
1.4. Our techniques
Note that constructing MR LRCs is equivalent to constructing generator or parity-check matrices whose columns satisfy certain linear independence requirements (see Section 2.1).
Our construction idea departs from previous approaches and is based on function fields over a finite field . The key in constructing an MR LRC is the choice of the heavy parity checks. We now briefly describe our idea to pick these. We associate with each of the local groups a distinguishing (high degree) place , . The degree of the place is chosen large enough to guarantee the existence of at least such places. For each local group, we pick functions , , that have exactly one pole at . The coefficients of the heavy parity checks corresponding to the ’th symbol of ’th local group are chosen to be
where is a place of sufficiently high degree, so that the evaluations belong to an extension field which will be the final alphabet size of the MR LRC. By properties of the Moore determinant (Section 2.2) and the large degree of , the required linear independence of columns such as (6) over reduces to a certain linear independence requirement for the ’s over . Across different local groups such linear independence follows because a function with one pole at cannot cancel a function with one pole at a different place . Within a local group, the required linear independence is ensured by choosing the ’s within a group so that any of them (which is the maximum number of erasures we can have within a group) are linearly independent over .
The paper is organized as follows. In Section 2, we introduce some preliminaries such as MR LRCs (from both the generator and parity-check matrix viewpoints) and Moore determinants. In Section 3, we present our constructions of MR LRCs using the rational function field together with a concatenation with classical codes of good rate vs. distance trade-off. We give two constructions: one using the generator matrix viewpoint in the first part (yielding Parts (vii) and (viii) of Theorem 1.1), and a parity-check based construction in the second part, which yields Parts (i)-(iv) of Theorem 1.1. This section is elementary and only uses properties of polynomials. In Section 4, we generalize the parity-check construction of MR LRCs given in Section 3 to arbitrary algebraic function fields. The necessary preliminaries on function fields are deferred to that section, as we do not need them in Section 3. We then apply this construction to Hermitian function fields and the Garcia-Stichtenoth tower to obtain the MR LRCs promised in Parts (v) and (vi) of Theorem 1.1, respectively.
2.1. Maximally recoverable local reconstruction codes
Throughout this paper, denotes the finite field of elements for a prime power . We use to denote the set of all matrices over .
Consider a distributed storage system with disjoint locality groups, where each group has size and can locally correct any erasure errors. In addition, the system can correct any erasure errors together with any erasure errors in each group. Such a system calls for a class of codes called maximally recoverable local reconstruction codes, or partial MDS codes. The precise definition of MR LRCs is given below.
Let be a prime power and let be positive integers satisfying . Put and . An -ary -linear code with a generator matrix of the form
is called a maximally recoverable -local reconstruction code (or an MR -LRC, for short) if
each has size ;
the row span of each is an -MDS code for (note that is not a generator matrix of this MDS code in general);
after puncturing columns from each , the remaining matrix of generates an -MDS code.
From the definition, an MR -LRC can correct erasure errors at arbitrary positions together with any erasure errors in each of groups. The following lemma follows directly from Definition 1.
A matrix is a generator matrix of an MR -LRC if and only if every submatrix of with at most columns per block is invertible.
One can have an equivalent definition via parity-check matrix.
Let be a prime power and let be positive integers satisfying . Put and . An -ary -linear code with a parity-check matrix of the form
is called an MR -LRC if
each has size and each has size ;
each generates an -MDS code for (note that the nullspace of is code);
every columns consisting of any columns in each group and other arbitrary columns are -linearly independent.
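To make condition (iii) concrete, the following sketch brute-forces it for a toy parity-check matrix over a prime field GF(p). All parameters are our own assumptions, not taken from the text: p = 101, g = 2 groups of size r = 4, a = 1 local parity per group (an all-ones row, whose row space is a one-dimensional MDS code), and h = 2 heavy rows (x_j, x_j^2) at hypothetical evaluation points x_j. It illustrates the definition only and is not the construction of this paper.

```python
from itertools import combinations

def rank_mod_p(rows, p):
    """Rank of an integer matrix over GF(p), p prime, by Gaussian elimination."""
    m = [[x % p for x in row] for row in rows]
    rank = 0
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        # find a pivot in column c at or below the current rank row
        piv = next((i for i in range(rank, len(m)) if m[i][c]), None)
        if piv is None:
            continue
        m[rank], m[piv] = m[piv], m[rank]
        inv = pow(m[rank][c], p - 2, p)  # inverse via Fermat's little theorem
        m[rank] = [x * inv % p for x in m[rank]]
        for i in range(len(m)):
            if i != rank and m[i][c]:
                f = m[i][c]
                m[i] = [(x - f * y) % p for x, y in zip(m[i], m[rank])]
        rank += 1
    return rank

def is_mr_parity_check(H, g, r, a, h, p):
    """Brute-force condition (iii): every set of g*a + h columns that contains
    at least a columns from each local group is linearly independent over GF(p)."""
    size = g * a + h
    for S in combinations(range(g * r), size):
        if any(sum(1 for j in S if j // r == i) < a for i in range(g)):
            continue  # not an admissible erasure pattern
        sub = [[row[j] for j in S] for row in H]
        if rank_mod_p(sub, p) < size:
            return False
    return True

def toy_parity_check(points, r):
    """Hypothetical toy H with a = 1 and h = 2: one all-ones local row per
    group, followed by the heavy rows (x_j) and (x_j^2)."""
    n = len(points)
    g = n // r
    H = [[1 if j // r == i else 0 for j in range(n)] for i in range(g)]
    H.append(list(points))
    H.append([x * x for x in points])
    return H

good = toy_parity_check([1, 2, 3, 4, 10, 20, 30, 40], r=4)
bad = toy_parity_check([1, 2, 3, 4, 1, 2, 3, 4], r=4)
print(is_mr_parity_check(good, g=2, r=4, a=1, h=2, p=101))  # True
print(is_mr_parity_check(bad, g=2, r=4, a=1, h=2, p=101))   # False
```

Reusing the points 1, 2, 3, 4 in both groups fails: for two erasures in each group at points {x_1, x_2} and {y_1, y_2}, the relevant 4 × 4 minor equals, up to sign, (x_2 − x_1)(y_2 − y_1)((y_1 + y_2) − (x_1 + x_2)), which vanishes whenever the two pairs have equal sums. This small example shows why the choice of the heavy parity checks is the crux of the problem.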
2.2. Moore determinant
Let be a power of . For elements , the Moore matrix is defined by
The determinant is given by the following formula
where the product is taken over all non-zero direction vectors in . Thus, the determinant is nonzero if and only if are -linearly independent.
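The equivalence just stated can be checked numerically. The sketch below is a minimal illustration under our own assumptions: q = 2, the extension field GF(2^4) built from the primitive polynomial x^4 + x + 1, and the standard Moore matrix whose row i (i = 0, 1, ...) holds the 2^i-th powers of the given elements. It confirms by exhaustive search that the determinant is nonzero exactly when the elements are linearly independent over GF(2).

```python
from itertools import combinations

M = 4          # assumed extension degree: GF(2^4) over GF(2)
MOD = 0b10011  # x^4 + x + 1, primitive over GF(2)

def gf_mul(a, b):
    """Multiply in GF(2^M); elements are bit vectors of polynomial coefficients."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> M:      # degree reached M: reduce modulo MOD
            a ^= MOD
    return r

def gf_inv(a):
    """Inverse of a nonzero element, computed as a^(2^M - 2)."""
    result, e = 1, (1 << M) - 2
    while e:
        if e & 1:
            result = gf_mul(result, a)
        a = gf_mul(a, a)
        e >>= 1
    return result

def gf_det(mat):
    """Determinant over GF(2^M) by Gaussian elimination (char 2: minus is XOR)."""
    m = [row[:] for row in mat]
    n = len(m)
    det = 1
    for c in range(n):
        piv = next((i for i in range(c, n) if m[i][c]), None)
        if piv is None:
            return 0
        m[c], m[piv] = m[piv], m[c]  # a row swap only flips the sign, and -1 = 1
        det = gf_mul(det, m[c][c])
        inv = gf_inv(m[c][c])
        for i in range(c + 1, n):
            if m[i][c]:
                f = gf_mul(m[i][c], inv)
                m[i] = [x ^ gf_mul(f, y) for x, y in zip(m[i], m[c])]
    return det

def moore_det(alphas):
    """Determinant of the Moore matrix whose row i holds alpha_j^(2^i)."""
    rows, row = [], list(alphas)
    for _ in alphas:
        rows.append(row)
        row = [gf_mul(x, x) for x in row]  # Frobenius: raise to the power q = 2
    return gf_det(rows)

def f2_independent(alphas):
    """True iff no nonempty subset of alphas XORs (sums over GF(2)) to zero."""
    for mask in range(1, 1 << len(alphas)):
        s = 0
        for j, x in enumerate(alphas):
            if mask >> j & 1:
                s ^= x
        if s == 0:
            return False
    return True

def check_moore(kmax):
    """Compare both criteria on every k-subset of GF(2^M), for k <= kmax."""
    return all((moore_det(s) != 0) == f2_independent(s)
               for k in range(1, kmax + 1)
               for s in combinations(range(1 << M), k))

print(check_moore(4))  # True
```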
3. Explicit constructions via rational function fields
In this section, we only introduce constructions of MR LRCs from rational function fields. Our description will be self-contained and elementary in terms of polynomials and we won’t be requiring any background on algebraic function fields (we have therefore deferred the background on function fields to Section 4 ahead of our more general construction in the next section).
3.1. Constructions via generator matrix
In this subsection, we present constructions of MR LRCs using Definition 1, i.e., via generator matrices of MR LRCs.
Let denote the number of monic irreducible polynomials of degree over . Then one has for any (see [17, Corollary 3.21 of Chapter 3]). This gives . For each monic irreducible polynomial of degree with , we get a polynomial of degree . Thus, for any , there are polynomials of degree such that for all
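The number of monic irreducible polynomials of a given degree can be computed exactly with the classical Möbius-inversion formula N_q(d) = (1/d) Σ_{e | d} μ(e) q^{d/e}. The sketch below is our own illustration (all function names are hypothetical): it implements this formula and cross-checks it against brute-force trial division for q = 2.

```python
def mobius(n):
    """Moebius function mu(n), by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0          # n had a squared prime factor
            result = -result
        p += 1
    if n > 1:
        result = -result          # one leftover prime factor
    return result

def num_irreducible(q, d):
    """Number of monic irreducible polynomials of degree d over F_q."""
    total = sum(mobius(e) * q ** (d // e) for e in range(1, d + 1) if d % e == 0)
    return total // d

def gf2_mod(a, b):
    """Remainder of binary polynomial a modulo b (both encoded as int bitmasks)."""
    db = b.bit_length() - 1
    while a and a.bit_length() - 1 >= db:
        a ^= b << (a.bit_length() - 1 - db)
    return a

def brute_irreducible_f2(d):
    """Count degree-d binary polynomials with no factor of degree 1..d//2."""
    divisors = range(2, 1 << (d // 2 + 1))  # all polynomials of degree 1..d//2
    return sum(all(gf2_mod(f, g) for g in divisors)
               for f in range(1 << d, 1 << (d + 1)))

print([num_irreducible(2, d) for d in range(1, 9)])  # [2, 1, 2, 3, 6, 9, 18, 30]
```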
Assume that (i) ; or (ii) and there is a -ary -linear code, i.e., there exists a subset of of size such that any elements in this subset are -linearly independent.
Choose polynomials of degree such that for all . Then for each , we can form an -vector of dimension . Under our condition on , one can find functions such that any polynomials out of are -linearly independent. Choose an irreducible polynomial such that is coprime with every for . For a function , we use to denote the residue class of in the residue class field .
Let be a subset with . If for some functions , then for all .
Write for some polynomials with . The equality implies that is divisible by . As , we must have that is the zero polynomial. Suppose that for some , then we have
The left hand side of the above equality is divisible by , while the right hand side of the above equality is not divisible by . This contradiction completes the proof. ∎
Let be an irreducible polynomial in of degree
Define the matrix as follows.
Assume that or there is a -ary -linear code. Let be the matrix given in (8). Put and . Then the -ary code with the generator matrix is an MR -LRC.
Let be a submatrix of with at most columns per block . By Lemma 2.1, it is sufficient to show that is invertible. By Subsection 2.2, this is equivalent to showing that the entries of the first row of are -linearly independent.
Let be a subset of for such that the first row of is . Then and . Let be a subset of such that if and only if . Then and hence . Let such that
Since , it follows from Lemma 3.1 that the function for each . As are -linearly independent, we get for all . This completes the proof. ∎
By taking , we obtain the following result.
If , then there exists an MR -LRC of dimension over a field of size
If , put . If , put . Consider the rational function field . To have pairwise coprime polynomials of degree , it is sufficient to satisfy the inequality , i.e., which is the given condition. Now the desired result follows from Lemma 3.2. ∎
By considering binary BCH codes, we obtain the following binary codes.
There exists a binary -linear code with .
Put . Then we have a binary -extended BCH code for any .
Puncturing positions, one gets a binary -linear code. ∎
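To see the kind of vector set this lemma supplies, the sketch below builds one illustrative instance with parameters of our own choosing, not the general construction: the parity-check columns of the binary extended BCH code of length 16 and designed distance 5 over GF(2^4) = GF(2)[x]/(x^4 + x + 1). The columns are (1, α^i, α^{3i}) for i = 0, ..., 14, together with (1, 0, 0) for the extension coordinate, viewed as vectors in GF(2)^9. Because the extended code has minimum distance 6, any 5 of these 16 columns are linearly independent over GF(2); the code verifies this exhaustively.

```python
from itertools import combinations
from functools import reduce
from operator import xor

M, MOD = 4, 0b10011  # GF(2^4) via the primitive polynomial x^4 + x + 1

def gf_mul(a, b):
    """Multiply in GF(2^M); elements are bit vectors of polynomial coefficients."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> M:
            a ^= MOD
    return r

def extended_bch_columns():
    """Columns (1, alpha^i, alpha^(3i)) packed as 9-bit ints, plus (1, 0, 0)."""
    cols, p = [], 1                      # p runs through alpha^i, with alpha = x
    for _ in range((1 << M) - 1):
        cube = gf_mul(gf_mul(p, p), p)   # alpha^(3i)
        cols.append((1 << 2 * M) | (cube << M) | p)
        p = gf_mul(p, 0b10)
    cols.append(1 << 2 * M)              # extension (overall parity) coordinate
    return cols

def any_k_independent(vectors, k):
    """True iff every set of at most k of the vectors is GF(2)-linearly
    independent, i.e. no nonempty subset of size <= k XORs to zero."""
    return all(reduce(xor, S) != 0
               for s in range(1, k + 1)
               for S in combinations(vectors, s))

cols = extended_bch_columns()
print(len(cols), any_k_independent(cols, 5))  # 16 True
```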
If , then there exists an MR -LRC of dimension over a field of size
Consider the rational function field and a binary -linear code with . To have pairwise coprime polynomials of degree , it is sufficient to satisfy the inequality . Under the condition that , this inequality is satisfied. Now the desired result follows from Lemma 3.2. ∎
3.2. Constructions via parity-check matrix
To construct parity-check matrices of MR LRCs, we only need to construct matrices given in (7). As we will see, the idea of constructing matrices is quite similar to that of constructing matrices in the previous subsection. Our goal is to prove the following theorem.
Let be positive integers with . Suppose that is a prime power satisfying and there is a -ary -linear code. If (i) ; or (ii) and there exists a -ary -linear code, then there exists an MR -LRC with over a field of size .
We can choose polynomials of degree such that for all . Then for each , we can form an -vector space
of dimension . Under our assumption about , one can find functions such that any polynomials out of are -linearly independent.
Choose an irreducible polynomial of degree and define the matrix
which will complete the proof of Theorem 3.6.
To this end, it is sufficient to prove that condition (iii) in Definition 2 is satisfied. Let be a subset of with for . Let be a subset of for such that . Put and let be the th column of the block in , i.e., . Proving condition (iii) in Definition 2 is equivalent to proving that the determinant is nonzero for all possible and given above.
Put and . Denote by and the submatrices and of consisting of the columns indexed by and , respectively. Then we have
As is invertible, the product
is equal to
This implies that is nonzero if and only if the matrix
is invertible. Note that the matrix in (10) is a Moore matrix with the first row:
Let be a subset of such that if and only if . Then and hence .
Let such that
By Lemma 3.1, for each . As are -linearly independent, we get for all . This completes the proof. ∎
3.2.1. The case where
Let be integers. Then there is a -ary -MDS code for any prime power . Rewriting Theorem 3.6 for gives the following lemma.
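One standard source of such MDS codes is the Reed-Solomon family. As a hedged illustration with parameters of our own choosing, the doubly-extended Reed-Solomon code over a prime field GF(p) has length p + 1, and its a × (p + 1) parity-check matrix, with columns (1, x, ..., x^{a-1}) for x in GF(p) plus the column (0, ..., 0, 1), has every a columns linearly independent, which is exactly the MDS property. The brute-force check below confirms this for a few small p and a.

```python
from itertools import combinations

def rank_mod_p(rows, p):
    """Rank of an integer matrix over GF(p), p prime (Gaussian elimination)."""
    m = [[x % p for x in row] for row in rows]
    rank = 0
    for c in range(len(m[0])):
        piv = next((i for i in range(rank, len(m)) if m[i][c]), None)
        if piv is None:
            continue
        m[rank], m[piv] = m[piv], m[rank]
        inv = pow(m[rank][c], p - 2, p)  # inverse via Fermat's little theorem
        m[rank] = [x * inv % p for x in m[rank]]
        for i in range(len(m)):
            if i != rank and m[i][c]:
                f = m[i][c]
                m[i] = [(x - f * y) % p for x, y in zip(m[i], m[rank])]
        rank += 1
    return rank

def doubly_extended_rs_parity_check(p, a):
    """a x (p+1) matrix: columns (1, x, ..., x^(a-1)) for x in GF(p), then (0,...,0,1)."""
    cols = [[pow(x, i, p) for i in range(a)] for x in range(p)]
    cols.append([0] * (a - 1) + [1])
    return [list(row) for row in zip(*cols)]  # transpose the columns into rows

def every_a_columns_independent(H, p):
    """MDS test for the code with parity-check matrix H: every a columns have rank a."""
    a, n = len(H), len(H[0])
    return all(rank_mod_p([[row[j] for j in S] for row in H], p) == a
               for S in combinations(range(n), a))

for p, a in [(5, 2), (5, 3), (7, 3), (7, 4)]:
    H = doubly_extended_rs_parity_check(p, a)
    print(p, a, len(H[0]), every_a_columns_independent(H, p))
```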
Suppose that . If (i) ; or (ii) and there exists a -ary -linear code, then there exists an MR -LRC over a field of size .
If , then there exists an MR -LRC over a field of size
There exists an MR -LRC over a field of size
Consider the rational function field . Put . Then