1. Introduction
Interest in erasure codes has surged in recent years, with the demands of massive cloud storage systems raising hitherto unexplored, yet very natural and mathematically deep, questions concerning the parameters, robustness, and efficiency of the code. Distributed storage systems need to build in redundancy in the data stored in order to cope with the loss or inaccessibility of the data on one or more storage nodes. Traditional erasure codes offer a natural strategy for such robust data storage, with each storage node storing a small part of the codeword, so that the data is protected against multiple node failures. In particular, MDS codes such as Reed-Solomon codes can operate at the optimal storage vs. reliability tradeoff: for a given amount of information to be stored and available storage space, these codes can tolerate the maximum number of erasures without losing the stored information.
Individual storage nodes in a large-scale system often fail or become unresponsive. Reconstruction (repair) of the content stored on a failed node with the help of the remaining active nodes is important to reinstate the system in the event of a permanent node failure, and to allow access to the data stored on a temporarily unavailable node. The use of erasure codes in large storage systems, therefore, brings to the fore a new requirement: the ability to very efficiently reconstruct parts of a codeword from the rest of the codeword.
Local Reconstruction Codes (LRCs), introduced in [9], offer an attractive way to meet this requirement. An LRC imposes local redundancies in the codewords, so that a single (or a small number of) erased symbol can be recovered locally from fewer than other codeword symbols. (LRCs are also expanded as Locally Repairable Codes or Locally Recoverable Codes, e.g., [19, 20, 13].) Here is the locality parameter that is typically much smaller than the code length . In the distributed storage context, an LRC allows for the low-latency repair of any failed node as one only needs to wait for the response from nodes. LRCs have found spectacular practical applications with their use in the Windows Azure storage system [14].
The challenge in an LRC design is to balance the locality requirement, which allows fast recovery from a single or few erasures, with good global erasure resilience (via traditional slower methods) for more worst-case scenarios. One simple metric for global fault tolerance is the minimum distance of the code, which means that any pattern of fewer than erasures can be corrected. The optimal tradeoff between the distance, redundancy, and locality of an LRC was established in [11], and an elegant subcode of Reed-Solomon codes meeting this bound was constructed in [20].
This work concerns a much stronger requirement on global fault tolerance, called Maximal Recoverability. This requires that the code should simultaneously correct every erasure pattern that is information-theoretically possible to correct, given the locality conditions imposed on the codeword symbols. Let us describe it more formally in the setting of interest in this paper. Define an LRC to be a linear code over of length whose codeword symbols are partitioned into disjoint groups each of which includes local parity checks capable of locally correcting erasures. The codeword symbols further obey heavy (global) parity checks. With this structure of parity checks, it is not hard to see that the erasure patterns one can hope to correct are precisely those which consist of up to erasures per local group plus up to additional erasures anywhere in the codeword. A maximally recoverable (MR) LRC is a single code that is capable of simultaneously correcting all such patterns. Thus, an MR code gives the most bang for the buck for the price one pays for locality.
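The family of erasure patterns described above admits a simple membership test. The following is a minimal sketch for illustration only, not part of the construction in this paper; the names `a` (number of erasures each local group can correct), `h` (number of heavy parity checks), and `is_mr_correctable` are labels assumed here.

```python
from typing import Sequence


def is_mr_correctable(erasures_per_group: Sequence[int], a: int, h: int) -> bool:
    """Return True if the pattern has the form 'up to a erasures per group plus up to h more'.

    erasures_per_group[i] is the number of erased symbols in local group i.
    The names a and h are illustrative labels (assumptions of this sketch).
    """
    # Erasures beyond the local budget a in a group must be charged
    # to the global budget h shared by the whole codeword.
    excess = sum(max(0, e - a) for e in erasures_per_group)
    return excess <= h
```

For instance, with `a=1` and `h=2`, the per-group counts `[1, 3, 0]` pass the test (the second group uses both global erasures), while `[2, 3, 0]` do not.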
This notion was introduced in [2] motivated by applications to storage on solid-state devices, where it was called partial MDS codes. The terminology maximally recoverable codes was coined in [9], and the concept was more systematically studied in [9, 10]. By picking the coefficients of the heavy parity checks randomly, it is not hard to show the existence of MR LRCs over very large fields, of size exponential in . An explicit construction over such large fields was also given in [9], which also proved that random codes need such large field sizes with high probability. (This is akin to what happens for random codes to have the MDS property. However, for MDS codes, the Vandermonde construction achieves a linear field size explicitly.)
Since encoding a linear code and decoding it from erasures involve performing numerous finite field arithmetic operations, it is highly desirable to have codes over small fields (preferably of characteristic 2). Obtaining MR LRCs over finite fields of minimal size has therefore emerged as a central problem in the area of codes for distributed storage. So far, no construction of MR LRCs that avoids the exponential dependence on has been found. A recent lower bound shows that, unlike MDS codes, for certain parameter settings one cannot have MR LRCs over fields of linear size. This shows that the notion of maximal recoverability is quite subtle, and pinning down the optimal field size is likely a deep question. There remains a large gap between the upper and lower bounds on field size of MR LRCs, closing which is a challenge of theoretical and practical importance.
In this work, we develop a novel approach to construct MR LRCs based on function fields. Our framework recovers and in fact slightly improves most of the previous bounds in the literature in a unified way. We note that since there are at least three quantities of significance, namely the locality , the local (intra-group) erasure tolerance , and the number of global parity checks , the landscape of parameters and different constructions in this area is quite complex. Also, depending on the motivation, the range of values of interest of these parameters might be different. For example, if extreme efficiency of local repair is important, should be small. But on the other hand this increases the redundancy and thus the storage requirement of the code, so from this perspective a modest (say ) might be relevant. If good global fault tolerance is required, we want larger , but then the constructions have large field size. It is therefore of interest to study the problem treating these as independent parameters, without assumptions on their relative size. We next review the field size of previous constructions, and then turn to the parameters we achieve in different regimes.
1.1. Known field size bounds
For , optimal maximally recoverable local reconstruction codes (MR LRCs, for short) can be constructed by using either Reed-Solomon codes or their repetition. For , constructions of maximally recoverable LRCs over fields of size were given in [2]. For the remaining case: and , there are quite a number of constructions in the literature [2, 1, 21, 9, 15, 10, 4, 3, 6, 12].
For the cases of and , the best known constructions of MR LRCs were given in [12] with field sizes of and respectively, uniformly for all . (Their field sizes were worse by factors compared to these bounds when the field is required to be of characteristic .) For most other parameter settings, the best constructions by [6] provide a family of MR LRCs over fields of sizes
(1) 
as well as
(2) 
The bound (1) outperforms the bound (2) when , while the bound (2) is better when . In both the bounds, the field size grows exponentially with and .
Recently, by using maximum rank distance (MRD) codes, the paper [18] (specifically Corollary 14) gives a family of MR LRCs over fields of sizes
(3) 
When , and is close to or is large, (3) is better than bounds (1) or (2). By using probabilistic arguments, the paper [18] shows the existence of a family of MR LRCs over fields of sizes
(4) 
where is the dimension of the code.
On the other hand, a lower bound on the field size was presented in [12]. Stating the bound when for simplicity, they show that the field size of an MR LRC must obey
(5) 
The lower bound (5) is still quite far from the upper bounds (1) and (2). In particular, the exponent of or is to the base growing with in the known constructions, but only to the base in the above lower bound. Thus, one can conjecture that there is still room to improve both the constructions and the lower bounds. We note that under more complex structural requirements on the local groups, notably grid-like topologies and product codes, the optimal field size has been pinned down to [16].
Several techniques have been employed in the literature for constructions of MR LRCs. One prevalent idea is to use a “linearized” version of the Vandermonde matrix, where the heavy parity check part of the matrix consists of columns where for a sufficiently high degree extension field of . This construction is combined with wise independent spaces to get an field size in [9], and is also employed in [6]. Another approach is based on rank-metric codes (see, for instance, [4, 18]). Various ad hoc methods have been employed for good constructions of MR LRCs for small , for example for in [12].
1.2. Our results
In this work, we develop a new approach to construct MR LRCs based on algebraic function fields. We discuss the key elements underlying our strategy in Section 1.4, but for now state the field sizes of the MR LRCs we can construct for various regimes of parameters. Most of the existing results in the literature can be recovered through our methods in a unified way. In most regimes, the parameters of our codes beat the known ones. For easy reference, we summarize the different possible tradeoffs we can achieve in one giant theorem statement below. Since this comprehensive statement may be overwhelming to parse, let us highlight just two of our significant improvements: item (i) for , where we improve the term in (2) quadratically to , and item (vi) for sufficiently large , where the exponent in bounds (1) and (2) is improved to . Also the exponent is replaced by in the bounds (i)-(iv) that improve (2). In the bounds (vii) and (viii) the factor in the exponent is improved to ; this improvement is less significant as it only applies to the low-rate setting, but it is included for completeness and also to reflect a construction approach based on generator matrices (as opposed to parity check matrices, which are a more potent way to reason about MR LRCs and underlie the other parts of the theorem).
Theorem 1.1.
One has a maximally recoverable local reconstruction code over a field of size with parameters satisfying any of the following conditions. (Below denotes .)
The first two bounds, and the bounds in (vii) and (viii) of Theorem 1.1 are derived from the rational function fields . In addition, the bounds in (i) and (viii) of Theorem 1.1 are obtained via a combination with binary BCH codes. The bounds in (iii) and (iv) of Theorem 1.1 are derived from the rational function field , where is a power of . The fifth bound is obtained via Hermitian function fields, while the sixth bound is derived from the Garcia-Stichtenoth function field tower. Our codes achieving the tradeoffs stated in the above theorem can in fact be explicitly specified. But we note that for MR codes even existence questions over small fields are interesting and nontrivial.
1.3. Comparison
Each of our bounds in Theorem 1.1 beats the known results in some parameter regimes. Let us compare them one by one.
1.4. Our techniques
Note that the construction of MR LRCs is equivalent to the construction of certain generator or parity-check matrices with a requirement of column linear independence (see Section 2.1).
Our construction idea departs from previous approaches and is based on function fields over a finite field . The key in constructing an MR LRC is the choice of the heavy parity checks. We now briefly describe our idea to pick these. We associate with each of the local groups a distinguishing (high degree) place , . The degree of the place is chosen large enough to guarantee the existence of at least such places. For each local group, we pick functions , , that have exactly one pole at . The coefficients of the heavy parity checks corresponding to the ’th symbol of ’th local group are chosen to be
(6) 
where is a place of sufficiently high degree, so that the evaluations belong to an extension field which will be the final alphabet size of the MR LRC. By properties of the Moore determinant (Section 2.2) and the large degree of , the required linear independence of columns such as (6) over reduces to a certain linear independence requirement for the ’s over . Across different local groups such linear independence follows because a function with one pole at cannot cancel a function with one pole at a different place . Within a local group, the required linear independence is ensured by choosing the ’s within a group so that any of them (which is the maximum number of erasures we can have within a group) are linearly independent over .
1.5. Organization
The paper is organized as follows. In Section 2, we introduce some preliminaries such as MR LRCs (both the generator and parity-check matrix viewpoints) and Moore determinants. In Section 3, we present our constructions of MR LRCs using the rational function field together with a concatenation with classical codes of good rate vs. distance tradeoff. We give two constructions, using the generator matrix viewpoint in the first part (yielding Parts (vii) and (viii) of Theorem 1.1), and then a parity-check based construction in the second part, which yields Parts (i)-(iv) of Theorem 1.1. This section is elementary and only uses properties of polynomials. In Section 4, we generalize the construction of MR LRCs via parity-check matrices given in Section 3 by making use of arbitrary algebraic function fields. The necessary preliminaries on function fields are deferred to this section as we do not need them in Section 3. We then apply this construction to Hermitian function fields and the Garcia-Stichtenoth tower to obtain the MR LRCs promised in Parts (v) and (vi) of Theorem 1.1, respectively.
2. Preliminaries
2.1. Maximally recoverable local reconstruction codes
Throughout this paper, denotes the finite field of elements for a prime power . We use to denote the set of all matrices over .
Consider a distributed storage system where there are disjoint locality groups and each group has size and can locally correct any erasure errors. In addition, the system can correct any erasure errors together with any erasure errors in each group. This requires a class of codes called maximally recoverable local reconstruction codes or partial MDS codes for error correction of such a system. The precise definition of MR LRCs is given below.
Definition 1.
Let be a prime power and let be positive integers satisfying . Put and . An ary linear code with a generator matrix of the form
is called a maximally recoverable local reconstruction code (or an MR LRC, for short) if

(i) each has size ;
(ii) the row span of each is an MDS code for (note that is not a generator matrix of this MDS code in general);
(iii) after puncturing columns from each , the remaining matrix of generates an MDS code.
From the definition, an MR LRC can correct erasure errors at arbitrary positions together with any erasure errors in each of groups. The following lemma directly follows from Definition 1.
Lemma 2.1.
A matrix is a generator matrix of an MR LRC if and only if every submatrix of with at most columns per block is invertible.
One can have an equivalent definition via a parity-check matrix.
Definition 2.
Let be a prime power and let be positive integers satisfying . Put and . An ary linear code with a parity-check matrix of the form
(7) 
is called an MR LRC if

(i) each has size and each has size ;
(ii) each generates an MDS code for (note that the nullspace of is code);
(iii) every columns consisting of any columns in each group and other arbitrary columns are linearly independent.
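The column-independence condition that closes Definition 2 can be verified by brute force on toy examples. The sketch below is an illustration under stated assumptions, not the construction of this paper: it works over a prime field F_p only, assumes group i owns columns i*r through i*r+r-1, and the names `g`, `r`, `a`, `h`, `p`, `rank_mod_p`, and `satisfies_column_condition` are all labels chosen here. It does not check the MDS condition on the local blocks.

```python
from itertools import combinations, product


def rank_mod_p(rows, p):
    """Rank of a matrix (given as a list of rows) over the prime field F_p."""
    m = [[x % p for x in row] for row in rows]
    rank = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        # Find a row at or below `rank` with a nonzero entry in this column.
        pivot = next((i for i in range(rank, len(m)) if m[i][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, p)  # modular inverse (Python 3.8+)
        m[rank] = [(x * inv) % p for x in m[rank]]
        for i in range(len(m)):
            if i != rank and m[i][col]:
                f = m[i][col]
                m[i] = [(x - f * y) % p for x, y in zip(m[i], m[rank])]
        rank += 1
    return rank


def satisfies_column_condition(H, g, r, a, h, p):
    """Check that any a columns from each of the g groups, together with
    any h further columns of H, are linearly independent over F_p."""
    n = g * r
    groups = [range(i * r, (i + 1) * r) for i in range(g)]
    for local in product(*(combinations(grp, a) for grp in groups)):
        chosen = {c for part in local for c in part}
        rest = [c for c in range(n) if c not in chosen]
        for extra in combinations(rest, h):
            cols = sorted(chosen) + list(extra)
            # Rows of `sub` are the selected columns of H (rank is transpose-invariant).
            sub = [[H[i][c] for i in range(len(H))] for c in cols]
            if rank_mod_p(sub, p) != len(cols):
                return False
    return True


# Toy parity-check matrix over F_3: two groups of size 3, one local check
# per group (all-ones), and one heavy check in the last row.
H_good = [[1, 1, 1, 0, 0, 0],
          [0, 0, 0, 1, 1, 1],
          [0, 1, 2, 0, 1, 2]]
H_bad = [[1, 1, 1, 0, 0, 0],
         [0, 0, 0, 1, 1, 1],
         [0, 0, 1, 0, 1, 2]]  # the first two columns coincide
```

The exhaustive search is exponential in the number of columns, so this is only meant for sanity checks on very small parameters.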
Remark 1.
2.2. Moore determinant
Let be a power of . For elements , the Moore matrix is defined by
The determinant is given by the following formula
where runs through all nonzero direction vectors in . Thus, if and only if are linearly independent.
3. Explicit constructions via rational function fields
In this section, we only introduce constructions of MR LRCs from rational function fields. Our description will be self-contained and elementary in terms of polynomials, and we will not require any background on algebraic function fields (we have therefore deferred the background on function fields to Section 4, ahead of our more general construction in the next section).
3.1. Constructions via generator matrix
In this subsection, we present constructions of MR LRCs using Definition 1, i.e., via generator matrices of MR LRCs.
Let denote the number of monic irreducible polynomials of degree over . Then one has for any (see [17, Corollary 3.21 of Chapter 3]). This gives . For each monic irreducible polynomial of degree with , we get a polynomial of degree . Thus, for any , there are polynomials of degree such that for all
Assume that (i) ; or (ii) and there is a ary linear code, i.e. there exists a subset of of size such that any elements in this subset are linearly independent.
Choose polynomials of degree such that for all . Then for each , we can form an vector space of dimension . Under our condition on , one can find functions such that any polynomials out of are linearly independent. Choose an irreducible polynomial such that is coprime with every for . For a function , we use to denote the residue class of in the residue class field .
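The supply of pairwise coprime polynomials used above comes from counting monic irreducible polynomials of a given degree. As a hedged illustration, the sketch below evaluates the classical exact count (Gauss's formula via the Möbius function), which is a standard fact and not the specific bound quoted from [17]; the function names `mobius` and `count_monic_irreducibles` and the parameter names `q` and `d` are assumptions of this sketch.

```python
def mobius(n: int) -> int:
    """Moebius function mu(n), computed by trial division."""
    result = 1
    d = 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0  # n has a squared prime factor
            result = -result
        d += 1
    if n > 1:
        result = -result  # one remaining prime factor
    return result


def count_monic_irreducibles(q: int, d: int) -> int:
    """Number of monic irreducible polynomials of degree d over F_q.

    Uses the identity  d * N_q(d) = sum over k | d of mu(d / k) * q^k.
    """
    total = sum(mobius(d // k) * q**k for k in range(1, d + 1) if d % k == 0)
    return total // d
```

For example, over the binary field there are 2, 1, 2, and 3 monic irreducible polynomials of degrees 1 through 4, so the count grows roughly like q^d / d.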
Lemma 3.1.
Let be a subset with . If for some functions , then for all .
Proof.
Write for some polynomials with . The equality implies that is divisible by . As , we must have that is the zero polynomial. Suppose that for some , then we have
The left hand side of the above equality is divisible by , while the right hand side of the above equality is not divisible by . This contradiction completes the proof. ∎
Let be an irreducible polynomial in of degree
Define the matrix as follows.
(8) 
Lemma 3.2.
Assume that or there is a ary linear code. Let be the matrix given in (8). Put and . Then the ary code with the generator matrix is an MR LRC.
Proof.
Let be a submatrix of with at most columns per block . By Lemma 2.1, it is sufficient to show that is invertible. It follows from Subsection 2.2 that this is equivalent to showing that the first row of is linearly independent.
Let be a subset of for such that the first row of is . Then and . Let be a subset of such that if and only if . Then and hence . Let such that
Since , it follows from Lemma 3.1 that the function for each . As are linearly independent, we get for all . This completes the proof. ∎
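The proof above reduces invertibility to linear independence through the Moore determinant of Section 2.2. A minimal numerical sketch of that criterion follows, under assumptions chosen here for illustration: the base field is F_2, the elements live in F_8 represented as 3-bit integers modulo x^3 + x + 1, and the names `gf8_mul`, `moore_det`, and `f2_independent` are hypothetical helpers, not notation from the paper.

```python
from itertools import permutations

MODULUS = 0b1011  # x^3 + x + 1, irreducible over F_2 (assumed representation of F_8)


def gf8_mul(a: int, b: int) -> int:
    """Multiply two elements of F_8 given as 3-bit integers."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0b1000:  # reduce modulo x^3 + x + 1
            a ^= MODULUS
    return result


def moore_det(alphas):
    """Determinant of the Moore matrix whose i-th row is (alpha_j^(2^i))_j."""
    h = len(alphas)
    rows, row = [], list(alphas)
    for _ in range(h):
        rows.append(row)
        row = [gf8_mul(x, x) for x in row]  # apply the Frobenius map x -> x^2
    det = 0
    # Leibniz expansion; signs are irrelevant in characteristic 2.
    for perm in permutations(range(h)):
        term = 1
        for i, j in enumerate(perm):
            term = gf8_mul(term, rows[i][j])
        det ^= term
    return det


def f2_independent(alphas) -> bool:
    """Brute-force test of F_2-linear independence (addition in F_8 is XOR)."""
    for mask in range(1, 1 << len(alphas)):
        s = 0
        for j, x in enumerate(alphas):
            if (mask >> j) & 1:
                s ^= x
        if s == 0:
            return False
    return True
```

On this toy field, `moore_det([1, 2, 4])` is nonzero because 1, x, x^2 are independent over F_2, while `moore_det([1, 2, 3])` vanishes because 3 = 1 + 2.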
By taking , we obtain the following result.
Theorem 3.3.
If , then there exists an MR LRC of dimension over a field of size
Proof.
If , put . If , put . Consider the rational function field . To have pairwise coprime polynomials of degree , it is sufficient to satisfy the inequality , i.e., which is the given condition. Now the desired result follows from Lemma 3.2. ∎
By considering binary BCH codes, we obtain the following binary codes.
Lemma 3.4.
There exists a binary linear code with .
Proof.
Put . Then we have a binary extended BCH code for any .
Puncturing positions, one gets a binary linear code. ∎
Combining the binary BCH codes of Lemma 3.4 with Lemma 3.2 applied with the rational function field yields the following theorem.
Theorem 3.5.
If , then there exists an MR LRC of dimension over a field of size
Proof.
Consider the rational function field and a binary linear code with . To have pairwise coprime polynomials of degree , it is sufficient to satisfy the inequality . Under the condition that , this inequality is satisfied. Now the desired result follows from Lemma 3.2. ∎
3.2. Constructions via parity-check matrix
To construct parity-check matrices of MR LRCs, we only need to construct matrices given in (7). As we will see, the idea of constructing matrices is quite similar to that of constructing matrices in the previous subsection. Our goal is to prove the following theorem.
Theorem 3.6.
Let be positive integers with . Suppose that is a prime power satisfying and there is a ary linear code. If (i) ; or (ii) and there exists a ary linear code, then there exists an MR LRC with over a field of size .
Proof.
We can choose polynomials of degree such that for all . Then for each , we can form an vector space
of dimension . Under our assumption about , one can find functions such that any polynomials out of are linearly independent.
Choose an irreducible polynomial of degree and define the matrix
(9) 
Since , we can pick to be a generator matrix of an MDS code for . Let be the matrix given in (9). Then, we will prove that the code with the matrix defined in (7) is an MR LRC over a field of size
which will complete the proof of Theorem 3.6.
To this end, it is sufficient to prove that the condition (iii) in Definition 2 is satisfied. Let be a subset of with for . Let be a subset of for such that . Put and let be the th column of the block in , i.e., . To prove the condition (iii) in Definition 2, it is equivalent to proving that the determinant is nonzero for all possible and given above.
Put and . Denote by and the submatrices and of consisting of the columns indexed by and , respectively. Then we have
As is invertible, the product
is equal to
This implies that is nonzero if and only if the matrix
(10) 
is invertible. Note that the matrix in (10) is a Moore matrix with the first row:
(11) 
for some . By the property of the Moore determinant, proving the condition (iii) in Definition 2 is equivalent to showing that the elements in (11) are linearly independent.
Let be a subset of such that if and only if . Then and hence .
Let such that
, i.e.,
By Lemma 3.1, for each . As are linearly independent, we get for all . This completes the proof. ∎
We now instantiate Theorem 3.6 with suitable choices of parameters to deduce the promised parts (i)-(iv) of Theorem 1.1.
3.2.1. The case where
Let be integers. Then there is a ary MDS code for any prime power . Rewriting Theorem 3.6 for gives the following lemma.
Lemma 3.7.
Suppose that . If (i) ; or (ii) and there exists a ary linear code, then there exists an MR LRC over a field of size .
To apply Lemma 3.7, we need to find suitable codes and function fields as well. By taking the rational function field and applying the BCH codes given in Lemma 3.4, we obtain the following result.
Theorem 3.8.
If , then there exists an MR LRC over a field of size
Proof.
Theorem 3.9.
There exists an MR LRC over a field of size
Proof.
Consider the rational function field . Put . Then