Large-scale cloud storage and distributed file systems, such as Amazon Elastic Block Store (EBS) and Google File System (GoogleFS), have reached such a massive scale that disk failures are the norm and not the exception. In those systems, the simplest way to protect data from disk failures is straightforward replication of data packets across different disks. However, this solution suffers from a large storage overhead. As an alternative, MDS codes are used as storage codes, which encode $k$ information symbols into $n$ symbols and store them across $n$ disks. Using MDS codes leads to a dramatic improvement in redundancy compared with replication. However, for MDS codes, when one node fails, the system recovers it at the cost of contacting $k$ surviving symbols, thus complicating the repair process.
To improve repair efficiency, locally repairable codes were introduced to reduce the number of symbols contacted during the repair of a failed node. More precisely, locally repairable codes ensure that a failed symbol can be recovered by accessing only a small number $r$ of other symbols.
The original concept of locality only works when exactly one erasure occurs (that is, one node fails). Over the past few years, several generalizations have been suggested for the definition of locality. As examples we mention locality with a single repair set tolerating multiple erasures, locality with disjoint multiple repairable sets [31, 25, 27, 5], and hierarchical locality.
In this paper, we focus on locally repairable codes with a single repair set that can repair multiple erasures locally. By ensuring $\delta-1$ redundancies in each repair set, this kind of locally repairable code guarantees that the system can recover from $\delta-1$ erasures by accessing $r$ surviving code symbols for each erasure. This is denoted as $(r,\delta)$-locality.
Research on codes with $(r,\delta)$-locality has proceeded along two main tracks. In the first track, upper bounds on the minimum Hamming distance and the code length have been studied. Singleton-type bounds were introduced for codes with $(r,\delta)$-locality in [23, 28, 32]. A bound depending on the size of the alphabet was also derived for the Hamming distance of locally repairable codes. Via linear programming, another bound related to the size of the alphabet was introduced. Very recently, an interesting connection between the length of optimal linear locally repairable codes and the size of the alphabet was derived.
In the second research track, constructions for optimal locally repairable codes have been studied. In , a construction of optimal locally repairable codes was introduced based on Gabidulin codes. In , a construction of optimal locally repairable codes with was proposed. By analyzing their structure, optimal locally repairable codes were also constructed in . In  and , optimal locally repairable codes were constructed using matroid theory. The construction of  was generalized in  to include more flexible parameters when . Very recently, in , cyclic optimal locally repairable codes with unbounded length were constructed for Hamming distance . Finally, for the case of Hamming distance , [10, 13, 3] presented constructions of locally repairable codes that have optimal distance as well as order-optimal length .
The main contribution of this paper is the study of optimal linear codes with -locality and length that is super-linear in the field size. We analyze the structure of optimal locally repairable codes and as a result, we prove that the bound in  holds for some other cases besides the one mentioned in . We then derive a new upper bound on the length of optimal locally repairable codes for the case of . Finally, we give a general construction of locally repairable codes with length that is super-linear in the field size. Based on some special structures such as packings and Steiner systems, locally repairable codes with optimal Hamming distances and order-optimal length with respect to the new bound are obtained. This is to say, the bound for is also asymptotically tight for some special cases.
The remainder of this paper is organized as follows. Section II introduces some preliminaries about locally repairable codes. Section III establishes an upper bound for the length of optimal locally repairable codes for the case . Section IV presents a construction of optimal locally repairable codes with length . Section V concludes this paper with some remarks.
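We present the notation and basic definitions used throughout the paper. For a positive integer $n$, we define $[n]=\{1,2,\dots,n\}$. For any prime power $q$, let $\mathbb{F}_q$ denote the finite field with $q$ elements. An $[n,k]_q$ linear code $\mathcal{C}$ over $\mathbb{F}_q$ is a $k$-dimensional subspace of $\mathbb{F}_q^n$ with a $k\times n$ generator matrix $G=(\mathbf{g}_1,\mathbf{g}_2,\dots,\mathbf{g}_n)$, where $\mathbf{g}_i$ is a column vector of dimension $k$ for all $i\in[n]$. Specifically, it is called an $[n,k,d]_q$ linear code if the minimum Hamming distance is $d$. For a subset $S\subseteq[n]$, let $|S|$ denote the cardinality of $S$, let $2^S$ denote the set of all subsets of $S$, and define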
Huang et al. introduced the following definition for the locality of code symbols. The $i$th code symbol $c_i$ of an $[n,k,d]_q$ linear code $\mathcal{C}$ is said to have locality $r$ if it can be recovered by accessing at most $r$ other symbols in $\mathcal{C}$. More precisely, symbol locality can be defined formally as follows.
Definition 1 ():
For any column of with , define as the smallest integer such that there exists a -subset satisfying
Equivalently, for any codeword , the th component
Define for any set . Then, an linear code is said to have information locality if there exists with satisfying Furthermore, an linear code is said to have all symbol locality r if .
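As an illustration of Definition 1, the following minimal Python sketch computes the locality of each code symbol of a small binary code by brute force. It assumes that the condition in Definition 1 is that the $i$th column lies in the span of the columns indexed by the chosen subset. The function names, the encoding of columns as integers, and the example $[6,3]$ code are hypothetical choices made only for this sketch, and the search is exponential in the code length.

```python
from itertools import combinations

def in_span_gf2(target, vectors):
    """Return True if `target` is a GF(2)-linear combination of `vectors`.

    Vectors are encoded as Python ints (bit j = coordinate j).
    """
    basis = []  # reduced vectors with pairwise distinct leading bits
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)  # clears the leading bit of b in v if it is set
        if v:
            basis.append(v)
    for b in basis:
        target = min(target, target ^ b)
    return target == 0

def symbol_locality(columns, i):
    """Smallest r such that column i lies in the span of r other columns.

    Returns None if column i is not spanned by the remaining columns.
    """
    others = [j for j in range(len(columns)) if j != i]
    for r in range(1, len(others) + 1):
        for subset in combinations(others, r):
            if in_span_gf2(columns[i], [columns[j] for j in subset]):
                return r
    return None

# Columns of a hypothetical [6, 3] binary code: e1, e2, e3, e1+e2, e2+e3, e1+e3.
COLUMNS = [0b001, 0b010, 0b100, 0b011, 0b110, 0b101]
LOCALITIES = [symbol_locality(COLUMNS, i) for i in range(len(COLUMNS))]
```

In this toy example every column is the sum of two other columns and no two columns coincide, so every symbol has locality 2 and the code has all symbol locality 2.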
To guarantee that the system can locally recover from multiple erasures, say, erasures, the definition of locality was generalized in  as follows.
Definition 2 ():
The th column , , of a generator matrix of an linear code is said to have -locality if there exists a subset such that:
and ; and
the minimum Hamming distance of the punctured code obtained by deleting the code symbols () is at least ,
where the set is also called a repair set of . The code is said to have information -locality if there exists with such that for each , has -locality. Furthermore, the code is said to have all symbol -locality if all the code symbols have -locality.
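The two conditions of Definition 2 can be checked mechanically for small binary codes. The sketch below assumes the usual form of the definition: the repair set contains the index $i$, has size at most $r+\delta-1$, and the code punctured to the repair set has minimum Hamming distance at least $\delta$. The helper names and the toy generator matrix are assumptions introduced only for illustration, and coordinates are 0-indexed.

```python
from itertools import product

def punctured_min_distance(gen_rows, support):
    """Minimum weight of a nonzero word of the binary code restricted to `support`.

    gen_rows: generator matrix as a list of 0/1 rows.
    support:  iterable of coordinates that are kept.
    Returns None if the restricted code contains only the zero word.
    """
    support = list(support)
    best = None
    for msg in product([0, 1], repeat=len(gen_rows)):
        word = [sum(m * row[j] for m, row in zip(msg, gen_rows)) % 2
                for j in support]
        weight = sum(word)
        if weight and (best is None or weight < best):
            best = weight
    return best

def is_repair_set(gen_rows, i, support, r, delta):
    """Check whether `support` is an (r, delta) repair set for symbol i."""
    support = set(support)
    if i not in support or len(support) > r + delta - 1:
        return False
    dist = punctured_min_distance(gen_rows, sorted(support))
    return dist is not None and dist >= delta

# Hypothetical [6, 2] binary code made of two length-3 repetition blocks.
G = [[1, 1, 1, 0, 0, 0],
     [0, 0, 0, 1, 1, 1]]
```

Here `is_repair_set(G, 0, {0, 1, 2}, r=1, delta=3)` holds, because the restriction to the first block is a repetition code of length 3 and distance 3, whereas the smaller set `{0, 1}` fails the distance condition.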
Lemma 1 ():
For an linear code with information -locality,
Additionally, a locally repairable code is said to be optimal if its minimum Hamming distance attains this bound with equality.
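For a quick numerical check of optimality, the bound can be evaluated directly. The sketch below assumes that Lemma 1 is the standard Singleton-type bound for codes with information $(r,\delta)$-locality, $d \le n-k+1-(\lceil k/r\rceil-1)(\delta-1)$; the function name is a hypothetical choice for this sketch.

```python
def singleton_type_bound(n, k, r, delta):
    """Upper bound on d, assuming d <= n - k + 1 - (ceil(k/r) - 1) * (delta - 1)."""
    ceil_k_over_r = -(-k // r)  # integer ceiling of k / r
    return n - k + 1 - (ceil_k_over_r - 1) * (delta - 1)

# Example parameter sets (n, k, r, delta) chosen only for illustration.
EXAMPLES = {
    (6, 2, 1, 3): singleton_type_bound(6, 2, 1, 3),
    (6, 3, 2, 2): singleton_type_bound(6, 3, 2, 2),
    (7, 4, 4, 2): singleton_type_bound(7, 4, 4, 2),
}
```

When $r \ge k$ and $\delta = 2$ the correction term vanishes and the expression reduces to the classical Singleton bound $n-k+1$, as the last example shows.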
III Bounds on the Length of Locally Repairable Codes
The goal of this section is to derive upper bounds on the length of optimal locally repairable codes. Throughout this section, let
where , , and are all integers.
For the bounds and the construction we shall require a simple combinatorial covering design which we now define.
Let . Also, let be a set of cardinality , whose elements are called points. Finally, let be a set of blocks such that , and for all , and . We then say is an -essential covering family (ECF). If all blocks are the same size we say is a uniform -ECF.
An important quantity associated with any family of subsets, , is its overlap, denoted , and defined as
Obviously . Additionally, if and only if its sets are pairwise disjoint.
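To make the notion of overlap concrete, the following sketch assumes that the overlap of a family is the sum of the block sizes minus the size of their union. This assumption is consistent with the two properties just stated (the quantity is nonnegative, and it vanishes exactly when the sets are pairwise disjoint), but the formula and the function name are supplied here for illustration only.

```python
def overlap(family):
    """Sum of block sizes minus the size of the union (assumed definition)."""
    blocks = [set(block) for block in family]
    union = set().union(*blocks)
    return sum(len(block) for block in blocks) - len(union)

DISJOINT = [{1, 2}, {3, 4}]            # pairwise disjoint blocks
CHAINED = [{1, 2, 3}, {3, 4}, {4, 5}]  # consecutive blocks share one point
```

Under this assumed definition, `overlap(DISJOINT)` is 0, while `overlap(CHAINED)` is 2 because the points 3 and 4 are each counted twice.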
Let be some finite set. Then for any , and any , there exists a subset , , such that
We partition into two subsets, and , where
For convenience, denote and where .
Obviously, if , then is the desired -subset satisfying
For the case , we claim that we can select a -subset containing different pairs of sets for with
Otherwise, there exists a subset with size at most such that for any ,
However, that means that every has an empty intersection with any other set in , which contradicts the definition of . Thus, there exists a , , and with
which completes the proof. ∎
Let be an -ECF, and assume it is non-uniform or that . Then for every , there exists a subset , , such that
We first construct a uniform from , by arbitrarily adding elements to sets in that contain less than elements. Note that is not necessarily an ECF. Obviously . We contend now that . If this is immediate, since we have . If the is not uniform, at least one set has , and adding elements to it in the process of creating necessarily increases the overlap, i.e., . We also observe that,
By Lemma 2, for any given , there exists a -subset, say , such that
where the last inequality holds since and .
If was created from , i.e., , then by (6) we have,
Now set to complete the proof. ∎
We can now start working on the bounds.
For any linear code with all symbol -locality, let be the set of all possible repair sets. Then we can find a subset such that is an -ECF.
By Definition, contains at least one repair set for each code symbol, hence
If for each , , then set and the lemma follows. Otherwise, let be such that . Thus, by (7), we conclude that
Set . Since , and also satisfies (8), we can repeat the elimination procedure to obtain the desired set . ∎
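The elimination procedure in this proof can be sketched as a simple loop: starting from a family of repair sets that covers all coordinates, repeatedly discard any set whose removal still leaves every coordinate covered. The function names and the example family below are hypothetical, coordinates are 0-indexed, and the sketch only enforces the covering and essentiality conditions, not the bound on block sizes.

```python
def covers(family, ground):
    """Return True if the union of the sets in `family` equals `ground`."""
    return set().union(*family) == ground

def essential_cover(repair_sets, n):
    """Greedily remove redundant sets until every remaining set is essential."""
    ground = set(range(n))
    family = [frozenset(s) for s in repair_sets]
    if not covers(family, ground):
        raise ValueError("the repair sets must cover all n coordinates")
    changed = True
    while changed:
        changed = False
        for idx in range(len(family)):
            rest = family[:idx] + family[idx + 1:]
            if rest and covers(rest, ground):
                family = rest  # the set at position idx was redundant
                changed = True
                break
    return family

# Hypothetical family of repair sets on 6 coordinates.
REPAIR_SETS = [{0, 1, 2}, {2, 3, 4}, {0, 1}, {3, 4, 5}, {5}]
ECF = essential_cover(REPAIR_SETS, 6)
```

The result depends on the order in which sets are examined, which matches the proof: it only claims that some essential covering subfamily exists, not that it is unique.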
Based on Lemma 4, in what follows, instead of considering the set of all possible repair sets, we shall use the ECF .
Let be an linear code with all symbol -locality. Let be the ECF given by Lemma 4. If for a subset , and for all ,
then we have
Denote and . For each , (9) means that there exists a -subset such that . Thus, we can get pairwise disjoint subsets .
By Definition 2, for . Therefore, we have
We note that when , (9) is always satisfied by the ECF . We now continue with our exploration of the properties of .
Let be an linear code with all symbol -locality. Let be the ECF given by Lemma 4. If there are subsets with , , and
then we can obtain a -subset such that
If , then the lemma follows by setting . Otherwise, we have . Since every is an repair set, . This means that . Note that by the lemma requirements, , which implies that there exists a such that . We recall, however, that since is an repair set, if , , then . It follows that cannot have a large intersection with , namely,
Hence, there exists a with . Again, using the fact that is an repair set and , we have , and therefore,
where the last inequality holds by the fact and (12). Therefore, repeating the above operations, we can extend to be a -subset such that
Let be an linear code with all symbol -locality. Let be the ECF given by Lemma 4. Assume such that . If there exists a such that
then there exists with and
Assume satisfies (13). Let be the smallest subset for which (13) holds, i.e., there exists a set with , which in turn implies that . By the minimality of , the set satisfies the requirements of Lemma 5, which implies
As noted before, , and since trivially , we also necessarily have . Therefore, by Lemma 6, we can extend to a -subset such that
Considering the set , we have
where the last inequality holds due to the fact that by the properties of the ECF .
we can find a set with by taking and adding arbitrary coordinates until reaching the desired rank. This set has size
which follows from (14). ∎
Let be an linear code with all symbol -locality. Let be the ECF given by Lemma 4. Assume such that . If is an integer such that
and , then there exists a set with and
If the requirements of Lemma 7 hold for , then the desired may be obtained by Lemma 7, and we are done. Otherwise, does not satisfy the requirements of Lemma 7, and then using Lemmas 5 and 6 (setting in the latter), may be extended to a set with elements satisfying
Recall that , with . It now follows that
where follows from the fact that for all , and follows from (15).
For the case , means that , i.e., . Thus, by (17) and ,
for both and .
Having collected sufficient insight into the structure of repair sets, we are now in a position to state and prove the first main tool in proving our bounds.
Let be an optimal linear code with all symbol -locality, where optimality is with respect to the bound in Lemma 1. Let be the set of all possible repair sets. Write , for integers and , and . If , , and additionally, if or , then there exists a set of repair sets , such that all are of cardinality , and is a partition of .
Let be the ECF obtained in Lemma 4. If and for all , then the theorem follows by taking the ECF itself as the desired set of repair sets.
Otherwise, we have or for some . We distinguish between two cases. First, assume . According to Lemma 3 we can find a -subset satisfying
Since or , we have . Therefore, by Lemma 8, there is a set with and
Thus, we can take a non-trivial linear combination of the rows of the generator matrix that results in a non-zero codeword which has zeroes in the positions of , hence,
This is a contradiction to the optimality of with respect to the bound in Lemma 1.
In the second case, . We note that we only need to consider the case , namely, , since if then the condition implies that . We therefore assume . If or for some then we can find two distinct repair sets such that or . In either case, we have .
We again distinguish between two cases depending on . For the first case, if then we have . In the second case, when , assume without loss of generality, that , then
We now construct a set by arbitrarily adding coordinates to such that . Therefore, , or equivalently, . Like before, we get
which is again a contradiction with the optimality of with respect to the bound in Lemma 1. ∎
We take a short break to consider the special case of . This special case was studied in  and an upper bound on the length of optimal codes was obtained. While we obtain the exact same bound as , our bound is an improvement since it has more relaxed conditions. In particular, the bound of  requires whereas we require . We now provide the exact claim:
Let be an optimal code with all symbol locality . If , , , and additionally, or , then
Let be an optimal linear code with all symbol -locality, where optimality is with respect to the bound in Lemma 1. If , , and additionally or , then there are pairwise-disjoint repair sets, , such that for all ,