Alphabet-Dependent Bounds for Linear Locally Repairable Codes Based on Residual Codes

10/19/2018 ∙ by Matthias Grezet, et al. ∙ Aalto

Locally repairable codes (LRCs) have gained significant interest for the design of large distributed storage systems, as they allow any small number of erased nodes to be recovered by accessing only a few others. Several works have thus aimed to understand the optimal rate-distance tradeoff, and only recently has the size of the alphabet been taken into account. In this paper, a novel definition of locality is proposed to address the imprecision regarding the number of nodes contacted during the repair process. Then, a new alphabet-dependent bound is derived that applies to both definitions and shows better results than the currently known bounds. The bound is based on consecutive residual codes and intrinsically uses the Griesmer bound. Achievability results are also provided by considering the family of Simplex codes together with sporadic examples of optimal codes.


I Introduction

In recent times, many service providers allow users to access and store data remotely to avoid overwhelming the limited storage capacity of single users. This leads naturally to the design of large distributed storage systems that reliably store data while minimizing the redundancy necessary to deal with server failures.

The use of erasure-correcting codes together with network coding techniques for distributed storage systems, initiated in [1], has become popular since these so-called regenerating codes achieve the optimal tradeoff between the required repair bandwidth and the storage overhead. For a standard erasure code of length $n$, dimension $k$ and minimum distance $d$, any $d-1$ failures can be repaired by contacting at most $k$ other nodes. In addition to this property, and at the cost of some failure tolerance, regenerating codes also enable efficient repair of failed nodes. This was long thought to contrast with traditional maximum distance separable (MDS) codes, which have to reconstruct the whole file in order to repair a single node. However, [2, 3] showed that this claim does not hold in general: some MDS codes can also be repaired efficiently. Nevertheless, the number of nodes contacted for repair remains a bottleneck for system efficiency. To reduce the repair network traffic, [4] and later [5] introduced the notion of locality, which allows a single failure to be repaired by contacting only $r$ nodes with $r < k$. Erasure codes satisfying this requirement are called locally repairable (or recoverable) codes.

A natural extension was presented in [6], [7], where the authors defined locality for the information symbols so that $\delta - 1$ failures can still be corrected locally. This requirement can be extended to all symbols without differentiating between information symbols and parity symbols. In this paper, we focus only on all-symbol locality and therefore drop the qualifier. Other extensions of the locality property include codes with availability [8], sequential repair of several erasures [9], cooperative repair [10], local repair on graphs [11], and many others.

A large body of literature has been devoted to understanding the best possible parameters of LRCs and to providing optimal constructions. The authors of [4] gave the first trade-off between the parameters $n$, $k$, $d$ and $r$ by showing that the minimum distance of an $(n,k,d)$-LRC code with locality $r$ is bounded as follows:

$$d \le n - k - \left\lceil \frac{k}{r} \right\rceil + 2. \tag{1}$$

This bound was extended in [6] to any $(n,k,d)$-LRC code with locality $(r,\delta)$:

$$d \le n - k + 1 - \left( \left\lceil \frac{k}{r} \right\rceil - 1 \right)(\delta - 1). \tag{2}$$
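To make bounds (1) and (2) concrete, the following Python sketch computes the largest dimension $k$ that bound (2) allows for given $n$, $d$, $r$ and $\delta$. With $\delta = 2$ it reduces to bound (1). The function name `max_k_singleton_type` and the example parameters are our own illustrative choices and are not taken from [4] or [6].

```python
from math import ceil


def max_k_singleton_type(n: int, d: int, r: int, delta: int = 2) -> int:
    """Largest k with d <= n - k + 1 - (ceil(k/r) - 1)(delta - 1), i.e. bound (2).

    With delta = 2 this is bound (1): d <= n - k - ceil(k/r) + 2.
    The right-hand side is non-increasing in k, so a linear scan suffices.
    """
    best = 0
    for k in range(1, n + 1):
        if d <= n - k + 1 - (ceil(k / r) - 1) * (delta - 1):
            best = k
    return best


# Hypothetical parameters: length 12, distance 5, locality r = 3.
print(max_k_singleton_type(12, 5, 3))           # bound (1)
print(max_k_singleton_type(12, 5, 3, delta=3))  # bound (2) with delta = 3
print(max_k_singleton_type(12, 5, 100))         # r >= k: plain Singleton bound
```

The last call shows a sanity check: when $r \ge k$, the locality term vanishes and the scan returns the Singleton limit $k \le n - d + 1$.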

Bounds for codes with availability were established in [8, 12, 13, 14].

Both bounds have been proven to be tight for large alphabet sizes, with constructions provided in [5, 6, 7, 12, 15, 16, 17, 18, 19, 20, 21, 22]. For a summary of various bounds for LRCs, see [23].

The pioneering work in [24] improves on bound (1) by including a dependence on the alphabet size: for any $(n,k,d)$-LRC code with locality $r$ over an alphabet $Q$ with $|Q| = q$, we have

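$$k \le \min_{t \in \mathbb{Z}_+} \left[ tr + k_{\mathrm{opt}}^{(q)}\big(n - t(r+1), d\big) \right], \tag{3}$$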

where $k_{\mathrm{opt}}^{(q)}(n,d)$ is the maximal dimension of a code over $Q$ of length $n$ and minimum distance $d$. This has led to further constructions of optimal LRCs over small alphabets, for example in [25, 26].
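The quantity $k_{\mathrm{opt}}^{(q)}$ in bound (3) is generally unknown. The sketch below makes an explicit assumption: for linear codes, it replaces $k_{\mathrm{opt}}^{(q)}$ by the largest dimension permitted by the Griesmer bound. Because this only enlarges the right-hand side, the result is still a valid, though possibly weaker, upper bound on $k$. The helper names `griesmer_length`, `griesmer_max_dim` and `cm_type_bound` are hypothetical.

```python
def griesmer_length(k: int, d: int, q: int) -> int:
    """Minimum length of a linear [n, k, d]_q code allowed by the Griesmer bound."""
    return sum(-(-d // q**i) for i in range(k))  # -(-a // b) == ceil(a / b)


def griesmer_max_dim(n: int, d: int, q: int) -> int:
    """Largest k with griesmer_length(k, d, q) <= n (0 if not even k = 1 fits)."""
    k = 0
    while griesmer_length(k + 1, d, q) <= n:
        k += 1
    return k


def cm_type_bound(n: int, d: int, r: int, q: int):
    """min over t >= 1 with t(r+1) <= n of t*r + k_G(n - t(r+1), d).

    This is bound (3) with the Griesmer dimension bound substituted for k_opt.
    """
    values = [t * r + griesmer_max_dim(n - t * (r + 1), d, q)
              for t in range(1, n // (r + 1) + 1)]
    return min(values) if values else None


# Hypothetical binary parameters: n = 12, d = 5, locality r = 3.
print(cm_type_bound(12, 5, 3, 2))
```

For these hypothetical parameters, the sketch gives $k \le 5$, whereas bound (1) alone allows $k = 6$. This illustrates why taking the alphabet size into account can help.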

Recently, the authors of [27] proposed the first alphabet-dependent bounds on $(n,k,d,r,\delta)$-LRC codes over a $q$-ary alphabet. Their approach uses an upper bound on the cardinality of the repair sets given their size and local minimum distance, with the extra requirement that this upper bound be a log-convex function of the size. The global bound is as follows:

(4)

They also derived a linear-programming bound for LRCs with locality $r$ under the extra assumption that the repair sets are disjoint.

Finally, in [28], the authors presented a Singleton-type bound for binary linear LRCs. This bound uses the local dimension of a repair set instead of the parameter $r$, together with a more precise understanding of the intersection between two repair sets. The work in this paper generalizes these two ideas.

While so far we have made no distinction between non-linear and linear codes, the following results are only valid for linear codes. In [29], Griesmer proved the existence of a residual code for any binary linear code, i.e., a code obtained by a restriction and having certain specific parameters. He then derived a bound on the length $n$ of the code given the dimension $k$ and the minimum distance $d$. Both results were later extended to an arbitrary finite field in [30], and we present this latter version here. For any linear $[n,k,d]$ code $C$ over $\mathbb{F}_q$, there exists a restriction of $C$, called the residual code of $C$, with parameters $[n-d, k-1, d']$ where $d' \ge \lceil d/q \rceil$. By recursively taking residual codes, the authors of [30] obtained the following bound on the length, known as the Griesmer bound:

$$n \ge \sum_{i=0}^{k-1} \left\lceil \frac{d}{q^i} \right\rceil. \tag{5}$$
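The right-hand side of (5) is easy to evaluate exactly with integer arithmetic. The following minimal Python sketch does so; the function name `griesmer_length` is our own. It checks two well-known codes that meet (5) with equality and confirms that (5) is never weaker than the Singleton bound $n \ge k + d - 1$: the first term of the sum is $d$ and each of the other $k-1$ terms is at least $1$.

```python
def griesmer_length(k: int, d: int, q: int) -> int:
    """Right-hand side of (5): sum_{i=0}^{k-1} ceil(d / q^i)."""
    return sum(-(-d // q**i) for i in range(k))  # integer ceiling, no floats


# The binary Hamming [7, 4, 3] and Simplex [7, 3, 4] codes meet (5) with equality.
print(griesmer_length(4, 3, 2))  # 3 + 2 + 1 + 1
print(griesmer_length(3, 4, 2))  # 4 + 2 + 1

# Griesmer is always at least as strong as Singleton on the sampled range.
assert all(griesmer_length(k, d, q) >= k + d - 1
           for q in (2, 3, 4) for k in range(1, 8) for d in range(1, 20))
```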

I-A Our contributions

In this paper, we first highlight the differences between the initial motivation for introducing the notion of locality in [4, 5] and the definition of locality given in [6], where the authors chose to constrain the size of the repair sets. We show, through examples, how this definition is imprecise regarding the number of nodes contacted during the repair process when the alphabet size is fixed. To remedy this, we introduce a new definition of locality, called dimension-locality, and compare it to the first definition.

Then, we focus on linear LRCs and derive a new alphabet-dependent bound, of the same type as bound (3), for linear codes with dimension-locality, using the repair sets and chains of consecutive residual codes. Given the definition of dimension-locality, this bound also applies to linear LRCs with locality $(r,\delta)$ by using a weaker bound on the dimension of the repair sets. As a corollary of our results, we also obtain a new Singleton-type bound that better reflects the actual dimension of the repair sets. Furthermore, the new bound can be used to recover both the straightforward extension of bound (3) to locality $(r,\delta)$ and bound (2), which shows that our bound is always at least as good as these bounds.

Next, we derive the asymptotic formulas of the new bound and of the new Singleton-type bound as $n \to \infty$ to obtain bounds on the tradeoff between the rate and the relative minimum distance. We also use these formulas to compare our bounds to the bounds in [27] (Eq. (4) and (12) here). We show that there are cases where the new asymptotic Singleton-type bound is better than, equal to, or worse than the asymptotic version of bound (4). The comparison with our main bound (8) is more direct: we can prove that there is always an interval of relative minimum distances where the new bound is strictly better than bound (4). Moreover, the improvement is significant since our bound benefits from locality-unaware bounds on the rate-distance tradeoff. As an example, Figure 1 displays the comparison between the known bounds for linear LRCs with fixed locality over the binary field and the new bound (13), where we use the McEliece-Rodemich-Rumsey-Welch (MRRW) bound as the intrinsic bound on the rate. Finally, we prove the achievability of the new bounds by studying the locality of Simplex codes and providing sporadic optimal examples.

Fig. 1: Comparison of the upper bounds on the tradeoff between the rate and the relative minimum distance from [6, 27, 31], and the bounds (10) and (13) over the binary field for large values of and fixed locality .

The rest of the paper is organized as follows. In Section II, we discuss the relation between the initial motivation that led to the introduction of locality and the definition given in [6]. Then, we define the notion of dimension-locality and compare it to the initial definition of locality. In Section III, we derive a new bound for linear LRCs with dimension-locality and extend it to linear LRCs with locality $(r,\delta)$. We also obtain a new Singleton-type bound for these codes. In Section IV, we first prove that our bound is always at least as good as the straightforward extension of bound (3) to locality $(r,\delta)$ and bound (2). Then, we derive the asymptotic formulas for the new bounds and use them for the comparison to bound (4). While the comparison with the new Singleton-type bound depends on the parameters of the codes, we prove that our bound always beats bound (4) for large relative minimum distances. Finally, in Section V, we provide a family of codes that achieves our bounds by studying the locality of Simplex codes.

II Mathematical preliminaries and locality revisited

We denote the set $\{1, \dots, n\}$ by $[n]$ and the set of all subsets of $[n]$ by $2^{[n]}$. The set of all positive integers including $0$ is denoted by $\mathbb{Z}_+$. For a length-$n$ vector $x$ and a set $X \subseteq [n]$, the vector $x_X$ denotes the restriction of $x$ to the coordinates in $X$. A linear code $C$ of length $n$, dimension $k$, and minimum distance $d$ is denoted by $[n,k,d]$, and a generator matrix for $C$ is $G = (g_1 | \cdots | g_n)$, where $g_i$ is a column vector for $i \in [n]$. The number of codewords in $C$ is the cardinality of $C$, $|C|$. The shortening of a code $C$ to the set of coordinates $X$ is the code $\{c_X : c \in C,\ c_{[n] \setminus X} = 0\}$, and the restriction of $C$ to $X$ is $C|_X = \{c_X : c \in C\}$. For convenience, we call codes obtained by a restriction restricted codes. For an $[n,k,d]$ linear code $C$, if $C$ meets the Singleton bound, i.e., if $d = n - k + 1$, then $C$ is called a maximum distance separable (MDS) code.

To measure the dimension of restricted linear codes or, more generally, the amount of information contained in the restriction of an arbitrary code, we use the notion of an entropy function on the subsets $X \subseteq [n]$, where $n$ is the length of the code. We state it here for quasi-uniform codes over an alphabet $Q$. Quasi-uniform codes are a general class of error-correcting codes defined by the following property: for every $X \subseteq [n]$, all fibers of the projection $C \to C|_X$ have the same size. As such, the class of quasi-uniform codes contains all linear codes, group codes, and almost affine codes. We refer to [32] for more information about quasi-uniform codes and to [33] for the entropy function on these codes.

Definition 1.

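Let $C$ be a quasi-uniform code of length $n$ over an alphabet $Q$ with $|Q| = q$. The entropy associated to $C$ is the function $H_C : 2^{[n]} \to \mathbb{R}$ with
$$H_C(X) = \log_q \left| C|_X \right| \quad \text{for } X \subseteq [n].$$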

For ease of notation, if the underlying code $C$ is clear, we drop the subscript and write $H$. For linear codes, this function measures exactly the dimension of the restricted codes. For a subset $X \subseteq [n]$, $H(X)$ equals the rank of the sub-matrix of $G$ formed by the columns $g_i$ with $i \in X$, i.e., the rank function of the matroid associated with $C$. As such, it has the following standard properties.

Proposition 1.

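Let $C$ be a quasi-uniform code of length $n$ over the alphabet $Q$ and $H$ the entropy function associated to $C$. For $X, Y \subseteq [n]$, we have

  1. $0 \le H(X) \le |X|$,

  2. If $Y \subseteq X$ then $H(Y) \le H(X)$,

  3. $H(X \cup Y) + H(X \cap Y) \le H(X) + H(Y)$.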
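For binary linear codes, Definition 1 reduces to a rank computation over GF(2). The Python sketch below computes $H(X)$ as the rank of the columns of a generator matrix indexed by $X$. It then checks the three properties of Proposition 1 on every pair of subsets of a small code. The example matrix is a hypothetical choice: a systematic generator matrix of the binary $[7,4,3]$ Hamming code. The helper names are our own.

```python
def gf2_rank(vectors) -> int:
    """Rank over GF(2) of vectors encoded as integer bitmasks."""
    basis = []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)  # clears the leading bit of b from v if v has it
        if v:
            basis.append(v)
    return len(basis)


def columns(G):
    """Encode column j of the 0/1 matrix G as an int (bit i = entry in row i)."""
    return [sum(G[i][j] << i for i in range(len(G))) for j in range(len(G[0]))]


def entropy(G, X) -> int:
    """H(X) = rank of the columns of G indexed by X = dim of the restriction to X."""
    cols = columns(G)
    return gf2_rank(cols[j] for j in X)


def check_proposition_1(G) -> bool:
    """Check bounds, monotonicity and submodularity of H over all subset pairs."""
    n = len(G[0])
    H = {m: entropy(G, [j for j in range(n) if (m >> j) & 1]) for m in range(1 << n)}
    for x in range(1 << n):
        if not 0 <= H[x] <= bin(x).count("1"):
            return False
        for y in range(1 << n):
            if (y & x) == y and H[y] > H[x]:          # Y subset of X
                return False
            if H[x | y] + H[x & y] > H[x] + H[y]:     # submodularity
                return False
    return True


G = [[1, 0, 0, 0, 0, 1, 1],
     [0, 1, 0, 0, 1, 0, 1],
     [0, 0, 1, 0, 1, 1, 0],
     [0, 0, 0, 1, 1, 1, 1]]
print(entropy(G, range(7)), entropy(G, [4, 5, 6]), check_proposition_1(G))
```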

The entropy function also behaves nicely for restricted codes. Let $X \subseteq [n]$ and let $C|_X$ be the restriction of $C$ to the set $X$. Then for $Y \subseteq X$, we have $H_{C|_X}(Y) = H_C(Y)$.

Finally, we define a closure operation on the subsets of for linear codes.

Definition 2.

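Let $C$ be an $[n,k,d]$ linear code and $X \subseteq [n]$. The closure operator $\mathrm{cl} : 2^{[n]} \to 2^{[n]}$ is
$$\mathrm{cl}(X) = \{ i \in [n] : H(X \cup \{i\}) = H(X) \}.$$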

One can think of the closure operator via the generator matrix $G$ of $C$: $\mathrm{cl}(X)$ is the set of indices of all columns of $G$ contained in the linear span of the columns indexed by $X$.
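Following the generator-matrix view of the closure operator, a small Python sketch for binary codes adds column $i$ to $X$ and keeps $i$ whenever the GF(2) rank does not grow. The example matrix is a hypothetical systematic generator matrix of the $[7,4,3]$ Hamming code, and the function names are our own.

```python
def gf2_rank(vectors) -> int:
    """Rank over GF(2) of vectors encoded as integer bitmasks."""
    basis = []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)  # clears the leading bit of b from v if v has it
        if v:
            basis.append(v)
    return len(basis)


def columns(G):
    """Encode column j of the 0/1 matrix G as an int (bit i = entry in row i)."""
    return [sum(G[i][j] << i for i in range(len(G))) for j in range(len(G[0]))]


def closure(G, X):
    """cl(X) = {i : H(X + {i}) == H(X)}: columns lying in the span of X's columns."""
    cols = columns(G)
    base = [cols[j] for j in X]
    h = gf2_rank(base)
    return {i for i in range(len(cols)) if gf2_rank(base + [cols[i]]) == h}


G = [[1, 0, 0, 0, 0, 1, 1],
     [0, 1, 0, 0, 1, 0, 1],
     [0, 0, 1, 0, 1, 1, 0],
     [0, 0, 0, 1, 1, 1, 1]]
print(closure(G, {0, 1, 6}))  # column 3 equals the sum of columns 0, 1 and 6
```

In this example, column 6 is $e_0 + e_1 + e_3$, so the span of columns $0$, $1$ and $6$ also contains $e_3$, which is column $3$.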

The following table summarizes the notation used throughout the paper. The formal definitions of some of these notions appear later in the document.

Linear code of length , dimension and minimum distance
LRC code with locality and the local minimum distance
LRC code with dimension-locality
Griesmer bound on the length
Bound on the dimension
Log-convex bound on the dimension
Log-convex bound on the cardinality
Relative minimum distance
res-chain Chain of consecutive residual codes

II-A Definition of locality and relation with the number of nodes contacted for repair

In this part, we explain how the definition of locality diverges from the initial motivation for introducing the notion, and we state a new definition of locality called dimension-locality. As mentioned in the introduction, [4] and then [5] introduced the notion of locality to reduce the repair traffic. Their idea was to design storage codes such that one failure can be repaired by contacting only a small number of nodes in the storage system. A natural extension is to allow multiple erasures to be corrected locally while still accessing fewer than $k$ nodes. For this, we need the local sets of nodes to have a minimum distance of at least $\delta$, so that up to $\delta - 1$ erasures can be repaired locally. The definition presented in [6] is the following.

Definition 3.

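An $[n,k,d]$ linear code $C$ has all-symbol locality $(r,\delta)$ if for every code symbol $i \in [n]$ there exists a set $\Gamma_i \subseteq [n]$ such that

  1. $i \in \Gamma_i$,

  2. $|\Gamma_i| \le r + \delta - 1$,

  3. The minimum distance of the restriction of $C$ to the set $\Gamma_i$ is at least $\delta$.

We refer to $C$ as an $(n,k,d,r,\delta)$-LRC code.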

With this definition, any $\delta - 1$ coordinates of a repair set are determined by the values of its remaining coordinates, of which there are at most $r$. This enables local repair by contacting at most $r$ other nodes. The problem with Definition 3 is that it implicitly requires the repair sets to be MDS codes in order for $r$ to be the dimension of the local sets. In other words, if the restriction of $C$ to a repair set is not an MDS code, then the number of nodes needed to repair any $\delta - 1$ failures in that set is strictly less than $r$. Thus, Definition 3 diverges from the initial goal of precisely controlling the number of nodes contacted during the repair process when the repair sets are not MDS.

This observation is particularly relevant when the field size $q$ is fixed and $r$ is too large for MDS codes to exist. For example, consider binary codes with $\delta \ge 3$, so that more than one erasure can be corrected locally. Then no repair set of local dimension at least $2$ can be MDS, and $r$ is no longer the local dimension. We illustrate this phenomenon in a concrete example.
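A minimal numerical illustration of this gap is given below. It is not Example 1, whose generator matrix is not reproduced here. As a hypothetical single repair set with $\delta = 3$, we take the binary $[7,4,3]$ Hamming code. Reading its size as $r + \delta - 1$ under Definition 3 gives $r = 5$, yet its dimension is only $4$. Since any $5$ coordinates of this code determine the codeword, any two erasures can be repaired from $4$ suitably chosen nodes. The helper `codewords` is our own.

```python
from itertools import product


def codewords(G):
    """All codewords of the binary code generated by the rows of G."""
    cols = list(zip(*G))
    for msg in product((0, 1), repeat=len(G)):
        yield tuple(sum(m * g for m, g in zip(msg, col)) % 2 for col in cols)


# Hypothetical repair set: a systematic binary [7, 4, 3] Hamming code.
G = [[1, 0, 0, 0, 0, 1, 1],
     [0, 1, 0, 0, 1, 0, 1],
     [0, 0, 1, 0, 1, 1, 0],
     [0, 0, 0, 1, 1, 1, 1]]

words = set(codewords(G))
size = len(G[0])                                # |Gamma| = 7
dim = len(words).bit_length() - 1               # log2 of 16 codewords = 4
d_loc = min(sum(w) for w in words if any(w))    # local minimum distance = 3
r_def3 = size - d_loc + 1                       # r read off Definition 3: 5
print(size, dim, d_loc, r_def3)
```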

Example 1.

Let be the binary linear -code given by the following generator matrix,

We define the three repair sets by their corresponding columns in : , and . Every repair set has size , minimum distance , and dimension . Thus, we get and is a binary linear -LRC code. However, even though , we can repair up to two failures by contacting at most nodes.

To be able to precisely keep track of the number of nodes contacted during the repair process, we propose a slightly different definition for locally repairable codes tolerating multiple erasures locally where we replace the condition on the size by a condition on the dimension.

Definition 4.

An linear code has all-symbol dimension-locality if for every code symbol there exists a set such that

  1. ,

  2. ,

  3. The minimum distance of the restriction of to the set is at least .

We refer to as an -LRC code.

With this definition, we regain the property that any $\delta - 1$ coordinates can be recovered by contacting at most other coordinates, and can be made tight. When we do not restrict the field size, optimal repair sets are MDS codes with size equal to , and thus both definitions coincide. Definition 4 is also better suited to non-MDS repair sets when the field size is fixed, as still measures the local dimension while represents partially the size and partially the dimension. On top of that, the new definition allows more flexibility on the size of the repair sets, as it can be smaller or bigger than , since and are only an upper bound and a lower bound, respectively.

Obviously, every code with locality is a code with dimension-locality . To also take into account non-MDS repair sets, we can replace by a bound on the dimension given the size and minimum distance. Let be the maximal dimension obtained by such a bound. Then, every code with locality is a code with dimension-locality . The problem is that might not be tight, i.e., there may be no repair set such that , which goes against the purpose of the new definition. This is illustrated in the following example.

Example 2.

Let be the binary linear -code obtained as the direct sum of two and one binary linear codes. If we choose the repair sets to be the three codes in the direct sum, we obtain . The maximal size is given by the last repair set, which has size . Therefore, and is an -LRC code. The maximal dimension of a repair set is , so is an -LRC code as well. However, every bound on the dimension is at least since the Hamming code has parameters . Thus, it is impossible to obtain the true minimal dimension from the parameters of Definition 3.

The last example also demonstrates how we can use the flexibility on the size obtained from Definition 4 to keep the dimension-locality parameters intact while achieving a code of length that is not divisible by any of the sizes of the repair sets.

III Bounds for dimension-locality and locality

In this section, we study the structure of linear codes with dimension-locality and derive a bound on their parameters. Following the general framework of [24], we construct a set with a large size and a small dimension using a detailed analysis of the repair sets based on the work done in [29] and [30]. This yields a bound of the form of bound (3) that handles both MDS and non-MDS repair sets. Then, we extend our bound to linear codes with locality $(r,\delta)$. Finally, using a weaker estimate of our results, we derive a new Singleton-type bound for $(n,k,d,r,\delta)$-LRC codes.

We start by presenting the new bound for codes with dimension-locality, where is an upper bound on the dimension and is the Griesmer bound on the size, taken over .

Theorem 1.

Let be a linear -LRC code over . Then we have

(6)

where such that .

Proof.

Proof is given in the appendix. ∎

In order to prove this bound, we need a better understanding of bound (3) and of the implications of having a non-MDS repair set. Bound (3) relies mainly on two results. The first is a construction of a set with an upper bound on its dimension and a lower bound on its size. The second is a shortening argument that governs the term $k_{\mathrm{opt}}^{(q)}$ in the bound. This second result is reproduced here with a slight rephrasing.

Lemma 1 ([24], Lemma 2).

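Let $C$ be an $[n,k,d]$ linear code over $\mathbb{F}_q$ and $X \subseteq [n]$ such that $H(X) < k$. Then the code obtained by shortening $C$ on $X$ (keeping the codewords that vanish on $X$ and deleting the coordinates in $X$) has parameters $[n - |X|, k - H(X), \ge d]$.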
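Lemma 1 can be checked exhaustively on small binary codes. For every $X$ with $H(X) < k$, the Python sketch below shortens the code on $X$ and confirms two parameters: the dimension is $k - H(X)$ and the minimum distance is at least $d$. The length $n - |X|$ holds by construction. Here $H(X)$ is computed as $\log_2 |C|_X|$, as in Definition 1. The two test codes, a Hamming $[7,4,3]$ code and a Simplex $[7,3,4]$ code, are our own choices.

```python
from itertools import product


def codewords(G):
    """All codewords of the binary code generated by the rows of G."""
    cols = list(zip(*G))
    for msg in product((0, 1), repeat=len(G)):
        yield tuple(sum(m * g for m, g in zip(msg, col)) % 2 for col in cols)


def check_lemma_1(G) -> bool:
    words = list(codewords(G))
    n, k = len(G[0]), len(G)
    d = min(sum(w) for w in words if any(w))
    for mask in range(1 << n):
        X = [i for i in range(n) if (mask >> i) & 1]
        rest = [i for i in range(n) if not (mask >> i) & 1]
        # H(X) = log2 |C|_X|; for a linear code |C|_X| is a power of 2.
        h = len({tuple(w[i] for i in X) for w in words}).bit_length() - 1
        if h >= k:
            continue  # the lemma assumes H(X) < k
        shortened = {tuple(w[i] for i in rest)
                     for w in words if all(w[i] == 0 for i in X)}
        if len(shortened) != 2 ** (k - h):                      # dimension k - H(X)
            return False
        if min(sum(w) for w in shortened if any(w)) < d:        # distance >= d
            return False
    return True


G_HAMMING = [[1, 0, 0, 0, 0, 1, 1], [0, 1, 0, 0, 1, 0, 1],
             [0, 0, 1, 0, 1, 1, 0], [0, 0, 0, 1, 1, 1, 1]]
G_SIMPLEX = [[1, 0, 0, 1, 1, 0, 1], [0, 1, 0, 1, 0, 1, 1], [0, 0, 1, 0, 1, 1, 1]]
print(check_lemma_1(G_HAMMING), check_lemma_1(G_SIMPLEX))
```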

Regarding the first result, the technique used to construct large sets relies on taking the union of repair sets. If two repair sets intersect, which reduces both the entropy and the size of the union, a correction is performed by adding arbitrary elements to the union. The main difficulty in extending this technique to non-MDS repair sets is to deal with the intersection of the repair sets and to find the appropriate correction. Indeed, the intersection of two repair sets can now have a size strictly larger than its entropy (take for example and in Example 1). Thus, it is no longer possible to correct their union by adding an arbitrary set, since this might exceed the upper bound on the entropy.

In order to correct the intersection, the main idea is to create a set using consecutive residual codes. As mentioned in the introduction, for any linear $[n,k,d]$ code $C$ over $\mathbb{F}_q$, there exists a restriction of $C$, called the residual code of $C$, with parameters $[n-d, k-1, d']$ where $d' \ge \lceil d/q \rceil$. We define the sequence of consecutive residual codes as a chain of subsets of $[n]$.

Definition 5.

Let be a linear code over . The res-chain of is a sequence of sets with constructed recursively by starting with and is such that is a residual code of .

This definition is well-defined since, by the proof of Theorem in [30], the residual code of is constructed by restricting to a well-chosen set of coordinates. Therefore, we can interpret the recursive residual code chain as a sequence of subsets of $[n]$. Furthermore, as the dimension of the residual code is one less than the dimension of the code, the chain has length and for all , there is a set in the res-chain of such that . Finally, by a recursive argument, if $X$ is a set in the res-chain of $C$ with $H(X) \ge 1$, then the minimum distance of $C|_X$ is bounded from below by
$$d(C|_X) \ge \left\lceil \frac{d}{q^{\,k - H(X)}} \right\rceil.$$
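For binary codes, one residual step can be sketched as follows. We pick a minimum-weight codeword $c$ and restrict the code to the complement of its support. For binary codes, this drops the dimension by exactly one, because the only nonzero codeword supported inside $\mathrm{supp}(c)$ is $c$ itself. The sketch below iterates this step to produce a res-chain as a list of coordinate sets. It then checks the recursive distance bound stated above. It is an illustrative binary-only construction under our own choice of helper names and example codes, not the general $q$-ary construction of [30].

```python
from itertools import product


def codewords(G):
    """All codewords of the binary code generated by the rows of G."""
    cols = list(zip(*G))
    for msg in product((0, 1), repeat=len(G)):
        yield tuple(sum(m * g for m, g in zip(msg, col)) % 2 for col in cols)


def res_chain(G):
    """Coordinate sets of consecutive binary residual codes, from [n] down to dim 0."""
    words = set(codewords(G))
    coords = list(range(len(G[0])))
    chain = [tuple(coords)]
    while len(words) > 1:
        c = min((w for w in words if any(w)), key=sum)    # a minimum-weight codeword
        keep = [j for j, bit in enumerate(c) if bit == 0]  # complement of supp(c)
        coords = [coords[j] for j in keep]
        words = {tuple(w[j] for j in keep) for w in words}
        chain.append(tuple(coords))
    return chain


def check_distance_bound(G) -> bool:
    """Check d(C|_X) >= ceil(d / 2^(k - H(X))) for every X with H(X) >= 1."""
    words = set(codewords(G))
    k = len(G)
    d = min(sum(w) for w in words if any(w))
    for X in res_chain(G):
        proj = {tuple(w[i] for i in X) for w in words}
        h = len(proj).bit_length() - 1
        if h >= 1 and min(sum(w) for w in proj if any(w)) < -(-d // 2 ** (k - h)):
            return False
    return True


G_HAMMING = [[1, 0, 0, 0, 0, 1, 1], [0, 1, 0, 0, 1, 0, 1],
             [0, 0, 1, 0, 1, 1, 0], [0, 0, 0, 1, 1, 1, 1]]
G_SIMPLEX = [[1, 0, 0, 1, 1, 0, 1], [0, 1, 0, 1, 0, 1, 1], [0, 0, 1, 0, 1, 1, 1]]
print([len(X) for X in res_chain(G_HAMMING)], [len(X) for X in res_chain(G_SIMPLEX)])
```

For the Simplex code, the set sizes $7, 3, 1, 0$ reproduce the terms $4, 2, 1$ of the Griesmer sum. This reflects that the code meets (5) with equality.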

We now present two lemmas that are used to prove Theorem 1. The first lemma states how to correct a set when adding a repair set exceeds the remaining entropy.

Lemma 2.

Let be a linear -LRC code over . Let be such that and an integer with . If there exists a repair set such that , then, there exists with such that

  • ,

  • .

Proof.

Proof is given in the appendix. ∎

Using the above lemma, we can prove the following second lemma, which is the most challenging part of the proof of the new bound.

Lemma 3.

Let be a linear -LRC code over and such that and . Then there exists with such that

  • ,

  • .

Proof.

Proof is given in the appendix. ∎

Looking at the proofs, we see that if all the repair sets are disjoint and have dimension , then Lemma 3 follows directly since no correction is needed. If the repair sets intersect each other or have dimension less than , we gain a small margin in the entropy of the union that can be used to perform a correction. We then use the chain of residual codes to get a set that, when added to the union, increases the entropy by exactly the remaining amount. The last trick is to evaluate both the size of the union and the size of the set in the res-chain using the Griesmer bound. First, it is a bound on the size in which the minimum distance plays a more important role than the dimension. This fits the lower bound on the local minimum distance for codes with dimension-locality. Second, the Griesmer bound satisfies the recursion
$$\sum_{i=0}^{k-1} \left\lceil \frac{d}{q^i} \right\rceil = d + \sum_{i=0}^{k-2} \left\lceil \frac{\lceil d/q \rceil}{q^i} \right\rceil.$$
The first term, $d$, can be used to get a lower bound on the size of a repair set minus its intersection. The second term, under some conditions, gives a lower bound on the size of a particular set in the res-chain of a repair set. This relation is therefore very useful when we add the extra set to correct the union. Finally, the Griesmer bound is also consistent with our construction based on residual codes.
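The recursion used above follows from the integer identity $\lceil \lceil d/q \rceil / q^i \rceil = \lceil d/q^{i+1} \rceil$. The short Python check below confirms it numerically on a range of parameters. The helper names are our own, and the sampled ranges are arbitrary.

```python
def griesmer_length(k: int, d: int, q: int) -> int:
    """sum_{i=0}^{k-1} ceil(d / q^i), with exact integer ceilings."""
    return sum(-(-d // q**i) for i in range(k))


def recursion_holds(k: int, d: int, q: int) -> bool:
    """Check sum_{i<k} ceil(d/q^i) == d + sum_{i<k-1} ceil(ceil(d/q)/q^i)."""
    return griesmer_length(k, d, q) == d + griesmer_length(k - 1, -(-d // q), q)


print(all(recursion_holds(k, d, q)
          for q in (2, 3, 5) for k in range(1, 10) for d in range(1, 60)))
```

The right-hand side mirrors one residual step: a repair set contributes $d$ coordinates outside its residual code, and the residual code has minimum distance at least $\lceil d/q \rceil$.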

As a corollary of Theorem 1, we can force the dimension to be a multiple of to obtain a bound that resembles the original one.

Corollary 1.

Let be a linear -LRC code over . Then we have

(7)

Although the wider range of parameters makes bound (6) theoretically at least as strong as this bound, the two show similar experimental results. More precisely, we randomly generated feasible parameters for LRC codes over the binary field, and in these experiments bound (7) was equal to bound (6). One possible explanation is as follows: for two consecutive dimensions and inside the minimum in (6), the length in the second term decreases by the largest amount when . Therefore, the minimum would always be attained when is a multiple of the local dimension . However, a formal proof does not seem possible because the intrinsic bound is unknown.

III-A New bounds for locality

As already explained in Section II, to get a bound on LRCs with locality $(r,\delta)$ instead of dimension-locality, we can estimate the local dimension via an upper bound on the maximal dimension of a code with size $r + \delta - 1$ and minimum distance $\delta$. Let us call the dimension obtained via any such upper bound. Since we never used that is actually tight, our previous results apply directly to codes with locality $(r,\delta)$ via the estimated dimension . Therefore, we get the following new bound.

Theorem 2.

Let be a linear -LRC code over and the upper bound on the local dimension. Then

(8)

where such that .

It is essential to estimate the size in the shortened part of the bound via the Griesmer bound instead of replacing it by . The reason is that is an upper bound on the size, while we need something of the form of a lower bound. However, what we need is not exactly a lower bound, since the dimension of a repair set can be lower than . We present a small counterexample.

Example 3.

Let be the binary linear code given by the following generator matrix

is a -LRC code with the obvious repair sets. Estimating using the Griesmer bound yields . However, there is no set of dimension less than and size greater than , since every set of size already has dimension equal to . The problem here is that the repair set of size , which gives the upper bound , has minimum distance strictly greater than .

Using Lemma 3, we can also derive a Singleton-type bound that takes into account repair sets that are not MDS.

Theorem 3.

Let be a linear -LRC code over and the upper bound on the local dimension. Then

(9)

where .

Proof.

Let be such that with . By Lemma 3 and the proof of Theorem 1, there is a set such that and . Then, the minimum distance is bounded by

IV Analysis and comparisons

This section is devoted to the comparison between our bounds and the previously known bounds for LRC codes. In the first part, we show that the bound (8) leads to the straightforward extension of the bound (3) for locality and the bound (9) leads to the Singleton-type bound (2) when the field size is sufficiently large. In the second part, we derive the asymptotic formulas of the bounds (8) and (9) to obtain the bounds on the tradeoff between the rate and the relative minimum distance for fixed locality. This also enables a cleaner comparison between the new bounds and the bound (4) from [27]. Notice that we will not compare our bounds to the linear programming bound derived in [27] since it is not possible to derive an asymptotic formula from it and we do not assume that the repair sets are disjoint.

Our results show that the comparison between the new asymptotic Singleton-type bound and the asymptotic version of bound (4) depends on the performance of the Griesmer bound compared to the log-convex bounds. Indeed, we can find examples where the new bound is better than, equal to, or worse than bound (4). By using the Plotkin bound as the intrinsic bound in (8), we prove that bound (8) is always better than bound (4) for large relative minimum distances.

We start by showing that the bound (8) leads to the extension of the bound (3) for locality .

Corollary 2.

Let be a linear -LRC code over . Then

(10)

To the best of our knowledge, although this extension of the original bound for locality is straightforward, it has not previously appeared in the literature.

Proof.

Let be the upper bound on the local dimension. We want to show that for all with , there is a set with and . For fixed, define such that . By the same arguments as in the proof of Theorem 1, there exists a set such that . It remains to show that . First we have since . Now is an integer so . We use the fact that the Griesmer bound is greater than or equal to the Singleton bound, i.e., $\sum_{i=0}^{k-1} \lceil d/q^i \rceil \ge k + d - 1$ for all $k \ge 1$. This gives

Hence using Lemma 1 with this approximation on the size, we obtain the desired bound on . ∎

Now we prove that the new Singleton-type bound can be used to obtain the bound (2).

Proposition 2.

For any linear -LRC code, the bound (9) is at least as strong as the bound (2).

Proof.

We rewrite the bound of Theorem 3 to have something closer to the form of the bound (2). First, we rewrite the Griesmer bound as

Let be such that . The bound of Theorem 3 can be transformed as follows:

By using the fact that and , we obtain

This shows that the bounds (8) and (9) are at least as good as the bounds (10) and (2) respectively. Furthermore, we can see that the bounds (8) and (9) improve on the previous bounds when or when . The latter case is of particular interest for small alphabets. For example, when considering binary LRC codes, the new bounds are already better than the bound (2) for all .

IV-A Asymptotic regime

For the rest of this section, we look at the asymptotic regime where $n \to \infty$. Let $R = k/n$ be the rate of the code and let the ratio $d/n$ be the relative minimum distance. Usually, the relative minimum distance is denoted by $\delta$, but here we reserve $\delta$ for the local minimum distance. The goal is to obtain bounds on the tradeoff between the rate and the relative minimum distance when the locality is fixed and $n \to \infty$. This also makes the comparison to bound (4) easier.

We start with the Singleton-type bound (9). By dividing bound (9) by $n$ and letting $n \to \infty$, we obtain its asymptotic formula:

(11)

Following the same method, we can derive the asymptotic version of bound (4). For ease of reading, we reproduce the bound here: For any -LRC over and a bound on the cardinality of a code which is log-convex in and such that , we have

Its asymptotic version is therefore:

(12)

The following table summarizes the asymptotic formulas for the Singleton-type bounds under different locality assumptions. Notice that the last three are truly comparable since they share the same locality assumptions. The table shows how the locality assumption reduces the rate by the ratio of the local dimension to the local size.

Singleton bound
Gopalan et al. [4]
Prakash et al. [6]
Agarwal et al. [27]
Theorem 3

Following the method in [24], we can derive the asymptotic formula for the bound (8). Define . By dividing the bound (8) by , we obtain its asymptotic version

(13)

We can now compare the asymptotic formulas between (11), (13) and (12). Notice that for linear codes, is a bound on the dimension of the local sets. From now on, we denote by the bound . By definition gives a valid upper bound on the dimension in Theorem 2. However, it might happen that the best bound on the dimension is not log-convex and hence . In particular, the Griesmer bound on the cardinality is not a log-convex function on as demonstrated next.

Recall that a positive function $f$ of an integer argument $n$ is called log-convex if $f(n)^2 \le f(n-1) f(n+1)$ for every $n$ such that $n-1$, $n$ and $n+1$ lie in the support of $f$. For any linear code over , the Griesmer bound on given and , denoted by , is obtained by taking the maximal such that . Thus, it gives a bound on the cardinality, . Let us consider the parameters , and . Then, we obtain

Hence we have and the Griesmer bound on the cardinality is not log-convex on . Therefore, there is no obvious answer to the comparison between bounds (11) and (12), since we need to compare and . Both the numerator and the denominator of the former are smaller than or equal to their respective counterparts in the latter.
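The failure of log-convexity is easy to reproduce numerically. The specific parameters used in the text are not recoverable here, so the Python sketch below scans its own choice, $q = 2$ and $d = 5$. It considers $f(n) = q^{k_G(n,d)}$, where $k_G(n,d)$ is the largest dimension allowed by the Griesmer bound. It then reports every length $n$ where $f(n)^2 > f(n-1) f(n+1)$, i.e., $2k_G(n) > k_G(n-1) + k_G(n+1)$. As an assumption about the domain, we only test $n$ with $n - 1 \ge d$, so that every length involved can host a nonzero codeword.

```python
def griesmer_length(k: int, d: int, q: int) -> int:
    """Minimum length allowed by the Griesmer bound for a linear [n, k, d]_q code."""
    return sum(-(-d // q**i) for i in range(k))


def griesmer_max_dim(n: int, d: int, q: int) -> int:
    """Largest k with griesmer_length(k, d, q) <= n."""
    k = 0
    while griesmer_length(k + 1, d, q) <= n:
        k += 1
    return k


def log_convexity_violations(d: int, q: int, n_max: int):
    """Lengths n in [d+1, n_max) where 2 k_G(n) > k_G(n-1) + k_G(n+1)."""
    def k_g(m: int) -> int:
        return griesmer_max_dim(m, d, q)
    return [n for n in range(d + 1, n_max) if 2 * k_g(n) > k_g(n - 1) + k_g(n + 1)]


print(log_convexity_violations(5, 2, 20))
```

Here $k_G(7,5) = 1$ and $k_G(8,5) = k_G(9,5) = 2$, so $n = 8$ violates the inequality. For $d = 3$, no violation occurs in this range, so the phenomenon depends on the parameters.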

To be more specific, the outcome mainly depends on the performance of the Griesmer bound compared to the log-convex bounds. For example, if there exists a log-convex bound such that but , then bound (12) is strictly better than the new Singleton-type bound. This is illustrated in Figure 2, which displays the rate-distance tradeoff for binary codes with locality . To evaluate the local dimension, we use the Hamming bound as a log-convex bound to get , which is optimal. The Griesmer bound gives . Hence the green line representing bound (12) is better than the orange line displaying bound (11).

Fig. 2: Comparison of the asymptotic upper bounds from [6, 27], and the bounds (11) and (13) over the binary field with fixed locality .

On the other hand, if for all log-convex bounds , then (11) is strictly better than (12) because we have . A proper example would require proving this inequality for every log-convex bound. We therefore restrict the comparison here to the three bounds proven to be log-convex in [27], namely the Singleton, Hamming and Plotkin bounds. Let be a linear LRC code with locality . The Singleton bound gives an upper bound of 12 on the local dimension, and the Hamming bound gives a bound of 7. The Plotkin bound is not applicable here since . Now, the Griesmer bound on the dimension gives an upper bound of and is therefore better than the Hamming bound. Figure 3 displays the asymptotic bounds for binary codes with locality . As it shows, bound (11) (orange) is always better than bound (12) (green).
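The local-dimension comparison above can be sketched for binary codes as follows. The paper's locality parameters are not recoverable here, so the function takes a hypothetical repair-set length `n` and local distance `d` as inputs. It returns the dimension allowed by four bounds. The Singleton bound gives $n - d + 1$. The sphere-packing (Hamming) bound gives the largest $k$ with $2^k V \le 2^n$, where $V$ is the volume of a Hamming ball of radius $\lfloor (d-1)/2 \rfloor$. The basic Plotkin bound is $|C| \le 2d/(2d - n)$, valid only when $2d > n$. The Griesmer bound (5) completes the list. All helper names are our own.

```python
from math import comb


def singleton_dim(n: int, d: int) -> int:
    return n - d + 1


def hamming_dim(n: int, d: int) -> int:
    """Largest k with 2^k * V(n, t) <= 2^n, where t = floor((d-1)/2)."""
    volume = sum(comb(n, i) for i in range((d - 1) // 2 + 1))
    return (2**n // volume).bit_length() - 1


def plotkin_dim(n: int, d: int):
    """floor(log2 floor(2d / (2d - n))) if 2d > n, else None (not applicable)."""
    if 2 * d <= n:
        return None
    return (2 * d // (2 * d - n)).bit_length() - 1


def griesmer_dim(n: int, d: int) -> int:
    """Largest k with sum_{i<k} ceil(d / 2^i) <= n."""
    k = 0
    while sum(-(-d // 2**i) for i in range(k + 1)) <= n:
        k += 1
    return k


for n, d in [(7, 3), (7, 4)]:  # hypothetical repair-set parameters
    print(n, d, singleton_dim(n, d), hamming_dim(n, d), plotkin_dim(n, d), griesmer_dim(n, d))
```

On the pair $(7,4)$, for instance, the Griesmer bound gives $3$ while the Hamming bound only gives $4$, mirroring the situation described in the text.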

Fig. 3: Comparison of the asymptotic upper bounds from [6, 27], and the bounds (11) and (13) over the binary field with fixed locality .

Finally, if and , the two bounds coincide. This happens, for example, in Figure 1, where the locality is and both the Griesmer and the Plotkin bounds give and .

Nonetheless, these are only special cases, and the comparison between the bounds (11) and (12) has to be made case by case, since it depends on three quantities that cannot be computed in general: the best upper bound on the dimension , the best log-convex upper bound , and the performance of the Griesmer bound relative to both and .

The comparison between the bound (12) and the bound (13) is more straightforward. Since the latter is at least as good as (11), the bound (13) is automatically at least as strong as the bound (12) whenever the new Singleton-type bound is at least as strong as the bound (12).

Furthermore, we will see that the bound (13) is always stronger than the bound (12) for large relative minimum distances, i.e., there is a threshold value such that for every relative distance , the bound (13) is better than the bound (12).

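To prove this, we use the asymptotic Plotkin bound, which states that q-ary codes of rate $R$ and relative minimum distance $\delta \le (q-1)/q$ satisfy $R \le 1 - \frac{q}{q-1}\,\delta$.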

Combining it with the bound (13) and solving the optimization problem yields the following bound on the rate

(14)

We can now formally state our claim.

Proposition 3.

Let be a linear -LRC and be the bound on the local dimension given by the best log-convex bound . Assume that and let

Then the bound (14) is stronger than the bound (12) for every relative minimum distance .

Proof.

The proof is given in the appendix. ∎

The proof follows from the fact that the bound (14) using and the bound (12) are two lines with different slopes, and that the bound (14) becomes equal to when is larger than . Thus, the two lines intersect exactly at , and the bound (14) is better than (12) for relative minimum distances strictly greater than .

Finally, any bound on the rate that improves on the asymptotic Plotkin bound increases the size of the interval where the bound (14) is better than the bound (12). In particular, this holds for the rate-distance bound given in [31], the MRRW bound, which is the best known upper bound on the rate of binary codes. It reads as follows:

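$$R \le h\!\left(\tfrac{1}{2} - \sqrt{\delta(1-\delta)}\right), \qquad 0 \le \delta \le \tfrac{1}{2}, \qquad (15)$$

where $R$ is the rate, $\delta$ the relative minimum distance and $h$ the binary entropy function.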
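The two asymptotic rate bounds used in this section can be evaluated directly. The sketch below assumes that (15) is the first MRRW (linear-programming) bound for binary codes, R ≤ h(1/2 − √(δ(1−δ))), and compares it with the binary asymptotic Plotkin bound R ≤ 1 − 2δ. It is an illustration, not the authors' code, and it does not reproduce the red curves, which additionally require solving the optimization problem in (13).

```python
from math import log2, sqrt


def binary_entropy(x: float) -> float:
    """h(x) = -x log2 x - (1 - x) log2(1 - x), with h(0) = h(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * log2(x) - (1 - x) * log2(1 - x)


def mrrw_first(delta: float) -> float:
    """First MRRW upper bound on the rate of binary codes, for 0 <= delta <= 1/2."""
    return binary_entropy(0.5 - sqrt(delta * (1 - delta)))


def plotkin_asymptotic(delta: float) -> float:
    """Binary asymptotic Plotkin bound R <= 1 - 2 delta (zero from delta = 1/2 on)."""
    return max(0.0, 1.0 - 2.0 * delta)


for delta in (0.1, 0.2, 0.3, 0.4):
    print(f"delta={delta:.1f}  MRRW={mrrw_first(delta):.3f}  Plotkin={plotkin_asymptotic(delta):.3f}")
```

At every sampled δ the MRRW value lies below the Plotkin line, which is why substituting it into (13) can only enlarge the interval on which the new bound beats (12).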

We can thus replace the asymptotic Plotkin bound by the MRRW bound in (13) and, by numerically solving the optimization problem, we obtain the red curves in Figures 1, 2 and 3. We see that the bound (13) combined with the MRRW bound improves significantly on the bound (12), even when the Griesmer bound is not equal to the maximal local size.

V Achievability results

Several constructions of codes achieving the bound (2) already exist, for example in [6, 16, 17, 21, 22]. Many of these constructions require an alphabet size that is exponential in the code length. Since the bound (8) approaches the bound (2) for large alphabets, the bound (8) is tight in that regime. In this section, by considering the family of Simplex codes, we show that the bound (8) is also tight for some parameter values for every fixed field size, in particular small ones.

Definition 6.

Let and be an matrix over with non-zero pairwise independent columns. The code with as a generator matrix is called a -ary Simplex code with parameters .
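As a small constructive check of Definition 6, the sketch below builds a generator matrix whose columns are one representative (first non-zero entry equal to 1) of each one-dimensional subspace of F_q^k. This gives (q^k − 1)/(q − 1) non-zero, pairwise independent columns. It then verifies by brute force that every non-zero codeword has weight q^{k−1}. It assumes q is prime, so that arithmetic modulo q is field arithmetic; prime powers would need a proper finite-field implementation. The helper names are hypothetical.

```python
from itertools import product


def simplex_generator(q: int, k: int) -> list[list[int]]:
    """k x (q^k - 1)/(q - 1) generator matrix of the q-ary Simplex code (q prime)."""
    # one normalized representative per projective point: first non-zero entry equals 1
    cols = [v for v in product(range(q), repeat=k)
            if any(v) and next(x for x in v if x) == 1]
    return [[col[i] for col in cols] for i in range(k)]


def codewords(G: list[list[int]], q: int):
    """Yield every codeword mG (mod q) for all messages m in F_q^k."""
    k, n = len(G), len(G[0])
    for m in product(range(q), repeat=k):
        yield tuple(sum(m[i] * G[i][j] for i in range(k)) % q for j in range(n))


def weight(c) -> int:
    return sum(1 for x in c if x != 0)


for q, k in [(2, 3), (3, 2), (3, 3)]:
    G = simplex_generator(q, k)
    weights = {weight(c) for c in codewords(G, q) if any(c)}
    print(q, k, len(G[0]), weights)  # length (q^k-1)/(q-1); weights should be {q^(k-1)}
```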

Since Simplex codes are known to achieve the Griesmer bound, they achieve the bound (8) by taking and using the Griesmer bound for . Therefore, the locality parameters do not influence the optimality of the code. This is in fact true in general: if a code already achieves a bound on without locality constraints and has a certain locality, then it is an optimal locally repairable code for these locality parameters.
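That Simplex codes meet the Griesmer bound with equality is a one-line computation: with d = q^{k−1}, every term ⌈q^{k−1}/q^i⌉ equals q^{k−1−i}, so the Griesmer sum is the geometric series (q^k − 1)/(q − 1), which is exactly the Simplex length. The short check below (helper names hypothetical) confirms this for a range of q and k; it only uses integer arithmetic, so q need not be prime here.

```python
def griesmer_length(k: int, d: int, q: int) -> int:
    """Griesmer lower bound on the length: sum_{i=0}^{k-1} ceil(d / q^i)."""
    return sum(-(-d // q**i) for i in range(k))


def simplex_meets_griesmer(q: int, k: int) -> bool:
    n = (q**k - 1) // (q - 1)  # Simplex length
    d = q ** (k - 1)           # Simplex minimum distance
    return griesmer_length(k, d, q) == n


print(all(simplex_meets_griesmer(q, k) for q in (2, 3, 4, 5, 7) for k in range(1, 7)))  # True
```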

Let us study the locality of the Simplex code . Simplex codes, as locally repairable codes with , were already considered in [24] and [25], where the authors used them to construct new LRCs. Here, we derive the locality for larger dimensions and .

The first thing to notice is that for every coordinate , there exists a non-zero codeword in such that . Indeed, it suffices to take two distinct codewords that are not multiples of each other and subtract a suitable multiple of one from the other to obtain the desired codeword. Since every non-zero codeword of has the same weight, we can take the residual code with respect to , which is the Simplex code . Therefore, by recursion, for every , the coordinate is contained in a Simplex code obtained by a restriction of . Since the minimum distance of is , by letting , we ensure that . Hence, the Simplex code has dimension-locality for all . Finally, .
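The first step of the recursion can be checked on small cases. The sketch below is our illustration (q prime assumed, helper names hypothetical). It fixes a coordinate j, picks a non-zero codeword c with c_j = 0, and forms the residual code by restricting all codewords to the complement of supp(c). It then confirms that j survives the restriction and that the result has q^{k−1} codewords, length (q^{k−1} − 1)/(q − 1) and constant weight q^{k−2}, as a Simplex code of dimension k − 1 should.

```python
from itertools import product


def simplex_generator(q, k):
    # columns: projective points of F_q^k, normalized so the first non-zero entry is 1
    cols = [v for v in product(range(q), repeat=k)
            if any(v) and next(x for x in v if x) == 1]
    return [[col[i] for col in cols] for i in range(k)]


def codewords(G, q):
    k, n = len(G), len(G[0])
    for m in product(range(q), repeat=k):
        yield tuple(sum(m[i] * G[i][j] for i in range(k)) % q for j in range(n))


def residual_at(q, k, j):
    """Residual of the q-ary Simplex code w.r.t. a non-zero codeword vanishing at coordinate j."""
    code = list(codewords(simplex_generator(q, k), q))
    c = next(w for w in code if any(w) and w[j] == 0)     # exists whenever k >= 2
    keep = [i for i, x in enumerate(c) if x == 0]         # complement of supp(c)
    residual = {tuple(w[i] for i in keep) for w in code}  # puncture every codeword on supp(c)
    return keep, residual


q, k, j = 3, 3, 0
keep, res = residual_at(q, k, j)
weights = {sum(1 for x in w if x) for w in res if any(w)}
print(j in keep, len(keep), len(res), weights)  # True 4 9 {3}
```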

To obtain examples that achieve the bounds (8) and (9) in a less obvious manner, we now prove that all the examples presented in this paper are optimal. Let us start with Example 1, where the code is a binary -LRC with . By using the Plotkin bound, we get and thus . We can now compute the bound (9):

Hence, is optimal.

In Example 2, we presented a binary code with parameters and . Since the purpose of this example is to illustrate that might not be equal to , it is necessary here to use the bound (6) instead of (8). Let . By using the Plotkin bound for , we have

Hence, this code reaches the bound (6).

Finally, the binary code in Example 3 has parameters and . By using the Plotkin bound, we get and . Let . We compute the bound (8) using the Plotkin bound for . We have

Hence, the code is an optimal LRC code.

Interestingly, to prove the optimality of the codes from Examples 2 and 3, we used a set of dimension in the bounds (6) and (8). However, neither code reaches the Singleton-type bound (9) obtained via a set of the same dimension. This is because the bound introduces an extra dependence on the field size. Indeed, there is no binary code with parameters , while an MDS code with these parameters already exists over .

Finding a good family of LRC codes achieving the bounds (6) or (8) is left for future work.

References

  • [1] A. Dimakis, P. B. Godfrey, Y. Wu, M. J. Wainwright, and K. Ramchandran, “Network coding for distributed storage systems,” IEEE Transactions on Information Theory, vol. 56, no. 9, pp. 4539–4551, 2010.
  • [2] V. Guruswami and M. Wootters, “Repairing Reed-Solomon codes,” IEEE Transactions on Information Theory, vol. 63, pp. 5684–5698, 2016.
  • [3] A. S. Rawat, I. Tamo, V. Guruswami, and K. Efremenko, “MDS code constructions with small sub-packetization and near-optimal repair bandwidth,” IEEE Transactions on Information Theory, vol. 64, pp. 6506–6525, 2017.
  • [4] P. Gopalan, C. Huang, H. Simitci, and S. Yekhanin, “On the locality of codeword symbols,” IEEE Transactions on Information Theory, vol. 58, no. 11, pp. 6925–6934, 2012.
  • [5] D. Papailiopoulos and A. Dimakis, “Locally repairable codes,” in International Symposium on Information Theory.   IEEE, 2012, pp. 2771–2775.
  • [6] N. Prakash, G. M. Kamath, V. Lalitha, and P. V. Kumar, “Optimal linear codes with a local-error-correction property,” in International Symposium on Information Theory.   IEEE, 2012, pp. 2776–2780.
  • [7] G. M. Kamath, N. Prakash, V. Lalitha, and P. V. Kumar, “Codes with local regeneration,” 2013 Information Theory and Applications Workshop (ITA), pp. 1–5, 2013.
  • [8] A. Wang and Z. Zhang, “Repair locality with multiple erasure tolerance,” IEEE Transactions on Information Theory, vol. 60, no. 11, pp. 6979–6987, 2014.
  • [9] N. Prakash, V. Lalitha, and P. V. Kumar, “Codes with locality for two erasures,” 2014 IEEE International Symposium on Information Theory, pp. 1962–1966, 2014.
  • [10] A. S. Rawat, A. Mazumdar, and S. Vishwanath, “Cooperative local repair in distributed storage,” EURASIP J. Adv. Sig. Proc., vol. 2015, p. 107, 2015.
  • [11] A. Mazumdar, “Storage capacity of repairable networks,” IEEE Transactions on Information Theory, vol. 61, pp. 5810–5821, 2015.
  • [12] A. S. Rawat, D. S. Papailiopoulos, A. G. Dimakis, and S. Vishwanath, “Locality and availability in distributed storage,” IEEE Transactions on Information Theory, vol. 62, no. 8, pp. 4481–4493, 2016.
  • [13] I. Tamo, A. Barg, and A. Frolov, “Bounds on the parameters of locally recoverable codes,” IEEE Transactions on Information Theory, vol. 62, no. 6, pp. 3070–3083, 2016.
  • [14] P. Huang, E. Yaakobi, H. Uchikawa, and P. H. Siegel, “Binary linear locally repairable codes,” IEEE Transactions on Information Theory, vol. 62, pp. 5296–5315, 2016.
  • [15] C. Huang, M. Chen, and J. Lin, “Pyramid codes: Flexible schemes to trade space for access efficiency in reliable data storage systems,” in International Symposium on Network Computation and Applications.   IEEE, 2007, pp. 79–86.
  • [16] G. M. Kamath, N. Prakash, V. Lalitha, P. V. Kumar, N. Silberstein, A. S. Rawat, O. O. Koyluoglu, and S. Vishwanath, “Explicit MBR all-symbol locality codes,” 2013 IEEE International Symposium on Information Theory, pp. 504–508, 2013.
  • [17] A. S. Rawat, O. O. Koyluoglu, N. Silberstein, and S. Vishwanath, “Optimal locally repairable and secure codes for distributed storage systems,” IEEE Transactions on Information Theory, vol. 60, pp. 212–236, 2014.
  • [18] I. Tamo, D. Papailiopoulos, and A. Dimakis, “Optimal locally repairable codes and connections to matroid theory,” IEEE Transactions on Information Theory, vol. 62, pp. 6661–6671, 2016.
  • [19] I. Tamo and A. Barg, “A family of optimal locally recoverable codes,” IEEE Transactions on Information Theory, vol. 60, no. 8, pp. 4661–4676, 2014.
  • [20] S. Goparaju and A. R. Calderbank, “Binary cyclic codes that are locally repairable,” 2014 IEEE International Symposium on Information Theory, pp. 676–680, 2014.
  • [21] T. Ernvall, T. Westerbäck, R. Freij-Hollanti, and C. Hollanti, “Constructions and properties of linear locally repairable codes,” IEEE Transactions on Information Theory, vol. 62, pp. 5296–5315, 2016.
  • [22] T. Westerbäck, R. Freij-Hollanti, T. Ernvall, and C. Hollanti, “On the combinatorics of locally repairable codes via matroid theory,” IEEE Transactions on Information Theory, vol. 62, pp. 5296–5315, 2016.
  • [23] R. Freij-Hollanti, C. Hollanti, and T. Westerbäck, “Matroid theory and storage codes: bounds and constructions,” 2017, arXiv: 1704.0400.
  • [24] V. Cadambe and A. Mazumdar, “An upper bound on the size of locally recoverable codes,” in International Symposium on Network Coding, 2013, pp. 1–5.
  • [25] N. Silberstein and A. Zeh, “Anticode-based locally repairable codes with high availability,” Des. Codes Cryptography, vol. 86, pp. 419–445, 2018.
  • [26] A. Zeh and E. Yaakobi, “Optimal linear and cyclic locally repairable codes over small fields,” 2015 IEEE Information Theory Workshop (ITW), pp. 1–5, 2015.
  • [27] A. Agarwal, A. Barg, S. Hu, A. Mazumdar, and I. Tamo, “Combinatorial alphabet-dependent bounds for locally recoverable codes,” IEEE Transactions on Information Theory, vol. 64, pp. 3481–3492, 2018.
  • [28] M. Grezet, R. Freij-Hollanti, T. Westerbäck, and C. Hollanti, “Bounds on binary locally repairable codes tolerating multiple erasures,” in The International Zurich Seminar on Information and Communication (IZS 2018) Proceedings.   ETH Zürich, 2018, pp. 103–107.
  • [29] J. H. Griesmer, “A bound for error-correcting codes,” IBM Journal of Research and Development, vol. 4, pp. 532–542, 1960.
  • [30] G. Solomon and J. J. Stiffler, “Algebraically punctured cyclic codes,” Information and Control, vol. 8, pp. 170–179, 1965.
  • [31] R. J. McEliece, E. R. Rodemich, H. Rumsey, and L. R. Welch, “New upper bounds on the rate of a code via the Delsarte-MacWilliams inequalities,” IEEE Transactions on Information Theory, vol. 23, pp. 157–166, 1977.
  • [32] T. Chan, A. J. Grant, and T. Britz, “Properties of quasi-uniform codes,” 2010 IEEE International Symposium on Information Theory, pp. 1153–1157, 2010.
  • [33] T. Westerbäck, M. Grezet, R. Freij-Hollanti, and C. Hollanti, “On the polymatroidal structure of quasi-uniform codes with applications to heterogeneous distributed storage,” in International Symposium on Mathematical Theory of Networks and Systems, 2018.