# Computing sharp recovery structures for Locally Recoverable codes

A locally recoverable code is an error-correcting code such that any erasure in a single coordinate of a codeword can be recovered from a small subset of other coordinates. In this article we develop an algorithm that computes a recovery structure as concise as possible for an arbitrary linear code C and a recovery method that realizes it. This algorithm also provides the locality and the dual distance of C. Complexity issues are studied as well. Several examples are included.

## 1. Introduction

Locally recoverable codes were introduced in [7], motivated by the use of coding theory techniques applied to distributed and cloud storage systems, in which the information is distributed over several nodes. The growth of the amount of stored data makes the loss of information due to node failures a major problem. To obtain reliable storage, when a node fails we want to be able to recover the data it contains by using information from the other nodes. This is called the repair problem. A naive method to solve it is to replicate the same information in different nodes. A more clever strategy is to protect the data by using error-correcting codes, [13, 15]. As typical examples of this last solution, we can mention Google and Facebook, which use Reed-Solomon (RS) codes in their storage systems. The procedure is as follows: the information to be stored is a long sequence of symbols, which are elements of a finite field $\mathbb{F}_p$. This sequence is cut into blocks, $x_1, x_2, \dots$, of the same length, say $m$. According to the isomorphism $\mathbb{F}_p^m \cong \mathbb{F}_{p^m}$, each of these blocks can be seen as an element of the finite field $\mathbb{F}_q$, $q = p^m$. Fix an integer $k$. The vector $(x_1, \dots, x_k) \in \mathbb{F}_q^k$ is encoded by using a RS code of dimension $k$ over $\mathbb{F}_q$, whose length $n$, $k < n \le q$, is equal to the number of nodes that will be used in its storage. Then we choose distinct evaluation points $\alpha_1, \dots, \alpha_n \in \mathbb{F}_q$, and send the $i$-th coordinate of the resulting codeword to the $i$-th node. When a node fails, we may recover the data it stores by using Lagrangian interpolation from the information of any other $k$ available nodes.
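The repair procedure just described can be sketched in a few lines. This is only an illustration under stated assumptions: the field is taken to be a prime field (`P = 13`) so that arithmetic is plain modular arithmetic, the evaluation points are `1, ..., 9`, and the names `rs_encode` and `lagrange_repair` are hypothetical helpers, not part of any storage system mentioned in the text.

```python
P = 13  # assumed prime field size, so field arithmetic is plain arithmetic mod P


def rs_encode(message, points, p=P):
    """Evaluate the message polynomial f(z) = sum_j m_j z^j at every node's point."""
    return [sum(m * pow(a, j, p) for j, m in enumerate(message)) % p for a in points]


def lagrange_repair(known, target, p=P):
    """Recover f(target) from k pairs (a_j, f(a_j)) by Lagrange interpolation."""
    total = 0
    for j, (aj, yj) in enumerate(known):
        num, den = 1, 1
        for l, (al, _) in enumerate(known):
            if l != j:
                num = num * (target - al) % p
                den = den * (aj - al) % p
        # pow(den, p - 2, p) is the inverse of den modulo the prime p (Fermat)
        total = (total + yj * num * pow(den, p - 2, p)) % p
    return total


message = [3, 1, 4, 1]            # k = 4 information symbols
points = list(range(1, 10))       # n = 9 nodes, one evaluation point each
stored = rs_encode(message, points)

lost = 5                          # index of the failed node
helpers = [(points[i], stored[i]) for i in (0, 2, 3, 7)]   # any k other nodes
repaired = lagrange_repair(helpers, points[lost])
```

Note that repairing a single symbol here reads `k = 4` helper nodes, which is exactly the cost that locally recoverable codes try to reduce.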

The above solution to the repair problem is not optimal. When $k$ is small with respect to $n$, the transmission rate obtained by our encoding method is poor. For large values of $k$ the scheme is wasteful since $k$ symbols must be used to repair just one. Thus it is natural to wonder if there exist other codes allowing the repair of lost encoded data more efficiently than RS codes, that is, by making use of a smaller amount of information.

Roughly speaking we can set the repair problem in terms of coding theory as follows: Let $C$ be a linear code of length $n$, dimension $k$ and minimum distance $d$ over the field $\mathbb{F}_q$. A coordinate $i \in \{1, \dots, n\}$ is locally recoverable with locality $r$ if there is a recovery set $R_i \subseteq \{1, \dots, n\}$ with $i \notin R_i$ and $\#R_i = r$, such that for any codeword $x \in C$, an erasure in position $i$ of $x$ can be recovered by using the information given by the coordinates of $x$ with indices in $R_i$. A collection of recovery sets for all coordinates is a recovery structure. The code $C$ is locally recoverable (LRC) with locality $r$ if there exists a recovery structure of locality $r$, that is to say, if any coordinate is locally recoverable with locality at most $r$. The locality of $C$, $\mathrm{loc}(C)$, is the smallest $r$ verifying this condition. For example, it is not difficult to prove that MDS codes of dimension $k$ have locality $k$.

Every code with minimum distance $d > 1$ is locally recoverable with locality $k$. In practice we are interested in LRCs admitting recovery sets as small as possible, in relation to the other parameters of $C$. Thus the locality has become a fundamental parameter of a code when it is used for local recovery purposes. Unfortunately the explicit computation of recovery structures, and even the computation of the locality of a specific code, have proved to be difficult problems. As regards the latter, there exist some known bounds on it. Perhaps the most interesting of them is the following Singleton-like bound: the locality of $C$ verifies the relation, [7],

 (1) $k + d + \left\lceil \dfrac{k}{\mathrm{loc}(C)} \right\rceil \le n + 2$

which gives a lower bound on $\mathrm{loc}(C)$. However it is known that this bound is not sharp, see Example 1 below. Codes reaching equality in (1) are called optimal.
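As a small illustration of how (1) yields a lower bound on the locality, the following sketch searches for the smallest $r$ compatible with the inequality. The function names are hypothetical and the search range $1 \le r \le k$ relies on the fact, stated above, that a code with $d > 1$ has locality at most $k$.

```python
def satisfies_bound(n, k, d, r):
    """Check the Singleton-like bound k + d + ceil(k / r) <= n + 2."""
    ceil_k_over_r = -(-k // r)   # integer ceiling, avoids floating point
    return k + d + ceil_k_over_r <= n + 2


def locality_lower_bound(n, k, d):
    """Smallest r in 1..k compatible with the bound, or None if there is none."""
    for r in range(1, k + 1):
        if satisfies_bound(n, k, d, r):
            return r
    return None


def is_optimal(n, k, d, r):
    """A code with locality r is called optimal when the bound is an equality."""
    return k + d + (-(-k // r)) == n + 2
```

For instance, for parameters $[9, 4, 5]$ the bound only gives $\mathrm{loc}(C) \ge 2$, while for an MDS code, where $d = n - k + 1$, it forces $\mathrm{loc}(C) \ge k$.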

Much research has been devoted in recent years to the repair problem and many recovery structures are known for different types of codes. See for instance [4, 9, 16, 17] to be aware of the variety of methods used to that purpose. Nevertheless for most of the recovery structures currently available in the literature, it is unknown whether or not they can be refined to obtain other simpler ones.

In this article we develop an algorithm that computes a recovery structure as concise as possible for an arbitrary code $C$. This algorithm also gives the locality and the dual distance of $C$. The article is organized as follows: in Section 2 we summarize all necessary facts about LRC codes and recovery structures that we shall need in the rest of the article. The algorithm is developed in Section 3, where complexity issues are treated as well. Finally, in Section 4 we present some experimental results and running times for several examples of codes.

## 2. Recovery Structures

In this section we give some formal definitions and facts that will be used in the rest of this article.

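Let $C$ be a linear code of length $n$ over $\mathbb{F}_q$. Let $G$ be a generator matrix of $C$ and $g_1, \dots, g_n$ be its columns. A set $R \subseteq \{1, \dots, n\}$ with $i \notin R$, is a recovery set for coordinate $i$ if $g_i$ is a linear combination of $\{g_j : j \in R\}$, see [7]. Note that this is equivalent to saying that $\dim \pi_R(C) = \dim \pi_{R \cup \{i\}}(C)$, where $\pi_R(C)$ is the projection of $C$ on the coordinates in $R$ (see Proposition 1 below). Thus the notion of recovery set does not depend on the chosen generator matrix. As we shall prove later, if $R$ is a recovery set for $i$, then for every codeword $x \in C$, the coordinate $x_i$ can be obtained from the other coordinates $x_j$ with indices $j \in R$.

By an elementary recovery structure for $C$ we mean a family $\mathcal{R} = \{R_1, \dots, R_n\}$, such that for all $i$, $R_i$ is a recovery set for the coordinate $i$. Thus a recovery structure allows us to recover an erasure at any position in a codeword. The structure $\mathcal{R}$ is called minimal if so is each $R_i$, that is, if no proper subset of $R_i$ is a recovery set for $i$. A general recovery structure for $C$ is the union of elementary structures, that is to say a collection of recovery sets for each coordinate. From now on, all structures considered in this article will be elementary.

The code $C$ is locally recoverable (LRC) if it admits a recovery structure $\mathcal{R}$. In such case, the number $\mathrm{loc}_i(\mathcal{R}) = \#R_i$ is the locality of $\mathcal{R}$ with respect to coordinate $i$, $i = 1, \dots, n$. The locality of $\mathcal{R}$ is $\mathrm{loc}(\mathcal{R}) = \max\{\mathrm{loc}_1(\mathcal{R}), \dots, \mathrm{loc}_n(\mathcal{R})\}$. As different recovery structures are possible for the same code, it is natural to ask if given one of them, $\mathcal{R}$, there exists another, $\mathcal{R}'$, with smaller recovery sets. So we define $\mathrm{loc}_i(C) = \min\{\mathrm{loc}_i(\mathcal{R}) : \mathcal{R} \text{ is a recovery structure for } C\}$, $i = 1, \dots, n$, and $\mathrm{loc}(C) = \max\{\mathrm{loc}_1(C), \dots, \mathrm{loc}_n(C)\}$. If $\mathrm{loc}_i(\mathcal{R}) = \mathrm{loc}_i(C)$ for all $i$, then $\mathcal{R}$ will be called sharp. Clearly $\mathcal{R}$ is sharp if and only if $\mathrm{loc}_i(\mathcal{R}) \le \mathrm{loc}_i(\mathcal{R}')$ for all $i$ and any structure $\mathcal{R}'$ of $C$, so the locality of a code is always reached through a sharp structure.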

###### Remark 1.

A structure $\mathcal{R}$ of $C$ is optimal if $\mathrm{loc}(\mathcal{R})$ reaches equality in the following bound

 (2) $k + d + \left\lceil \dfrac{k}{\mathrm{loc}(\mathcal{R})} \right\rceil \le n + 2$

which is derived from (1). Note that optimality is not enough to ensure sharpness. For instance, in [16, Example 1] the authors show an LRC code with a recovery structure $\mathcal{R}$ formed by sets of cardinality 2. Then $\mathcal{R}$ is optimal. Let $\mathcal{R}'$ be the structure whose sets are obtained by adding a (random) coordinate to the recovery sets of $\mathcal{R}$. Thus $\lceil k / \mathrm{loc}(\mathcal{R}') \rceil = \lceil k / \mathrm{loc}(\mathcal{R}) \rceil$ and so $\mathcal{R}'$ is optimal as well, but not sharp. This example also shows that the locality of a code cannot, in general, be obtained from (1), not even when an optimal structure is available.

Let $d$ be the minimum distance of $C$. If $d = 1$ then, up to reordering, $C$ contains the codeword $(1, 0, \dots, 0)$, so the first coordinate cannot have any recovery set and $C$ is not an LRC. At the other end, if there exists a coordinate $i$ such that $x_i = 0$ for every codeword $x \in C$ (that is, if $C$ is a degenerate code), it is not necessary to recover this coordinate from the others. So in all that follows we will assume that $C$ is a nondegenerate code of minimum distance $d > 1$. Let us investigate in a little more detail the recovering properties of these codes. As a notation, $C^\perp$ will be the dual of $C$. The support of a vector $x$ is the set $\mathrm{supp}(x) = \{i : x_i \neq 0\}$ and its weight is $\mathrm{wt}(x) = \#\mathrm{supp}(x)$.

###### Proposition 1.

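Let $C$ be a code of length $n$ and let $R \subseteq \{1, \dots, n\}$, $i \notin R$. The following statements are equivalent.
(i) $R$ is a recovery set for coordinate $i$;
(ii) $\dim \pi_R(C) = \dim \pi_{R \cup \{i\}}(C)$, where $\pi_R$ denotes the projection on the coordinates in $R$;
(iii) there exists a codeword $w \in C^\perp$ such that $i \in \mathrm{supp}(w) \subseteq R \cup \{i\}$.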

###### Proof.

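The equivalence between (i) and (ii) follows from the fact that a generator matrix of $\pi_R(C)$ can be obtained from the submatrix of $G$ given by the columns with indices in $R$, and so $\dim \pi_R(C) = \mathrm{rank}\{g_j : j \in R\}$. Let $w \in \mathbb{F}_q^n$. The equivalence between (i) and (iii) follows from the fact that $w \in C^\perp$ if and only if $G w^T = 0$, that is if and only if $\sum_{j=1}^{n} w_j g_j = 0$. ∎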

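Given a word $w \in C^\perp$, for any codeword $x \in C$ we have that $w \cdot x = 0$, where $\cdot$ stands for the usual inner product in $\mathbb{F}_q^n$, $w \cdot x = \sum_{j=1}^{n} w_j x_j$. So, if $i \in \mathrm{supp}(w)$, then $x_i$ can be obtained from the coordinates $x_j$, $j \in \mathrm{supp}(w) \setminus \{i\}$, as

 (3) $x_i = -w_i^{-1} \sum_{j \in \mathrm{supp}(w) \setminus \{i\}} w_j x_j$,

that is, $x_i = -w_i^{-1}(w \cdot x)$ when the inner product is computed with the erased coordinate $x_i$ set to zero.

Thus such $w$ provides a recovery set and a recovery method for coordinate $i$.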
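A minimal sketch of the recovery rule (3), assuming a prime field $\mathbb{F}_p$ so that inverses can be computed with Fermat's little theorem. The function name `recover` and the toy vectors below are illustrative choices, not taken from the article.

```python
def recover(w, x, i, p):
    """Recover the erased coordinate x[i] from a dual word w with w[i] != 0.

    x[i] may be None (erased); only coordinates j != i in supp(w) are read.
    """
    if w[i] % p == 0:
        raise ValueError("coordinate i must lie in the support of w")
    partial = sum(wj * xj for j, (wj, xj) in enumerate(zip(w, x))
                  if j != i and wj % p != 0) % p
    return (-pow(w[i], p - 2, p) * partial) % p


# Toy data over F_7: w . x = 1*1 + 2*1 + 0*5 + 3*6 = 21 = 0 (mod 7)
w = (1, 2, 0, 3)
x = (1, 1, 5, 6)
erased = (1, None, 5, 6)          # coordinate 1 (0-based) has been lost
value = recover(w, erased, 1, 7)
```

Here the recovery set for the erased coordinate is $\mathrm{supp}(w) \setminus \{i\}$, so the third coordinate of `x` is never read.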

###### Corollary 1.

Let $w \in C^\perp$, $w \neq 0$. Then $\mathrm{supp}(w) \setminus \{i\}$ is a recovery set for $i$, for all $i \in \mathrm{supp}(w)$.

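A codeword $x \in C$ is called minimal if for every $y \in C$, $y \neq 0$, such that $\mathrm{supp}(y) \subseteq \mathrm{supp}(x)$ we have $\mathrm{supp}(y) = \mathrm{supp}(x)$.

Given a coordinate $i$, the codeword $x$ is said to be $i$-minimal if $i \in \mathrm{supp}(x)$ and for every $y \in C$ such that $i \in \mathrm{supp}(y) \subseteq \mathrm{supp}(x)$ we have $\mathrm{supp}(y) = \mathrm{supp}(x)$.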

###### Lemma 1.

Let $x \in C$, $x \neq 0$. The following conditions are equivalent.
(i) $x$ is minimal;
(ii) $x$ is $i$-minimal for all $i \in \mathrm{supp}(x)$;
(iii) $x$ is $i$-minimal for some $i$.

###### Proof.

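(i) ⇒ (ii) ⇒ (iii) are clear. Let us prove (iii) ⇒ (i). Let $x$ be $i$-minimal for some $i$. If $x$ were not minimal, there would exist $y \in C$, $y \neq 0$, such that $\mathrm{supp}(y) \subsetneq \mathrm{supp}(x)$. Furthermore $i \notin \mathrm{supp}(y)$, since $x$ is $i$-minimal. Let $j \in \mathrm{supp}(y)$ and let $z = x - x_j y_j^{-1} y$. Then $i \in \mathrm{supp}(z) \subseteq \mathrm{supp}(x)$ and $j \notin \mathrm{supp}(z)$, which contradicts that $x$ is $i$-minimal. ∎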
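The two notions of minimality can be checked by brute force on a small code. The sketch below works over $\mathbb{F}_2$ only and enumerates the whole code, so it is meant as an illustration of the definitions and of Lemma 1, not as a practical method. The generator matrix and all function names are assumptions made for the example.

```python
from itertools import product


def span_gf2(G):
    """All codewords of the binary code generated by the rows of G."""
    n = len(G[0])
    return {tuple(sum(c * row[j] for c, row in zip(cs, G)) % 2 for j in range(n))
            for cs in product((0, 1), repeat=len(G))}


def supp(v):
    return frozenset(j for j, a in enumerate(v) if a)


def is_minimal(x, code):
    """No nonzero codeword has support strictly inside supp(x)."""
    return all(supp(y) == supp(x) for y in code if any(y) and supp(y) <= supp(x))


def is_i_minimal(x, i, code):
    """i lies in supp(x) and no codeword through i has strictly smaller support."""
    return i in supp(x) and all(supp(y) == supp(x)
                                for y in code if i in supp(y) and supp(y) <= supp(x))


def lemma_holds(code, n):
    """Check the three equivalent conditions of Lemma 1 on every nonzero word."""
    for x in code:
        if not any(x):
            continue
        a = is_minimal(x, code)
        b = all(is_i_minimal(x, i, code) for i in supp(x))
        c = any(is_i_minimal(x, i, code) for i in range(n))
        if not (a == b == c):
            return False
    return True


code = span_gf2([(1, 1, 0, 0), (0, 0, 1, 1)])
minimal_words = {x for x in code if any(x) and is_minimal(x, code)}
```

In this toy code the word $(1,1,1,1)$ is not minimal, since its support strictly contains the supports of the two generators.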

###### Corollary 2.

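Let $C$ be a code of length $n$ and let $R \subseteq \{1, \dots, n\}$, $i \notin R$. The following statements are equivalent.
(i) $R$ is a minimal recovery set for coordinate $i$;
(ii) there exists a minimal codeword $w \in C^\perp$ such that $\mathrm{supp}(w) = R \cup \{i\}$.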

###### Proof.

(i) ⇒ (ii): Let $w$ be the codeword ensured by item (iii) of Proposition 1. If either $w$ were not $i$-minimal or $\mathrm{supp}(w) \neq R \cup \{i\}$, then there would exist $w' \in C^\perp$ such that $i \in \mathrm{supp}(w') \subsetneq R \cup \{i\}$. According to Corollary 1, $\mathrm{supp}(w') \setminus \{i\}$ is a recovery set for $i$, which contradicts that $R$ is minimal. Lemma 1 ensures that $w$ is a minimal word of $C^\perp$. (ii) ⇒ (i): $R$ is a recovery set for $i$, again according to Corollary 1. If $R$ were not minimal, then neither would $w$ be $i$-minimal, according to Proposition 1. ∎

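So a recovery structure $\mathcal{R}$ for $C$ is equivalent to a family $\{w_1, \dots, w_n\}$ of words of $C^\perp$ such that $i \in \mathrm{supp}(w_i)$: just take $R_i = \mathrm{supp}(w_i) \setminus \{i\}$. Besides, as seen before, the equalities $w_i \cdot x = 0$, $i = 1, \dots, n$, provide a method to compute any erased coordinate in a word $x \in C$. $\mathcal{R}$ is minimal iff so is each $w_i$; and $\mathcal{R}$ is sharp iff each $\mathrm{wt}(w_i)$ is minimal among all words $w$ of $C^\perp$ such that $i \in \mathrm{supp}(w)$.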

###### Corollary 3.

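Let $C$ be a code and let $C^\perp$ be its dual. Then $\mathrm{loc}(C) \ge d(C^\perp) - 1$, where $d(C^\perp)$ is the minimum distance of $C^\perp$.

The bound on $\mathrm{loc}(C)$ given by the previous corollary sometimes improves the bound given by (1), which shows again that the latter is not necessarily sharp.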
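For very small binary codes, the correspondence between recovery sets and dual codewords gives a direct, exhaustive way to compute the localities: for each coordinate, take the lightest dual word whose support contains it. The sketch below does exactly this by enumerating $\mathbb{F}_2^n$. It is a brute-force baseline written for illustration (exponential in $n$), not the algorithm developed in this article, and the $[7,4]$ Hamming generator matrix is an example chosen here.

```python
from itertools import product


def span_gf2(G):
    """All codewords of the binary code generated by the rows of G."""
    n = len(G[0])
    return {tuple(sum(c * row[j] for c, row in zip(cs, G)) % 2 for j in range(n))
            for cs in product((0, 1), repeat=len(G))}


def dual_gf2(code, n):
    """All binary vectors orthogonal to every codeword (exhaustive search)."""
    return {w for w in product((0, 1), repeat=n)
            if all(sum(a * b for a, b in zip(w, c)) % 2 == 0 for c in code)}


def localities(code, n):
    """loc_i = (smallest weight of a dual word through i) - 1, or None if none exists."""
    dual = dual_gf2(code, n)
    locs = []
    for i in range(n):
        weights = [sum(w) for w in dual if w[i] == 1]
        locs.append(min(weights) - 1 if weights else None)
    return locs


# [7,4,3] binary Hamming code in systematic form (example data)
G_HAMMING = [
    (1, 0, 0, 0, 1, 1, 0),
    (0, 1, 0, 0, 1, 0, 1),
    (0, 0, 1, 0, 0, 1, 1),
    (0, 0, 0, 1, 1, 1, 1),
]
code = span_gf2(G_HAMMING)
locs = localities(code, 7)
dual_distance = min(sum(w) for w in dual_gf2(code, 7) if any(w))
```

For this code the bound (1) only gives $\mathrm{loc}(C) \ge 2$, whereas the dual distance is 4 and every coordinate has locality 3, in agreement with Corollary 3.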

###### Example 1.

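Let us consider a $[9,4]$ code $C$ over $\mathbb{F}_4 = \{0, 1, \alpha, 1+\alpha\}$ with $\alpha^2 = \alpha + 1$. According to the Griesmer bound, its minimum distance verifies that $d \le 5$. Indeed such a code with $d = 5$ exists. Consider, for example, the one given by the generator matrix

 $G = \begin{pmatrix} 1 & 0 & 0 & 0 & \alpha & 1+\alpha & \alpha & 1+\alpha & 1 \\ 0 & 1 & 0 & 0 & 1+\alpha & \alpha & 1 & 1+\alpha & 0 \\ 0 & 0 & 1 & 0 & 0 & \alpha & 1+\alpha & 1 & 1+\alpha \\ 0 & 0 & 0 & 1 & \alpha & 1 & \alpha & 0 & 1+\alpha \end{pmatrix} \in \mathbb{F}_4^{4 \times 9}.$

$C$ is obtained from the extended Quadratic Residue code by shortening twice and then puncturing; its minimum distance is $d = 5$, see [12]. From (1) its locality verifies $\mathrm{loc}(C) \ge 2$. But being an almost MDS code, its dual must be a $[9,5,4]$ code, see [6, Theorem 7]. Thus $\mathrm{loc}(C) \ge 3$. In fact, as we shall compute later in Example 5, we have $\mathrm{loc}(C) = 3$.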
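The numerical claims of this example can be checked with a few lines of arithmetic. The sketch below evaluates the Griesmer bound $n \ge \sum_{i=0}^{k-1} \lceil d / q^i \rceil$ for length 9 and dimension 4 over a field with 4 elements (the size of the matrix $G$), and then the smallest locality allowed by (1). The function names are hypothetical.

```python
def griesmer_length(k, d, q):
    """Right-hand side of the Griesmer bound: sum of ceil(d / q^i), i = 0..k-1."""
    return sum(-(-d // q**i) for i in range(k))


def max_distance_griesmer(n, k, q):
    """Largest d allowed by the Griesmer bound for given n, k, q."""
    d = 0
    while griesmer_length(k, d + 1, q) <= n:
        d += 1
    return d


def bound_1_locality(n, k, d):
    """Smallest r with k + d + ceil(k / r) <= n + 2, searching r = 1..k."""
    for r in range(1, k + 1):
        if k + d + (-(-k // r)) <= n + 2:
            return r
    return None


n, k, q = 9, 4, 4
d_max = max_distance_griesmer(n, k, q)      # Griesmer: d <= 5
singleton_defect = (n - k + 1) - d_max      # 1 means "almost MDS" when d = d_max
r_from_bound_1 = bound_1_locality(n, k, d_max)
```

The Singleton defect of a code with these parameters and $d = 5$ is 1, which is what "almost MDS" means, and the locality bound obtained from (1) is weaker than the one obtained from the dual distance.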

###### Remark 2.

There exists a remarkable connection between the repair problem and the theory of secret sharing. A secret sharing scheme (SSS) is a method for distributing a secret among a set of participants, each of whom receives a piece of the secret or share. The scheme is designed in such a way that only the authorized coalitions of participants can recover the secret, by pooling the shares of its members.

SSS can be obtained from codes in several ways. The first method to do that was given in [3]. Let $C$ be a code with generator matrix $G$. A dealer computes the codeword $c = uG$ from a random vector $u$ subject to $c_1 = s$, the secret to be shared. The share of the $i$-th participant is $c_i$, $i = 2, \dots, n$. Thus, a coalition $R$ of participants is authorized if and only if $R$ is a recovery set for coordinate $1$. The locality $\mathrm{loc}_1(C)$ is exactly the smallest size of an authorized coalition. Massey [11] later introduced the concept of minimal codeword and observed that the minimal authorized coalitions correspond to the minimal words of $C^\perp$ whose support contains the coordinate $1$, that is, to the minimal recovery sets for the coordinate $1$. Therefore, the methods developed in this article can be adapted to compute the minimal authorized coalitions in this type of SSS.

## 3. Computing Sharp Recovery Structures

In this section we develop an algorithm that provides a sharp recovery structure for an arbitrary linear code $C$, in terms of minimal codewords of its dual $C^\perp$, that is through a family $\{w_1, \dots, w_n\}$ of minimal words of $C^\perp$ such that $i \in \mathrm{supp}(w_i)$. Consequently, it also provides a method for recovery, following the formula of equation (3), the locality of $C$ and its dual distance, as the smallest weight of a word in $\{w_1, \dots, w_n\}$. The methods we will use come from the theory of Gröbner bases; however, we will try to avoid Gröbner basis terminology.

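Let $\alpha$ be a primitive element of $\mathbb{F}_q$. We define a total ordering $\prec_T$ on $\mathbb{F}_q^n$ as follows

 (4) $x \prec_T y$ if $\mathrm{wt}(x) < \mathrm{wt}(y)$, or $\mathrm{wt}(x) = \mathrm{wt}(y)$ and $\mathrm{ind}(x) \prec \mathrm{ind}(y)$,

where $\mathrm{ind}(x)$ is the vector of indices of $x$, whose $i$-th entry is determined by the exponent $j$ such that $x_i = \alpha^j$ (with a fixed convention if $x_i = 0$), and the same definition holds for the vector $y$; here $\prec$ is a fixed total order on these index vectors. Note that our ordering is mainly the order given by the Hamming weight with an extra total ordering for breaking ties. Any other total order on the indices used for breaking ties will define an alternative order that would be equally valid for our purposes.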

###### Example 2.

Let us consider the vectors and over , being a primitive element of with . Then, since and .

Next we recall some concepts from Gröbner basis theory that we will use in what follows. We say that a codeword $c$ is expressed as a syzygy if we write $c = x - y$, with $\mathrm{supp}(x) \cap \mathrm{supp}(y) = \emptyset$ and $y \prec_T x$. In this case $x$ is called the leading term of the syzygy. Every codeword $c$ can be trivially expressed in this form, by taking $x = c$ and $y = 0$. Given two vectors , we say that reduces if . Then, the vector is called the reduction of by and will be denoted by . Let . The vector is a reduction of by if for a non-negative integer there are elements and such that

 (5)

and we will denote it by . Note that, in general, a vector does not have a unique reduction by a set .

###### Example 3.

Given the set and the vector . We can give two different reductions of by :

 $x \to_{t_1} v_1 = (1,0,0,1,1,1)$ and $x \to_{t_2} v_2 = (0,0,1,0,0,0)$.

We will compute a special set of syzygies called a Gröbner test-set as follows.

1. First we list all the non-zero syzygies of the code in a list $L$ and we sort them with respect to the ordering $\prec_T$ on their leading terms.

2. Then, we go incrementally through the list $L$. We add the first element of $L$ to the set $\mathcal{T}$ and we remove from $L$ all those elements whose leading term is a multiple of the leading term of an element in $\mathcal{T}$.

3. Finally we reduce the trailing terms using the list $\mathcal{T}$.

###### Example 4.

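[Toy Example] Consider a binary code $C$ with generator matrix

 $G = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix} \in \mathbb{F}_2^{2 \times 3}.$

To obtain a Gröbner test-set $\mathcal{T}$ for $C$ we proceed as follows.

We initialize $L$ with all the non-zero syzygies of the code and we sort them w.r.t. $\prec_T$.

 $L = \{ (0,1,0)-(0,0,1),\ (1,0,0)-(0,0,1),\ (1,0,0)-(0,1,0),\ (0,1,1)-(0,0,0),\ (1,0,1)-(0,0,0),\ (1,1,0)-(0,0,0) \}$

• We add $(0,1,0)-(0,0,1)$ to $\mathcal{T}$ and we omit from $L$ all multiples of $(0,1,0)$.

• We add $(1,0,0)-(0,0,1)$ to $\mathcal{T}$ and we omit from $L$ all multiples of $(1,0,0)$.

The process ends because the list $L$ is empty. Now $\mathcal{T} = \{ (0,1,0)-(0,0,1),\ (1,0,0)-(0,0,1) \}$ is a Gröbner test-set.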

Note that $\mathcal{T}$ depends only on the code and the ordering $\prec_T$ and can be computed from a set of generators by means of the Gröbner basis procedure based on linear algebra FGLM techniques, stated in [10]. Moreover the reduction of a vector by this $\mathcal{T}$ is shown to be unique and independent of the order of the reductions in (5). Algorithm 1 gives the reduction of an element by a Gröbner test set $\mathcal{T}$.

The Gröbner test set defines a reduction that allows us to check whether a vector is in the code: this happens if and only if it reduces to the zero vector. Furthermore it allows us to compute a sharp recovery structure for $C$.

###### Proposition 2.

Let $\mathcal{T}$ be a Gröbner test set for the code $C$ and let $x \in \mathbb{F}_q^n$. Then $x \in C$ if and only if $x$ reduces to $0$ by $\mathcal{T}$.

###### Proof.

For a proof see [10]. ∎

The key idea on which our algorithm is based is the following result.

###### Proposition 3.

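Let $\mathcal{T}$ be a Gröbner test set for the code $C^\perp$. For each coordinate $i$ let $\mathcal{T}_i = \{t \in \mathcal{T} : i \in \mathrm{supp}(t)\}$ and let $t_i$ be an element of minimum Hamming weight in $\mathcal{T}_i$. Then $\mathcal{R} = \{\mathrm{supp}(t_i) \setminus \{i\} : i = 1, \dots, n\}$ is a sharp recovery structure for $C$.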

###### Proof.

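Let $m_i$ be the smallest $i$-minimal codeword with respect to the total ordering $\prec_T$. We will prove that $m_i \in \mathcal{T}$. Note that since $m_i$ is in the code then $m_i \to_{\mathcal{T}} 0$. There is a non-negative integer $s$ such that the set of reductions in Algorithm 1 for $m_i$ can be expressed in the form

 (6) $m_i \to_{\lambda_1 \cdot t_1} v_1, \quad v_1 \to_{\lambda_2 \cdot t_2} v_2, \quad \dots, \quad v_{s-1} \to_{\lambda_s \cdot t_s} 0.$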

Thus there must be a vector , such that it is the first one involved in the chain of reductions in (6) with . Indeed it is clear that since is in the support of the codewords and which contradicts the minimality of with respect to the total ordering $\prec_T$. Without loss of generality we can suppose that .

Let and note that since . Suppose now that , hence is expressed as the syzygy . The syzygy can be used to reduce as follows , since without cancelling the $i$-th position, which is a contradiction with the fact that since the test set contains syzygies that cannot be reduced. Hence and we are done. ∎

According to Proposition 3, the minimal words providing a sharp recovery structure for $C$ can always be found in a test set $\mathcal{T}$, and it is not necessary to look for them in the whole code $C^\perp$. This idea is used in Algorithm 2, which provides such a sharp recovery structure for $C$. Roughly speaking, this algorithm applies Gaussian elimination to a large sparse matrix (namely the working list in this case), taking into account that we stop once we have enough codewords to cover all indices in $\{1, \dots, n\}$. Note that it only requires a parity check matrix of the code to run.

###### Remark 3.

If $q = 2$, then the parity check matrix of some codes can be seen as the adjacency matrix of a non-directed graph. In these cases, Proposition 3 ensures that the procedure in Algorithm 2 performs Horton's algorithm for computing a minimum cycle basis of the graph [5, 8]. Thus, our Algorithm 2 extends Horton's algorithm from cycles in graphs to recovery sets in codes.

###### Example 5.

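Let us consider again the code $C$ of Example 1. A parity check matrix of $C$ is easily obtained from its generator matrix. The Gröbner test set $\mathcal{T}$ consists of 49 codewords. The following four among them define a sharp recovery structure for $C$

 $t_1 = (\alpha, \alpha, \alpha+1, 0, 0, 0, 0, \alpha+1, 0)$, $\mathrm{supp}(t_1) = \{1,2,3,8\}$
 $t_2 = (\alpha, \alpha+1, 0, \alpha, 1, 0, 0, 0, 0)$, $\mathrm{supp}(t_2) = \{1,2,4,5\}$
 $t_3 = (\alpha, 1, 0, 0, 0, 1, \alpha+1, 0, 0)$, $\mathrm{supp}(t_3) = \{1,2,6,7\}$
 $t_4 = (\alpha, 0, 1, 1, 0, 0, 0, 0, \alpha)$, $\mathrm{supp}(t_4) = \{1,3,4,9\}$

with

 $R_1 = \mathrm{supp}(t_1) \setminus \{1\}$, $R_2 = \mathrm{supp}(t_1) \setminus \{2\}$, $R_3 = \mathrm{supp}(t_1) \setminus \{3\}$,
 $R_4 = \mathrm{supp}(t_2) \setminus \{4\}$, $R_5 = \mathrm{supp}(t_2) \setminus \{5\}$, $R_6 = \mathrm{supp}(t_3) \setminus \{6\}$,
 $R_7 = \mathrm{supp}(t_3) \setminus \{7\}$, $R_8 = \mathrm{supp}(t_1) \setminus \{8\}$, $R_9 = \mathrm{supp}(t_4) \setminus \{9\}$.

In particular, $\mathrm{loc}(C) = 3$ and $d(C^\perp) = 4$, as announced in Example 1.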
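The recovery structure of this example can be checked numerically. The sketch below encodes $\mathbb{F}_4$ as the integers 0, 1, 2, 3 (with 2 standing for $\alpha$ and 3 for $\alpha+1$, an encoding chosen here), verifies that the four words are orthogonal to the rows of the generator matrix of Example 1, and then repairs every coordinate of a codeword with formula (3). The fourth word is entered so that its support is $\{1,3,4,9\}$, as stated for $t_4$. The message `u` is arbitrary.

```python
A, B = 2, 3                      # A = alpha, B = alpha + 1 (encoding assumed here)
EXP = [1, A, B]                  # alpha^0, alpha^1, alpha^2
LOG = {1: 0, A: 1, B: 2}


def mul(a, b):
    """Multiplication in F_4 through discrete logarithms."""
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % 3]


def inv(a):
    return EXP[(-LOG[a]) % 3]


def dot(u, v):
    s = 0
    for a, b in zip(u, v):
        s ^= mul(a, b)           # addition in characteristic 2 is XOR
    return s


G = [
    [1, 0, 0, 0, A, B, A, B, 1],
    [0, 1, 0, 0, B, A, 1, B, 0],
    [0, 0, 1, 0, 0, A, B, 1, B],
    [0, 0, 0, 1, A, 1, A, 0, B],
]
WORDS = [
    [A, A, B, 0, 0, 0, 0, B, 0],   # t1, support {1,2,3,8}
    [A, B, 0, A, 1, 0, 0, 0, 0],   # t2, support {1,2,4,5}
    [A, 1, 0, 0, 0, 1, B, 0, 0],   # t3, support {1,2,6,7}
    [A, 0, 1, 1, 0, 0, 0, 0, A],   # t4, support {1,3,4,9}
]
# word used for each coordinate 1..9, following the sets R_1, ..., R_9
WORD_FOR_COORD = [0, 0, 0, 1, 1, 2, 2, 0, 3]


def recover(w, x, i):
    """Formula (3); the minus sign disappears in characteristic 2."""
    partial = 0
    for j, (wj, xj) in enumerate(zip(w, x)):
        if j != i:
            partial ^= mul(wj, xj)
    return mul(inv(w[i]), partial)


u = [1, A, 0, B]                                   # arbitrary message
x = [dot(u, col) for col in zip(*G)]               # codeword x = uG
repaired = [recover(WORDS[WORD_FOR_COORD[i]], x, i) for i in range(9)]
in_dual = all(dot(t, row) == 0 for t in WORDS for row in G)
```

Each repair reads only the three coordinates in the corresponding set $R_i$, since the remaining entries of the dual word are zero.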

###### Proposition 4.

Algorithm 2 is correct and provides a sharp recovery structure for the code $C$ with parity check matrix $H$.

###### Proof.

The first stage involves the initialization of the algorithm with a list whose elements are the rows of the parity check matrix $H$ and all their non-zero scalar multiples, that is, a list with $(q-1)(n-k)$ items. Now, the list that we will use during the algorithm is initialized with the elements of and we add to all these elements the zero vector as label (to represent the trivial syzygies of the elements in the list). In another list, namely , we have all vectors of of weight less than or equal to , and we sort this list w.r.t. $\prec_T$.

Then, at each step we remove the first element from the list . If any of the vectors with being an item of coincides with the -th element of then, the difference between and the label corresponding to forms a codeword of minimal support of the dual code of . Otherwise, we add the items: as new elements of the list with the vector as label (i.e. we add the new reduced syzygies). We repeat this process until we get enough codewords to achieve a sharp recovery structure for the code $C$. ∎

###### Proposition 5.

Let $C$ be a linear code of length $n$ and dimension $k$ over $\mathbb{F}_q$. If the ground field operations need one unit time, then Algorithm 2 applied to $C$ takes time $O(D\, n\, (n-k)(q-1)\log(q))$, where $D$ is the total number of iterations of the algorithm.

###### Proof.

The proof is similar to the proof of [10, Theorem 4.3], adapted to the changes made in Algorithm 2 versus [10, Algorithm 2]. The hardest part of our algorithm is the management of the working list. In each main loop iteration, up to $(q-1)(n-k)$ new elements are added to the list, then compared, and finally redundancy is eliminated. Note that comparing two vectors in $\mathbb{F}_q^n$ requires $O(n \log(q))$ operations. At iteration $j$, after inserting the new elements in the list, we have at most $(q-1)(n-k) + j(q-1)(n-k) - j$ elements. Here, the first summand corresponds to the elements that initialized the list, while the remaining ones come from the fact that at each iteration the first element is removed and we add $(q-1)(n-k)$ new elements. If $D$ is an upper bound for the number of iterations of Algorithm 2, this gives a total time of order

 $O\big(n \log(q)\,((q-1)(n-k) + D(q-1)(n-k) - D)\big) \sim O\big(D\, n\, (n-k)(q-1)\log(q)\big).$

###### Remark 4.

(1) The number of iterations in Algorithm 2, $D$, is upper bounded by the fact that the weight of a minimal codeword is at most $n-k+1$ [1, Lemma 2.1]. Thus, it follows that

 $D \le \sum_{i=0}^{n-k+1} \binom{n}{i} (q-1)^i.$

(2) Note that this algorithm also provides the dual distance of $C$, as the smallest weight of one of the minimal words in the obtained recovery structure of $C$. Recall that computing the minimum distance of a linear code is an NP-complete problem, [2], which explains the high complexity of our algorithm. Note, however, that as in the case of the minimum distance, a recovery structure must be calculated only once per code.
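To get a feeling for the size of the bound on $D$ in item (1), it can be evaluated directly. The helper name below is hypothetical, and the parameters $n = 9$, $k = 4$, $q = 4$ are those of the code of Example 1, used here only as sample input.

```python
from math import comb


def iteration_bound(n, k, q):
    """Sum of C(n, i) * (q - 1)^i for i = 0, ..., n - k + 1."""
    return sum(comb(n, i) * (q - 1) ** i for i in range(n - k + 2))


bound = iteration_bound(9, 4, 4)
ambient = 4 ** 9                  # total number of vectors of length 9 over F_4
```

This is a worst-case count of vectors of weight at most $n-k+1$. It is smaller than the size of the ambient space, but it still grows exponentially with $n$.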

## 4. Experimental Results

Algorithm 2 has been implemented in SageMath [14]. In the following tables we summarize the average running times for several examples of codes, obtained with an Intel Core™ 2 Duo GHz. The experiments are performed as follows: We first generate a full rank random matrix $G$ of size $k \times n$ over $\mathbb{F}_q$ using the command random_matrix(GF(q),k,n); then we take the corresponding code $C$ and compute its dual $C^\perp$; if the minimum distances of $C$ and $C^\perp$ are greater than 1, we apply Algorithm 2.

For each base field size $q$, the experiment has been performed on random codes $C$. The obtained results are shown in Tables 1 and 2. In Table 1 all codes have length and dimension . The first column contains the base field size $q$. The second column indicates the average running time for the computation of a sharp recovery structure for $C$, measured in seconds. The third column shows the average number of vectors in . In Table 2, we deal with codes of different parameters, which are indicated in the second column. Here we have omitted the average number of vectors in .

## References

• [1] A. Ashikhmin and A. Barg, Minimal vectors in linear codes, IEEE Trans. Inform. Theory 44 (1998), no. 5, 2010–2017.
• [2] E. R. Berlekamp, R. J. McEliece and H. C. A. van Tilborg, On the inherent intractability of certain coding problems, IEEE Trans. Inform. Theory 24(3) (1978), 384–386.
• [3] M. Bertilsson and I. Ingemarsson, A Construction of Practical Secret Sharing Schemes using Linear Block Codes, in Advances in Cryptology - AUSCRYPT’92, vol. 718, 1992, 67–79.
• [4] M. Blaum and S.R. Hetzler, Integrated interleaved codes as locally recoverable codes: properties and performance, Int. J. Inf. Coding Theory 3(4) (2016), 324–344.
• [5] M. Borges-Quintana, M. A. Borges-Trenard, P. Fitzpatrick and E. Martínez-Moro, Gröbner bases and combinatorics for binary codes, Appl. Algebra Engrg. Comm. Comput. 19(5) (2008), 393–411.
• [6] A. Faldum and W. Willems, Codes of small defect, Designs, Codes and Cryptography 10 (1997), 341–350.
• [7] P. Gopalan, C. Huang, H. Simitci and S. Yekhanin, On the locality of codeword symbols, IEEE Transactions on Information Theory 58(11) (2012), 6925–6934.
• [8] J. D. Horton, A polynomial-time algorithm to find the shortest cycle basis of a graph, SIAM J. Comput. 16(2) (1987), 358–366.
• [9] L. Jin, L. Ma and C. Xing, Construction of optimal locally repairable codes via automorphism groups of rational function fields, arXiv:1710.09638 (2017).
• [10] I. Márquez-Corbella, E. Martínez-Moro and E. Suárez-Canedo, On the ideal associated to a linear code, Adv. Math. Commun. 10(2) (2016), 229–254.
• [11] J.L. Massey, Minimal Codewords and Secret Sharing, in Proceedings of the 6th Joint Swedish-Russian International Workshop on Information Theory, 1993, 276–279.
• [12] MinT. Database for optimal parameters of (t,m,s)-nets, (t,s)-sequences, orthogonal arrays, linear codes and OOAs. Online available at http://mint.sbg.ac.at/index.php.
• [13] D. S. Papailiopoulos and A. G. Dimakis, Locally repairable codes, IEEE Trans. Inf. Theory 60(10) (2014), 5843–5855.
• [14] SageMath, the Sage Mathematics Software System (Version 8.4), The Sage Developers, 2018. Online available at http://www.sagemath.org.
• [15] A. S. Rawat, O. O. Koyluoglu, N. Silberstein, and S. Vishwanath, Optimal locally repairable and secure codes for distributed storage systems, IEEE Trans. Inf. Theory, 60(1) (2014), 212–236.
• [16] I. Tamo and A. Barg, A family of optimal locally recoverable codes, IEEE Trans. Inform. Theory, vol. 60(8) (2014), 4661–4676.
• [17] I. Tamo, A. Barg, S. Goparaju, and R. Calderbank, Cyclic LRC codes and their subfield subcodes, in Proc. IEEE Int. Sympos. Inform. Theory, Hong Kong, 2015, 1262–1266.