1 Introduction
A code, i.e., a collection of vectors of the same length, is called locally recoverable with locality $r$ if the content of any coordinate can be recovered by accessing only $r$ other coordinates [6, 10]. Formally, a $q$-ary code of length $n$, cardinality $M$ and distance $d$ is a set $\mathcal{C}$ of $M$ length-$n$ vectors over an alphabet $\Sigma$ of size $q$, with minimum pairwise Hamming distance $d$. The quantity $k \equiv \log_q M$ is called the dimension of $\mathcal{C}$ and $k/n$ is called the rate of the code. If $\Sigma$ is a finite field and $\mathcal{C}$ is a linear subspace of $\Sigma^n$, then $k$ is the dimension of $\mathcal{C}$ as a vector space. Below, $[n] \equiv \{1, 2, \dots, n\}$, and for any $x \in \mathcal{C}$, $x_i$ is the projection of $x$ on the $i$th coordinate. By extension, for any $I \subseteq [n]$, $x_I$ is the projection of $x$ onto the coordinates of $I$.
Definition.
A code $\mathcal{C} \subseteq \Sigma^n$ is a locally recoverable code (LRC) with locality $r$ if every coordinate $i \in [n]$ is contained in a subset $R_i \subseteq [n]$ of size $r+1$ such that there is a function $\phi_i$ with the property that for every codeword $x \in \mathcal{C}$,
(1) $x_i = \phi_i(x_{j_1}, x_{j_2}, \dots, x_{j_r}),$
where $j_1 < j_2 < \dots < j_r$ are the elements of $R_i \setminus \{i\}$. We use the notation $(n, k, r)$-LRC to refer to a code of length $n$, dimension $k$ and locality $r$.
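For concreteness, here is a small sanity check (ours, not from the paper): a binary code of length 6 with two disjoint repair groups of size $r+1 = 3$, where the last coordinate of each group is the parity of the other two, so every coordinate is recoverable from $r = 2$ others.

```python
from itertools import product

r = 2                              # locality: each symbol recoverable from r others
groups = [(0, 1, 2), (3, 4, 5)]    # disjoint repair groups of size r + 1

# Codewords: 4 information bits, plus one parity bit per group.
code = []
for a, b, c, d in product((0, 1), repeat=4):
    code.append((a, b, a ^ b, c, d, c ^ d))

# Verify the locality property: every coordinate is a function (here XOR)
# of the other r coordinates in its repair group.
for x in code:
    for g in groups:
        for i in g:
            others = [x[j] for j in g if j != i]
            assert x[i] == others[0] ^ others[1]   # r = 2 in this toy example

print(len(code))   # 16 codewords: dimension k = 4, rate k/n = 2/3
```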
Locally recoverable codes have been the subject of intense research, including constructions [14, 2, 16, 11], bounds [3, 1, 15] and generalizations [17, 12, 13, 8]. In this paper, we investigate the maximum achievable rate of locally recoverable codes such that reliable transmission is possible over a discrete memoryless channel (DMC). While LRCs have attracted a great deal of interest, surprisingly, with the exception of [9], no paper deals with this quite basic theoretical question.
The result of [9] holds for a binary erasure channel (BEC) with erasure probability $p$. The Shannon capacity of such a channel is $1 - p$. It was shown that to achieve a rate of $1 - p - \epsilon$, the locality must scale as $\Omega(\log(1/\epsilon))$. While the constant within $\Omega(\cdot)$ is not clear, the method therein also does not extend to the binary symmetric channel (BSC) or other binary-input memoryless channels.
In this note we do a finer analysis of the gap to capacity for LRCs. For a discrete memoryless channel $W$ given by a stochastic input-output transition matrix (we sometimes also refer to a DMC by $W: \mathcal{X} \to \mathcal{Y}$ to describe the input-output random variables), let $C(W)$ be the Shannon capacity of the channel, and $C_r(W)$ be the capacity of the channel when we are constrained to use only a locally recoverable code with locality $r$. Let us define the gap to capacity,
$$\epsilon_W(r) \equiv C(W) - C_r(W).$$
An impossibility result in this regard gives a lower bound on the gap, while an achievability scheme gives an upper bound on the gap. Our results are summarized in Table 1. Here, $h(x) \equiv -x\log_2 x - (1-x)\log_2(1-x)$ is the binary entropy function. While the results hold for binary-input channels, it is not difficult to extend them to the $q$-ary case. For the BEC and BSC, the results are also plotted in Fig. 1. Note that we are able to exactly calculate the capacity for the BEC, while we have tight upper and lower bounds for the BSC.
Table 1. Bounds on the gap to capacity $\epsilon_W(r) = C(W) - C_r(W)$.
Channel | Lower bound on $\epsilon_W(r)$ | Upper bound on $\epsilon_W(r)$
BEC($p$) | $\frac{(1-p)^{r+1}}{r+1}$ | $\frac{(1-p)^{r+1}}{r+1}$ (also achievable by linear codes)
BSC($p$) | $\frac{(1-h(p))^{r+1}}{r+1}$ | $\frac{1}{r+1}\left(1 - h\left(\frac{1+(1-2p)^{r+1}}{2}\right)\right)$ (we conjecture this bound to be tight)
General $W$ | $\frac{C(W)^{r+1}}{r+1}$ | $\frac{1}{r+1}\left(1 - h\left(\frac{1+(1-2h^{-1}(1-C(W)))^{r+1}}{2}\right)\right)$
To prove the lower (converse) and upper bounds for the BEC we rely on simple information inequalities and random coding methods. It is difficult to extend the converse arguments to other channels. However, in a certain sense the BEC is the 'best' channel among all binary-input memoryless symmetric channels [7]. We use that fact to lower bound the gap to capacity for more general channels, including the BSC. A random coding method for the BSC also gives the upper bound on the gap to capacity for any binary-input symmetric channel by the same argument, as the BSC is the 'worst' channel in the same sense.
Our results hold for an extended definition of locally recoverable codes [11].
Definition.
A code $\mathcal{C}$ of cardinality $q^k$ is said to have the $(r, \delta)$ locality property (to be an $(r, \delta)$-LRC code), where $\delta \ge 2$, if each coordinate $i \in [n]$ is contained in a subset $R_i \subseteq [n]$ of size at most $r + \delta - 1$ such that the restriction of the code to the coordinates in $R_i$ forms a code of distance at least $\delta$. Notice that the values of any $\delta - 1$ coordinates of $R_i$ are determined by the values of the remaining coordinates, thus enabling local recovery. $R_i$ is called the repair group of coordinate $i$.
As an example, we show an upper bound on the gap to capacity for LRCs with $\delta = 3$, and give directions for the general case (see Sec. 5).
2 LRC Capacity of the Binary Erasure Channel
For a binary erasure channel with erasure probability $p$, the Shannon capacity is $1 - p$. Suppose that when we are constrained to use a locally recoverable code with locality $r$ as the input, the capacity is $C_r^{\mathrm{BEC}(p)}$.
Theorem 1.
The capacity of LRC with locality $r$ over the BEC($p$) is given by:
$$C_r^{\mathrm{BEC}(p)} = 1 - p - \frac{(1-p)^{r+1}}{r+1}.$$
In the remainder of this section we prove this theorem.
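As a quick numeric illustration (ours; the function name is arbitrary), the capacity expression $1 - p - \frac{(1-p)^{r+1}}{r+1}$ can be tabulated against the unconstrained capacity $1 - p$ to see how fast the gap decays in $r$:

```python
def lrc_capacity_bec(p, r):
    """Capacity of locality-r LRCs over BEC(p): 1 - p - (1-p)^(r+1)/(r+1)."""
    return 1 - p - (1 - p) ** (r + 1) / (r + 1)

p = 0.5
for r in (1, 2, 4, 8, 16):
    gap = (1 - p) - lrc_capacity_bec(p, r)   # gap to Shannon capacity 1 - p
    print(r, round(lrc_capacity_bec(p, r), 6), round(gap, 6))
```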
2.1 Converse Bound
First we show the converse result.
Lemma 1.
The capacity of LRC codes with locality $r$ over a BEC with erasure probability $p$ satisfies
$$C_r^{\mathrm{BEC}(p)} \le 1 - p - \frac{(1-p)^{r+1}}{r+1}.$$
Proof.
Assume that a code $\mathcal{C}$ of length $n$ and dimension $k$ with locality $r$ is used over the BEC($p$). The random (uniformly chosen) codeword $X^n \in \mathcal{C}$ was sent over the channel. The received vector is $Y^n$. Let $E \subseteq [n]$ denote the erased coordinates.
Using Fano's inequality, the probability of error $P_e$ satisfies
$$H(X^n \mid Y^n) \le 1 + P_e k.$$
Now, note that $H(X^n) = k$. Therefore,
$$k = I(X^n; Y^n) + H(X^n \mid Y^n) \le I(X^n; Y^n) + 1 + P_e k.$$
This implies,
$$P_e \ge 1 - \frac{I(X^n; Y^n) + 1}{k}.$$
Now,
$$I(X^n; Y^n) \le n - |E| - \frac{T}{r+1},$$
where $T$ is the number of coordinates that are not in $E$ and whose entire recovery group is also not in $E$: each such coordinate is a deterministic function of the other unerased coordinates of its repair group, and at least $T/(r+1)$ of them can be removed without reducing the information available at the receiver. Hence,
$$\mathbb{E}_{\mathrm{BEC}}\, I(X^n; Y^n) \le n(1-p) - \frac{\mathbb{E}_{\mathrm{BEC}}\, T}{r+1},$$
where the subscript BEC denotes that the average is with respect to the randomness in the BEC. Let us now derive $\mathbb{E}_{\mathrm{BEC}}\, T$. Let $T = \sum_{i=1}^n \mathbb{1}_i$, where $\mathbb{1}_i$
is the indicator random variable that denotes that the $i$th coordinate as well as its recovery group are not in $E$ (not erased). We have $\mathbb{E}\,\mathbb{1}_i = (1-p)^{r+1}$. Therefore, $\mathbb{E}_{\mathrm{BEC}}\, T = n(1-p)^{r+1}$. Therefore,
$$\mathbb{E}_{\mathrm{BEC}}\, I(X^n; Y^n) \le n(1-p) - \frac{n(1-p)^{r+1}}{r+1}.$$
To achieve a vanishing probability of error, one must have
$$\frac{k}{n} \le 1 - p - \frac{(1-p)^{r+1}}{r+1} + o(1).$$
∎
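The key expectation in the converse, $\mathbb{E}\,T = n(1-p)^{r+1}$, can be checked by simulation (our sketch, assuming disjoint repair groups of size $r+1$; parameters are arbitrary):

```python
import random

random.seed(1)
p, r = 0.3, 2
n_groups = 200
n = n_groups * (r + 1)     # disjoint repair groups of size r + 1
trials = 2000

total = 0
for _ in range(trials):
    erased = [random.random() < p for _ in range(n)]
    # T: coordinates that are unerased and whose whole repair group is
    # unerased; with disjoint groups, every member of a fully-unerased
    # group is counted.
    for g in range(n_groups):
        block = erased[g * (r + 1):(g + 1) * (r + 1)]
        if not any(block):
            total += r + 1

estimate = total / trials
expected = n * (1 - p) ** (r + 1)   # linearity of expectation
print(estimate, expected)
```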
2.2 Achievability
Lemma 2.
There exists a family of LRC codes with locality $r$ and rate
$$1 - p - \frac{(1-p)^{r+1}}{r+1}$$
that, when used over a BEC($p$), results in a probability of error that goes to $0$ with $n$.
Proof.
We will show this by constructing a code. Partition the set of $n$ coordinates into groups of size $r+1$ each (we assume that $r+1$ divides $n$). Now, consider the $r+1$ bits of a group as a supersymbol. Consider the input-output channel induced by these supersymbols instead of the BEC. We find the capacity of this channel, and then normalize by $r+1$.
Let us choose the codewords in the following way. Within each group, $r$ symbols are uniformly and independently (Bernoulli(1/2)) chosen. The last symbol of each group is the modulo-2 sum of the other $r$ symbols. The rate of this code such that the probability of error is vanishing is given by (here we assume that we employ a joint-typicality decoder that considers each block of $r+1$ bits as a supersymbol over an alphabet of size $2^{r+1}$)
$$R = \frac{1}{r+1} I(X^{r+1}; Y^{r+1}),$$
where $X^{r+1}, Y^{r+1}$ represent the $(r+1)$-bit input and output. Now we have,
$$I(X^{r+1}; Y^{r+1}) = H(X^{r+1}) - H(X^{r+1} \mid Y^{r+1}) = r - H(X^{r+1} \mid Y^{r+1}).$$
We can now calculate $H(X^{r+1} \mid Y^{r+1})$. Let the number of erasures in $Y^{r+1}$ be $e$. There are two cases to consider.
First case, $e \le 1$. Then all of $X^{r+1}$ can be recovered (a single erasure is filled in using the parity), so the conditional entropy is $0$.
Second, $e \ge 2$. Then the conditional entropy is $e - 1$.
Therefore,
$$H(X^{r+1} \mid Y^{r+1}) = \sum_{e=2}^{r+1} \binom{r+1}{e} p^e (1-p)^{r+1-e} (e-1) = (r+1)p - 1 + (1-p)^{r+1}.$$
We have,
$$R = \frac{r - (r+1)p + 1 - (1-p)^{r+1}}{r+1} = 1 - p - \frac{(1-p)^{r+1}}{r+1}.$$
∎
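The case analysis above reduces to the binomial expectation $\mathbb{E}[(e-1)^+]$ for $e \sim \mathrm{Binomial}(r+1, p)$; the following sketch (ours) checks the closed form $(r+1)p - 1 + (1-p)^{r+1}$ exactly:

```python
from math import comb

def cond_entropy_per_block(p, r):
    """E[(e-1)^+] for e ~ Binomial(r+1, p): bits left unresolved in a parity
    block after BEC erasures (a single erasure is repairable via the parity)."""
    n0 = r + 1
    return sum(comb(n0, e) * p**e * (1 - p)**(n0 - e) * (e - 1)
               for e in range(2, n0 + 1))

p, r = 0.2, 4
exact = cond_entropy_per_block(p, r)
closed_form = (r + 1) * p - 1 + (1 - p) ** (r + 1)
print(exact, closed_form)

# Achievable rate: (r - E[(e-1)^+]) / (r+1) = 1 - p - (1-p)^(r+1)/(r+1).
rate = (r - exact) / (r + 1)
print(rate, 1 - p - (1 - p) ** (r + 1) / (r + 1))
```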
It turns out that the above method extends to other channels. The achievability result for the BEC also holds with linear codes.
Proposition 1.
There exists a family of linear LRC codes with locality $r$ and rate
$$1 - p - \frac{(1-p)^{r+1}}{r+1}$$
that, when used over a BEC($p$), results in a probability of error that goes to $0$ with $n$.
Proof.
To see this, randomly choose a $k \times n$ generator matrix in the following way. Partition the set of $n$ coordinates into groups of size $r+1$ each. For each group, choose $r$ columns randomly and uniformly from $\{0,1\}^k$. The $(r+1)$st column of each group is the coordinate-wise modulo-2 sum of all the other columns of the group.
Now let us choose each column of this matrix independently with probability $1-p$ and form a submatrix; these are the columns surviving the erasures. We would like to find the rank of this submatrix. As long as at most $r$ columns are chosen from a group, choosing them is equivalent to choosing that many columns randomly and uniformly from $\{0,1\}^k$. Let $S$ be the set of chosen columns. Let $G$ be the number of groups from which all the $r+1$ elements are chosen. Therefore, the submatrix will have rank at least equal to the rank of a matrix whose $|S| - G$ columns are randomly and uniformly chosen from $\{0,1\}^k$. The rank of such a matrix is $k$ with probability $1 - o(1)$ as long as $|S| - G - k = \omega(1)$. Now, with probability $1 - o(1)$ we have $|S| - G \ge n(1-p) - \frac{n(1-p)^{r+1}}{r+1} - o(n)$. Therefore, as long as $k \le n\left(1 - p - \frac{(1-p)^{r+1}}{r+1}\right) - o(n)$, the rank of the submatrix is $k$ with probability $1 - o(1)$. Therefore there must exist a matrix in the ensemble with rank $k$. ∎
3 LRC Capacity of the Binary Symmetric Channel
For a binary symmetric channel with error probability $p$, the Shannon capacity is $1 - h(p)$. Suppose that when we are constrained to use a locally recoverable code with locality $r$ as the input, the capacity is $C_r^{\mathrm{BSC}(p)}$.
Theorem 2.
The capacity of LRC with locality $r$ over the BSC($p$) satisfies:
$$1 - h(p) - \frac{1}{r+1}\left(1 - h\left(\frac{1 + (1-2p)^{r+1}}{2}\right)\right) \le C_r^{\mathrm{BSC}(p)} \le 1 - h(p) - \frac{(1-h(p))^{r+1}}{r+1}.$$
In the remainder of this section we prove this theorem.
3.1 Converse
The upper bound of Theorem 2 follows from the more general results about binary-input symmetric discrete memoryless channels. We postpone the proof until the next section.
3.2 Achievability
Lemma 3.
There exists a family of LRC codes with locality $r$ and rate
$$1 - h(p) - \frac{1}{r+1}\left(1 - h\left(\frac{1 + (1-2p)^{r+1}}{2}\right)\right)$$
that, when used over a BSC($p$), results in a probability of error that goes to $0$ with $n$.
Proof.
We will show the above by constructing a code. Again, partition the set of $n$ coordinates into groups of size $r+1$ each. Now, consider the $r+1$ bits of a group as a supersymbol. Consider the input-output channel induced by these supersymbols instead of the BSC. We find the capacity of this channel.
Let us choose the codewords in the following way. Within each group, $r$ symbols are uniformly and independently (Bernoulli(1/2)) chosen. The last symbol of each group is the modulo-2 sum of the other $r$ symbols. The rate of this code such that the probability of error is vanishing is given by
$$R = \frac{1}{r+1} I(X^{r+1}; Y^{r+1}),$$
where $X^{r+1}, Y^{r+1}$ represent the $(r+1)$-bit input and output. Note that we arrive at this rate by considering the group of $r+1$ bits as a supersymbol from an alphabet of size $2^{r+1}$, and using a joint-typicality decoder. Now we have,
$$I(X^{r+1}; Y^{r+1}) = H(Y^{r+1}) - H(Y^{r+1} \mid X^{r+1}) = H(Y^{r+1}) - (r+1)h(p).$$
We can now calculate $H(Y^{r+1})$. The input $X^{r+1}$ is uniform over the $2^r$ even-weight vectors, the parity of $Y^{r+1}$ equals the parity of the noise, and by symmetry $Y^{r+1}$ is uniform within each parity class. Since the noise has even parity with probability $\frac{1 + (1-2p)^{r+1}}{2}$,
$$H(Y^{r+1}) = r + h\left(\frac{1 + (1-2p)^{r+1}}{2}\right).$$
After some simplifications, we have
$$R = 1 - h(p) - \frac{1}{r+1}\left(1 - h\left(\frac{1 + (1-2p)^{r+1}}{2}\right)\right).$$
∎
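The entropy computation above can be verified by brute force for a small block (our sketch): enumerate the even-weight inputs, form the exact output distribution of the supersymbol channel, and compare with $r + h\bigl(\frac{1+(1-2p)^{r+1}}{2}\bigr)$.

```python
from itertools import product
from math import log2

def h(x):
    """Binary entropy in bits."""
    return 0.0 if x in (0.0, 1.0) else -x * log2(x) - (1 - x) * log2(1 - x)

p, r = 0.1, 3
n0 = r + 1
# The input block: uniform over the 2^r even-weight (parity-check) vectors.
even = [x for x in product((0, 1), repeat=n0) if sum(x) % 2 == 0]

# Exact output distribution: Y = X + BSC(p) noise.
prob = {}
for y in product((0, 1), repeat=n0):
    prob[y] = sum(p ** sum(a != b for a, b in zip(x, y))
                  * (1 - p) ** sum(a == b for a, b in zip(x, y))
                  for x in even) / len(even)

H_Y = -sum(q * log2(q) for q in prob.values())
closed_form = r + h((1 + (1 - 2 * p) ** n0) / 2)
print(H_Y, closed_form)

rate = (H_Y - n0 * h(p)) / n0   # (1/(r+1)) * I(X^{r+1}; Y^{r+1})
print(rate, 1 - h(p) - (1 - h((1 + (1 - 2 * p) ** n0) / 2)) / n0)
```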
4 General binary-input symmetric channels
The results for general binary-input symmetric channels follow from the converse and achievability results for the BEC and BSC, because in a certain sense these channels are, respectively, the best and the worst among the general class. To formalize this, we need the notion of a more capable channel. All the channels below are discrete memoryless channels.
Definition.
A channel $W_1: \mathcal{X} \to \mathcal{Y}_1$ is said to be more capable than another channel $W_2: \mathcal{X} \to \mathcal{Y}_2$ if for any input distribution on $\mathcal{X}$,
$$I(X; Y_1) \ge I(X; Y_2).$$
It is known that among the binary-input symmetric discrete memoryless channels of the same capacity, the BSC is the least capable and the BEC is the most capable [5]. The following can be derived from [7]. This result also follows from [4, ex. 16, p. 116].
Proposition 2.
Suppose the channel $W_1$ is more capable than the channel $W_2$, and a code of rate $R$ achieves a probability of error $\epsilon$ over the channel $W_2$. Then there exists a code of rate $R$ that achieves a probability of error $\epsilon'$ over $W_1$, where $\epsilon' \to 0$ as $\epsilon \to 0$.
Since we have an impossibility (converse) result for the BEC and an achievability result for the BSC, using Prop. 2 we can obtain the following result.
Theorem 3.
For any binary-input symmetric discrete memoryless channel $W$ with capacity $C(W)$,
$$\frac{C(W)^{r+1}}{r+1} \le C(W) - C_r(W) \le \frac{1}{r+1}\left(1 - h\left(\frac{1 + \left(1 - 2h^{-1}(1 - C(W))\right)^{r+1}}{2}\right)\right).$$
Proof.
For a channel $W$, suppose $C(W) = 1 - p$. Then a BEC with erasure probability $p$ has the same capacity and must be more capable than the channel $W$. There exists an LRC of rate approaching $C_r(W)$ that achieves a vanishing probability of error over the channel $W$. Therefore, by Prop. 2, there exists an LRC of the same rate that achieves a vanishing probability of error over the BEC of erasure probability $p$. This implies,
$$C_r(W) \le 1 - p - \frac{(1-p)^{r+1}}{r+1} = C(W) - \frac{C(W)^{r+1}}{r+1},$$
which proves the lower bound on the gap.
On the other hand, let $q = h^{-1}(1 - C(W))$, so that $C(W) = 1 - h(q)$. Then a BSC with flip probability $q$ has the same capacity and must be less capable than the channel $W$. We know from Lemma 3 that there exists an LRC of rate
$$1 - h(q) - \frac{1}{r+1}\left(1 - h\left(\frac{1 + (1-2q)^{r+1}}{2}\right)\right)$$
that achieves a vanishing probability of error over the BSC with error probability $q$. Therefore, by Prop. 2, there must exist a code of the same rate that achieves a vanishing probability of error over the channel $W$. ∎
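The two bounds of Theorem 3 can be evaluated numerically (our sketch; `h_inv` is a bisection inverse of the binary entropy, a standard device rather than anything from the paper):

```python
from math import log2

def h(x):
    """Binary entropy in bits."""
    return 0.0 if x in (0.0, 1.0) else -x * log2(x) - (1 - x) * log2(1 - x)

def h_inv(y, tol=1e-12):
    """Inverse of the binary entropy on [0, 1/2], by bisection."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if h(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def gap_bounds(C, r):
    """Bounds on the gap C - C_r for a binary-input symmetric DMC of
    capacity C: lower bound via the same-capacity BEC (most capable),
    upper bound via the same-capacity BSC (least capable)."""
    lower = C ** (r + 1) / (r + 1)
    q = h_inv(1 - C)                      # BSC crossover with capacity C
    upper = (1 - h((1 + (1 - 2 * q) ** (r + 1)) / 2)) / (r + 1)
    return lower, upper

for r in (1, 2, 4, 8):
    print(r, gap_bounds(0.5, r))
```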
5 Generalizing local repair: repairing multiple failures
It is now a natural question to ask whether our results extend to the generalized definition of locality. Indeed, the converse result for the BEC extends quite straightforwardly, and $C_{r,\delta}$, the capacity of the BEC($p$) when we are restricted to use a code with $(r, \delta)$ locality, is bounded by,
$$C_{r,\delta} \le 1 - p - \frac{(\delta - 1)(1-p)^{r+\delta-1}}{r + \delta - 1}.$$
However, it is not straightforward to extend the achievability result for the erasure channel. Indeed, the codewords restricted to each repair group must form a code with minimum distance $\delta$. Therefore it makes sense to choose random codewords of a fixed code $\mathcal{C}_0$ of distance $\delta$ as disjoint repair blocks to form the overall LRC. For this we need to figure out $H(X^{n_0} \mid Y^{n_0})$, where $Y^{n_0}$ is the output of a BEC whose input $X^{n_0}$ is a randomly chosen codeword of a fixed code $\mathcal{C}_0$ of length $n_0$ and distance $\delta$. We would need the complete statistics of the projections of $\mathcal{C}_0$ onto every subset of coordinates to evaluate this quantity.
On the other hand, if $\mathcal{C}_0$ is a linear code and the channel is a BSC, then the entropy of the output of the channel can be computed if we know the coset weight distribution of the code.
To construct a code with $(r, \delta)$ locality, we first choose a fixed linear code $\mathcal{C}_0$ of length $n_0 = r + \delta - 1$ and distance $\delta$. Next we construct a random code $\mathcal{C}$ of length $n$. A codeword of $\mathcal{C}$ is formed by concatenating $n/n_0$ randomly and uniformly chosen codewords of $\mathcal{C}_0$ side by side. Again, if we use joint-typicality decoding, then the achievable rate of transmission is given by,
$$R = \frac{1}{n_0} I(X^{n_0}; Y^{n_0}),$$
where $X^{n_0}$ is a randomly and uniformly chosen codeword of $\mathcal{C}_0$ and $Y^{n_0}$ is the output of a BSC with flip probability $p$ when the input to the BSC is $X^{n_0}$. Now we have,
$$I(X^{n_0}; Y^{n_0}) = H(Y^{n_0}) - H(Y^{n_0} \mid X^{n_0}) = H(Y^{n_0}) - n_0 h(p).$$
We can calculate $H(Y^{n_0})$ when $\mathcal{C}_0$ is a linear $[n_0, k_0, \delta]$ code. We have
$$\Pr(Y^{n_0} = y) = 2^{-k_0} \sum_{w=0}^{n_0} A_w^{(j)}\, p^w (1-p)^{n_0 - w}$$
if $y$ belongs to the $j$th coset of the code, where $A_w^{(j)}$ is the number of vectors of Hamming weight $w$ in the $j$th coset of the code $\mathcal{C}_0$, $j = 0, 1, \dots, 2^{n_0 - k_0} - 1$. Let us define the coset weight enumerator of the code,
$$P_j(p) \equiv \sum_{w=0}^{n_0} A_w^{(j)}\, p^w (1-p)^{n_0 - w}.$$
Then, $\Pr(Y^{n_0} = y) = 2^{-k_0} P_j(p)$ for every $y$ in the $j$th coset, and $\sum_j P_j(p) = 1$. Now,
$$H(Y^{n_0}) = -\sum_j 2^{k_0} \cdot 2^{-k_0} P_j(p) \log_2\left(2^{-k_0} P_j(p)\right) = k_0 - \sum_j P_j(p) \log_2 P_j(p).$$
Therefore,
$$R = \frac{1}{n_0}\left(k_0 - \sum_j P_j(p) \log_2 P_j(p)\right) - h(p),$$
where the sums are over all $2^{n_0 - k_0}$ cosets. Overall, this rate is achievable with an LRC of $(r, \delta)$ locality over the BSC($p$).
Hamming codes as local codes: two erasures per block
By taking the code $\mathcal{C}_0$ to be the $[2^m - 1,\, 2^m - 1 - m,\, 3]$ Hamming code, we can therefore have the following result for $\delta = 3$, as the coset weight distribution of the Hamming code is known: with $n_0 = 2^m - 1$,
$$R = \frac{1}{n_0}\left(n_0 - m - P_0(p)\log_2 P_0(p) - n_0\, P_1(p)\log_2 P_1(p)\right) - h(p),$$
where $P_0(p)$ is the coset weight enumerator of the code itself and $P_1(p)$ is the common coset weight enumerator of each of the $n_0$ nontrivial cosets.
This automatically gives a lower bound on $C_{r,\delta}$ of the BEC with erasure probability $h(p)$ as well, since that BEC is a more capable channel of the same capacity.
At $p = 0$ this bound evaluates to $1 - \frac{m}{2^m - 1}$, the rate of the Hamming code. Note that, from the upper bound, we have $C_{r,3} \le 1 - \frac{2}{2^m - 1}$ at erasure probability $0$. Therefore the bounds are not tight even at $p = 0$.
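The coset-weight computation can be carried out explicitly for the $[7,4,3]$ Hamming code (our sketch, using the rate expression $\frac{1}{n_0}\bigl(k_0 - \sum_j P_j(p)\log_2 P_j(p)\bigr) - h(p)$ for a linear local code over the BSC):

```python
from itertools import product
from math import log2

def h(x):
    """Binary entropy in bits."""
    return 0.0 if x in (0.0, 1.0) else -x * log2(x) - (1 - x) * log2(1 - x)

# [7,4,3] Hamming code via its parity-check matrix (columns = 1..7 in binary).
n0, k0 = 7, 4
H = [[(j >> i) & 1 for j in range(1, 8)] for i in range(3)]

def syndrome(v):
    return tuple(sum(hi * vi for hi, vi in zip(row, v)) % 2 for row in H)

# Coset weight enumerators A_w^(j): weight distribution of each of the
# 2^(n0-k0) = 8 cosets, indexed by syndrome.
cosets = {}
for v in product((0, 1), repeat=n0):
    cosets.setdefault(syndrome(v), [0] * (n0 + 1))[sum(v)] += 1

def achievable_rate(p):
    """(1/n0)(k0 - sum_j P_j log2 P_j) - h(p), with
    P_j(p) = sum_w A_w^(j) p^w (1-p)^(n0-w)."""
    Pj = [sum(A[w] * p**w * (1 - p)**(n0 - w) for w in range(n0 + 1))
          for A in cosets.values()]
    return (k0 - sum(q * log2(q) for q in Pj if q > 0)) / n0 - h(p)

print(achievable_rate(0.0))   # equals the Hamming code rate 4/7 at p = 0
print(achievable_rate(0.05))
```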
6 Open problems
There are some compelling open problems left to study regarding capacity of LRCs. First of all, for a BSC, the gap to capacity is not exactly characterized. We conjecture that the upper bound on the gap (see Table 1) is tight.
Not much is known regarding the capacity of generalized notions of LRCs. Even for an LRC that corrects two erasures per repair group, the capacity is unknown over the BEC (the bounds are not tight even when the erasure probability is zero).
Finally, while we do not foresee an obstacle to extend the results for larger alphabets, it would be good to have them documented.
Acknowledgement: The author is grateful to Hamed Hassani for letting him know about the notion of ‘more capable’ channels and their potential use in this context, and to Chandra Nair for discussions on the more capable channels.
References
 [1] A. Agarwal, A. Barg, S. Hu, A. Mazumdar, and I. Tamo. Combinatorial alphabet-dependent bounds for locally recoverable codes. IEEE Transactions on Information Theory, 64(5):3481–3492, 2018.
 [2] A. Barg, K. Haymaker, E. W. Howe, G. L. Matthews, and A. Várilly-Alvarado. Locally recoverable codes from algebraic curves and surfaces. In Algebraic Geometry for Coding Theory and Cryptography, pages 95–127. Springer, 2017.
 [3] V. R. Cadambe and A. Mazumdar. Bounds on the size of locally recoverable codes. IEEE Transactions on Information Theory, 61(11):5787–5794, 2015.
 [4] I. Csiszár and J. Körner. Information Theory: Coding Theorems for Discrete Memoryless Systems. Academic Press, 1981.
 [5] Y. Geng, C. Nair, S. S. Shitz, and Z. V. Wang. On broadcast channels with binary inputs and symmetric outputs. IEEE Transactions on Information Theory, 59(11):6980–6989, 2013.
 [6] P. Gopalan, C. Huang, H. Simitci, and S. Yekhanin. On the locality of codeword symbols. IEEE Trans. Inform. Theory, 58(11):6925–6934, Nov. 2012.
 [7] J. Körner and K. Marton. Comparison of two noisy channels. In Topics in Information Theory (ed. by I. Csiszár and P. Elias), pages 411–423, 1977.
 [8] A. Mazumdar. Storage capacity of repairable networks. IEEE Transactions on Information Theory, 61(11):5810–5821, 2015.
 [9] A. Mazumdar, V. Chandar, and G. W. Wornell. Update-efficiency and local repairability limits for capacity approaching codes. IEEE Journal on Selected Areas in Communications, 32(5), 2014.
 [10] D. S. Papailiopoulos and A. G. Dimakis. Locally repairable codes. In Proc. Int. Symp. Inform. Theory, pages 2771–2775, Cambridge, MA, July 2012.
 [11] N. Prakash, G. M. Kamath, V. Lalitha, and P. V. Kumar. Optimal linear codes with a local-error-correction property. In Information Theory Proceedings (ISIT), 2012 IEEE International Symposium on, pages 2776–2780. IEEE, 2012.
 [12] A. S. Rawat, A. Mazumdar, and S. Vishwanath. Cooperative local repair in distributed storage. EURASIP Journal on Advances in Signal Processing, 2015(1):107, 2015.
 [13] A. S. Rawat, D. S. Papailiopoulos, A. G. Dimakis, and S. Vishwanath. Locality and availability in distributed storage. In Information Theory (ISIT), 2014 IEEE International Symposium on, pages 681–685. IEEE, 2014.
 [14] I. Tamo and A. Barg. A family of optimal locally recoverable codes. IEEE Transactions on Information Theory, 60(8):4661–4676, 2014.
 [15] I. Tamo, A. Barg, and A. Frolov. Bounds on the parameters of locally recoverable codes. IEEE Transactions on Information Theory, 62(6):3070–3083, 2016.
 [16] I. Tamo, D. S. Papailiopoulos, and A. G. Dimakis. Optimal locally repairable codes and connections to matroid theory. IEEE Transactions on Information Theory, 62(12):6661–6671, 2016.
 [17] A. Wang and Z. Zhang. Repair locality with multiple erasure tolerance. IEEE Transactions on Information Theory, 60(11):6979–6987, 2014.