In an regenerating code (see ) over the finite field , a file of size is encoded and stored across nodes with each node storing symbols. Any regenerating code has to satisfy two properties: (a) data collection and (b) node repair. Data collection calls for the recovery of the entire file of size by fetching symbols each from any out of the nodes. The repair property requires that the node replacing the failed node should be able to recover all symbols of the failed node by contacting helper nodes and downloading symbols from each. A regenerating code is said to be a minimum storage regenerating (MSR) code if and . MSR codes are MDS codes that incur minimal repair bandwidth during the repair of a single node. There is a sub-family of MSR codes called optimal-access (repair-by-transfer) MSR codes where, during repair, the helper nodes simply transmit a subset of size from the symbols contained in the node. Additional desirable attributes of an MSR code are high rate, optimal access, small sub-packetization and small field size. We use to denote and set .
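As a rough numerical illustration of the MSR operating point, the sketch below uses the standard regenerating-code notation, which is an assumption here because the symbols are not visible in the text above: `n` nodes, any `k` of which suffice for data collection, `d` helper nodes, `alpha` symbols stored per node, `beta` symbols downloaded per helper, and file size `B`. It relies on the well-known MSR relations `B = k * alpha` and `beta = alpha / (d - k + 1)`; the function names are hypothetical.

```python
from fractions import Fraction


def msr_parameters(n, k, d, alpha):
    """Return (file_size, beta, repair_bandwidth) at the MSR point.

    Assumes the standard relations B = k * alpha and
    beta = alpha / (d - k + 1); all names are illustrative.
    """
    if not (k <= d <= n - 1):
        raise ValueError("need k <= d <= n - 1")
    beta = Fraction(alpha, d - k + 1)   # symbols downloaded from each helper
    file_size = k * alpha               # B = k * alpha (MDS property)
    repair_bandwidth = d * beta         # total download for one node repair
    return file_size, beta, repair_bandwidth


def naive_mds_repair_bandwidth(k, alpha):
    """Download needed if repair is done by reconstructing the whole file."""
    return k * alpha


if __name__ == "__main__":
    # Example with d < n - 1: n = 6, k = 3, d = 4, alpha = 8.
    B, beta, bw = msr_parameters(6, 3, 4, 8)
    print(B, beta, bw, naive_mds_repair_bandwidth(3, 8))
```

For these example values the MSR repair download is 16 symbols, against 24 symbols when the whole file is fetched to rebuild one node.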
I-A Literature and Contributions
There are several known MSR constructions. The product matrix construction in  for any is one of the first. In , the authors provide a high-rate MSR construction using Hadamard designs for any parameters. In , high-rate systematic node repair MSR codes called Zigzag codes were constructed for . These codes, however, had large field size and sub-packetization that is exponential in in . This construction was extended in  to enable the repair of parity nodes. The existence of MSR codes for any value of as tends to infinity is shown in . In  lower bounds for sub-packetization () were presented. In  a lower bound for the special case of an optimal-access MSR code was provided. This bound was recently improved by Balaji et al. in  to . The latter result proves sub-packetization-optimality of the explicit codes provided in , ,  for and the non-explicit code in  for .
Though the literature contains prior optimal access constructions for , the resultant codes were either non-explicit , or else have large sub-packetization , or are of high field size . In the present paper, we present optimal-access MSR codes that have optimal sub-packetization for any and which can be constructed using a field of size .
We set and . We set for . The symbol indicates a known quantity in an equation. We use
to denote the components of a vector and define where , .
II The MSR Construction
We provide a construction for an MSR code having the following parameters:
for any and . We will show in later sections that the construction is explicit and can be constructed with a small field size for and for . Thus, . A more precise field-size computation appears in Section IV.
MSR codes for any can be obtained by shortening the MSR code constructed with parameters , and by setting message symbols to zero.
II-A 3D representation of the codeword
The codeword of the MSR code, containing symbols, can be described as an array of symbols
in , with the help of a 3D data cube (see Fig. 1(a)) having dimension . In the 3D representation, each plane of the data cube is indexed by .
Here, code symbols are indexed by a 2-tuple: . Each code symbol is a vector of symbols in : Therefore, the codeword .
We describe the code through an parity-check matrix whose rows are indexed by and columns by :
for all . The corresponding parity-check equations are given by:
for all , . Here, , . We set for any . The remaining symbols can be assumed to be distinct for now. A detailed discussion of the assignment is provided in Section IV.
II-B Intersection Score
This parameter will help describe the repair and data collection properties of the MSR construction. Given a subset of nodes , plane , the intersection score is defined as:
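The defining formula is not reproduced in this text, so the following is only a sketch under the usual coupled-layer convention, which is an assumption here: nodes are indexed by pairs `(x, y)`, a plane is a tuple `z` with one coordinate per value of `y`, and the intersection score counts the nodes `(x, y)` of the given subset for which `z[y] == x`. The names `intersection_score`, `planes_by_score`, `s` and `t` are hypothetical.

```python
from itertools import product


def intersection_score(nodes, z):
    """Count nodes (x, y) in the subset whose x matches coordinate z[y].

    Assumed definition; the original formula is not visible in the text.
    """
    return sum(1 for (x, y) in nodes if z[y] == x)


def planes_by_score(nodes, s, t):
    """Group all planes z in {0..s-1}^t by intersection score, lowest first."""
    groups = {}
    for z in product(range(s), repeat=t):
        groups.setdefault(intersection_score(nodes, z), []).append(z)
    return dict(sorted(groups.items()))


if __name__ == "__main__":
    # Toy example: s = 2, t = 2 and the subset {(0, 0), (1, 1)}.
    for score, planes in planes_by_score({(0, 0), (1, 1)}, 2, 2).items():
        print(score, planes)
```

Grouping planes this way gives the processing order used by a plane-by-plane procedure: planes with a lower score are handled before planes with a higher one.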
III MSR Construction for ,
The parameters of this code are . During repair, nodes act as helper nodes. Thus, a single node remains aloof during node repair.
Let be the failed node and be the aloof node, . Helper information sent by a node for repair of node is given by:
We first order planes by the intersection score IS and then perform repair sequentially plane by plane.
III-A0a IS i.e., ()
In this case there are three unknown symbols and three linearly independent equations. Upon solving, we recover the symbols of the failed node.
III-A0b IS i.e., ()
As all the symbols corresponding to are recovered, we know , therefore the only unknown out-of-plane symbol is the one corresponding to the failed node. The three unknown symbols can be solved for.
We have thus recovered two failed-node symbols per plane, amounting to a total of symbols.
III-B Data Collection Property
We will prove the data collection property by showing that any erased nodes can be recovered. There are two kinds of three-erasure patterns: (a) the three erasures occur in different -sections, and (b) two erasures occur in one -section with the third in a different -section.
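The two pattern types can be told apart by counting erasures per section. The sketch below assumes, as a labeled guess since the section symbol is not visible in this text, that nodes are pairs `(x, y)` and that a section collects all nodes sharing the same `y`; the function names and the parameter `t` (number of sections) are hypothetical.

```python
from collections import Counter


def erasure_weights(erased, t):
    """Number of erased nodes in each of the t sections (indexed by y)."""
    counts = Counter(y for (_x, y) in erased)
    return [counts.get(y, 0) for y in range(t)]


def three_erasure_pattern(erased, t):
    """Classify a three-erasure pattern as 'a', 'b', or 'other'."""
    nonzero = sorted(w for w in erasure_weights(erased, t) if w > 0)
    if nonzero == [1, 1, 1]:
        return "a"   # three erasures in three different sections
    if nonzero == [1, 2]:
        return "b"   # two erasures share a section, the third is elsewhere
    return "other"


if __name__ == "__main__":
    print(three_erasure_pattern({(0, 0), (1, 1), (0, 2)}, 3))
    print(three_erasure_pattern({(0, 0), (1, 0), (0, 2)}, 3))
```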
is the set of erasures. For the case of planes with intersection score zero, this reduces to solving:
as all out-of-plane symbols are available. Now for planes with intersection score , by assuming that erased symbols for planes with IS are recovered, recovering symbols corresponding to planes with IS reduces to the case in (3) as the out-of-plane symbols in (2) are either already available or have been recovered in planes with intersection score .
is the set of erasures. We now consider a plane with IS, i.e., . Let . Solving for the erased symbols with (4) is impossible as the number of unknown symbols is , whereas the number of equations is . Therefore, we also consider equations corresponding to plane to recover the erased symbols of plane . Note that IS.
for all .
The symbols, can be recovered if the matrix,
is invertible. We prove that is invertible by showing that the left null space of contains only the all-zero vector. Consider the vector in the left null space to be of the form:
Define polynomials, for . If is in the null space of , we must have: , , ,
It is clear that and are roots of polynomials and . As the polynomials are of degree , they can be expressed in the following form:
results in and hence , proving the MDS property.
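The root-counting idea behind this argument can be checked numerically on a simpler stand-in. The sketch below is not the matrix from the proof: it uses a plain Vandermonde matrix over the prime field GF(7), chosen only for ease of arithmetic (the construction itself works in characteristic two). A left null vector `u` corresponds to a polynomial of degree less than `r` vanishing at all `r` evaluation points, so with distinct points only `u = 0` is possible and the matrix has full rank. All names are illustrative.

```python
def rank_mod_p(rows, p):
    """Rank of an integer matrix over GF(p), p prime, by Gauss-Jordan."""
    m = [[v % p for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, p)          # modular inverse (Python 3.8+)
        m[rank] = [(v * inv) % p for v in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][col]:
                f = m[r][col]
                m[r] = [(a - f * b) % p for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def vandermonde(points, p):
    """r x r matrix whose row i holds the i-th powers of the points mod p."""
    r = len(points)
    return [[pow(t, i, p) for t in points] for i in range(r)]


if __name__ == "__main__":
    print(rank_mod_p(vandermonde([1, 2, 3], 7), 7))  # distinct points
    print(rank_mod_p(vandermonde([1, 2, 2], 7), 7))  # repeated point
```

With a repeated evaluation point the rank drops, which is why the distinctness of the coefficients matters in arguments of this kind.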
The case of IS , i.e., for a plane reduces to solving equations (4) and (5), as the out-of-plane symbols corresponding to node have intersection score 1 and are recovered in the previous step.
Multiple planes need to be solved together when the number of erasures in a -section is more than one.
Multiple planes need to be solved together when the number of aloof nodes in a -section is more than one.
IV Data Collection for , any
We prove the data collection property of the MSR code, by showing that any -erasures can be recovered. The proof provided here holds for any and any . As before, erased nodes are recovered by following a sequential decoding procedure where planes are ordered by intersection score IS and erased symbols are recovered plane-by-plane in order of increasing intersection score.
IV-A Intersection Score IS
IV-B Intersection Score IS
In the planes considered here, we assume that erased symbols corresponding to planes with smaller intersection score have already been recovered. The proof proceeds in two steps: the first step reduces the problem to showing invertibility of a reduced parity-check (pc) matrix; the second step proves that the reduced pc matrix is indeed invertible.
IV-B1 The Reduction
For a given plane , we group the erasures as below:
Rewriting the parity check equation from (2), we get
For , if then IS and therefore by the induction assumption, is known. If , then IS. This observation results in the following equation:
for all . The number of equations is here, whereas the number of unknowns is . We therefore need to bring in additional equations to solve for the erased symbols in this plane. Therefore, for a plane we pick equations corresponding to all the planes in , where
for all . It is to be noted that the total number of erased symbols within all the planes in and the number of equations are the same. From here on we use as a variable that identifies planes in .
We now restrict attention to the submatrix of the parity-check matrix, i.e., the parity-check equations corresponding to planes in , i.e., to erased symbols :
where , , . Proving the invertibility of this submatrix implies recovery of symbols
Consider to be a vector in the left null space of matrix . We consider a polynomial interpretation of this vector and set . By the null space condition, we obtain for all and . Upon careful substitution of parity-check matrix components from (1), the left-null space conditions on matrix can be summarized in the form:
for all where, . Therefore the polynomials can be expressed as:
where is a polynomial of degree . (Footnote 1: The amount of reduction, , is due to the distinctness of the ’s in the set .) Note that is independent of . Substituting the polynomial from (10) in (9) results in: We need to prove that are zero polynomials, which is equivalent to proving invertibility of the reduced pc matrix :
, , .
IV-B2 Inductive Proof
We will now prove that the matrix is invertible. This will be done in two steps. In the first step we consider the erasure patterns where all the erasures occur in a single -section and prove that the erased symbols can be recovered. In the next step we group the erasures by the -section they belong to and prove that the erasures can be recovered by inducting on the number of -sections with non-zero erasures. Let , for some and . Let be the support set of erasure weight vector , , . Total number of planes , .
for all and therefore in the proofs that follow, we consider cases of
We define here a simpler assignment of that results in the MDS property for the first step, where erasures occur in a single -section. The assignment also ensures that the repair property goes through. For any given , the collection has distinct elements; this property of the assignment will be used in the repair property proofs. For , the matrix is defined as follows:
It is to be noted that in the above assignment: when . The coefficients assigned are such that is a collection of distinct elements in a field of characteristic two.
We show here a way to do the assignment. Consider a sub-group of and cosets , . for and for . We pick coefficients for from and the corresponding multiples can be picked from and the remaining , ’s corresponding to are picked from . When is even and define , where is a primitive element and set . Therefore, by choosing the field size such that , we get . If the smallest possible field size that satisfies
results in an odd , we just take double the field size. This results in field size for and for .
We define the matrix as the reduced pc matrix when erasures are given by . is a square matrix with dimension where and . are parameters equivalent to for the reduced pc matrix . as .
IV-B2b Base Cases
Let , and . We will now look at the base case, i.e., when . The total number of symbols remaining in is equal to as each plane has symbols remaining and there are planes.
Let and , the remaining planes in are given by . All the erased symbols are given by: .
Case 1: , ,
The reduced pc matrix has columns corresponding to symbols and and planes , . Wlog we assume that and this results in the reduced matrix:
for . We, however, define it for any to use it in the induction. The determinant of matrix is . As we have , the determinant is non-zero. The matrices for the cases are determined in the same way as for . For case we assume wlog that in order to obtain matrix .
Case 2: , ,
The determinant of matrix is