Multilevel Diversity Coding with Secure Regeneration: Separate Coding Achieves the MBR Point

12/09/2017 ∙ by Shuo Shao, et al. ∙ Texas A&M University ∙ USTC

The problem of multilevel diversity coding with secure regeneration (MDC-SR) is considered, which includes the problems of multilevel diversity coding with regeneration (MDC-R) and secure regenerating code (SRC) as special cases. Two outer bounds are established, showing that separate coding of different messages using the respective SRCs can achieve the minimum-bandwidth-regeneration (MBR) point of the achievable normalized storage-capacity repair-bandwidth tradeoff regions for the general MDC-SR problem. The core of the new converse results is an exchange lemma, which can be established using Han's subset inequality.

I Introduction

Diversity coding and node repair are two fundamental ingredients of reliable distributed storage systems. While the study of diversity coding has been in the literature for decades [1, 2, 3, 4, 5, 6], systematic studies of node repair mechanisms were started only recently by Dimakis et al. in their pioneering work [7]. A particular model, which was first introduced in [7] and has since received a significant amount of attention in the literature [8, 9, 10, 11, 12, 13, 14, 15, 16], is the so-called (exact-repair) regenerating code (RC) problem.

More specifically, in an RC problem, a file of size is to be encoded in a total of distributed storage nodes, each of capacity . The encoding needs to ensure that the file can be perfectly recovered by having full access to any out of the total storage nodes. In addition, when a node failure occurs, it is required that the data originally stored in this failed node can be recovered by downloading data of size each from any remaining nodes. An interesting technical challenge is to characterize the optimal tradeoffs between the node capacity and the download bandwidth in satisfying both the file-recovery and node-repair requirements. However, despite intensive research efforts that have yielded many interesting and highly non-trivial partial results including a precise characterization of the minimum-storage-regenerating (MSR) and the minimum-bandwidth-regenerating (MBR) rate points [8, 9, 10, 11, 12, 13, 14, 15, 16, 17], the optimal tradeoffs between the node capacity and the download bandwidth have not been fully understood for the general RC problem.
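To make the two extreme points concrete, the following is a minimal sketch that evaluates the well-known closed-form MSR and MBR points of the RC problem, as characterized in the regenerating-code literature. The symbols are assumptions, since the rendering above lost them: `k` is the number of nodes sufficient for file recovery, `d` is the number of helper nodes contacted during a repair, and the file size is normalized to one unless `file_size` is given. The function names are hypothetical.

```python
from fractions import Fraction


def msr_point(k, d, file_size=1):
    """Minimum-storage point: alpha = B/k, beta = B/(k(d-k+1)).

    Assumed symbols: B = file size, alpha = node capacity,
    beta = per-helper download size during a repair.
    """
    if not 1 <= k <= d:
        raise ValueError("expected 1 <= k <= d")
    size = Fraction(file_size)
    alpha = size / k
    beta = size / (k * (d - k + 1))
    return alpha, beta


def mbr_point(k, d, file_size=1):
    """Minimum-bandwidth point: alpha = 2dB/(k(2d-k+1)), beta = 2B/(k(2d-k+1))."""
    if not 1 <= k <= d:
        raise ValueError("expected 1 <= k <= d")
    size = Fraction(file_size)
    denom = k * (2 * d - k + 1)
    alpha = 2 * d * size / denom
    beta = 2 * size / denom
    return alpha, beta


if __name__ == "__main__":
    # Example with k = 2 and d = 3 (e.g., four nodes in total).
    print("MSR:", msr_point(2, 3))
    print("MBR:", mbr_point(2, 3))
```

At the MBR point the node capacity equals the total repair download (alpha = d * beta), while at the MSR point each node stores exactly a 1/k fraction of the file.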

More recently, two extensions of the RC problem, namely multilevel diversity coding with regeneration (MDC-R) and secure regenerating code (SRC), have also been studied in the literature. The problem of MDC-R was first introduced by Tian and Liu [18]. In an MDC-R problem, a total of independent files of size , respectively, are to be stored in distributed storage nodes, each of capacity . The encoding needs to ensure that the file can be perfectly recovered by having full access to any out of the total storage nodes for any . In addition, when a node failure occurs, it is required that the data originally stored in this failed node can be recovered by downloading data of size each from any remaining nodes.

Clearly, an RC problem can be viewed as an MDC-R problem with degenerate messages (i.e., for all ). Therefore, from the code construction perspective, it is natural to consider the so-called separate coding scheme, i.e., to construct a code for the MDC-R problem, we can simply use an RC to encode the file for each , and the coded messages for each file remain separate when stored in the storage nodes and during the repair processes. However, despite being a natural scheme, it was shown in [18] that separate coding is in general suboptimal in achieving the optimal tradeoffs between the normalized storage-capacity and repair-bandwidth. On the other hand, it has been shown that separate coding can, in fact, achieve both the MSR [18] and the MBR [19] points of the achievable normalized storage-capacity and repair-bandwidth tradeoff region for the general MDC-R problem.

The problem of SRC is an extension of the RC problem that further requires security guarantees during the repair processes. More specifically, the SRC problem that we consider is the RC problem [7, 8, 9, 10, 11, 12, 13, 14, 15, 16], with the additional constraint that the file needs to be kept information-theoretically secure against an eavesdropper, which can access the data downloaded to regenerate a total of different failed nodes under all possible repair groups. Obviously, this is only possible when . Furthermore, when , the secrecy requirement degenerates, and the SRC problem reduces to the RC problem without any repair secrecy requirement.

Under the additional secrecy requirement (), the optimal tradeoffs between the node capacity and repair bandwidth have been studied in [20, 21, 22, 23, 24, 25, 26, 27, 28]. In particular, Shah, Rashmi and Kumar [22] showed that a particular tradeoff point (referred to as the SRK point) can be achieved by extending an MBR code based on the product-matrix construction proposed in [8]. Later, it was shown in [28] that for any given pair, there is a lower bound on , denoted by , such that when , the SRK point is the only corner point of the tradeoff region for the SRC problem. On the other hand, when , it is possible that the tradeoff region features multiple corner points. However, a precise characterization of the tradeoff region, including both the MSR and the MBR points, remains missing in general.

In this paper, we introduce the problem of multilevel diversity coding with secure regeneration (MDC-SR), which includes the problems of MDC-R and SRC as two special cases. (The problem of secure multilevel diversity coding without any node regeneration requirement has been considered in [29, 30].) Similar to the MDC-R problem, it is natural to consider the separate coding scheme for the MDC-SR problem as well. Our main result of the paper is to show that the optimality of separate coding in terms of achieving the MBR point of the achievable normalized storage-capacity and repair-bandwidth tradeoff region extends more generally from the MDC-R problem to the MDC-SR problem. When specialized to the SRC problem, this shows conclusively that the SRK point [22] is, in fact, the MBR point of the achievable normalized storage-capacity and repair-bandwidth tradeoff region, regardless of the number of corner points of the tradeoff region.

From the technical viewpoint, this is mainly accomplished by establishing two outer bounds (one of them must be “horizontal”, i.e., on the normalized repair-bandwidth only) on the achievable normalized storage-capacity and repair-bandwidth tradeoff region, which intersect precisely at the superposition of the SRK points. The core of the new converse results is an exchange lemma, which we establish by exploiting the built-in symmetry of the problem via Han’s subset inequality [31]. The meaning of “exchange” will be clear from the statement of the lemma. The lemma only relies on the functional dependencies for the repair processes and might be useful for solving some other related problems as well.
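For readers unfamiliar with Han's subset inequality, the following sketch checks it numerically on a random joint distribution. The inequality states that, for n random variables, the average joint entropy over all subsets of size m, divided by m, is non-increasing in m. This is only an illustration of the inequality itself; it does not reproduce the exchange lemma, and all function names are hypothetical.

```python
import itertools
import math
import random


def random_joint_pmf(n, rng):
    """Random joint pmf over n binary variables, keyed by outcome tuples."""
    outcomes = list(itertools.product((0, 1), repeat=n))
    weights = [rng.random() for _ in outcomes]
    total = sum(weights)
    return {x: w / total for x, w in zip(outcomes, weights)}


def subset_entropy(pmf, subset):
    """Joint entropy (in bits) of the variables indexed by `subset`."""
    marginal = {}
    for outcome, p in pmf.items():
        key = tuple(outcome[i] for i in subset)
        marginal[key] = marginal.get(key, 0.0) + p
    return -sum(p * math.log2(p) for p in marginal.values() if p > 0)


def han_averages(pmf, n):
    """Normalized average subset entropies for subset sizes 1, ..., n."""
    averages = []
    for size in range(1, n + 1):
        subsets = list(itertools.combinations(range(n), size))
        mean = sum(subset_entropy(pmf, s) for s in subsets) / len(subsets)
        averages.append(mean / size)
    return averages


if __name__ == "__main__":
    pmf = random_joint_pmf(4, random.Random(0))
    values = han_averages(pmf, 4)
    print(values)
    # Han's inequality: the sequence should be non-increasing.
    print(all(values[i] >= values[i + 1] - 1e-12 for i in range(3)))
```

For independent variables the sequence is constant; any dependence among the variables makes it strictly decrease at some step.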

The rest of the paper is organized as follows. In Section II we formally introduce the problem of MDC-SR and the separate coding scheme. The main results of the paper are then presented in Section III. In Section IV, we introduce the exchange lemma and use it to establish the main results of the paper. Finally, we conclude the paper in Section V.

Notation. Sets and random variables will be written in calligraphic and sans-serif fonts, respectively, to differentiate them from the real numbers written in normal math fonts. For any two integers , we shall denote the set of consecutive integers by . The use of the brackets will be suppressed otherwise.

II The MDC-SR Problem

Let be a tuple of positive integers such that . Formally, an code consists of:

  • for each , a message-encoding function ;

  • for each , a message-decoding function ;

  • for each , , and , a repair-encoding function ;

  • for each and , a repair-decoding function .

For each , let be a message that is uniformly distributed over . The messages are assumed to be mutually independent. Let be a random key that is uniformly distributed over and independent of the messages . For each , is the data stored at the th storage node, and for each , , and , is the data downloaded from the th storage node in order to regenerate the data originally stored at the th storage node under the context of repair group . Obviously, represent the message sizes, storage capacity, and repair bandwidth, respectively.

Fig. 1: The optimal tradeoff curve between the normalized storage-capacity and repair-bandwidth (the solid line) and the best possible tradeoffs that can be achieved by separate coding (dashed line) for the MDC-R problem with [18]. The outer bounds (6), (7) and (14) are evaluated as , , and , respectively. When set as equalities, they intersect precisely at the MBR point .

A normalized message-rate storage-capacity repair-bandwidth tuple is said to be achievable for the MDC-SR problem if an code (i.e., for all ) can be found such that the following requirements are satisfied:

  • rate normalization

    (1)

    for any ;

  • message recovery

    (2)

    for any ;

  • node regeneration

    (3)

    for any and ;

  • repair secrecy

    (4)

    for any such that , where is the collection of data that can be downloaded from the other nodes to regenerate node .

The closure of all achievable tuples is the achievable normalized message-rate storage-capacity repair-bandwidth tradeoff region for the MDC-SR problem. For a fixed normalized message-rate tuple , the achievable normalized storage-capacity repair-bandwidth tradeoff region is the collection of all normalized storage-capacity repair-bandwidth pairs such that and is denoted by .

Based on the above problem formulation, it should be clear that the MDC-SR problem can be specialized to various cases that have been considered in the literature:

  • the achievable normalized storage-capacity repair-bandwidth tradeoff region of the MDC-R problem is simply for any given normalized message-rate tuple ;

  • the achievable normalized storage-capacity repair-bandwidth tradeoff region of the SRC problem is simply ;

  • the achievable normalized storage-capacity repair-bandwidth tradeoff region of the RC problem is simply or, equivalently, .

A simple and natural strategy for constructing a code for the MDC-SR problem is to use an SRC to encode the message separately for each . Since the coded data are kept separate during the encoding, decoding and repair processes, we have

Thus, for the general MDC-SR problem, the separate coding normalized storage-capacity repair-bandwidth tradeoff region for a fixed normalized message-rate tuple is given by:

(5)

As mentioned previously, when , the repair secrecy requirement (4) degenerates, and the MDC-SR problem reduces to the MDC-R problem. In this case, it was shown in [19] that any achievable normalized message-rate storage-capacity repair-bandwidth tuple must satisfy:

(6)
(7)

where . When set as equalities, the intersection of (6) and (7) is given by:

For any , the MBR point for the RC problem can be written as [8]

(8)

We may thus conclude immediately from (5) (with ) that separate coding can achieve the MBR point for the general MDC-R problem.
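The superposition argument can be sketched numerically as follows. The sketch assumes that the message with index k must be recoverable from any k nodes, that `d` helper nodes participate in each repair, that each message is coded separately at the standard MBR point of its own RC, and that the resulting storage and bandwidth costs add up, weighted by the normalized message rates. The symbols `k`, `d`, and the `rates` mapping are assumptions introduced for illustration, since the rendering above lost the original notation.

```python
from fractions import Fraction


def mbr_point(k, d):
    """Standard MBR point of the RC problem for a unit-size file."""
    if not 1 <= k <= d:
        raise ValueError("expected 1 <= k <= d")
    denom = Fraction(k * (2 * d - k + 1))
    return 2 * d / denom, 2 / denom


def separate_coding_mbr(rates, d):
    """Superposition of per-message MBR points.

    `rates` maps a recovery threshold k to the normalized rate of the
    message that must be recoverable from any k nodes (assumed layout).
    """
    alpha = Fraction(0)
    beta = Fraction(0)
    for k, rate in rates.items():
        a_k, b_k = mbr_point(k, d)
        alpha += Fraction(rate) * a_k
        beta += Fraction(rate) * b_k
    return alpha, beta


if __name__ == "__main__":
    # Two messages of equal normalized rate with d = 2 helper nodes.
    print(separate_coding_mbr({1: Fraction(1, 2), 2: Fraction(1, 2)}, d=2))
```

Because every per-message MBR point satisfies alpha = d * beta, the superposed point satisfies the same relation.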

Fig. 1 shows the optimal tradeoff curve between the normalized storage-capacity and repair-bandwidth and the best possible tradeoffs that can be achieved by separate coding for the MDC-R problem with [18]. Clearly, for this example, separate coding is strictly suboptimal when . On the other hand, when or , separate coding can, in fact, achieve the optimal tradeoffs. In particular, separate coding can achieve the MSR point and the MBR point . In the same figure, the outer bounds (6) and (7) have also been plotted. As illustrated, they intersect precisely at the MBR point . Notice that for this example at least, the outer bound (7) is tight only at the MBR point.

Fig. 2: The optimal tradeoff curve between the normalized storage-capacity and repair-bandwidth for the SRC problem [28]. The outer bounds (12) and (13) are evaluated as and , respectively. When set as equalities, they intersect precisely at the MBR/SRK point .

III Main Results

Our main result of the paper is to show that the optimality of separate coding in terms of achieving the MBR point of the normalized storage-capacity repair-bandwidth tradeoff region extends more generally from the MDC-R problem to the MDC-SR problem. The results are summarized in the following theorem.

Theorem 1

For the general MDC-SR problem, any achievable normalized message-rate storage-capacity repair-bandwidth tuple must satisfy:

(9)
(10)

where . When set as equalities, the intersection of (9) and (10) is given by:

For any , the SRK point for the SRC problem can be written as [22]:

(11)

We may thus conclude immediately from (5) that separate coding can achieve the MBR point for the general MDC-SR problem.
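As an illustration, the sketch below evaluates the SRK point in the form commonly stated for the secure product-matrix MBR construction: with `d` helper nodes, recovery threshold `k`, and `ell` eavesdropped nodes, the normalized point is (d/T, 1/T), where T is the sum of d + 1 - i over i from ell + 1 to k. This form, the symbol names, and the weighted superposition over messages are assumptions made for illustration; the expression (11) itself did not survive in the rendering above.

```python
from fractions import Fraction


def t_sum(d, k, ell):
    """T = sum of (d + 1 - i) for i = ell + 1, ..., k (assumed notation)."""
    if not 0 <= ell < k <= d:
        raise ValueError("expected 0 <= ell < k <= d")
    return sum(d + 1 - i for i in range(ell + 1, k + 1))


def srk_point(d, k, ell):
    """Assumed SRK point (alpha, beta) for a unit-size secure file."""
    t = t_sum(d, k, ell)
    return Fraction(d, t), Fraction(1, t)


def separate_coding_srk(rates, d, ell):
    """Rate-weighted superposition of per-message SRK points."""
    alpha = Fraction(0)
    beta = Fraction(0)
    for k, rate in rates.items():
        a_k, b_k = srk_point(d, k, ell)
        alpha += Fraction(rate) * a_k
        beta += Fraction(rate) * b_k
    return alpha, beta


if __name__ == "__main__":
    # With ell = 0 the point reduces to the standard MBR point.
    print(srk_point(3, 2, 0))
    print(srk_point(3, 2, 1))
    print(separate_coding_srk({2: Fraction(1, 2), 3: Fraction(1, 2)}, 3, 1))
```

Setting `ell = 0` removes the secrecy requirement, and the sum T becomes k(2d - k + 1)/2, which matches the standard MBR point of the RC problem.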

The following corollary follows immediately from Theorem 1 by setting for all .

Corollary 1

For the general SRC problem, any achievable normalized storage-capacity repair-bandwidth tuple must satisfy:

(12)
(13)

When set as equalities, the intersection of (12) and (13) is precisely the SRK point (11) (with ), showing that the SRK point is, in fact, the MBR point of the achievable normalized storage-capacity repair-bandwidth tradeoff region for the general SRC problem.

While the outer bound (12) is known [20, 21, 28], the outer bound (13) is new to the best of our knowledge. Fig. 2 shows the optimal tradeoff curve between the normalized storage-capacity and repair-bandwidth for the SRC problem. Notice that for this example, the SRK point is, in fact, the MBR point even though the tradeoff region has two corner points. In the same figure, the outer bounds (12) and (13) have also been plotted. As illustrated, when set as equalities, they intersect precisely at the MBR/SRK point . Notice that for this example at least, the outer bound (13) is tight only at the MBR/SRK point.

As a final remark, we mention here that when , the outer bound (9) is reduced to (6) for the MDC-R problem by the fact that . However, when , the outer bound (10) is reduced to:

(14)

which is weaker than the outer bound (7) by the fact that . Fig. 1 shows the outer bound (14) for the MDC-R problem with . As illustrated, (14) is weaker than (7), and both are only tight at the MBR point .

IV Proof of the Main Results

Let us first outline the main ingredients for proving the outer bounds (9) and (10).

  • Total number of nodes. To prove the outer bounds (9) and (10), let us first note that these bounds are independent of the total number of storage nodes in the system. Therefore, in our proof, we only need to consider the cases where . For the cases where , any subsystem consisting of out of the total storage nodes must give rise to an MDC-SR problem, so these outer bounds must apply as well. When , any repair group of size is uniquely determined by the node to be repaired, i.e., , and hence can be dropped from the notation without causing any confusion.

  • Code symmetry. Due to the built-in symmetry of the problem, to prove the outer bounds (9) and (10), we only need to consider the so-called symmetrical codes [10, 32] for which the joint entropy of any subset of random variables from

    remains unchanged under any permutation over the storage-node indices.

  • Key collections of random variables. Focusing on the symmetrical codes, the following collections of random variables play a key role in our proof:

    These collections of random variables have also been used in [28, 19].

Note that if we consider representing the collection of the random variables as an -by- matrix and write on the diagonal of this matrix, then is the collection of these random variables with an upper triangular pattern. An important part of the proof is to understand the relations between different ’s (conditioned on a subset of messages) and then use them to derive the desired converse results. We shall discuss this next.

IV-A Technical Lemmas

Lemma 2

For any code that satisfies the node regeneration requirement (3), is a function of for any and .

Proof:

Fix and . Let us first note that is a function of . As a result, is a function of . It thus follows immediately from the node regeneration requirement (3) that is a function of . Similarly and inductively, it can be shown that is a function of for all . This completes the proof of the lemma.

The above lemma demonstrates the “compactness” of and has a number of direct consequences. For example, for any fixed , it is clear from Lemma 2 that is a function of and hence for any .

The following lemma describes an “exchange” relation between and , which plays the key role in proving the outer bounds (9) and (10). The proof is rather long and is deferred to the Appendix to enhance the flow of the paper.

Lemma 3 (Exchange lemma)

For any symmetrical code that satisfies the node regeneration requirement (3), we have

(15)

for any , , , and .

Corollary 4

For any symmetrical code that satisfies the node regeneration requirement (3), we have

(16)

for any and .

Proof:

Fix and . Setting in (15), we have

(17)

for any . Add the inequalities (17) for and cancel the common term from both sides. We have

which can be equivalently written as

(18)

by the fact that . Multiplying both sides of (18) by completes the proof of (16).

Corollary 5

For any symmetrical code that satisfies the node regeneration requirement (3), we have

(19)

for any and .

Proof:

Fix and . Set and in (15). We have

(20)

for any . Add the inequalities (20) for and cancel the common term from both sides. We have

(21)

Multiplying both sides of (21) by completes the proof of (19).

IV-B The Proof

Consider a symmetrical regenerating code that satisfies the rate normalization requirement (1), the message recovery requirement (2), the node regeneration requirement (3), and the repair secrecy requirement (4). Let us first prove a few intermediate results. The outer bounds (9) and (10) will then follow immediately.

Proposition 1

(22)

for any . Consequently,

(23)

Proof:

To see (22), consider proof by induction. For the base case with , we have

where follows from the fact that is a function of , which is a function of by Lemma 2; follows from the chain rule for entropy; follows from the fact that ; and follows from the fact that . Assuming that (22) holds for some , we have