New Results on Multilevel Diversity Coding with Secure Regeneration

01/11/2018 ∙ by Shuo Shao, et al. ∙ Shanghai Jiao Tong University ∙ Texas A&M University ∙ USTC

The problem of multilevel diversity coding with secure regeneration is revisited. Under the assumption that the eavesdropper can access the repair data for all compromised storage nodes, Shao et al. provided a precise characterization of the minimum-bandwidth-regeneration (MBR) point of the achievable normalized storage-capacity repair-bandwidth tradeoff region. In this paper, it is shown that the MBR point of the achievable normalized storage-capacity repair-bandwidth tradeoff region remains the same even if we assume that the eavesdropper can access the repair data for some compromised storage nodes (type II compromised nodes) but only the data contents of the remaining compromised nodes (type I compromised nodes), as long as the number of type I compromised nodes is no greater than that of type II compromised nodes.


I Introduction

Diversity coding, node repair, and security are three basic ingredients of modern distributed storage systems. The interplay of all three ingredients is captured by a fairly general mathematical model known as multilevel diversity coding with secure regeneration (MDC-SR) [1].

More specifically, in an MDC-SR problem, a total of independent files of size , respectively, are to be encoded and stored in distributed storage nodes, each of capacity . The encoding needs to ensure that:

  • (Diversity coding) the file can be perfectly recovered by having full access to any out of the total storage nodes for any ;

  • (Node repair) when node failures occur and there are remaining nodes in the system, any failed node can be recovered by downloading data of size from each one of the remaining nodes;

  • (Security) the files need to be kept information-theoretically secure against an eavesdropper, which can access the repair data for compromised storage nodes.

Setting , the above problem reduces to the problem of multilevel diversity coding with regeneration (MDC-R) considered in [2, 3]. Setting for all , the above problem reduces to the secure regenerating code (SRC) problem considered in [4, 5, 6, 7, 8, 9, 10, 11]. The goal is to understand the optimal tradeoffs between the storage capacity and repair bandwidth in satisfying all three aforementioned requirements.

From the code construction perspective, it is natural to consider the so-called separate coding scheme, i.e., to construct a code for the MDC-SR problem, we can simply use an SRC to encode the file for each , and the coded messages for each file remain separate when stored in the storage nodes and during the repair processes. However, despite being a natural scheme, it was shown in [2] that separate coding is in general suboptimal in achieving the optimal tradeoffs between the normalized storage-capacity and repair-bandwidth for the MDC-R problem (which is a special case of the MDC-SR problem as mentioned previously). On the other hand, it has been shown in [1] that separate coding can, in fact, achieve the minimum-bandwidth-regenerating (MBR) point of the achievable normalized storage-capacity and repair-bandwidth tradeoff region for the general MDC-SR problem. Nevertheless, the optimal tradeoffs between the storage capacity and the download bandwidth, as well as the performance of the minimum-storage-regenerating (MSR) point, are still not fully understood. For the MSR point in particular, a code was given in [6] for the SRC problem by extending the known MSR code without any security constraint. This coding scheme can achieve the MSR point when and the eavesdropper can only observe type I compromised nodes (type I compromised nodes are defined below). However, it is still unknown whether this code is optimal for the more general eavesdropper model in our paper.

In this paper, we shall revisit the MDC-SR problem with a more general eavesdropping model. More specifically, instead of assuming that the eavesdropper can access the repair data for all compromised storage nodes, we shall assume that the compromised storage nodes can be divided into two different categories: type I compromised nodes and type II compromised nodes. While for the type II compromised nodes, we assume that the eavesdropper can access the repair data as before, for the type I compromised nodes we assume that the eavesdropper can only access the stored data contents.

Let and be the number of type I compromised nodes and type II compromised nodes respectively, and be the total number of compromised nodes. By the node repair requirement, the data contents stored at any node can be fully recovered from its repair data. Therefore, for any fixed , the eavesdropper becomes weaker as increases, which leads to a potentially larger achievable normalized storage-capacity and repair-bandwidth tradeoff region. A question of fundamental interest is to understand whether increasing can lead to a strictly larger achievable normalized storage-capacity and repair-bandwidth tradeoff region. Our main result of the paper is to show that the MBR point of the achievable normalized storage-capacity and repair-bandwidth tradeoff region remains the same, as long as (or equivalently, by the fact that ). From the technical viewpoint, this is mainly accomplished by establishing two outer bounds (one of them must be “horizontal”, i.e., on the normalized repair-bandwidth only) on the achievable normalized storage-capacity and repair-bandwidth tradeoff region, which intersect precisely at the MBR point.
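The ordering of eavesdropper strength described above can be illustrated with a toy sketch. Everything in it is hypothetical and chosen only for illustration: the XOR-based repair rule, the names `repair_decode`, `eavesdropper_view` and `degrade`, and the symbol values are assumptions, not the paper's code construction. The only property taken from the text is that a node's stored content can be recovered from its repair data, so what is seen at a type I node can always be recomputed from what would be seen if that node were of type II.

```python
from functools import reduce


def repair_decode(repair_data):
    """Hypothetical repair rule: stored content is the XOR of the helper symbols."""
    return reduce(lambda a, b: a ^ b, repair_data, 0)


def eavesdropper_view(repair_downloads, type1, type2):
    """Return what the eavesdropper observes at the compromised nodes.

    repair_downloads maps node -> tuple of symbols downloaded to regenerate it.
    Type I nodes leak only stored content; type II nodes leak the full repair data.
    """
    view = {}
    for node in type1:
        view[node] = ("content", repair_decode(repair_downloads[node]))
    for node in type2:
        view[node] = ("repair", repair_downloads[node])
    return view


def degrade(view, nodes):
    """Turn type II observations at `nodes` into type I observations (pure post-processing)."""
    out = dict(view)
    for node in nodes:
        kind, data = out[node]
        if kind == "repair":
            out[node] = ("content", repair_decode(data))
    return out


# Toy repair data for three compromised nodes (arbitrary values).
repair_downloads = {0: (5, 9, 12), 1: (7, 7, 1), 2: (3, 8, 4)}

strong = eavesdropper_view(repair_downloads, type1=[], type2=[0, 1, 2])
weak = eavesdropper_view(repair_downloads, type1=[0], type2=[1, 2])

# The weaker view is a deterministic function of the stronger one.
assert degrade(strong, [0]) == weak
```

Because the weaker view is obtained from the stronger one by post-processing alone, any code that is secure against the stronger eavesdropper is also secure against the weaker one. The sketch says nothing about whether the weaker model allows a strictly better tradeoff, which is the question the paper addresses.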

The rest of the paper is organized as follows. In Section II we formally introduce the problem of MDC-SR with the generalized eavesdropping model. The main results of the paper are then presented in Section III. In Section IV, we introduce two “exchange” lemmas and use them to establish the main results of the paper. Finally, we conclude the paper in Section V.

Notation. Sets and random variables will be written in calligraphic and sans-serif fonts respectively, to differentiate from the real numbers written in normal math fonts. For any two integers , we shall denote the set of consecutive integers by . The use of the brackets will be suppressed otherwise.

II The Generalized MDC-SR Problem

In this paper, we study a distributed storage system that shares the same file recovery and node repair functions as in [1]. Let be a tuple of positive integers such that . Formally, an code consists of:

  • for each , a message-encoding function ;

  • for each , a message-decoding function ;

  • for each , , and , a repair-encoding function ;

  • for each and , a repair-decoding function .

For each , let be a message that is uniformly distributed over . The messages are assumed to be mutually independent. Let be a random key that is uniformly distributed over and independent of the messages . For each , let be the data stored at the th storage node, and for each , , and , let be the data downloaded from the th storage node in order to regenerate the data originally stored at the th storage node under the context of repair group . Obviously, represent the message sizes, storage capacity, and repair bandwidth, respectively.

The main difference between our definition in this paper and that in [1] is the model of the eavesdropper. The eavesdropper can now observe a more complicated combination of data consisting of both stored content and repair content. Let and be two nonnegative integers such that . A normalized message-rate storage-capacity repair-bandwidth tuple is said to be achievable for the generalized MDC-SR problem if an code (i.e., for all ) can be found such that:

  • (rate normalization)

    (1)

    for any ;

  • (message recovery)

    (2)

    for any ;

  • (node regeneration)

    (3)

    for any and ;

  • (repair secrecy)

    (4)

    for any such that , and (so and represent the sets of type I and type II compromised storage nodes, respectively), where is the collection of data that can be downloaded from the other nodes to regenerate node .
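As a point of reference for the repair secrecy requirement, the following toy sketch checks information-theoretic secrecy numerically. It assumes that the requirement amounts to zero mutual information between the messages and the eavesdropper's observation. The one-bit message, the one-bit key, and the helpers `mutual_information` and `joint_message_observation` are hypothetical; the example is a one-time pad, not the paper's code.

```python
from collections import defaultdict
from itertools import product
from math import log2


def mutual_information(joint):
    """I(X;Y) in bits for a joint pmf given as {(x, y): probability}."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    return sum(
        p * log2(p / (px[x] * py[y]))
        for (x, y), p in joint.items()
        if p > 0
    )


def joint_message_observation(observe):
    """Joint pmf of (message, observation) for a uniform message bit and a uniform key bit."""
    joint = defaultdict(float)
    for m, k in product((0, 1), repeat=2):
        joint[(m, observe(m, k))] += 0.25
    return dict(joint)


# The eavesdropper sees only the masked symbol: the message stays hidden.
masked = mutual_information(joint_message_observation(lambda m, k: m ^ k))

# The eavesdropper sees the masked symbol and the key: the message leaks fully.
exposed = mutual_information(joint_message_observation(lambda m, k: (m ^ k, k)))

assert abs(masked) < 1e-12
assert abs(exposed - 1.0) < 1e-12
```

The second case mirrors, in miniature, why the amount of data exposed at each compromised node matters: the same stored symbol is harmless on its own but revealing once additional data is observed alongside it.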

The closure of all achievable tuples is the achievable normalized message-rate storage-capacity repair-bandwidth tradeoff region for the generalized MDC-SR problem. For a fixed normalized message-rate tuple , the achievable normalized storage-capacity repair-bandwidth tradeoff region is the collection of all normalized storage-capacity repair-bandwidth pairs such that and is denoted by .

Fixing and setting , the generalized MDC-SR problem reduces to the MDC-SR problem considered previously in [1], where it was shown that any achievable normalized message-rate storage-capacity repair-bandwidth tuple must satisfy:

(5)
(6)

When set as equalities, the intersection of (5) and (6) is given by:

(7)

which can be achieved by separate encoding with a previous scheme proposed by Shah, Rashmi, and Kumar [6]. This provides a precise characterization of the MBR point for the MDC-SR problem.

III Main Results

The main result of the paper is that the tradeoff point (7) remains the MBR point of for the generalized MDC-SR problem as long as . The result is summarized in the following theorem.

Theorem 1

For the generalized MDC-SR problem, any achievable normalized message-rate storage-capacity repair-bandwidth tuple must satisfy:

(8)

and in addition, when , we also have

(9)

where . When set as equalities, the intersection of (8) and (9) is precisely given by (7). We may thus conclude immediately that (7) is the MBR point of for the generalized MDC-SR problem as long as .

Note that setting , the outer bound (9) reduces to

(10)

So while the outer bound (8) coincides with (5), the outer bound (9) does not reduce to (6) when setting . Simple calculations yield that the outer bound (10) is stronger than (6) if and only if . In particular, when , the outer bound (10) reduces to that for the MDC-R problem [3], while the outer bound (6) is strictly weaker. Fig. 1 shows the comparison of (10) and (6) when in the MDC-SR problem. In this figure, the outer bound (6) lies below the outer bound (10), though both of them intersect with (8) at the MBR point.

Fig. 1: The optimal tradeoff region for the MDC-SR problem when . The outer bounds (8), (10) and (6) are evaluated as , , and , respectively. When set as equalities, they intersect precisely at the MBR point .

IV Proof of the Main Results

Let us first outline the main ingredients for proving the outer bounds (8) and (9).

  • Total number of nodes. To prove the outer bounds (8) and (9), let us first note that these bounds are independent of the total number of storage nodes in the system. Therefore, in our proof, we only need to consider the cases where . For the cases where , any subsystem consisting of out of the total storage nodes must give rise to an MDC-SR problem, so these outer bounds must apply as well. When , any repair group of size is uniquely determined by the node to be repaired, i.e., , and hence can be dropped from the notation without causing any confusion.

  • Code symmetry. Due to the built-in symmetry of the problem, to prove the outer bounds (8) and (9), we only need to consider the so-called symmetrical codes [12] for which the joint entropy of any subset of random variables from remains unchanged under any permutation over the storage-node indices.

  • Key collections of random variables. Focusing on the symmetrical codes, the following collections of random variables play a key role in our proof:

    These collections of random variables have also been used in [11, 3].
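The remark on the total number of nodes in the list above can be checked by direct enumeration. In this sketch, `n` (total number of nodes) and `d` (number of helper nodes contacted per repair) are assumed names for the elided symbols, and `repair_groups` is a hypothetical helper. The point is only that the repair group is forced exactly when every surviving node must help.

```python
from itertools import combinations


def repair_groups(n, d, failed):
    """All size-d sets of helper nodes that could regenerate node `failed` among n nodes."""
    others = [i for i in range(1, n + 1) if i != failed]
    return [frozenset(group) for group in combinations(others, d)]


# With exactly one more node than helpers, the repair group is forced:
# it is the set of all nodes other than the failed one.
assert repair_groups(n=5, d=4, failed=2) == [frozenset({1, 3, 4, 5})]

# With more nodes, several repair groups exist, so the group has to be named explicitly.
assert len(repair_groups(n=6, d=4, failed=2)) == 5
```

This is why, in the reduced setting, the repair group can be dropped from the notation: it carries no information beyond the index of the node being repaired.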

An important part of the proof is to understand the relations between the collections of random variables defined above, and to use them to derive the desired converse results. We shall discuss this next.

IV-A Technical Lemmas

Lemma 1

For any code that satisfies the node regeneration requirement (3), is a function of for any and .

The above lemma, which was first introduced in [1, 11], demonstrates the “compactness” of and has a number of direct consequences. For example, for any fixed , it is clear from Lemma 1 that is a function of and hence for any .

Lemma 2 (Exchange lemma I [1])

For any symmetrical , code that satisfies the node regeneration requirement (3), we have

(11)

for any , , , and .

Corollary 3

For any symmetrical , code that satisfies the node regeneration requirement (3), we have

(12)

for any , , and .

Proof:

Set in (11). We have

(13)

for any . Add the inequalities (13) for and cancel the common term from both sides. We have

Corollary 4

For any symmetrical , code that satisfies the node regeneration requirement (3), we have

(14)

for any , and .

Proof:

Set , and in (12). We have

(15)

which can be equivalently written as

(16)

by the fact that . Multiplying both sides of (16) by completes the proof of (14).

Lemma 5 (Exchange lemma II)

For any symmetrical , code that satisfies the node regeneration requirement (3), we have

(17)

for any and .

Proof:

See the Appendix.

We note here that when setting , the above lemma coincides with Lemma 2 with and .

IV-B The Proof

Consider a symmetrical , regenerating code that satisfies the rate normalization requirement (1), the message recovery requirement (2), the node regeneration requirement (3), and the repair secrecy requirement (4). Let us first prove a few intermediate results. The outer bounds (8) and (9) will then follow immediately.

Proposition 1
(18)

for any . Consequently,

(19)
Proof:

To see (18), consider proof by induction. For the base case with , we have

where follows from the fact that is a function of , which is a function of by Lemma 1; follows from the chain rule for entropy; follows from the fact that ; and follows from the fact that . Assuming that (18) holds for some , we have

where follows from the induction assumption; follows from Corollary 3; follows from the fact that is a function of , which is a function of by Lemma 1; follows from the chain rule for entropy; and follows from the facts that is independent of and that . This completes the induction step and hence the proof of (18).
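The two information-theoretic facts invoked repeatedly in the steps above are standard and can be stated generically. Here $X$ and $Y$ are placeholder random variables and $f$ a placeholder deterministic map; they are not symbols from the paper.

```latex
% Chain rule for entropy
H(X, Y) = H(X) + H(Y \mid X)

% If Y = f(X) for a deterministic map f, then H(Y \mid X) = 0, and therefore
H(X, Y) = H(X), \qquad H(Y) \le H(X)
```

In the proof, the "is a function of" steps are of the second kind: appending a variable that is determined by variables already present leaves the joint entropy unchanged.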

To see (19), simply set in (18). We have

(20)

Note that

(21)

where the last equality follows from the fact that by the repair secrecy requirement (4). Substituting (21) into (20) completes the proof of (19).

Proposition 2
(22)
Proof:

First note that for any and , we have

(23)

where