An Information-Theoretical Analysis of the Minimum Cost to Erase Information

02/21/2018
by   Tetsunao Matsuta, et al.

We normally hold a lot of confidential information in hard disk drives and solid-state drives. When we want to erase such information to prevent leakage, we have to overwrite the sequence of information with a sequence of symbols independent of the information. Overwriting is needed only at places where the overwritten symbols differ from the original symbols. The cost of overwriting, such as the number of overwritten symbols, is therefore important when erasing information. In this paper, we clarify the minimum cost, such as the minimum number of overwrites, to erase information under weak and strong independence criteria. The former (resp. the latter) criterion requires that the mutual information between the original sequence and the overwritten sequence, normalized (resp. not normalized) by the length of the sequences, be less than a given desired value.

1 Introduction

Since services and activities using various types of information have increased, we normally hold a lot of confidential information. For example, storage devices such as hard disk drives (HDDs), solid-state drives (SSDs), and USB flash drives of individuals and companies hold personal addresses, names, phone numbers, e-mail addresses, credit card numbers, etc. When we want to discard, refurbish, or simply increase the security of these devices, we usually erase the information to prevent leakage.

In order to erase information, we have to overwrite the sequence of information with a sequence of symbols independent of the information. Commonly used methods of erasure overwrite the information with uniform random numbers or with repeated specific patterns such as all zeros or all ones. There are several standards [3, 4, 5, 6, 7] for erasing information. Although most of these standards propose repeating the overwrite many times, overwriting data once is adequate to erase information on modern storage devices (see, e.g., [7, Section 2.3]).

Overwriting is needed only at places where the overwritten symbols differ from the original symbols, e.g., 0 to 1 or 1 to 0 for binary sequences. If many symbols are overwritten, the overwriting damages devices, shortens their storage life, and may also increase write time. This is crucial for devices with a limited number of writes, such as SSDs and USB flash drives. Thus, we want to reduce the number of overwritten symbols when we erase information. This raises a natural question: "What is the minimum number of overwritten symbols?"
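
As a rough numerical illustration of this question (a minimal sketch with hypothetical data, not taken from the paper), the following Python snippet counts how many symbols must actually be rewritten when a random binary sequence is overwritten with the all-zero pattern or with independent random bits; both naive methods rewrite about half of the symbols.

```python
import random

def num_overwrites(original, overwritten):
    """Count the positions where the overwritten symbol differs from the original one."""
    return sum(o != w for o, w in zip(original, overwritten))

random.seed(0)
n = 10_000
data = [random.randint(0, 1) for _ in range(n)]            # original (confidential) binary content

all_zeros = [0] * n                                         # overwrite with a fixed pattern
random_bits = [random.randint(0, 1) for _ in range(n)]      # overwrite with independent random bits

print(num_overwrites(data, all_zeros) / n)    # roughly 0.5: about half the symbols are rewritten
print(num_overwrites(data, random_bits) / n)  # also roughly 0.5 on average
```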

In this paper, we clarify the minimum cost, such as the minimum number or time of overwrites, to erase information. As stated above, for a binary sequence, overwriting occurs at places where the overwritten symbols differ from the original symbols. In this case, a proper measure of the cost is the Hamming distance between the original sequence and the overwritten sequence. From this point of view, information erasure can be modeled by correlated sources as in Fig. 1, which is actually a somewhat general model. In this model, the sequences emitted from source 1 and source 2 represent the confidential information and the information to be erased, respectively. For example, source 1 and source 2 may be regarded as a fingerprint and its quantized image, respectively. When the two correlated sources are identical, the model corresponds to the situation described above. As shown in the figure, the encoder can observe one of the sequences. The encoder outputs a sequence that represents the overwritten sequence. Here, we allow the encoder to observe a uniform random number of limited size in order to generate an independent sequence. Then, the cost can be measured by a function of the input source sequence and the output sequence of the encoder.

(Figure 1 depicts two correlated sources, the source 1 sequence and the source 2 sequence. The encoder observes the source 2 sequence together with a uniform random number and produces an output sequence, which is required to be independent of the source sequence and is evaluated by a cost function.)
Figure 1: Information Erasure Model

For this information erasure model, we consider a weak and a strong independence criterion. The former (resp. the latter) criterion requires that the mutual information between the source sequence and the output sequence of the encoder, normalized (resp. not normalized) by the length (blocklength) of the sequences, be less than a given desired value. For the weak independence criterion, we consider the average cost and the worst-case cost. The former represents the expectation of the cost with respect to the sequences. The latter represents the limit superior in probability [8] of the cost. Then, by using information-spectrum quantities [8], we characterize the minimum average and the minimum worst-case costs for general sources, where the blocklength is unlimited. For the strong independence criterion, by employing a stochastic encoder, we give a single-letter characterization of the minimum average cost for stationary memoryless sources, where the blocklength is unlimited. On the other hand, for the strong (same as the weak in this case) independence criterion, we also consider the non-asymptotic minimum average cost for a given finite blocklength. Then, we give a single-letter characterization of it for stationary memoryless sources. We show that the minimum average and the minimum worst-case costs can be characterized by the distortion-rate function for the lossy source coding problem (see, e.g., [8]) when the two correlated sources are identical. This means that our problem setting gives a new point of view on the lossy source coding problem. We also show that for stationary memoryless sources, there exists a sufficient condition under which the optimal method of erasure, from the point of view of the cost, is to overwrite the source sequence with repeated identical symbols.

There are some related studies [9, 10] investigating the relationship between a cost and the statistical independence of sequences. These studies deal with two correlated sequences (referred to as the confidential sequence and the public sequence in this paper) and consider systems that reveal a sequence (referred to as the revealed sequence) related to the public sequence while keeping the confidential sequence secret. In [9], the public sequence is encoded to a codeword and then decoded to the revealed sequence. In [10], the public sequence is directly and randomly mapped to the revealed sequence. These studies adopt the mutual information (more precisely, [9] adopts the conditional entropy of the confidential sequence given the codeword) between the confidential sequence and the revealed sequence (or the codeword in [9]) in order to measure independence. Then, these studies give a trade-off between the mutual information normalized by the blocklength and the average distortion (i.e., cost) between the public sequence and the revealed sequence. We note that in these studies, a uniform random number of limited size is not assumed. In particular, in [9], the system reveals the sequence via a codeword without any auxiliary random number. Thus, the system models in [9] and [10] are fundamentally different from our information erasure model. Moreover, these studies only consider sequences emitted from stationary memoryless sources and a certain limited distortion (cost) function. Thus, the problem formulations in these studies differ especially from that for the weak independence criterion in our study. The problem formulation in [10] is rather related to that for the strong independence criterion, in which we consider a stochastic encoder and stationary memoryless sources. However, in [10] (and also [9]), there is no discussion about the optimality of a revealed sequence of repeated identical symbols, which is important in information erasure for comparison with a known method.

The rest of this paper is organized as follows. In Section 2, we give some notations and the formal definitions of the minimum average and the minimum worst-case costs under the weak independence criterion. Then, we characterize these costs for general sources. In Section 3, we give the formal definition of the minimum average cost under the strong independence criterion. We also give the formal definition of the non-asymptotic minimum average cost. Then, we give a single-letter characterization of these costs and some results obtained from this characterization. In Section 4, we give the proofs of the characterizations of the minimum costs under the weak independence criterion. In Section 5, we conclude the paper.

2 Minimum Costs to Erase Information under the Weak Independence Criterion

In this section, we consider the minimum average and the minimum worst-case costs under the weak independence criterion, and characterize these costs for general sources. We show some special cases of these costs in this section.

2.1 Problem Formulation

In this section, we provide the formal setting of the information erasure and define the minimum average and the minimum worst-case costs under the weak independence criterion.

Unless otherwise stated, we use the following notations throughout this paper (not just this section). The probability distribution of a random variable (RV) $X$ is denoted by the subscript notation $P_X$, and the conditional probability distribution of $Y$ given an RV $X$ is denoted by $P_{Y|X}$. The $n$-fold Cartesian product of a set $\mathcal{X}$ is denoted by $\mathcal{X}^n$, while an $n$-length sequence $(x_1, x_2, \ldots, x_n)$ of symbols is denoted by $x^n$. The sequence of RVs $\{X^n\}_{n=1}^{\infty}$ is denoted by the bold-face letter $\mathbf{X}$. Hereafter, $\log$ denotes the natural logarithm.

Let $\mathcal{X}$, $\mathcal{Y}$, and $\mathcal{Z}$ be finite sets, let $n$ be a positive integer, and let $M_n$ be a positive integer. Let $U_n$ be an RV uniformly distributed on $\{1, 2, \ldots, M_n\}$, and let $(X^n, Y^n)$ be a pair of RVs on $\mathcal{X}^n \times \mathcal{Y}^n$ such that $(X^n, Y^n)$ is independent of $U_n$. The pair of sequences of RVs $(\mathbf{X}, \mathbf{Y}) = \{(X^n, Y^n)\}_{n=1}^{\infty}$ represents a pair of general sources [8] that is not required to satisfy the consistency condition.

For the information erasure model (Fig. 1), let $f_n \colon \mathcal{Y}^n \times \{1, 2, \ldots, M_n\} \to \mathcal{Z}^n$ be an encoder, and let $c_n \colon \mathcal{Y}^n \times \mathcal{Z}^n \to [0, +\infty)$ be a nonnegative, bounded cost function.

We give two examples of the information erasure model to better understand it.

Example 1.

Let a sequence $x^n$ be confidential $n$-length binary data observed by some reading device, where we define $\mathcal{X} = \{0, 1\}$. Let a sequence $y^n$ be the observed $n$-length binary data which is actually stored in a storage device, where we define $\mathcal{Y} = \{0, 1\}$. Now suppose that we can no longer read $x^n$, but we can access the storage device and read the stored data $y^n$. Then, we want to overwrite $y^n$ to keep $x^n$ secret. To this end, let us overwrite the data with the all-zero sequence. Then, we can define $\mathcal{Z} = \{0, 1\}$ and the encoder as $f_n(y^n, u) = (0, 0, \ldots, 0)$ for any $y^n \in \mathcal{Y}^n$ and any $u \in \{1, 2, \ldots, M_n\}$. If we only overwrite half of the data, i.e., we define the encoder as $f_n(y^n, u) = (0, \ldots, 0, y_{\lceil n/2 \rceil + 1}, \ldots, y_n)$ for any $y^n$ and any $u$, the output of the encoder is no longer independent of $x^n$, but the cost may be reduced. Obviously, we can also define more complicated encoders, e.g., ones whose output depends on the uniform random number.

If we wish to count the number of overwrites of binary data, we define the cost function by the (normalized) Hamming distance, i.e., $c_n(y^n, z^n) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{y_i \neq z_i\}$, where $\mathbf{1}\{\cdot\}$ denotes the indicator function.

Example 2.

Let $x^n$ be a confidential grayscale image with rather large dots, and let $y^n$ be its quantized binary image (0 and 1 represent black and white dots, respectively) printed on a paper, where we define $\mathcal{X}$ as the set of gray levels and $\mathcal{Y} = \{0, 1\}$. When we discard the paper of the binary image $y^n$, we modify it by using an eraser and a black ink pen in order to keep the grayscale image secret (if the paper is merely shredded into strips, it may be reassembled; thus, we want to modify the original image). If the eraser can erase black dots cleanly (probably the eraser or the black ink is special), the modified image is also a binary image. Thus, we can define $\mathcal{Z} = \{0, 1\}$ and encoders as those in Example 1. Suppose that the eraser is more expensive than the pen, and we pay some amount (yen, dollars, etc.) for writing a black dot and a larger amount for erasing a black dot. Then, we may define the cost function by charging the writing price for each dot changed from white to black and the erasing price for each dot changed from black to white.
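
A minimal sketch of such an asymmetric cost function (the prices and function names below are hypothetical; the paper's own formula is not reproduced here), following the convention of this example that 0 is a black dot and 1 is a white dot:

```python
def dot_cost(y, z, write_price=1.0, erase_price=3.0):
    """Per-dot cost of changing a stored dot y into an output dot z (0 = black, 1 = white)."""
    if y == 1 and z == 0:
        return write_price   # a white dot is blackened with the pen
    if y == 0 and z == 1:
        return erase_price   # a black dot is erased with the (more expensive) eraser
    return 0.0               # unchanged dots cost nothing

def total_cost(y_seq, z_seq, **prices):
    """Additive cost of modifying the stored image y_seq into the output image z_seq."""
    return sum(dot_cost(y, z, **prices) for y, z in zip(y_seq, z_seq))

# Blacken the whole image: every white dot is written over with black ink.
stored = [0, 1, 1, 0, 1]
blackened = [0] * len(stored)
print(total_cost(stored, blackened))  # 3 white dots turned black -> 3.0 with the default prices
```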

Before we show several definitions, we introduce the limit superior and the limit inferior in probability [8].

Definition 1 (Limit superior/inferior in probability).

For an arbitrary sequence $\{Z_n\}_{n=1}^{\infty}$ of real-valued RVs, we respectively define the limit superior and the limit inferior in probability by
$\text{p-}\limsup_{n \to \infty} Z_n = \inf\{\alpha : \lim_{n \to \infty} \Pr\{Z_n > \alpha\} = 0\}$ and
$\text{p-}\liminf_{n \to \infty} Z_n = \sup\{\beta : \lim_{n \to \infty} \Pr\{Z_n < \beta\} = 0\}$.

We define the worst-case cost as the limit superior in probability of the cost, i.e., $\text{p-}\limsup_{n \to \infty} c_n(Y^n, f_n(Y^n, U_n))$.

Then, we introduce two types of achievability.

Definition 2.

For real numbers , we say is -weakly achievable in the sense of the average cost if and only if there exist a sequence of integers and a sequence of encoders such that

(1)
(2)

where denotes the mutual information between RVs and , and denotes the expectation.

Definition 3.

For real numbers , we say is -weakly achievable in the sense of the worst-case cost if and only if there exist a sequence of integers and a sequence of encoders such that

(3)

We adopt the mutual information normalized by the blocklength in these definitions (i.e., in (2) and (3)). This is a somewhat weak criterion of independence compared with the mutual information itself (not normalized by the blocklength). The stronger version of this criterion will be considered in a later section.
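
To see why the unnormalized criterion is stronger, consider a stationary memoryless pair and a symbol-by-symbol encoder: the blockwise mutual information then grows linearly in the blocklength, so the normalized quantity stays constant while the unnormalized one diverges. The following sketch (with a hypothetical joint distribution; not from the paper) makes this concrete.

```python
import numpy as np

def mutual_information(p_joint):
    """I(A; B) in nats for a joint distribution given as a 2-D array p_joint[a, b]."""
    p_a = p_joint.sum(axis=1, keepdims=True)
    p_b = p_joint.sum(axis=0, keepdims=True)
    mask = p_joint > 0
    return float(np.sum(p_joint[mask] * np.log(p_joint[mask] / (p_a @ p_b)[mask])))

# Hypothetical per-symbol joint distribution of a source symbol and an output symbol.
p_joint = np.array([[0.45, 0.05],
                    [0.05, 0.45]])

i_single = mutual_information(p_joint)
for n in (1, 10, 100):
    # For a memoryless pair and a symbol-wise encoder, the blockwise mutual
    # information equals n times the single-letter one.
    print(n, n * i_single, i_single)  # unnormalized vs. normalized by the blocklength
```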

Now, we define the minimum average and the minimum worst-case costs under the weak independence criterion.

Definition 4.

We define the minimum average cost as

Definition 5.

We define the minimum worst-case cost as

2.2 Minimum Average and Minimum Worst-Case Costs

In this section, we characterize the minimum average and the minimum worst-case costs. To this end, for given sequences of RVs, we define

and denote by

that the Markov chain

holds for all .

For the minimum costs under the weak independence criterion, we have the following two theorems.

Theorem 1.

For a pair of general sources and any real numbers , we have

Theorem 2.

For a pair of general sources and any real numbers , we have

Since the proofs of these theorems are rather long, we postpone them to Section 4. The two theorems differ only in which of the two functions defined above is used.

According to [11, Theorem 8 c), d), and e)], it holds that . Hence, the following two corollaries follow immediately.

Corollary 1.

When and , we have

Corollary 2.

When and , we have

The right-hand sides of Corollaries 1 and 2 can be regarded as the distortion-rate function for variable-length coding under the average distortion criterion (see, e.g., [8, Remark 5.7.2]) and under the maximum distortion criterion (see, e.g., the proof of [8, Theorem 5.6.1]), respectively. This fact allows us to apply many results on the distortion-rate function to our study. For example, according to the proof of [8, Theorem 5.8.1], the minimum costs for stationary memoryless sources are given by the next corollary.
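
Since the minimum costs here coincide with distortion-rate functions, they can be evaluated numerically by standard rate-distortion tools. The sketch below (function names, the parameter grid, and the example source are illustrative assumptions, not the paper's) uses the Blahut-Arimoto algorithm to trace the rate-distortion curve of a stationary memoryless source with an additive cost and reads off the smallest cost compatible with a given mutual-information budget.

```python
import numpy as np

def blahut_arimoto(p_x, dist, s, n_iter=300):
    """One point of the rate-distortion curve (rate in nats) for Lagrange parameter s."""
    n_out = dist.shape[1]
    q_z = np.full(n_out, 1.0 / n_out)               # output marginal, initialized uniform
    for _ in range(n_iter):
        w = q_z[None, :] * np.exp(-s * dist)        # unnormalized test-channel weights
        q_z_given_x = w / w.sum(axis=1, keepdims=True)
        q_z = p_x @ q_z_given_x                     # update the output marginal
    d = float(np.sum(p_x[:, None] * q_z_given_x * dist))
    r = float(np.sum(p_x[:, None] * q_z_given_x * np.log(q_z_given_x / q_z[None, :])))
    return r, d

def distortion_rate(p_x, dist, rate_budget, s_grid=np.linspace(0.01, 20.0, 200)):
    """Smallest distortion whose required rate does not exceed rate_budget (nats)."""
    best = float(dist.max())
    for s in s_grid:
        r, d = blahut_arimoto(p_x, dist, s)
        if r <= rate_budget:
            best = min(best, d)
    return best

# Example: a binary source with bias 0.2, Hamming cost, and a budget of 0.1 nats.
p_x = np.array([0.8, 0.2])
hamming = np.array([[0.0, 1.0], [1.0, 0.0]])
print(distortion_rate(p_x, hamming, rate_budget=0.1))   # roughly 0.14
```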

Corollary 3.

Let and . Further, let be a stationary memoryless source induced by an RV on , and be an additive cost function defined by

where . Then, we have

We also consider a mixed source of two sources and defined by

where . According to [8, Remark 5.10.2], we have the next corollary.

Corollary 4.

Let and . For a subadditive cost function that satisfies

let and be the minimum average cost when . Then, for a mixed source of two stationary sources and , we have

3 Minimum Costs to Erase Information under the Strong Independence Criterion

In this section, we consider the minimum average cost under the strong independence criterion. In order to clarify the fundamental limit of the average cost, we assume in this section that the encoder is a stochastic encoder. In other words, we consider the case where the size of the uniform random number is sufficiently large. We also assume that the source is a stationary memoryless source. Then, we give a single-letter characterization of the minimum average cost and some results obtained from this characterization.

3.1 Problem Formulation

In this section, we define the minimum average cost under the strong independence criterion.

Let $(\mathbf{X}, \mathbf{Y})$ be a pair of stationary memoryless sources, i.e., let $(X^n, Y^n)$ consist of $n$ independent copies of a pair of RVs $(X, Y)$ on $\mathcal{X} \times \mathcal{Y}$. For the sake of brevity, we simply express the sources as $(X, Y)$. Let $f_n$ be a stochastic encoder, and let $c_n$ be an additive cost function as defined in Corollary 3, i.e., $c_n(y^n, z^n) = \frac{1}{n} \sum_{i=1}^{n} c(y_i, z_i)$, where $c \colon \mathcal{Y} \times \mathcal{Z} \to [0, +\infty)$ is an arbitrary function.

The achievability under the strong independence criterion is defined as follows.

Definition 6.

For real numbers , we say is -strongly achievable in the sense of the average cost if and only if there exists a sequence of stochastic encoders such that

(4)

where the expectation is with respect to the sequence and the output of the stochastic encoder .

The difference from the previous section is the use of the strong independence criterion in (4).

The minimum average cost under the strong independence criterion is defined as follows.

Definition 7.

We define the minimum average cost as

Remark 1.

We only consider the average cost in this section. This is because, for stationary memoryless sources, the minimum worst-case cost turns out to coincide with the minimum average cost. This is similar to Corollary 3.

We also consider the non-asymptotic version of achievability, defined as follows.

Definition 8.

For an integer , and real numbers , we say is -strongly achievable in the sense of the average cost if and only if there exists a stochastic encoder such that

(5)
Remark 2.

Definition 8 adopts the strong independence criterion in (5). However, this distinction is not important in the non-asymptotic setting, because for a fixed blocklength the strong criterion can be regarded as the weak criterion with a suitably rescaled desired value.

The non-asymptotic minimum average cost is defined as follows.

Definition 9.

We define the non-asymptotic minimum average cost for a given finite blocklength as

Remark 3.

When we employ a stochastic encoder, we can give a multi-letter characterization even for general cost functions and general sources as

However, since this characterization is quite obvious from the definitions, we focus in this paper on the single-letter characterization for basic stationary memoryless sources and additive cost functions.

3.2 Minimum Average Costs

In this section, we give a single-letter characterization of the minimum average costs. Since this characterization is obtained by employing standard information-theoretic techniques, it might not be of primary interest in itself. However, the results obtained from it are interesting and insightful.

First of all, we show a single-letter characterization of the non-asymptotic minimum average cost .

Theorem 3.

For a pair of stationary memoryless sources , any integer , and any real number , we have

Proof.

First, we show the converse part. If is -strongly achievable in the sense of the average cost, there exists such that

(6)

where . We note that

(7)

where the second equality comes from the fact that is independent of , i.e., . On the other hand, let be an RV on and be RVs on such that . Then, we have

(8)

where the first inequality comes from (7) and the last inequality comes from the fact that is independent of . Thus, from (6), we have

(9)

where the last inequality comes from (8) and the fact that . Since this inequality holds for any -strongly achievable , we have

Next, we show the direct part. Let be an RV on such that and

Then, the direct part is obvious, if we define the encoder as

For this encoder, we have

Thus, is -strongly achievable for any such that and . This implies that

Remark 4.

In the converse part, the single-letter characterizations on the rightmost sides of (8) and (9) depend largely on the assumption that the sources are stationary memoryless and the cost function is additive.

Remark 5.

Since we do not use the finiteness of , , and , Theorem 3 holds even if these sets are countably infinite.
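
The direct part of Theorem 3 constructs an encoder from a single-letter RV; a natural reading is a symbol-by-symbol stochastic encoder that applies a fixed conditional distribution independently to each stored symbol. The sketch below (a hypothetical test channel, not the optimal choice from the proof) illustrates this type of encoder.

```python
import numpy as np

rng = np.random.default_rng(0)

def memoryless_stochastic_encoder(y_seq, p_z_given_y, rng=rng):
    """Apply a fixed single-letter channel independently to each stored symbol.

    p_z_given_y[y] is the distribution of the output symbol when the stored symbol is y.
    """
    n_out = p_z_given_y.shape[1]
    return np.array([rng.choice(n_out, p=p_z_given_y[y]) for y in y_seq])

# Hypothetical test channel on binary symbols: keep the stored bit with probability 0.6.
p_z_given_y = np.array([[0.6, 0.4],
                        [0.4, 0.6]])

y_seq = rng.integers(0, 2, size=1000)
z_seq = memoryless_stochastic_encoder(y_seq, p_z_given_y)
print(float(np.mean(y_seq != z_seq)))   # empirical per-symbol Hamming cost, close to 0.4
```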

Next, we give a single-letter characterization of the minimum average cost, which shows that it is impossible to reduce the minimum cost by allowing information leakage.

Theorem 4.

For a pair of stationary memoryless sources and any , we have

Proof.

If is -strongly achievable in the sense of the average cost, there exists such that for any and all sufficiently large ,

where . By noting that is arbitrary and is continuous at (see Appendix A), the rest of the proof can be done in the same way as the proof of Theorem 3. Hence, we omit the details. ∎

Remark 6.

The finiteness of sets and is necessary to show the continuity at in Appendix A.

According to Theorem 3 and Theorem 4, it holds that for any and ,

Hence, we only consider because is a special case of it.

As in the previous section, the next corollary follows immediately.

Corollary 5.

When , we have

(10)

According to this corollary and Corollary 3, when and is a stationary memoryless source, it holds that for any ,

Since the right-hand side of (10) is the distortion-rate function, we have some closed-form expressions of the minimum cost (see, e.g., [8] and [12]). For example, let , , and , where and denotes the indicator function. Then, we have

(11)

where , , and is the inverse function of .
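
Such closed-form expressions involve the inverse of the binary entropy function, which is easy to evaluate numerically. The sketch below does not reproduce the exact form of (11); it assumes the standard uniform-binary-source, Hamming-cost case, where the distortion-rate function is $h^{-1}(h(1/2) - R)$ in nats, and inverts the binary entropy by bisection.

```python
import math

def binary_entropy(p):
    """Binary entropy in nats (the paper uses natural logarithms)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)

def inverse_binary_entropy(h, tol=1e-12):
    """The unique p in [0, 1/2] with binary_entropy(p) = h, found by bisection."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if binary_entropy(mid) < h:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Uniform binary source, Hamming cost, leakage budget of 0.1 nats:
budget = 0.1
print(inverse_binary_entropy(binary_entropy(0.5) - budget))   # about 0.28
```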

Furthermore, according to Corollary 5, when , it holds that

where the first equality comes from the fact that and are independent. Interestingly, this can be achieved by a certain deterministic encoder as follows: Let and define an encoder as

Then, this encoder achieves , i.e., we have

(12)
(13)

This means that when , the optimal method of erasure is to overwrite the source sequence with repeated identical symbols using . We note that gives the minimum average cost among encoders using repeated identical symbols.
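
A minimal sketch of this repeated-symbol strategy (hypothetical names and distributions, not the paper's notation): the encoder picks the single output symbol with the smallest expected per-symbol cost and repeats it over the whole block, so the output is constant and hence trivially independent of the sources.

```python
import numpy as np

def best_constant_symbol(p_y, cost):
    """Output symbol minimizing the expected per-symbol cost, and that minimum cost.

    p_y:  distribution of a stored symbol (length-|Y| vector)
    cost: matrix with cost[y, z] = c(y, z)
    """
    expected = p_y @ cost                       # expected cost of each constant output symbol
    return int(np.argmin(expected)), float(expected.min())

def repeated_symbol_encoder(n, z_star):
    """Overwrite the whole block with n copies of z_star; the output carries no information."""
    return np.full(n, z_star)

# Hypothetical binary stored data with P(Y = 1) = 0.2 and Hamming cost.
p_y = np.array([0.8, 0.2])
hamming = np.array([[0.0, 1.0], [1.0, 0.0]])
z_star, avg_cost = best_constant_symbol(p_y, hamming)
print(z_star, avg_cost)                 # overwrite with zeros; average cost 0.2 per symbol
print(repeated_symbol_encoder(8, z_star))
```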

Next, we give a sufficient condition such that can be achieved by the encoder . Then, we show that the case where is a special case of the sufficient condition. To this end, we define the weak independence introduced by Berger and Yeung [13].

Definition 10 (Weak independence).

For a pair of RVs $(X, Y)$, let $P_{Y|X}(\cdot \mid x)$ be the $x$th row of the stochastic matrix $[P_{Y|X}(y \mid x)]_{x \in \mathcal{X}, y \in \mathcal{Y}}$. Then, we say $Y$ is weakly independent of $X$ if the rows $\{P_{Y|X}(\cdot \mid x)\}_{x \in \mathcal{X}}$ are linearly dependent.

Remark 7.

If $X$ is binary, then $Y$ is weakly independent of $X$ if and only if $X$ and $Y$ are independent [13, Remark 3].
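
Under the reading of Definition 10 above ($Y$ is weakly independent of $X$ when the rows of the stochastic matrix $P_{Y|X}$ are linearly dependent), weak independence can be checked numerically with a rank computation. A small sketch with hypothetical matrices:

```python
import numpy as np

def is_weakly_independent(p_y_given_x, tol=1e-10):
    """Return True if the rows of the stochastic matrix are linearly dependent.

    p_y_given_x: matrix whose x-th row is the conditional distribution of Y given X = x.
    """
    rank = np.linalg.matrix_rank(p_y_given_x, tol=tol)
    return rank < p_y_given_x.shape[0]

# With a binary X, weak independence reduces to ordinary independence (Remark 7).
independent_rows = np.array([[0.3, 0.7],
                             [0.3, 0.7]])   # identical rows -> linearly dependent
distinct_rows    = np.array([[0.9, 0.1],
                             [0.2, 0.8]])   # full rank -> not weakly independent
print(is_weakly_independent(independent_rows))  # True
print(is_weakly_independent(distinct_rows))     # False
```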

Weak independence has a useful property concerning the independence of a triple of RVs satisfying a Markov chain. This property is shown in the next lemma.

Lemma 1 ([13, Theorem 4]).

Let $\mathcal{X}$, $\mathcal{Y}$, and $\mathcal{V}$ be finite sets, and let $(X, Y)$ be a pair of RVs on $\mathcal{X} \times \mathcal{Y}$. Then, there exists an RV $V$ on $\mathcal{V}$ satisfying

  1. $X \to Y \to V$ forms a Markov chain,

  2. $X$ and $V$ are independent,

  3. $Y$ and $V$ are not independent,

if and only if $Y$ is weakly independent of $X$.

Now, we give a sufficient condition.

Theorem 5.

If is not weakly independent of , the optimal method of erasure is to overwrite the source sequence with repeated identical symbols using , i.e., it holds that

Proof.

Since we immediately obtain that and (see (12) and (13)), we only have to show that .

Since $Y$ is not weakly independent of $X$, there does not exist an RV simultaneously satisfying the three conditions in Lemma 1. This implies that any RV satisfying the first two conditions must violate the third, i.e., it must be independent of $Y$; otherwise, it would simultaneously satisfy the three conditions in Lemma 1.

Thus, we have

where (a) comes from the above argument and (b) follows since and are independent.

Since the opposite direction is obvious by setting with probability , this completes the proof. ∎

If $X = Y$, then $Y$ is not weakly independent of $X$. Thus, the case $X = Y$ is a special case of this sufficient condition. According to Remark 7, we can also show that if $X$ is binary, the encoder is optimal as long as $X$ and $Y$ are not independent.

On the other hand, if $Y$ is weakly independent of $X$, the minimum cost cannot, in general, be achieved by repeated identical symbols. To show this fact, we give an example in which the minimum cost is strictly smaller than the cost of the best repeated-symbol encoder. Let , , , for all , and

where the th row and the th column denotes the conditional probability . Then, we have . We note that is weakly independent of . On the other hand, we consider an RV such that , and

where the th row and the th column denotes the conditional probability . Then, one can easily check that is independent of , and