Zero knowledge proofs for cloud storage integrity checking

by   Faen Zhang, et al.

With the wide adoption of cloud storage, cloud security has become a crucial concern. Related work has addressed security issues such as data confidentiality and integrity, which ensure that remotely stored data are well maintained by the cloud. However, zero-knowledge proof algorithms for checking the integrity of stored data have not been formally defined and investigated. We believe it is important that the integrity proofs produced by the cloud server do not reveal any useful information about the stored data. In this paper, we introduce a novel definition of data privacy for integrity checks, which captures the strong security of a zero-knowledge proof. We found that the existing remote integrity proofs we examined do not capture this feature. We provide a comprehensive study of data privacy and an integrity check algorithm that captures data integrity, confidentiality, privacy, and soundness.




1 Introduction

Cloud computing offers different types of computational services to end users via computer networks and brings a large number of advantages. It has become a trend for individuals and IT enterprises to store data remotely on the cloud in a flexible, on-demand manner, which has become a popular way of outsourcing data. This can greatly reduce the burden of storage management and maintenance and gives users the convenience of universal data access. In fact, cloud storage has become one of the main parts of cloud computing, where user data are stored and maintained by cloud servers. It allows users to access their data via computer networks at any time and from anywhere.

Despite the great benefits provided by cloud computing, data security is a very important but challenging problem that must be solved. One of the major concerns of data security is data integrity in a remote storage system [5, 1]. Although storing data on the cloud is attractive, it does not by itself offer any guarantee of data integrity and retrievability. A simple data integrity check in a remote data storage can be done by periodically examining the data files stored on the cloud server, but such an approach can be very expensive if the amount of data is huge. An interesting problem is to check data integrity remotely without needing to access the full copy of the data stored on the cloud server. For example, the data owner may possess some verification token (e.g. a digest of the data file [6, 7]), which is very small compared with the stored dataset. However, a number of security issues have been found in previous research [16, 15, 17]. Several techniques, such as Proof of Retrievability (POR) [11, 9] and Third Party Auditing (TPA) [14, 17, 13], have been proposed to solve the above data integrity checking problem with public auditability. POR is, loosely speaking, a kind of Proof of Knowledge (POK) [3] where the knowledge is the data file, while TPA allows any third party (or auditor) to perform the data integrity checking on behalf of the data owner based only on some public information (e.g. the data owner’s public key). Several schemes with public auditability have been proposed in the context of ensuring remotely stored data integrity under different system and security models [9, 11, 2, 17].

Intuitively, it is important that an auditing process should not introduce new vulnerabilities of unauthorized information leakage that weaken data security [12]. The previous efforts in Data Integrity Checking (DIC) accommodate several security features including data integrity and confidentiality, which mainly ensure secure maintenance of data. However, they do not cover the issue of data privacy, which means that the communication flows (DIC proofs) from the cloud server should not reveal any useful information to the adversary. By “privacy”, we mean that an adversary should not be able to distinguish which file has been uploaded by the client to the cloud server. We refer to this property as Zero Knowledge. We believe that it is very important to consider such privacy issues adequately in protocol designs. Taking some existing TPA based DIC proofs [15, 13, 17] as an example, the proof sent by the cloud server to the auditor does not allow the auditor to recover the file, but the auditor can still distinguish which file (among a set of possible files) is involved in the DIC proof, which is clearly undesirable.

In this paper, we propose a Zero Knowledge-based definition of data privacy (DIC-Privacy) for TPA based DIC protocols. We show that two recently published DIC schemes [13, 17] are insecure under our new definition, which means some information about the user file is leaked in the DIC proof. We then provide a new construction to demonstrate how DIC-Privacy can be achieved. We show that by applying the Witness Zero Knowledge proof technique [8], we are able to achieve DIC-Privacy in DIC protocols. To the best of our knowledge, our construction is the first scheme that can achieve DIC-Privacy.

Paper Organization. The rest of the paper is organized as follows. In Section 2, we describe the security model and definition of data privacy for DIC proofs. In Section 3, we analyze the DIC protocols by Wang et al. and show why they fail to capture data privacy. In Section 4, we demonstrate how data privacy can be achieved with a witness Zero Knowledge proof. We also provide the definition of soundness for DIC proofs and show the soundness of our protocol based on witness Zero Knowledge proof. We conclude the paper in Section 5.

2 Definitions and Security Model

DIC Protocols. We will focus on TPA based Data Integrity Checking (DIC) protocols for cloud data storage systems. The protocol involves three entities: the cloud storage server, the cloud user, and the third party auditor (TPA). The cloud user relies on the cloud storage server to store and maintain his/her data. Since the user no longer keeps the data locally, it is of critical importance for the user to ensure that the data are correctly stored and maintained by the cloud server. In order to avoid performing periodic data integrity verification personally, the user will resort to a TPA for checking the integrity of his/her outsourced data. To be precise, a DIC protocol for cloud storage consists of five algorithms:

  • KeyGen: Taking as input a security parameter $\lambda$, the algorithm KeyGen generates the public and private key pair $(pk, sk)$ of a cloud user (or data owner).

  • TokenGen: Taking as input a file $F$ and the user private key $sk$, this algorithm generates a file tag $t$ (which includes a file name $name$) and an authenticator $\sigma$ for $F$. The file and file tag, as well as the authenticator, are then stored in the cloud server.

  • Challenge: Given the user public key $pk$ and a file tag $t$, this algorithm is run by the auditor to generate a random challenge $chal$ for the cloud server.

  • Respond: Taking as input $(F, t, \sigma, chal)$, this algorithm outputs a proof $P$, which is used to prove the integrity of the file.

  • Verify: Taking as input $(pk, t, chal, P)$, the algorithm outputs either True or False.

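DIC Privacy. We define the data privacy for DIC proofs via a Zero Knowledge game between a simulator $\mathcal{S}$ (i.e. the cloud server or prover) and an adversary $\mathcal{A}$ (i.e. the auditor or verifier).

Setup: The simulator $\mathcal{S}$ runs KeyGen to generate $(pk, sk)$ and passes $pk$ to the adversary $\mathcal{A}$.

Phase 1: $\mathcal{A}$ is allowed to make Token Generation queries. To make such a query, $\mathcal{A}$ selects a file $F$ and sends it to $\mathcal{S}$. $\mathcal{S}$ generates a file tag $t$, an authenticator $\sigma$, and then returns $(t, \sigma)$ to $\mathcal{A}$.

Phase 2: $\mathcal{A}$ chooses two different files $F_0, F_1$ that have not appeared in Phase 1, and sends them to $\mathcal{S}$. $\mathcal{S}$ calculates $(t_0, \sigma_0)$ and $(t_1, \sigma_1)$ by running the TokenGen algorithm. $\mathcal{S}$ then tosses a coin $b \in \{0, 1\}$, and sends $t_b$ back to $\mathcal{A}$. $\mathcal{A}$ generates a challenge $chal$ and sends it to $\mathcal{S}$. $\mathcal{S}$ generates a proof $P$ based on $(F_b, t_b, \sigma_b)$ and $\mathcal{A}$’s challenge $chal$ and then sends $P$ to $\mathcal{A}$. Finally, $\mathcal{A}$ outputs a bit $b'$ as the guess of $b$. The process is illustrated in Figure 1.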

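Define the advantage of the adversary $\mathcal{A}$ as

$$\mathrm{Adv}_{\mathcal{A}}(\lambda) = \left| \Pr[b' = b] - \frac{1}{2} \right|.$$

Definition 1

A DIC proof has Zero Knowledge if for any polynomial-time algorithm $\mathcal{A}$, $\mathrm{Adv}_{\mathcal{A}}(\lambda)$ is a negligible function of the security parameter $\lambda$.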
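The game can be made concrete with a small test harness. The following is a minimal sketch, not part of the paper: `LeakyDIC`, `HashAdversary` and `privacy_game` are hypothetical names, the toy protocol is deliberately insecure, and Phase 1 queries are omitted for brevity. It only illustrates how a proof that depends deterministically on the file lets the adversary recover the coin.

```python
import hashlib
import secrets


class LeakyDIC:
    """Toy, deliberately leaky protocol (hypothetical): the proof is a hash of challenge and file."""

    def keygen(self):
        return "pk", "sk"  # placeholders; this toy uses no real keys

    def token_gen(self, sk, file: bytes):
        tag = secrets.token_hex(8)               # random file name acting as the tag
        auth = hashlib.sha256(file).hexdigest()  # stand-in authenticator
        return tag, auth

    def respond(self, file: bytes, tag, auth, chal: bytes):
        return hashlib.sha256(chal + file).hexdigest()


class HashAdversary:
    """Distinguisher that recomputes the proof for F_0 and compares."""

    def choose_files(self, pk):
        self.f0, self.f1 = b"contents of file 0", b"contents of file 1"
        return self.f0, self.f1

    def challenge(self, pk, tag):
        self.chal = secrets.token_bytes(16)
        return self.chal

    def guess(self, proof):
        expected = hashlib.sha256(self.chal + self.f0).hexdigest()
        return 0 if proof == expected else 1


def privacy_game(protocol, adversary, coin=None):
    """Run Setup and Phase 2 once; return True if the adversary guesses the coin b."""
    pk, sk = protocol.keygen()
    files = adversary.choose_files(pk)
    assert files[0] != files[1]
    tokens = [protocol.token_gen(sk, f) for f in files]  # tokens for both files
    b = secrets.randbelow(2) if coin is None else coin
    tag, auth = tokens[b]
    chal = adversary.challenge(pk, tag)
    proof = protocol.respond(files[b], tag, auth, chal)
    return adversary.guess(proof) == b


wins = sum(privacy_game(LeakyDIC(), HashAdversary()) for _ in range(200))
print(f"adversary won {wins} of 200 games")
```

Against this toy protocol the adversary's success rate is essentially 1, so its advantage is close to 1/2, the maximum possible. A protocol with Zero Knowledge in the sense of Definition 1 would keep the success rate negligibly close to 1/2.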

Figure 1: Zero Knowledge proof game run between $\mathcal{S}$ and $\mathcal{A}$.

3 Privacy Analysis of Existing DIC Protocols

3.1 Notations and Preliminaries

Before describing some existing DIC protocols, we first introduce some notations and tools used in those protocols. We denote by $F$ the data file to be stored in the cloud. It is decomposed as a sequence of $n$ blocks $m_1, \ldots, m_n \in \mathbb{Z}_p$ for some large prime $p$. We denote by $H(\cdot)$ and $h(\cdot)$ cryptographic hash functions.

Let $G_1$, $G_2$ and $G_T$ be multiplicative cyclic groups of prime order $p$. Let $g_1$ and $g_2$ be generators of $G_1$ and $G_2$, respectively. A bilinear map is a map $e: G_1 \times G_2 \rightarrow G_T$ such that for all $u \in G_1$, $v \in G_2$ and $a, b \in \mathbb{Z}_p$, $e(u^a, v^b) = e(u, v)^{ab}$. Also, the map must be efficiently computable and non-degenerate (i.e. $e(g_1, g_2) \neq 1$). In addition, let $\psi$ denote an efficiently computable isomorphism from $G_2$ to $G_1$, with $\psi(g_2) = g_1$ [4].

3.2 A DIC Protocol by Wang et al. [17]

In [17], Wang et al. presented a DIC protocol based on Merkle Hash Tree (MHT) [10]. Their protocol works as follows.

Setup Phase: The cloud user generates the keys and authentication tokens for the files as follows.

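KeyGen: The cloud user runs KeyGen to generate the public and private key pair. Specifically, the user generates a random verification and signing key pair $(spk, ssk)$ of a digital signature scheme, and sets the public key $pk = (spk, v)$ and $sk = (ssk, \alpha)$ where $\alpha$ is randomly chosen from $\mathbb{Z}_p$ and $v = g^{\alpha}$.

TokenGen: Given a file $F = (m_1, \ldots, m_n)$, the client chooses a file name $name$, a random element $u$ and calculates the file tag

$$t = name \,\|\, n \,\|\, u \,\|\, SSig_{ssk}(name \,\|\, n \,\|\, u)$$

and authenticators $\sigma_i = (H(m_i) \cdot u^{m_i})^{\alpha}$ where $H$ is a cryptographic hash function modeled as a random oracle. The client then generates a root $R$ based on the construction of Merkle Hash Tree (MHT) where the leaf nodes of the tree are an ordered set of hash values $H(m_i)$ $(1 \le i \le n)$. The client then signs the root $R$ under the private key $\alpha$: $sig_{sk}(H(R)) = (H(R))^{\alpha}$ and sends $\{F, t, \{\sigma_i\}, sig_{sk}(H(R))\}$ to the cloud server.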

Audit Phase: The TPA first retrieves the file tag and verifies the signature by using . The TPA then obtains and .

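Challenge: To generate $chal$, the TPA picks a random subset $I = \{s_1, \ldots, s_c\}$ of set $[1, n]$, where $s_1 \le \cdots \le s_c$. Then, the TPA sends a challenge $chal = \{(i, \nu_i)\}_{i \in I}$ to the cloud server where $\nu_i$ is randomly selected from $\mathbb{Z}_p$.

Response: Upon receiving the challenge $chal$, the cloud server computes $\mu = \sum_{i \in I} \nu_i m_i$ and $\sigma = \prod_{i \in I} \sigma_i^{\nu_i}$. The cloud server will also provide the verifier with a small amount of auxiliary information $\{\Omega_i\}_{i \in I}$, which are the node siblings on the path from the leaves $H(m_i)$ to the root $R$ of the MHT. The server sends the proof $P = \{\mu, \sigma, \{H(m_i), \Omega_i\}_{i \in I}, sig_{sk}(H(R))\}$ to the TPA.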

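Verify: Upon receiving the responses from the cloud server, the TPA generates the root $R$ using $\{H(m_i), \Omega_i\}_{i \in I}$, and authenticates it by checking

$$e(sig_{sk}(H(R)), g) = e(H(R), v).$$

If the authentication fails, the verifier rejects by emitting False. Otherwise, the verifier checks

$$e(\sigma, g) = e\Big(\prod_{i \in I} H(m_i)^{\nu_i} \cdot u^{\mu},\; v\Big).$$

If the equation holds, output True; otherwise, output False.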
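The root reconstruction in the Verify step can be sketched as follows. This is an illustrative sketch only: SHA-256 stands in for the hash function, the helper names are hypothetical, and the padding rule for odd levels (duplicating the last node) is one common convention, not necessarily the one used in [17].

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root_and_path(leaves, index):
    """Return (root, path); path lists (sibling_hash, sibling_is_left) from leaf to root."""
    level = list(leaves)
    path = []
    while len(level) > 1:
        if len(level) % 2:          # pad odd levels by duplicating the last node
            level.append(level[-1])
        sibling = index ^ 1
        path.append((level[sibling], sibling < index))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return level[0], path


def root_from_path(leaf, path):
    """Recompute the root from one leaf hash and its sibling path (the verifier's view)."""
    node = leaf
    for sibling, sibling_is_left in path:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node


blocks = [b"m1", b"m2", b"m3", b"m4", b"m5"]
leaves = [h(m) for m in blocks]          # leaf hashes of the file blocks
root, path = merkle_root_and_path(leaves, 2)
print(root_from_path(leaves[2], path) == root)
```

Note that the verifier needs the leaf hash itself to recompute the root, so the hash of each challenged block travels in the clear as part of the proof.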

3.2.1 Zero knowledge Proof Analysis

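It is easy to see that the above DIC protocol does not provide DIC-Privacy. Let $\mathcal{A}$ denote a Zero Knowledge proof adversary which works as follows (also see Fig. 2).

  • $\mathcal{A}$ chooses distinct files $F_0 = (m_{0,1}, \ldots, m_{0,n})$ and $F_1 = (m_{1,1}, \ldots, m_{1,n})$ where $m_{0,i} \neq m_{1,i}$.

  • $\mathcal{S}$ chooses at random a file $F_b$ for $b \in \{0, 1\}$ and then computes the file tag, the authenticators and the signed root for $F_b$.

  • $\mathcal{A}$ chooses a random challenge $chal = \{(i, \nu_i)\}_{i \in I}$.

  • $\mathcal{S}$ computes and sends to $\mathcal{A}$ the response $P$, which contains the hash values $H(m_{b,i})$ for $i \in I$.

  • $\mathcal{A}$ chooses $i \in I$ and calculates $H(m_{0,i})$ and compares it with the received $H(m_{b,i})$. If they are equal, output 0; otherwise, output 1.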

Probability Analysis. It is easy to see that $\mathcal{A}$ has an overwhelming probability to guess the value of $b$ correctly, since the probability that $H(m_{0,i}) = H(m_{1,i})$ is negligible when the hash function $H$ is assumed to be a random oracle, as in [17].
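The distinguisher needs nothing beyond the hash function itself. The sketch below assumes SHA-256 as a stand-in for the random oracle and uses hypothetical helper names; it models only the block hashes that the proof reveals and ignores the rest of the response.

```python
import hashlib


def H(block: bytes) -> bytes:
    return hashlib.sha256(block).digest()


def leaked_hashes(file_blocks, challenged):
    """Block hashes the server reveals in the proof for the challenged indices."""
    return {i: H(file_blocks[i]) for i in challenged}


def distinguish(f0_blocks, challenged, proof_hashes):
    """Output 0 if every revealed hash matches F_0, otherwise 1."""
    matches_f0 = all(proof_hashes[i] == H(f0_blocks[i]) for i in challenged)
    return 0 if matches_f0 else 1


f0 = [b"a0", b"a1", b"a2"]
f1 = [b"b0", b"b1", b"b2"]
challenged = [0, 2]
for b, chosen in enumerate((f0, f1)):
    guess = distinguish(f0, challenged, leaked_hashes(chosen, challenged))
    print(f"coin b={b}, adversary guesses {guess}")
```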

Figure 2: Zero Knowledge analysis on Wang et al.’s DIC Protocol [17].

3.3 Another Privacy Preserving DIC Protocol by Wang et al. [13]

In [13], Wang et al. introduced a new DIC protocol. Compared with the DIC protocol presented above, this new protocol aims to achieve the additional property of privacy preserving (i.e. the TPA cannot learn the content of the file in the auditing process).

Figure 3: The third party auditing protocol by Wang et al. [13].

Let be the system parameters as introduced above. Wang et al.’s privacy-preserving public auditing scheme works as follows (also see Fig. 3):

Setup Phase:

KeyGen: The cloud user runs KeyGen to generate the public and private key pair. Specifically, the user generates a random verification and signing key pair $(spk, ssk)$ of a digital signature scheme, a random $x \in \mathbb{Z}_p$, a random element $u \in G_1$, and computes $v = g^x$. The user secret key is $sk = (x, ssk)$ and the user public key is $pk = (spk, v, g, u, e(u, v))$.

TokenGen: Given a data file $F = (m_1, \ldots, m_n)$, the user first chooses uniformly at random from $\mathbb{Z}_p$ a unique identifier $name$ for $F$. The user then computes authenticator $\sigma_i$ for each data block $m_i$ as $\sigma_i = (H(W_i) \cdot u^{m_i})^x$ where $W_i = name \,\|\, i$. Denote the set of authenticators by $\Phi = \{\sigma_i\}_{1 \le i \le n}$. Then the user computes $t = name \,\|\, SSig_{ssk}(name)$ as the file tag for $F$, where $SSig_{ssk}(name)$ is the user’s signature on $name$ under the signing key $ssk$. It was assumed that the TPA knows the number of blocks $n$. The user then sends $F$ along with the verification metadata $(\Phi, t)$ to the cloud server and deletes them from local storage.

Audit Phase: The TPA first retrieves the file tag $t$ and verifies the signature $SSig_{ssk}(name)$ by using $spk$. The TPA quits by emitting $\perp$ if the verification fails. Otherwise, the TPA recovers $name$.

Challenge: The TPA generates a challenge $chal$ for the cloud server as follows: first picks a random $c$-element subset $I = \{s_1, \ldots, s_c\}$ of set $[1, n]$, and then for each element $i \in I$, chooses a random value $\nu_i \in \mathbb{Z}_p$. The TPA sends $chal = \{(i, \nu_i)\}_{i \in I}$ to the cloud server.

Response: Upon receiving the challenge $chal$, the server generates a response to prove the data storage correctness. Specifically, the server chooses a random element $r \in \mathbb{Z}_p$, and calculates $R = e(u, v)^r$. Let $\mu'$ denote the linear combination of sampled blocks specified in $chal$: $\mu' = \sum_{i \in I} \nu_i m_i$. To blind $\mu'$ with $r$, the server computes $\mu = r + \gamma \mu' \bmod p$, where $\gamma = h(R)$. Meanwhile, the server also calculates an aggregated authenticator $\sigma = \prod_{i \in I} \sigma_i^{\nu_i}$. It then sends $(\mu, \sigma, R)$ as the response to the TPA.

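Verify: Upon receiving the response $(\mu, \sigma, R)$ from the cloud server, the TPA validates the response by first computing $\gamma = h(R)$ and then checking the following verification equation

$$R \cdot e(\sigma^{\gamma}, g) = e\Big(\big(\prod_{i \in I} H(W_i)^{\nu_i}\big)^{\gamma} \cdot u^{\mu},\; v\Big). \qquad (1)$$

The verification is successful if the equation holds.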

Figure 4: Zero Knowledge analysis on Wang et al. DIC Protocol [13].

3.3.1 Zero Knowledge Analysis

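In [13], it has been shown that the DIC proof is privacy preserving. That is, the TPA cannot recover the file from the proof. This is done by concealing the value of $\mu'$. However, we found that such a treatment cannot guarantee that there is no information leakage during the auditing process. Below we show that Wang et al.’s scheme cannot achieve Zero Knowledge. Let $\mathcal{A}$ denote a Zero Knowledge proof adversary which works as follows (also see Fig. 4).

  • $\mathcal{A}$ chooses two distinct files $F_0 = (m_{0,1}, \ldots, m_{0,n})$ and $F_1 = (m_{1,1}, \ldots, m_{1,n})$ such that $m_{0,i} \neq m_{1,i}$ for $1 \le i \le n$.

  • $\mathcal{S}$ randomly chooses a file $F_b$ for $b \in \{0, 1\}$ and computes the file tag $t$ and authenticators $\Phi$.

  • After receiving the tag $t$, $\mathcal{A}$ chooses a random challenge $chal = \{(i, \nu_i)\}_{i \in I}$.

  • $\mathcal{S}$ computes and sends to $\mathcal{A}$ the response $(\mu, \sigma, R)$.

  • $\mathcal{A}$ computes $\gamma = h(R)$ and $\mu'_0 = \sum_{i \in I} \nu_i m_{0,i}$, and checks if

    $$e(u, v)^{\mu} = R \cdot e(u, v)^{\gamma \mu'_0}.$$

    If it is true, return 0; otherwise, return 1.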

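Probability Analysis. Write $\mu'_0 = \sum_{i \in I} \nu_i m_{0,i}$ and $\mu'_1 = \sum_{i \in I} \nu_i m_{1,i}$. If $b = 0$, then $\mu' = \mu'_0$ and the equation

$$e(u, v)^{\mu} = R \cdot e(u, v)^{\gamma \mu'_0}$$

always holds, because $\mu = r + \gamma \mu'$ and $R = e(u, v)^r$. On the other hand, if $b = 1$, then $\mu' = \mu'_1$ and the equation (for $\gamma \neq 0$) holds only when

$$\sum_{i \in I} \nu_i m_{0,i} = \sum_{i \in I} \nu_i m_{1,i} \pmod{p},$$

which happens only with probability $1/p$ for randomly selected $\nu_i$. Therefore, $\mathcal{A}$ has an overwhelming probability to guess the value of $b$ correctly.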
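Because the blinded value satisfies $\mu = r + \gamma\mu'$ and the public key contains $e(u, v)$, the adversary's test only needs arithmetic in the target group. The sketch below is one way to phrase that check, under stated assumptions: the order-1019 subgroup of $\mathbb{Z}_{2039}^*$ stands in for $G_T$, the element 4 stands in for $e(u, v)$, the hash is remapped so that $\gamma \neq 0$, and the aggregated authenticator is omitted because the test does not use it. All names are hypothetical.

```python
import hashlib
import secrets

Q = 2039        # toy safe prime, Q = 2 * P_ORDER + 1
P_ORDER = 1019  # prime order of the subgroup standing in for G_T
E_UV = 4        # generator of that subgroup, standing in for the public value e(u, v)


def h(R: int) -> int:
    """gamma = h(R); mapped into [1, P_ORDER-1] so the toy never hits gamma = 0."""
    digest = hashlib.sha256(R.to_bytes(2, "big")).digest()
    return 1 + int.from_bytes(digest, "big") % (P_ORDER - 1)


def server_response(blocks, chal):
    """The (mu, R) part of the server's response for the chosen file."""
    r = secrets.randbelow(P_ORDER)
    R = pow(E_UV, r, Q)                                        # R = e(u, v)^r
    mu_prime = sum(nu * blocks[i] for i, nu in chal) % P_ORDER
    mu = (r + h(R) * mu_prime) % P_ORDER                       # blinded combination
    return mu, R


def distinguish(f0_blocks, chal, mu, R):
    """Return 0 if the response is consistent with F_0, otherwise 1."""
    mu0 = sum(nu * f0_blocks[i] for i, nu in chal) % P_ORDER
    lhs = pow(E_UV, mu, Q)                                     # e(u, v)^mu
    rhs = R * pow(E_UV, h(R) * mu0, Q) % Q                     # R * e(u, v)^(gamma * mu'_0)
    return 0 if lhs == rhs else 1


f0, f1 = [1, 2, 3], [4, 5, 6]
chal = [(0, 1), (1, 1), (2, 1)]
for b, chosen in enumerate((f0, f1)):
    mu, R = server_response(chosen, chal)
    print(f"coin b={b}, adversary guesses {distinguish(f0, chal, mu, R)}")
```

The random blinding factor hides the value of the linear combination itself, but it cancels out of the test, which is why hiding the file content is weaker than hiding which file was used.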

4 A New DIC Protocol with DIC-Privacy

In order to achieve the DIC-privacy, we adopt the Witness Zero Knowledge Proof of Knowledge technique proposed by Groth and Sahai [8]. Their method can be applied to pairing groups. Our goal is to protect both the file and the corresponding authenticator so that the adversary cannot learn any information about the file.

Similar to Wang et al.’s scheme [13] reviewed in Section 3.3, our scheme is still based on the “aggregate authenticator” introduced by Shacham and Waters [11]. That is, the cloud server will prove that the equation


holds, where and . We will treat as the witness when applying the Groth-Sahai proof system, and rewrite Equation 2 as follows


In order to protect the privacy of (or ) and , the user computes an additional commitment key of the form

where are selected from at random and is the same generator of used in Wang et al.’s scheme. This additional commitment key is now part of the user public key. To hide and , the Cloud Server computes the commitments as

where are randomly selected from . The Cloud Server also computes

and sends () as the response to the TPA.

TPA then verifies the response sent by the Cloud Server by checking the equality of


where represents the right hand side of Equation (3) and denotes the following transformation:

The “” operation is defined as follows: define a function

for and , and the “” operation is defined as

Correctness. To verify Equation (4),

and we have

4.1 DIC-Privacy of Our New Scheme

Below we show that our new DIC protocol has DIC-Privacy under the symmetric external Diffie-Hellman (SXDH) assumption [8]. Let define a bilinear map where is a generator of for . The SXDH assumption holds if for any polynomial time algorithm and any we have

where is negligible in the security parameter .

Theorem 1

Our new DIC protocol has DIC-Privacy if the SXDH problem is hard.

Proof 1

Let $\mathcal{A}$ denote an adversary who has a non-negligible advantage in winning the Zero Knowledge proof game. We construct another algorithm $\mathcal{B}$ which can solve the SXDH problem, also with a non-negligible probability.

receives a challenge where and is either or a random element in . sets up the Zero Knowledge Proof game for as follows

  1. uses the information in to generate all the systems parameters and public/private keys as described in Wang et al.’s TPA scheme (Sec. 3.3).

  2. also sets the values of the commitment key in our scheme as and .

Upon receiving the two files and from , simulates the game as follows. generates a random file identifier and the file tag , and uses and the secret key to compute the authenticators (for ) and (for ) honestly. After that, sends the file tag back to . Upon receiving the challenge from , computes , , and the corresponding aggregated authenticators and honestly. then tosses a random coin , and generates the response to as follows.

  1. Randomly choose from .

  2. Compute .

  3. Compute .

$\mathcal{B}$ then sends the response to $\mathcal{A}$. If $\mathcal{A}$ outputs $b'$ such that $b' = b$, then $\mathcal{B}$ outputs 1; otherwise $\mathcal{B}$ outputs 0.

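Case 1: the challenge element is the real Diffie-Hellman value. In this case, the distribution of the response is identical to that of a real response, and hence $\mathcal{A}$ guesses $b$ exactly as well as in the real Zero Knowledge proof game, that is, $\left| \Pr[b' = b] - 1/2 \right|$ equals the advantage of $\mathcal{A}$.

Case 2: the challenge element is random. In this case, the commitment scheme is perfectly hiding. That is, a valid proof satisfying Equation 4 can be expressed as a proof for $F_0$ (with one choice of randomness), or a proof for $F_1$ (with another choice of randomness). Therefore, the response is independent of $b$ and we have $\Pr[b' = b] = 1/2$.

Combining both cases, the probabilities that $\mathcal{B}$ outputs 1 in the two cases differ by the advantage of $\mathcal{A}$, which is non-negligible by assumption. Hence $\mathcal{B}$ distinguishes the two cases of the SXDH challenge with non-negligible advantage.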
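The perfect-hiding argument of Case 2 can be illustrated with a simpler commitment scheme. The sketch below uses a Pedersen-style commitment over a toy group as an analogy; it is not the Groth-Sahai commitment used in the construction, and the parameters and function names are assumptions made for illustration. It shows that one and the same commitment value opens to two different messages under different randomness, so the value carries no information about which message was committed.

```python
Q = 2039        # toy safe prime, Q = 2 * P_ORDER + 1
P_ORDER = 1019  # prime order of the subgroup generated by G
G = 4           # generator of the order-1019 subgroup of Z_Q^*


def commit(h_gen: int, m: int, r: int) -> int:
    """Pedersen-style commitment g^m * h^r mod Q."""
    return pow(G, m, Q) * pow(h_gen, r, Q) % Q


def equivocate(t: int, m: int, r: int, m_new: int) -> int:
    """Given the trapdoor t with h = g^t, find r_new such that
    commit(h, m_new, r_new) == commit(h, m, r)."""
    return (r + (m - m_new) * pow(t, -1, P_ORDER)) % P_ORDER


t = 123                      # trapdoor, known only in this illustration
h_gen = pow(G, t, Q)
m0, r0 = 10, 77              # "message 0" and its randomness
m1 = 999                     # a different "message 1"
r1 = equivocate(t, m0, r0, m1)
print(commit(h_gen, m0, r0) == commit(h_gen, m1, r1))
```

The trapdoor is used here only to exhibit the second opening explicitly. The hiding property itself does not depend on anyone knowing it: for every message there exists randomness that yields the same commitment value.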

4.2 Soundness of the Protocol

Having shown the Zero Knowledge Proof feature of the protocol, we have seen that the adversary cannot distinguish the file that has been used by the cloud server in a DIC proof. The remaining task is to prove the “soundness” of the protocol. We say a protocol is sound if it is infeasible for the cloud server to change a file without being caught by the TPA in an auditing process. We formally define the soundness game between a simulator and an adversary (i.e. the cloud server) as follows.

  • Key Generation. generates a user key pair by running KeyGen, and then provides to .

  • Phase 1. can now interact with and make at most Token Generation queries. In each query, sends a file to , which responds with the corresponding file tag and authentication tokens .

  • Phase 2. outputs a file and a file tag such that but for an (i.e. at least one message block of has been modified by ). then plays the role as the verifier and executes the DIC protocol with by sending a challenge which contains at least one index such that differs from in the -th message block.

  • Decision. Based on the proof computed by , makes a decision which is either True or False.

Definition 2

We say a witness Zero Knowledge Proof DIC protocol is -sound if

Below we prove that our DIC protocol is sound under the co-CDH assumption. Let be the systems parameters defined as above where is a bilinear map. Let denote an efficiently computable isomorphism such that .

Computational co-Diffie-Hellman (co-CDH) Problem on : Given and as input where and are generators of and respectively, is randomly chosen from , and is randomly chosen from , compute .

Theorem 2

The proposed witness Zero Knowledge Proof DIC protocol is -sound, where is a negligible function of the security parameter , if the co-CDH problem is hard.

Proof 2

Our proof is by contradiction. We show that if there exists an adversary that can win the soundness game with a non-negligible probability, then we can construct another adversary which can solve the co-CDH problem also with a non-negligible probability.

According to the soundness game, must be different from the original file associated with (or ). That means there must exist an such that . Below we show that if can pass the verification for where and at least one of is modified by , then can solve the co-CDH problem.

is given an instance of the co-CDH problem where and are generators of and respectively such that , and is a random element in . ’s goal is to compute . honestly generates the signing key pair , and the commitment key according to the protocol specification. also sets as the value of in the user public key, but the value of is unknown to . then simulates the game as follows.

Phase 1: answers ’s queries in Phase 1 as follows. To generate a file tag for a file , first chooses at random and generates the file tag . For each block in , chooses at random and programs the random oracle

then computes

It is easy to verify that is a valid authenticator with regards to .

Phase 2: Suppose outputs a response for and challenges where at least one has been modified by the adversary. Denote .

Let and denote the original file and authenticator that satisfy


then uses the value of , which is used to generate the commitment key , to obtain and from the commitment . Since can pass the verification, from Equation 4 we have


From Equation 5 and Equation 6, we can obtain

Since chooses the challenges randomly, with overwhelming probability , , and hence can obtain

5 Conclusion

In this paper, we studied a new desirable security notion called DIC-Privacy for remote data integrity checking protocols for cloud storage. We showed that several well-known DIC protocols cannot provide this property, which could render the privacy of user data exposed in an auditing process. We then proposed a new DIC protocol which can provide DIC-Privacy. Our construction is based on an efficient Witness Zero Knowledge Proof of Knowledge system. In addition, we also proved the soundness of the newly proposed protocol, which means the cloud server cannot modify the user data without being caught by the third party auditor in an auditing process.


References

  • [1] M. Arrington (2006) Gmail disaster: reports of mass email deletions.
  • [2] G. Ateniese, R. C. Burns, R. Curtmola, J. Herring, O. Khan, L. Kissner, Z. N. J. Peterson, and D. Song (2011) Remote data checking using provable data possession. ACM Trans. Inf. Syst. Secur. 14, pp. 1–34.
  • [3] M. Bellare and O. Goldreich (1992) On defining proofs of knowledge. In Advances in Cryptology, Proc. CRYPTO 92, LNCS 740, pp. 390–420.
  • [4] D. Boneh, B. Lynn, and H. Shacham (2004) Short signatures from the Weil pairing. J. Cryptology 17 (4), pp. 297–319.
  • [5] Cloud Security Alliance (2010) Top threats to cloud computing.
  • [6] Y. Deswarte, J. J. Quisquater, and A. Saidane (2004) Remote integrity checking. In Integrity and Internal Control in Information Systems VI, S. Jajodia and L. Strous (Eds.), IFIP International Federation for Information Processing, Vol. 140, pp. 1–11.
  • [7] D. L. G. Filho and P. S. L. M. Barreto (2006) Demonstrating data possession and uncheatable data transfer. IACR Cryptology ePrint Archive, pp. 150–159.
  • [8] J. Groth and A. Sahai (2008) Efficient non-interactive proof systems for bilinear groups. In Advances in Cryptology, Proc. EUROCRYPT 2008, LNCS 4965, pp. 415–432.
  • [9] A. Juels and B. S. Kaliski Jr. (2007) PORs: proofs of retrievability for large files. In ACM Conference on Computer and Communications Security, pp. 584–597.
  • [10] R. C. Merkle (1980) Protocols for public key cryptosystems. In IEEE Symposium on Security and Privacy, pp. 122–134.
  • [11] H. Shacham and B. Waters (2008) Compact proofs of retrievability. In Advances in Cryptology - ASIACRYPT, pp. 90–107.
  • [12] M. A. Shah, M. Baker, J. C. Mogul, and R. Swaminathan (2007) Auditing to keep online storage services honest. In Proc. of HotOS’07, pp. 1–6.
  • [13] C. Wang, S. S. M. Chow, Q. Wang, K. Ren, and W. Lou. Privacy-preserving public auditing for secure cloud storage. IEEE Transactions on Computers. Accepted for publication, doi: 10.1109/TC.2011.245.
  • [14] C. Wang, K. Ren, W. Lou, and J. Li (2010) Toward publicly auditable secure cloud data storage services. IEEE Network 24 (4), pp. 19–24.
  • [15] C. Wang, Q. Wang, K. Ren, and W. Lou (2010) Privacy-preserving public auditing for data storage security in cloud computing. In IEEE INFOCOM, pp. 525–533.
  • [16] Q. Wang, C. Wang, J. Li, K. Ren, and W. Lou (2009) Enabling public verifiability and data dynamics for storage security in cloud computing. In ESORICS, pp. 355–370.
  • [17] Q. Wang, C. Wang, K. Ren, W. Lou, and J. Li (2011) Enabling public auditability and data dynamics for storage security in cloud computing. IEEE Trans. Parallel Distrib. Syst. 22 (5), pp. 847–859.