Cloud computing offers different types of computational services to end users via computer networks and brings a large number of advantages. It has become a trend for individuals and IT enterprises to store data remotely on the cloud in a flexible, on-demand manner, which is now a popular form of data outsourcing. This can greatly reduce the burden of storage management and maintenance, and it gives users universal, convenient access to their data. In fact, cloud storage has become one of the main components of cloud computing, where user data are stored and maintained by cloud servers. It allows users to access their data via computer networks at any time and from anywhere.
Despite the great benefits provided by cloud computing, data security is a very important but challenging problem that must be solved. One of the major concerns of data security is data integrity in a remote storage system [5, 1]. Although storing data on the cloud is attractive, it does not necessarily offer any guarantee of data integrity and retrievability. A simple data integrity check on remote storage can be done by periodically examining the data files stored on the cloud server, but such an approach can be very expensive if the amount of data is huge. An interesting problem is to check data integrity remotely without accessing the full copy of the data stored on the cloud server. For example, the data owner may possess a verification token (e.g. a digest of the data file [6, 7]) that is very small compared with the stored dataset. However, a number of security issues have been found in previous research [16, 15, 17]. Several techniques, such as Proof of Retrievability (POR) [11, 9] and Third Party Auditing (TPA) [14, 17, 13], have been proposed to solve the above data integrity checking problem with public auditability. POR is, loosely speaking, a kind of Proof of Knowledge (POK) [3] where the knowledge is the data file, while TPA allows any third party (or auditor) to perform the data integrity checking on behalf of the data owner based only on some public information (e.g. the data owner’s public key). Several schemes with public auditability have been proposed for ensuring the integrity of remotely stored data under different system and security models [9, 11, 2, 17].
Intuitively, an auditing process should not introduce new vulnerabilities that leak information without authorization and thereby undermine data security [12]. Previous efforts in Data Integrity Checking (DIC) accommodate several security features, including data integrity and confidentiality, which mainly ensure the secure maintenance of data. However, they do not cover the issue of data privacy, which means that the communication flows (DIC proofs) from the cloud server should not reveal any useful information to the adversary. By “privacy”, we mean that an adversary should not be able to distinguish which file has been uploaded by the client to the cloud server. We refer to this property as Zero Knowledge. We believe that it is very important to consider such privacy issues adequately in protocol designs. Taking some existing TPA based DIC proofs [15, 13, 17] as an example, the proof sent by the cloud server to the auditor does not allow the auditor to recover the file, but the auditor can still distinguish which file (among a set of possible files) is involved in the DIC proof, which is clearly undesirable.
In this paper, we propose a Zero Knowledge-based definition of data privacy (DIC-Privacy) for TPA based DIC protocols. We show that two recently published DIC schemes [13, 17] are insecure under our new definition, which means some information about the user file is leaked in the DIC proof. We then provide a new construction to demonstrate how DIC-Privacy can be achieved. We show that by applying the Witness Zero Knowledge proof technique [8], we are able to achieve DIC-Privacy in DIC protocols. To the best of our knowledge, our construction is the first scheme that can achieve DIC-Privacy.
Paper Organization. The rest of the paper is organized as follows. In Section 2, we describe the security model and definition of data privacy for DIC proofs. In Section 3, we analyze the DIC protocols by Wang et al. and show why their DIC protocols fail to capture data privacy. In Section 4, we demonstrate how data privacy can be achieved with a Witness Zero Knowledge proof. We also provide the definition of soundness for DIC proofs and show the soundness of our protocol based on the Witness Zero Knowledge proof. We conclude the paper in Section 5.
2 Definitions and Security Model
DIC Protocols. We will focus on TPA based Data Integrity Checking (DIC) protocols for cloud data storage systems. The protocol involves three entities: the cloud storage server, the cloud user, and the third party auditor (TPA). The cloud user relies on the cloud storage server to store and maintain his/her data. Since the user no longer keeps the data locally, it is of critical importance for the user to ensure that the data are correctly stored and maintained by the cloud server. To avoid performing periodic data integrity verification personally, the user resorts to a TPA for checking the integrity of his/her outsourced data. To be precise, a DIC protocol for cloud storage consists of five algorithms:
KeyGen: Taking as input a security parameter $\lambda$, the algorithm KeyGen generates the public and private key pair $(pk, sk)$ of a cloud user (or data owner).
TokenGen: Taking as input a file $F$ and the user private key $sk$, this algorithm generates a file tag $t$ (which includes a file name $name$) and an authenticator $\sigma$ for $F$. The file and file tag, as well as the authenticator, are then stored in the cloud server.
Challenge: Given the user public key $pk$ and a file tag $t$, this algorithm is run by the auditor to generate a random challenge $chal$ for the cloud server.
Respond: Taking as input $(F, t, \sigma, chal)$, this algorithm outputs a proof $P$, which is used to prove the integrity of the file.
Verify: Taking as input $(pk, t, chal, P)$, the algorithm outputs either True or False.
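DIC Privacy. We define the data privacy for DIC proofs via a Zero Knowledge game between a simulator $\mathcal{S}$ (i.e. the cloud server or prover) and an adversary $\mathcal{A}$ (i.e. the auditor or verifier).
Setup: The simulator $\mathcal{S}$ runs KeyGen to generate $(pk, sk)$ and passes $pk$ to the adversary $\mathcal{A}$.
Phase 1: $\mathcal{A}$ is allowed to make Token Generation queries. To make such a query, $\mathcal{A}$ selects a file $F$ and sends it to $\mathcal{S}$. $\mathcal{S}$ generates a file tag $t$ and an authenticator $\sigma$, and then returns $(t, \sigma)$ to $\mathcal{A}$.
Phase 2: $\mathcal{A}$ chooses two different files $F_0, F_1$ that have not appeared in Phase 1, and sends them to $\mathcal{S}$. $\mathcal{S}$ calculates $(t_0, \sigma_0)$ and $(t_1, \sigma_1)$ by running the TokenGen algorithm. $\mathcal{S}$ then tosses a coin $b \in \{0, 1\}$, and sends $t_b$ back to $\mathcal{A}$. $\mathcal{A}$ generates a challenge $chal$ and sends it to $\mathcal{S}$. $\mathcal{S}$ generates a proof $P$ based on $(F_b, t_b, \sigma_b)$ and $\mathcal{A}$’s challenge $chal$, and then sends $P$ to $\mathcal{A}$. Finally, $\mathcal{A}$ outputs a bit $b'$ as the guess of $b$. The process is illustrated in Figure 1.
Define the advantage of the adversary $\mathcal{A}$ as
$$\mathrm{Adv}_{\mathcal{A}}(\lambda) = \left| \Pr[b' = b] - \tfrac{1}{2} \right|.$$
A DIC proof has Zero Knowledge if for any polynomial-time algorithm $\mathcal{A}$, $\mathrm{Adv}_{\mathcal{A}}(\lambda)$ is a negligible function of the security parameter $\lambda$.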
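The game above can be exercised with a small simulation. The following is a minimal sketch, not part of any scheme in this paper: `leaky_respond` and `hiding_respond` are hypothetical toy provers (neither is a sound DIC proof), and `estimate_advantage` is an illustrative helper that plays Phase 2 repeatedly and reports the empirical value of |Pr[b' = b] - 1/2|.

```python
import hashlib
import os
import random


def leaky_respond(file_bytes, challenge):
    # Toy prover (assumption): the proof is a deterministic function of the
    # file and the challenge, so anyone who knows the file can recompute it.
    return hashlib.sha256(file_bytes + challenge).digest()


def hiding_respond(file_bytes, challenge):
    # Toy prover (assumption): the proof is independent of the file. It proves
    # nothing; it only shows what a file-independent transcript looks like.
    return os.urandom(32)


def estimate_advantage(respond, trials, seed=0):
    """Play Phase 2 of the privacy game `trials` times and return the
    empirical advantage |Pr[b' = b] - 1/2| of a simple comparing adversary."""
    rng = random.Random(seed)
    f0, f1 = b"file-zero", b"file-one"      # the adversary's two files
    wins = 0
    for _ in range(trials):
        b = rng.getrandbits(1)              # the simulator's hidden coin
        challenge = rng.getrandbits(128).to_bytes(16, "big")
        proof = respond((f0, f1)[b], challenge)
        # Adversary: recompute what a deterministic prover would send for f0.
        guess = 0 if proof == leaky_respond(f0, challenge) else 1
        wins += int(guess == b)
    return abs(wins / trials - 0.5)
```

Against the leaky prover the adversary always guesses correctly, so the estimate is 0.5; against the file-independent prover it stays close to 0, which is what the definition asks of a real proof.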
3 Privacy Analysis of Existing DIC Protocols
3.1 Notations and Preliminaries
Before describing some existing DIC protocols, we first introduce some notations and tools used in those protocols. We denote by $F$ the data file to be stored in the cloud. It is decomposed as a sequence of $n$ blocks $m_1, \ldots, m_n \in \mathbb{Z}_p$ for some large prime $p$. We denote by $H(\cdot)$ and $h(\cdot)$ cryptographic hash functions.
Let $G_1, G_2$ and $G_T$ be multiplicative cyclic groups of prime order $p$. Let $g_1$ and $g_2$ be generators of $G_1$ and $G_2$, respectively. A bilinear map is a map $e: G_1 \times G_2 \rightarrow G_T$ such that for all $u \in G_1$, $v \in G_2$ and $a, b \in \mathbb{Z}_p$, $e(u^a, v^b) = e(u, v)^{ab}$. Also, the map must be efficiently computable and non-degenerate (i.e. $e(g_1, g_2) \neq 1$). In addition, let $\psi$ denote an efficiently computable isomorphism from $G_2$ to $G_1$, with $\psi(g_2) = g_1$.
3.2 A DIC Protocol by Wang et al. [17]
Setup Phase: The cloud user generates the keys and authentication tokens for the files as follows.
KeyGen: The cloud user runs KeyGen to generate the public and private key pair. Specifically, the user generates a random verification and signing key pair of a digital signature scheme, and sets the public key and where is randomly chosen from and .
TokenGen: Given a file , the client chooses a file name , a random element and calculates the file tag
and authenticators where is a cryptographic hash function modeled as a random oracle. The client then generates a root based on the construction of a Merkle Hash Tree (MHT), where the leaf nodes of the tree are an ordered set of hash values . The client then signs the root under the private key : and sends to the cloud server.
Audit Phase: The TPA first retrieves the file tag and verifies the signature by using . The TPA then obtains and .
Challenge: To generate , TPA picks a random subset of set , where . Then, the TPA sends a challenge to the cloud server where is randomly selected from .
Response: Upon receiving the challenge , the cloud server computes and . The cloud server will also provide the verifier with a small amount of auxiliary information , which are the node siblings on the path from the leaves to the root of the MHT. The server sends the proof to the TPA.
Verify: Upon receiving the responses from the cloud server, the TPA generates the root using , and authenticates it by checking
If the authentication fails, the verifier rejects by emitting False. Otherwise, the verifier checks
If the equation holds, output True; otherwise, output False.
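The Merkle Hash Tree step, in which the TPA rebuilds the root from a challenged leaf and its sibling nodes, can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the construction used in the protocol: SHA-256 stands in for the hash function, an odd-length level pairs its last node with itself, and the function names are hypothetical.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(blocks):
    """Return every tree level, from the leaf hashes up to the single root."""
    level = [h(block) for block in blocks]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]   # assumed convention for odd levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def auth_path(levels, index):
    """Collect the sibling hashes on the path from leaf `index` to the root."""
    path = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling >= len(level):
            sibling = index               # last node of an odd level pairs with itself
        path.append((level[sibling], index % 2 == 0))   # (hash, node_is_left)
        index //= 2
    return path


def root_from_path(block, path):
    """Recompute the root from one block and its sibling path (verifier side)."""
    node = h(block)
    for sibling, node_is_left in path:
        node = h(node + sibling) if node_is_left else h(sibling + node)
    return node
```

The verifier accepts a leaf only if the recomputed root equals the root that was signed by the data owner; a modified block yields a different root.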
3.2.1 Zero Knowledge Proof Analysis
It is easy to see that the above DIC protocol does not provide DIC-Privacy. Let denote a Zero Knowledge Proof adversary which works as follows (also see Fig. 2).
chooses distinct files and where .
chooses at random a file for and then computes .
chooses a random challenge .
computes and sends to the response
chooses and calculates and compares it with the received . If they are equal, output 0; otherwise, output 1.
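The attack can be illustrated with a short sketch. It assumes that the response contains the unblinded linear combination of the challenged blocks, mu = sum(nu_i * m_i) mod p, as is usual for aggregate-authenticator schemes. The prime `P`, the helper names, and the block values are hypothetical choices made for illustration only.

```python
import secrets

P = 2**127 - 1   # a Mersenne prime standing in for the group order p (assumption)


def make_challenge(n, c):
    """Pick c distinct block indices in [0, n) with nonzero coefficients in Z_P."""
    indices = sorted(secrets.SystemRandom().sample(range(n), c))
    return [(i, 1 + secrets.randbelow(P - 1)) for i in indices]


def linear_combination(blocks, challenge):
    """The value the server would send if the combination is not blinded."""
    return sum(nu * blocks[i] for i, nu in challenge) % P


def adversary_guess(f0, challenge, mu):
    """Recompute the combination for F_0 and compare it with the response."""
    return 0 if linear_combination(f0, challenge) == mu else 1
```

Because the adversary chose both files, it can recompute the combination for the first file and compare. When the two files differ in the challenged blocks, the comparison fails for the other file except with negligible probability over the random coefficients.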
3.3 Another Privacy Preserving DIC Protocol by Wang et al. [13]
In [13], Wang et al. introduced a new DIC protocol. Compared with the DIC protocol presented above, this new protocol aims to achieve the additional property of privacy preservation (i.e. the TPA cannot learn the content of the file in the auditing process).
Let be the system parameters as introduced above. Wang et al.’s privacy-preserving public auditing scheme works as follows (also see Fig. 3):
KeyGen: The cloud user runs KeyGen to generate the public and private key pair. Specifically, the user generates a random verification and signing key pair of a digital signature scheme, a random , a random element , and computes . The user secret key is and the user public key is .
TokenGen: Given a data file , the user first chooses uniformly at random from a unique identifier for . The user then computes authenticator for each data block as where . Denote the set of authenticators by . Then the user computes as the file tag for , where is the user’s signature on under the signing key . It was assumed that the TPA knows the number of blocks . The user then sends along with the verification metadata to the cloud server and deletes them from local storage.
Audit Phase: The TPA first retrieves the file tag and verifies the signature by using . The TPA quits by emitting if the verification fails. Otherwise, the TPA recovers .
Challenge: The TPA generates a challenge for the cloud server as follows: first picks a random -element subset of set , and then for each element , chooses a random value . The TPA sends to the cloud server.
Response: Upon receiving the challenge , the server generates a response to prove the data storage correctness. Specifically, the server chooses a random element , and calculates . Let denote the linear combination of sampled blocks specified in : . To blind with , the server computes , where . Meanwhile, the server also calculates an aggregated authenticator . It then sends as the response to the TPA.
Verify: Upon receiving the response from the cloud server, the TPA validates the response by first computing and then checking the following verification equation
The verification is successful if the equation holds.
3.3.1 Zero Knowledge Analysis
In [13], it has been shown that the DIC proof is privacy preserving. That is, the TPA cannot recover the file from the proof. This is done by concealing the value of . However, we found that such a treatment cannot guarantee that no information leaks during the auditing process. Below we show that Wang et al.’s scheme cannot achieve Zero Knowledge. Let denote a Zero Knowledge Proof adversary which works as follows (also see Fig. 4).
chooses two distinct files and such that for .
randomly chooses a file for and computes the file tag and authenticators .
After receiving the tag , chooses a random challenge .
computes and sends to the response .
computes and checks if
If it is true, return 0; otherwise, return 1.
Probability Analysis. If , then and the equation
always holds. On the other hand, if , then and
holds only when
which happens only with probability for randomly selected . Therefore, has an overwhelming probability to guess the value of correctly.
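Why blinding the linear combination does not prevent distinguishing can be seen in a toy group. The sketch below is an assumption-laden illustration, not the pairing-based equation of the analysis above: a small prime-order subgroup of the integers modulo 2039 replaces the target group, the generator `GT` stands in for the pairing value on which the blinding element is built, and the blinding follows the shape described in Section 3.3 (a random exponent r, the element R raised to r, a hash gamma of R, and the value r + gamma * mu'). All names are hypothetical.

```python
import hashlib
import secrets

Q = 1019         # prime order of the toy subgroup (far too small for real use)
P = 2 * Q + 1    # 2039, a prime, so the squares modulo P form a group of order Q
GT = 4           # generator of that subgroup; stand-in for the pairing value


def gamma_of(R):
    """Hash the blinding element R to an exponent, as the scheme does with h."""
    return int.from_bytes(hashlib.sha256(str(R).encode()).digest(), "big") % Q


def combine(blocks, challenge):
    return sum(nu * blocks[i] for i, nu in challenge) % Q


def respond(blocks, challenge):
    """Server side: blind the combination mu' with a fresh random r."""
    r = secrets.randbelow(Q)
    R = pow(GT, r, P)
    mu = (r + gamma_of(R) * combine(blocks, challenge)) % Q
    return mu, R


def adversary_guess(f0, challenge, mu, R):
    """Test in the exponent whether mu is consistent with file F_0."""
    expected = (R * pow(GT, gamma_of(R) * combine(f0, challenge), P)) % P
    return 0 if pow(GT, mu, P) == expected else 1
```

The adversary never learns r, yet the test always succeeds for the first file and fails for the other file unless gamma or the difference of the two combinations vanishes, which in a group of cryptographic size happens only with negligible probability.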
4 A New DIC Protocol with DIC-Privacy
In order to achieve DIC-Privacy, we adopt the Witness Zero Knowledge Proof of Knowledge technique proposed by Groth and Sahai [8]. Their method can be applied to pairing groups. Our goal is to protect both the file and the corresponding authenticator so that the adversary cannot learn any information about the file.
Similar to Wang et al.’s scheme [13] reviewed in Section 3.3, our scheme is still based on the “aggregate authenticator” introduced by Shacham and Waters [11]. That is, the cloud server will prove that the equation
holds, where and . We will treat as the witness when applying the Groth-Sahai proof system, and rewrite Equation 2 as follows
In order to protect the privacy of (or ) and , the user computes an additional commitment key of the form
where are selected from at random and is the same generator of used in Wang et al.’s scheme. This additional commitment key is now part of the user public key. To hide and , the Cloud Server computes the commitments as
where are randomly selected from . The Cloud Server also computes
and sends () as the response to the TPA.
TPA then verifies the response sent by the Cloud Server by checking the equality of
where represents the right hand side of Equation (3) and denotes the following transformation:
The “” operation is defined as follows: define a function
for and , and the “” operation is defined as
Correctness. To verify Equation (4),
and we have
4.1 DIC-Privacy of Our New Scheme
Below we show that our new DIC protocol has DIC-Privacy under the symmetric external Diffie-Hellman (SXDH) assumption [8]. Let define a bilinear map where is a generator of for . The SXDH assumption holds if for any polynomial time algorithm and any we have
where is negligible in the security parameter .
Our new DIC protocol has DIC-Privacy if the SXDH problem is hard.
Let denote an adversary who has a non-negligible advantage in winning the Zero Knowledge Proof game. We construct another algorithm which can solve the SXDH problem, also with a non-negligible probability.
receives a challenge where and is either or a random element in . sets up the Zero Knowledge Proof game for as follows
uses the information in to generate all the systems parameters and public/private keys as described in Wang et al.’s TPA scheme (Sec. 3.3).
also sets the values of the commitment key in our scheme as and .
Upon receiving the two files and from , simulates the game as follows. generates a random file identifier and the file tag , and uses and the secret key to compute the authenticators (for ) and (for ) honestly. After that, sends the file tag back to . Upon receiving the challenge from , computes , , and the corresponding aggregated authenticators and honestly. then tosses a random coin , and generates the response to as follows.
Randomly choose from .
then sends the response to . If outputs such that , then outputs 1; otherwise outputs .
Case 1: . In this case, the distribution of the response is identical to that of a real response, and hence we have
Case 2: . In this case, the commitment scheme is perfectly hiding. That is, for a valid proof satisfying equation 4, it can be expressed as a proof for (with randomness ), or a proof for (with randomness ). Therefore, we have
Combining both cases, we have
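The way the two cases combine can be written out as a short derivation. The notation is an assumption made for this sketch: $\epsilon$ denotes the adversary's advantage in the Zero Knowledge Proof game, $\mathcal{A}$ and $\mathcal{B}$ name the adversary and the SXDH solver, and "DH tuple" and "random tuple" name the two possible forms of the SXDH challenge.

```latex
\begin{align*}
\Pr[\mathcal{B} \text{ outputs } 1 \mid \text{DH tuple}]
  &= \Pr[b' = b \mid \text{real response}],
  \qquad \left|\Pr[b' = b \mid \text{real response}] - \tfrac{1}{2}\right| = \epsilon, \\
\Pr[\mathcal{B} \text{ outputs } 1 \mid \text{random tuple}]
  &= \Pr[b' = b \mid \text{perfectly hiding key}] = \tfrac{1}{2}, \\
\left|\Pr[\mathcal{B} \text{ outputs } 1 \mid \text{DH tuple}]
  - \Pr[\mathcal{B} \text{ outputs } 1 \mid \text{random tuple}]\right|
  &= \epsilon .
\end{align*}
```

A non-negligible $\epsilon$ would therefore give a distinguisher for the SXDH problem with the same advantage, which contradicts the assumption.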
4.2 Soundness of the Protocol
Having shown the Zero Knowledge Proof feature of the protocol, we have seen that the adversary cannot distinguish which file has been used by the cloud server in a DIC proof. The remaining task is to prove the “soundness” of the protocol. We say a protocol is sound if it is infeasible for the cloud server to change a file without being caught by the TPA in an auditing process. We formally define the soundness game between a simulator and an adversary (i.e. the cloud server) as follows.
Key Generation. generates a user key pair by running KeyGen, and then provides to .
Phase 1. can now interact with and make at most Token Generation queries. In each query, sends a file to , which responds with the corresponding file tag and authentication tokens .
Phase 2. outputs a file and a file tag such that but for an (i.e. at least one message block of has been modified by ). then plays the role as the verifier and executes the DIC protocol with by sending a challenge which contains at least one index such that differs from in the -th message block.
Decision. Based on the proof computed by , makes a decision which is either True or False.
We say a witness Zero Knowledge Proof DIC protocol is -sound if
Below we prove that our DIC protocol is sound under the co-CDH assumption. Let be the systems parameters defined as above where is a bilinear map. Let denote an efficiently computable isomorphism such that .
Computational co-Diffie-Hellman (co-CDH) Problem on : Given and as input where and are generators of and respectively, is randomly chosen from , and is randomly chosen from , compute .
The proposed witness Zero Knowledge Proof DIC protocol is -sound, where is a negligible function of the security parameter , if the co-CDH problem is hard.
Our proof is by contradiction. We show that if there exists an adversary that can win the soundness game with a non-negligible probability, then we can construct another adversary which can solve the co-CDH problem also with a non-negligible probability.
According to the soundness game, must be different from the original file associated with (or ). That means there must exist an such that . Below we show that if can pass the verification for where and at least one of is modified by , then can solve the co-CDH problem.
is given an instance of the co-CDH problem where and are generators of and respectively such that , and is a random element in . ’s goal is to compute . honestly generates the signing key pair , and the commitment key according to the protocol specification. also sets as the value of in the user public key, but the value of is unknown to . then simulates the game as follows.
Phase 1: answers ’s queries in Phase 1 as follows. To generate a file tag for a file , first chooses at random and generates the file tag . For each block in , chooses at random and programs the random oracle
It is easy to verify that is a valid authenticator with regard to .
Phase 2: Suppose outputs a response for and challenges where at least one has been modified by the adversary. Denote .
Let and denote the original file and authenticator that satisfy
then uses the value of , which is used to generate the commitment key , to obtain and from the commitment . Since can pass the verification, from Equation 4 we have
From Equation 5 and Equation 6, we can obtain
Since chooses the challenges randomly, with overwhelming probability , , and hence can obtain
5 Conclusion
In this paper, we studied a new desirable security notion called DIC-Privacy for remote data integrity checking protocols for cloud storage. We showed that several well-known DIC protocols cannot provide this property, which could expose the privacy of user data in an auditing process. We then proposed a new DIC protocol which can provide DIC-Privacy. Our construction is based on an efficient Witness Zero Knowledge Proof of Knowledge system. In addition, we proved the soundness of the newly proposed protocol, which means the cloud server cannot modify the user data without being caught by the third party auditor in an auditing process.
References
[1] (2006) Gmail disaster: reports of mass email deletions. http://www.techcrunch.com/2006/12/28/gmail-disasterreports-of-mass-email-deletions/.
[2] (2011) Remote data checking using provable data possession. ACM Trans. Inf. Syst. Secur. 14, pp. 1–34.
[3] (1992) On defining proofs of knowledge. In Advances in Cryptology, Proc. CRYPTO 92, LNCS 740, pp. 390–420.
[4] (2004) Short signatures from the Weil pairing. J. Cryptology 17 (4), pp. 297–319.
[5] (2010) Top threats to cloud computing. http://www.cloudsecurityalliance.org
[6] (2004) Remote integrity checking. In Integrity and Internal Control in Information Systems VI, S. Jajodia and L. Strous (Eds.), IFIP International Federation for Information Processing, Vol. 140, pp. 1–11.
[7] (2006) Demonstrating data possession and uncheatable data transfer. IACR Cryptology ePrint Archive, pp. 150–159.
[8] (2008) Efficient non-interactive proof systems for bilinear groups. In Advances in Cryptology, Proc. EUROCRYPT 2008, LNCS 4965, pp. 415–432.
[9] (2007) Pors: proofs of retrievability for large files. In ACM Conference on Computer and Communications Security, pp. 584–597.
[10] (1980) Protocols for public key cryptosystems. In IEEE Symposium on Security and Privacy, pp. 122–134.
[11] (2008) Compact proofs of retrievability. In Advances in Cryptology - ASIACRYPT, pp. 90–107.
[12] (2007) Auditing to keep online storage services honest. In Proc. of HotOS'07, pp. 1–6.
[13] Privacy-preserving public auditing for secure cloud storage. IEEE Transactions on Computers. Accepted for publication, doi: 10.1109/TC.2011.245.
[14] (2010) Toward publicly auditable secure cloud data storage services. IEEE Network 24 (4), pp. 19–24.
[15] (2010) Privacy-preserving public auditing for data storage security in cloud computing. In IEEE INFOCOM, pp. 525–533.
[16] (2009) Enabling public verifiability and data dynamics for storage security in cloud computing. In ESORICS, pp. 355–370.
[17] (2011) Enabling public auditability and data dynamics for storage security in cloud computing. IEEE Trans. Parallel Distrib. Syst. 22 (5), pp. 847–859.