A public blockchain or ledger consists of a set of blocks that are linked together, where each block contains a set of transactions. A public blockchain is maintained by a group of users, who run a consensus protocol (e.g., proof-of-work with longest-chain) to resolve disagreements regarding the blockchain. In a simple realization of a public blockchain, each user keeps a local copy of the entire blockchain, meaning that each user has access to all historic activities and can easily test whether a new transaction is consistent with the existing transactions. This explains why a public ledger does not have to rely on any centralized party. This technique is central to many popular applications, such as Bitcoin.
Although keeping a local copy of the blockchain in question simplifies many operations (e.g., transaction searching and balance calculation), it imposes a substantial storage overhead because the blockchain keeps growing. For example, the Bitcoin blockchain included 472,483 blocks in June 2017, or 120 GB in volume. This overhead may not be a problem for modern servers and PCs, but it is prohibitive for lightweight users such as mobile devices and IoT devices. In general, this hinders the development of applications that are meant to be built on top of blockchains (e.g., smart contract systems). At the same time, smartphones are the main way to get online in some areas, especially in underdeveloped countries, and there is a strong need for mobile and lightweight users to use blockchains. Therefore, it is urgent to reduce the storage overhead, especially for lightweight users.
Indeed, Nakamoto proposed the simplified payment verification (SPV) protocol in the very first Bitcoin paper, which requires a client to store only part of the blockchain, rather than all of it, while still being able to check the validity of transactions recorded in the blockchain. This technique is also widely used in many blockchain-based applications, such as smart contract systems. The basic idea underlying the SPV protocol is that each user only needs to keep the headers of blocks, rather than the blocks themselves. However, the local storage overhead still increases linearly with the number of blocks, which grows over time and can quickly become prohibitive for lightweight users. An alternative approach is for a lightweight user to trust some nodes in the blockchain system. However, this practice sacrifices the most appealing feature of blockchains, namely the absence of any trusted third party. Moreover, this approach can be vulnerable to, for example, Sybil attacks.
In this paper, we propose an efficient verification protocol for public blockchains, dubbed EPBC. The core of EPBC is a succinct blockchain verification protocol that “compresses” the whole chain to a constant-size summary, using a cryptographic accumulator. A lightweight user only needs to store the most recent summary, which is sufficient for the user to verify the validity of transactions. EPBC can be incorporated into existing blockchains as a middle-layer service, or can be seamlessly incorporated into new blockchain systems.
In summary, our contributions in this work include:
-  We design a novel scheme for lightweight users to use public blockchains using a cryptographic accumulator.
-  We analyze the security and asymptotic performance of the scheme, including its storage cost.
-  We report a prototype implementation of the core protocol of EPBC and measure its performance. Experimental results show that the scheme is practical for lightweight users.
The rest of the paper is organized as follows. In Section II we briefly review the background of public blockchains and the simplified payment verification protocol. In Section III we describe the design of the core component of EPBC, i.e., efficient block verification, and analyze its security. Section IV describes two common operations for blockchain based applications using the core component of EPBC, and we provide the architecture to integrate EPBC with existing blockchain systems in Section V. Experimental results are given in Section VI to demonstrate the practicability of EPBC, and Section VII discusses the related prior work. We conclude the paper in Section VIII.
II Background of Public Blockchain
A blockchain is a distributed ledger that has been used by Bitcoin and other applications to store their transaction data, where a transaction can be a payment operation, a smart contract submission, or a smart contract execution result submission. There are different approaches to constructing blockchains. In this work, we focus on the class of blockchains that are built on the principle of proof-of-work (PoW). This class of blockchains has a low throughput and a high latency, but has the desirable properties of being fair and expensive to attack. Furthermore, there are many efforts at improving their performance [7, 8] and characterizing their security properties.
Since a blockchain is immutable and append-only, its size keeps growing. There are proposals for coping with this issue. A straightforward approach is for a lightweight user to trust some other user, who checks the validity of transactions on the lightweight user’s behalf. This approach assumes that the lightweight user always knows who can be trusted. Another approach is to use the SPV protocol mentioned above. In this scheme, as highlighted in Fig. 1, a user only needs to store the block headers, each of which contains the root of the Merkle tree of the transactions in the corresponding block. When a user needs to verify a transaction, it sends a request to the system asking for the corresponding block, whose validity can be verified by using the root of the Merkle tree.
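The Merkle-root check that SPV relies on can be sketched as follows. This is a minimal illustration, not the Bitcoin reference code: it assumes Bitcoin-style double SHA-256 and duplication of the last node on odd-sized levels, and the function names (`merkle_root`, `merkle_path`, `verify_inclusion`) are hypothetical.

```python
import hashlib


def sha256d(data: bytes) -> bytes:
    """Double SHA-256 (assumed here, following Bitcoin's convention)."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def _next_level(level: list[bytes]) -> list[bytes]:
    """Hash adjacent pairs; an odd-sized level repeats its last node."""
    if len(level) % 2 == 1:
        level = level + [level[-1]]
    return [sha256d(level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(leaves: list[bytes]) -> bytes:
    """Root over a non-empty list of transaction hashes."""
    level = list(leaves)
    while len(level) > 1:
        level = _next_level(level)
    return level[0]


def merkle_path(leaves: list[bytes], index: int) -> list[tuple[bytes, bool]]:
    """Sibling hashes from leaf to root; the flag is True if the sibling is on the right."""
    path = []
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]
        sibling = index ^ 1
        path.append((level[sibling], sibling > index))
        level = _next_level(level)
        index //= 2
    return path


def verify_inclusion(leaf: bytes, path: list[tuple[bytes, bool]], root: bytes) -> bool:
    """Light-client side: recompute the root from one leaf and its sibling path."""
    node = leaf
    for sibling, sibling_is_right in path:
        node = sha256d(node + sibling) if sibling_is_right else sha256d(sibling + node)
    return node == root
```

The client only needs the root (from a stored header) and a logarithmic number of sibling hashes, which is why the header chain, not the transaction data, dominates SPV storage.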
III Design and Analysis of EPBC
III-A Design Objective and Assumption
The objective of EPBC is to allow lightweight users to participate in applications that use public blockchains. By “lightweight users” we mean the users who use devices that have limited computation/storage capacities, such as IoT devices and smartphones. Specifically, EPBC aims to allow lightweight users to achieve the following:
Efficient storage: A user does not have to store or download the entire blockchain. Instead, a user only needs to consume a storage that is ideally independent of the size of the blockchain.
Verifiability of transactions: A user can verify whether a transaction has been accepted by the blockchain or not.
Like any public blockchain constructed according to proof-of-work, we assume that the majority of the users are honest.
In what follows, we first describe the block verification protocol, which is the core component of EPBC. Then, we describe how to use this protocol to construct the EPBC scheme.
III-B The Block Verification Protocol
Fig. 2 gives an overview of the verification protocol. Basically, a lightweight user can verify the validity of transactions by interacting with the blockchain system.
The blockchain verification protocol of EPBC consists of the following four algorithms:
Setup: This algorithm is executed once by the creator of the blockchain. The algorithm generates the public parameters that are needed by the other algorithms.
Block and summary construction: This algorithm generates blocks and a summary of the current blockchain. Anyone participating in the mining competition to build new blocks is responsible for calculating the summary of the current blockchain. The summary depends on the content of the current blockchain and the public parameters.
Proof generation: This algorithm generates a proof for a given block. The proof may depend on, among other things, the entire blockchain.
Proof verification: Given the summary of a blockchain and a proof for a single block, this algorithm verifies whether the proof is valid or not.
With this protocol, a lightweight user keeps the updated summary of the blockchain. When the user wants to verify a specific block, it can ask the parties that are involved in a transaction for a proof for the block, which is generated by running the proof generation algorithm. The user then executes the proof verification algorithm to determine whether to accept the block or not. In what follows we describe the details of these algorithms.
Setup.
The creator of the blockchain selects two large prime numbers $p$ and $q$, and calculates $N = pq$ as in the RSA accumulator system. $N$ is embedded into the first block and disclosed to the public; $p$ and $q$ are then discarded. The creator also selects a random value $g$. Each block is labelled with an integer, with the “genesis” block (i.e., the first block on the blockchain) having the label “1”.
Block and summary construction.
Each block contains, in addition to the standard attributes (e.g., transaction information and proof-of-work nonce), a new attribute , which is the summary of the current blockchain. For the -th block, which is denoted by , the attribute is calculated and stored with as follows:
If the current blockchain contains blocks, is the summary of the current blockchain. The block position information is used in the computation for the purpose of preventing the attacker from manipulating the position of a block. After the newly generated block is broadcast to the blockchain system, the following two algorithms can be executed.
Proof generation.
To generate a proof that shows block is the -th block on the blockchain with summary , where , the prover calculates as follows:
Note that the proof is generated by a user who keeps the entire blockchain and who can therefore compute it without knowing $\phi(N)$, where $N$ is the RSA modulus and $\phi$ is the Euler function.
Proof verification.
Given a block blk, a claimed proof , and a blockchain summary , a user can verify that block is indeed the -th block on blockchain with summary , where , as follows:
If both equations hold, the user accepts that is a valid proof for blk; otherwise, the verifier rejects the block.
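To make the four algorithms concrete, the following is a minimal sketch of a textbook RSA-accumulator instance of the same idea. It is an assumption-laden illustration rather than the exact EPBC equations: the summary is taken to be the base raised to the product of per-block hash values, the proof for block i omits that block's hash value from the exponent, and the names `N`, `G`, `block_exponent`, `build_summary`, `make_proof`, and `verify` are all hypothetical.

```python
import hashlib
from math import prod

# Toy modulus built from two known Mersenne primes. Its factors are public,
# so it is NOT secure; a real deployment needs a large modulus whose
# factorization nobody knows.
N = (2**89 - 1) * (2**127 - 1)
G = 3  # hypothetical accumulator base chosen at setup


def block_exponent(block: bytes, position: int) -> int:
    """Hash the block together with its position into an integer exponent."""
    digest = hashlib.sha256(block + position.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big")


def build_summary(blocks: list[bytes]) -> int:
    """Fold every block into one constant-size value (one exponentiation per block)."""
    summary = G
    for position, block in enumerate(blocks, start=1):
        summary = pow(summary, block_exponent(block, position), N)
    return summary


def make_proof(blocks: list[bytes], i: int) -> int:
    """Prover side: accumulate every block except the i-th; needs the full chain."""
    exponent = prod(
        block_exponent(b, k) for k, b in enumerate(blocks, start=1) if k != i
    )
    return pow(G, exponent, N)


def verify(block: bytes, i: int, proof: int, summary: int) -> bool:
    """Verifier side: one hash evaluation and one modular exponentiation."""
    return pow(proof, block_exponent(block, i), N) == summary
```

In this sketch the verifier stores only `summary`, a single value modulo `N`, regardless of how many blocks the chain contains. Binding the position into the hash means a genuine block presented at the wrong position fails verification.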
III-C Parameter Initialization
One of the key steps in the blockchain verification protocol is the parameter initialization, i.e., selecting the primes $p$ and $q$ to generate the modulus $N$. If $p$ or $q$ is exposed, the protocol is clearly not secure. This issue can be addressed by generating $N$ using a multi-party protocol. There have been many protocols for this purpose. For example, the protocol proposed by Cocks works as follows. Suppose at the beginning there is a group of users who work together to generate the first block.
Each user , , selects his/her own prime numbers .
Each user , , calculates . By leveraging the protocol given in , user can calculate without knowing the two factors of .
Each user tests whether is a product of two prime numbers or not. Specifically, the system randomly selects a random number and each user calculates . If , passes the test. Carmichael numbers that can pass this test can be further eliminated by methods given in .
If the current candidate modulus passes all tests, the users work together to embed it in the genesis block. Otherwise, they repeat the process until an appropriate modulus is found.
Since the modulus only needs to be generated once, the cost of the parameter initialization is not a big concern.
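The flavor of the shared primality check can be sketched in a single process. This is only a stand-in under stated assumptions: each user is assumed to hold additive shares `(p_i, q_i)` of the two factors, and the check is assumed to be the Fermat-style identity g^(N+1) ≡ ∏ g^(p_i+q_i) (mod N), which holds whenever both sums are prime and g is coprime to N. The function name `shared_fermat_test` is hypothetical, and the modulus is computed directly here, whereas a real protocol computes it jointly without revealing the factors.

```python
from math import prod


def shared_fermat_test(shares: list[tuple[int, int]], g: int) -> bool:
    """Return True if the candidate modulus passes the check for base g.

    shares[i] = (p_i, q_i) is user i's private contribution; the candidate
    factors are p = sum(p_i) and q = sum(q_i).
    """
    p = sum(p_i for p_i, _ in shares)
    q = sum(q_i for _, q_i in shares)
    n = p * q  # stand-in for the joint computation of the modulus

    # Each user publishes only g^(p_i + q_i) mod n, never p_i or q_i.
    contributions = [pow(g, p_i + q_i, n) for p_i, q_i in shares]

    # If p and q are prime, g^(n - p - q + 1) = 1 (mod n),
    # hence g^(n + 1) = g^(p + q) (mod n).
    return pow(g, n + 1, n) == prod(contributions) % n
```

As with any Fermat-style test, a pass for one base is not a proof, so the check would be repeated with several random bases and combined with further filtering.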
III-D Security and Performance of the Block Verification Protocol
It is straightforward to verify that the protocol is correct, meaning that any legitimate proof will be accepted as valid. The following theorem shows that, for a given blockchain summary, no attacker can generate a valid proof for a forged block that is not contained in the corresponding blockchain, under the Strong RSA assumption.
Given the summary of a blockchain, there is no probabilistic polynomial-time attacker that can forge a block and an accompanying proof such that the forged block is accepted as a valid block on that blockchain in the random oracle model; otherwise, the Strong RSA assumption is broken.
Suppose behaves like a random oracle. Let where is the -th block on , and . We consider two scenarios of attacks:
The attacker knows the summary but not the blockchain. Suppose the attacker chooses and position for the block. Then, the attacker needs to compute such that
This immediately breaks the Strong RSA assumption.
The attacker knows both blockchain and the summary . In this case, the attacker knows all valid proofs for blocks in , i.e., . Suppose the attacker can generate a valid proof for a forged block for some position . Let . If , the attacker can successfully make a valid proof for at position because the attacker can compute . Because the attacker cannot control the output of hash(), the probability that the attacker can succeed is equivalent to the probability that a random number is a factor of another random number . According to the Erdös-Kac theorem and its extension counting multiplicities, the number of prime factors of counting multiplicity is . With the binomial theorem, the total number of divisors of is , and . Therefore, the probability that an attacker can find is negligible when is large enough. As long as the attacker cannot find such , a successful attack implies that the Strong RSA assumption is broken.
In summary, there is no practical attack against the protocol in the random oracle model unless the Strong RSA assumption is broken.
Performance of the major algorithms is analyzed as follows.
Block construction. Compared with the straightforward method by which each user keeps the entire blockchain, our method incurs some extra work in the block construction algorithm. The extra work consists of two parts: evaluating the hash value of the new block and calculating the new summary. The computation overhead is constant (i.e., one hash calculation and one modular exponentiation) and the storage overhead is also constant (i.e., one element modulo the RSA modulus for the summary). The summary also incurs extra communication cost, which is small (e.g., 2048 bits for a 2048-bit modulus).
Proof generation. The proof generation algorithm does not incur extra storage. The computational cost is proportional to the length of the current blockchain (i.e., the number of blocks in the chain) and the position of the block. Suppose the length of the blockchain is , and the proof of -th block needs to be generated, where . The prover needs to conduct one hash evaluation of the th block, and calculates the product of hash values of blocks . In summary, the prover calculates hashes, multiplications, and one modular exponentiation. Since the nodes with sufficient storage capacity (rather than the lightweight users) are supposed to generate proofs, the protocol is practical.
Proof verification. The computational cost to verify the proof of a block includes one hash evaluation and one modular exponentiation, which is constant. This explains why the protocol is suitable for lightweight users who only keep the summary of the blockchain.
III-E Reducing Cost of Proof Generation
Although both the cost of updating the summary of a blockchain and the cost of verifying a block are constant, the computational complexity for the prover to generate a proof is , where is the number of current blocks on the blockchain (i.e., keeps increasing). In the worst-case scenario, the prover needs to traverse all of the blocks on the blockchain to calculate the second part of the proof, namely
In order to reduce the computational complexity incurred by this, we design a scheme that improves the computational efficiency at the price of a slight increase in storage.
Proof generation with a smaller computational complexity.
The basic idea underlying the scheme is to let the prover maintain a binary tree. As illustrated in Fig. 3, the binary tree is used to store intermediate results that can be used to generate a proof for a given block. Specifically, each leaf stores the hash value of a corresponding block, and each internal node stores the product of the values of its two children. This way, the root node stores the product of the hash values of all of the blocks on the blockchain. The height of the tree is pre-determined. If a leaf is empty (i.e., currently there is no corresponding block on the blockchain), its value is set to 1 so that it does not contribute to the value stored at the root node.
Suppose the height of the tree is $h$ and the number of current blocks on the blockchain is $n$, where $n \le 2^h$. To calculate a proof for the $i$-th block, where $1 \le i \le n$, the prover leverages the information stored in the tree as follows:
Find the product of all of the values to the right of the leaf for that block (the blockchain grows from left to right).
Instead of performing the multiplications one by one, the prover uses the partial products stored in the tree to accelerate the computation.
Set the proof as .
Note that the height of the tree determines the number of blocks it can accommodate, and is therefore a pre-determined public parameter. If the height of the tree is $h$, the total number of blocks it can accommodate is $2^h$. This is not a significant constraint because a relatively small $h$ can accommodate a large number of blocks. For example, when $h = 32$, the structure can accommodate 4,294,967,296 blocks, which is about 9,000 times the number of blocks on the Bitcoin network as of April 2017.
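The tree described above can be sketched as an array-backed product tree. This is an illustrative sketch, not the prototype's data structure: the class name `ProductTree` and its methods are hypothetical, products are kept as exact integers, and a dense list is used, which is only reasonable for small heights.

```python
class ProductTree:
    """Complete binary tree whose internal nodes hold the product of their children."""

    def __init__(self, height: int) -> None:
        self.size = 1 << height             # number of leaves
        self.nodes = [1] * (2 * self.size)  # heap layout, root at index 1; empty leaves hold 1
        self.count = 0                      # number of blocks appended so far

    def append(self, hash_value: int) -> None:
        """Store the hash value of the next block and refresh its ancestors."""
        if self.count == self.size:
            raise OverflowError("tree is full")
        idx = self.size + self.count
        self.nodes[idx] = hash_value
        idx //= 2
        while idx >= 1:
            self.nodes[idx] = self.nodes[2 * idx] * self.nodes[2 * idx + 1]
            idx //= 2
        self.count += 1

    def product_right_of(self, i: int) -> int:
        """Product of the leaf values at 1-based positions i+1 .. count."""
        lo = self.size + i           # leaf index of position i+1
        hi = self.size + self.count  # exclusive upper bound
        result = 1
        while lo < hi:
            if lo & 1:               # lo is a right child: take it and step right
                result *= self.nodes[lo]
                lo += 1
            if hi & 1:               # hi is a right child: take its left neighbour
                hi -= 1
                result *= self.nodes[hi]
            lo //= 2
            hi //= 2
        return result
```

A query touches at most two nodes per level, so the number of big-integer multiplications is logarithmic in the number of leaves instead of linear in the chain length. For a height of 32, a sparse mapping that stores only non-empty nodes would be needed in place of the dense list.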
Analysis of the improved scheme.
The improved scheme involves a binary tree to store some information that can be used for generating proofs. Let , meaning that is the number of leaves. Let . At the leaf level (i.e., the first level), the size of each node is . Each node at -th level incurs bits of storage, and the size of the root node is bits. Therefore, the size of is
With intermediate results stored in , the computation complexity for generating a proof is reduced to (or ) modular exponentiations.
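As a hedged worked version of the storage estimate, assume the tree has $n = 2^h$ leaves, each leaf holds an $\ell$-bit hash value, and internal nodes store exact integer products, so that the product of two $m$-bit values needs at most $2m$ bits. The symbols $h$ and $\ell$ are introduced here for illustration only, and the result is an upper bound for a fully populated tree.

```latex
\begin{align*}
\text{bits per node at level } k &\le 2^{k-1}\,\ell, \qquad k = 1, \dots, h+1,\\
\text{number of nodes at level } k &= \frac{n}{2^{k-1}},\\
\text{total size} &\le \sum_{k=1}^{h+1} \frac{n}{2^{k-1}} \cdot 2^{k-1}\,\ell
  = (h+1)\, n\, \ell = n\,\ell\,(\log_2 n + 1) \text{ bits}.
\end{align*}
```

For instance, with $\ell = 256$ (SHA-256) and $h = 19$, i.e., $n = 524{,}288$ leaves, the bound is $20 \times 524{,}288 \times 256$ bits, which is 320 MiB. Empty leaves hold the value 1 and need not be stored, so a partially filled tree uses less.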
More generally, if each internal node in Fig. 3 has children, the height of is reduced to . A similar analysis shows that the total size of is , which is the size of storage a prover keeps locally. In order to calculate , which is defined in Equation (1), it requires about multiplication operations in the worst-case scenario, where is the number of multiplications incurred at an internal node at the second level of . In order to select the value of so as to minimize the overall computational complexity, we calculate the derivative as follows:
which monotonically increases with respect to . Therefore, we get the minimum value when
and . In practice, we can set the number of branches to a small constant integer so as to reduce the computational complexity of the prover.
IV Using the Block Verification Protocol to Construct EPBC
In this section, we discuss construction of high-layer operations based on the verification protocol described in Section III. Specifically, we focus on two basic protocols: blockchain identification and transaction verification.
When a lightweight user needs to join a blockchain based application, it needs to obtain the current summary of the blockchain. Protocol 1 is for this purpose.
Note that as long as the attacker does not control a majority of the users, the protocol is secure. The lightweight user can also adopt other strategies to determine the summary, e.g., giving different weights to selected users and including this information when making the decision.
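One way to picture the summary-selection step is a (weighted) majority vote over the values reported by the queried users. The sketch below is an assumption about how such a decision rule could look, not a transcription of Protocol 1; the function name `choose_summary` and its parameters are hypothetical.

```python
from typing import Optional, Sequence


def choose_summary(responses: Sequence[int],
                   weights: Optional[Sequence[float]] = None) -> int:
    """Return the summary backed by a strict (weighted) majority of responses.

    responses[i] is the summary reported by the i-th queried user and
    weights[i] is the trust weight given to that user (default: all equal).
    """
    if not responses:
        raise ValueError("no responses received")
    if weights is None:
        weights = [1] * len(responses)
    if len(weights) != len(responses):
        raise ValueError("one weight per response is required")

    tally: dict[int, float] = {}
    for summary, weight in zip(responses, weights):
        tally[summary] = tally.get(summary, 0) + weight

    best = max(tally, key=lambda s: tally[s])
    if 2 * tally[best] <= sum(weights):
        raise ValueError("no summary has a strict majority")
    return best
```

Requiring a strict majority, rather than a plurality, matches the assumption that the attacker controls fewer than half of the (weighted) users.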
A transaction is valid if and only if the block it belongs to is accepted by the majority of users, i.e., it is on the longest branch of the blockchain. Therefore, verification of a transaction is reduced to checking the validity of a block and its position (i.e., block number). A lightweight user can use the block verification protocol to verify that the block in question indeed contains the transaction in question. Then, the lightweight user can check the number of blocks that have been added after the block that is verified. Similar to the Bitcoin system, if more than 6 blocks have been added to the blockchain after the block under consideration, the transaction in question can be accepted with high confidence.
If the transaction is a smart contract submission or a one-time smart contract execution result submission, the above method is also sufficient. However, if the transaction is a payment operation or a submission of a multiple-time smart contract execution result, freshness becomes a concern. For example, the attacker can provide a proof of an old block that contains a previous payment of the same value. To prevent such attacks, the lightweight user can maintain a local counter and include the counter in its transactions.
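The two checks discussed above, confirmation depth and freshness, can be sketched together. This is a minimal illustration under assumptions: block validity is taken to have been established already by the block verification protocol, and the names `is_confirmed` and `PaymentTracker` are hypothetical.

```python
def is_confirmed(block_position: int, chain_length: int,
                 min_confirmations: int = 6) -> bool:
    """True if more than `min_confirmations` blocks follow the given block."""
    return chain_length - block_position > min_confirmations


class PaymentTracker:
    """Local counter that lets a lightweight user reject replayed proofs."""

    def __init__(self) -> None:
        self._next_counter = 0
        self._pending: set[int] = set()

    def new_counter(self) -> int:
        """Issue a fresh counter value to embed in the next transaction."""
        counter = self._next_counter
        self._next_counter += 1
        self._pending.add(counter)
        return counter

    def accept(self, tx_counter: int, block_position: int,
               chain_length: int) -> bool:
        """Accept a verified transaction once: unknown or reused counters are rejected."""
        if tx_counter not in self._pending:
            return False
        if not is_confirmed(block_position, chain_length):
            return False
        self._pending.remove(tx_counter)
        return True
```

A proof for an old block carrying an earlier payment of the same value fails the first check because its counter is no longer pending.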
V Integration with Existing Blockchain Systems
Because many public blockchain applications have already been developed, it is useful to enable EPBC for these systems without modifying existing data structures and clients. To achieve this goal, EPBC can work as a separate service layer on top of existing blockchain systems. Fig. 4 demonstrates the relationship between the existing blockchain system and the newly added EPBC service.
Specifically, a separate EPBC client with embedded parameters can be distributed to users who maintain the blockchain and play the role of a prover. Here the parameters are the values that are used for blockchain summary construction. Summaries of the blockchain are not involved in mining, and users can use their existing clients to produce new blocks and achieve consensus on the blockchain. After the user decides to accept a new block, the EPBC client produces a new summary based on the previous summary value and the new block, and stores the new summary locally. Note that summaries are determined by the blockchain itself, so the EPBC client does not need to run any consensus mechanism. If the user wants to reduce the time complexity of generating a proof, the EPBC client can maintain the tree structure described in Section III-E.
VI Experiments and Evaluation
In this section, we describe the implementation and provide preliminary experimental results of EPBC. We focus on the block verification protocol because it is the core of EPBC.
Implementation and parameters.
We implemented a prototype of the block verification protocol based on the MIRACL crypto library. Since the security of the protocol depends on the Strong RSA assumption, we chose a 1,024-bit modulus in the implementation. SHA256 was used as the hash function. We also set the height of the tree to 32. When a leaf is empty, its value is set to 1 and there is no need to store it.
We conducted the experiments on a desktop with a low-end Intel Celeron 1017U processor, whose Geekbench 4 score is similar to that of the Snapdragon 805 processor. The experimental results are summarized in Fig. 5, which shows that although the cost of proof generation depends on the size of the blockchain, the cost of proof verification is independent of the blockchain size.
As discussed in Section IV, some high-level operations such as balance checking require the lightweight client to verify more than one block. This is not a problem in practice for a user of the lightweight client because it only takes about 0.02 seconds to verify one block.
VII Related Works
EPBC only provides the mechanism for checking the validity of a given block and the transactions contained in the block. It does not consider how to determine which block(s) should be checked. BIP 37 proposes using a Bloom filter to select potentially related blocks for verification. The Bitcoin community proposes the UTXO (unspent transaction outputs) technology, which requires the user to store unspent transaction output information instead of transaction information. This reduces the storage cost but does not change the order of storage complexity.
Cryptographic accumulators were first developed by Benaloh and De Mare as a decentralized alternative to digital signatures. Barić and Pfitzmann developed a collision-free accumulator and used it for fail-stop signatures without using any tree structure. Cryptographic accumulators are useful for, e.g., constructing group signatures. Dynamic cryptographic accumulators further support adding and removing members. These schemes do not consider the features of blockchains, namely that every user has the privilege to construct blocks and generate proofs, and that lightweight users have very limited computational capability. Recently, e-cash systems such as Zerocoin have also utilized cryptographic accumulators, but for the different purpose of information hiding.
Another line of related research is storage verification in the cloud environment, where several related concepts have been proposed, e.g., provable data possession and proof of retrievability. These schemes cannot be applied in our scenario because the lightweight users do not know the blockchain in advance and the blockchain keeps growing as new blocks are created and appended to it.
Both EPBC and SPV assume the records that are embedded into blocks are correct if the corresponding blocks are valid. Some techniques that are applicable to SPV, such as Bloom filters, are also applicable to EPBC. Nevertheless, EPBC incurs only a constant amount of storage for the lightweight client, assuming the client cares about most recent transactions. This is significant because storing several block headers might be cheaper than storing the summary value.
VIII Conclusion
We have presented EPBC, a scheme for lightweight users to use blockchain-based applications without storing the entire blockchain while still being able to verify the validity of blocks and transactions. The basic idea is to “compress” a blockchain to a constant-size summary, which is the only data item a lightweight client needs to keep. We analyzed the security of EPBC, and preliminary experiments showed that it is practical. EPBC can be adopted for blockchain-based applications, such as e-cash and smart contract systems.
-  Satoshi Nakamoto. Bitcoin: A peer-to-peer electronic cash system, 2008.
-  Gavin Wood. Ethereum: A secure decentralised generalised transaction ledger. Ethereum Project Yellow Paper, 151, 2014.
-  Konstantinos Christidis and Michael Devetsikiotis. Blockchains and smart contracts for the internet of things. IEEE Access, 2016.
-  John Douceur. The sybil attack. Peer-to-peer Systems, pages 251–260, 2002.
-  Josh Benaloh and Michael De Mare. One-way accumulators: A decentralized alternative to digital signatures. In Workshop on the Theory and Application of Cryptographic Techniques, pages 274–285. Springer, 1993.
-  Marko Vukolić. The quest for scalable blockchain fabric: Proof-of-work vs. bft replication. In International Workshop on Open Problems in Network Security, pages 112–125. Springer, 2015.
-  Kyle Croman, Christian Decker, Ittay Eyal, Adem Efe Gencer, Ari Juels, Ahmed Kosba, Andrew Miller, Prateek Saxena, Elaine Shi, Emin Gün Sirer, et al. On scaling decentralized blockchains. In International Conference on Financial Cryptography and Data Security, pages 106–125. Springer, 2016.
-  Loi Luu, Viswesh Narayanan, Kunal Baweja, Chaodong Zheng, Seth Gilbert, and Prateek Saxena. Scp: a computationally-scalable byzantine consensus protocol for blockchains. Technical report, Cryptology ePrint Archive, Report 2015/1168, 2015.
-  Arthur Gervais, Ghassan O Karame, Karl Wüst, Vasileios Glykantzis, Hubert Ritzdorf, and Srdjan Capkun. On the security and performance of proof of work blockchains. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pages 3–16. ACM, 2016.
-  Clifford Cocks. Split knowledge generation of rsa parameters. In IMA International Conference on Cryptography and Coding, pages 89–95. Springer, 1997.
-  Dan Boneh and Matthew Franklin. Efficient generation of shared rsa keys. In Advances in Cryptology-CRYPTO’97: 17th Annual International Cryptology Conference, Santa Barbara, California, USA, August 1997. Proceedings, page 425. Springer, 1997.
-  Paul Erdös and Mark Kac. The gaussian law of errors in the theory of additive number theoretic functions. American Journal of Mathematics, 62(1):738–742, 1940.
-  Patrick Billingsley. On the central limit theorem for the prime divisor function. American Mathematical Monthly, pages 132–139, 1969.
-  Certivox Ltd. MIRACL cryptographic library.
-  Primate Labs. Geekbench 4.
-  Mike Hearn and Matt Corallo. BIP 37: Connection bloom filtering, 2012.
-  Bryan Bishop. Review of bitcoin scaling proposals. In Scaling Bitcoin Workshop Phase, volume 1, 2015.
-  Niko Barić and Birgit Pfitzmann. Collision-free accumulators and fail-stop signature schemes without trees. In International Conference on the Theory and Applications of Cryptographic Techniques, pages 480–494. Springer, 1997.
-  Gene Tsudik and Shouhuai Xu. Accumulating composites and improved group signing. In Asiacrypt, volume 2894, pages 269–286. Springer, 2003.
-  Michael T Goodrich, Roberto Tamassia, and Jasminka Hasić. An efficient dynamic and distributed cryptographic accumulator. In International Conference on Information Security, pages 372–388. Springer, 2002.
-  Ian Miers, Christina Garman, Matthew Green, and Aviel D Rubin. Zerocoin: Anonymous distributed e-cash from bitcoin. In Security and Privacy (SP), 2013 IEEE Symposium on, pages 397–411. IEEE, 2013.
-  Giuseppe Ateniese, Randal Burns, Reza Curtmola, Joseph Herring, Lea Kissner, Zachary Peterson, and Dawn Song. Provable data possession at untrusted stores. In Peng Ning, Sabrina De Capitani di Vimercati, and Paul F. Syverson, editors, Proceedings of the 2007 ACM Conference on Computer and Communications Security -CCS 2007, pages 598–609. ACM, 2007.
-  Qingji Zheng and Shouhuai Xu. Fair and dynamic proofs of retrievability. In Proceedings of the first ACM conference on Data and application security and privacy, pages 237–248. ACM, 2011.
-  Arthur Gervais, Srdjan Capkun, Ghassan O Karame, and Damian Gruber. On the privacy provisions of bloom filters in lightweight bitcoin clients. In Proceedings of the 30th Annual Computer Security Applications Conference, pages 326–335. ACM, 2014.