Blockchain Sharding is an approach that implements the idea of Sharding (Corbett et al., 2013) in blockchain to increase transaction throughput without raising the bandwidth and processing requirements of individual nodes. By allowing multiple committees (Shards) to run in parallel, the nodes inside every Shard only process the data in their Shard, which increases the overall system throughput substantially.
Because the essence of a blockchain lies in being decentralised and permissionless, so that as many devices as possible can participate in the system, Blockchain Sharding is a promising solution to the dilemma between increasing performance on one hand and increasing decentralisation on the other. Previous work on Sharding has explored various designs (Elastico (Luu et al., 2016), RSCoin (Danezis and Meiklejohn, 2015), OmniLedger (Kokoris-Kogias et al., 2018), RapidChain (Zamani et al., 2018)) that can withstand up to 1/4 or 1/3 of network nodes being malicious. These approaches either support only a small number of Shards in the system or, equivalently, require a large number of nodes in each Shard, both of which impact performance negatively.
In this paper, we propose a new Blockchain Sharding approach that can withstand up to $n/2$ of the nodes in the system being malicious. Compared to other methods, the probability that the malicious nodes will control a Shard is lower, and only a small number of nodes is required for every Shard to function securely. As a result, the communication costs inside every Shard are smaller and more Shards can exist in parallel, which improves the global transactions per second.
2. Blockchain Sharding Hypothesis
Suppose we are inside a forest recording the times when trees fall. It is not necessary for everyone to hear every falling tree to maintain the fairness of the system: the fact that a tree fell, and the time when it fell, is correct when it is recognised by most people around that tree, assuming these people have not colluded. With a sufficient number of people, if they are assigned randomly and in a fully distributed way to subareas of the forest, and are reassigned from time to time to avoid the accumulation of adversary power, collusion becomes hard to achieve (expected to occur only after many years). As long as the random and distributed assignment is secured, then, following the principle of proportionality, taking control of a subarea requires an effort similar to taking over the whole system when there is only one area.
In particular, this proposal is secure when: (1) only people assigned to a subarea of the forest may record the information about this subarea; (2) no person can control or predict which subarea they are about to be assigned to; (3) the assignment follows a globally recognised rule, not the arbitrary will of some specific group of privileged people; (4) people are periodically reassigned; and (5) the number of people inside every subarea (Shard) is large enough.
If the above criteria are fulfilled, and there is a sufficient number of honest people, one only needs to check the commonly recognised falling time of a tree of interest from the subarea where this tree stands; it is not necessary to hear the fall oneself. In this way, people do not need super hearing power when the forest is dense. Instead, they only need to focus on monitoring the subarea to which they are assigned.
2.1. Failure Probability
The probability of obtaining no less than $\lceil m/2 \rceil$ adversary nodes when randomly picking a Shard of size $m$ ($m$ is the number of nodes inside the Shard) can be calculated by the cumulative hypergeometric distribution function without replacement from a population of $n$ nodes. Let $X$ denote the random variable corresponding to the number of adversary nodes in the sampled group. The failure probability for one committee is at most

$$\Pr\left[X \ge \left\lceil \frac{m}{2} \right\rceil\right] \;=\; \sum_{x=\lceil m/2 \rceil}^{m} \frac{\binom{t}{x}\binom{n-t}{m-x}}{\binom{n}{m}},$$

which calculates the probability that no less than $\lceil m/2 \rceil$ nodes are adversary in a group of $m$ nodes, where $t$ is the number of nodes controlled by the adversary globally.
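This per-committee failure probability can be computed directly with a short script; a minimal sketch (the function name and interface are ours, not from the paper):

```python
from math import comb

def committee_failure_prob(n: int, t: int, m: int) -> float:
    """Probability that a committee of size m, sampled without replacement
    from n nodes of which t are adversarial, contains at least
    ceil(m/2) adversary nodes (math.comb returns 0 when k > n)."""
    threshold = (m + 1) // 2          # ceil(m/2)
    total = comb(n, m)
    return sum(comb(t, x) * comb(n - t, m - x)
               for x in range(threshold, m + 1)) / total
```

As expected, the failure probability grows sharply with the adversary fraction $t/n$ for a fixed committee size.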
The hypergeometric distribution depends directly on the total population size (i.e., $n$). Because $n$ can change over time in a permissionless (open-membership) network, the failure probability may be affected accordingly. To maintain the desired failure probability, each Shard in RapidChain runs a consensus at pre-determined intervals (e.g., once a week) to agree on a new committee size, based on which the committee will accept more nodes to join in future epochs.
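The recalibration step can be sketched as searching for the smallest committee size whose failure chance stays below a target; this is a hypothetical illustration of the idea, not RapidChain's actual protocol logic:

```python
from math import comb

def committee_failure(n: int, t: int, m: int) -> float:
    # Same cumulative hypergeometric formula as above.
    thr = (m + 1) // 2
    return sum(comb(t, x) * comb(n - t, m - x)
               for x in range(thr, m + 1)) / comb(n, m)

def recalibrated_committee_size(n: int, adversary_fraction: float,
                                target: float) -> int:
    """Smallest committee size m whose failure chance stays below
    `target` when t = adversary_fraction * n nodes are adversarial."""
    t = int(n * adversary_fraction)
    for m in range(1, n + 1):
        if committee_failure(n, t, m) <= target:
            return m
    return n
```

A stricter failure-chance target forces a larger committee, which is exactly why a lower tolerated adversary fraction permits smaller Shards.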
Figure 2 shows the maximum probability to fail for different adversary fractions $t/n$, where $s$ is the number of Shards.
As can be seen from the result, the system has a very high failure chance when the adversary takes $n/2$ of the nodes, even if there are only 7 Shards. That is the main reason why all the Blockchain Sharding approaches so far only withstand up to $1/3$ of the nodes being bad.
If every iteration lasts for a fixed block interval, then the time before the system is expected to fail at a given failure chance with $n/3$ of the nodes being bad is measured in years, and only a limited number of Shards can be maintained at that failure chance. The block interval (the length of every synchronisation iteration) cannot be shortened, because that also reduces the time to fail. In the Nakamoto blockchain, a block is published every 10 minutes with over 2,000 transactions inside, so over 12,000 transactions are embedded into the Nakamoto blockchain per hour. If we set the same block size for the Blockchain Sharding approaches, the longer block interval means they process only a few multiples of this throughput even with the maximum permitted number of Shards. Considering all the additional designs, the lowered Byzantine fault tolerance rate, the slowed block interval, and the expected failure within years, people may wonder whether a roughly tripled performance is worth all these costs.
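The back-of-the-envelope relations in this paragraph (expected time to failure for a per-iteration failure chance, and transactions per hour for a given block size and interval) can be sketched as follows; the parameter values in the test below are illustrative, not the paper's exact figures:

```python
def expected_years_to_failure(p_fail_per_iteration: float,
                              iteration_minutes: float) -> float:
    """Geometric waiting time: on average 1/p iterations until a failure."""
    iterations_per_year = 365 * 24 * 60 / iteration_minutes
    return (1.0 / p_fail_per_iteration) / iterations_per_year

def tx_per_hour(shards: int, tx_per_block: int,
                interval_minutes: float) -> float:
    """Every Shard publishes one block of tx_per_block per interval."""
    return shards * tx_per_block * (60 / interval_minutes)
```

Note how halving the block interval also halves the expected years to failure, which is why the interval cannot simply be shortened to regain throughput.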
3. The n/2 Byzantine node tolerated Blockchain Sharding approach
In this section, we present our approach in more detail.
3.1. Our Hypothesis
Instead of recording the times of tree falls inside a forest, imagine the nodes are jurors inside courtrooms. We rule that a sentence is made when more than a predefined number of people inside a jury of size $m$ reach a consensus. The people inside a jury are selected from $m$ different occupations. The court office randomly reorganises the membership of every jury and forms a new jury every time one new person arrives in each of these $m$ occupations ($m$ new people in total). For example, let us assume $m = 5$: a jury should have five people, namely a teacher, a social worker, a doctor, a police officer, and a businessperson. Then at least ten teachers, ten social workers, ten doctors, ten police officers, and ten businesspeople are needed for ten juries to run in parallel.
When adding new people to the court system, the court office gives preference to seniority (the people who reported to the office earlier but have not yet been added to the system). Assume $m = 5$ and six new people across five different occupations are waiting to be added to the system, where person $A$ and person $B$ are in the same occupation. The court office would add person $A$ to the system together with the other new people from the other occupations if person $A$ reported to the office earlier than $B$; person $B$ is put into pending status until four new people arrive in the other four occupations. As in human society, it takes effort (PoW) and time to become a professional, and changing occupation wastes this expenditure (the position in the pending queue is lost).
The court office periodically publishes the number of people in every occupation who have reported to the office and are waiting to be assigned to a jury. When a young man (a new node) decides his occupation, he checks the pending queue of every occupation and chooses an unpopular occupation to get into the system quicker. Because every new node lines up in the shortest queue, the number of people in every occupation automatically stays close (tending to be equal in the long run). Every person, regardless of whether they are inside a jury or in the waiting queue, must work (generate PoWs) in every fixed time window. Thus, as in the forest hypothesis, an adversary who holds half of the overall energy can only control half of the people in the system. With the precondition that it is very costly to change occupation, and that doing so loses one's advantage, our hypothesis is distinguished from the forest hypothesis: if the adversary controls two social workers, they cannot sit inside the same jury, whereas they could be inside the same subarea of the forest when recording tree-falling times.
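The claim that queue lengths tend to equalise can be checked with a small simulation; a sketch under the assumption that every new node joins a currently shortest pending queue (ties broken at random):

```python
import random

def simulate_queue_choice(occupations: int = 5,
                          arrivals: int = 10_000,
                          seed: int = 7) -> list[int]:
    """Each arriving node joins a shortest pending queue."""
    queues = [0] * occupations
    rng = random.Random(seed)
    for _ in range(arrivals):
        # pick among the currently shortest queues, breaking ties randomly
        shortest = min(range(occupations),
                       key=lambda i: (queues[i], rng.random()))
        queues[shortest] += 1
    return queues
```

Because each arrival always joins a minimum-length queue, the queue lengths never differ by more than one, so after any multiple of $m$ arrivals they are exactly equal.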
Table 1 shows the pending queue published by the court office at moment one. Table 2 shows the pending queues when the adding conditions are met, after moment one and before moment two. Table 3 shows the pending queue at moment two. Table 4 shows the people in the court system at moment one. Table 5 shows the people in the court system at moment two, where some new people have been added.
Table 1. The pending queue at moment one.

| | Teacher | Social worker | Doctor | Police officer | Businessperson |
|---|---|---|---|---|---|
| Number of pending persons | 4 | 2 | 1 | 1 | 1 |

Table 2. The pending queues when the adding conditions are met (after moment one, before moment two).

| | Teacher | Social worker | Doctor | Police officer | Businessperson |
|---|---|---|---|---|---|
| Add to court system → | B | M | N | O | P |
| Number of pending persons | 4 | 4 | 5 | 4 | 5 |

Table 3. The pending queue at moment two.

| | Teacher | Social worker | Doctor | Police officer | Businessperson |
|---|---|---|---|---|---|
| Number of pending persons | 0 | 0 | 1 | 0 | 1 |
As can be seen from the adding procedure, if the adversary is not in a very long queue, there is no gain for the adversary in changing occupation once it has reported to the court office. If it does so, it goes to the tail of another queue, leaving its original place to others.
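The adding procedure can be sketched as repeatedly forming one full batch (one person per occupation, in seniority order) while every pending queue is non-empty; the occupation names and person labels below follow the running example and are otherwise illustrative:

```python
from collections import deque

def add_to_court_system(pending: dict[str, deque]) -> list[dict[str, str]]:
    """Form full batches, one person per occupation in seniority order,
    while every pending queue is non-empty."""
    batches = []
    while all(pending[occ] for occ in pending):
        batches.append({occ: pending[occ].popleft() for occ in pending})
    return batches
```

Starting from pending queues of lengths 4, 4, 5, 4, 5 as in Table 2, four full batches are added and queues of lengths 0, 0, 1, 0, 1 remain, as in Table 3.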
3.2. Failure Probability
Table 6 shows a court schedule table for ten courts running in parallel with a jury size of five and ten people in each occupation. In Table 6, $A$ denotes an adversary person and $H$ an honest person.
For $s$ jury meetings to be held in parallel, there must be $s$ people in each occupation. Let the adversary have $A_i$ people in Occupation $i$; then the chance for the adversary to secure a manipulated sentence is

$$\Pr[X \ge T] \;=\; \sum_{k=T}^{m}\;\sum_{\substack{S \subseteq \{1,\dots,m\}\\ |S| = k}}\;\prod_{i \in S}\frac{A_i}{s}\prod_{j \notin S}\left(1 - \frac{A_j}{s}\right),$$

where $X$ is the number of adversary members in a jury and $T$ is the number of people the adversary must have in a jury to manipulate the sentence.
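Since the occupation-$i$ seat of a jury is adversarial independently with probability $A_i/s$, this tail probability can be evaluated with a standard Poisson-binomial dynamic program instead of the exponential sum over subsets; a sketch (function and variable names are ours):

```python
def manipulation_prob(adversary_per_occupation: list[float],
                      s: int, T: int) -> float:
    """Pr[X >= T], where X is the number of adversary members in one jury
    and the occupation-i seat is adversarial with probability A_i / s."""
    probs = [a / s for a in adversary_per_occupation]
    dist = [1.0]                       # dist[k] = Pr[X = k] so far
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            new[k] += q * (1 - p)      # this seat is honest
            new[k + 1] += q * p        # this seat is adversarial
        dist = new
    return sum(dist[T:])
```

The second assertion below also illustrates the maximisation argument that follows: for a fixed total $t = \sum_i A_i$, an equal split of the $A_i$ gives the adversary a higher chance than an unequal one.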
To derive the maximised $\Pr[X \ge T]$, we want $\prod_{i=1}^{m} A_i$ to be maximised, because $s$ is the same for every occupation. Let the adversary have $t$ people inside the system (the Court Jury Schedule); then $t = \sum_{i=1}^{m} A_i$. To maximise $\prod_{i=1}^{m} A_i$, we consider

$$A_1 = A_2 = \cdots = A_m = \frac{t}{m}.$$
This scenario is the maximum because, for any positive integers $x$ and $y$ with $y < x$,

$$x \cdot x > (x + y)(x - y) = x^{2} - y^{2},$$

so replacing an equal pair $(x, x)$ with an unequal pair $(x + y, x - y)$ of the same sum always decreases the product.
If $T = m$ (all the people in the jury must reach the same verdict when making a sentence), then

$$\Pr[X \ge m] = \prod_{i=1}^{m}\frac{A_i}{s} = \left(\frac{t}{m\,s}\right)^{m}.$$
Let $t = n/2$ (half of the overall population); then, because the overall population is $n = m \cdot s$,

$$\Pr[X \ge m] = \left(\frac{n/2}{m\,s}\right)^{m} = \left(\frac{1}{2}\right)^{m}.$$
Though the adversary cannot manipulate a sentence when it does not have $T$ people inside a Shard, it can halt a sentence from being reached when it has at least $m - T + 1$ of the nodes in a Shard. This sentence then cannot be made until the next court (when the group of juries is re-selected). Thus, to make the system function more smoothly, we want $T$ to be as small as possible while meeting the security threshold (e.g., a target failure chance). Figure 3 shows the maximum failure chance for different $m$, $T$, and $t$ (as a fraction of the overall population).
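Under the equal-allocation worst case with $t = n/2$, every jury seat is adversarial with probability $1/2$, so $X$ follows a Binomial($m$, $1/2$) distribution. Choosing the smallest safe $T$, and reading off the halting tolerance $m - T + 1$, can then be sketched as follows (names are ours):

```python
from math import comb

def binom_tail_half(m: int, T: int) -> float:
    """Pr[X >= T] for X ~ Binomial(m, 1/2): the t = n/2 worst case."""
    return sum(comb(m, k) for k in range(T, m + 1)) / 2 ** m

def smallest_safe_T(m: int, threshold: float) -> int:
    """Smallest consensus threshold T whose manipulation chance stays
    below `threshold`; T = m + 1 means no feasible T exists."""
    for T in range(m + 2):
        if binom_tail_half(m, T) <= threshold:
            return T
```

The smaller this $T$, the more adversary nodes ($m - T + 1$) are needed to halt a jury, so a jury size $m$ whose binomial tail decays quickly buys both security and liveness.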
As can be seen from the result, when there are ten Shards and $n/2$ of the people are evil, the failure chance is several orders of magnitude below that of RapidChain with ten Shards and only $n/3$ of the nodes being evil. Even with a 10-minute block interval (the same as the Nakamoto blockchain), it takes many years in expectation before the system fails, and while maintaining the same failure chance, far more Shards can run at the same time.
4. Potential usage
It has been a long-standing question how to open the membership of a distributed system while maintaining the performance of distributed jobs as well as the integrity and correctness of the job results (Ishibuchi et al., 1992; Friedman, 2003), and how to enable nodes from different backgrounds to participate and to tolerate them going offline without notice while stabilising the system as a whole (Pass and Shi, 2017). By increasing the Byzantine fault tolerance rate as well as the performance of Blockchain Sharding, the improved approach may solve these standing problems. For example, a data grid or distributed database can allow its users to be part of the processing system, and IoT devices can be governed in a decentralised way by a transparent rule of law (Psaras, 2018; Bogner et al., 2016; Mocnej et al., 2018), waiving concerns over privacy or even espionage for smart home assistants.
5. Conclusion

In this paper, we discussed a new Blockchain Sharding approach that maintains system integrity when up to an $n/2$ fraction of the nodes are adversarial. Compared to previous work, the required number of nodes per Shard is much lower, and more Shards are allowed to exist under the same security threshold.
- Bogner et al. (2016) A decentralised sharing app running a smart contract on the Ethereum blockchain. In Proceedings of the 6th International Conference on the Internet of Things, pp. 177–178.
- Corbett et al. (2013) Spanner: Google's globally distributed database. ACM Transactions on Computer Systems (TOCS) 31(3), pp. 8.
- Danezis and Meiklejohn (2015) Centrally banked cryptocurrencies. arXiv preprint arXiv:1505.06895.
- Friedman (2003) Fuzzy group membership. In Future Directions in Distributed Computing, pp. 114–118.
- Ishibuchi et al. (1992) Distributed representation of fuzzy rules and its application to pattern classification. Fuzzy Sets and Systems 52(1), pp. 21–32.
- Kokoris-Kogias et al. (2018) OmniLedger: a secure, scale-out, decentralized ledger via sharding. In 2018 IEEE Symposium on Security and Privacy (SP), pp. 583–598.
- Luu et al. (2016) A secure sharding protocol for open blockchains. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 17–30.
- Mocnej et al. (2018) Decentralised IoT architecture for efficient resources utilisation. IFAC-PapersOnLine 51(6), pp. 168–173.
- Pass and Shi (2017) Hybrid consensus: efficient consensus in the permissionless model. In 31st International Symposium on Distributed Computing (DISC 2017).
- Psaras (2018) Decentralised edge-computing and IoT through distributed trust. In Proceedings of the 16th Annual International Conference on Mobile Systems, Applications, and Services, pp. 505–507.
- Zamani et al. (2018) RapidChain: scaling blockchain via full sharding. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pp. 931–948.