Traditional currencies have a centralized structure with a bank as the central authority, and thus suffer from problems such as a single point of failure and corruption. For example, the 2008 global financial crisis was aggravated by the flawed policies of banks, which eventually led to many bank failures and increased distrust of these institutions. Against this background, Bitcoin, the first decentralized digital currency, has received considerable attention. Because it is a decentralized cryptocurrency, unlike traditional financial systems, no organization controls the system.
To operate without any central authority, Bitcoin uses blockchain technology. The blockchain is a public ledger that stores the transaction history, and nodes record this history by generating blocks through a consensus protocol, which gives all nodes a synchronized view. Bitcoin adopts a consensus protocol based on the PoW mechanism, in which nodes spend computational power to participate. In return, nodes receive coins as rewards, and the reward increases with the amount of computational power spent. This incentive system has attracted many participants. At the same time, however, computational power, which represents influence in PoW systems, has become heavily concentrated among a few participants (i.e., mining pools) who possess advanced technology and great wealth. As a result, Bitcoin has achieved poor decentralization, deviating from its original aim [2, 3, 4].
Since the success of Bitcoin, many cryptocurrencies (currently over 1,500) have been developed. They have attempted to address several drawbacks of Bitcoin, such as low transaction throughput, a significant waste of energy due to the vast computational power consumed, and poor decentralization. To this end, some cryptocurrencies use consensus mechanisms other than PoW, such as PoS and DPoS, in which nodes must hold a stake instead of a computing resource to participate in the system. While these mechanisms have addressed several of Bitcoin's drawbacks, the problem of poor decentralization remains unsolved. For example, as in PoW systems, stakes, which represent influence in PoS and DPoS systems, are heavily concentrated among a few participants. This has raised concerns about poor decentralization in PoS and DPoS coins, along with a heated debate over which of the two is more decentralized.
Currently, many coins suffer from two problems that degrade the level of decentralization: 1) an insufficient number of independent participants due to coalitions of participants (e.g., mining pools in PoW systems) and 2) a highly biased power distribution among them. Therefore, many developers have attempted to create a well-decentralized system [5, 6]. In addition, researchers such as Micali have noted that “incentives are the hardest thing to do” and believe that inappropriate incentive systems may cause blockchain systems to become significantly centralized. It therefore remains an open problem whether we can design an incentive system that allows a system to achieve good or full decentralization.
Full decentralization. In this paper, we study, for the first time, when full decentralization can be reached. To this end, we first define $(m, \varepsilon, \delta)$-decentralization as a state satisfying 1) the number of participants running nodes in a consensus protocol is not less than $m$ and 2) the ratio between the effective power of the richest and $\delta$-th percentile participants is not greater than $1+\varepsilon$, where the effective power of a participant represents the total resource power of the nodes run by that participant. The case where $m$ is sufficiently large and $\varepsilon$ and $\delta$ are 0 represents full decentralization, in which everyone has the same power. To study whether a high level of decentralization is possible, we model a blockchain system (Section III) and then find four sufficient conditions on the incentive system under which the blockchain system converges in probability to $(m, \varepsilon, \delta)$-decentralization. If an incentive system satisfies these four conditions, the blockchain system reaches $(m, \varepsilon, \delta)$-decentralization with probability 1, regardless of the underlying consensus protocol. The four conditions are: 1) nodes with any resource power earn rewards, 2) it is not more profitable for participants to delegate their resource power to fewer participants than to run their own nodes, 3) it is not more profitable for a participant to run multiple nodes than to run one node, and 4) the ratio between the resource power of the richest and $\delta$-th percentile nodes converges in probability to a value less than $1+\varepsilon$.
Impossibility. Based on these conditions, we find an incentive system that allows a system to reach full decentralization. In this incentive system, in order for the third condition to be met, the cost for one participant running multiple nodes should be greater than the total cost for multiple participants each running one node. The difference between the former cost and the latter cost is called a Sybil cost in this paper. This implies that a system where Sybil costs exist can be fully decentralized with probability 1.
When a system does not have Sybil costs, no incentive system satisfies the four conditions (Section V). More specifically, the probability of reaching $(m, \varepsilon, \delta)$-decentralization is upper bounded by a function that is close to 0 when the ratio between the resource power of the $\delta$-th percentile and the richest participants in the system is small. This implies that whether a system without Sybil costs achieves good decentralization depends entirely on the rich-poor gap in the real world: the larger the gap, the closer the probability is to zero. Wealth in the real world is severely unevenly distributed, and this inequality is a significant, well-known problem among economists [8, 9, 10]. To determine the approximate ratio in actual systems, we investigate hash rates in Bitcoin and observe that and are less than and , respectively. In this case, indicates the ratio between the resource power of the poorest and richest participants. This result suggests that it is currently almost impossible for blockchain systems without Sybil costs to achieve good decentralization.
Unfortunately, it is not yet known how permissionless blockchains, which have no real identity management, can impose Sybil costs. Indeed, to the best of our knowledge, no current permissionless blockchain has any Sybil cost. We therefore conclude that it is currently almost impossible for permissionless blockchains to reach good decentralization. Whether mechanisms exist to enforce a Sybil cost in permissionless blockchains is left as an open problem; its solution would be the key to determining how permissionless blockchains can reach a high level of decentralization.
Protocol analysis in top 100 coins. Next, to identify which conditions each system fails to satisfy, we extensively analyze the incentive systems of all PoW, PoS, and DPoS coins among the top 100 coins in CoinMarketCap according to the four conditions (Section VI). According to this analysis, PoW and PoS systems cannot have both enough participants running nodes and an even power distribution among them. Unlike PoW and PoS coins, however, DPoS coins can guarantee an even power distribution among a fixed number of participants when Sybil costs exist. If Sybil costs do not exist, rational participants would run multiple nodes for higher profits; in this case, DPoS systems cannot guarantee that participants possess equal power.
Data analysis in top 100 coins. Furthermore, to validate the results of the protocol analysis and our theory, we conduct a data analysis of the PoW, PoS, and DPoS coins in the top 100 using three metrics: the number of block generators, the Gini coefficient, and Shannon entropy (Section VII). In this empirical study, we observe the expected rational behaviors in most existing coins. We also quantitatively confirm that these coins do not currently achieve good decentralization. Interestingly, some DPoS coins are controlled by only two participants who run multiple nodes. This data analysis therefore not only measures the actual level of decentralization but also empirically confirms the analysis of incentive systems. Finally, we discuss a debate on incentive systems and whether the conditions for full decentralization can be relaxed (Section VIII).
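The three metrics named above can all be computed from a list of block producers. The following is a minimal sketch, assuming each block is attributed to exactly one generator identifier and that the Gini coefficient and Shannon entropy are taken over per-generator block counts; the function names are hypothetical, and the time windows and data sources used in Section VII are not reproduced here.

```python
import math
from collections import Counter


def gini(values):
    """Gini coefficient of non-negative values: 0 = perfectly even, near 1 = concentrated."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Closed form on ascending data: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def shannon_entropy(values):
    """Shannon entropy (in bits) of the share distribution implied by `values`."""
    total = sum(values)
    return -sum((v / total) * math.log2(v / total) for v in values if v > 0)


def block_generator_metrics(producers):
    """Return (number of block generators, Gini coefficient, Shannon entropy)."""
    counts = list(Counter(producers).values())
    return len(counts), gini(counts), shannon_entropy(counts)


# Hypothetical sample: four blocks, two of which were produced by "A".
print(block_generator_metrics(["A", "A", "B", "C"]))
```

A lower Gini coefficient and a higher entropy both indicate a more even distribution; with $k$ block generators, the entropy is at most $\log_2 k$ bits, reached when all generators produce the same number of blocks.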
In summary, our contributions are as follows.
We formally define $(m, \varepsilon, \delta)$-decentralization and find four sufficient conditions on an incentive system for reaching it.
We prove that it is almost impossible for a system without a Sybil cost to have a high level of decentralization.
We analyze the incentive systems of existing PoW, PoS, and DPoS coins in the top 100 according to the four sufficient conditions, and describe which conditions each system does not satisfy.
A data analysis of these coins validates our theory and shows quantitatively that current systems have poor decentralization.
Blockchain, a distributed ledger shared among disparate users, makes digital transactions possible without a central authority. This great promise has fueled a number of innovations such as cryptocurrencies, which allow users to exchange funds by issuing transactions, and smart contracts, which facilitate the execution of arbitrary code on top of the blockchain. Issued transactions and smart contracts are validated and recorded on the blockchain by nodes called block generators, which produce blocks in a manner determined by the respective consensus protocol.
Blockchains can be roughly classified as permissionless or permissioned. Permissioned blockchains typically rely on some central authority for identity management, whereas permissionless blockchains require a Sybil-resistance mechanism to make attacks expensive. For example, most popular blockchains rely on PoW, in which the number of Sybils that an adversary can spawn is limited by her computing resources. Therefore, to execute attacks in a PoW system, the adversary must spend vast computing resources. However, this PoW mechanism results in low transaction throughput and a significant waste of energy. Moreover, the cases of Bitcoin and Ethereum show that it is difficult for PoW systems to establish good decentralization, because computing resources are largely concentrated in a few participants who have enormous capital and advanced technology.
To solve these problems, another mechanism called PoS has been proposed, which limits the adversary’s power based on her stake rather than her computing resources. Therefore, the adversary would have to spend a large stake to succeed in attacks. This mechanism addresses the issues of low transaction throughput and wasted energy. However, PoS still undermines decentralization because power would be concentrated in a few participants with considerable wealth. This concern led to the advent of the DPoS mechanism, which deviates from existing permissionless blockchains. DPoS forgoes the goal of full decentralization and is instead designed so that nodes with large wealth have the same power. It allows users to delegate their stakes to a small set of nodes called delegates, which determine the order of transactions and generate blocks. Unlike PoW and PoS, where anyone can generate blocks, DPoS gives this opportunity only to the delegates.
Decentralization is an essential factor that should be inherently considered in the design of blockchain systems. Even though systems are designed for good decentralization, in practice we often observe that blockchain systems are highly centralized. Bitcoin and Ethereum, as representative examples, are well known to be highly centralized in terms of network and mining [14, 4, 15, 16]. Currently, most of the computational power (or mining power) in these systems is concentrated in only a few nodes, called mining pools, where individual miners gather together for mining. (More specifically, this refers to centralized mining pools; although decentralized mining pools also exist, centralized pools are the major ones, so hereafter we simply call them mining pools.) This raises concerns not only about the level of decentralization but also about security, because the mining power distribution is critical to the security of PoW systems. Note that an individual or organization with over 50% of the entire computational power can attempt double-spending attacks in PoW systems. Moreover, in selfish mining, an attacker with over 33% of the power can earn unfairly high profits at the expense of others.
In general, when a participant has large resource power, his behavior would significantly influence others in the consensus protocol. In other words, the more resources a participant has, the greater his influence on the system. Therefore, the resource power distribution implicitly represents the level of decentralization in the system.
At this point, we can ask the following questions: “What can influential participants do in practice?” and “Can these behaviors harm other nodes?” First, as described above, there are attacks such as double spending and selfish mining, which can be executed by an attacker with over 50% or 33% of the resource power, respectively. These attacks can result in significant financial damage. In addition, in a consensus protocol combined with Practical Byzantine Fault Tolerance (PBFT), malicious nodes that together possess over 33% of the resource power can cause the consensus protocol to stall. When resource power is more evenly distributed, such attacks would certainly be more difficult to execute, because an attacker would have to collude with more participants. Furthermore, nodes participating in the consensus protocol verify transactions and generate blocks, and in the process of generating a block, they choose which transactions to include. Therefore, they can choose only advantageous transactions while ignoring disadvantageous ones. For example, participants can exclude transactions issued by rivals when generating blocks, and if they possess large power, validation of these transactions would often be delayed because the malicious participants have many opportunities to choose which transactions are validated. Even though the rivals can retaliate, the damage from retaliation depends on the power gap between the malicious participants and their rivals.
Furthermore, transaction issuers must pay transaction fees, including gas, in blockchain systems, where gas refers to the cost associated with executing smart contracts. The fees are usually determined by economic interactions, which implies that they can depend on the behavior of block generators. For example, if block generators verify only transactions with fees above a specific amount, overall transaction fees can increase because users would have to pay high fees for their transactions to be validated. In other words, some block generators can attempt to raise transaction fees for higher profits, and the larger their resource power, the higher the fees may become. A similar situation is familiar from oligopolies in the real world: companies in oligopolistic industries can control product prices, and they often raise them.
Meanwhile, in fully decentralized systems, the aforementioned problems are much less likely to occur, and the system would be fair to everyone. This spurs the desire to achieve a fully decentralized system. Even though many discussions and attempts have been made to achieve good decentralization, existing systems other than Bitcoin, Ethereum, and Stellar have rarely been analyzed [4, 21]. In this paper, we not only study, for the first time, the possibility of full decentralization, but also extensively investigate existing coins.
III System Model
In this section, we model a consensus protocol and an incentive system. Moreover, we introduce the notation used throughout this paper (see Table I).
Consensus protocol. A blockchain system has a consensus protocol in which players $p_i$ participate and generate blocks by running their own nodes. The set of all nodes in the consensus protocol is denoted by $N$, and the set of nodes run by player $p_i$ is denoted by $N_{p_i}$. Moreover, we define $P$ as the set of all players running nodes in the consensus protocol (i.e., $P = \{p_i \mid N_{p_i} \neq \emptyset\}$). Therefore, $|N|$ is not less than $|P|$. In particular, if a player has multiple nodes, $|N|$ is greater than $|P|$.
For nodes to join the consensus protocol, they must possess specific resources, and their influence depends significantly on their resource power. The resource is computational power in consensus protocols with the PoW mechanism and stake in those with the PoS mechanism. Node $n_i$ possesses resource power $r_{n_i}$. Moreover, we define the vector of resource power for all nodes as follows:
$$\mathbf{R} = (r_{n_1}, r_{n_2}, \ldots, r_{n_{|N|}}).$$
We also denote the resource power owned by player $p_i$ as $r_{p_i}$ and the set of players with positive resource power as $P'$ (i.e., $P' = \{p_i \mid r_{p_i} > 0\}$). Note that the two sets $P$ and $P'$ can differ, because a player that delegates its own power to others does not run nodes but still possesses resource power (i.e., $p_i \in P'$ does not imply $p_i \in P$). For clarity, consider a mining pool as an example. A pool has an operator and workers, where the workers own their resource power but delegate it to the operator without running a full node. Therefore, pool workers belong to $P'$ but not to $P$, while the operator belongs to both $P$ and $P'$.
In fact, the influence of player $p_i$ on the consensus protocol depends on the total resource power of the nodes run by the player rather than just its own resource power $r_{p_i}$. Therefore, we define $\bar{r}_{p_i}$, the effective power of player $p_i$, as
$$\bar{r}_{p_i} = \sum_{n_j \in N_{p_i}} r_{n_j}.$$
Again, in the mining pool example, the operator's effective power is the sum of the resource power of all pool workers, while the workers have zero effective power. The maximum and $\delta$-th percentile of $\{\bar{r}_{p_i}\}_{p_i \in P}$ are denoted by $\bar{r}_{\max}$ and $\bar{r}_{\delta}$, respectively, and $\mathbf{R}_{p_i}$ represents the vector of resource power of the nodes owned by player $p_i$ (i.e., $\mathbf{R}_{p_i} = (r_{n_j})_{n_j \in N_{p_i}}$). Note that $\bar{r}_{p_i}$ equals the sum of the entries of $\mathbf{R}_{p_i}$. In addition, we take the average time to generate one block as the time unit of the system, and we use the superscript $t$ to denote time $t$. For example, $r^t_{n_i}$ and $\mathbf{R}^t$ represent the resource power of node $n_i$ at time $t$ and the vector of resource power possessed by nodes at time $t$, respectively.
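To make the distinction between $P$, $P'$, and effective power concrete, the following minimal sketch recomputes the mining pool example. It assumes that each player either runs exactly one node itself or delegates all of its power to exactly one operator; the dictionary layout, the player names, and the function name are illustrative, not part of the model.

```python
from collections import defaultdict


def effective_power(resource, delegate_to):
    """Compute the effective power bar r_p of every player that runs a node.

    resource:    player -> resource power r_p that the player owns
    delegate_to: player -> player whose node uses its power (absent = runs its own node)
    Assumes each node-running player operates exactly one node.
    """
    eff = defaultdict(float)
    for player, power in resource.items():
        runner = delegate_to.get(player, player)
        eff[runner] += power
    return dict(eff)


# Mining pool example: three workers delegate to an operator; one solo miner.
resource = {"w1": 10.0, "w2": 10.0, "w3": 10.0, "operator": 2.0, "solo": 5.0}
delegate_to = {"w1": "operator", "w2": "operator", "w3": "operator"}

eff = effective_power(resource, delegate_to)
P = set(eff)                                         # players running nodes
P_prime = {p for p, r in resource.items() if r > 0}  # players with positive resource power
print(eff)  # {'operator': 32.0, 'solo': 5.0}
```

Here the workers belong to $P'$ but not to $P$, and their effective power is zero, while the operator's effective power aggregates the delegated resource power.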
Incentive system. To incentivize players to participate in the consensus protocol, the blockchain system needs an incentive system, which assigns rewards to nodes depending on their resource power. Here, we define the utility function $U_{n_i}(r_{n_i}, \mathbf{R}_{-n_i})$ of node $n_i$ as its expected net profit per time unit, where $\mathbf{R}_{-n_i}$ represents the vector of the other nodes' resource power and the net profit is the earned revenue minus all costs. Specifically, the utility function of node $n_i$ can be expressed as
$$U_{n_i}(r_{n_i}, \mathbf{R}_{-n_i}) = \mathbb{E}\left[X_{n_i}(r_{n_i}, \mathbf{R}_{-n_i})\right],$$
where $X_{n_i}(r_{n_i}, \mathbf{R}_{-n_i})$ is a random variable for a given $r_{n_i}$ and $\mathbf{R}_{-n_i}$. This equation indicates that $U_{n_i}$ is the arithmetic mean of $X_{n_i}$ for a given $r_{n_i}$ and $\mathbf{R}_{-n_i}$. In other words, while the function $U_{n_i}$ gives the expected net profit that node $n_i$ earns per time unit, the random variable $X_{n_i}$ represents all possible values of the net profit that node $n_i$ can obtain in a time unit. For clarity, consider the Bitcoin system, in which $X_{n_i}$ and $U_{n_i}$ are defined as
$$X_{n_i} = \begin{cases} 12.5 - c_{n_i} & \text{with probability } \dfrac{r_{n_i}}{\sum_{n_j \in N} r_{n_j}}, \\ -c_{n_i} & \text{otherwise,} \end{cases} \qquad U_{n_i} = \frac{12.5\, r_{n_i}}{\sum_{n_j \in N} r_{n_j}} - c_{n_i},$$
where $c_{n_i}$ represents all costs associated with running node $n_i$ during the time unit. This is because, at the time of writing, a node earned 12.5 BTC as the block reward, and the probability of generating a block is proportional to the node's computing resource. Moreover, $X_{n_i}$ cannot be greater than a constant $X_{\max}$ determined by the system; in other words, the system can give nodes only a limited reward at any given time. Indeed, the reward that a node can receive in a time unit cannot be infinite, and problems such as inflation would occur if the reward were very large.
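The Bitcoin example can be checked numerically: the sketch below computes $U_{n_i}$ in closed form and compares it with a Monte Carlo average of samples of $X_{n_i}$, illustrating that the utility is the mean of the net-profit random variable. The 12.5 BTC block reward is taken from the text as a default parameter; the cost value, the resource vector, and the seed are made-up illustrative values.

```python
import random


def bitcoin_utility(i, powers, cost, block_reward=12.5):
    """Closed-form U_{n_i} = B * r_i / sum(r) - c_i for proportional block generation."""
    return block_reward * powers[i] / sum(powers) - cost


def sample_net_profit(i, powers, cost, rng, block_reward=12.5):
    """One draw of X_{n_i}: B - c_i with probability r_i / sum(r), otherwise -c_i."""
    won = rng.random() < powers[i] / sum(powers)
    return (block_reward if won else 0.0) - cost


powers = [25.0, 50.0, 25.0]   # hypothetical resource vector R
rng = random.Random(0)        # fixed seed for reproducibility
samples = [sample_net_profit(0, powers, 1.0, rng) for _ in range(200_000)]
print(bitcoin_utility(0, powers, 1.0), sum(samples) / len(samples))
```

With these numbers the closed form gives $12.5 \cdot 0.25 - 1 = 2.125$, and the sample mean lands close to it.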
In addition, if nodes receive more rewards when they have larger resource power, players would increase their resources by spending part of the earned profit. For simplicity, we assume that all players increase their resource power at a rate $\alpha$ per unit of earned net profit at every time step. For example, if node $n_i$ earns a net profit $u$ at time $t$, its resource power increases by $\alpha u$ after time $t$.
We also define the Sybil cost function $\mathcal{S}_{p_i}(N_{p_i})$ as the additional cost that player $p_i$ must pay per time unit to run multiple nodes, compared to the total cost when those nodes are run by different players. The cost is 0 if $|N_{p_i}| = 1$ (i.e., the player runs one node). Moreover, $\mathcal{S}_{p_i}(S) > 0$ for every set $S$ with $|S| > 1$ means that the cost for one player to run $|S|$ nodes is always greater than the total cost for $|S|$ players each running one node. Note that this definition does not merely imply that it is expensive to run many nodes, which is what is usually called a Sybil cost in consensus protocols. Rather, it implies that the total cost of running multiple nodes depends on whether a single player runs all of them.
Finally, we assume that all players are rational. Thus, they act in the system for higher utility. More specifically, if there is a coalition of players in which the members can earn a higher profit, they delegate their power to form such a coalition (formally, it is referred to as a cooperative game). In addition, if it is more profitable for a player to run multiple nodes as opposed to one node, the player would run multiple nodes.
|Notation|Description|
|---|---|
|$p_i$|Player of index $i$|
|$P$|The set of players running nodes in the consensus protocol|
|$n_i$|Node of index $i$|
|$N$|The set of nodes in the consensus protocol|
|$N_{p_i}$|The set of nodes owned by $p_i$|
|$r_{n_i}$, $r_{p_i}$|The resource power of node $n_i$ and player $p_i$|
|$\mathbf{R}$|The vector of resource power for all nodes|
|$P'$|The set of players with positive resource power|
|$\bar{r}_{p_i}$|The effective power of nodes run by $p_i$|
|$\bar{r}_{\max}$, $\bar{r}_{\delta}$|The maximum and $\delta$-th percentile of effective power of players running nodes|
|$\mathbf{R}_{p_i}$|The vector of resource power of nodes run by $p_i$|
|$r^t_{n_i}$|The resource power of $n_i$ at time $t$|
|$\mathbf{R}^t$|The vector of resource power at time $t$|
|$\mathbf{R}_{-n_i}$|The vector of resource power of nodes other than $n_i$|
|$U_{n_i}$|Utility function of $n_i$|
|$X_{n_i}$|Random variable for the net reward of $n_i$ per time unit|
|$X_{\max}$|The maximum value of random variable $X_{n_i}$|
|$\alpha$|Rate at which resource power increases per unit of net profit|
|$\mathcal{S}_{p_i}$|Sybil cost function of $p_i$|
IV Conditions for Full Decentralization
In this section, we study when a high level of decentralization can be achieved. To this end, we first formally define $(m, \varepsilon, \delta)$-decentralization and introduce sufficient conditions on the incentive system of a blockchain system to reach $(m, \varepsilon, \delta)$-decentralization. Then, based on these conditions, we find an incentive system that allows the system to be fully decentralized.
IV-A Full Decentralization
The level of decentralization largely depends on two elements: the number of players running nodes in a consensus protocol and the distribution of effective power among these players. In this paper, full decentralization refers to the case where a system satisfies 1) the number of players running nodes is as large as possible and 2) the effective power is evenly distributed among these players. Therefore, a system that does not satisfy one of these requirements cannot be fully decentralized. For example, if only two players run nodes with the same resource power, only the second requirement is satisfied. As another example, a system may have many nodes run by independent players while the resource power is concentrated in a few nodes; then, only the first requirement is satisfied. Clearly, both cases have poor decentralization. Note that, as described in Section II, blockchain systems based on a peer-to-peer network can be manipulated by a subset of players who possess more than 50% or 33% of the effective power. Next, to capture the level of decentralization, we formally define $(m, \varepsilon, \delta)$-decentralization as follows.
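Definition IV.1 ($(m, \varepsilon, \delta)$-Decentralization).
For $m \in \mathbb{N}$, $\varepsilon \ge 0$, and $\delta \in [0, 100]$, a system is $(m, \varepsilon, \delta)$-decentralized if it satisfies the following:
The size of $P$ is not less than $m$ (i.e., $|P| \ge m$).
The ratio between the effective power of the richest player, $\bar{r}_{\max}$, and that of the $\delta$-th percentile player, $\bar{r}_{\delta}$, is less than or equal to $1+\varepsilon$ (i.e., $\bar{r}_{\max}/\bar{r}_{\delta} \le 1+\varepsilon$).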
In Def. IV.1, the first requirement indicates not only that there are at least $m$ players who possess resources, but also that at least $m$ players run their own nodes. In other words, players with resources do not combine into a few nodes (i.e., not many players delegate their resources to others). Note that delegation decreases the number of players running nodes in the consensus protocol. The second requirement represents an even distribution of effective power among players running nodes: the gap between the effective power of the richest and $\delta$-th percentile players running nodes is small. According to Def. IV.1, the greater $m$ and the smaller $\varepsilon$ and $\delta$, the higher the level of decentralization. Therefore, $(m, 0, 0)$-decentralization for a sufficiently large $m$ indicates full decentralization, in which there are sufficiently many independent players and everyone has the same power.
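Def. IV.1 can be checked mechanically from a snapshot of the effective powers of the players in $P$. The sketch below assumes a nearest-rank percentile, so that $\delta = 0$ selects the poorest player; the definition does not fix a percentile convention, and the function names are hypothetical.

```python
import math


def percentile_nearest_rank(sorted_vals, delta):
    """delta-th percentile (0 <= delta <= 100) of ascending values, nearest-rank method."""
    if delta <= 0:
        return sorted_vals[0]
    rank = math.ceil(delta / 100 * len(sorted_vals))
    return sorted_vals[rank - 1]


def is_decentralized(effective_powers, m, eps, delta):
    """Check (m, eps, delta)-decentralization.

    effective_powers: effective power of each player in P (assumed strictly positive).
    """
    powers = sorted(effective_powers)
    if len(powers) < m:                      # requirement 1: |P| >= m
        return False
    r_max = powers[-1]
    r_delta = percentile_nearest_rank(powers, delta)
    return r_max / r_delta <= 1 + eps        # requirement 2: ratio <= 1 + eps


print(is_decentralized([10, 10, 1], m=3, eps=0.5, delta=0))   # the poorest player fails the ratio test
print(is_decentralized([10, 10, 1], m=3, eps=0.5, delta=50))  # the median player passes it
```

With $\varepsilon = \delta = 0$, the check passes only if every player running a node has the same effective power, which matches the notion of full decentralization above.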
IV-B Sufficient Conditions for Fully Decentralized Systems
Next, we introduce four sufficient conditions on an incentive system to reach $(m, \varepsilon, \delta)$-decentralization with probability 1. We first revisit the two requirements of $(m, \varepsilon, \delta)$-decentralization. For the first requirement in Def. IV.1, the size of $P'$ should be greater than or equal to $m$ because the size of $P$ is never greater than that of $P'$. This can be achieved by giving at least $m$ nodes some reward, which is represented in Condition 1 (GR-$m$). In addition, it should not be more profitable for many players to combine into a few nodes than to run nodes directly. If such delegation were more profitable, many players with resource power would delegate their power to a few players, resulting in $|P| < m$. Condition 2 (ND-$m$) states that it should not be more profitable for nodes run by independent (i.e., different) players to combine into fewer nodes when the number of all players running nodes is not greater than $m$.
Condition 1 (Give Rewards (GR-$m$)).
At least $m$ nodes should earn net profit, i.e., have positive utility $U_{n_i}$.
This condition states that some players can earn rewards by running a node, which makes the number of existing nodes equal to or greater than $m$. If the system does not give net profit, rational players would not run a node, because, unlike other peer-to-peer systems such as Tor, the system requires a player to possess a specific resource (i.e., $r_{n_i} > 0$) to run a node. Specifically, players would invest their resources elsewhere for higher profits instead of participating in a consensus protocol where they cannot earn net profit; the forgone return is called an opportunity cost. As a result, to reach $(m, \varepsilon, \delta)$-decentralization, a system must also give net profit to some nodes.
Condition 2 (Non-delegation (ND-$m$)).
Nodes run by different players do not combine into fewer nodes unless the number of all players running nodes is greater than Before defining it formally, we denote by a set of nodes run by different players. In other words, for any , the two players running and are different. We also let denote a proper subset of such that , where
Then, for any node set
The set presents all players running nodes, which do not belong to . In Eq. (1), the left-hand side represents the total utility of nodes in that are individually run by different players. Here, note that given that includes the resource power of the nodes in except for node The right-hand side represents the maximum total utility of nodes in when the nodes in are combined into fewer nodes belonging to by delegation of resource power of players. Note that because Therefore, Eq. (1) indicates that the utility in the case where multiple players delegate their power to fewer players is not greater than that for the case where the players directly run nodes. As a result, ND- prevents delegation that makes the number of players running nodes less than and the first requirement of -decentralization can be met when GR- and ND- holds.
Next, we consider the second requirement in Def. IV.1. One way to achieve an even distribution of effective power among players is to have an even distribution of resource power among nodes while each player runs only one node. Note that when each player has only one node, an even distribution of effective power is equivalent to an even distribution of resource power among nodes. Condition 3 (NS-$\delta$) states that, for any player whose effective power is at or above the $\delta$-th percentile, running multiple nodes is not more profitable than running one node. In addition, to reach a state where the richest and $\delta$-th percentile nodes possess similar resource power, the ratio between the resource power of these two nodes should converge in probability to a value less than $1+\varepsilon$. This is presented in Condition 4 (ED-$\varepsilon$).
Condition 3 (No Sybil nodes (NS-$\delta$)).
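For any player with effective power not less than $\bar{r}_{\delta}$, participating with multiple nodes is not more profitable than participating with one node. Formally, for any player $p_i$ with effective power $\bar{r}_{p_i} \ge \bar{r}_{\delta}$,
$$\max_{\mathbf{R}_{p_i}} \Big( \sum_{n_j \in N_{p_i}} U_{n_j}(r_{n_j}, \mathbf{R}_{-n_j}) - \mathcal{S}_{p_i}(N_{p_i}) \Big) \le U_{n}(\bar{r}_{p_i}, \mathbf{R}_{-n}), \tag{2}$$
where node $n$ is the single node that $p_i$ would otherwise run, the set $N_{p_i}$ satisfies $|N_{p_i}| > 1$, $\sum_{n_j \in N_{p_i}} r_{n_j} = \bar{r}_{p_i}$, and the resource power of the nodes not run by $p_i$ is the same on both sides.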
In Eq. (2), the left-hand side represents the maximum utility when a player runs multiple nodes whose total resource power is $\bar{r}_{p_i}$, and the right-hand side represents the utility when he runs only one node with resource power $\bar{r}_{p_i}$. Therefore, Eq. (2) indicates that a player whose effective power is equal to or greater than the $\delta$-th percentile earns the maximum utility by running one node.
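Condition 4 (Even Distribution (ED-$\varepsilon$)).
The ratio between the resource power of the richest and $\delta$-th percentile nodes should converge in probability to a value less than $1+\varepsilon$. Formally, when $r^t_{\max}$ and $r^t_{\delta}$ represent the maximum and $\delta$-th percentile of $\{r^t_{n_i}\}_{n_i \in N}$, respectively,
$$\lim_{t \to \infty} \Pr\left[\frac{r^t_{\max}}{r^t_{\delta}} < 1+\varepsilon\right] = 1.$$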
This condition indicates that the ratio between the resource power of the richest and $\delta$-th percentile nodes falls below $1+\varepsilon$ with probability 1 when enough time is given. Note that $\mathbf{R}^t$ changes over time, depending on the behavior of each player. In particular, if it is profitable for a player to increase its effective power, $\mathbf{R}^t$ is a random variable that depends on the net profits $X^{t-1}_{n_i}$ earned earlier, because a player reinvests part of its net profit to increase its resources. More specifically, in that case, $r^t_{n_i}$ increases to $r^t_{n_i} + \alpha X^t_{n_i}$ after time $t$, as described in Section III.
As a result, these four conditions allow blockchain systems to reach $(m, \varepsilon, \delta)$-decentralization with probability 1, as stated in the following theorem.
Theorem IV.1. For any initial state, a system satisfying GR-$m$, ND-$m$, NS-$\delta$, and ED-$\varepsilon$ converges in probability to $(m, \varepsilon, \delta)$-decentralization.
IV-C Possibility of Full Decentralization in Blockchain
To determine whether blockchain systems can reach full decentralization, we study whether an incentive system exists that satisfies the four conditions for a sufficiently large $m$, $\delta = 0$, and an arbitrarily small $\varepsilon > 0$. In this section, we provide an example of an incentive system that satisfies the four conditions and thus achieves full decentralization.
It is also important for security to increase the total resource power involved in the consensus protocol, because if it is small, an attacker can easily subvert the system. To prevent this, we construct the block reward as an increasing function of the total resource power $\sum_{n_j \in N} r_{n_j}$, so that players continually increase their resource power. In addition, we construct the random variable $X_{n_i}$ and its probability distribution as follows:
where the superscript $t$ representing time is omitted for convenience. In this incentive system, a node that generates a block earns the block reward, and the probability of generating a block is proportional to the square root of the node's resource power. Under this setting, one can easily check that the utility function is the mean of $X_{n_i}$.
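The following sketch illustrates why the square-root rule discourages delegation but still needs a Sybil cost to discourage splitting. It assumes a fixed block reward $B$ per time unit (our simplification; the text lets the reward grow with total resource power) and no running costs, so a node's expected reward is $B$ times its share of $\sum_j \sqrt{r_{n_j}}$; the numbers are illustrative.

```python
import math


def sqrt_shares(powers):
    """Block-generation probability of each node when it is proportional to sqrt(r)."""
    weights = [math.sqrt(r) for r in powers]
    total = sum(weights)
    return [w / total for w in weights]


# Delegation: two players with 4 units each, competing with a node of 16 units.
separate = sum(sqrt_shares([4.0, 4.0, 16.0])[:2])  # each runs its own node
combined = sqrt_shares([8.0, 16.0])[0]             # one delegates to the other

# Sybil splitting: a player with 16 units against another node of 16 units.
single = sqrt_shares([16.0, 16.0])[0]                      # runs one node
split = sum(sqrt_shares([4.0, 4.0, 4.0, 4.0, 16.0])[:4])   # same 16 units in four nodes

print(separate, combined)  # combining lowers the pair's share
print(single, split)       # splitting raises the player's share
```

Under these assumptions, the player gains $B(2/3 - 1/2) = B/6$ per time unit by splitting into four nodes, so a Sybil cost of at least that amount for this split removes the incentive, which is the role the Sybil cost plays in NS-0.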
Next, we show that this incentive system satisfies the four conditions. First, the utility satisfies GR-$m$ for any $m$ because it is always positive. ND-$m$ is also satisfied because the following holds:
Third, to make NS-$m$ true, we can choose a proper Sybil cost function, as defined in Eq. (2), that satisfies the following:
Under this Sybil cost function, rational players would run only one node. Finally, to show that this incentive system satisfies ED-$(\varepsilon,\delta)$, we use the following theorem, whose proof is presented in Appendix A.
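Assume that the random variable representing the reward of each node is defined as follows:
Then, if the utility $U(i)$ of each node $n_i$ is a strictly increasing function of its resource power $p_i$ and the following equation is satisfied for all $i, j$ with $p_i < p_j$,
$$\frac{U(i)}{p_i} > \frac{U(j)}{p_j}, \qquad (6)$$
ED-$(\varepsilon,\delta)$ is satisfied for all $\varepsilon$ and $\delta$.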
Conversely, if the utility is a strictly increasing function of resource power and Eq. (6) is not satisfied for all pairs of nodes, ED-$(\varepsilon,\delta)$ cannot be met for all $\varepsilon$ and $\delta$.
The above theorem states that, under the assumption that the block reward is constant at any given time, an even power distribution is achieved when the utility is a strictly increasing function of resource power and Eq. (6) is satisfied. Meanwhile, if Eq. (6) is not met, the gap between rich and poor nodes cannot be narrowed. Specifically, when the ratio of a node's utility to its resource power is constant, a large gap can persist (formally speaking, the probability of reaching an even distribution of resource power among nodes is less than 1; Thm. V.3 examines how small this probability is). Moreover, the gap would widen when this ratio is a strictly increasing function of resource power. In fact, this ratio can be regarded as the rate at which the resource power of a node increases. Therefore, Eq. (6) indicates that the resource power of a poor node increases faster than that of a rich node.
Now, we describe why the incentive system defined by Eqs. (3), (4), and (5) satisfies ED-$(\varepsilon,\delta)$. First, Eq. (3) has the form of the random variable described in Thm. IV.2, and Eq. (5) implies that the utility is a strictly increasing function of resource power. Therefore, ED-$(\varepsilon,\delta)$ is met by Thm. IV.2, because Eq. (5) satisfies Eq. (6) for all pairs of nodes: under a square-root utility, the ratio of utility to resource power decreases as resource power grows. As a result, the incentive system defined by Eqs. (3), (4), and (5) satisfies the four sufficient conditions, implying that full decentralization is possible under a proper Sybil cost function. Moreover, Thm. IV.2 implies that infinitely many incentive systems can achieve full decentralization. Interestingly, we find that an incentive scheme similar to this one is being considered by the Ethereum Foundation, which has also indicated that real identity management can be important. This fact is in accordance with our results.
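The effect of Eq. (6) can be illustrated with a deterministic Python sketch. It assumes, hypothetically, that every node reinvests a fixed fraction (`rate`) of its expected net profit in each time unit and ignores randomness, so it tracks only expected dynamics. Under a square-root utility (the shape of Eq. (5)), the ratio of utility to resource power falls as power grows; under a linear utility, this ratio is constant.

```python
import math


def evolve(powers, utility, rate, steps):
    """Expected-value dynamics: each step, every node adds rate * utility(power) to its power."""
    powers = list(powers)
    for _ in range(steps):
        powers = [p + rate * utility(p) for p in powers]
    return powers


def gap(powers):
    """Ratio between the largest and smallest resource power."""
    return max(powers) / min(powers)


def linear_utility(p):
    """Utility proportional to resource power: utility / power is constant, so Eq. (6) fails."""
    return p


# math.sqrt plays the role of a square-root utility: utility / power = 1 / sqrt(p) decreases.
initial = [100.0, 1.0]  # hypothetical rich node and poor node
for name, utility in [("sqrt", math.sqrt), ("linear", linear_utility)]:
    gaps = [gap(evolve(initial, utility, rate=0.5, steps=t)) for t in (0, 10, 100, 1000)]
    print(name, [round(g, 3) for g in gaps])
```

Starting from a 100:1 gap, the square-root rule pulls the ratio toward 1 as time passes, whereas the linear rule keeps it at 100, matching the contrast drawn after Thm. IV.2.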
V Impossibility of Full Decentralization in Permissionless Blockchains
In the previous section, we showed that blockchain systems can be fully decentralized under an appropriate Sybil cost function, where the Sybil cost represents the additional cost that a player running multiple nodes pays compared with the total cost for multiple players each running one node. One natural way for a system to implement a Sybil cost function is real identity management, in which a trusted third party (TTP) manages the real identities of players. When real identity management exists, it is certainly possible to implement the Sybil cost. However, the existence of a TTP contradicts the concept of decentralization, and thus such identity management cannot be adopted in a well-decentralized system. It is not yet known how permissionless blockchains, which have no such identity management, can implement the Sybil cost. In fact, many cryptocurrencies are based on permissionless blockchains, and many people want to design permissionless blockchains because of their nature. Unfortunately, as far as we know, the Sybil cost function of every current permissionless blockchain is zero. Therefore, considering this fact (i.e., assuming a zero Sybil cost function), in this section, we study whether blockchains without Sybil costs can reach good decentralization.
V-A Almost Impossible Full Decentralization
To determine whether a system without a Sybil cost can reach full decentralization, we present the following theorem, whose proof is given in Appendix B.
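Consider a system without a Sybil cost (i.e., with a zero Sybil cost function). Then the probability for the system to reach $(m,\varepsilon,\delta)$-decentralization is always less than or equal to
$$\max_{S \in \mathcal{S}} \Pr\left[S \text{ reaches } (m,\varepsilon,\delta)\text{-decentralization}\right],$$
where $\mathcal{S}$ is the set of all systems satisfying GR-$m$, ND-$m$, and NS-$m$.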
GR-$m$ means that all nodes can earn net profit, and the satisfaction of both ND-$m$ and NS-$m$ indicates that every player runs only one node without delegating. The above theorem implies that the maximum probability for a system satisfying GR-$m$, ND-$m$, and NS-$m$ to reach $(m,\varepsilon,\delta)$-decentralization equals the global maximum probability. Moreover, according to Thm. V.1, there is a system satisfying GR-$m$, ND-$m$, NS-$m$, and ED-$(\varepsilon,\delta)$ if and only if there is a system that converges in probability to $(m,\varepsilon,\delta)$-decentralization. As a result, it suffices to determine the maximum probability for a system satisfying GR-$m$, ND-$m$, and NS-$m$ to reach $(m,\varepsilon,\delta)$-decentralization. Therefore, we first find a utility function that satisfies GR-$m$, ND-$m$, and NS-$m$ through the following lemma.
When the Sybil cost function is zero, GR-$m$, ND-$m$, and NS-$m$ are met if and only if the utility function takes the form of Eq. (7).
This lemma shows that, given the total resource power of nodes, the utility function is linear in a node's resource power. Under this utility (i.e., net profit), players would run one node with their own resource power, because neither delegating their resources nor running multiple nodes is more profitable than running one node with their own resource power. The proof of this lemma is presented in Appendix C.
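The role of linearity can be seen in a small Python sketch. It assumes, for illustration only, a per-node utility of the form $a p + b$ for a fixed total resource power, with zero Sybil cost; the intercept $b$ and the helper names are hypothetical and are not taken from Eq. (7). Splitting one player's power across more nodes adds $b$ per extra node, and consolidating power into fewer nodes (as delegation does) adds $-b$ per removed node, so neither move pays off only when $b = 0$.

```python
def total_utility(node_powers, utility):
    """Sum of a player's utility over all nodes it runs (zero Sybil cost assumed)."""
    return sum(utility(p) for p in node_powers)


def make_affine(a, b):
    """Hypothetical per-node utility U(p) = a * p + b, with the total resource power held fixed."""
    return lambda p: a * p + b


power = 12.0  # one player's resource power (illustrative)
layouts = {"one node": [power], "three Sybil nodes": [power / 3] * 3}

for a, b in [(0.5, 0.0), (0.5, 1.0), (0.5, -1.0)]:
    u = make_affine(a, b)
    results = {name: total_utility(nodes, u) for name, nodes in layouts.items()}
    print(f"a={a}, b={b}: {results}")
```

With $b = 0$ both layouts yield 6.0; $b = 1$ favors the three Sybil nodes (9.0 versus 7.0), and $b = -1$ favors a single node (5.0 versus 3.0), i.e., it rewards consolidation.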
Then we consider whether Eq. (7) can satisfy ED-$(\varepsilon,\delta)$. Note that meeting ED-$(\varepsilon,\delta)$ means that the probability of reaching $(m,\varepsilon,\delta)$-decentralization is 1. Therefore, it suffices to answer the following question: “What is the probability that a system defined by Eq. (7) reaches $(m,\varepsilon,\delta)$-decentralization?” Thm. V.3 answers this question by providing an upper bound on the probability; its proof is presented in Appendix D. Before stating the theorem, we introduce several notations. Because, in practice, players start to run nodes in the consensus protocol at different times, the set of players above a given percentile differs over time. Thus, to reflect this, we define the following time-dependent set:
In other words, this set contains all players whose effective power is above the $\delta$-th percentile at time $t$. Moreover, we define two parameters as follows:
Here, each player's start time is the time at which it starts to participate in the consensus protocol. The first parameter indicates the initial resource power of the richest player among the players who remain in the system for a long time. The second parameter represents the ratio between the $\delta$-th percentile and the largest initial resource power of the players who remain in the system for a long time. Considering these notations, we present the following theorem.
When the Sybil cost function is zero, the following holds for any incentive system:
Here, the bounding function is defined in Eq. (36).
This theorem implies that the probability of reaching $(m,\varepsilon,\delta)$-decentralization is less than a bound determined by the rich-poor ratio defined above and by the maximum resource power that a player can add per time unit relative to the initial resource power of the richest player. The upper bound is smaller when the rich-poor gap in the current state is larger. In addition, the more resource power the richest player possesses relative to the maximum amount that a player can add per time unit, the smaller the upper bound.
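To determine how small this bound is when the rich-poor ratio is small, we adopt a Monte Carlo method, because computing the bound directly has high complexity. Fig. 1 shows the value of the bound with respect to the rich-poor ratio and $\varepsilon$ when the ratio between the maximum per-time-unit increase in resource power and the richest player's initial resource power is 0.1. Note that $\varepsilon = 0$ means that the state must reach $(m,0,\delta)$-decentralization, in which the effective power of the richest player equals that of the $\delta$-th percentile player. Likewise, $\varepsilon = 9$, $99$, and $999$ indicate that, in $(m,\varepsilon,\delta)$-decentralization, the effective power of the richest player is at most 10 times, 100 times, and 1000 times that of the $\delta$-th percentile player, respectively.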
Fig. 1 shows that the probability of reaching $(m,\varepsilon,\delta)$-decentralization is smaller when the rich-poor ratio and $\varepsilon$ are smaller. In particular, the bound is very small when the rich-poor ratio is small. This result indicates that the probability of reaching good decentralization is close to 0 if there is a large gap between rich and poor and the resource power of the richest player is large (i.e., the ratio between the maximum per-time-unit increase and the richest player's resource power is not large; it does not need to be small). The values of the bound for smaller values of this ratio are presented in Appendix E, and they are clearly smaller than those in Fig. 1.
To determine how small this ratio is at present, we use the hash rates of all users in the Slush mining pool in Bitcoin as an example. At the time of writing, we find miners with hash rates below 3.061 GH/s as well as miners with hash rates above 404.0 PH/s. From these data, the ratio between the resource power of the poorest and richest players is less than about $7.6 \times 10^{-9}$. In addition, we observe that the 15th-percentile and 50th-percentile hash rates are less than 5.832 TH/s and 25.33 TH/s, respectively. Therefore, the corresponding ratios for the 15th and 50th percentiles are less than approximately $1.5 \times 10^{-5}$ and $6.3 \times 10^{-5}$, respectively. This example indicates that the rich-poor gap is significantly large. Moreover, we estimate an upper bound on the maximum resource power that a player can add per time unit in the Bitcoin system. Given that the block reward is 12.5 BTC, this maximum is approximately 384 TH/s, assuming that a player reinvests all earned rewards to increase its hash power. Then an upper bound on the ratio between this maximum increase and the richest player's hash rate is $384\ \text{TH/s} / 404.0\ \text{PH/s} \approx 9.5 \times 10^{-4}$, which is clearly less than the value of 0.1 used in Fig. 1. As a result, Thm. V.3 implies that, currently, it is almost impossible for a system without Sybil costs to reach good decentralization.
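The ratios above follow directly from the quoted hash rates, and the short Python computation below reproduces them. The variable names are ours; the inputs are the bounds reported in the text (a lower bound for the richest miner and upper bounds for the others), so each computed ratio is itself an upper bound.

```python
# Hash rates quoted in the text for the Slush pool, converted to H/s.
GH, TH, PH = 1e9, 1e12, 1e15

richest = 404.0 * PH      # richest miner observed (lower bound)
poorest = 3.061 * GH      # poorest miner observed (upper bound)
p15 = 5.832 * TH          # 15th-percentile hash rate (upper bound)
p50 = 25.33 * TH          # 50th-percentile hash rate (upper bound)
max_increase = 384 * TH   # estimated maximum increase per time unit

ratios = {
    "poorest / richest": poorest / richest,
    "15th percentile / richest": p15 / richest,
    "50th percentile / richest": p50 / richest,
    "max increase / richest": max_increase / richest,
}
for name, value in ratios.items():
    print(f"{name}: {value:.2e}")
```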
V-B Intuition and Implication
Here, we describe intuitively why a permissionless blockchain cannot reach good decentralization. Because a player with great wealth can possess more resources, when a system imposes no constraint on participation and attracts many players, the initial distribution of resource power in the system depends heavily on the distribution of wealth in the real world. Therefore, if wealth were equally distributed in the real world and many players were incentivized to participate in the consensus protocol, full decentralization could easily be achieved even in permissionless blockchains, where anyone can join without any permission process. However, according to many research papers and statistics, the rich-poor gap in the real world is significant [8, 9, 10]. In addition, wealth inequality is widely regarded as one of the most glaring deficiencies of today's capitalism, and resolving it is quite difficult.
In a permissionless blockchain, players can freely participate without any restriction, so a large wealth inequality would initially be reflected in the system. Therefore, for the system to have good decentralization, its incentive system should be designed to gradually narrow the rich-poor gap. To this end, we can consider the following incentive system: nodes receive, on average, net profit proportional to the square root of their resource power (e.g., Eq. (5)). In Section IV-C, we have already proved that this incentive system makes the resource power distribution among nodes more even, implying that it satisfies ED-$(\varepsilon,\delta)$. However, this alone cannot satisfy NS-$m$ when there is no Sybil cost (i.e., when the Sybil cost function is zero). Therefore, to satisfy NS-$m$, we can make the expected net profit decrease as the number of existing nodes increases; for example, Eq. (5) can be modified so that the profit is a decreasing function of the number of existing nodes. In this case, players with large resources would not run Sybil nodes, because doing so increases the number of nodes and thus decreases their utilities. However, this approach has a side effect: it leads players to delegate their power to a few players, because delegation decreases the number of existing nodes and thereby raises their profits. As a result, this example intuitively shows that the four conditions are contradictory when the Sybil cost does not exist (this does not imply that full decentralization is impossible; it only means that the probability of reaching full decentralization is less than 1), and whether a permissionless blockchain can achieve good decentralization depends entirely on how wide the gap between the rich and the poor is in the real world. This fact is supported by Thm. V.3.
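The conflict between even distribution and the absence of Sybil nodes can be seen with simple arithmetic. The Python sketch below assumes, hypothetically, that each node earns a fixed coefficient times the square root of its resource power and that running extra nodes costs nothing; splitting a fixed amount of power evenly across $k$ Sybil nodes then multiplies the expected profit by $\sqrt{k}$.

```python
import math


def sybil_total(power, k, coeff=1.0):
    """Total expected profit when `power` is split evenly over k nodes, each earning
    coeff * sqrt(node power); running extra nodes is assumed to cost nothing."""
    return k * coeff * math.sqrt(power / k)


for k in (1, 2, 4, 16):
    print(k, round(sybil_total(100.0, k), 2))
```

A player with power 100 earns 10.0 with one node but 40.0 with 16 Sybil nodes, so the square-root rule alone gives every player a reason to split its power, which is why NS-$m$ fails without a Sybil cost.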
On the other hand, if we can find a way to implement Sybil costs in permissionless blockchains, which do not have real identity management, we will be able to design permissionless blockchains that reach good decentralization. We leave this as an open problem.
V-C Question and Answer
In this section, to further clarify the implications of our result, we present questions that academic reviewers or blockchain engineers have considered in the past and provide answers to them.
[Q1] “Sybil attacks are when one physical node claims multiple identities, but creating more identities does not increase your mining power, so why is this a problem?” First, note that decentralization is closely tied to real identities. In other words, a system has good decentralization when the number of independent players in it is large and the power distribution among them is even. In this paper, we do not claim that the more Sybil nodes exist, the lower the decentralization level is. We simply assert that a system must be able to know the current power distribution among players to reach good decentralization, and a system without real identity management can know this distribution when each player runs only one node. Moreover, we prove that, to maximize the probability of reaching good decentralization, all players should run only one node (Thm. V.1).
[Q2] “Would a simple puzzle for registering as a block-submitter not be a possible Sybil cost, without identity management?” According to the definition of the Sybil cost (Section III), the cost of running one node should depend on whether the player runs other nodes. More specifically, a player who already runs other nodes should pay more to run one additional node than a player with no other nodes. A registration puzzle costs the same for every node regardless of who runs it, so the proposed scheme cannot impose a Sybil cost. Again, note that the Sybil cost described in this paper differs from the cost usually discussed in PoW and PoS systems.
[Q3] “If mining power is delivered in proportion to the resources one has available (which would be an ideal situation in permissionless systems), achievement of good decentralization is clearly an impossibility. This seems rather self-evident.” Naturally, a system would be significantly centralized in its initial state because the rich-poor gap is large in the real world and only a few players may be interested in the system at an early stage. Considering this fact, our work studies whether there is a mechanism that causes a system to achieve good decentralization. Note that our goal is to reduce the gap between the effective power of the rich and the poor, not the gap between their resource power. In other words, even if the rich possess significantly large resource power, the decentralization level can be high when the rich participate in the consensus protocol with only part of their resource power and thus do not have large effective power. To this end, we can consider a utility function that decreases for large inputs (e.g., a concave function). However, such a function still cannot achieve good decentralization because it does not satisfy NS-$m$. Note that, under a mechanism satisfying the four conditions, a system can always reach good decentralization regardless of its initial state. Unfortunately, our result states that there is no mechanism satisfying the four conditions when there is no Sybil cost, which implies that the probability of reaching good decentralization is less than 1. To make matters worse, Thm. V.3 states that this probability is upper-bounded by a value close to 0. As a result, it is almost impossible to create a system with good decentralization without any Sybil cost, even if enough time is given.
[Q4] “I think when the rich invest a lot of money in a system, the system can become popular. So, if the large power of the rich is not involved in the system, can it become popular?” In this paper, we focus on the decentralization level of the consensus protocol, which acts as the government of the system. Therefore, good decentralization in this paper implies a fair government rather than the absence of rich and poor in the entire system. If the rich invest a lot of money in businesses running on the system (e.g., applications based on smart contracts) instead of in the consensus protocol, the system may have a fair government and still become popular. Indeed, efforts to make government fair also appear in the real world, because people are extremely wary of an unfair system in which the rich influence government through bribery.
VI Protocol Analysis
In this section, to determine which conditions each system satisfies, we extensively analyze the incentive systems of the top 100 coins with respect to the four conditions. Based on this analysis, we can determine whether each system can attract a sufficient number of independent players and achieve an even distribution of effective power among them. This analysis also shows what each blockchain system needs for good decentralization.
VI-A Top 100 Coins
Before analyzing the incentive systems based on the four conditions, we classified the top 100 coins on CoinMarketCap according to their consensus protocol. Most of them use one of three consensus protocols: PoW, PoS, and DPoS. Specifically, there are 44 PoW, 22 PoS, and 11 DPoS coins. In addition, 15 coins use other consensus protocols, such as Federated Byzantine Agreement (FBA), Proof of Importance, Proof of Stake Velocity, and hybrid protocols. Furthermore, we classify five coins, XRP, NEO, VeChain, Ontology, and GoChain, as permissioned systems, because in these systems only players chosen by the coin foundation can run nodes in the consensus protocol. Finally, there is one token, Huobi Token, and two cryptocurrencies that are no longer operational, BitcoinDark and Boscoin. Table II summarizes this classification of the top 100 coins.
| Category | Coins (CoinMarketCap rank) | Count |
|---|---|---|
| PoW | Bitcoin (1), Ethereum (2), Bitcoin Cash (4), Litecoin (7), Monero (9), Dash (10), IOTA (11), Ethereum Classic (13), Dogecoin (18), Zcash (19), Bytecoin (21), Bitcoin Gold (22), Decred (25), Bitcoin Diamond (26), DigiByte (28), Siacoin (33), Verge (34), Metaverse ETP (35), Bytom (36), MOAC (43), Horizen (47), MonaCoin (51), Bitcoin Private (52), ZCoin (56), Syscoin (60), Electroneum (61), Groestlcoin (64), Bitcoin Interest (67), Vertcoin (70), Ravencoin (71), Namecoin (72), BridgeCoin (74), SmartCash (75), Ubiq (77), DigitalNote (82), ZClassic (83), Burst (85), Primecoin (86), Litecoin Cash (90), Unobtanium (91), Electra (92), Pura (96), Viacoin (97), Bitcore (100) | 44 |
| PoS | Cardano (8), Tezos (15), Qtum (24), Nano (29), Waves (31), Stratis (37), Cryptonex (38), Ardor (42), Wanchain (44), Nxt (50), PIVX (57), PRIZM (63), WhiteCoin (76), Blocknet (79), Particl (80), Neblio (81), BitBay (87), GCR (89), NIX (93), SaluS (94), LEO (98), ION (99) | 22 |
| DPoS | EOS (5), TRON (12), Lisk (20), BitShare (27), Steem (32), GXChain (48), Ark (49), WaykiChain (68), Achain (84), Asch (88), Steem Dollars (95) | 11 |
| Others | Stellar (6), NEM (16), ICON (30), Komodo (39), ReddCoin (40), Hshare (41), Nebulas (53), Emercoin (54), Elastos (55), Nexus (58), Byteball Bytes (59), Factom (62), Skycoin (69), Nexty (66), Peercoin (73) | 15 |
| Permissioned | XRP (3), NEO (14), VeChain (17), Ontology (23), GoChain (65) | 5 |
| Token | Huobi Token (45) | 1 |
| Not working | BitcoinDark (46), Boscoin (78) | 2 |
Next, we analyze the blockchain systems of the top 100 coins according to the four sufficient conditions. In this study, we focus on coins using the PoW, PoS, and DPoS algorithms, which are the major consensus mechanisms of permissionless blockchains, to identify which conditions each system currently fails to satisfy. If a system satisfies both GR-$m$ and ND-$m$, we can expect many players to participate in its consensus protocol and run nodes. In addition, if the system satisfies both NS-$m$ and ED-$(\varepsilon,\delta)$, the effective power would be more evenly distributed among the players. Table III presents the results of the analysis, where a black circle (●), a half-filled circle (◐), and an empty circle (○) indicate full, partial, and non-satisfaction of the corresponding condition, respectively. In addition, we mark each coin system with a triangle (▲) or an X (✗) when it partially implements or does not implement a Sybil cost, respectively. Here, a partial Sybil cost means that multiple nodes run by one player can avoid paying the Sybil cost by pretending to be run by different players (i.e., players with different real identities). Note that PoW, PoS, and DPoS coins cannot have perfect Sybil costs because they are permissionless blockchains; in fact, it is currently not known how to implement Sybil costs in blockchain systems without real identity management. We present detailed analysis results in the following sections.
VI-B1 Proof of Work
Most PoW systems are designed to give each node a block reward in proportion to its computational power relative to the total computational power. In addition, running a node incurs an electric bill that depends on the computational power, as well as other costs that are independent of the computational power, such as large storage for blockchain data. Considering these facts, we can express the utility (i.e., the expected net profit) of node $n_i$ with computational power $p_i$ as follows:
$$U(i) = R\,\frac{p_i}{\sum_j p_j} - c_e\,p_i - c_o. \qquad (9)$$
In Eq. (9), $R$ represents the block reward (e.g., 12.5 BTC in the Bitcoin system) that nodes can earn per time unit, and $c_e$ and $c_o$ represent the electric bill per unit of computational power and the other costs incurred during the time unit, respectively. In particular, the cost $c_o$ is independent of the computational power. The values of the three coefficients $R$, $c_e$, and $c_o$ determine whether the four conditions are satisfied.
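To see how the coefficients of this utility interact with delegation and Sybil nodes, the Python sketch below evaluates it with hypothetical parameter values while holding the total computational power fixed. Under this accounting, merging two nodes into one saves exactly one per-node fixed cost, and splitting one node into two adds one; the reward and electricity terms are unchanged because both are linear in computational power. The class and parameter names are illustrative assumptions, not part of any coin's implementation.

```python
from dataclasses import dataclass


@dataclass
class PowParams:
    reward: float      # block reward paid out per time unit
    power_cost: float  # electricity cost per unit of computational power
    fixed_cost: float  # other per-node costs independent of power (e.g., storage)


def pow_utility(p_i: float, total_power: float, params: PowParams) -> float:
    """Expected net profit of a node with computational power p_i, in the style of Eq. (9)."""
    return params.reward * p_i / total_power - params.power_cost * p_i - params.fixed_cost


params = PowParams(reward=12.5, power_cost=0.001, fixed_cost=0.05)  # hypothetical values
total = 1000.0  # total computational power, held fixed for the comparison

separate = 2 * pow_utility(10.0, total, params)  # two nodes of power 10 each
merged = pow_utility(20.0, total, params)        # the same power run as a single node
print(f"two nodes: {separate:.3f}, one node: {merged:.3f}")
print(f"gain from merging (= cost of one extra Sybil node): {merged - separate:.3f}")
```

With a positive per-node cost, consolidating power is slightly more profitable than running separate nodes, while splitting into Sybil nodes is slightly less profitable; with a zero per-node cost, both moves are neutral.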
|Coin name||Con 1||Con 2||Con 3||Con 4||