A central problem in blockchain systems is that of block proposal: how to choose which block, or set of transactions, should be appended to the global blockchain next. Many blockchains use a proposal mechanism by which one node is randomly selected as leader (or block proposer). This leader gets to propose the next block in exchange for a token reward—typically a combination of transaction fees and a freshly-minted block reward, which is chosen by the system designers. These reward mechanisms incentivize nodes to participate in the block proposal procedure, and are therefore critical to the security and liveness of the system.
Early cryptocurrencies, including Bitcoin, overwhelmingly used a leader election mechanism called proof of work (PoW). Under PoW, all nodes attempt to solve a computational puzzle. The node that solves the puzzle first is elected leader; she proves her leadership by broadcasting a solution to the puzzle before the other nodes. Over the years, PoW has proven extremely robust to security threats, but also extremely energy-inefficient. The Bitcoin network alone is estimated to use more energy than some developed nations.
An appealing alternative to PoW is proof of stake (PoS). In PoS, proposers are not chosen according to their computational power, but according to the stake they hold in the cryptocurrency. For example, if Alice has 30% of the tokens, she is selected as the next proposer with probability 0.3. Although the idea of PoS is both natural and energy-efficient, the research community is still grappling with how to design a PoS system that provides security while also incentivizing nodes to act as network validators. Part of incentivizing validators is simply providing enough reward (in expectation) to compensate their resource usage. However, it is also important to ensure that validators are treated fairly compared to their peers. In other words, it is not enough to compensate validators adequately on average; the variance of their rewards also matters.
This observation is complicated in PoS systems by a key issue that does not arise in PoW systems: compounding. Compounding means that whenever a node (Alice) earns a proposal reward, that reward is added to her account, which increases her chances of being elected leader in the future, and increases her chances of reaping even more rewards. This leads to a rich-get-richer effect, causing dramatic concentration of wealth.
For example, consider what would happen if Bitcoin were a PoS system. Bitcoin started with an initial stake pool of 50 BTC, and the block reward was fixed at 50 BTC/block for several years. Under these conditions, suppose a party $A$ starts with a fraction $v$ of the stake. Using a basic PoS model described in Section 2, $A$’s stake would evolve according to a standard Pólya urn process, with $A$’s fractional stake converging almost surely to a random variable with distribution $\mathrm{Beta}(v, 1-v)$ (blue solid line in Figure 1). In this example, compounding gives $A$ a high probability of accumulating a stake fraction near 0 or 1. This is highly undesirable because the proposal incentive mechanism should not unduly amplify or shrink one party’s fraction of stake. Notice that this is not caused by adversarial or strategic behavior, but simply by the randomness in the PoS protocol, combined with compounding.
In PoW, on the other hand, the analogue would be for party $A$ to hold a fraction $v$ of the computational power. In that case, the number of blocks won by $A$ after $T$ blocks would instead be binomially distributed with mean $vT$ (black dashed line in Figure 1). Notice that the binomial (PoW) stake distribution concentrates around $v$ as $T \to \infty$, so if $A$ contributes a fraction $v$ of stake at the beginning, she also reaps a fraction $v$ of the rewards in the long term. A natural question is whether we can achieve this PoW baseline distribution in a PoS system with compounding.
We study this question from the perspective of the block reward function. Most cryptocurrencies today use a constant block reward function like Bitcoin’s, which remains fixed over a long timespan (e.g., years). We ask how a PoS system’s choice of block reward function can affect concentration of wealth, and whether one can achieve the PoW baseline stake distribution simply by changing the block reward function. This paper has five main contributions:
We define the equitability of a block reward function, which intuitively captures how much the fraction of total stake belonging to a node can grow or shrink (under that block reward function), compared to the node’s initial investment. An equitable block reward scheme should limit this variability. This metric allows us to quantitatively compare reward functions.
We introduce an alternative block reward function called the geometric reward function, whose rewards increase geometrically over time. We show that it is the most equitable PoS block reward function, by showing that it is the unique solution to an optimization problem on the second moment of a time-varying urn process; this optimization may be of independent interest in the applied probability community. We further show that geometric rewards exhibit a number of desirable properties, including stability of rewards in fiat value over time. We note that despite optimizing equitability, geometric rewards do not achieve the PoW baseline stake distribution; this is the inherent price paid for the efficiency afforded by PoS compared to (the energy-inefficient) PoW. The green histogram in Figure 1 illustrates the empirical, simulated stake distribution when geometric rewards are awarded over a duration of blocks, and the total rewards are the same as in the PoW example (i.e., units). These simulations are run over 100,000 trials.
Borrowing ideas from resource pooling in PoW systems, a plausible strategy of participants with small stakes in a PoS system is to collectively form larger stake pools. We quantify exactly the gain of such stake pool formation in terms of equitability, which proves that participating in a stake pool can significantly reduce the compounding effect of a PoS system.
We study the effects of strategic behavior (e.g. selfish mining) on the rich-get-richer phenomenon. We find that in general, compounding can amplify the efficacy of strategic behavior compared to PoW systems. However, these effects can be partially mitigated by carefully choosing the amount of block reward dispensed over some time period relative to the initial stake pool size.
Our analyses of the equitability of various reward functions provide guidelines for choosing system parameters—including the initial token pool size and the total rewards to dispense in a given time interval—to ensure equitability under a given block reward function. In particular, we show that cryptocurrencies that start with large initial stake pools (relative to the block rewards being disseminated) can mitigate the concentration of wealth, both for constant and geometric reward schemes.
The rest of this paper is organized as follows. In Section 2, we present our model and discuss its relation to real PoS cryptocurrencies. We also precisely define the constant and geometric block reward functions. In Section 3, we compare constant and geometric block reward schemes under honest behavior, showing that geometric rewards exhibit optimal equitability over all reward schemes. The resulting design decisions in choosing practical parameters of PoS block reward schemes are discussed in Section 4. We use Section 5 to study the effects of strategic behavior on equitability: we find that neither constant nor geometric rewards provide robustness against selfish-mining-type attacks. The desired robustness to strategic behavior in PoS systems is perhaps best achieved via suitable incentive (and disincentive) mechanisms, as discussed in Section 6.
1.1 Related work
The potential of poor equitability of PoS systems has been explored in some detail in recent forum and blog posts in the cryptocurrency community [23, 19, 28], but no research has formally or quantitatively studied it (to the best of our knowledge). In this work, we quantify concentration of wealth through a new metric called equitability, which enables us to mathematically compare PoS to PoW, as well as different block reward schemes. As we discuss in Section 2, equitability is closely tied to the variance of a block reward scheme. Thus far, researchers and practitioners have reduced variance in block rewards through two main approaches: pooling resources (e.g., mining or stake pools) and proposing new protocols for disseminating block rewards.
Resource pooling is a common phenomenon in cryptocurrencies. For example, since PoW mining requires substantial computational resources, few nodes are independently able to mine profitably. Mining pools democratize this process by allowing many nodes to participate in mining, while also sharing block rewards among those nodes [26, 9]. In PoS systems, the analogous concept is stake pooling, where nodes aggregate their stake under a single node; block rewards are shared across the pool. Like mining pools, stake pools allow less wealthy players to participate in network maintenance. In Section 3.3, we show how much one can gain by participating in a stake pool in terms of equitability, and show that the proposed geometric reward function is still the most equitable even if some of the parties involved are forming stake pools.
Consider a different scenario where some reward is deterministically dispensed to every participant of a PoS system at each block proposal according to some predeclared rules. In particular, the block proposer is treated no differently from any other participating party. There is no randomness in this system and hence no compounding effect. Under these assumptions, prior work studies the problem of organic stake pool formation, where any participant is allowed to create a stake pool and act as its leader at some cost. The PoS system designer can choose a reward function to be applied to each pool; it depends on the total stake of the pool and on the stake that the leader holds. One part of the pool's reward is shared among all participants of the pool in proportion to their stake, and the remainder is awarded to the leader. The goal of the system designer is to organically form a fixed target number of stake pools by choosing the reward function. Our work differs from this line of work in three main respects: (1) While our work aims to optimize equitability, the prior work aims to incentivize the formation of a target number of stake pools. (2) We study the effects of compounding on concentration of wealth, whereas the prior work does not model compounding. (3) We study the dynamic setting as opposed to the static setting.
A second class of approaches for reducing variance actually changes the protocol for block reward allocation. Our work falls into this category. Two main examples of this approach are Fruitchains, which spreads block rewards evenly across a sequence of block proposers, and Ouroboros, which rewards nodes for being part of a block formation committee, even if they do not contribute to block proposal. Both of these approaches were proposed in order to provide incentive-compatibility for block proposers; they do not explicitly aim to reduce the variance of rewards. However, they implicitly reduce variance by spreading rewards across multiple nodes, thereby preventing the randomized accumulation of wealth. In our work, instead of changing how block rewards are disseminated, we change the block reward function itself.
2 Models and Notation
We provide a probabilistic model for the evolution of the stakes under a PoS system, and introduce a measure of fairness that we call equitability.
2.1 A Simple PoS model
We begin with a model of a chain-based proof-of-stake system with $m$ parties: $\mathcal{A} = \{A_1, \ldots, A_m\}$. We assume that all parties keep all of their stake in the proposal stake pool, which is a pool of tokens that is used to choose the next proposer. We consider a discrete-time system, $t = 0, 1, \ldots, T$, where each time slot corresponds to the addition of one block to the blockchain. In reality, new blocks may not arrive at perfectly synchronized time intervals, but we index the system by block arrivals. For any integer $n$, we use the notation $[n] := \{1, \ldots, n\}$. For all $i \in [m]$, let $S_{A_i}(t)$ denote the total stake held by party $A_i$ in the proposal stake pool at time $t$. We let $S(t) = \sum_{i=1}^{m} S_{A_i}(t)$ denote the total stake in the proposer stake pool at time $t$, and $v_{A_i}(t)$ denotes the fractional stake of node $A_i$ at time $t$:
$$ v_{A_i}(t) = \frac{S_{A_i}(t)}{S(t)}. $$
For simplicity, we normalize the initial stake pool size to $S(0) = 1$; this is without loss of generality as the random process is homogeneous in scaling both the rewards and the initial stake by a constant. Each party $A_i$ starts with a fraction $v_{A_i}(0) = S_{A_i}(0)$ of the original stake.
At each time slot $t$, the system chooses a proposer node $W(t) \in \mathcal{A}$ such that
$$ W(t) = A_i \quad \text{with probability } v_{A_i}(t-1). \qquad (1) $$
Upon being selected as a proposer, $W(t)$ appends a block, or set of transactions, to the blockchain, which is a sequential list of blocks held by all nodes in the system. As compensation for this service, $W(t)$ receives a block reward of $r(t)$ stake, which is immediately added to its allocation in the proposer pool. That is,
$$ S_{A_i}(t) = S_{A_i}(t-1) + r(t)\,\mathbb{1}\{W(t) = A_i\}. $$
The reward $r(t)$ is freshly minted at each time step, so it causes the total number of tokens to grow. We assume the total reward dispensed in time period $T$ is fixed, such that $\sum_{t=1}^{T} r(t) = R$.
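The dynamics above can be simulated directly. The following is a minimal Python sketch under the model's stated assumptions (one proposer per slot, rewards always reinvested, initial total stake normalized to 1). The function names `simulate_fractional_stake` and `sample_mean_and_variance` are illustrative and do not come from the paper.

```python
import random


def simulate_fractional_stake(v0, rewards, rng):
    """Simulate one party's fractional stake under the simple PoS model.

    v0      -- initial fractional stake of the tracked party (total stake is 1)
    rewards -- sequence of block rewards, one per time slot
    rng     -- a random.Random instance
    Returns the tracked party's fractional stake after the last block.
    """
    stake = v0    # stake held by the tracked party
    total = 1.0   # total stake, normalized to 1 at time 0
    for r in rewards:
        # The tracked party is elected with probability equal to its
        # fractional stake just before this block.
        if rng.random() < stake / total:
            stake += r   # the reward compounds into the party's stake
        total += r       # the reward is freshly minted either way
    return stake / total


def sample_mean_and_variance(v0, rewards, trials, seed=0):
    """Monte Carlo estimate of the mean and variance of the final fraction."""
    rng = random.Random(seed)
    samples = [simulate_fractional_stake(v0, rewards, rng) for _ in range(trials)]
    mean = sum(samples) / trials
    var = sum((s - mean) ** 2 for s in samples) / trials
    return mean, var


if __name__ == "__main__":
    # Hypothetical parameters: 1000 blocks, total reward 10x the initial stake.
    mean, var = sample_mean_and_variance(0.3, [0.01] * 1000, trials=2000)
    print(f"mean={mean:.3f} variance={var:.4f}")
```

Tracking a single party is enough here because the election probability of that party depends only on its own fractional stake, not on how the remaining stake is split among the other parties.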
2.2 Modeling Assumptions
This model implicitly makes several assumptions. The first is that we assign a single leader (proposer) per time slot. Many cryptocurrencies have leader election protocols that allow more than one proposer to be chosen per time slot (e.g., Bitcoin, PoSv3, Snow White). If two leaders are elected at time $t$, for example, then each leader can append its block to one block at height $t-1$; here the height of a block is its index in the blockchain. However, in these systems, only one leader can win the block reward since only one fork of the blockchain ultimately gets adopted. Assuming the final winner is chosen uniformly at random from the set of selected leaders, the dynamics of our Markov process remain unperturbed.
Other cryptocurrencies (e.g., Qtum, Particl) choose the next proposer(s) as a function of the time slot and the preceding block. Again, this can lead to multiple proposers per time slot. This does not affect our results in the honest setting (for the same reason as above), but it does impact strategic behavior. In most blockchain systems, honest proposers always build on the head of the blockchain. However, in systems where the proposer’s identity depends on the previous block, a strategic node can increase its chance of being a leader by appending to a block that is not at the head of the blockchain. If done repeatedly, the strategic player may eventually produce a chain that is longer than the honest chain (causing the honest nodes to switch over), which also contains mostly blocks belonging to the strategic player. This increases the player’s reward, and is called a grinding attack. Such PoS systems are more vulnerable to strategic behavior than the system we analyze, where proposer election is a function of only the time slot. Despite this, we find that our model is drastically vulnerable to strategic behavior. Hence, the problem can only be worse in blockchains that use block contents to choose the next leader. We discuss the implications of this in Section 6.2.
We have also assumed that users always re-invest their rewards into the proposer stake pool. We maintain that this is a reasonable assumption for two reasons: (1) In PoS systems where users explicitly deposit stake, existing implementations automatically deposit rewards back into the stake pool. For example, the reference implementation of Casper the Friendly Finality Gadget (a PoS finalization mechanism proposed for Ethereum) automatically re-allocates all rewards back into the deposited stake pool. (2) In other PoS systems, the stake pool is simply the set of all stake in the system, and is not separate from the pool of tokens used for transactions. Hence as soon as a proposer earns a reward, that reward is used to calculate the next proposer (modulo some maturity period); the user is not actively re-investing block rewards, it just happens naturally.
Finally, we have chosen not to explicitly model node unavailability, e.g. due to hardware or network failures; in our context, node unavailability means that a selected proposer may forfeit its chance to propose, even though it was chosen. Assuming such node failures occur i.i.d. across draws from the proposer pool, such events do not alter our model dynamics. If a proposer is offline, the selection process is simply re-run; the slot in question is given to the next node, which is again chosen proportionally to the stake allocation in the proposer pool.
2.3 Block reward choices
Many cryptocurrencies have modeled their block reward strategy after Bitcoin’s, which fixes the total supply of coins at about 21 million coins. To achieve this, block rewards are halved every 210,000 blocks (approximately four years). In between halving events, the block reward remains constant. Figure 3 illustrates this reward schedule in terms of our notation; if we let $T_i$ and $R_i$ denote the $i$th block interval and total reward, respectively, we can take $T_i = 210{,}000$ blocks, with $R_i$ halving from one interval to the next. Several cryptocurrencies have similarly adopted block reward schemes that remain constant over extended periods of time, including Ethereum, ZCash, Dash, and Particl. Note that choosing $T_i$ and $R_i$ is not our main focus; these parameters will likely be chosen based on economic considerations (we discuss this in Sections 3, 4 and 5). Below we aim to provide guidelines on how to choose the block reward function $r(t)$ once the $T_i$’s and $R_i$’s are fixed.
Other cryptocurrencies have experimented with the block reward function. For example, Monero has a block reward that decays with each block; this is intended to be a continuous interpolation of Bitcoin’s piecewise constant reward function. Peercoin, one of the first PoS cryptocurrencies, chooses the next leader based on the age and quantity of stake associated with a given public key. The PoS block reward is chosen as 1% of the product of a public key’s stake quantity and stake age; this differs from our model, where the block reward amount does not depend on the proposer who is selected.
2.4 Systematic choice of reward functions
In this paper, we revisit the question of how to choose the block reward function $r(t)$. A key observation is that the block reward is ultimately an incentive; it should compensate nodes for the resource cost of proposing blocks. Since this cost is roughly constant over time, many cryptocurrencies implicitly adopt the following maxim:
On short timescales, each proposed block should yield the same block reward.
Notice that this maxim does not specify whether the value of a block reward is measured in tokens or in fiat currency. As illustrated earlier, most cryptocurrencies today measure value in tokens; that is, they give the same number of tokens for each block. We call this approach the constant block reward function:
$$ r_c(t) := \frac{R}{T}, \qquad t = 1, \ldots, T. $$
A natural alternative is to measure the block reward’s value in fiat currency. This approach depends closely on the cryptocurrency’s valuation (and fluctuations thereof) over the time interval $T$. However, if we assume that the cryptocurrency’s valuation is constant over $T$, then the resulting reward function should always give a constant fraction of the total stake at each time slot. We call this the geometric reward function, defined as follows:
$$ r_g(t) := (1+R)^{t/T} - (1+R)^{(t-1)/T}, \qquad t = 1, \ldots, T. \qquad (3) $$
Figure 3 shows geometric block rewards as a function of time if we use the same $T_i$’s and $R_i$’s as those in Figure 3, which were tailored to Bitcoin’s block reward schedule. Note that a currency’s valuation can change over the course of one such interval; these parameters were chosen simply to ease the comparison with Figure 3. Our assumption that valuation remains constant can be enforced by choosing a small enough value of $T$. We discuss this parameter choice in Section 4, but in short, we envision $T$ being on the order of a day. Since $T$ is measured in units of blocks, this implies anywhere from thousands to tens of thousands of blocks per time interval.
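To make the two reward schedules concrete, here is a small Python sketch. It assumes the normalization used in this paper (initial total stake 1, total reward $R$ dispensed over $T$ blocks) and takes the geometric schedule to be the one under which the total stake grows by the same factor $(1+R)^{1/T}$ in every slot. The function names are illustrative.

```python
def constant_rewards(R, T):
    """Same number of tokens for every block; the rewards sum to R."""
    return [R / T] * T


def geometric_rewards(R, T):
    """Rewards that keep r(t) / S(t) constant, where S(t) is the total stake.

    With S(0) = 1 the total stake after t blocks is (1 + R) ** (t / T),
    so each reward is the difference of two consecutive stake levels.
    """
    growth = (1.0 + R) ** (1.0 / T)
    return [growth ** t - growth ** (t - 1) for t in range(1, T + 1)]


def reward_to_stake_ratios(rewards):
    """Return r(t) / S(t) for every slot, starting from S(0) = 1."""
    total = 1.0
    ratios = []
    for r in rewards:
        total += r
        ratios.append(r / total)
    return ratios


if __name__ == "__main__":
    R, T = 2.0, 5   # hypothetical values chosen only for illustration
    for name, schedule in [("constant", constant_rewards(R, T)),
                           ("geometric", geometric_rewards(R, T))]:
        ratios = reward_to_stake_ratios(schedule)
        print(name, [round(x, 4) for x in ratios])
```

Under the constant schedule the ratio of reward to total stake shrinks over time, whereas under the geometric schedule it stays fixed, which is the sense in which each block is worth the same fraction of the total stake.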
To compare different reward functions, we define a metric called equitability. Consider the stochastic dynamics of the fractional stake of a party $A$ that starts with a fraction $v_A(0)$ of the initial total stake $S(0) = 1$. We denote the fractional stake at time $t$ by $v_{A,r}(t)$, to make the dependence on the reward function $r$ explicit.
One desirable property of a PoS block reward function is that each node’s fractional stake should remain constant over time in expectation. If $A$ contributes 10% of the proposal stake pool at the beginning of the time period, then $A$ should reap 10% of the total disseminated rewards on average. Since randomness in proposer elections is essential to current PoS systems, this cannot be ensured deterministically. Hence, a straw-man metric for quantifying fairness is the expected fractional stake at time $T$. This metric turns out to be meaningless because most PoS systems elect a proposer (in Eq. (1)) with probability proportional to the fractional stake; this approach ensures that each party’s expected fractional reward is equal to its initial stake fraction, regardless of block reward function. Formally, for all $t \in [T]$,
$$ \mathbb{E}[v_{A,r}(t)] = v_A(0). $$
This follows from the law of total expectation and the fact that
$$ \mathbb{E}[v_{A,r}(t) \mid v_{A,r}(t-1)] = v_{A,r}(t-1). $$
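For completeness, the one-step conditional expectation can be checked by hand. The short derivation below is a sketch in our own shorthand, writing $v(t)$ for the party's fractional stake, $S_A(t)$ for its stake, and $S(t)$ for the total stake, and assuming the proposer is elected with probability $v(t-1)$ and the reward $r(t)$ is added to both the winner's stake and the total stake.

```latex
\begin{align*}
\mathbb{E}[v(t) \mid v(t-1)]
  &= v(t-1)\,\frac{S_A(t-1) + r(t)}{S(t)}
     + \bigl(1 - v(t-1)\bigr)\,\frac{S_A(t-1)}{S(t)} \\
  &= \frac{S_A(t-1) + r(t)\,v(t-1)}{S(t)}
   = \frac{v(t-1)\,\bigl(S(t-1) + r(t)\bigr)}{S(t)}
   = v(t-1),
\end{align*}
```

where the third equality uses $S_A(t-1) = v(t-1)\,S(t-1)$ and the last uses $S(t) = S(t-1) + r(t)$. The fractional stake is therefore a martingale for any choice of rewards.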
Although all reward functions yield the same expected fractional stake, the choice of reward function can nonetheless dramatically change the distribution of the final stake, as seen in Figure 1. We therefore instead propose using the variance of the final fractional stake, $\mathrm{Var}(v_{A,r}(T))$, as an equitability metric. Intuitively, smaller variance implies less uncertainty and therefore a higher level of equitability. We make this formal in the following definition.
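Definition 1. For a positive vector $\varepsilon \in \mathbb{R}^m$, we say a reward function $r$ over $T$ time steps is $\varepsilon$-equitable for $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_m)$ where $\varepsilon_i > 0$, if
$$ \frac{\mathrm{Var}(v_{A_i,r}(T))}{v_{A_i}(0)\,(1 - v_{A_i}(0))} \le \varepsilon_i \qquad (5) $$
for all $i \in [m]$.
For two reward functions $r_1$ and $r_2$ with the same total reward, $\sum_{t=1}^{T} r_1(t) = \sum_{t=1}^{T} r_2(t)$, we say $r_1$ is more equitable than $r_2$ for player $A_i$ if
$$ \mathrm{Var}(v_{A_i,r_1}(T)) \le \mathrm{Var}(v_{A_i,r_2}(T)) $$
when both random processes start with the same initial fraction $v_{A_i}(0)$ for each party.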
The normalization in Eq. (5) ensures the left-hand side is at most one, as we show in Remark 1. It also cancels out the dependence on the initial fraction such that the left-hand side only depends on the reward function and the time , as shown in Lemma 1.
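Remark 1. When starting with an initial fractional stake $v_A(0)$, the maximum achievable variance is
$$ \sup_{T,\,r}\ \mathrm{Var}(v_{A,r}(T)) = v_A(0)\,(1 - v_A(0)), $$
where the supremum is taken over all positive integers $T$ and reward functions $r$.
We first prove the converse, $\mathrm{Var}(v_{A,r}(T)) \le v_A(0)(1 - v_A(0))$ for all $T$ and $r$. This follows from the fact that $\mathbb{E}[v_{A,r}(T)] = v_A(0)$, and $v_{A,r}(T)$ is bounded below by zero and above by one. Maximum variance is achieved when all probability mass is concentrated on the boundary points zero and one.
We prove achievability by constructing a simple constant reward function whose total reward $R$ increases superlinearly in $T$. From the variance computation of a constant reward function in Eq. (22), it follows that the normalized variance $\mathrm{Var}(v_{A,r}(T)) / \bigl(v_A(0)(1 - v_A(0))\bigr)$ tends to one as $T \to \infty$.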
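Lemma 1. Let $S(t) = 1 + \sum_{i=1}^{t} r(i)$ denote the total stake at time $t$. Then
$$ \frac{\mathrm{Var}(v_{A,r}(T))}{v_A(0)\,(1 - v_A(0))} = 1 - \prod_{t=1}^{T}\left(1 - \frac{r(t)^2}{S(t)^2}\right). $$
(Proof in Appendix 0.A)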
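The normalized variance can be evaluated numerically for any reward schedule. The sketch below assumes the product form $1 - \prod_{t}(1 - r(t)^2/S(t)^2)$ with $S(t) = 1 + \sum_{i \le t} r(i)$, which is what the one-step recursion of the model yields; the function name `normalized_variance` is illustrative. For a constant schedule the product telescopes to $R^2/((T+R)(1+R))$, which the example uses as a cross-check.

```python
def normalized_variance(rewards):
    """Variance of the final fractional stake divided by v0 * (1 - v0).

    Assumes initial total stake 1 and that each reward r(t) is added to
    the total stake S(t) before the ratio r(t) / S(t) is taken.
    """
    total = 1.0
    product = 1.0
    for r in rewards:
        total += r
        product *= 1.0 - (r / total) ** 2
    return 1.0 - product


if __name__ == "__main__":
    R, T = 3.0, 40   # hypothetical values chosen only for illustration
    constant = [R / T] * T
    growth = (1.0 + R) ** (1.0 / T)
    geometric = [growth ** t - growth ** (t - 1) for t in range(1, T + 1)]

    closed_form = R ** 2 / ((T + R) * (1.0 + R))
    print("constant :", normalized_variance(constant), "closed form:", closed_form)
    print("geometric:", normalized_variance(geometric))
```

Because the result does not depend on the party's initial fraction, a single number summarizes the equitability of a schedule for every party at once.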
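If a reward function $r$ over $T$ time steps is $\varepsilon$-equitable for a vector $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_m)$ where $\varepsilon_i > 0$, then $r$ is also $\varepsilon_{\min}\mathbf{1}$-equitable, where
$$ \varepsilon_{\min} := \min_{i \in [m]} \varepsilon_i, $$
with $\mathbf{1}$ denoting the vector of all ones. This holds because, by Lemma 1, the left-hand side of Eq. (5) is the same for every party.
As such, the remainder of this paper will study equitability from the perspective of a single (arbitrary) party $A$. We will also describe reward functions as $\varepsilon$-equitable as shorthand for $\varepsilon\mathbf{1}$-equitable, where $\varepsilon > 0$.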
Note that even if the total reward is fixed, equitability can differ dramatically across reward functions. In the example of Figure 1, the constant reward function is -equitable. On the other hand, the geometric reward function of Eq. (3) has smaller chance of losing all its fractional stake (i.e. close to zero) or taking over the whole stake (i.e. close to one). It is -equitable in this example.
3 Equitability under Honest Behavior
In this section, we analyze the equitability of different block reward functions, assuming that every party is honest, i.e. follows protocol, and the PoS system is closed, so no stake is removed or added to the proposal stake pool over a fixed time period $T$. Each party’s stake changes only because of the block rewards it earns and compounding effects. We discuss the effects of strategic behavior in Section 5, and open systems in Section 6.1.
The metric of equitability leads to a core optimization problem for PoS system designers: given a fixed total reward $R$ to be dispensed, how do we distribute it over the time $T$ to achieve the highest equitability? Perhaps surprisingly, we show that this optimization has a simple, closed-form solution.
Theorem 1. For all $R > 0$ and $T \in \mathbb{Z}^{+}$, the geometric reward $r_g$ defined in (3) is the most equitable among functions that dispense $R$ tokens over time $T$, jointly over all parties $A_i$, for $i \in [m]$.
Intuitively, geometric rewards optimize equitability because they dispense small rewards in the beginning when the stake pool is small, so a single block reward cannot substantially change the stake distribution. The rewards subsequently grow proportionally to the size of the total stake pool, so the effect of a single block remains bounded throughout the time period. We emphasize that the geometric reward function only depends on $S(0)$, $R$, and $T$, and in particular does not depend on how the initial stake is distributed among the participating parties. Hence, it is universally most equitable for all parties in the system simultaneously.
3.1 Proof of Theorem 1
for all such that and for all . To this end, we prove that is a unique optimal solution to the following optimization problem:
Using Lemma 1, we have an explicit expression for . After some affine transformation and taking the logarithmic function of the objective, we get an equivalent optimization of
This is a concave maximization on a (rescaled) simplex. Writing out the KKT conditions with KKT multipliers and , we get :
Among these solutions, we show that is the unique optimal solution, where is a vector of all ones. Consider a solution of the KKT conditions that is not . Then, we can strictly improve the objective by the following operation. Let denote two coordinates such that and . Then, we can create by mixing and , such that for all and . We claim that achieves a smaller objective function as . This follows from Jensen’s inequality and strict concavity of the objective function. Hence, is the only fixed point of the KKT conditions that cannot be improved upon.
In terms of the reward function, this translates into and .
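One way to see where a concave program of this kind comes from is the following change of variables. This is a sketch in our own notation, assuming the normalized variance has the product form $1 - \prod_t (1 - r(t)^2/S(t)^2)$ with $S(0) = 1$ and $S(T) = 1 + R$; the variable $x_t$ is introduced here for illustration and may differ from the parametrization used in the paper.

```latex
\begin{align*}
1 - \frac{r(t)^2}{S(t)^2}
  &= \frac{S(t-1)}{S(t)}\Bigl(2 - \frac{S(t-1)}{S(t)}\Bigr)
   = e^{-x_t}\bigl(2 - e^{-x_t}\bigr),
  \qquad x_t := \log\frac{S(t)}{S(t-1)} \ge 0, \\
\sum_{t=1}^{T} x_t
  &= \log\frac{S(T)}{S(0)} = \log(1+R), \\
\log \prod_{t=1}^{T}\Bigl(1 - \frac{r(t)^2}{S(t)^2}\Bigr)
  &= -\log(1+R) + \sum_{t=1}^{T} \log\bigl(2 - e^{-x_t}\bigr).
\end{align*}
```

Minimizing the variance is then equivalent to maximizing $\sum_t \log(2 - e^{-x_t})$ over a rescaled simplex. Each summand is strictly concave in $x_t$ (its derivative $1/(2e^{x_t} - 1)$ is strictly decreasing), so by Jensen's inequality the maximum is attained when all $x_t$ equal $\log(1+R)/T$, that is, when the total stake grows by the same factor $(1+R)^{1/T}$ in every slot.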
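The geometric reward function does not only optimize equitability for a single time interval. Consider a sequence of checkpoints $\{(T_k, R_k)\}_{k=1}^{K}$, where $T_k$ is increasing in $k$, and $R_k$ denotes the amount of reward to be disbursed between time $T_{k-1}+1$ and $T_k$ (inclusive). These checkpoints could represent target inflation rates on a monthly or yearly basis, for instance. A natural question is how to choose a block reward function that optimizes equitability over all the checkpoints jointly. The solution is to iteratively and independently apply geometric rewards over each time interval, giving a block reward function like the one shown in Figure 3.
Theorem 2. Consider a sequence of checkpoints $\{(T_k, R_k)\}_{k=1}^{K}$. Let $T_0 := 0$, $\tilde{R}_0 := 0$, and $\tilde{R}_k := \sum_{j=1}^{k} R_j$. The most equitable reward function is
$$ r(t) = (1 + \tilde{R}_{k-1}) \left[ \left(\frac{1 + \tilde{R}_k}{1 + \tilde{R}_{k-1}}\right)^{\frac{t - T_{k-1}}{T_k - T_{k-1}}} - \left(\frac{1 + \tilde{R}_k}{1 + \tilde{R}_{k-1}}\right)^{\frac{t - 1 - T_{k-1}}{T_k - T_{k-1}}} \right] $$
for $t \in \{T_{k-1}+1, \ldots, T_k\}$ and $k \in [K]$. (Proof in Appendix 0.B)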
Notice that when there is only one checkpoint, Theorem 2 simplifies to Theorem 1. This result implies that checkpoints can be chosen adaptively, i.e., they do not need to be fixed upfront to optimize equitability. One practical concern with concatenating checkpoints in this manner is that the change in block rewards before and after a checkpoint can be dramatic, as seen in Figure 3. This could cause other problems, such as proposers leaving the system. Hence a PoS system need not choose its block reward function based on equitability alone; it could also consider smoothness and/or monotonicity constraints, for instance. We leave such investigations to future work. Because of composition, we assume a single checkpoint for the remainder of this paper.
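The composition across checkpoints can be sketched in a few lines of Python. The sketch assumes each checkpoint is given as a pair (block index, reward to dispense since the previous checkpoint) and that geometric rewards are applied independently within each interval, starting from whatever total stake has accumulated so far. The function name and the input format are illustrative.

```python
def checkpointed_geometric_rewards(checkpoints):
    """Build a block reward schedule that is geometric within each interval.

    checkpoints -- list of (T_k, R_k) pairs with strictly increasing T_k;
                   R_k is dispensed over blocks T_{k-1}+1, ..., T_k.
    Returns a list with one reward per block, assuming initial total stake 1.
    """
    rewards = []
    stake = 1.0
    previous_checkpoint = 0
    for block_index, interval_reward in checkpoints:
        n_blocks = block_index - previous_checkpoint
        if n_blocks <= 0:
            raise ValueError("checkpoint block indices must strictly increase")
        # Per-block growth factor that turns `stake` into
        # `stake + interval_reward` after n_blocks blocks.
        growth = (1.0 + interval_reward / stake) ** (1.0 / n_blocks)
        for _ in range(n_blocks):
            r = stake * (growth - 1.0)
            rewards.append(r)
            stake += r
        previous_checkpoint = block_index
    return rewards


if __name__ == "__main__":
    # Hypothetical checkpoints: 1.0 token over blocks 1-10, then 0.5 over 11-30.
    schedule = checkpointed_geometric_rewards([(10, 1.0), (30, 0.5)])
    print(round(sum(schedule[:10]), 6), round(sum(schedule[10:]), 6))
```

Because each interval only needs the current total stake and its own target, later checkpoints can be appended without recomputing earlier rewards, which matches the observation that checkpoints may be chosen adaptively.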
3.3 Equitability of Stake Pools
The participants have the freedom to form stake pools, as explored in [26, 9, 5]. We show in this section that stake pool formation reduces the variance of the fractional stake of all the members of the pool, and characterize exactly how much one gains. Consider a single party $A$ which owns a fraction $v_A(0)$ of the stake at time $t = 0$. We know from Lemma 1 that the variance at time $T$ is
$$ \mathrm{Var}(v_{A,r}(T)) = v_A(0)\,(1 - v_A(0)) \left(1 - \prod_{t=1}^{T}\left(1 - \frac{r(t)^2}{S(t)^2}\right)\right), $$
where $S(t) = 1 + \sum_{i=1}^{t} r(i)$ is the total stake at time $t$.
Consider a case where the same party now participates in a stake pool, where the pool has a fraction $\sigma \ge v_A(0)$ of the initial stake (including the contribution from party $A$), and every time the stake pool is awarded a reward for block proposal, the reward is shared among the participants of the pool in proportion to their stakes. The stake of party $A$ under this pooling is denoted by $\tilde{v}_{A,r}(t)$, and it follows from Lemma 1 immediately that
$$ \mathrm{Var}(\tilde{v}_{A,r}(T)) = \frac{v_A(0)^2}{\sigma^2}\,\sigma(1 - \sigma) \left(1 - \prod_{t=1}^{T}\left(1 - \frac{r(t)^2}{S(t)^2}\right)\right). \qquad (17) $$
By joining a stake pool of size $\sigma$, party $A$’s variance shrinks to a fraction $\frac{v_A(0)\,(1 - \sigma)}{\sigma\,(1 - v_A(0))}$ of its solo value. For example, if everyone in the system forms a single pool ($\sigma = 1$), then there is no randomness left and the variance is zero. Note that the variance is monotonically decreasing in the pool size $\sigma$. In practice, stake pools can organically form as long as this gain in equitability exceeds the cost involved in forming such stake pools. Applying Definition 1 to a single party $A$, the equitability of the party improves by the same factor, in the sense that an $\varepsilon$-equitable party will achieve $\frac{v_A(0)(1 - \sigma)}{\sigma(1 - v_A(0))}\,\varepsilon$-equitability by forming a stake pool. Further, the geometric reward function is still the most equitable reward function under the more general setting where the proposers are free to form stake pools. This follows from the fact that the effect of pooling is isolated from the effect of the choice of the reward function in Eq. (17).
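The gain from pooling can be illustrated with a short calculation. The sketch below rests on two assumptions that are ours: the pool as a whole behaves like a single party whose initial fraction is the pool size, and a member's stake is a fixed share of the pool's stake. Under those assumptions the ratio of pooled to solo variance does not depend on the reward schedule. The function name `pooled_variance_ratio` and the symbol `sigma` for the pool's initial fraction are illustrative.

```python
def pooled_variance_ratio(v0, sigma):
    """Ratio of a member's variance inside a pool to its variance when solo.

    v0    -- the member's own initial fraction of the total stake
    sigma -- the pool's initial fraction of the total stake (sigma >= v0)

    Solo variance is proportional to v0 * (1 - v0). Inside the pool the
    member holds a share v0 / sigma of a stake whose variance is
    proportional to sigma * (1 - sigma), giving
    (v0 / sigma) ** 2 * sigma * (1 - sigma).
    """
    if not (0.0 < v0 < 1.0 and v0 <= sigma <= 1.0):
        raise ValueError("need 0 < v0 < 1 and v0 <= sigma <= 1")
    pooled = (v0 / sigma) ** 2 * sigma * (1.0 - sigma)
    solo = v0 * (1.0 - v0)
    return pooled / solo


if __name__ == "__main__":
    for sigma in (0.1, 0.25, 0.5, 1.0):
        print(f"pool size {sigma:.2f}: variance ratio "
              f"{pooled_variance_ratio(0.1, sigma):.4f}")
```

The ratio equals one when the pool consists of the member alone and drops to zero when the pool contains all of the stake, consistent with the two extreme cases discussed above.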
4 Practical Parameter Selection
The equitability of a system is determined by four factors: the number of block proposals , choice of reward function , initial stake of a party , and the total reward . We previously saw that geometric rewards optimize equitability over choices of the reward function; in this section, we study the dependence of equitability on , , and . Recall that without loss of generality, we normalized the initial stake to be one. For general choices of , the total reward should be rescaled by . The evolution of the fractional stakes is exactly the same for one system with and and another with and . Although these parameters may be chosen according to external considerations (e.g. interest rates, proposer incentives), we assume in this section that the system designer is free to choose the total reward , either by setting the initial stake size and/or by setting the total reward during . We study how equitability trades off with the total reward for different choices of the reward function. Concretely, we consider a scenario where and are fixed and is a large enough integer, and ask how many tokens we can dispense while maintaining a desired level of equitability .
4.1 Geometric reward function
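For $r_g$, we have $r_g(t)/S(t) = 1 - (1+R)^{-1/T}$ for all $t$, where $S(t)$ is the total stake at time $t$. It follows from Lemma 1 that
$$ \frac{\mathrm{Var}(v_{A,r_g}(T))}{v_A(0)\,(1 - v_A(0))} = 1 - \left(1 - \bigl(1 - (1+R)^{-1/T}\bigr)^{2}\right)^{T}. \qquad (18) $$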
When is fixed and we increase , we can distribute small amounts of rewards across and achieve vanishing variance. On the other hand, if increases much faster than , then we are giving out increasing amounts of rewards per time slot and the uncertainty grows. This follows from the above variance formula, which we make precise in the following.
For a closed PoS system with a total reward chosen as a function of and a geometric reward function , it is sufficient and necessary to set
in order to ensure -equitability asymptotically, i.e. to ensure that
This follows from substituting the choice of in the variance in Eq. (18), which gives
The limiting variance is monotonically non-decreasing in and non-increasing in , as expected from our intuition. For example, if is fixed, one can have the initial stake as small as and still achieve a vanishing variance. As the geometric reward function achieves the smallest variance (Theorem 1), the above is the largest reward that can be dispensed while achieving a desired normalized variance of in time (with initial stake of one). This scales as . We need more initial stake or less total reward, if we choose to use other reward functions.
4.2 Constant reward function
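For the constant reward function $r_c(t) = R/T$, the total stake is $S(t) = 1 + tR/T$, and the product in Lemma 1 telescopes to give
$$ \frac{\mathrm{Var}(v_{A,r_c}(T))}{v_A(0)\,(1 - v_A(0))} = \frac{R^2}{(T+R)(1+R)}. \qquad (22) $$
Again, this is monotonically non-decreasing in $R$ and non-increasing in $T$, as expected. The following condition immediately follows from Eq. (22).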
For a closed PoS system with a total reward chosen as a function of and a constant reward function , it is sufficient and necessary to set
in order to ensure -equitability asymptotically as grows.
By choosing a constant reward function, the cost we pay is in the size of the total reward, which can now only increase as . Compared to of the geometric reward, there is a significant gap. Similarly, in terms of how small initial stake can be with fixed total reward , constant reward requires at least . This trend gets even more extreme, for a decreasing reward function.
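The gap between the two reward budgets can be made concrete with a short numerical sketch. It rests on assumptions that are not quoted from the text: that the normalized variance takes the time-dependent Pólya urn form $1 - \prod_{n=1}^{T} (1 - (r(n)/S(n))^2)$, where $S(n)$ is the total stake after block $n$ and the initial stake is one, and that geometric rewards keep $r(n)/S(n)$ constant. Under those assumptions the variance has a closed form for both reward functions and can be inverted for the largest admissible $R$. All function names are hypothetical.

```python
import math


def variance_geometric(total_reward, num_blocks):
    """Normalized variance under geometric rewards (assumed closed form)."""
    ratio = 1.0 - (1.0 + total_reward) ** (-1.0 / num_blocks)  # r(n)/S(n)
    return 1.0 - (1.0 - ratio ** 2) ** num_blocks


def variance_constant(total_reward, num_blocks):
    """Normalized variance of a classical Polya urn with reward R/T per block."""
    return total_reward ** 2 / ((num_blocks + total_reward) * (1.0 + total_reward))


def max_reward_geometric(num_blocks, epsilon):
    """Largest R with variance_geometric(R, T) equal to epsilon."""
    # Solve 1 - (1 - x^2)^T = epsilon for x = r(n)/S(n), then invert for R.
    x = math.sqrt(-math.expm1(math.log1p(-epsilon) / num_blocks))
    return math.exp(-num_blocks * math.log1p(-x)) - 1.0


def max_reward_constant(num_blocks, epsilon):
    """Largest R with variance_constant(R, T) equal to epsilon."""
    # Positive root of (1 - eps) R^2 - eps (T + 1) R - eps T = 0.
    a = 1.0 - epsilon
    b = -epsilon * (num_blocks + 1)
    c = -epsilon * num_blocks
    return (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)


for blocks in (100, 1000, 10000):
    print(blocks, max_reward_geometric(blocks, 0.1), max_reward_constant(blocks, 0.1))
```

In this sketch the constant-reward budget grows roughly linearly in the number of blocks, while the geometric budget grows much faster, which is the qualitative gap described above.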
4.3 Decreasing reward function
Some cryptocurrencies use continuously decreasing reward functions. For instance, Monero dispenses block rewards as per
for some constants , , , and . In practice, , , and . Monero itself is not a PoS cryptocurrency, but if this decreasing reward were applied to the PoS setting, it would have even higher variance than constant rewards. Consider a simpler choice of the constants such that
for all , for a choice of and . Recall that . As we assume , it follows after some calculations that and . It follows from Lemma 1 that
4.4 Comparison of Reward Functions
For a choice of and , Figure 5 illustrates the normalized variance for the three reward functions as a function of , the number of blocks over which the reward is dispensed. As expected, variance decays with and geometric rewards exhibit the lowest normalized variance. Similarly, for a fixed desired (normalized) variance level of , Figure 5 shows how much the total reward can grow as a function of time . Notice that under constant rewards, the reward allocation grows linearly in , whereas geometric rewards grow subexponentially fast while still satisfying the same equitability constraint.
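A comparison of this kind can be reproduced for arbitrary reward schedules with a few lines of code. The sketch below is an illustration under stated assumptions rather than the computation behind Figure 5: it assumes the normalized variance has the product form $1 - \prod_{n} (1 - (r(n)/S(n))^2)$ with initial stake one, and the exponentially decaying schedule (with a hypothetical decay factor of 0.95) is a stand-in for a decreasing reward function, not the Monero schedule.

```python
import math


def normalized_variance(rewards, initial_stake=1.0):
    """Evaluate 1 - prod_n (1 - (r(n)/S(n))^2) for a reward schedule."""
    total = initial_stake
    log_product = 0.0
    for reward in rewards:
        total += reward  # S(n): total stake after block n
        log_product += math.log1p(-(reward / total) ** 2)
    return -math.expm1(log_product)


def geometric_schedule(total_reward, num_blocks):
    """Rewards chosen so that the total stake grows by a fixed factor per block."""
    growth = (1.0 + total_reward) ** (1.0 / num_blocks)
    return [growth ** n - growth ** (n - 1) for n in range(1, num_blocks + 1)]


def constant_schedule(total_reward, num_blocks):
    return [total_reward / num_blocks] * num_blocks


def decaying_schedule(total_reward, num_blocks, decay=0.95):
    """Hypothetical decreasing schedule: r(n) proportional to decay**n."""
    weights = [decay ** n for n in range(1, num_blocks + 1)]
    scale = total_reward / sum(weights)
    return [scale * w for w in weights]


R, T = 10.0, 100
for name, schedule in [
    ("geometric", geometric_schedule(R, T)),
    ("constant", constant_schedule(R, T)),
    ("decaying", decaying_schedule(R, T)),
]:
    print(name, normalized_variance(schedule))
```

Each schedule dispenses the same total reward; only the timing differs, and front-loading the rewards raises the per-block ratio $r(n)/S(n)$ early on, when the stake pool is smallest.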
These observations add nuance to the ongoing conversation about how to initialize cryptocurrency tokens that are not considered securities from a regulatory perspective. In the 2018 class action lawsuit of Coffey vs. Ripple [27], one of the primary complaints against Ripple was the fact that “all 100 billion of the XRP in existence were created out of thin air by Ripple Labs at its inception.” Our results suggest that in a PoS system, a large initial stake pool can actually help to ensure equitability.
5 Strategic Behavior
In reality, proposers can be strategic to maximize their rewards. The most well-known strategic attack is selfish mining, proposed by Eyal and Sirer in the context of proof-of-work [9], and extended in [25, 20]. In selfish mining, adversarial miners who discover blocks do not immediately publish them; rather, they build a private side-chain of blocks. By eventually releasing a side chain that is longer than the main chain, the adversary can override blocks that were mined by the honest party. This has two effects: first, it gives the adversary a greater fraction of blocks in the main chain (and hence, block rewards) than they would get by mining honestly. Second, it forces honest parties to waste effort mining blocks that have a low chance of being accepted in the long term. Although selfish mining refers to a specific strategy that was designed for a PoW system, the concept of building side chains for increased profit can be applied to PoS systems as well. In this section, we show that such strategic attacks are exacerbated by the compounding effects of PoS. Contrary to the scenario where everyone behaves honestly, we empirically show that geometric reward functions do not mitigate the effects of compounding when strategic actors are present.
We restrict ourselves in this section to two parties: $A$, which is adversarial, and $H$, which is honest. Note that this is without loss of generality: $H$ can represent the collective set of multiple honest parties, since their behavior is independent of how many parties are involved in $H$. The adversarial party $A$ can also represent the collective set of multiple adversarial parties, as a single adversary corresponds to the worst case in which all adversaries collude. Throughout this section, we use the terms adversarial and strategic interchangeably, to refer to the party that strategically deviates from honest behavior.
Since $A$ does not always publish its blocks according to schedule, we distinguish the notion of a block slot (indexed by $n$) and wall-clock time (indexed by $t$). It will still be the case that each block slot has a single leader (in practice, this is determined by a distributed protocol), and a new block slot leader is elected at every tick of the wall clock (i.e., at a given time $t$, the leader of slot $n$ is only defined for $n \leq t$). However, due to strategic behavior (i.e., the adversary can withhold its own blocks and override honest ones), it can happen that no block occupies slot $n$, even at a time $t \geq n$; moreover, the occupancy of block slot $n$ can change over time. Thus, unlike our previous setting, if we wait $T$ time slots, the resulting chain may have fewer than $T$ blocks. This is consistent with the adversarial model considered in PoS systems (like Ouroboros [14]) that elect a single leader per block slot. Other PoS systems, like PoSv3 [8], choose an independent leader to succeed each block; such a PoS model can lead to even worse attacks, which we do not consider in this work.
The honest party and the adversary have two different views of the blockchain, illustrated in Figure 6. Both honest and adversarial parties see the main chain ; we let denote the block (i.e., leader) of the th slot, as perceived by the honest nodes at time . If a block slot does not have an associated block at time (either because the th block was withheld or overridden, or because ), we say that . Notice that due to adversarial manipulations, it is possible for and , and vice versa.
In addition to the main chain, the adversary maintains arbitrarily many private side chains, , where denotes the number of side chains. The blocks in each side chain must respect the global leader sequence . An adversary can choose at any time to publish a side chain, but we also assume that the adversary’s attacks are covert: it never publishes a side chain that conclusively proves that it is keeping side chains. For example, if the main chain contains a block created by the adversary for block slot , the adversary will never publish a side chain containing block , where is also associated with block slot .
Each side chain with overlaps with the honest chain in at least one block (the genesis block), and may diverge from the main chain after some (Figure 6). That is,
Different side chains can also share blocks; in reality, the union of side chains is a tree. However, for simplicity of notation, we consider each path from the genesis block to a leaf of this forest as a separate side chain, instead of considering side trees. We use and to denote the chain length of and , respectively, at time :
and we use the heights and to denote the block indices of the th and th blocks, respectively:
If , then the adversary is building its th side chain from the tip of the current main chain.
5.1.1 State space.
The state space for the system consists of three pieces of data:
The current time
The main chain
The set of all side chains
Notice in particular that the set of side chains grows exponentially in . In practice, most systems prevent the main chain from being overtaken by a longer side chain that branches more than blocks prior to ; this is called a long-range attack. Hence we can upper bound the size of the side chain set by imposing the condition that for all , . Regardless, the size of the state space is considerably larger than it is in prior work on selfish mining in PoW , where the computational cost of creating a block forces the adversary to keep a single side chain.
5.1.2 Objective.
The adversary $A$'s goal is to maximize its fraction of the total stake in the main chain by the end of the experiment,
This objective is closely related to the metric of prior work , except for the finite time duration.
5.1.3 Strategy space.
The adversary has two primary mechanisms for achieving its objective: choosing where to append its blocks, and choosing when to release a side chain. If the honest party is elected at time , by the protocol, it always builds on the longest chain visible to it; since we assume small enough network latency, appends to block . However, if is elected at time , can append to any known block in . The system must allow such a behavior for robustness reasons: even an honest proposer may not have received a block or its predecessors due to network latency.
The adversary can also choose when to release blocks. In our model, $H$ always releases its block immediately when elected. However, an adversarial proposer elected at time $t$ can choose to release its block at any time $t' \geq t$; it can also choose not to release a given block at all. Late block announcements are tolerated because of network latency: it is impossible to distinguish a node that releases its blocks late from a node whose blocks arrive late because of a poor network connection.
Notice that if $A$ is elected at time $t$ and chooses to withhold its block, the system advances to time $t+1$ without appending $A$'s block to the main chain. This means that the next proposer is selected based on stake ratios that do not include the withheld block reward. So the adversary may have incurred a selfish mining gain from withholding its block, but it lost the opportunity to compound the block reward. This tradeoff is the main difference between our analysis and prior work on selfish mining attacks in PoW systems.
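The cost side of this tradeoff is easy to quantify for a single step. The sketch below is illustrative only: the function name and the numbers (stakes of 0.3 and 0.7, a block reward of 0.1) are hypothetical, and it models just the immediate effect on the next election, not the later gain from a successful override.

```python
def next_election_probability(adversary_stake, honest_stake, reward, publish):
    """Probability that the adversary is elected in the next slot.

    If the adversary publishes its block, the reward is added to its stake
    on the main chain; if it withholds, the main-chain stakes are unchanged.
    """
    if publish:
        adversary_stake += reward
    return adversary_stake / (adversary_stake + honest_stake)


p_publish = next_election_probability(0.3, 0.7, 0.1, publish=True)
p_withhold = next_election_probability(0.3, 0.7, 0.1, publish=False)
print(p_publish, p_withhold)
```

With these numbers, publishing raises the adversary's chance of winning the next slot from 0.3 to about 0.364, so withholding forgoes that increase in exchange for a possible later override.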
The adversary matches by choosing a side chain and releasing the first blocks. This means the released chain has the same height as the honest chain. In accordance with [9, 25], we assume that after a match, the honest chain will choose to build on the adversarial chain with probability , which captures how connected the adversarial party is to the rest of the nodes.
The adversary overrides by choosing a side chain and releasing the first blocks. The released chain becomes the new honest chain.
If the adversary chooses to wait, it does not publish anything, and continues to build on all of its side chains.
Unlike [9, 25], we do not explicitly include an action wherein the adversary adopts the main chain. Because our model allows the adversary to keep an unbounded number of side chains, adopting the main chain is always a suboptimal strategy; it forces the adversary to throw away chains that could eventually overtake the main chain. The primary nuance in the adversary’s strategy is choosing when to match or override (rather than waiting), and which side chain to choose. Identifying an optimal mining strategy through MDP solvers, as in prior work on PoW, is computationally intractable due to the substantially larger state space in this PoS problem. Hence, in the following sections, we will discuss specific strategies that can increase the adversary’s reward.
5.2 Strategic selfish mining
We show that the adversary can gain significantly by acting strategically, and this gain is exacerbated by the effect of compounding. Similar to selfish mining strategies originally introduced under PoW settings, an adversary can build side chains to potentially take over the main chain. The first critical difference is that an adversary can build arbitrarily many side chains branching from anywhere in the main chain without additional cost (other than the memory required to store those side chains). In PoW, this is prevented by the computational power required to create each additional side chain. The second difference is that the block rewards are withheld for those adversarial blocks that are held aside to build side chains. Under compounding, delaying the rewards of such side chains costs the adversary in the following proposer elections, as the adversary is that much less likely to be elected leader.
An adversary needs to devise a strategy that balances the gain from keeping a long side chain, and potentially overtaking a long main chain, against the loss in the intermediate leader elections due to the withheld rewards. We propose a family of strategic schemes that we call Match-Override-$k$ (MO-$k$). Under the MO-$k$ strategy, the adversary only keeps side chains of length at most $k$.
Concretely, the adversary uses the following strategy. Every time a new honest block is generated, this is appended to the main chain. At this point, the following actions are taken by the adversary. If there is a side-chain such that and , then the adversary matches with the side-chain with the smallest , in which case the main chain length does not change, and the side chain remains, and all other side chains are discarded. Otherwise, if there is no such chain to match, then the main chain remains as is, the adversary waits, and any side-chain such that are discarded, as those side-chains are too short and have little chance of taking over the main chain. This action is known as adopt in the original selfish mining strategy.
Every time a new adversarial block is generated, the adversary appends this block to every side-chain she is managing currently. She also starts a new side-chain branching from the top of the main chain, if there is not a side-chain at the top already. At this point, one of the following actions are taken by the adversary. If there is only one side-chain and it satisfies both and , then the adversary overrides with this , in which case the main chain increments by one adversarial block, and the side chain remains. Otherwise, the main chain remains as is, and the adversary waits.
In Figure 7, we can see how much the adversary can gain in expected fractional stake by using MO-$k$ strategies. As the total reward increases, the relative fractional stake approaches , which is the maximum achievable value as the expected fractional stake is normalized by . When the adversary is well connected to honest nodes, such that , such attacks are effective with short side chains, such as or . Further, there is no distinguishable difference between the reward functions used. On the other hand, when the adversary has an equal chance of matching honest chains, such as , it is more effective to keep longer side chains. Overall, the effect of strategic behavior is exacerbated by the compounding effects.
5.3 Upper bound
We assume a constant reward function where a reward of is dispensed to a proposer whose block is appended to the main chain. We begin with an upper bound on , the fraction of stake that can be achieved by the adversary.
To show our upper bound, we analyze a random process called always-match-1 (AM-1). AM-1 is an urn process with state
where as before, denotes the number of tokens held by party at time . and can be thought of as the honest and adversarial stake, respectively; compared to and , they evolve under different dynamics, which are described below. We let denote the fraction of the urn occupied by at time . At each tick of a discrete clock, the state is updated as follows:
Intuitively, if the honest wins a given draw, then the honest pool gains unit of reward. If the adversarial pool instead wins a given draw, it negates honest units, and adds units to the adversarial pool. The following theorem shows that AM-1 gives a universal upper bound on under any arbitrary strategic behavior by the adversary. We refer to Appendix 0.C for a proof.
Under the constant reward function, for any adversarial strategy resulting in a stake fraction time series , the AM-1 random process stochastically dominates , i.e. for all and any .
Figure 8 (left) shows that for small values of the total reward and when adversaries are well connected to the honest nodes (), the AM-1 upper bound is quite close to an achievable strategy of MO-4. The right panel shows that when the adversaries are less connected (), the strategic behavior takes over less stake. We next analyze an upper bound (inspired by AM-1), which reveals that a PoS system is less vulnerable to strategic attacks when the initial stake is larger.
5.4 Analytical upper bound
We introduce and analyze a new random process called always-match-2 (AM-2), which is an upper bound on AM-1, but has the merit that the expected fractional stake is tractable in a closed form.
Similar to AM-1, AM-2 is an urn process with state
where as before, denotes the number of tokens held by party at time . At each tick of a discrete clock, the state is updated as follows:
The addition of units of adversarial reward keeps the total change in urn size constant across time steps, which simplifies the analysis of this urn process. The following theorem shows that AM-2 gives an upper bound on the AM-1 process. We refer to Appendix 0.D for a proof.
Under the constant reward function, the AM-2 process stochastically dominates the AM-1 process.
We are interested in how much an adversary can gain by acting strategically. The above theorem provides a tool for characterizing an upper bound on any strategy, by analyzing AM-2. This is made formal in the following theorem. We refer to Appendix 0.H for a proof.
Let denote the fractional stake of the adversary under the AM-2 process, when the total initial stake is , initial fractional stake of the adversary is , and the total reward dispensed over time is . If , then
Under the assumption that is less than the stake of the honest party to ensure that the honest party’s stake does not vanish to zero, the gain of an adversarial strategy over an honest strategy is bounded by , where we used the fact that when everyone is honest the mean fractional stake remains over all . This implies that having a small initial stake relative to the total reward makes the system vulnerable to adversarial strategies. This justifies the common practice of starting a PoS system with a large initial stake.
Further, this analysis allows us to quantify the price of compounding under adversarial strategy. When there is no compounding effect, either under a PoW system or because the rewards are not automatically appended to the stake, an upper bound on adversarial strategy we consider in this paper has been analyzed in . Translating the bound into the same notations as in Theorem 5, we get that when there is no compounding, an adversary’s fractional stake is bounded by
with high probability. For large enough , with high probability. Compared to Eq. (29), when is large, compounding allows the adversary’s gain to grow linearly in whereas the adversary’s gain is a constant in with no compounding. This shows that strategic parties can gain significantly over honest parties, under PoS systems with compounding effects.
There are three main issues that relate to actually building a chain-based PoS system with geometric rewards. The first is how to choose the relevant parameters $T$ and $R$, which has been discussed at length in Section 4. The second is how to deal with changing stake fractions that arise due to user-initiated transactions, e.g., users selling their stake (discussed below in Section 6.1). The third is how to handle strategic behavior by block proposers in practice (discussed in Section 6.2).
6.1 Dynamic Proposer Stake
One challenge in the analysis of PoS systems is the fact that stake can move rapidly between parties, e.g. if nodes choose to sell their stake. Computing the objective function in the optimization of equitability is tedious when accounting for the dynamic addition and removal of stake, and it is not clear that geometric rewards are robust to rapid stake transactions. However, in practice, PoS systems often restrict the timescale over which stake can be added or removed, precisely to add robustness. For example, Casper FFG constrains users to keep their stake in a validation pool for at least months in order to participate [6, 24].
In our system, an analogous stability constraint would be to impose that stake ratios should not change during each time interval of $T$ blocks. If this constraint is met, then geometric rewards can be recalculated at each $T$-block interval to account for the dynamically changing stake pool. Theorem 2 implies that this strategy optimizes the overall equitability of the reward scheme, even if the stake transactions are not known a priori. Moreover, if we choose $T$ on the order of days as suggested in Section 4, this constraint is relatively mild from a user’s perspective. It is important to note that users need not explicitly deposit their funds into a common pot in order to enforce the proposed stability constraints. This can be enforced implicitly by programming the selection mechanism to only consider stake that has been associated with the same public key for some minimum time interval. Such a strategy has been suggested in several proposed PoS systems, including Ouroboros [14], Algorand [17], and Casper [6, 24].
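The implicit enforcement mechanism described above can be sketched as a filter on stake age. This is a hypothetical illustration, not the mechanism of any of the cited systems: the `StakeEntry` record, the field names, and the block-height-based age rule are all assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StakeEntry:
    """Hypothetical record of tokens held by one public key."""
    public_key: str
    amount: float
    held_since_block: int  # block height at which the key received the tokens


def eligible_stake(entries, current_block, min_age_blocks):
    """Sum, per public key, only the stake held for at least min_age_blocks."""
    totals = {}
    for entry in entries:
        if current_block - entry.held_since_block >= min_age_blocks:
            totals[entry.public_key] = totals.get(entry.public_key, 0.0) + entry.amount
    return totals


def selection_probabilities(entries, current_block, min_age_blocks):
    """Leader-election probabilities proportional to eligible stake."""
    totals = eligible_stake(entries, current_block, min_age_blocks)
    pool = sum(totals.values())
    if pool == 0:
        return {}
    return {key: amount / pool for key, amount in totals.items()}


entries = [
    StakeEntry("alice", 30.0, held_since_block=0),
    StakeEntry("bob", 70.0, held_since_block=0),
    StakeEntry("alice", 50.0, held_since_block=95),  # too recent to count
]
print(selection_probabilities(entries, current_block=100, min_age_blocks=10))
```

In this example the recently acquired 50 tokens are ignored, so the election probabilities stay at 0.3 and 0.7 until that stake has aged past the threshold.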
6.2 Control selfish mining
Strategic behavior is a significant concern in PoW cryptocurrencies [9, 20, 25], and even more so in PoS systems. In Section 5 we demonstrate the efficacy of a strategic attack through which a rational user can artificially boost her proportion of the block rewards. In a sense, the results from Section 5 are negative. Choosing a small reward (with respect to the initial fraction) at each time step does not fully solve the problem, and there may be economic reasons to give out larger block rewards within a given time period.
Ultimately, we expect that this problem cannot be solved solely by changing the block reward function. Rather, it may be more effective to control the effects of strategic behavior than to identify a scheme under which strategic behavior is equivalent to honest behavior. For instance, the proposer selection protocol could choose only proposers whose fraction of proposed blocks in the last blocks is commensurate with the proposer’s stake (within some statistical error). Such a policy would detect nodes who produce more than their fair share of blocks, and limit their ability to propose more blocks.
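One way to read the proposed policy is as a one-sided statistical test on each proposer's recent block count. The sketch below is an assumption-laden illustration rather than a specified protocol rule: it treats the number of blocks proposed in a window as binomial with a fixed stake fraction (ignoring the drift in stake caused by compounding within the window), and the function name and the z-score threshold of 3 are hypothetical.

```python
import math


def is_within_fair_share(blocks_proposed, window, stake_fraction, z_threshold=3.0):
    """Check whether a proposer's recent block count is consistent with its stake.

    Models the count as Binomial(window, stake_fraction) and flags proposers
    whose count exceeds the mean by more than z_threshold standard deviations.
    """
    expected = window * stake_fraction
    std_dev = math.sqrt(window * stake_fraction * (1.0 - stake_fraction))
    if std_dev == 0.0:
        return blocks_proposed <= expected
    z_score = (blocks_proposed - expected) / std_dev
    return z_score <= z_threshold


# A proposer with 30% of the stake over a window of 1000 blocks.
print(is_within_fair_share(320, 1000, 0.3))  # close to the expected 300
print(is_within_fair_share(400, 1000, 0.3))  # far above the expected 300
```

The test is one-sided because only over-production is relevant here; a proposer who produces fewer blocks than expected gains nothing from doing so.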
In this work, we study the effects of compounding and the choice of block reward function on the concentration of wealth in PoS cryptocurrencies. We measure this concentration of wealth through a proposed metric called equitability, which captures the (normalized) variance of parties’ stake distributions after a fixed epoch of $T$ blocks. We show that existing block reward functions (such as constant and decreasing rewards) have poor equitability. We introduce a new reward function, which we call geometric rewards, and prove that this is the most equitable block reward function. The negative effects of compounding, i.e., the unfair distribution of wealth, can be further mitigated by choosing initial system parameters judiciously: that is, by ensuring that the total block rewards disseminated in each epoch are small compared to the initial stake pool size.
Several open questions remain. First, our results assume that proposers do not add or remove stake in the middle of an epoch. Such stake dynamics are likely to affect the optimality of geometric rewards and complicate the computation of equitability. Although we can disallow the addition or removal of stake on short timescales (e.g., a day), systems that choose epochs on the order of years will need to deal with dynamic stake pools.
Another challenge, which we discuss in Section 6, is that geometric rewards may not be desirable in practice because of the sharp changes in block rewards between epochs. A natural solution is to impose smoothness or monotonicity constraints on the class of reward functions. Solving such an optimization is an interesting direction for future work.
Finally, a substantial open problem is that of protecting against strategic players. Although strategic players are not specific to PoS systems or compounding, we show here that geometric rewards alone do not protect against strategic players. Designing incentive-compatible consensus protocols for strategic players is a major question in blockchain systems. Some papers that make progress on this front include Fruitchains [21] and Ouroboros Hydra; both works propose reward and consensus mechanisms for which honest behavior is shown to be an approximate Nash equilibrium. As discussed in Section 1.1, the algorithms of these papers may inherently improve equitability by spreading block rewards over multiple parties. Formally analyzing those protocols through the lens of equitability is another direction for future research.
The authors gratefully acknowledge support from the Distributed Technologies Research Foundation, the National Science Foundation under grant CCF 1705007, and the Army Research Office under grant W911NF1810332.
- [1] Bitcoin energy consumption index, 2018. https://digiconomist.net/BITCOIN-ENERGY-CONSUMPTION.
- [2] Controlled supply. bitcoinwiki, 2018. https://en.bitcoin.it/wiki/Controlled_supply#cite_note-2.
- [3] Mining. Ethereum Wiki, 2018. https://github.com/ethereum/wiki/wiki/Mining.
- [4] Anderson, L., Holz, R., Ponomarev, A., Rimba, P., and Weber, I. New kids on the block: an analysis of modern blockchains. arXiv preprint arXiv:1606.06530 (2016).
- [5] Brünjes, L., Kiayias, A., Koutsoupias, E., and Stouka, A.-P. Reward sharing schemes for stake pools. arXiv preprint arXiv:1807.11218 (2018).
- [6] Buterin, V., and Griffith, V. Casper the friendly finality gadget. arXiv preprint arXiv:1710.09437 (2017).
- [7] Duffield, E., and Diaz, D. Dash: A privacy-centric cryptocurrency. Self-published (2015).
- [8] Earls, J. The missing explanation of proof of stake version 3, 2017. http://earlz.net/view/2017/07/27/1904/the-missing-explanation-of-proof-of-stake-version.
- [9] Eyal, I., and Sirer, E. G. Majority is not enough: Bitcoin mining is vulnerable. Communications of the ACM 61, 7 (2018), 95–102.
- [10] Hopwood, D., Bowe, S., Hornby, T., and Wilcox, N. Zcash protocol specification. Tech. rep., 2016–1.10. Zerocoin Electric Coin Company, 2016.
- [11] Johnson, N. L., and Kotz, S. Urn models and their application: an approach to modern discrete probability theory, vol. 77. Wiley New York, 1977.
- [12] Kaiser, I. A decentralized private marketplace: Draft 0.1.
- [13] Kiayias, A. How does casper compare to ouroboros?, 2018. https://iohk.io/blog/how-does-casper-compare-to-ouroboros/.
- [14] Kiayias, A., Russell, A., David, B., and Oliynykov, R. Ouroboros: A provably secure proof-of-stake blockchain protocol. In Annual International Cryptology Conference (2017), Springer, pp. 357–388.
- [15] King, S., and Nadal, S. Peercoin–secure & sustainable cryptocoin. Aug-2012 [Online]. Available: https://peercoin.net/whitepaper.
- [16] Mahmoud, H. Pólya urn models. Chapman and Hall/CRC, 2008.
- [17] Micali, S. Algorand: the efficient and democratic ledger. CoRR, abs/1607.01341 (2016).
- [18] Miller, A., Möser, M., Lee, K., and Narayanan, A. An empirical analysis of linkability in the monero blockchain. arXiv preprint 1704 (2017).
- [19] moh_man. How does pos stake concept deal with rich becoming richer issue? Reddit, 2017. https://www.reddit.com/r/ethereum/comments/6x0xv8/how_does_pos_stake_concept_deal_with_rich/.
- [20] Nayak, K., Kumar, S., Miller, A., and Shi, E. Stubborn mining: Generalizing selfish mining and combining with an eclipse attack. In Security and Privacy (EuroS&P), 2016 IEEE European Symposium on (2016), IEEE, pp. 305–320.
- [21] Pass, R., and Shi, E. Fruitchains: A fair blockchain. In Proceedings of the ACM Symposium on Principles of Distributed Computing (2017), ACM, pp. 315–324.
- [22] Pemantle, R. A time-dependent version of Pólya’s urn. Journal of Theoretical Probability 3, 4 (1990), 627–637.
- [23] Rammeloo, G. The economics of the proof of stake consensus algorithm. Medium, 2017. https://medium.com/@gertrammeloo/the-economics-of-the-proof-of-stake-consensus-algorithm-e28adf63e9db.
- [24] Ryan, D., and Liang, C.-C. Hybrid casper ffg, 2017. https://github.com/ethereum/EIPs/blob/master/EIPS/eip-1011.md.
- [25] Sapirshtein, A., Sompolinsky, Y., and Zohar, A. Optimal selfish mining strategies in bitcoin. In International Conference on Financial Cryptography and Data Security (2016), Springer, pp. 515–532.
- [26] Schrijvers, O., Bonneau, J., Boneh, D., and Roughgarden, T. Incentive compatibility of bitcoin mining pool reward functions. In International Conference on Financial Cryptography and Data Security (2016), Springer, pp. 477–498.
- [27] Taylor-Copeland, J. Coffey vs. Ripple class action complaint. 2018.
- [28] Trustnodes.com. “proof of work is the rich get richer squared” says vitalik buterin. Trustnodes, 2018. https://www.trustnodes.com/2018/07/10/proof-work-rich-get-richer-squared-says-vitalik-buterin.
- [29] Van Saberhagen, N. Cryptonote v 2.0, 2013.
Appendix 0.A Proof of Lemma 1
Let and , then