1. Introduction
Numerous cryptocurrencies based on a peer-to-peer network now exist, each using an open ledger called a blockchain. In such blockchain systems, network nodes called miners verify the transactions collected through the network, generate a block consisting of valid transactions, and propagate the block to the entire network. In general, (public) blockchain systems offer financial incentives to encourage miners to participate in this process. For example, in Bitcoin, a miner is currently rewarded with 12.5 BTC for creating a new block.
The security and reliability of blockchain systems depend on their consensus algorithm. Most popular cryptocurrency systems such as Bitcoin and Ethereum adopt a proof-of-work (PoW) mechanism to agree on the same blockchain (Proof, 2018). In such PoW mechanisms, miners must solve cryptographic puzzles (i.e., proofs-of-work) showing that a certain amount of computational resources (e.g., time and memory) was spent in order to generate a new block. As the total computational power of the blockchain system changes, the mining difficulty is adjusted automatically to maintain an average mining rate of one block per fixed time interval.
The significant increase in mining difficulty led to the formation of mining pools, in which miners gather to mine together; these pools perform mining as a single node in the network. Most pools consist of a manager and miners, and the manager assigns a puzzle to miners at the beginning of every round. The miners then find partial proofs-of-work (PPoWs) and full proofs-of-work (FPoWs) for a given puzzle and submit them to the manager. PPoWs are needed to assess each miner’s contribution in order to share the reward in the pool. If a miner fully solves the puzzle, the manager generates and propagates a block based on the submitted PoW. The manager then earns the reward for one block and splits it among miners in proportion to their number of submitted PPoWs.
However, previous studies (Rosenfeld, 2011; Kwon et al., 2017) demonstrated that existing mining pool protocols are vulnerable to block withholding (BWH) (Rosenfeld, 2011) and fork after withholding (FAW) attacks (Kwon et al., 2017). Moreover, pools in high competition can launch these attacks against each other by infiltrating a part of their mining power (i.e., computational power) into other pools. A BWH attacking pool first infiltrates its mining power into other pools (i.e., victim pools) and submits only PPoWs, never FPoWs, to the victim pools. The FAW attack extends the BWH attack. Like a BWH attacker, a FAW attacking pool infiltrates its mining power into other pools and submits only PPoWs to the victim pools, withholding FPoWs except when an external honest miner (i.e., neither the attacker nor a miner in the victim pool) propagates a block. In that case, unlike in the BWH attack, the attacker submits the withheld FPoW and thereby intentionally generates a fork through the victim pool.
To analyze the strategic interaction between miners for those two attacks, we can use game-theoretic models: the BWH game and the FAW game. Eyal (Eyal, 2015) showed that the BWH game between two pools, where they can execute BWH attacks, is equivalent to the prisoner’s dilemma. For the FAW game between two pools, where they can execute FAW attacks, Kwon et al. (Kwon et al., 2017) showed that mining pools’ strategies form a Nash equilibrium in which the larger of the two pools earns extra profit. Thus, FAW attacks between pools can cause further centralization of the system, because miners would join large pools that earn the extra reward through the FAW attack. The objective of this study is to find strategies inducing a no-attack state where FAW and BWH attacks do not occur.
System model: To achieve this goal, we take a more macroscopic view of the system by modeling long-term interactions between two pools as a repeated game, which we call the repeated FAW-BWH game, and by considering both FAW and BWH attacks together in one framework. In the repeated game, the FAW-BWH game is repeated as a one-stage game over time, where in every stage each rational pool decides whether to cooperate or to execute a FAW or BWH attack (see Section 3 for a full description of our model). Unlike previous studies (Eyal, 2015; Kwon et al., 2017), which focus on a single-stage game, we model a game with multiple stages to analyze the effects of long-term interactions among pools. Also, we consider both FAW and BWH attack strategies, whereas those studies consider only a single attack strategy (i.e., either the FAW or the BWH attack).
A novel strategy ARS: In game theory, the iterated prisoner’s dilemma (i.e., a repeated version of the prisoner’s dilemma) has been extensively studied in terms of how rational players can cooperate through retaliation (Axelrod et al., 1987). Using this retaliation concept, we find how rational pools can cooperate in the repeated FAW-BWH game. Unlike the iterated prisoner’s dilemma, the repeated FAW-BWH game leads to a situation where a larger pool always wins the game (i.e., the pool size game). This occurs when two pools execute FAW attacks against each other (i.e., the FAW game). Finding a cooperation-inducing strategy in such a situation is therefore relatively underexplored. In this paper, we propose strategies called Adaptive Retaliation Strategies (ARS), which can lead to the no-attack state between two pools. In ARS, a pool cooperates at first and continues to cooperate, executing attacks for retaliation only when attacked. Here, ARS must achieve a good balance between retaliation and selfishness because rational pools would consider their payoff even when they retaliate. In other words, if retaliation is costly, they would not follow ARS. ARS achieves this balance through adaptive retaliation, i.e., adaptively deciding the amount of infiltration power for retaliation against FAW or BWH attacks. We formally describe ARS and prove that 1) ARS leads rational pools to cooperate and 2) pools are likely to adopt ARS, by using a popular equilibrium concept in repeated game theory called the subgame perfect Nash equilibrium (see Section 4). Furthermore, our numerical analysis demonstrates that ARS makes the FAW and BWH attacks unprofitable (see Section 5).
Practical requirements for ARS: To apply ARS in real-world settings, pools should be able to monitor whether an attacker executes FAW or BWH attacks against them and, if so, how much infiltration power the attacker uses. Some statistical properties of the victim pool can be used to monitor this information. For example, a pool can detect an attack by observing its fork rate (for FAW) and the ratio between the numbers of submitted PPoWs and FPoWs (for BWH). It can also determine the attacker’s infiltration power in the victim pool through this detection method. Another prerequisite for ARS is to identify which pools have attacked a victim pool, and this may take a long time. Therefore, we propose investigating the variance of the reward densities of pools through moles to reduce the identification time (see Section 6). A benign pool’s reward density is unrelated to the block rewards of other pools. Meanwhile, when a pool executes either FAW or BWH attacks, its reward density is correlated with the block rewards of other pools because the attacker earns part of its reward from the victim. This implies that we can identify the pools executing attacks based on this correlation information (see Section 6).
Multiple pools: In addition, for generalization, we extend ARS from two pools to multiple pools. Through a case study, we show that ARS still makes FAW and BWH attacks unprofitable in the game among multiple pools (see Section 7).
2. Background
In current cryptocurrencies, peers verify transactions issued by clients. Peers record the verified transactions in a “blockchain.” To maintain the blockchain, many cryptocurrencies, including Bitcoin, adopt the PoW mechanism. In this section, we describe the mining process, focusing on Bitcoin. Further, we demonstrate FAW and BWH attacks against mining pools.
2.1. Bitcoin Basics
In cryptocurrencies, peers verify transactions issued by clients. Peers record the verified transactions in a “blockchain.” To maintain the blockchain, many cryptocurrencies adopt the PoW-based consensus mechanism.
Mining Process: For mining, miners first gather issued transactions, which are not yet recorded in the blockchain, into their local storage. Then, miners place the transactions into a block and find a PoW, spending their computational power to generate a valid block. The header of a block includes a Merkle root (Merkle, 1980) of transactions in the corresponding block and the hash value of the header of the previous block. The block header also includes a nonce, which is a key component necessary to become a valid block.
For a block to be valid, the hash value of its header must be less than a given target number. In particular, Bitcoin uses the double SHA-256 hash function. The hash value of a block header is obtained as the output of double SHA-256 for an input containing a concatenation of block contents, including a nonce. Miners increment the nonce until they find a valid one, i.e., one that makes the hash value less than the target number. If a miner finds a valid nonce as a PoW, the miner generates the valid block including the nonce and propagates the block to the peer-to-peer network. Finally, the block is appended to the blockchain, and the above mining process is repeated.
In Bitcoin, the target number is adjusted every 2016 blocks to keep the average period of one block generation (i.e., the average period of one round) at 10 minutes. The smaller the target number, the more difficult the mining process.
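The nonce search described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Bitcoin's actual implementation: the header is an arbitrary byte string rather than a real block header, `mine` and `toy_target` are hypothetical names, and the toy target is far easier than any real one.

```python
import hashlib


def double_sha256(data: bytes) -> bytes:
    """Apply SHA-256 twice, as in Bitcoin's block hashing."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def mine(header_prefix: bytes, target: int, max_nonce: int = 2**32):
    """Return the first nonce whose double SHA-256 hash is below target, or None."""
    for nonce in range(max_nonce):
        # Assumption: the nonce is appended as 4 little-endian bytes.
        header = header_prefix + nonce.to_bytes(4, "little")
        # Interpret the 32-byte digest as an integer and compare with the target.
        if int.from_bytes(double_sha256(header), "little") < target:
            return nonce
    return None


# Toy target (hypothetical): about one hash in 2**12 succeeds.
toy_target = 2**244
nonce = mine(b"toy-header", toy_target)
```

Halving `toy_target` roughly doubles the expected number of hashes, which is the sense in which a smaller target makes mining more difficult.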
Forks: When a miner propagates a block, another block can also be generated and propagated by a different miner who has not yet received the first miner’s block. Other miners then receive two competing blocks, and each regards the block it received first as the blockchain head. This situation is called a fork. When a fork occurs, only one block eventually becomes valid. Moreover, intentional forks undermine Bitcoin’s security because double-spending (Pete Rizzo, 2018; Karame et al., 2012) and selfish mining (Eyal and Sirer, 2014; Nayak et al., 2016; Sapirshtein et al., 2015; Gervais et al., 2016) can be executed by intentionally generating forks.
Mining Pools: As the mining difficulty has been increasing, mining pools were introduced, in which miners gather to mine together. Major pools consist of a manager and many miners. These pools run as one node in the Bitcoin network, and pool miners only need to connect to the manager and create IDs. The manager forms and distributes a potential block to miners, and then miners spend their computational power to generate a valid nonce based on the block form provided by the manager. Moreover, there are open and closed pools, depending on the policy whether any miner can join or not.
Pools’ reward systems are different from the block reward system in Bitcoin. Miners in pools receive rewards for nonces that make the hash value of the block header less than a new, pool-specific target number, which is greater than the original target number. We refer to a nonce that meets the pool target and a nonce that meets the original target as a partial proof-of-work (PPoW) and a full proof-of-work (FPoW), respectively. When a miner finds a PPoW or an FPoW, the miner submits it to the manager as a share, where PPoWs are needed to assess each miner’s contribution in order to share the reward in the pool. When the submitted share is an FPoW, the manager generates a valid block and earns the block reward. The manager then distributes the block reward to miners in proportion to the number of submitted shares.
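The share mechanism above can be illustrated with a short sketch. It assumes a pool target that is larger (easier) than the network target and a purely proportional payout; the function names and the example miners are hypothetical, and real pools use a variety of payout schemes.

```python
def classify_share(hash_value: int, pool_target: int, network_target: int) -> str:
    """Classify a header hash against the pool (easier) and network (harder) targets."""
    if hash_value < network_target:
        return "FPoW"  # meets the original target, so it is also a valid block
    if hash_value < pool_target:
        return "PPoW"  # only proves work; counts toward the miner's contribution
    return "none"


def split_reward(shares: dict, block_reward: float) -> dict:
    """Distribute the block reward in proportion to each miner's submitted shares."""
    total = sum(shares.values())
    return {miner: block_reward * count / total for miner, count in shares.items()}


# Hypothetical example: two miners submitted 3 and 1 shares for a 12.5 BTC block.
payout = split_reward({"alice": 3, "bob": 1}, 12.5)
# payout == {"alice": 9.375, "bob": 3.125}
```

Because payouts depend only on the share count, a miner who submits PPoWs but withholds FPoWs is still paid, which is what the attacks in Section 2.2 exploit.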
2.2. Existing Attacks on Pools
Block Withholding: To execute the BWH attack (Rosenfeld, 2011; Courtois and Bahack, 2014), an attacker splits her computational power into two parts, which are used for solo mining and for malicious mining in the victim pool, respectively. In the malicious mining, she submits only PPoWs to the victim, without submitting FPoWs. Although she undermines the victim by withholding blocks in the victim pool, she still earns a portion of the reward through the PPoWs submitted to the victim. In addition, her solo mining has a higher efficiency than it would if she did not attack, because the block reward is gained in proportion to each pool’s relative computational power in Bitcoin. In other words, by undermining the victim pool, she increases her relative computational power. As a result, the attacker earns an extra reward. The BWH attack can be executed in many proof-of-work cryptocurrencies, including Bitcoin, Ethereum, and Litecoin.
In 2015, Eyal (Eyal, 2015) developed a BWH game between two pools. In the game, each pool can launch the BWH attack against an opponent by infiltrating a part of the computational power into the opponent. Eyal found that the BWH game results in the miner’s dilemma. In other words, there is only one Nash equilibrium, in which two pools execute BWH attacks against each other and both pools suffer losses.
Fork After Withholding: In the FAW attack proposed in 2017 (Kwon et al., 2017), as in the BWH attack, an attacker splits her computational power into two parts, which are used for solo mining and for malicious mining in a victim pool. Unlike in the BWH attack, however, when the attacker finds an FPoW in the victim pool, she submits it to the manager, but only if an external honest miner (i.e., neither the attacker nor a miner in the victim pool) propagates a block. The attacker thus intentionally generates a fork through the victim pool. In the FAW game, pools can launch FAW attacks against each other by infiltrating a portion of their computational power into the other pools. There is a unique Nash equilibrium, in which the two pools execute the FAW attack against each other. In the equilibrium, the larger pool earns an extra reward (unlike in the BWH game) while the smaller pool suffers a loss. In other words, the game leads to a pool size game. Consequently, FAW attacks among pools can decrease the decentralization level of Bitcoin.
Identification: Because no viable countermeasure against FAW and BWH attacks (Eyal, 2015; Kwon et al., 2017) exists, the attacks can be launched in practice. Indeed, the BWH attack was executed against the “Eligius” pool in 2014. To detect the attack, a pool’s manager can investigate the ratio of FPoWs to PPoWs. If the ratio is low enough, the manager can speculate that the BWH attack has occurred in his pool. However, identification of the attacker is known to be difficult if she executes BWH attacks using many Sybil nodes (IDs) in the pool.
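The ratio-based detection mentioned above can be sketched as a simple statistical check. The sketch assumes that each honest share is an FPoW independently with a known probability (the ratio of the network target to the pool target) and uses a Poisson approximation; the function names and the significance threshold `alpha` are hypothetical choices, not values from the studies cited here.

```python
import math


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Poisson(lam), summed term by term."""
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return total


def bwh_suspected(num_shares: int, num_fpows: int,
                  share_to_block_prob: float, alpha: float = 0.01) -> bool:
    """Flag a pool (or miner) whose FPoW count is implausibly low for honest mining."""
    expected_fpows = num_shares * share_to_block_prob
    # Small tail probability: so few FPoWs would be unlikely without withholding.
    return poisson_cdf(num_fpows, expected_fpows) < alpha


# Hypothetical numbers: 10 FPoWs are expected among one million shares.
print(bwh_suspected(1_000_000, 0, 1e-5))   # no FPoWs at all
print(bwh_suspected(1_000_000, 10, 1e-5))  # FPoW count matches expectation
```

Such a test only signals that withholding is likely; as noted above, attributing it to a specific attacker remains difficult when Sybil IDs are used.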
For the FAW attack, miners can detect it because the attack will cause a high fork rate. However, it is difficult to identify the attacker because she can indirectly generate intentional forks through pools instead of generating them by herself. In the victim pool, the manager can expel any miners suspected of causing forks. Nonetheless, the attacker can keep her reward unaffected by this behavior of the manager by planting many Sybil IDs in the victim pool. In other words, even though an ID that generates a fork is expelled by the manager, the attacker still earns the extra reward through other IDs. Eventually, the manager would be unable to prevent FAW attacks with a simple blacklist of suspects.
3. Model and Formulation
3.1. System Model
Block generation in PoW: In PoW mechanisms, miners attempt to generate valid blocks by finding an inverse image of a hash function satisfying a certain condition in each round, where one round is defined as the time from when a new task is generated until a valid block is found by a miner. Due to the pseudorandomness of hash functions, we assume that the number of blocks found by a miner in one round follows a Poisson distribution parameterized by the miner’s relative computational power. Then, the number of blocks found by a pool also follows a Poisson distribution, because the sum of independent Poisson random variables is a Poisson random variable. For simplicity, we assume that natural forks do not occur in the block generating process, as the probability of a natural fork (approximately 0.004 (Gervais et al., 2016)) is very low in practice (Eyal, 2015; Gervais et al., 2016; Kwon et al., 2017).
Computation power and reward: We consider the set of all pools and solo miners (a solo miner mines directly without joining a pool), and describe each node by its computational power. For analytical convenience, we normalize the total computational power to 1, so the computational powers of all nodes sum to 1. We assume that no node possesses more than 50% of the computational power, as in previous works. The reward for one block is assumed to be 1, implying that the total reward of a node in a round is simply the total number of blocks found by that node in that round. When a pool finds a block and earns the reward for the block, the pool manager distributes the reward to miners in proportion to the number of shares submitted during the time over which the pool finds the block. We also assume that the manager honestly distributes the reward among the pool’s miners. We define a node’s reward density at a given round as the reward the node earns in that round divided by its computational power.
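As an illustration of this model, the following sketch simulates honest mining and estimates each node's average reward density, taken here to mean reward per unit of computational power. It assumes, for simplicity, that exactly one block of reward 1 is found per round and that the finder is drawn with probability equal to its normalized power; the node names and powers are hypothetical.

```python
import random


def average_reward_density(powers: dict, rounds: int, seed: int = 0) -> dict:
    """Simulate honest mining and return each node's reward per unit of power."""
    rng = random.Random(seed)
    nodes = list(powers)
    weights = [powers[n] for n in nodes]
    wins = {n: 0 for n in nodes}
    for _ in range(rounds):
        # One block per round; the winner is chosen in proportion to power.
        winner = rng.choices(nodes, weights=weights)[0]
        wins[winner] += 1
    # Average reward per round divided by computational power.
    return {n: wins[n] / (rounds * powers[n]) for n in nodes}


# Hypothetical network: two pools and the remaining miners, powers summing to 1.
densities = average_reward_density(
    {"pool_a": 0.2, "pool_b": 0.3, "others": 0.5}, rounds=20_000
)
```

With no attacks, every estimated density should be close to 1, which is the baseline against which extra reward or loss is measured later in the analysis.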
Attacks: We consider a case in which only two types of attacks, FAW and BWH, can be executed by pools. In addition, because most pools are open pools, we consider only open pools; closed pools are discussed in Section 8. For the worst-case analysis, we assume that the FAW attack is executed under the best network capability, meaning that the attacker’s blocks always become valid after forks caused by the FAW attack. Here, network capability is the parameter introduced by Kwon et al. (Kwon et al., 2017); fixing it at its best value is an assumption made for the sake of simplicity and can be readily relaxed. During an attack, we assume that an attacking pool executes either the FAW or the BWH attack, but not both at the same time. Indeed, an attack combining FAW and BWH is equivalent to the FAW attack under some network capability. Note that the choice of attack method can be time-varying (see the description of stages in the next paragraph). An attacking pool infiltrates a part of its computational power into “victim” pools. We consider a regime in which there are a sufficient number of miners in each pool, so that each pool’s infiltration power used for attacks can be treated as a real number. Moreover, an attacker infiltrates its partial computational power into a victim pool using Sybil IDs in order to make it more difficult to identify who the attacker is. Nonetheless, we assume that the victim can trace the attacker and know how much infiltration power the attacker has used for attacks because the attacking pool is an open pool. In Section 6, we describe how this becomes possible in practice.
Stage: Our interest lies in investigating how pools interact over a long-term period. To this end, the entire time is divided into a sequence of stages, where over each stage a pool can learn whether an attack occurs against itself, how much infiltration power is being used, and who the attacker is. This notion of a stage is the one popularly used in repeated games (see Section 3.2). Therefore, at the end of each stage, a victim identifies the attacker if an attack was executed against the victim during the stage. Note that a stage consists of multiple rounds. At the start of each stage, pools can change their actions based on other pools’ actions. For analytical tractability, we assume that stages are synchronized among pools.
3.2. Repeated FAW-BWH Game
Primer on repeated game: We aim at modeling how multiple pools interact to achieve their own objectives over a long-term period. To this end, we use the theory of repeated games, popularly used to understand the long-term interactions among players. In repeated games, the interactions among players repeat for multiple stages, and the players become aware of other players’ past behaviors and their future benefits, accordingly adapting their strategies over time. The main idea of the theory of repeated games is that a player may be deterred from exploiting her short-term advantage by the threat of punishment that reduces her long-term payoff.
The basic component of a repeated game is a (simultaneous-move) stage game played at each stage. The stage game is represented by $G = (N, (A_i)_{i \in N}, (u_i)_{i \in N})$, where $N$ is the set of players, $A_i$ is the set of actions of player $i$, and $u_i(a)$ is player $i$’s payoff function when the players’ action profile is $a = (a_i)_{i \in N}$. We denote by $G^T$ the $T$-period repeated game of $G$ with perfect information, where the players play the same stage game $G$ for $T$ stages, possibly $T = \infty$. We use the superscript $t$ to express the stage in all notations, and let $a^t$ denote the action profile at stage $t$, i.e., the actions by all players at stage $t$. Also, we denote by $(a_i^t)_{t=1}^{T}$ the actions of player $i$ for $T$ stages. For $t \ge 1$, let $h^t = (a^1, \ldots, a^{t-1})$ denote the history up to stage $t$, where $H^t$ is the space of all possible stage-$t$ histories. Depending on whether $T < \infty$ or $T = \infty$, we call the game a finitely or infinitely repeated game. At each stage $t$, each player $i$ knows all past actions $h^t$ and chooses the next action according to $i$’s strategy, thus $a_i^t$ is determined. Here, a strategy $\sigma_i$ for player $i$ in the repeated game is a sequence of maps $\sigma_i^t$ (one for each stage $t$) that map a history $h^t \in H^t$ to an action in $A_i$. By perfect information, we mean that at the end of each stage game, players are able to know other players’ actions and their payoffs. In this paper, we focus on the infinitely repeated game, in which given the whole action profiles $(a^t)_{t \ge 1}$, the payoff of player $i$ in the corresponding repeated game is the discounted average payoff, i.e.,
(1) $U_i = (1 - \delta) \sum_{t=1}^{\infty} \delta^{t-1} \, u_i(a^t),$
where $\delta \in (0, 1)$ is the discount factor. Next, we introduce the concept of the subgame perfect Nash equilibrium.
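To make the discounted average payoff concrete, the sketch below evaluates it for a finite prefix of stage payoffs. It assumes the common normalization in which the first stage has weight one minus the discount factor and each later stage is discounted by one more power of the discount factor; the function name is hypothetical, and truncating the infinite sum is an approximation.

```python
def discounted_average_payoff(stage_payoffs, delta: float) -> float:
    """Return (1 - delta) * sum_k delta**k * u_k over the given stage payoffs.

    Index k = 0 is the first stage; stages beyond the list are ignored,
    which approximates the infinite sum when the list is long.
    """
    if not 0.0 < delta < 1.0:
        raise ValueError("the discount factor must lie strictly between 0 and 1")
    return (1.0 - delta) * sum(
        (delta ** k) * u for k, u in enumerate(stage_payoffs)
    )


# A constant stage payoff yields (approximately) the same average payoff.
steady = discounted_average_payoff([2.0] * 2000, delta=0.9)
# A one-time gain followed by nothing is scaled down by (1 - delta).
one_shot = discounted_average_payoff([1.0, 0.0, 0.0], delta=0.5)
```

The normalization is what makes a short-term gain comparable with a long-term loss: a one-stage deviation is weighted by one minus the discount factor, while a lasting punishment accumulates most of the remaining weight when the discount factor is close to 1.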
Definition 3.1 (Subgame Perfect Nash Equilibrium (SPNE)).
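For a given history $h^t$ and player $i$’s strategy $\sigma_i$, we denote the discounted average payoff of player $i$ in the subgame given the history $h^t$ as $U_i(\sigma_i, \sigma_{-i} \mid h^t)$, where $\sigma_{-i}$ collects the strategies of the other players. Then the strategy profile $\sigma^* = (\sigma_i^*, \sigma_{-i}^*)$ is a subgame perfect Nash equilibrium if, for every player $i$, every history $h^t$, and every $\sigma_i \in \Sigma_i$,
$U_i(\sigma_i^*, \sigma_{-i}^* \mid h^t) \ge U_i(\sigma_i, \sigma_{-i}^* \mid h^t),$
where $\Sigma_i$ is a space of player $i$’s strategies.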
In the repeated game, SPNE is a stronger version of Nash equilibrium: roughly, it is a strategy profile that is a Nash equilibrium in every subgame. Thus, an SPNE is regarded as a mathematically proven strategy profile that rational players are likely to follow when they interact over a long-term period. There is also the Folk Theorem (Fudenberg and Maskin, 2009), which states that several SPNE outcomes can exist in a repeated game. More specifically, it implies that if there is a credible retaliation, rational players are likely to achieve cooperation, where a retaliation is credible if it is not costly for the retaliator.
Repeated FAW-BWH game: As in the previous studies (Eyal, 2015; Kwon et al., 2017), for simplicity we consider a game only between two pools, i.e., a pool and its opponent. As mentioned in Section 3.1, we define a stage as the duration of time in which each pool is able to trace other pools’ attacking behaviors, which enables us to obtain the condition of perfect information in our analytical model.
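The role of the threat of punishment in the Folk Theorem discussion above can be illustrated with the textbook iterated prisoner's dilemma under a grim-trigger strategy. This is a classical example used only for intuition, not the FAW-BWH stage game or ARS; the payoff labels (temptation, reward, punishment) and function names are the usual textbook conventions and are assumptions here.

```python
def min_discount_for_cooperation(temptation: float, reward: float,
                                 punishment: float) -> float:
    """Smallest discount factor at which grim trigger sustains cooperation.

    Cooperating forever yields `reward` on average; deviating yields
    (1 - delta) * temptation + delta * punishment. Cooperation is stable when
    the former is at least the latter.
    """
    return (temptation - reward) / (temptation - punishment)


def deviation_is_profitable(temptation: float, reward: float,
                            punishment: float, delta: float) -> bool:
    """Compare the discounted average payoffs of deviating and cooperating."""
    deviate = (1.0 - delta) * temptation + delta * punishment
    return deviate > reward


# Textbook payoffs: temptation 5, mutual cooperation 3, mutual defection 1.
threshold = min_discount_for_cooperation(5.0, 3.0, 1.0)  # 0.5
```

Players who value the future enough (a discount factor above the threshold) prefer to cooperate, because the one-stage gain from deviating is outweighed by the punishment that follows.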
In defining the repeated FAW-BWH game, it is crucial to define the stage game, for which we now describe how we model the set of actions and the payoff function. First, each pool’s action is defined in terms of which attack is performed and the amount of infiltration power used. We assume that a pool’s attack is homogeneous, i.e., in a given stage it executes either the FAW attack or the BWH attack, but not both. However, a pool is able to change its attack across stages. Then, in this stage game, each pool’s action space is no-attack, FAW, or BWH, with a choice of some infiltration power for each attack. More formally, at each stage, each pool’s action can be expressed as a two-dimensional vector whose components are the infiltration powers for the FAW and BWH attacks, respectively. The case in which both components are zero corresponds to no-attack. Moreover, because we assume homogeneity in the attack, at most one of the two components is positive. Fig. 1 depicts a model of our repeated FAW-BWH game between two pools. When a stage ends, each pool is aware of its opponent’s action for this stage game. Then, when a new stage game starts, pools can change their actions depending on the opponent’s action at the previous stage. To complete the definition of the repeated FAW-BWH game, it remains to define the payoff function in (1) at each stage game, which we defer to Section 4.
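The action space described above can be captured by a small data type. This is a sketch with hypothetical names (`Action`, `faw`, `bwh`, `is_feasible`); the feasibility check, which requires the infiltration power not to exceed the pool's own computational power, is an assumption added for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """A pool's stage action: infiltration power for FAW and for BWH."""
    faw: float = 0.0
    bwh: float = 0.0

    def __post_init__(self):
        if self.faw < 0 or self.bwh < 0:
            raise ValueError("infiltration power must be non-negative")
        # Homogeneous attack: at most one of the two components is positive.
        if self.faw > 0 and self.bwh > 0:
            raise ValueError("a pool executes either FAW or BWH, not both")

    @property
    def is_no_attack(self) -> bool:
        return self.faw == 0 and self.bwh == 0


def is_feasible(action: Action, pool_power: float) -> bool:
    """Assumed constraint: a pool cannot infiltrate more power than it owns."""
    return action.faw + action.bwh <= pool_power


cooperate = Action()          # no-attack
faw_attack = Action(faw=0.05) # FAW with infiltration power 0.05
```

Keeping the type immutable makes it straightforward to record the sequence of actions across stages as a history.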
4. Analysis of Repeated FAW-BWH Game
In this section, we study how two pools choose their strategies in the equilibrium of the repeated FAW-BWH game, to understand the pools’ behaviors in terms of their interaction over a long time.
4.1. Payoff Function in the Stage Game
Because pools play the same game at each stage, we remove the stage index in this section for simplicity. For an action profile of the two pools, it seems natural to define a pool’s payoff as its extra reward density with respect to 1, i.e., its average reward earned per round divided by its computational power, minus 1. As described in Section 3.2, each pool’s action is expressed as its infiltration power for either the FAW or the BWH attack, where at least one of the two infiltration powers is 0. Then, for convenience, we separately present four possible cases, depending on whether each of the two pools executes the FAW or the BWH attack, and henceforth provide the form of the payoff function for each case. According to the definition of the payoff, if the two pools do not attack, their payoffs are 0. If a pool’s payoff is positive, the pool earns an extra reward; if it is negative, the pool suffers a loss.
Homogeneous attacks: The case in which two pools execute the same attack, FAW or BWH, has been studied in two related studies (Kwon et al., 2017; Eyal, 2015). For the FAW-FAW case, the following payoff function can be obtained from the work of Kwon et al. (Kwon et al., 2017).
(2) 
In (2), the first term is obtained from the honest mining of each pool, achieved with the computational power remaining after deducting the infiltration power. Note that Pool gets a reward of from the honest mining because each node earns a mining reward based on how many blocks it generated relative to others. The second term represents the extra reward density that is earned in the case where the opponent generates an intentional fork and Pool does not generate any block. In this case, both an external honest miner and the infiltration power of Pool generate a block, and the probabilities of these events are and , respectively. This derives the second term. The third term is from intentional forks caused by both Pool and Pool In this case, an external honest miner and infiltration powers of Pool and Pool find blocks, and then a fork with three branches occurs. If the infiltration power of Pool finds a block faster than that for Pool, its probability would be On the other hand, if the infiltration power of Pool generates a block faster than that for Pool, its probability would be Considering these facts, the third term is derived. Lastly, the fourth term is from its infiltration mining into the opponent and is derived from that the opponent distributes the reward of to Pool Note that Pool infiltrates the computational power of into the opponent.
Next, we consider when two pools execute BWH attacks against each other. In this case, we have the following form of the payoff function from Eyal’s work (Eyal, 2015), where forks are not intentionally generated so the second and third terms in (2) disappear in (3).
(3) 
Heterogeneous attacks: In contrast to the payoff functions for homogeneous attacks, which are borrowed from previous studies (Kwon et al., 2017; Eyal, 2015), the payoff functions for the case in which the two pools execute different attacks, one FAW and the other BWH, remain to be established. We first consider the case in which a pool executes the FAW attack while its opponent executes the BWH attack. Then, the pool’s payoff, which quantifies its extra reward density, is given by:
(4) 
This payoff can be derived as follows. The first term represents the reward density that the pool earns through its honest mining with the computational power remaining after deducting the infiltration power. The second term is obtained from the pool’s infiltration mining into its opponent. Note that (4) has no reward density term earned from forks generated in the pool, because the opponent, which executes the BWH attack, does not generate forks in the pool.
Now, when the pool executes the BWH attack and its opponent executes the FAW attack, each with its own infiltration power, we have:
(5) 
Because only the opponent executes the FAW attack, forks are generated only in the pool, by the opponent. Thus, (5) adds to a form similar to (4) the reward density (the second term) earned when forks occur in the pool.
4.2. Equilibrium at the Stage Game
We now discuss how two pools would behave at the equilibrium of each stage game, before we study how rational pools behave through long-term interactions in the repeated game. This step is important because (i) it clearly shows how much a nearsighted view of pools’ interaction in each stage game (as in prior work (Eyal, 2015; Kwon et al., 2017)) differs from a farsighted one in the repeated game, and (ii) understanding the per-stage equilibrium behaviors is key to understanding what happens if such stage games are repeated among pools. This per-stage equilibrium is stated in Theorem 4.1.
Theorem 4.1 (Nash equilibrium for stage game).
There exists a unique Nash equilibrium (NE) in the stage game; it is characterized as:
(6) 
Further, the following payoff values are obtained for different cases of two pools’ computational powers and :
(7)  
(8) 
(See Appendix A.1 for our proof of the theorem.) Theorem 4.1 states the existence and uniqueness of the Nash equilibrium, which is technically meaningful in the sense that the per-stage equilibrium can be predicted and interpreted mathematically. The major messages of Theorem 4.1 are: (i) when both pools are allowed to execute FAW and BWH attacks, at the NE, the two pools execute only FAW attacks (see (6)), and (ii) the larger pool always earns an extra reward, whereas the smaller pool always suffers a loss (see (7)). However, when they possess the same computational power, neither pool earns an additional reward (see (8)). This is in stark contrast to the previous game where only BWH is allowed (Eyal, 2015). Note that Theorem 4.1 provides the first analysis of a scenario in which both BWH and FAW attacks are possible. Also, the actions at the NE and their resulting payoffs differ markedly from those in classical games such as the prisoner’s dilemma.
4.3. ARS (Adaptive Retaliation Strategies)
We now propose strategies that induce cooperation between two pools, i.e., no-attack, which is provably verified in the framework of repeated games. In classical repeated game theory, it is well known that the “threat of future punishment” induces cooperation. We inherit this rationale in our study; however, the following key differences are noted: (i) as mentioned in Section 4.2, the prisoner’s dilemma is played repeatedly in many studies, whereas our stage game significantly differs from the prisoner’s dilemma, and (ii) our stage game is defined over a continuous action space, and thus, in punishing pools that deviate from cooperation, it is critically important to adaptively determine the amount of infiltration power for retaliation. Considering these facts, we should find a credible retaliation, which is necessary for inducing cooperation according to the Folk Theorem.
4.3.1. Strategy description
In this paper, we denote by the resulting actions of two pools at stage . A given strategy of both pools would produce the sequence of actions We now describe special strategies, named ARS (Adaptive Retaliation Strategies), which call a subroutine Retaliate of Algorithm 2. Here Retaliate has infinitely many versions, depending on a parameter that we will describe in Retaliate subroutine paragraph. Therefore, we denote by ARS a strategy belonging to ARS, and ARS is represented in Algorithm 1. When playing ARS, an internal variable representing the standing of Pool is maintained by each pool; represents whether Pool has followed ARS well or not at the previous stage. We use the notation to refer to the action when ARS is played, in order to differentiate an action from a different strategy. Thus, when Pool plays ARS.
ARS: In ARS, each Pool starts to cooperate with noattack when , and initializes its standing variable to At each stage , Pool first sets its standing , depending on whether its stage action matches from ARS (S1). Thus, if Pool deviates from what ARS does at the stage , its standing at the stage is set to Then, different standing values of both pools lead to the following combinations: where ‘GOOD’ and ‘BAD’. To help readers better understand ARS, we present a state diagram of ARS in Fig. 2, where four states exist, depending on two pools’ standings; in each state, we also present the ARS’s action of Pool at stage Here, Rt is the output of Retaliate with (denoted by Retaliate in Algorithm 1), and denotes an action value that differs from . The action tuple at each edge (which may deviate from ARS) is what results in a state change, and we did not present the action tuples that do not change a state (e.g., in (G,G), the action does not incur the state change).
To summarize ARS, Pool starts with cooperation, and then retaliates when the opponent deviates from cooperation. However, if, as a response to Pool’s retaliation, the opponent goes back to cooperation, Pool stops retaliating, and resumes cooperation with its opponent. If the opponent is not back to cooperation (thus not following ARS) and keeps executing attacks, Pool, which follows ARS, also keeps retaliating. When two pools deviate from ARS simultaneously, both of them turn out to cooperate at the next stage. Retaliation phase is presented in Fig. 2, where Pool retaliates against its opponent (S2) only when the standing is (i.e., when Pool follows ARS but the opponent deviates from ARS). Considering this fact, at least one of and , which are two inputs of Retaliate, should be (see the actions at edges toward ). This is because (i) if , it indicates that the opponent’s standing at stage was and thus Pool should not attack according to ARS, i.e., is , or (ii) if , should be . After Pool’s retaliation, the opponent goes back to cooperation as a contrite behavior; the contrite phase is represented as a change from (G,B) to (G,G), where two pools cooperate (Fig. 2). In the (B,B) state, where two pools deviate from ARS, both of them cooperate at the next stage, making the transition to (G,G).
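The standing-based transitions summarized above can be sketched as a small simulation. This is a minimal illustration under stated assumptions, not the paper's Algorithm 1: the names `ars_action`, `simulate`, and `mirror_retaliate` are hypothetical, an action is modeled as a pair (FAW power, BWH power), and the Retaliate subroutine is replaced by a placeholder that simply mirrors the opponent's previous action.

```python
from typing import Callable, Dict, List, Tuple

# Assumption: an action is (FAW infiltration power, BWH infiltration power).
Action = Tuple[float, float]
NO_ATTACK: Action = (0.0, 0.0)
GOOD, BAD = "G", "B"

Retaliate = Callable[[Action, Action], Action]


def mirror_retaliate(own_prev: Action, opp_prev: Action) -> Action:
    """Placeholder for Retaliate (NOT the paper's subroutine): mirror the opponent."""
    return opp_prev


def ars_action(own: str, opp: str, own_prev: Action, opp_prev: Action,
               retaliate: Retaliate) -> Action:
    """Action prescribed for a pool: retaliate only in state (GOOD, BAD)."""
    if own == GOOD and opp == BAD:
        return retaliate(own_prev, opp_prev)
    return NO_ATTACK


def simulate(stages: int, deviations: Dict[Tuple[int, int], Action],
             retaliate: Retaliate = mirror_retaliate) -> List[Tuple[str, Action, Action]]:
    """Play two pools (0 and 1); deviations[(stage, pool)] overrides the prescribed action."""
    standing = [GOOD, GOOD]            # both pools start in good standing
    prev = [NO_ATTACK, NO_ATTACK]
    history = []
    for t in range(stages):
        prescribed = [
            ars_action(standing[p], standing[1 - p], prev[p], prev[1 - p], retaliate)
            for p in (0, 1)
        ]
        actual = [deviations.get((t, p), prescribed[p]) for p in (0, 1)]
        history.append((standing[0] + standing[1], actual[0], actual[1]))
        # A pool is in good standing next stage only if it played the prescribed action.
        standing = [GOOD if actual[p] == prescribed[p] else BAD for p in (0, 1)]
        prev = actual
    return history


if __name__ == "__main__":
    # Pool 1 attacks once at stage 1; pool 0 retaliates at stage 2; both cooperate again.
    for state, a0, a1 in simulate(4, {(1, 1): (0.05, 0.0)}):
        print(state, a0, a1)
```

In this sketch a single deviation moves the state from (G,G) to (G,B) for one stage and then back to (G,G), while a simultaneous deviation moves it to (B,B) and then directly back to (G,G), matching the retaliation and contrite phases described above.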
Note that ARS assumes that Pool has values of , , , and of its opponent; we will discuss how that information is available to Pool in Section 6. Indeed, there exists a strategy, contrite tit-for-tat (CTFT) (Boyd, 1989), which uses standings similar to (but not the same as) those of ARS and induces cooperation in the iterated prisoner’s dilemma. However, CTFT was studied for the iterated prisoner’s dilemma, whose discrete action space contains only two actions, whereas ARS operates on a continuous action space; in this respect, ARS differs significantly from CTFT.
Retaliate subroutine: Prior to explaining Retaliate, we first introduce the notion of an infiltration power candidate set (or simply, infiltration set) as follows: for given pools’ local and opponent actions , and we define the infiltration set with respect to either FAW or BWH as the set of Pool’s infiltration powers that makes Pool’s attack unprofitable as a retaliating response to Pool’s deviation from cooperation. Formally, Pool’s infiltration set for the FAW attack is given as:
(15) 
where is an arbitrary number in . As gets close to 1, the retaliator tries to use FAW attacks as much as possible rather than BWH attacks. Similarly, we define for the BWH attack as:
(16) 
The main goal of Retaliate is to determine which attack to perform and how much infiltration power is needed to retaliate against the deviating opponent while maximizing the retaliator’s (long-term) payoff. Thus, the crux of Retaliate is to strike a good balance between retaliation and selfishness. To this end, we first prioritize FAW over BWH, simply because the FAW attack is known to be more profitable than the BWH attack (Kwon et al., 2017) (see S.1 and S.2, where S.1 is attempted first). We henceforth focus on the steps for retaliation with FAW (S.1), which are quite similar to those with BWH, where we first construct the FAW-infiltration set as in (15). In fact, it is possible for the FAW-infiltration set to be empty; this occurs when the FAW attack has no retaliatory effect, in which case retaliation with BWH is tried instead (S.2). Note that the BWH-infiltration set (S.2.1) is provably non-empty. Intuitively, this is because BWH is known to damage the opponent more severely (Kwon et al., 2017). In Appendix A.3, we prove the non-emptiness of the BWH-infiltration set.
Next, in balancing between retaliation and selfishness, we construct two filtered sets of infiltration powers, and (S.1.2), which consider retaliation and selfishness, respectively. In the former, Pool following ARS computes the set of infiltration powers in proportion to the degree of Pool’s attack, i.e., generating the same amount of loss to Pool as that to Pool from Pool’s attack, which we call “equal retaliation”. In the latter, the set of infiltration powers is constructed so as to maximize Pool’s payoff for the FAW attack, where is chosen to be closest to the infiltration power maximizing Pool’s payoff , expressed as:
(17) 
which is obtained from (Kwon et al., 2017). Finally, in S.1.3, Pool decides to retaliate by deciding between equal retaliation () and selfishness (). Retaliate chooses the minimum infiltration power for FAW in and , which is the minimum amount of power to achieve retaliation while considering its own payoff. Therefore, if Pool must use a significant portion of its computational power to infiltrate for equal retaliation, it instead maximizes its payoff rather than pursuing equal retaliation. Sets and , which are similar to and for the FAW attack, are constructed for the BWH attack, where is derived from (Eyal, 2015), given as:
(18) 
Note that Retaliate outputs if , in which case Pool does not need to retaliate. This occurs when the opponent did not attack at stage even though the opponent would retaliate against Pool at stage according to ARS. In this case, because the opponent did not follow ARS, the opponent’s standing would be BAD, where Pool would call Retaliate. However, Retaliate would usually output in this case, and Pool would not attack for retaliation because and would include 0 in most cases.
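The control flow of Retaliate described above (try FAW first, fall back to BWH, then take the smaller of the equal-retaliation and selfish powers) can be sketched as follows. This is a simplified illustration, not the paper's Algorithm 2: the infiltration sets of (15) and (16) are assumed to be given as discretized lists of candidate powers, the equal-retaliation and payoff-maximizing powers are assumed to be supplied as inputs rather than computed from (17) and (18), and both filtered choices are approximated by picking the nearest candidate. All function and parameter names are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def closest(candidates: Sequence[float], target: float) -> float:
    """Return the candidate power nearest to the target power."""
    return min(candidates, key=lambda x: abs(x - target))


def pick_power(infiltration_set: Sequence[float],
               equal_power: float,
               selfish_power: float) -> Optional[float]:
    """Choose a power from one infiltration set, or None if the set is empty.

    equal_power:   power that would inflict the same loss the opponent caused (assumed input).
    selfish_power: power that would maximize the retaliator's own payoff (assumed input).
    """
    if not infiltration_set:
        return None
    r = closest(infiltration_set, equal_power)     # equal-retaliation choice
    s = closest(infiltration_set, selfish_power)   # selfish choice
    return min(r, s)                               # use the smaller amount of power


def retaliate_sketch(faw_set: Sequence[float], bwh_set: Sequence[float],
                     faw_equal: float, faw_selfish: float,
                     bwh_equal: float, bwh_selfish: float) -> Tuple[str, float]:
    """Return (attack type, infiltration power), preferring FAW over BWH."""
    power = pick_power(faw_set, faw_equal, faw_selfish)
    if power is not None:
        return ("FAW", power)
    power = pick_power(bwh_set, bwh_equal, bwh_selfish)
    if power is None:
        # The source states the BWH set is provably non-empty; this guards bad input.
        raise ValueError("BWH infiltration set must not be empty")
    return ("BWH", power)


if __name__ == "__main__":
    print(retaliate_sketch([0.01, 0.02, 0.03], [0.02, 0.04], 0.028, 0.012, 0.04, 0.02))
    print(retaliate_sketch([], [0.02, 0.04], 0.0, 0.0, 0.04, 0.02))
```

Taking the minimum of the two choices reflects the rule that a pool which would need a large share of its power for equal retaliation instead settles for the payoff-maximizing amount.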
4.3.2. Equilibrium Analysis
Next, we prove that ARS is a subgame perfect Nash equilibrium for a sufficiently large discount factor.
Theorem 4.2.
There exists a function such that, for all discount factor the twopool strategy vector (ARS, ARS) is a subgame perfect Nash equilibrium. Function is always less than 1, and is an increasing function of and for given Moreover, (ARS, ARS) is a Nash equilibrium for all
A proof of Theorem 4.2 appears in Appendix A.2. In the proof, we show that, at the start of any subgame, deviating from ARS is no more profitable for a player than following ARS. By the one-time deviation property, this implies that ARS is a subgame perfect Nash equilibrium.
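The role of the discount factor in this argument can be illustrated with stylized arithmetic. The sketch below is not the threshold function of Theorem 4.2; it assumes a simplified payoff stream in which a one-time deviation yields an extra reward `gain` at the current stage, the opponent's retaliation costs the deviator `loss` at the next stage, and both pools cooperate afterwards. Payoffs are measured relative to the no-attack payoff, and all names and numbers are hypothetical.

```python
def deviation_value(gain: float, loss: float, delta: float) -> float:
    """Discounted net benefit of a one-time deviation, relative to always cooperating."""
    return gain - delta * loss


def min_discount_factor(gain: float, loss: float) -> float:
    """Smallest discount factor at which the one-time deviation stops being profitable.

    A result above 1 means the assumed retaliation is too weak to deter the deviation.
    """
    if gain <= 0:
        return 0.0                      # deviating never pays, for any discount factor
    if loss <= 0:
        raise ValueError("retaliation must impose a positive loss")
    return gain / loss


if __name__ == "__main__":
    gain, loss = 0.01, 0.02             # hypothetical extra reward and retaliation loss
    threshold = min_discount_factor(gain, loss)
    print(threshold, deviation_value(gain, loss, 0.9))
```

Under these assumptions, any discount factor at or above `gain / loss` makes the deviation unprofitable, which is the intuition behind requiring a sufficiently patient pool.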
If the two pools each use a strategy belonging to ARS (their strategies need not be the same), the strategy vector is a Nash equilibrium. In particular, if the two pools use the same strategy, the strategy vector is a subgame perfect Nash equilibrium. As described in Section 3.2, a subgame perfect Nash equilibrium refines a Nash equilibrium by eliminating non-credible threats, i.e., strategy vectors that rational pools are unlikely to follow in practice. In addition, a large discount factor corresponds to a condition in which pools weigh future payoffs heavily, or to a high probability that pools are patient enough to stay inside the system for a long time. Indeed, most pools, including Slush, Eligius and F2Pool, have operated for a long time in practice. A large discount factor is also easier to satisfy when the duration of one stage is short compared to the pools’ entire operation period. In Section 6, we explain that the duration of a stage can be short, which supports the practical value of our analytical result.
Indeed, there are many other subgame perfect Nash equilibria (by the Folk Theorem in repeated games (Fudenberg and Maskin, 2009)) in the repeated FAW-BWH game, which trivially include the one in which the two pools always execute FAW attacks against each other. From a manager’s perspective, the manager would want to increase its pool size while earning extra rewards, until the pool grows to a size that does not seriously threaten the system. This is a good reason for the manager to execute the FAW attack. Meanwhile, it is unknown whether these subgame perfect Nash equilibria include cooperation between pools because the existence of credible punishments is unclear. Our results imply that cooperation can be stable when ARS is used, even though the FAW-BWH game has a certain winner, i.e., the larger pool. Moreover, ARS includes infinitely many strategies with in (15). As such, there are infinitely many ways to achieve cooperation between pools. In addition, ARS restores cooperation even if FAW and BWH attacks occur impulsively, which is another advantage of ARS. For example, if Pool impulsively attacks, the opponent would retaliate. After that, Pool does not attack, being contrite, and the two pools return to the no-attack status.
5. Numerical Analysis
In this section, we use a numerical analysis to demonstrate how much infiltration power each pool would use for retaliation in ARS in response to the opponent’s action. We simulate the repeated FAW-BWH game with varying Pool and Pool’s sizes. We consider a case in which Pool deviates from ARS to attack Pool while maximizing its payoff during one stage. As a result, Pool would retaliate against Pool according to ARS. In this section, we consider a strategy ARS, where is close to 1.
Fig. 7 represents the case in which Pool optimally executes the FAW attack to maximize its payoff. Then, Pool following ARS retaliates at the next stage. The x and y axes are Pool and Pool’s sizes, respectively. Moreover, we define infiltration ratios , where and are proportions of infiltration power and for Pool’s computational power, respectively (i.e., , ). Fig. 7(a) represents Pool’s infiltration ratio for retaliation using the FAW attack. In the white region of Fig. 7(a), Pool cannot retaliate against Pool with the FAW attack. Thus, Pool should retaliate using the BWH attack. Fig. 7(b) represents Pool’s infiltration ratio for retaliation using the BWH attack. Here, we can see that all cases are covered by the colored regions in Fig. 7(a) and (b). Considering two stages where Pool first executes the FAW attack and then Pool retaliates, Fig. 7(c) and (d) represent average payoffs of Pool and Pool for two stages, respectively. That is, these figures show when we denote each of two stages by stage 0 and 1. As shown in Fig. 7(c), Pool’s average payoff is always negative, meaning that ARS makes FAW attacks unprofitable. Moreover, Fig. 7(d) shows that Pool can completely recover a loss from Pool’s attack in the case where Pool retaliates with the FAW attack.
Fig. 12 represents the case in which Pool optimally executes the BWH attack to maximize its short-term payoff. Similar to Fig. 7, Fig. 12(a) and (b) represent Pool’s infiltration ratio and , respectively. Fig. 12(c) and (d) respectively represent the average payoffs of Pool and Pool for two stages. Fig. 12(c) shows that Pool always suffers a loss from the retaliation of Pool when Pool executes the BWH attack. Therefore, ARS makes BWH attacks unprofitable.
As a representative scenario, we simulate the repeated FAW-BWH game for various values of Pool’s infiltration ratio used for attacks, assuming that Pool’s size is 0.2 (20%). Figs. 17 and 21 represent Pool’s execution of FAW and BWH attacks, respectively. The x and y axes are Pool’s infiltration ratio used for attack and Pool’s size, respectively. Fig. 17(a) and (b) show the infiltration ratio and for retaliation against Pool, respectively. Because the extent of retaliation by ARS depends on the loss caused by the opponent’s attack, Pool’s infiltration ratio for retaliation depends on Pool’s attack infiltration ratio. Fig. 17(c) and (d) represent the average payoffs of Pool and Pool, respectively, for two stages in which Pool executes the FAW attack and then Pool retaliates. Pool always suffers a loss by deviating from ARS because all colors in Fig. 17(c) indicate negative values. Meanwhile, there are some cases in which Pool can earn extra profit in the process of retaliation, as shown in Fig. 17(d). Similar to Fig. 17, Fig. 21 shows Pool’s infiltration ratio for retaliation and the attacker’s and victim’s average payoffs for two stages when Pool executes the BWH attack. In most cases, Pool chooses the FAW attack rather than the BWH attack for retaliation. Even though there exist some cases in which the BWH attack is executed for retaliation according to ARS, we omit Pool’s infiltration ratio for retaliation with the BWH attack because the region of such cases is very small (see the small areas bounded by bold black lines at the bottom-left corners in Fig. 21). As a result, ARS makes BWH attacks unprofitable.
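| Name | Infiltration ratio against FAW | Total payoff | Infiltration ratio against BWH | Total payoff |
|---|---|---|---|---|
| AntPool | | 1.89% | | 0.78% |
| ViaBTC | | 0.54% | | 0.15% |
| DPool | | 0.004% | | 1.1% |
| Bixin | | 0.025% | | 0.63% |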
Also, we consider the current power distribution obtained from Blockchain.info (Blockchain Info, 2018). We assume that BTC.com, which is the largest pool as of Jan 2019 and has a computational power of about 25%, optimally executes FAW and BWH attacks against each of four pools (AntPool, ViaBTC, DPool, and Bixin), which have respective computational powers of 15%, 10%, 3.5%, and 2%. In this case, four pools would retaliate against BTC.com according to ARS. Table 1 represents the infiltration ratio , which Pool uses for retaliation with FAW and BWH against BTC.com, respectively. The second and fourth columns show how each pool should retaliate against BTC.com’s FAW and BWH attacks, respectively. The third and fifth columns represent the attacker’s total payoff for each victim pool when the attacker executes FAW and BWH attacks, respectively. As shown in Table 1, by retaliating according to ARS, the four pools make the attacks of BTC.com unprofitable.
6. Identifying the opponent’s attack
To follow ARS, Pool needs to know seven parameters in Table 2: , , , , , , and . In this section, we describe how the pool can obtain these seven parameters, which make it possible for pools to adopt ARS. Among these parameters, Pool already knows , , and , which are referred to as internal variables in this paper. Also, Pool can easily obtain because the computational power of pools can be approximately calculated from the mined block information (Blockchain Info, 2018). Among the remaining parameters, , , and , Pool needs to know and because is determined by and . Moreover, the value is if is 0. If is positive, the value can be obtained from pools’ actions at stage . In other words, Pool can determine by obtaining the opponent’s action at stage . As a result, Pool only needs to know the opponent’s previous action in order to determine , , and .
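As a rough illustration of how a pool's computational power can be approximated from mined block information, the sketch below estimates each pool's share as its fraction of recently mined blocks. It is a minimal example under stated assumptions: the input is a hypothetical list of pool labels, one per observed block, the pool names and counts are made up, and the standard error uses a simple binomial approximation.

```python
from collections import Counter
from math import sqrt
from typing import Dict, Iterable


def estimate_power_shares(block_miners: Iterable[str]) -> Dict[str, float]:
    """Estimate each pool's share of total power as its fraction of observed blocks."""
    counts = Counter(block_miners)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pool: n / total for pool, n in counts.items()}


def standard_error(share: float, num_blocks: int) -> float:
    """Binomial standard error of a share estimated from num_blocks observations."""
    if num_blocks <= 0:
        raise ValueError("num_blocks must be positive")
    return sqrt(share * (1.0 - share) / num_blocks)


if __name__ == "__main__":
    # Hypothetical observation window of 1000 blocks with made-up pool labels.
    blocks = ["PoolA"] * 250 + ["PoolB"] * 150 + ["Others"] * 600
    shares = estimate_power_shares(blocks)
    for pool, share in shares.items():
        print(pool, round(share, 3), round(standard_error(share, len(blocks)), 4))
```

A longer observation window reduces the standard error but reacts more slowly to changes in a pool's power, so the window length is a trade-off left to the user of such an estimate.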
To infer the opponents’ actions, Pool can plant moles in other pools. Through the moles, Pool can observe other pools’ average reward densities and stochastically determine other pools’ actions from the observed densities. However, it may take a long time to identify other pools’ actions from their average reward densities. Note that, if the time duration of a stage increases, the discount factor would decrease because pools might focus on short-term advantages rather than long-term value. This implies that it is important to shorten the time duration of a stage. In the following section, we describe how to achieve this.
Notation  Definition 

Computational power of Pool  
Standing of Pool  
The action of Pool at time  
Computational power of the opponent  
Standing of the opponent  
The action of the opponent at time  