One of the major shortcomings of popular permissionless blockchains such as Bitcoin and Ethereum is that they are unsuitable for smart contracts which require non-trivial computation for execution. We call such smart contracts Computationally Intensive Contracts (CIC). CICs can potentially run intensive machine learning algorithms, zero-knowledge proofs [5, 10], etc.
One reason for this shortcoming is that transactions are executed by all miners, and this computation must be paid for by the transaction fee. Hence CIC transactions will require very high transaction fees. (Footnote: Transaction verification is to some extent subsidized by mining fees.) A second reason is the Verifier’s Dilemma. A miner must normally start mining a new block on one received only after verifying all its transactions. If the time taken to verify the transactions in the block is non-trivial then it delays the start of the mining process, thereby reducing the chances of the miner creating the next block. Skipping the verification step will save time but at the risk of mining on an invalid block, thereby leaving a rational miner in a dilemma of whether to verify transactions or not.
Solving this open problem is important if permissionless blockchains are to scale and become global supercomputers capable of executing a large number of smart contracts requiring heavy computation. We refer to any participant in the blockchain as a node in the subsequent discussion. To achieve scalability in computation we ask ourselves the following questions.
Is it possible to design a permissionless blockchain system in which only a small subset of nodes execute a CIC but which provides the same security guarantees of Ethereum or Bitcoin?
Is it possible to design a blockchain system which decouples transaction execution from existing Proof-of-Work (PoW) consensus, thereby side-stepping the Verifier’s Dilemma?
In the literature, Byzantine nodes can deviate arbitrarily from the protocol, and rational nodes are selfish and act to maximize their utilities. Most research in permissionless blockchains has focused on threat models which either have only rational nodes or ones which have Byzantine and altruistic (honest) nodes but no rational nodes [24, 21, 11, 22, 16]. Byzantine, Altruistic, Rational (BAR)  models are more challenging to analyse and more realistic than either of these two models. Can such a system for CIC computation work under a BAR model?
In this paper we present YODA, which is to the best of our knowledge the first solution for efficient computation of CICs in permissionless blockchains which gives security guarantees in a BAR threat model. The threat model allows a fraction of Byzantine nodes in the overall system and the remaining can be quasi-honest. Quasi-honest nodes are selfish nodes which seek to maximize their utility by skipping CIC computation using information about its solutions which may already be published on the blockchain by other nodes, an attack termed a free-loading attack. They may also try to collude with each other to reduce their computation. The assumptions on quasi-honest nodes are given in detail in §III-A. YODA is robust to DoS attacks and Sybil attacks, and ensures timely payouts to all who execute a CIC.
YODA’s modus operandi is to make only small sets of randomly selected nodes called Execution Sets (ES) execute the CICs. ES nodes submit their solutions on the blockchain as transactions and YODA must thus identify the correct solution from among them. Note that this problem is not the same as the problem of achieving consensus about blocks in the blockchain using shards [16, 14, 3, 11]. Blocks can in general take many valid values and are computationally easy to verify, unlike CIC solutions which have only one correct value and are computationally expensive to verify.
Because CICs are not executed by all nodes, we say they are always computed off-chain, as opposed to other smart contracts which are said to be executed on-chain by all miners. While a small ES improves system efficiency, it can occasionally be dominated by Byzantine nodes which may form a majority and submit incorrect solutions. Hence a simple majority decision does not work even in a setting with only honest and Byzantine nodes.
Our first major contribution is the MultI-Round Adaptive Consensus using Likelihood Estimation (MiRACLE) algorithm, which addresses this problem. It is adaptive to the fraction of Byzantine nodes in the system in the following fashion. MiRACLE is designed so that in case most of the ES nodes submit the same solution (which happens when the fraction of Byzantine nodes is small), it is correct with high probability and the submitted solution is immediately accepted by miners. However, in case no single solution is the clear winner, a second random ES is chosen to start a round, and so on. (Footnote: Rounds are different from block-generation epochs and are specific to CICs. A round may span multiple blocks.) This latter case happens more often when the fraction of Byzantine nodes is large, that is, close to its assumed maximum. Hence data comes in sequentially from one ES after the other.
We present a novel formulation of this problem as a multiple hypothesis testing problem where we have one hypothesis for each solution submitted and the test must decide which hypothesis is true. The winning hypothesis decides the solution. This model is not obvious because traditional hypothesis testing, for example for signals in noise, deals with real-world artifacts with known probability distributions, and not with intelligent adversaries who can behave arbitrarily. MiRACLE uses multiple parallel Sequential Probability Ratio Tests (SPRT) to choose the correct solution.
MiRACLE guarantees that the correct solution is chosen with probability set by a design parameter in the worst case of . Moreover, MiRACLE is optimal in that it minimizes the expected number of rounds in the special case that all Byzantine nodes submit the same incorrect solution and . Interestingly, the optimal strategy for Byzantine nodes to make MiRACLE accept an incorrect solution is to submit the same incorrect solution. Our analysis for MiRACLE, however, assumes quasi-honest nodes submit correct solutions. Since MiRACLE itself does not enforce honest behavior other mechanisms are necessary to make quasi-honest nodes submit correct solutions. Without additional mechanisms, a quasi-honest node may be tempted to free-load on solutions of earlier rounds, thus rendering our analysis for MiRACLE invalid. Although we present MiRACLE in the context of CIC execution, it can be used in a broader suite of security problems, some of which we describe in §IV-C.
Our second contribution is the Randomness Inserted Contract Execution (RICE) algorithm which deliberately adds randomness to the solution of each round so that the CIC solution changes from one round to the next thereby mitigating the free-loading attack. We prove that no matter the size of the CIC computation, RICE adds little computational overhead. To be precise, if denotes the total computation for a transaction execution without RICE, then RICE adds computation overhead of the . In the presence of free-loading attacks, we show via a game theoretic analysis that honest behavior from all quasi-honest nodes is an Nash equilibrium with .
The third contribution is an implementation of YODA in Ethereum using the geth Client interface version 1.8.7. The implementation includes MiRACLE, RICE, and most other features of YODA. Using a testbed consisting of 8 physical machines emulating 1600 nodes, we compare the performance of YODA with that of Ethereum in terms of the gas usage possible per unit time as well as the total gas usage of contracts. We also study the number of rounds for MiRACLE to converge for different design parameters. We show how MiRACLE automatically reduces the number of rounds if the fraction of Byzantine nodes is less than the worst case design scenario. We also show that RICE hardly adds any overhead to the CIC computation.
Other contributions are several mechanisms in YODA to make it robust to attacks such as chain forking attacks (CFA), DoS attacks, Sybil attacks, and collusion attacks. These mechanisms include using Sortition along with commitments and incentives. We provide a game theoretic analysis for a collusion attack by quasi-honest nodes which increase their utility by sharing information using recent advancements in zk-Techniques. We show that the game has a Nash Equilibrium. These are added contributions which may be of interest to the reader.
Paper Organization: In Section II we present a system overview of YODA and mention requirements for off-chain execution. This is followed by the Threat Model and an overview of Challenges in Section III. In Section IV we present the MiRACLE algorithm followed by a detailed description of RICE in Section V. We then put all ingredients together to describe the YODA protocol in Section VI followed by a Security Analysis in Section VII. We then present details of our Ethereum-based implementation along with experimental results in Section VIII. Section IX describes related work. We conclude with a discussion in Section X.
II System Model and its CIC Requirements
In this section we give a description of the blockchain system considered in the rest of the paper and then discuss requirements for off-chain executions of CICs.
Blockchain. In YODA, a blockchain is an append-only distributed ledger consisting of data elements called blocks. A blockchain starts with a pre-defined genesis block. Every new block contains a hash pointer to the previous block. Together the blocks hence form a structure resembling a chain, linked by hash pointers. The blockchain contains accounts with balances, smart contracts, and transactions. We refer to any entity participating in YODA as a node. Without loss of generality each node in YODA controls an account in the ledger with its private key. The account itself is identified by the public key. A transaction in the system is a signed message broadcast by a node which can be included in a block provided it satisfies certain validity constraints. For example, transactions modifying an account balance must be signed by the corresponding private key to be valid.
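The hash-pointer structure described above can be illustrated with a minimal sketch. The block layout (`prev_hash`, `transactions`) and the helper names are hypothetical and chosen only for illustration; they are not YODA's or Ethereum's actual block format.

```python
import hashlib
import json


def block_hash(block):
    """Hash a block by serializing it deterministically (illustrative encoding)."""
    encoded = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()


def append_block(chain, transactions):
    """Append a block whose hash pointer refers to the current last block."""
    chain.append({"prev_hash": block_hash(chain[-1]), "transactions": transactions})


def is_consistent(chain):
    """Check that every block points to the hash of its predecessor."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )


# Pre-defined genesis block followed by two appended blocks.
chain = [{"prev_hash": None, "transactions": []}]
append_block(chain, ["tx-1"])
append_block(chain, ["tx-2", "tx-3"])
```

Changing the contents of any earlier block changes its hash, so the pointer stored in its successor no longer matches; this is what makes the ledger append-only in practice.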
II-A Smart Contracts and its Execution
We model smart contracts as executable objects in the blockchain with the following semantics. A smart contract in YODA is denoted by its state . Here denotes its immutable globally unique cryptographic identity, represents its immutable program logic consisting of functions. The state can be modified by a transaction invoking its code and its execution can only begin at a function. In YODA smart contracts are stateful and state is maintained as pairs which together we refer to as .
A transaction consists of tuple . is a globally unique transaction identity and is the code it invokes. All external inputs required for the functions are part of and consists of meta-information about the account that generated the transactions along with a cryptographic proof of its authenticity. Hereafter we assume all transactions are validated using before being included in a block and hence we drop . In YODA, transactions can be used to transfer tokens, create contracts or execute functions from smart contracts.
Executions of functions in YODA are modeled as transaction driven state transitions. We use to denote a Deterministic State Transition Machine (Possibly Turing complete). Formally, executes ,
is the state of the contract after executing . (Footnote: We use for function executions, with the function and its inputs on its right and the returned value on its left.)
II-B Intensive Transactions
Intensive Transactions (IT) are transactions which cannot be executed on-chain due to either of two problems: their execution time exceeding the typical interspacing between blocks, or competing with PoW time (the Verifier’s Dilemma). The first problem can occur in permissioned ledgers such as Hyperledger, Quorum, R3-Corda etc., and both problems can occur in permissionless blockchains such as Ethereum and Bitcoin. The exact definition of an IT will depend on parameters of the blockchain system under consideration. Transactions which are not ITs are called non-ITs.
We give one example of a CIC for Ethereum using the concept of gas, a measure of the cost of program execution. Ethereum associates a fixed cost with each machine level instruction that a smart contract executes and enforces the constraint that all transactions included in a block can consume a maximum combined gas of blockGasLimit, which is set to prevent the Verifier’s Dilemma. Every time a transaction is broadcast, its creator specifies gasLimit, an upper bound on the gas it is expected to consume. Clearly gasLimit must not exceed blockGasLimit for any transaction to be included in a block. Transactions which violate this condition are thus ITs.
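The gas-based criterion above can be sketched as a simple check. The class and field names below (`Transaction`, `gas_limit`) and the numeric block limit are hypothetical placeholders for illustration; only the comparison against the block gas limit comes from the text.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    tx_id: str
    gas_limit: int  # creator-specified upper bound on gas consumption


def is_intensive(tx, block_gas_limit):
    """A transaction whose gas bound exceeds the block limit cannot fit on-chain."""
    return tx.gas_limit > block_gas_limit


# Illustrative value only; the real limit is a parameter of the chain.
BLOCK_GAS_LIMIT = 8_000_000

small = Transaction("transfer", gas_limit=21_000)
heavy = Transaction("train-model", gas_limit=500_000_000)
```

Under this sketch, `heavy` would have to be executed off-chain, while `small` could be included in a block as usual.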
We enable ITs in YODA by executing them off-chain in parallel to the mechanism of blockchain consensus, thereby sidestepping the problem of the Verifier’s Dilemma. For completeness, we design YODA to allow non-IT transactions to also be executed off-chain as long as they satisfy the gas requirements stated in §VI-B.
II-C Computationally Intensive Contracts
We term all smart contracts that execute ITs as Computationally Intensive Contracts. YODA selects a subset of nodes known as the Execution Set (ES) to execute . An ES is chosen from a larger Sybil-resistant set called a Stake Pool (SP). Nodes join SP by depositing a fixed amount of tokens as their stake. YODA forfeits the deposit of misbehaving nodes using techniques developed in §(CIC enabling). Note that SP could potentially include all nodes in the entire network, especially in a small blockchain network maintained by several hundred nodes, such as applications that are built using Hyperledger. We discuss details of SP in §VI-A.
Since YODA allows transactions of different CICs as well as on-chain transactions to run in parallel, we make some assumptions as well as provide special mechanisms to prevent race conditions from occurring. First, we assume that off-chain transactions of a CIC cannot modify the storage of any other smart contract, whether CIC or non-CIC. Second, we assume that on-chain transactions cannot modify the state of any CIC if any off-chain transaction execution of the CIC is in progress. Third, we ensure that all transactions of all CICs are ordered on-chain before their execution by a special on-chain smart contract called the Master Contract (MC). (Footnote: Systems can be built where all rules in MC are part of the basic system protocol instead of making it a smart contract. Our implementation makes MC a smart contract.) It maintains a queue for each CIC. The transaction at the queue’s head is executed first. In addition, MC embodies the rules of YODA like creating CICs, ordering their transactions and initiating the off-chain execution process, running MiRACLE, distributing rewards to the ES nodes, enabling YODA nodes to join SP by collecting their deposits etc. A non-IT transaction can be executed on-chain and an IT must be executed off-chain. This allows parallel execution of ITs of different CICs off-chain.
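The ordering rules above can be sketched as a small bookkeeping object: one FIFO queue per CIC, at most one off-chain execution in progress per CIC, and on-chain writes to a CIC blocked while an execution is running. The class and method names are hypothetical; this is an illustration of the stated rules under those assumptions, not the MC contract used in the implementation.

```python
from collections import deque


class MasterContractSketch:
    def __init__(self):
        self.queues = {}   # CIC id -> FIFO queue of pending transaction ids
        self.running = {}  # CIC id -> transaction currently executing off-chain

    def submit(self, cic_id, tx_id):
        """Order a transaction on-chain by appending it to its CIC's queue."""
        self.queues.setdefault(cic_id, deque()).append(tx_id)

    def start_next(self, cic_id):
        """Start the transaction at the queue's head, if the CIC is idle."""
        queue = self.queues.get(cic_id)
        if cic_id in self.running or not queue:
            return None
        self.running[cic_id] = queue.popleft()
        return self.running[cic_id]

    def finish(self, cic_id):
        """Mark the current off-chain execution of a CIC as terminated."""
        return self.running.pop(cic_id, None)

    def on_chain_write_allowed(self, cic_id):
        """On-chain transactions may not modify a CIC that is mid-execution."""
        return cic_id not in self.running
```

Because each CIC has its own queue, transactions of different CICs can be in progress at the same time, which matches the parallelism described above.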
For the rest of the paper, unless otherwise stated, if some event has negligible probability, it means it happens with probability at most for some security parameter . Any event whose complement occurs with negligible probability is said to occur with high probability or .
II-D Requirements of off-chain executions
We divide the requirements for off-chain computation into two categories: Correctness and Performance. For a contract with state and for an IT an off-chain execution of is said to be correct iff the state of the contract after the off-chain execution is identical to the state that an on-chain execution of would produce if all constraints mentioned in §II-B were removed. We conjecture that the following requirements are necessary and sufficient conditions for an off-chain execution to be correct.
Termination. Due to the gasLimit associated with an on-chain transaction, it is guaranteed to terminate. Likewise, the off-chain CIC execution mechanism of YODA must also terminate within a delay given that the gasLimit of the IT is finite.
Validity. YODA must terminate the off-chain execution by producing the correct execution result.
Agreement. All non-Byzantine nodes in YODA (including miners), agree about within a bounded delay after the off-chain execution of CIC terminates.
Availability. The post-execution state should be available to all nodes within a bounded delay from the termination point.
Performance wise we seek YODA to satisfy the following requirements to scale and run many CICs in parallel.
Oblivious Execution. ITs are never executed or verified on-chain either fully or partially. Miners are thus oblivious of the ITs’ execution.
Efficient. Only a small subset of nodes in YODA should compute
Adaptive. YODA must be adaptive to the true adversarial control in the system, that is it must require fewer nodes on average to execute an IT if the adversarial control is lower.
weak-Fairness. All ES nodes correctly executing get the same positive token reward.
III Threat model, Assumptions and Challenges
YODA assumes that the underlying blockchain provides guarantees about its Safety and Availability. Safety means that all smart contract code is executed correctly on-chain, and Availability means that all transactions sent to the blockchain get included within a bounded delay.
III-A Threat Model and Assumptions
We consider two kinds of nodes in SP: Byzantine and quasi-honest. Byzantine nodes are controlled by an adversary and these nodes can deviate arbitrarily from the protocol provided by YODA. The adversary can make all Byzantine nodes collude with perfect clock synchrony. They can add or drop messages arbitrarily and not execute CICs correctly. We assume that at most fraction of nodes in SP are Byzantine, and these can be arbitrarily selected by the adversary at the start of each round. However, during the progress of a single round (§IV) the adversary cannot compromise more nodes. Additionally, the adversary has all intermediate contract state information from all previous rounds and can successfully communicate this state information (potentially false) about previous rounds to any node in the system before the start of a round. However, we assume cryptographic primitives are computationally secure.
Systems like permissionless blockchains cannot be assumed to have all honest nodes. They rely heavily on incentives and the rationality of nodes in order to work correctly. However, building systems secure only against rational nodes which seek to maximize their utility is again impractical, as some nodes could be Byzantine and not care about their returns. Practical systems will have a combination of rational and Byzantine nodes. Modeling rational nodes in these systems, taking into account all possible means of profits, costs, and attacks, is non-trivial and is beyond the scope of the paper. However, to bring our model close to reality we work with quasi-honest nodes which deviate from the protocol in the manner mentioned below.
Quasi-Honest. Quasi-honest nodes will skip execution of an IT either completely or partially, for example by not executing some of its instructions, if and only if the expected reward in doing so is more than that for executing the transaction faithfully. They do not share information with any other node if doing so may forfeit their stake deposits. They are conservative when estimating the potential impact of Byzantine adversaries in the system, i.e., a quasi-honest node, while computing its utility, assumes that the Byzantine adversary acts towards minimizing its rewards.
Quasi-honest nodes may skip computation using one of two methods. The first is “free-loading”, where they attempt to identify the correct CIC state using (i) the state information of that same transaction already published on the blockchain by other ES nodes, or (ii) state information that can be derived or verified by re-executing the RICE algorithm using the same information in (i).
The second is by colluding with other ES nodes of the same round to submit an identical CIC solution without evaluating the IT. A quasi-honest node only colludes with nodes whose membership in the ES it can verify. YODA has checks (ref. §VI-D) which prevent nodes from directly proving their ES membership. Hence nodes must use Zero-Knowledge-Proof techniques like zk-SNARK to establish their membership in ES. YODA allows usage of smart contracts to establish rules of collusion. However, we assume that a quasi-honest node does not know for sure if the node it is colluding with is quasi-honest or Byzantine. Additionally, both free-loading and collusion have costs associated with them which are due to processing of intermediate storages, producing and verifying zk-Proofs, bandwidth and computation power. In case neither “free-loading” nor “collusion” gives a better expected reward than executing CICs correctly, a quasi-honest node will execute the IT correctly.
Other Assumptions. We assume the network is synchronous, that is, transactions broadcast by nodes get delivered within a known bounded delay. However unlike  we do not assume the existence of an overlay network among SP nodes. Also, we do not assume the presence of a secure broadcast channel or a PKI system. We abstract the source of randomness required for RICE to a function RandomGen() (described formally in §VI) which can be accessed by all nodes in YODA. This can be built as a part of YODA or as an external source using techniques from [25, 16, 11].
Apart from recently studied challenges like preventing Sybils in SP [19, 1] and generating an unbiased source of randomness in the distributed setting to select ES randomly [11, 16, 25], our system must tackle the following. The first challenge is to prevent quasi-honest nodes from free-loading and collusion. The second challenge is that since an ES is small, it is vulnerable to DoS attacks that cost less than a DoS attack on the entire blockchain network.
Further, existing methods for on-chain verification of off-chain computation fall into two categories: symmetric and asymmetric. In symmetric methods, a contract is re-executed on-chain by miners and hence its computation is limited due to the Verifier’s Dilemma. In asymmetric methods, the on-chain verification uses a method other than recomputation, either by generating a proof-of-execution, possibly interactive, or via non-interactive methods using zk-SNARK. These techniques are associated with non-deterministic overheads that depend upon the CIC (e.g. Truebit) or are too computationally expensive for general purpose hardware. Designing a verification scheme that adds a small constant computational overhead of on-chain verification, independent of the CIC, is new. We are the first to take this approach. Also, since CICs are never executed on-chain, achieving efficient off-chain execution while also providing guarantees of correctness is challenging.
IV MiRACLE: Multi-Round Adaptive Consensus using Likelihood Estimation
In this section we describe MiRACLE, an algorithm to determine the correct CIC state in the presence of Byzantine adversaries. The goal of MiRACLE is to be efficient, by making only a few nodes compute the CIC, and at the same time to automatically adapt to the fraction of Byzantine nodes in the SP: the smaller this fraction is, the faster MiRACLE terminates and the fewer nodes perform the off-chain computation. We prove that MiRACLE is optimal in the expected number of rounds if the Byzantine fraction of nodes in SP equals its assumed maximum.
IV-A Problem and Simplistic algorithms
In the event an IT is published in a block which invokes a function from , we wish to achieve consensus on or equivalently on by making only one or more small randomly chosen Execution Sets (ES) of nodes execute the CIC instead of all nodes. The node in ES executes where may differ from in case the node does not honestly execute the IT. Nodes then reveal by publishing them on the blockchain. For the ease of exposition, we consider the digest to be the Cryptographic hash of . While describing RICE in §V, we add more information to the digest besides the hash to address certain attacks.
Since quasi-honest nodes are forced to be honest by mechanisms in RICE (ref. §V), we treat the quasi-honest nodes as honest in this section.
We require MiRACLE to reach consensus on incorrectly with probability less than a user-specified , given and while minimizing the expected number of rounds to terminate.
To motivate MiRACLE we first describe two simplistic algorithms for off-chain execution of CICs. In all algorithms below, each node in SP is selected to belong to an ES with probability independent of other nodes. Note that .
Naive Solution 1 (NS1): Suppose we use a single subset ES from SP to compute the CIC. If more than 50% of nodes in the ES publish the same solution then this is chosen as the new state of the CIC. One shortcoming of this scheme is that for small , the size of ES must be a large fraction of SP. A second shortcoming is that if the actual fraction of Byzantine nodes is much smaller than then we end up using an ES much larger than required. For example, with as the error probability of accepting incorrect solution, starting with an and , the required .
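To make the cost of NS1 concrete, the following sketch computes the probability that Byzantine nodes form a strict majority of a single ES and searches for the smallest ES size that keeps this probability below a target. It rests on simplifying assumptions that are not taken from the text: the ES has a fixed size `es_size` (in the scheme above the size is random), each member is Byzantine independently with probability `f`, and all Byzantine members submit the same incorrect solution. The names `f`, `beta` and `es_size` are illustrative.

```python
from math import exp, lgamma, log


def binom_pmf(n, k, p):
    """Binomial probability of k successes in n trials, computed in log space."""
    log_coeff = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coeff + k * log(p) + (n - k) * log(1.0 - p))


def byzantine_majority_prob(es_size, f):
    """Probability that more than half of the ES members are Byzantine."""
    need = es_size // 2 + 1
    return sum(binom_pmf(es_size, k, f) for k in range(need, es_size + 1))


def smallest_es_size(f, beta, limit=2001):
    """Smallest odd ES size whose Byzantine-majority probability is below beta."""
    for n in range(1, limit + 1, 2):
        if byzantine_majority_prob(n, f) < beta:
            return n
    return None  # no size up to `limit` meets the target
```

Running `smallest_es_size` for values of `f` approaching one half shows how quickly the required single-round ES grows, which is the first shortcoming noted above.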
Naive Solution 2 (NS2): In this solution we relax the requirement of achieving consensus in one round. In every round a “no consensus” decision is allowed, which triggers a subsequent round. This continues until some round reaches consensus.
Choose to be any small value. Then with a small probability . In such cases check if the number of nodes publishing the same solution exceeds . If so, declare this the correct solution and terminate. The advantage is that we can use an ES in each round of size smaller than the ES used in NS1 for a large range of values. Note that in NS2, in certain rare instances, a single round may still be sufficient to reach consensus.
One shortcoming is that the number of rounds to terminate can be large because NS2 does not optimally combine the results of all rounds in order to reach consensus. Results of one round are forgotten in future rounds.
In MiRACLE, we employ the multi-round strategy of NS2 to achieve gains in case . In contrast to NS2, each round uses all published results hitherto to decide whether to terminate or not.
IV-B Design and Algorithm
For a given , let be the unique values submitted up to and including the round. Let denote the number of times is repeated in the round, and
denote the corresponding random variable. Let denote the total number of submissions (ES nodes) in the round, i.e
Primer on Hypothesis Testing. The problem hence is one of deciding among one of many solutions submitted on the blockchain. We present a novel model of this problem as a multiple hypothesis testing problem where we have one hypothesis for each solution submitted and the test must decide which hypothesis is true.
To understand how a hypothesis test works consider an example of a communication system in which one person is transmitting one image selected from a known master set to a receiver over a noisy channel. The received image, we call the observation, is corrupted by noise. The task is to decide which image was transmitted given the observation. To solve the problem, one proposes a hypothesis for each potential image which claims that the corresponding image was transmitted. The goal is to determine which hypothesis is true. To do so, the receiver computes the probability of the observation conditioned on every hypothesis being true, which is called the likelihood of that hypothesis. Only if one likelihood is much larger than the others can one say with confidence that the corresponding hypothesis is true with high probability.
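The image example can be made concrete with a toy channel. In the sketch below, images are short bit strings and the channel flips each bit independently with probability `flip_prob`; both the noise model and the function names are assumptions made for illustration only.

```python
def likelihood(observation, candidate, flip_prob):
    """Probability of the observation given that `candidate` was transmitted."""
    flips = sum(o != c for o, c in zip(observation, candidate))
    unchanged = len(candidate) - flips
    return (flip_prob ** flips) * ((1.0 - flip_prob) ** unchanged)


def most_likely(observation, candidates, flip_prob):
    """Score one hypothesis per candidate and return the best with all scores."""
    scores = {c: likelihood(observation, c, flip_prob) for c in candidates}
    best = max(scores, key=scores.get)
    return best, scores


# Master set of three candidate images and one noisy observation.
best, scores = most_likely("0001", ["0000", "1111", "0110"], flip_prob=0.1)
```

Here the hypothesis for "0000" has a likelihood many times larger than the other two, so the receiver can choose it with confidence; if two likelihoods were close, more observations would be needed before deciding.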
Our novel contribution in MiRACLE is to formulate the problem of determining the correct CIC solution as a hypothesis testing problem. This is not an obvious formulation because traditional hypothesis tests are designed to handle real-world phenomena such as signals in noise. In our problem we have an intelligent adversary which is hard to model as there is no restriction on what solution it can submit. Hence unlike the image problem described above, there is no master set of potential correct solutions.
However, due to our assumption of Byzantine nodes having maximum fraction in SP, in this worst case we do have a probability distribution on the total number of Byzantine nodes in an ES. Similarly we have a probability distribution of the total number of quasi-honest nodes in an ES. These probability distributions are sufficient for us to compute a likelihood and perform a hypothesis test, in the case the adversary submits only a single incorrect solution. In this case MiRACLE is optimal in the number of rounds it takes to converge. However, if he submits many solutions then the assumed distributions for different hypotheses are not perfect. Fortunately, if the adversary submits more than one solution, it is to his own detriment as MiRACLE will converge to the correct solution faster than if he submitted a single solution.
MiRACLE uses multiple parallel Sequential Probability Ratio Tests (SPRT) to choose the correct solution  whose details are given next.
MiRACLE as Parallel SPRT:
We model the problem as simultaneous two-hypotheses Sequential Probability Ratio Tests (SPRT) . The th SPRT is given by:
Null Hypothesis: is the solution
Alternative Hypothesis: is not the solution
The log-likelihood is defined as the log of the ratio of probabilities of the observations () conditioned on the two hypotheses. We denote the log-likelihood after rounds as and proceed as follows. For appropriately chosen threshold , in round we perform
When any one SPRT, say the , terminates in favour of its Null Hypothesis , we halt all other SPRTs and declare as the digest. If no SPRT decides, we proceed to the next round. We prove several results pertaining to this parallel SPRT in §VII-A.
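The parallel-SPRT procedure can be sketched as follows. The likelihood model here is an assumption made for illustration, not the formula from the paper: each of `pool_size` SP nodes is taken to submit a given digest in a round with probability `q * (1 - f_max)` if that digest is the solution and `q * f_max` if it is not, where `q` is the ES selection probability and `f_max` the assumed maximum Byzantine fraction. The threshold is Wald's standard choice `ln((1 - beta) / beta)` for a target error `beta`. All of these names are hypothetical.

```python
import math


def miracle_sketch(rounds, pool_size, q, f_max, beta):
    """Run one SPRT per digest over successive rounds of submission counts.

    `rounds` is a list of dicts mapping digest -> number of ES nodes that
    submitted it in that round. Returns (winning digest, round number), or
    (None, rounds used) if no test crossed the threshold.
    """
    p_null = q * (1.0 - f_max)  # submission probability if the digest is correct
    p_alt = q * f_max           # submission probability if it is not
    gain = math.log(p_null / p_alt)
    drift = math.log((1.0 - p_null) / (1.0 - p_alt))
    threshold = math.log((1.0 - beta) / beta)

    totals = {}  # cumulative submission count per digest over all rounds
    for round_no, counts in enumerate(rounds, start=1):
        for digest, count in counts.items():
            totals[digest] = totals.get(digest, 0) + count
        trials = round_no * pool_size
        # Log-likelihood ratio of each digest after `round_no` rounds.
        scores = {d: c * gain + (trials - c) * drift for d, c in totals.items()}
        if scores:
            best = max(scores, key=scores.get)
            if scores[best] > threshold:
                return best, round_no  # halt all other tests
    return None, len(rounds)
```

For instance, with `pool_size=1000`, `q=0.05`, `f_max=0.4` and `beta=1e-6`, a round with 60 identical submissions decides immediately, while a 45-to-5 split needs a second round of evidence before the leading digest crosses the threshold. This mirrors the adaptive behaviour described above: the weaker the adversary's showing, the fewer rounds are needed.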
IV-C Other Applications of MiRACLE
MiRACLE can be used to solve various security problems other than blockchain CIC execution. We discuss some of them here.
Authenticated decentralized data feeding. Towncrier is an existing system for feeding external data into smart contracts. Its security guarantees rely on correct functioning of Intel SGX, which essentially boils down to trusting Intel hardware. Using MiRACLE we can create an efficient, authenticated data feeding system without relying on trusted hardware. The setup for this application is identical to the setup considered while describing MiRACLE where SP is replaced by a swarm of data feeders among which less than are compromised.
Autonomous Internet-of-Things (IoT) Swarm Configuration. Consider an autonomous IoT swarm, where each device in the swarm was initialized with an identical program, but during runtime a set of nodes less than are corrupted by a Byzantine adversary. In this system, in a synchronous network setting, a new device joining the swarm can use MiRACLE to efficiently identify and download the correct program from its peers.
V RICE: Randomness Inserted Contract Execution
V-A Motivation and Objective
MiRACLE by itself does not force quasi-honest nodes to behave honestly. In fact, a free-loading attack by quasi-honest nodes is a real possibility. Here quasi-honest nodes in an ES of one round may simply replay the digest with highest log-likelihood of previous submissions in earlier rounds, thereby guessing the correct digest with large probability and also saving on heavy CIC computation.
In this section we describe Randomness Inserted Contract Execution (RICE), a procedure to pseudo-randomly change the digest from one round to the next to mitigate the free-loading problem. We prove that RICE adds little overhead to the overall CIC computation. We modify the digests so that despite digests changing from one round to the next, the MC is able to map digests from different rounds to the same CIC state they represent. Other attacks, such as collusion of quasi-honest nodes within the same ES and copying digests submitted by nodes in the same round are addressed in §VII-C-VII-D.
V-B Design of RICE
Setup. Call (or simply ) a unique digest of . For example, can represent the root of a Merkle tree where leaves of the tree correspond to the pairs of the storage. When a CIC is created on-chain, a publicly available random is deliberately added to the state of the contract. Hence now becomes . This remains in the state for the entire life of the CIC and is updated for each transaction using the following procedure.
For each round in MiRACLE nodes in ES start execution of CIC with the seed as given below where represents the starting of round.
and if MiRACLE terminates in round with final state then
Array Model for RICE Execution. Consider an execution model in which all machine level instructions that executes are stored in an imaginary “instruction array”, that is the instruction executed is stored in the array element. RICE then interrupts execution of at certain intermediate indices of the array and updates the with Hash. (Blockchains such as Ethereum count gas used after each instruction. Hence additional interrupts are not required for Ethereum-like blockchains.) By choosing these different indices pseudorandomly in different rounds, RICE produces a different every round. The digest we submit is the tuple obtained after executing . This ensures that the digests from different rounds can be correlated using their values. Due to the deterministic nature of the CIC, all nodes computing correctly will have the same across rounds. Their values will be identical within any round, but will differ from one round to the next. Malicious nodes may submit the correct but the wrong , an attack we guard against in §VI-F.
Details. Let denote the indices in the instruction array where . Note that is unknown a priori, but due to the gas limit included in , it is guaranteed to be bounded. Thus to update , instead of executing the entire array in a single run, RICE progressively executes a subarray of the array between two indices (initial) and (final), updates the digest, and repeats the process with the next sub-array and so on until it reaches .
Formally, let denote an arbitrary subarray from with and its initial and final index. RICE consists of a new deterministic contract execution function with the following semantics. Inputs to are two indices , an intermediate CIC state and . Given input , executes subarray (both inclusive) with storage and transaction . Post execution returns a potentially modified contract state and the last successfully executed index. In the special case where for some , runs only till and returns as its output. Formally,
After execution of , RICE computes and combined with the final content of we define in RICE as the tuple . Algorithm 2 describes the RICE algorithm.
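The array model can be illustrated with a toy sketch. It is not the authors' Algorithm 2: the instruction representation, the `storage_root` helper (a flat hash standing in for a Merkle root), the choice of SHA-256, and the update rule "new seed = Hash(old seed concatenated with the current root)" are all assumptions made for illustration only.

```python
import hashlib
from typing import Callable, Dict, Iterable, List, Tuple

Storage = Dict[str, int]
Instruction = Callable[[Storage], None]


def storage_root(storage: Storage) -> bytes:
    """Hypothetical stand-in for the Merkle root of the contract storage."""
    h = hashlib.sha256()
    for key in sorted(storage):
        h.update(f"{key}={storage[key]};".encode())
    return h.digest()


def run_rice(instructions: List[Instruction], storage: Storage,
             seed: bytes, update_indices: Iterable[int]) -> Tuple[bytes, bytes]:
    """Execute the instruction array, updating the seed at the given indices.

    Returns the (root, seed) pair that plays the role of the RICE digest.
    """
    storage = dict(storage)          # do not mutate the caller's state
    stops = set(update_indices)
    for i, instr in enumerate(instructions):
        instr(storage)
        if i in stops:
            # ASSUMPTION: seed update is Hash(seed || root of current storage)
            seed = hashlib.sha256(seed + storage_root(storage)).digest()
    return storage_root(storage), seed


def make_add(k: int) -> Instruction:
    """Toy instruction: add k to storage slot 'x'."""
    def instr(s: Storage) -> None:
        s["x"] = s.get("x", 0) + k
    return instr


program = [make_add(k) for k in range(1, 11)]
# Two "rounds" that interrupt the same program at different indices.
root_a, seed_a = run_rice(program, {}, b"seed", [2, 5])
root_b, seed_b = run_rice(program, {}, b"seed", [3, 7])
```

In this toy run the two rounds end with the same root, because the program is deterministic, while the seeds differ because the interruption points differ. This is the property that lets digests from different rounds be matched by root while preventing a digest from being replayed verbatim.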
Choosing the indices. A naive strategy is to choose indices as multiples of a fixed number, say . Note that cannot be a function of which is not known prior to computing . This strategy leads to overheads of . There is another problem, namely that the indices do not change from one round to the next. As a result, a node can free-load by asking ES nodes from previous rounds to reveal the values at these indices, a request to which an adversary might respond. Making things worse, can provide proof that revealed roots are generated using the correct procedure of repeatedly hashing the at different indices. Note that this procedure does not prove that the root values are correct and indeed correspond to the actual root values at the indices. Call the root values of after the instruction corresponding to , all of which may be incorrect. Then repeating starting with the initial known seed in its round and iterating over all it shows that corresponds to these root values. An ES node seeing this proof may be tempted to believe they are correct if the is the digest with highest log-likelihood (§IV). It may hence not execute and instead simply reuse to compute its digest for the current round. In case was not the digest corresponding to the correct execution of CIC, this may lead to an attack where a Byzantine adversary introduces a false digest by offering values to ES nodes from an earlier round.
A second naive strategy is to choose the sub-array sizes randomly but with mean size exponentially increasing as progresses. For example, choose randomly from where increments by 1 from one sub-array to the next. On the positive side, this will lead to seed updates (and consequently overheads of that order) and also will produce a different set of indices from one round to the next with large probability. However, there remains the problem of skipping a large fraction of towards the end. At any point near the end of , by looking at gas limit of ES nodes might be able to identify that the current update is the last seed update and the next update is beyond . The of a round requires but after the last update remains fixed. By design the already published on the chain with highest log-likelihood will be correct with large probability, and therefore nodes may skip computing beyond the last update by simply using with highest log-likelihood to generate their digests. For this strategy the last update can be at most prior to thereby leading to nodes skipping as much as half of the computation. Hence although overheads have reduced to , the computation skipped at the end is . We seek to find a sweet spot between the two with our choice of indices for RICE.
RICE uses a hybrid of the two index locating procedures described above. The idea is to divide the array into segments of size where . In other words, every value of repeats times. Consequently the segment sizes increase sub-exponentially. We choose one updating index in each segment and update with Hash where is the intermediate state of after executing all instructions up to . Thus, like the second naive scheme the sub-array size increases but much more gradually so that the last sub-array which might be skipped is smaller. More precisely, for a segment of size we choose the index to update the as INT away from the beginning of the segment, where INT denotes the integer whose binary representation is identical to the first bits of .
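The hybrid index selection can be sketched as follows. The exact segment schedule is not recoverable from the text here, so the sketch assumes segments of size 2^j with each value of j repeated j times. It also assumes SHA-256 as the hash, and it substitutes the index itself for the storage root when updating the seed, since no real contract is executed. The function name `rice_update_indices` is hypothetical.

```python
import hashlib
from typing import List


def rice_update_indices(seed: bytes, total_instructions: int) -> List[int]:
    """Return the indices at which the seed would be updated (toy model).

    ASSUMPTION: segment sizes are 2**j and each j repeats j times, giving
    sub-exponential growth. The offset inside a segment of size 2**j is the
    integer formed by the first j bits of the current seed.
    """
    indices: List[int] = []
    start = 0          # first index of the current segment
    j = 1
    seed_bits = len(seed) * 8
    while start < total_instructions:
        for _ in range(j):
            size = 2 ** j
            # First j bits of the seed, interpreted as an integer in [0, 2**j).
            offset = int.from_bytes(seed, "big") >> (seed_bits - j)
            idx = start + offset
            if idx >= total_instructions:
                return indices
            indices.append(idx)
            # ASSUMPTION: the real update hashes the seed with the storage
            # root at idx; the index is used here as a placeholder root.
            seed = hashlib.sha256(seed + idx.to_bytes(8, "big")).digest()
            start += size
        j += 1
    return indices


example = rice_update_indices(b"\x00" * 32, 1000)
```

Because every later offset depends on the seed produced at the previous update, a different starting seed in another round is expected to shift most of the update indices, while the number of updates stays small relative to the array length.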
VI Enabling CICs in Blockchain
In this section we give a broad system-level overview of the YODA protocol. We have described two of its key ingredients in detail: MiRACLE, which enables efficient CIC computation with small sets of nodes, and RICE which makes guessing the seed of one round difficult from submitted digests in earlier rounds. The other mechanisms we describe here address the other challenges mentioned in §III-B, such as preventing sybil attacks, collusion, DDoS, and certain variants of free-loading attacks.
The following functions are used in YODA.
CheckSort This function on invocation internally runs Secret Cryptographic Sortition (SCS) . The is used to set the probability that an SP node is in ES. If CheckSort() returns it implies the node was not selected. Otherwise the node is selected and is indistinguishable from a truly random number to anyone without . However, it is easy to prove that , given and .
RandomGen() on invocation produces an unbiased distributed random string. It can be practically built using the Randhound protocol given in , the NIST randomness beacon potentially relayed through a data feeding mechanism like Towncrier , or protocols based on MiRACLE as described in §IV-C.
Stake Pool. YODA prevents sybil entries in SP . To join SP, a node needs to deposit stake . This also works as insurance for misbehavior of SP nodes. SP once selected remains valid for a system defined interval of time denoted by beyond which YODA re-initiates the SP selection procedure. During re-balancing, rewards to SP nodes that try to withdraw from the previous SP are only given upon termination of all CICs from the previous epoch. Nodes willing to continue to the next SP need only send a transaction showing their willingness without making any further deposits.
CIC creation and deployment. To deploy a CIC with state on the blockchain, a node broadcasts a transaction requesting creation of CIC containing the tuple . Miners use RandomGen() to generate a and a unique identity for the CIC. Then we deploy on the blockchain like any other smart-contract.
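The interface of CheckSort can be sketched as below. This is only an illustration of the selection logic: a real Secret Cryptographic Sortition uses a verifiable random function so that others can check the result against the node's public key, whereas this sketch uses a keyed HMAC, which is secret and deterministic but not publicly verifiable. The function name `check_sort`, its parameters, and the use of `None` for "not selected" are assumptions.

```python
import hashlib
import hmac
from typing import Optional


def check_sort(secret_key: bytes, seed: bytes, probability: float) -> Optional[bytes]:
    """Toy sortition: return a pseudo-random tag if selected, else None.

    ASSUMPTION: HMAC-SHA256 stands in for the VRF used by real sortition.
    The node is selected when the tag, read as a 256-bit integer, falls
    below probability * 2**256.
    """
    tag = hmac.new(secret_key, seed, hashlib.sha256).digest()
    threshold = int(probability * 2 ** 256)
    if int.from_bytes(tag, "big") < threshold:
        return tag
    return None


# Each node evaluates the function locally with its own key and the public seed.
selected = [
    i for i in range(2000)
    if check_sort(f"node-{i}".encode(), b"round-seed", 0.1) is not None
]
```

Without the key, the tag cannot be predicted, which is what keeps ES membership secret until the node chooses to reveal it.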
With these preliminaries, we describe the 5 steps for off-chain execution of CIC.
VI-B S1. CIC Transaction Deployment
On receiving an IT, , miners generate string using RandomGen() which is used in CheckSort() to elect an ES. It is important that the be created during or only after inclusion of on-chain. Otherwise, if the is known a priori, the node generating can perform the following attack to dominate the ES formed. It can enroll with key-pairs in SP such that the Sortition Check §VI-C results for the key-pairs will guarantee it a large membership in ES. It then broadcasts the transaction, dominates the resulting ES, and submits false solutions which may be accepted.
Gas requirements. The creator of deposits in the MC where is a minimum amount to pay for the fixed costs of S4 and S5, and and denote the gas price and gas limit respectively specified by . Extra stake after execution of is refunded.
VI-C S2. Sortition Check and RICE of CICs
Since any ES is small, we face challenges like Increased Malicious Fraction, and Low cost DoS attacks described in §III-B. Increased Malicious Fraction in ES is partially mitigated using Sortition which selects nodes at random. This prevents Byzantine nodes from joining the ES at will. It is further addressed by MiRACLE which can tolerate occasional large fractions of malicious nodes in ES (§IV).
By using SCS to select ES, we protect them from DoS attacks since their selection to the ES is secret until they reveal the fact. Since YODA uses a commit-reveal mechanism (see below), ES nodes are not vulnerable to DoS attacks until after they submit their commit transaction. After the commit step, ES nodes may be easier to identify and hence the DoS attack can be more effective. However, the second step involves an ES node broadcasting a single on-chain transaction after a certain number of blocks have been generated. We assume that a node is sufficiently DoS resilient to be able to receive block headers in a timely manner and also to broadcast a small transaction.
All nodes selected in ES then execute the corresponding CIC in RICE as given in §V, generate the corresponding RICE digest and proceed to S3.
VI-D S3. Commitment and Release
Let , and denote the digest of , and the result of CheckSort() respectively for node . The commitment generates is and is given by
Assuming existence of a VRF and Ideal Hashing, for and hence even if . Nodes in ES then broadcast to the blockchain as a transaction which miners include on-chain if the node is in SP.
ES nodes must broadcast their commitment within a time window or commitment period starting from the block that includes . Window is transaction dependent, recorded in MC and measured in number of blocks. It is set based on the gas limit mentioned in as by design will run for at most instructions.
A node is required to keep secret during this period and forfeits its deposit if it fails to do so. This deters ES nodes from colluding. However, as mentioned in §III-A, an ES node can use ZK-proofs to prove that it is in an ES without revealing . We perform a game theoretic analysis of such an attack in §VII-D.
After , nodes in ES wait for a buffer period before sending their unhashed digest and sortition result to the blockchain. Nodes in ES which have submitted commitments earlier, are required to submit a transaction containing within a time window or release period and failure to do so results in their forfeiting deposits and being removed from SP.
The reason for keeping this buffer period of length is to prevent an adversary from launching a DoS attack which we term the Chain Forking Attack (CFA). CFA can occur in blockchains where block creation does not guarantee block finalization and nodes need to wait for a certain number of blocks before becoming certain about a block’s finality . Assume the absence of this buffer period. If an honest node then publishes its opened commitment after and expects its inclusion in a future block, an adversary can create an alternate chain where it includes this transaction before the end of and can thereby penalize the honest node. To prevent this from happening, the introduction of between and ensures that the attacker will have to create a fork which is long enough to be prohibitively expensive to create.
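The commit, buffer and release periods can be sketched as a small state machine over block heights. The commitment formula is not reproduced in this text, so the sketch assumes a plain hash over the digest, the sortition result and a node identifier. All names (`make_commitment`, `verify_reveal`, `phase`) and the window parameters are hypothetical.

```python
import hashlib


def _encode(*parts: bytes) -> bytes:
    """Length-prefix each part so that concatenation is unambiguous."""
    return b"".join(len(p).to_bytes(4, "big") + p for p in parts)


def make_commitment(digest: bytes, sortition_result: bytes, node_id: bytes) -> bytes:
    # ASSUMPTION: commitment = SHA-256 over (digest, sortition result, node id).
    return hashlib.sha256(_encode(digest, sortition_result, node_id)).digest()


def verify_reveal(commitment: bytes, digest: bytes,
                  sortition_result: bytes, node_id: bytes) -> bool:
    """Check that the revealed values open the earlier commitment."""
    return make_commitment(digest, sortition_result, node_id) == commitment


def phase(block_height: int, start_block: int, commit_window: int,
          buffer_len: int, release_window: int) -> str:
    """Classify a block height relative to the block that included the transaction."""
    elapsed = block_height - start_block
    if elapsed < 0:
        return "not_started"
    if elapsed < commit_window:
        return "commit"
    if elapsed < commit_window + buffer_len:
        return "buffer"      # reveals sent here are too early
    if elapsed < commit_window + buffer_len + release_window:
        return "release"
    return "closed"          # unrevealed commitments forfeit their deposit
```

Binding the node identifier into the commitment means that a copied commitment cannot later be opened by a different node. The buffer phase separates the last valid commit block from the first valid reveal block, so a reveal can only be moved back into the commit window by a fork at least as long as the buffer.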
VI-E S4. MiRACLE for CICs
The blockchain miners then execute one round of MiRACLE using the submitted digests. All digests with the same are considered by MiRACLE to be the same solution irrespective of which they contain. Steps S2-S4 are repeated if necessary till MiRACLE converges.
VI-F S5. Reward Distribution and Cleanup
After MiRACLE, any node in ES broadcasts one or more transactions to the blockchain containing new state updating the state to corresponding to the winning digest.
To disincentivize nodes from behaving arbitrarily, YODA rewards ES nodes as follows. Let denote the round in which MiRACLE terminates with . The deposits of all ES nodes who submitted a digest with different root from the winning one are forfeited. For a round , let be the different values submitted in digests containing and let be their count. YODA then rewards only the ES nodes corresponding to the for which where . YODA confiscates the deposit of all ES nodes for which and YODA neither rewards nor punishes the rest. These forfeited deposits are either burned or transferred to the MC.
The intuition behind using thresholds is as follows. Although MiRACLE identifies the correct root w.h.p., it cannot say which of two digests, both containing the correct root in a particular round, is correct. Rewarding both would encourage free-loading. A naive solution would be to reward the set of nodes corresponding to and punish the rest. There are rare instances in which Byzantine nodes can exceed quasi-honest nodes in a round. If Byzantine nodes publish the correct but with an incorrect seed, the naive method would severely punish the honest nodes. The set of quasi-honest nodes will however not be a very small fraction of an ES. Hence threshold is chosen small enough to ensure that quasi-honest nodes which behave honestly will not be punished , while punishing lone quasi-honest nodes who try to guess the correct seed. Quasi-honest nodes have to collude in large numbers to cross the threshold, an attack which is non-trivial and analyzed in §VII-D.
Lastly, blockchain miners perform a cleanup phase, in which they deallocate space used in execution of which includes the storage commitments, sortition results etc. Following this, miners check whether the transaction queue is empty. If it is not, notifies the SP nodes to initiate the protocol for off-chain execution and the cycle continues.
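The threshold-based settlement can be sketched as below. The threshold expressions are not reproduced in this text, so the sketch assumes two hypothetical fractions, `reward_frac` and `punish_frac`, measured against the number of nodes that submitted the winning root in the round. The function name and the outcome labels are likewise illustrative.

```python
from collections import Counter
from typing import Dict, Tuple


def settle_round(submissions: Dict[str, Tuple[bytes, bytes]], winning_root: bytes,
                 reward_frac: float, punish_frac: float) -> Dict[str, str]:
    """Map each node to 'reward', 'forfeit' or 'neutral' for one round.

    submissions maps node id -> (root, seed).
    ASSUMPTION: thresholds are fractions of the nodes that submitted the
    winning root, with punish_frac <= reward_frac.
    """
    correct_seeds = [seed for root, seed in submissions.values()
                     if root == winning_root]
    counts = Counter(correct_seeds)
    total = len(correct_seeds)
    outcome: Dict[str, str] = {}
    for node, (root, seed) in submissions.items():
        if root != winning_root:
            outcome[node] = "forfeit"       # wrong root: deposit is lost
            continue
        share = counts[seed] / total
        if share >= reward_frac:
            outcome[node] = "reward"        # seed shared by a large group
        elif share < punish_frac:
            outcome[node] = "forfeit"       # lone guess at the seed
        else:
            outcome[node] = "neutral"       # neither rewarded nor punished
    return outcome
```

With six nodes agreeing on one seed, one node guessing another seed, and three nodes submitting a wrong root, thresholds of 0.5 and 0.2 reward the six, and both the lone guesser and the wrong-root nodes forfeit their deposits.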
VII Security Analysis
In this section we analyze the security properties of YODA. We first analyze MiRACLE and prove many results, the most important being that it is optimal in the expected number of rounds under certain constraints. Incidentally, given MiRACLE, Byzantine nodes maximize the probability of choosing an incorrect solution by all submitting the same incorrect solution. Ironically, MiRACLE is optimal given this particular strategy of Byzantine nodes.
We then analyze RICE, proving bounds on the number of update indices, and the amount of computation that can be skipped at the end. We also prove that w.h.p. every round will have update indices which have not been encountered in previous rounds. This makes free-loading difficult.
We then present a Game-theoretic analysis of our incentive schemes proving them to have Nash Equilibria . We finally stitch together all our results to show how they meet the requirements mentioned in §II-D. Lastly we discuss why guarantees in YODA are likely to work in an even more realistic setting with a stronger adversary and where quasi-honest nodes are allowed more protocol deviations.
VII-A MiRACLE Analysis
In this section we present the security analysis and guarantees provided by MiRACLE. Let be the total size of SP i.e containing fraction of Byzantine nodes. Let the probability of any node in SP getting chosen for an ES be . Let denote the total number of Byzantine and quasi-honest (here assumed to be honest) nodes in ES. Let denote Bernoulli random variable indicating if the Byzantine or honest node is selected for the ES or not. Then
We approximate by a Gaussian distribution, since each is a sum of a large number of random variables. Let denote the mean of respectively and
denote their variances. These are
Consider the case where all Byzantine nodes consistently provide the same solution. Let the solutions be and . The problem of determining the correct solution boils down to choosing between two hypotheses over multiple rounds: is the correct solution; . Let denote the number of solutions equal to in round . Then the optimal solution is given by an SPRT in which the log-likelihood ratio after rounds is
If , then the SPRT chooses . This is equivalent to MiRACLE. ∎
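The Gaussian approximation used in the proof above rests on the fact that the numbers of Byzantine and honest nodes in an ES are sums of independent Bernoulli selections, so each is binomial. The sketch below computes the resulting means and variances and checks them by simulation. The parameter names `pool_size`, `byz_fraction` and `select_prob` are assumed stand-ins for the symbols of the analysis, which are not reproduced in this text.

```python
import random
from typing import Tuple


def es_moments(pool_size: int, byz_fraction: float,
               select_prob: float) -> Tuple[Tuple[float, float], Tuple[float, float]]:
    """Return ((mean, var) of Byzantine count, (mean, var) of honest count).

    Each of n nodes is selected independently with probability q, so the
    count is Binomial(n, q) with mean n*q and variance n*q*(1-q).
    """
    n_byz = round(byz_fraction * pool_size)
    n_hon = pool_size - n_byz

    def moments(n: int) -> Tuple[float, float]:
        return n * select_prob, n * select_prob * (1 - select_prob)

    return moments(n_byz), moments(n_hon)


def sample_es(pool_size: int, byz_fraction: float, select_prob: float,
              rng: random.Random) -> Tuple[int, int]:
    """Draw one ES and return (Byzantine count, honest count)."""
    n_byz = round(byz_fraction * pool_size)
    n_hon = pool_size - n_byz
    byz = sum(rng.random() < select_prob for _ in range(n_byz))
    hon = sum(rng.random() < select_prob for _ in range(n_hon))
    return byz, hon


rng = random.Random(0)
draws = [sample_es(1000, 0.3, 0.1, rng) for _ in range(2000)]
empirical_byz_mean = sum(b for b, _ in draws) / len(draws)
```

The simulation also shows why an individual ES can contain a much larger Byzantine share than the pool as a whole: the counts fluctuate around their means with a standard deviation that is not negligible for small ES sizes.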
Remark 1: MiRACLE is optimal in case . In case , the expected number of rounds will be less than that specified in (3), while still ensuring that the probability of incorrectly deciding is less than . In this sense MiRACLE is adaptive to as shown in Figure 1.
Remark 2: The probability of choosing a node to belong to the ES, , can be set to any value which fixes the expected size of ES in any round. We recommend that be chosen such that if all nodes in the first ES are honest then the log-likelihood crosses the threshold in that round itself. This ensures that multiple rounds can occur only if there are some Byzantine nodes in SP. The solid lines in Figure 2 show the minimum required with =1600 and for one-round consensus.
Although we have proved that MiRACLE is optimal if an adversary chooses a single solution, the question arises as to whether the adversary has a better strategy for MiRACLE in which it chooses more than one solution. The next theorem states that this is not the case. Indeed, the best strategy for the adversary, given that YODA uses MiRACLE is to choose only a single incorrect solution.
With MiRACLE as the consensus algorithm, the best strategy for an adversary controlling all Byzantine nodes is to submit only a single incorrect solution. Any other strategy reduces the probability of the system choosing an incorrect solution.
Consider a strategy in which the adversary submits solution for and the solution submitted by honest nodes is . Let us call this strategy ST1. Consider another strategy ST2, in which the adversary only submits a single solution denoted by and the honest nodes submit . For an round in ST2, let the corresponding likelihood be and the number of solutions submitted be . We assume that the total number of submissions for an round is the same in both ST1 and ST2. Hence, it is trivial to see that . Also,
Hence the result follows. ∎
For each solution submitted MiRACLE has one log-likelihood which is compared with . The question arises as to whether or not more than one log-likelihood can simultaneously exceed the threshold, thus leading to multiple solutions for the transaction. The following theorem proves that this cannot happen.
MiRACLE can terminate with only a single solution being adjudged correct.
VII-B RICE Analysis
In this section we prove that RICE adds low overhead and is secure.
Given RICE terminates in an subarray of size , let be the number of times is interrupted to update the in RICE, then
Proof Sketch: Due to the slow increase strategy, the total number of times storage root is updated i.e is
which proves the lemma.
Given the bounds on the number of times is updated in RICE, we now proceed to find a relationship between the and . To do so, we first find a relationship between and and then proceed further.
The relationship between and is
Proof Sketch: From the slow increase strategy we have
Simplifying further we have the result.
(RICE Efficiency) The number of times is updated in a single RICE run i.e is .
We now identify the possible point where last update happens in RICE for
With as the length of array representing , the last update happens at fraction prior to .
Let be the index of last seed update and let lies inside a segment of length then . Hence we want is
As discussed earlier, it is important for different rounds to use a different set of indices for updating the seed to prevent free-loading attacks. We thus prove how RICE prevents free-loading attacks in YODA.
(Unmatched index) Let RICE and RICE, with be two RICE runs in distinct rounds in YODA. Then and denote the set of indices where is updated in rounds and . An index is said to be an Unmatched index with respect to RICE iff .
Strong Unmatched Index. An index in RICE is called Strong Unmatched Index if it is an unmatched index RICE where .
We now evaluate the distribution of number of strong unmatched index in RICE. The presence of even a single strong unmatched index implies that even if an adversary assists a quasi-honest node in a free-loading attack, by revealing root values corresponding to indices in earlier rounds, these prove insufficient to compute the digest of RICE.
The occurrence of a strong unmatched index in a segment of size in RICE is a Bernoulli random variable with mean lower bounded by . This is a tight bound and the event corresponding to the lower bound occurs when all previous rounds have strong unmatched indices in this segment. There are such segments of length . In the case where all RICE have strong unmatched indices in all segments, is random variable with Poisson binomial distribution  with mean and variance given in the statement of the theorem. ∎
Given the above we proceed to find a lower bound on the probability that the number of strong unmatched index is greater than some . We achieve this by finding a lower bound on in the round with the tail of a Binomial Random variable. As CIC size we prove that for a series , the tail of the binomial and hence goes to 1. This means that the number of strong indices increases without bound w.h.p. as increases. This in turn reduces the chances of success of a free-loading attack as it becomes virtually impossible for a node to guess the values at an increasingly large set of strongly unmatched indices.
Let in the round of MiRACLE ends in a segment of size . Given defined as in Theorem VII.5, we can lower bound the tail probability of i.e for any with the tail probability of for any where is a binomial distribution with trials with as success probability of each trial.
Proof Sketch: Call the occurrence of a strong unmatched index in a segment of size in RICE as a trial in that segment. The trial is a Bernoulli random variable with mean lower bounded by . If then the mean has lower bound and if the mean is lower bounded by 0. Hence which is the sum of all trials has tail distribution strictly higher than the tail of the sum of i.i.d. Bernoulli random variables with mean . There are number of segments with . The result follows.
As , .
Proof Sketch: Choose . The result follows from the above Theorem and the use of the well-known bound on the tail distribution of given by
Remark 1: Recall that MiRACLE allows the system designer to choose an appropriate size to achieve an expected number of rounds. In this way, the number of rounds can be limited to less than a constant .
Remark 2: Since grows unboundedly with , it follows that for large sized ITs, and some finite round , the number of strong indices grows unboundedly w.h.p. Since the values at these strong indices are not known w.h.p. (except for trivial CICs where storage does not change over indices) the final also cannot be known w.h.p.
The next result shows that the probability of occurrence of any particular seed value is vanishingly small assuming the roots at different strong indices are mutually independent.
Let the probability mass distribution of the at all strongly unmatched indices in round be upper bounded by for some . Let the last segment of the CIC be of size . Then as the probability mass function of the at the end of RICE is negligible assuming an ideal hash function, and that the root at different unmatched indices are mutually independent.
Proof Sketch: Let denote the strongly unmatched indices, and and the corresponding root and seed. Since the hash function is ideal, it maps unique inputs to unique outputs.
Thus . The last equality is due to the independence assumption. We assume that , the seed at the first strong unmatched index, is known to the node and hence .
Denoting as the final seed, we have . Since is larger than w.h.p. as we have .
Remark: The roots of indices “far apart” being independent is not unrealistic, except for trivial CICs. Strong unmatched indices are in different segments and hence except for neighboring indices, they are separated by whole segments, and hence we conjecture that the independence assumption is a good approximation in practice. We also conjecture that the same result holds for weaker assumptions than stated in the theorem and leave the proof for future work.
VII-C Free-loading attack
We now analyze a free-loading attack where a quasi-honest node skips computation of the CIC by using information available on the blockchain and/or state information of from previous rounds received from an adversary §V. We consider the best case scenario for the free-loading node where it knows the correct of the w.h.p. but has to guess the . We analyze the case where Byzantine nodes have maximum fraction in SP and all submit the same incorrect with the same in order to maximize the probability of MiRACLE selecting their solution, and where quasi-honest nodes do not collude.
Denote the profile where all quasi-honest nodes execute the CIC as and the profile where only a single quasi-honest node free-loads as . With the analysis of MiRACLE with honest and Byzantine nodes holds. Hence quasi-honest nodes win reward with probability , and lose their deposits with probability . The cost of computing the CIC is . Hence the utility for with this profile is
Let be the probability of guessing the correct seed while free-loading. If it guesses the correct seed then its probability of winning a reward and losing its deposit is and as above. If it guesses the wrong seed then it loses its deposit. We denote the cost of bandwidth consumed for downloading intermediate of previous rounds from an adversary and analyzing them to predict the by . Then the utility
where the last approximation is due to the fact that is vanishingly small in practice and is a design parameter chosen to be small. Since , that is the reward must be more than the cost of computation, we see that (12) is true. Hence profile is a Nash equilibrium .
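The comparison between the two profiles can be made concrete with a small numeric sketch. The utility expressions are reconstructed from the prose above rather than copied from the paper's equations, and every parameter name and value is an assumption chosen for illustration.

```python
def honest_utility(p_reward: float, p_forfeit: float, reward: float,
                   deposit: float, compute_cost: float) -> float:
    """Expected utility of a quasi-honest node that executes the CIC."""
    return p_reward * reward - p_forfeit * deposit - compute_cost


def freeload_utility(p_guess: float, p_reward: float, p_forfeit: float,
                     reward: float, deposit: float,
                     bandwidth_cost: float) -> float:
    """Expected utility of a lone free-loading node.

    With probability p_guess the seed guess is right and the node faces the
    same reward/forfeit odds as an honest node; otherwise it loses its
    deposit. It pays a bandwidth cost instead of the computation cost.
    """
    on_correct_guess = p_reward * reward - p_forfeit * deposit
    return (p_guess * on_correct_guess
            - (1 - p_guess) * deposit
            - bandwidth_cost)


# Hypothetical numbers: reward 10, deposit 100, computation cost 5.
u_honest = honest_utility(0.99, 0.01, 10.0, 100.0, 5.0)
u_free = freeload_utility(1e-6, 0.99, 0.01, 10.0, 100.0, 0.1)
```

With a vanishing probability of guessing the seed, the free-loader's expected utility is dominated by the loss of its deposit, so deviating from honest execution is not profitable as long as the reward exceeds the computation cost.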
VII-D Collusion Attack
We now consider the case where a group of ES nodes collude to submit a common seed. We assume they know the correct root w.h.p, that Byzantine nodes all submit the same incorrect with the same in order to maximize the probability of MiRACLE selecting their solution, and that all other quasi-honest nodes execute the CIC correctly. Suppose with probability and with probability . The computation cost of colluding requires solution of ZK-proofs since nodes need to prove they belong to ES without revealing their Sortition results, and also requires creation of appropriate smart contracts to punish nodes which deviate from the collusion pact. Denote the associated costs by and this profile by .
In case the Byzantine nodes win MiRACLE, lose their deposits. In case the correct root is selected by MiRACLE, win a reward with probability , and lose their deposits with probability . Hence utility for node
In case , is a -Nash equilibrium  with . In this special case, if the is larger than the CIC computation cost itself, the nodes are better off being honest. Note that higher increases