The introduction of decentralized blockchains, initially conceived as a means for cash payments without a trusted intermediary in the form of Bitcoin [bitcoin], has sparked a flurry of interest in developing decentralized applications for a wide variety of areas. Smart contracts [szabo1997], programs whose consistent global execution is enforced by a consensus protocol among a decentralized network of nodes rather than by a single server, have in recent years gained traction as a means of dis-intermediating a wide assortment of non-financial tasks.
Unfortunately, the base layers of fully decentralized blockchain systems, as deployed presently, are extremely limited in their transactional throughput. The Bitcoin Core blockchain currently processes an average of 3 transactions per second [scaling] and is operating at maximum capacity, while the Ethereum blockchain is currently capped at 15 transactions per second and is often operating at its maximum capacity as well. In contrast, global payment processors handle on the order of tens of thousands of transactions per second [scaling].
This work presents a study on numerous scaling methodologies devised over the past decade and analyzes their benefits and challenges. A novel scaling direction that is composed of well-known and studied components is then introduced. A side chain construction is used to avoid any mainnet protocol changes, which would require coordinating a fork with client developers, users, application developers, and exchanges, while allowing innovations and improvements to be deployed [sidechains, forks]. Merged block production is used to secure the side chain, borrowing security from the parent chain similarly to Proof-of-Work (PoW) merged mining [merged_mining_namecoin, merged_mining_forum]. Finally, only a bare minimum of functionality is enabled on the side chain, allowing only financial transactions and atomic movement of funds between the side chain and its parent chain.
Sections 2 and 3 present fundamental technical preliminaries and previous scaling proposals. Section 4 describes our proposal for scaling using side chains with merged block production. A short discussion on practical considerations for deploying our side chain construction is contained in Section 5.
This section gives a high-level overview of the fundamental techniques on which our proposed scaling solution builds. It also provides more precise definitions for various terms that are in widespread use but are often poorly, incorrectly, or incompletely defined.
2.1 Decentralized Blockchains
At its core, a blockchain is nothing more than a database consisting of a cryptographic-hash-linked chain of blocks that defines a total ordering of transactions and is deterministically verifiable. A blockchain is deterministically verifiable if its correctness can be determined using only data contained within itself (i.e., it is self-consistent); this is accomplished through the use of cryptographic hashing to link blocks together and digital signatures for each transaction. Execution engines can be built on top of this ordered (i.e., serialized) data, such as the Ethereum Virtual Machine (EVM) in Ethereum [ethereum]. Pictured in Figure 1 is a high-level representation of the blockchain data structure, with each block (square) containing a hash of the previous block in the chain (arrow).
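The hash-linking described above can be illustrated with a minimal sketch. It assumes SHA-256 over a JSON encoding of each block, and the names `block_hash`, `build_chain`, `verify_chain`, and `GENESIS_PREV` are hypothetical; per-transaction digital signatures are omitted, so this covers only the linking half of deterministic verification.

```python
import hashlib
import json

# Hypothetical placeholder used as the "previous hash" of the first block.
GENESIS_PREV = "0" * 64


def block_hash(prev_hash: str, transactions: list) -> str:
    """Hash a block's transactions together with its predecessor's hash."""
    payload = json.dumps({"prev": prev_hash, "txs": transactions}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(batches: list) -> list:
    """Build a chain in which every block commits to the previous block."""
    chain = []
    prev = GENESIS_PREV
    for txs in batches:
        digest = block_hash(prev, txs)
        chain.append({"prev": prev, "txs": txs, "hash": digest})
        prev = digest
    return chain


def verify_chain(chain: list) -> bool:
    """Check the chain using only data contained in the chain itself."""
    prev = GENESIS_PREV
    for block in chain:
        if block["prev"] != prev:
            return False  # link to the previous block is broken
        if block_hash(block["prev"], block["txs"]) != block["hash"]:
            return False  # block contents do not match the recorded hash
        prev = block["hash"]
    return True
```

Because each hash covers the previous hash, altering any historical transaction invalidates that block and, by extension, every block after it.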
Decentralized blockchains are of particular interest for use in systems with open participation that must be publicly auditable, such as for a dis-intermediated payments system.
A blockchain system is distributed if it is replicated across more than one physical computer (a system participant, i.e., a node).
A blockchain system is trustless if it does not require trusting any external resources to interact with it (including, but not limited to, a third-party computer, an escrow service, a trusted notary, and binary executables with no source code or non-deterministic builds). For a more concrete definition, see Definition 3.4.
A blockchain system is permissionless if it can be read from and written to without requiring permission from existing system participants. In practical terms, this property requires users to be able to participate as block producers of the system without first being part of the system.
We now compose these three preceding definitions to arrive at a usable and useful definition of the term “decentralized.”
A blockchain system is decentralized if and only if it is 1) distributed and 2) trustless and 3) permissionless.
The above definition of decentralized will be used as a metric when analyzing the properties of various scaling proposals, including the one presented in this work.
This work, given its focus on practical decentralized scaling of public blockchains, makes use of the following definitions in addition to the previous ones:
A blockchain system is weakly trustless if it does not require trusting external resources in excess of the trust that must be placed in practice on internal resources [trusting_trust] (such as the existence of hardware backdoors [meltdown, spectre_v1, spectre_v2, foreshadow], compiler bugs [solidity_bug], signature checking for binary packages of client software [leftpad] and dependencies, backdoors and bugs in cryptographic hash functions and digital signature schemes [heartbleed], etc.).
A blockchain system is weakly decentralized if and only if it is 1) distributed and 2) weakly trustless and 3) permissionless.
2.2 Consensus Protocols
Permissionless blockchains (see § 2.1 for definitions) require a consensus protocol for writing blocks to the database [consensus]. Given their permissionless nature, they require some form of Sybil resistance mechanism (i.e., impersonating multiple users should not grant more power over the protocol). Nakamoto Consensus, introduced for use in Bitcoin [bitcoin], is the first consensus protocol that performs in a permissionless setting, and leverages Proof-of-Work (PoW) as its Sybil resistance mechanism. A cryptographic hash function can be used and modeled as a random oracle to determine a block producer [backbone], with each participant having a chance of being a block producer proportional to the computational power they devote to the protocol. The longest chain (or, more precisely, heaviest chain) of valid blocks, each with sufficient proofs of work, is considered the canonical chain; this is the fork choice rule generally used by the family of consensus protocols based on Nakamoto Consensus.
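As a rough illustration of the two ideas in this paragraph, the sketch below mines by searching for a nonce whose hash falls below a target, and selects the heaviest (rather than merely longest) fork. It is a simplification under stated assumptions: SHA-256 stands in for the random oracle, work per block is estimated as 2^256 / (target + 1), and all function names are hypothetical.

```python
import hashlib
from typing import Optional


def pow_hash(header: bytes, nonce: int) -> int:
    """Interpret SHA-256(header || nonce) as an integer."""
    digest = hashlib.sha256(header + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big")


def mine(header: bytes, target: int, max_nonce: int = 1_000_000) -> Optional[int]:
    """Search for a nonce whose hash is below the target (smaller target = harder)."""
    for nonce in range(max_nonce):
        if pow_hash(header, nonce) < target:
            return nonce
    return None


def chain_work(targets: list) -> int:
    """Expected number of hashes needed to produce blocks with these targets."""
    return sum(2**256 // (t + 1) for t in targets)


def heaviest_chain(forks: dict) -> str:
    """Fork choice: pick the fork with the most accumulated work."""
    return max(forks, key=lambda name: chain_work(forks[name]))
```

With per-block targets as input, a fork of two hard blocks can outweigh a fork of three easy ones, which is why "heaviest" is the more precise term.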
Concerns over the shortcomings of Nakamoto Consensus-style protocols (lack of strong finality guarantees, shown in Figures 2 and 3, where a previously-shorter chain overtakes the previously-longest chain and becomes the new canonical chain) along with the continued use of PoW (enormous energy waste) have led to the search for new consensus protocols that employ stake-based Sybil-resistance, known as Proof-of-Stake (PoS) [ethereum, ouroboros, avalanche].
The replication of the many desirable properties of Nakamoto Consensus, and the few undesirable properties, with stake-based consensus protocols has been unsuccessful, however. No blockchain system employing a stake-based Sybil-resistant permissionless consensus protocol with satisfactory properties has been deployed in practice as of this writing [longest_chain_pos], and to the best of our knowledge, despite many claims to the contrary, no such system has been devised to date. In addition, purely stake-based consensus protocols are not decentralized by Definition 2.4 (specifically Definition 2.3), as 1) there is no known way to fairly distribute stake initially and 2) participation in the system requires coins or tokens to be purchased from system participants. Both of these problems are solved by Nakamoto Consensus’ use of PoW.
2.3 The Scaling Problem
Limitations in transactional throughput for public blockchains, colloquially known as “The Scaling Problem,” present a significant roadblock to real-world adoption of such systems. The root cause of the scaling bottleneck is that every block in a decentralized blockchain network must be fully validated by every node (client) on the network. Transaction throughput can be increased trivially by sacrificing security or decentralization, so the true challenge lies in designing a system that is scalable, secure, and decentralized.
Almost universally, scaling proposals aim to not have every node validate every block, but rather have a subset of nodes validate a subset of (relevant, in some way) transactions. Layer-1 scaling proposals aim to increase the transaction throughput of the base chain, and generally employ sharding [scaling], splitting up transactions and state into individual shards instead of collecting them all into a single logical chain. In this model, transaction throughput is increased proportionally to the number of shards (minus overhead for managing shards). It should be noted that PoW-based sharding is undesirable, as security would be split between shards.
Layer-2 scaling proposals [sidechains, state_channels, counterfactual, plasma, lighting_network] aim to move groups of transactions off-chain (or, more precisely, away from the parent chain, e.g., Bitcoin or Ethereum, and onto a second layer network). Transactions can be grouped by type, or application. For example, micropayments can be done through a payment channel network [lighting_network], or transactions specific to a single application can be processed through their own chain [plasma].
We shall see later that the scaling solution proposed in this work is also centered around the idea of not having every node on the network validate every transaction, namely by enabling a majority of clients on the network to be minimal-trust (i.e., weakly trustless) light clients.
2.4 Improvements to Block Propagation
As mentioned in § 2.3, the scaling bottleneck is due to every node fully validating every block. A prerequisite to validating a new block when one is produced by the network is downloading it in its entirety.
To this end, several techniques have been proposed [bip152, graphene, minisketch], based largely on set reconciliation using Bloom filters [bloom], invertible Bloom lookup tables [iblt], and sketches [sketches], to allow blocks to be constructed locally based on an extremely compressed representation of a block’s included transactions. Using these, block-producing nodes can spread out their network bandwidth requirements over the entire length of the blocktime, downloading transactions as they are propagated through the network only once.
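The idea of rebuilding a block locally from a compressed representation can be sketched as follows. This is a deliberately simplified stand-in for the cited schemes: it uses truncated SHA-256 digests as short identifiers (BIP 152, for instance, uses keyed SipHash instead), and the names `short_id`, `compress_block`, and `reconstruct_block` are hypothetical.

```python
import hashlib


def short_id(tx: bytes, n_bytes: int = 6) -> bytes:
    """Truncated digest used as a compact transaction identifier (assumed scheme)."""
    return hashlib.sha256(tx).digest()[:n_bytes]


def compress_block(txs: list) -> list:
    """Sender side: announce only short identifiers, in block order."""
    return [short_id(tx) for tx in txs]


def reconstruct_block(short_ids: list, mempool: list):
    """Receiver side: rebuild the block from transactions already seen.

    Returns the block (with None for unknown transactions) and the
    positions that still need to be requested from a peer.
    """
    index = {short_id(tx): tx for tx in mempool}
    block, missing = [], []
    for position, sid in enumerate(short_ids):
        tx = index.get(sid)
        block.append(tx)
        if tx is None:
            missing.append(position)
    return block, missing
```

A node whose mempool already holds most of the block's transactions only needs to fetch the few missing ones, rather than downloading every transaction a second time.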
2.5 Improvements to Zero-Confirmation Transaction Security
While orthogonal to scaling, i.e., to increasing transaction throughput, lowering latency of transactions in a blockchain system can lead to an improved user experience and is another highly desirable property.
Despite potentially long block times, techniques such as weak blocks [weak_blocks, subchains] and pre-consensus [preconsensus] can be used to provide “hints” towards upcoming blocks as a way to reduce the probability of different types of double-spending attacks being successful. This allows transactions to be accepted prior to their inclusion into a block (so-called zero-confirmation transactions), given an acceptable risk model.
Subchains [subchains] are of particular interest for scaling purposes, however. In this construction, miners that find a block satisfying a lower difficulty than required can broadcast this to the network, similar to “shares” used in mining pools. The network can then collaborate on mining a particular block, rather than mining blocks containing completely different transactions or transactions in a completely different order. Miners that do not collaborate are at a greater risk of having their block orphaned [one_tx], incentivizing collaboration. Subchains therefore allow block validation to be performed over the entire blocktime, reducing orphan rate (or increasing transaction throughput for the same orphan rate).
2.6 Fraud Proofs and Data Availability
A model for generalized fraud proofs introduced in [fraud_proofs] allows for weakly trustless light clients. Non-fully-validating nodes (known as light nodes, or light clients) only check block headers for validity (in a PoW blockchain, that valid and sufficient proof of work was done). The contents of blocks must be assumed to be too expensive for a light client to ever download and validate for even a single block.
The proposed fraud proof scheme modifies the transaction Merkle tree to add intermediate state commitments into it. A fraud proof can then consist of a parametrizable number of Merkle branches and the initial (possibly intermediate, possibly partial) state from which to begin applying transactions.
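The Merkle branches mentioned above can be illustrated with a generic sketch. It shows only the building block (proving that a leaf, such as a transaction or an intermediate state commitment, is included under a root) and not the full fraud proof scheme of [fraud_proofs]. SHA-256, the duplicate-last-node padding rule, and all function names are assumptions made for illustration; the leaf list must be non-empty.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _next_level(level: list) -> list:
    """Hash adjacent pairs; an odd level is padded by repeating its last node."""
    if len(level) % 2 == 1:
        level = level + [level[-1]]
    return [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(leaves: list) -> bytes:
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        level = _next_level(level)
    return level[0]


def merkle_branch(leaves: list, index: int) -> list:
    """Collect (sibling hash, sibling_is_left) pairs from the leaf up to the root."""
    level = [h(leaf) for leaf in leaves]
    branch = []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]
        sibling = index ^ 1
        branch.append((level[sibling], sibling < index))
        level = _next_level(level)
        index //= 2
    return branch


def verify_branch(leaf: bytes, branch: list, root: bytes) -> bool:
    """Recompute the root from a leaf and its branch; compare with the header's root."""
    node = h(leaf)
    for sibling, sibling_is_left in branch:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root
```

A light client holding only the root from a block header can check such a branch with a logarithmic number of hashes, which is what keeps fraud proofs compact.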
In addition to fraud proofs, [fraud_proofs] proposes to use erasure codes [erasure_codes] for data availability proofs. This is needed, as a fraud proof can’t be generated for an unavailable block.
The existence of compact fraud proofs and data availability proofs allows light clients to operate with reduced trust assumptions. Whereas without these proofs light clients required trust in a majority of block producers being honest, with these proofs light clients only require trust that a single honest node exists in the network that is capable of relaying proofs to them. In practice one will find that this trust assumption is not objectively stronger than the trust the vast majority of users place on, for example, the hardware manufacturer of their CPU, or the implementation and design of cryptographic hash functions without backdoors.
2.7 Validity Proofs and Succinct Arguments of Knowledge
Recent years have seen the emergence of almost-practical constructions employing succinct arguments of knowledge [zksnarks, zkstarks] that are zero-knowledge. This class of protocols allows a prover to generate a proof of an arbitrary arithmetic circuit’s correct execution over some input that can then be verified efficiently. It may initially seem that using these is superior to constructions that make use of fraud and data availability proofs in the context of layer-2 scaling techniques, as the latter rely on an assumption that the parent chain is readily available to post challenges to while the former always guarantees correct state execution. Unfortunately, circuit-based zero-knowledge protocols have fundamental limitations that make them inappropriate for use as a core component to scaling techniques.
First, proof generation is monopolistic rather than competitive as with PoW mining. Mining is a random process [bitcoin], and even a miner using pen-and-paper is capable of producing a block today if they get lucky; censoring other block producers requires a majority of mining power. Proof generation for these zero-knowledge protocols on the other hand is monopolistic: the user with the lowest-latency prover will always win the race to generate proofs first when attempting to prove execution of the same circuit over the same inputs. Such a system tends towards becoming permissioned over time, especially when incentives for dispersing proving power are non-existent, and will resemble single-operator child chain constructions in this regard—though without the exit game needed by those (§ 3.2).
Second, and more importantly, a completely transparent blockchain or layer-2 system can be rolled back in the event of an implementation bug, either with a forced re-organization [bitcoin_rollback, bitcoin_rollback_cve] or a forced special state transition to revert unwanted effects [thedao_hack_fix]. In contrast, in a system employing a circuit-based zero-knowledge protocol without full data availability with no further checks, a bug in either the implementation of the circuit or the trusted setup (if the protocol requires one) [zcash_bug] may result in permanent state corruption [zcash_bug_turnstile] that cannot be recovered from save for restarting the chain from genesis. For layer-2 constructions that have full data availability and use the zero-knowledge protocol only for proving correct execution of state transitions [roll_up], a larger surface for implementation bugs exists, as off-chain code must be implemented correctly in addition to the on-chain smart contract that verifies proofs. This is an especially egregious problem given the complexity of implementing arithmetic circuits and the current lack of mature tooling (i.e., formal verification, linting, etc.) for developing such programs. Attempts to alleviate this make the use of zero-knowledge proofs redundant, and reduce to a Merkle computer verification game [truebit], a child chain construction [plasma], or something similar.
2.8 Improvements to Initial Sync
Unbounded blockchain history growth is a potential problem when considering increasing block sizes as a means of increasing transaction throughput. Traditionally, the entire chain’s history is required to be available for deterministic verification as new nodes join the network.
State commitments [utxo_commitments] have been proposed to alleviate this issue, and are used in Ethereum in the form of the state trie [ethereum]. A commitment to the chain’s state is included in each block header, and as long as a majority of block producers are honest this can be used to fetch the current state of the head of the chain without doing a full sync from the genesis block. Using fraud proofs (§ 2.6), this can be reduced to an assumption on a single honest node in the network.
An orthogonal approach using recursive zero-knowledge proofs [coda] has also been proposed, where a proof of the correct execution of state transitions for each block is wrapped in a zero-knowledge proof. When done recursively, this allows verification of arbitrary blockchain history size in constant time. Note that as proof generation is monopolistic rather than competitive (§ 2.7) such a construction should not be enforced at the consensus layer, but only used as an optional tool to improve initial sync.
2.9 Merged Mining
Merged mining [merged_mining_namecoin, merged_mining_forum] is a means of re-using computational power across two or more chains. In order to merge mine a side chain with a parent chain, the block hash of a side chain block is included in a standardized way in the currently mined parent chain block. If the block satisfies the difficulty of either chain (with the side chain traditionally having lower difficulty) then it is considered a valid proof of work for that chain, and the block is appended to the appropriate chain [alternative_chain]. This is illustrated in Figure 4, with some blocks of the parent chain including hashes of the merge mined side chain.
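The acceptance rule described here, where one hash is checked against the difficulty of each chain, can be sketched as below. The header layout (parent header bytes followed by the side chain block hash and a nonce), the use of SHA-256, and the function names are illustrative assumptions, not the format used by any deployed merged mining scheme.

```python
import hashlib


def merged_work(parent_header: bytes, side_block_hash: bytes, nonce: int) -> int:
    """Hash a candidate parent header that embeds the side chain block hash."""
    data = parent_header + side_block_hash + nonce.to_bytes(8, "big")
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def chains_satisfied(work: int, parent_target: int, side_target: int) -> set:
    """Return the chains for which this hash is a valid proof of work.

    A hash is valid for a chain when it is below that chain's target;
    a larger target corresponds to a lower difficulty.
    """
    accepted = set()
    if work < parent_target:
        accepted.add("parent")
    if work < side_target:
        accepted.add("side")
    return accepted


def merge_mine(parent_header, side_block_hash, parent_target, side_target,
               max_nonce=100_000):
    """Try nonces and collect (nonce, accepted chains) for every useful hash."""
    found = []
    for nonce in range(max_nonce):
        work = merged_work(parent_header, side_block_hash, nonce)
        accepted = chains_satisfied(work, parent_target, side_target)
        if accepted:
            found.append((nonce, accepted))
    return found
```

When the side chain has the lower difficulty, every hash that satisfies the parent chain also satisfies the side chain, while most side chain blocks are found without producing a parent chain block.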
Note that, when implemented with a naïve longest-chain fork choice rule, this allows the side chain to re-use hashing power from the parent chain, but not borrow security.
The security of a blockchain system is the cost of changing its history (i.e., rewriting blocks through a chain re-organization).
Since parent chain blocks cannot be used for checkpointing in this scheme (especially if the difficulty of the side chain is higher than that of the parent chain), as parent chain blocks are merely superblocks [nipopow], the merge mined side chain only gains security against external hashing hardware [sia_asics].
As a consequence of this, a common criticism against merged mining is the virtually zero cost of attacking a side chain by the miners of the parent chain. As we shall see later, this criticism does not apply in any meaningful way to the scaling solution presented in this work. The opportunity for non-trivial fork choice rules (beyond the traditional longest-chain rule [bitcoin]) using the parent chain as a timestamping mechanism mitigates this issue.
3 Related Work
A wide range of scaling techniques have been proposed over the years, and are discussed in this section. More importantly, an analysis of incentives and shortcomings for each of these techniques is shown.
3.1 Pegged Side Chains
The general idea of using side chains to deploy innovations and improvements to a chain without interruption has been suggested for many years [sidechains]. Namecoin [namecoin] is one of the more prominent and early examples of a side chain that runs alongside Bitcoin and acts as a decentralized DNS.
A side chain is a blockchain that validates data from one or more other blockchains (adapted from [sidechains]).
In plain English, a side chain runs alongside a parent chain (or possibly more than one parent chain, though this configuration is not used much in practice) and “understands” the existence of a canonical parent chain. This allows it to yank data (events) from the parent chain to perform actions on its own state. Note that there are no requirements on how a side chain is secured—indeed, running a side chain with its own independent consensus protocol is generally counter-productive as this will make it less secure than its parent chain.
It is generally understood that a completely trustless and secure two-way peg of assets is impossible [drivechain], though moving assets from the parent chain to the side chain is possible using the yanking scheme described above.
While there have been attempts to implement a two-way peg using light-client proofs [pos_sidechains, pow_sidechains, drivechain], such constructions are vulnerable to a minority of block producers on the parent chain or a majority of block producers on the side chain—which presumably will be less costly to attack than the parent chain.
3.2 Child Chains
Plasma [plasma] introduced child chains as a potential scaling methodology. At a high level, a child chain operates in much the same manner as a side chain: funds (or, more generally, state) can be yanked from the parent chain (Ethereum) to the child chain, while state can be exited through a commit-challenge scheme known as an exit game.
An operator is usually responsible for collecting transactions into blocks and committing block hashes to the parent chain (this allows the child chain to borrow security from the parent chain without having to run a permissionless consensus protocol of its own). Several variations of Plasma chain constructions have been proposed, using different data models [plasma_mvp, plasma_cash], though they all have significant unresolved issues. Fungible iterations of Plasma [plasma_mvp] require mass exits, as there are no guarantees of child chain liveness or safety, while non-fungible iterations of Plasma [plasma_cash] require maintaining an ever-growing history of proofs.
Operators can generally misbehave in two ways: 1) censoring a user’s transactions or 2) attempting to fraudulently exit state (i.e., assets) back to the parent chain. Each of these is resolved by allowing users to 1) force a state transition on the child chain by executing it on the parent chain or 2) prove, on the parent chain, that an invalid state transition occurred on the child chain (which can be done implicitly in the case of a mass exit as a response to block withholding by a malicious Plasma operator).
Before diving into a more formal definition of child chains, we must first go over definitions of blockchain state. State elements in a blockchain system come in two variants: owned (associated with a public address), or unowned (not associated with any address, i.e., unused). Delegation of ownership is possible, so the “owner” of a state element can be considered both the owner of the private key associated with the address of the owned state element and any potential delegated owners that can be granted varying permissions over the state element.
An owned state element is live if it can be modified by its owner (or by its delegated owners within their permissions) in finite time. An unowned state element is live if it can become owned in finite time.
An owned state element is safe if it can never be modified by any of its non-owners (or by its delegated owners outside their permissions). Unowned state elements are trivially safe.
Interestingly, we can use these to form a more concrete, and more importantly non-circular, definition of “trustlessness”:
A blockchain system is trustless if and only if its state is (i.e., all its state elements are) both live and safe.
Using this new definition of trustlessness and the previous plain-English description of child chains, we can compose a more formal definition of the latter:
A child chain is a trustless side chain that borrows security from its parent chain through periodic commitment of block hashes (i.e., including a side chain block hash into the parent chain as a state transition).
Note that this definition does not include permissionlessness—indeed, one can see that the single Plasma operator model allows this actor to exclude any user from participating in their child chain.
Definition 3.4 is useful when evaluating layer-2 scaling techniques. Unlike layer-1 blockchain systems, which are physical systems, layer-2 constructions that anchor onto parent chains for security and other guarantees can be thought of as logical abstractions, for which the notion of trust in physical machines or persons isn’t useful.
It is trivial to see that a blockchain system that is trustless by Definition 3.4 under no assumptions will violate the FLP impossibility [flp]. Indeed, even layer-1 blockchain systems are only trustless under a majority block producer assumption: in the case of PoW blockchains, a majority of miners can censor transactions indefinitely. Therefore our goal is to minimize the assumptions needed to make such systems trustless, i.e., enable weakly-trustless systems.
It follows that child chains are dependent on an honest majority of block producers for state safety; additionally, one critical caveat is that a child chain is only trustless under the strictly stronger assumption that block space is available on the parent chain to either force a valid state transition or challenge an invalid state transition in finite time.
Channels were first envisioned as payment channels [payment_channels] between two or more parties to allow them to exchange money almost instantly without waiting for transactions to be included into blocks on a blockchain. More general-purpose state channels [state_channels, counterfactual] were later described as a mechanism for participants of the channel to agree on potentially arbitrary state rather than just payments.
A channel proceeds by unanimous agreement among a fixed set of channel participants to update its state. This allows them to have instant finality, as any participant can close the channel by publishing the agreed-upon latest state to the blockchain. A user that attempts to close a channel with an old state can be met with a challenge with a more recent state, which by definition is signed by all parties.
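The close-and-challenge flow described above can be modelled with a small sketch. It is a toy model under stated assumptions: signature verification is abstracted away by recording the set of signer names on each state, a monotonically increasing `nonce` orders states, and the classes `ChannelState` and `ChannelContract` are hypothetical. A real channel would also enforce a challenge period on the blockchain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChannelState:
    nonce: int            # higher nonce = more recent state
    balances: tuple       # e.g. (("alice", 5), ("bob", 5))
    signers: frozenset    # stand-in for verified signatures (assumption)


class ChannelContract:
    """Toy on-chain arbiter for a channel with a fixed participant set."""

    def __init__(self, participants):
        self.participants = frozenset(participants)
        self.pending = None  # state currently proposed for closing

    def _unanimous(self, state: ChannelState) -> bool:
        # A state is only valid if every participant has signed it.
        return state.signers == self.participants

    def close(self, state: ChannelState) -> None:
        if self.pending is not None:
            raise ValueError("channel is already closing")
        if not self._unanimous(state):
            raise ValueError("state is not signed by all participants")
        self.pending = state

    def challenge(self, state: ChannelState) -> bool:
        """Replace the pending state with a strictly more recent, fully signed one."""
        if self.pending is None or not self._unanimous(state):
            return False
        if state.nonce <= self.pending.nonce:
            return False
        self.pending = state
        return True
```

In this model a participant who closes with an old state is overridden by anyone holding a newer fully signed state, which is the basis of the finality claim made in the text.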
While the instant finality offered by channels is undoubtedly a significant advantage over side chains and child chains, unanimous agreement has several drawbacks. First, all channel participants must be online in order to sign and agree to a state update, and the set of participants is fixed at channel creation. Second, there is no way to distinguish a user who lost their copy of the most recent state from a malicious user attempting to close the channel with an old state to their advantage. As only the latest state is valid in channel schemes, users can only make copies of their local state, not backups; the two protect against fundamentally different classes of data failures, with copies being strictly less useful.
Payment channel networks [lighting_network] aim to alleviate the problem of having a fixed participant set by allowing agreement to take place atomically between users with bidirectional payment channels open between themselves. The issues this introduces are legion, and enumerating them is outside the scope of this work.
Note that similarly to child chains, channels can only be made to be trustless if block space is available on the blockchain they operate on. Thus there is an implicit assumption on an honest majority of block producers.
4 Scaling Decentralized Blockchains
This section discusses in depth our proposed scaling solution of a side chain with merged block production for financial transactions. The proposed construction is capable of handling a large number of transactions per second with virtually the same security and decentralization as its parent chain. Comparisons to other scaling proposals are also discussed.
Without loss of generality, we will assume the parent chain of this system is Ethereum [ethereum] and use associated vocabulary. Any chain with sufficient expressivity for smart contracts and statefulness will suffice.
4.1 A Side Chain for Financial Transactions
Despite suggestions to parallelize validation of blocks in Ethereum, the bottleneck of client software is in practice disk I/O bandwidth [eip648]. A combination of poor design of the EVM’s opcodes, complex expressivity, and use of an inefficient state trie data structure makes it challenging to develop both efficient software and potentially hardware to validate transactions.
This work proposes a side chain construction with just enough expressivity for performing financial transactions and atomic movement of funds between the side chain and its parent chain (with optional multisignature functionality to support a subset of state channel constructions). Rather than the accounts data model of the EVM, a UTXO data model is used, as the latter is simpler to reason about and optimize parallel implementations for in practice.
The side chain’s consensus protocol can range from a naïve longest-chain fork choice rule to a more involved protocol that makes use of the parent chain as a timestamping mechanism to prevent long chain reorganizations. This is discussed in more detail in Section 4.2. Security is borrowed from Ethereum by allowing block producers (i.e., miners) to mine side chain blocks at no additional cost in terms of hashing equipment or environmental costs, much like with merged mining. Unlike child chains with a single operator or a purely stake-based set of block producers, this scheme is permissionless and can be implemented today. While an ASIC-friendly hashing algorithm would be preferred [sia_asics, sia_pow], the expressivity and statefulness of Ethereum over e.g., Bitcoin allow it to be used to ensure trustless operation of the side chain.
Since the total reward for each block is the sum of the rewards for both the parent chain and the side chain, a block reward on the side chain is not needed to ensure progress of the chain [instability]. The side chain can subsist on transaction fees alone so long as the parent chain guarantees its own progress.
Thanks to the existence of general-purpose compact fraud proofs and data availability proofs (§ 2.6), the number of transactions included per block can be increased to an arbitrarily large size bound only by physical limitations of block-producing (i.e., mining) nodes, while the overall system still remains weakly decentralized.
As with all other scalability proposals, increased transaction throughput is achieved by not having every node in the network validate every transaction: with this scheme, only mining nodes are required to fully validate blocks in a short time—or, using subchains, even mining nodes do not have this requirement—while a potentially huge number of light clients (potentially billions) only need to validate compact proofs and block headers.
It should be noted that the design presented in this section is usable for building a scalable decentralized payment system, but not a general-purpose stateful smart contract execution platform.
4.2 Merged Block Production
As discussed in Section 2.9, merged mining with solely a longest-chain fork choice rule only provides security against external hashing hardware (or, more precisely, hardware that can be used for hashing with the parent chain but is not currently doing so), but provides no security against current miners. In the case that only a small minority of miners are merged mining the side chain, it becomes rather trivial to perform arbitrary-length reorganizations by merged mining hidden chains.
Borrowing inspiration from child chain constructions (§ 3.2), we propose a scheme whereby block producers (i.e., miners) include commitments to side chain blocks in the parent chain blocks they mine, effectively “merge mining” the two chains while forgoing the complexities of re-using proofs of work. In a stateful system like Ethereum, this can be implemented as simply as the miner including a transaction to a standardized contract recognized by the side chain, with only the first side chain block commitment being valid for a given parent chain block. Normal operation of the side chain and parent chain is shown in Figure 5; note that, unlike in merged mining (Figure 4), side chain blocks cannot be produced without a linked parent chain block.
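The "first commitment wins" rule described above can be illustrated with a minimal in-memory sketch. This is not the contract itself: the class `CommitmentRegistry`, its methods, and the use of a parent block height as the key are all hypothetical names and simplifications chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class CommitmentRegistry:
    """Toy model of the standardized commitment contract (hypothetical API)."""

    # parent chain block height -> side chain block hash committed in that block
    commitments: Dict[int, bytes] = field(default_factory=dict)

    def commit(self, parent_height: int, side_block_hash: bytes) -> bool:
        """Record a commitment; only the first one per parent block is valid."""
        if parent_height in self.commitments:
            return False  # a commitment already exists for this parent block
        self.commitments[parent_height] = side_block_hash
        return True

    def is_committed(self, parent_height: int, side_block_hash: bytes) -> bool:
        """A side chain block is only valid if its commitment is on the parent chain."""
        return self.commitments.get(parent_height) == side_block_hash
```

Under these assumptions, a second call to `commit` for the same parent block is rejected, and a side chain block without a matching commitment fails `is_committed`, which mirrors the property that side chain blocks cannot be produced without a linked parent chain block.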
This scheme allows for powerful organic reorganizations to occur, as shown in Figures 6 and 7. For illustrative purposes, suppose there are two miners, Alice and Bob, whose blocks are distinguished by superscripts in the figures. In Figure 6, Alice and Bob each mine on top of the longest parent chain they are aware of; in this case, there is a tie, so they are each mining on a different fork. By luck, Alice finds her block first and broadcasts it to the network. Bob, seeing this new block, then begins mining on top of this new longest chain as shown in Figure 7. Unlike child chains, which rely on users to broadcast transactions to a global canonical chain, this scheme supports organic short reorganizations, as each fork is internally consistent and locally canonical to the miner working on it. This also allows the side chain to seamlessly support persistent chain split scenarios, e.g., in the case of a contentious hard fork.
While using a naïve longest-chain fork choice rule would certainly allow this scheme to borrow security from the parent chain against external attackers, it would also allow for zero-cost arbitrary-length reorganizations to be performed by the parent chain’s miners. We can do better than that, however, using the parent chain as an always-available timestamping server. The proposed scheme, unlike traditional merged mining, does not allow for the mining of hidden side chains, since side chain blocks are only valid if a commitment to them is included in the parent chain.
A longest-chain fork choice rule can be augmented with the following high-level rules, whose operation is illustrated in Figure 8:
After a parametrizable number of parent chain blocks, valid side chain blocks can be considered finalized, i.e., irreversible. This prevents arbitrarily long chain reorganizations of the side chain, regardless of the fraction of miners engaging in merged block production.
The above rule exposes the side chain to the risk that a long reorganization of the parent chain creates an invalid chain of blocks on the side chain if a malicious miner withholds blocks, e.g., as would happen in a majority-hashpower attack on the parent chain. To work around this, a challenge can be issued for each block’s validity and availability, requiring the miner of each side chain block to bond a parametrizable amount of parent chain funds for a parametrizable amount of time. Validity can be proven with zero-knowledge proofs [zksnarks], while availability can be shown with data availability proofs [fraud_proofs]. This challenge system may be removed partially or entirely if stake-based finalization is added to the parent chain [casper_ffg, casper_ffg_incentives].
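As a rough illustration of the first rule, the sketch below restricts a longest-chain comparison so that finalized blocks are never reverted. It is not a specification of the fork choice rules: `FINALITY_DEPTH`, `SideBlock`, `is_finalized`, and `may_reorg` are hypothetical names, the depth value is an arbitrary assumption, and the challenge and bonding system of the second rule is not modeled.

```python
from dataclasses import dataclass
from typing import List

FINALITY_DEPTH = 6  # assumed value for the "parametrizable number" of parent blocks


@dataclass(frozen=True)
class SideBlock:
    block_hash: str
    parent_height: int  # height of the parent chain block holding the commitment


def is_finalized(block: SideBlock, parent_tip_height: int,
                 depth: int = FINALITY_DEPTH) -> bool:
    """A side chain block is final once `depth` parent blocks follow its commitment."""
    return parent_tip_height - block.parent_height >= depth


def may_reorg(current: List[SideBlock], candidate: List[SideBlock],
              parent_tip_height: int) -> bool:
    """Longest-chain rule, restricted so that finalized blocks are never reverted."""
    # Chains are ordered oldest first, so finalized blocks form a prefix.
    finalized = [b for b in current if is_finalized(b, parent_tip_height)]
    # The candidate must agree with the current chain on every finalized block.
    if candidate[:len(finalized)] != finalized:
        return False
    return len(candidate) > len(current)
```

With this restriction, a longer candidate chain is accepted only if it diverges after the last finalized block, so the length of any reorganization is capped by the finality depth regardless of how many miners participate.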
A formal specification and analysis of a complete set of fork choice rules is outside the scope of this paper and will be reserved as the subject of future work.
One downside of the proposed scheme, when compared to traditional merged mining, is inconsistent block times, and therefore an inconsistent time-to-first-confirmation. This can be alleviated to a significant degree with improvements to zero-confirmation transactions (§ 2.5).
4.3 Two-Way Peg Mechanism
Fungible assets (e.g., Ether [ethereum], ERC-20 tokens [erc20]) can be moved to the side chain by being locked into a smart contract on the Ethereum chain. Side chain clients recognize these deposits and allow minting of assets on the side chain in a 1:1 manner. Augmenting UTXOs to also include an asset type, rather than assuming all UTXOs are of a single asset, allows such tokens to be transacted seamlessly.
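The deposit flow and the asset-typed UTXO can be sketched as follows. This is a simplified model under stated assumptions: `Utxo`, `SideChainLedger`, `mint_from_deposit`, and the string deposit identifier are hypothetical, and verification that the deposit actually exists in the parent chain contract is assumed to happen elsewhere.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass(frozen=True)
class Utxo:
    owner: str
    asset: str   # asset type, e.g. "ETH" or an ERC-20 token identifier
    amount: int  # in the asset's smallest unit


class SideChainLedger:
    """Toy side chain state that mints UTXOs 1:1 from parent chain deposits."""

    def __init__(self) -> None:
        self.utxos: Dict[str, Utxo] = {}
        self.seen_deposits: Set[str] = set()

    def mint_from_deposit(self, deposit_id: str, owner: str,
                          asset: str, amount: int) -> bool:
        """Mint exactly once per recognized deposit; reject replays."""
        if deposit_id in self.seen_deposits or amount <= 0:
            return False
        self.seen_deposits.add(deposit_id)
        self.utxos[deposit_id] = Utxo(owner, asset, amount)
        return True

    def supply(self, asset: str) -> int:
        """Total side chain supply of one asset type."""
        return sum(u.amount for u in self.utxos.values() if u.asset == asset)
```

Tracking supply per asset type is what lets several tokens coexist in one UTXO set while each remains backed 1:1 by funds locked on the parent chain.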
Transferring assets from the side chain back to the parent Ethereum chain is a more involved process, especially when considering allowable upgrade processes (§ 4.4). A few options are available:
Only allow a one-way peg from the parent chain to the side chain. This completely removes any opportunity to steal funds through a fraudulent withdrawal, regardless of control over block producers, but locks assets into the side chain irreversibly.
Use a child chain exit game (§ 3.2) to allow users to withdraw a UTXO from the side chain to the parent chain. The UTXO amount is unlocked on the parent chain once a challenge period expires uncontested; if instead a successful challenge is made against a fraudulent exit, no funds are unlocked. In the worst case, a majority of block producers on the parent chain can censor challenge transactions, making this scheme at least as good as optimal child chain constructions that rely on fraud and data availability proofs. It is also strictly better than single-operator child chains, as there are no assumptions on cheap and available block space. More precisely, state is live under an honest-majority assumption (just as with a classical layer-1 blockchain) and safe under an honest-majority assumption (which is optimal for layer-2 constructions that do not have full data availability).
Non-blind merged block production can be used, whereby block producers of the parent chain will actually orphan parent chain blocks that contain an invalid child chain block, effectively uniting the two chains. This can be accomplished via a backwards-compatible change to the consensus rules of the parent chain, i.e., a soft fork [soft_fork], and should only be done if a majority of block producers are already engaged in merged block production. In the event that block producers do decide to collude in an attempt to steal funds that should be locked into the side chain, they can be coerced to behave honestly [uasf].
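The exit game option can be illustrated with a small state machine. This is only a sketch under stated assumptions: `Exit`, `ExitStatus`, and `CHALLENGE_PERIOD` are hypothetical, time is measured in parent chain block heights, and verification of the fraud proof itself is abstracted into a boolean.

```python
from dataclasses import dataclass
from enum import Enum

CHALLENGE_PERIOD = 100  # assumed length of the challenge period, in parent blocks


class ExitStatus(Enum):
    PENDING = "pending"
    CHALLENGED = "challenged"
    FINALIZED = "finalized"


@dataclass
class Exit:
    utxo_id: str
    amount: int
    started_at: int  # parent chain height at which the exit was requested
    status: ExitStatus = ExitStatus.PENDING

    def challenge(self, now: int, proof_is_valid: bool) -> bool:
        """Cancel the exit if a valid proof arrives within the challenge period."""
        if self.status is not ExitStatus.PENDING:
            return False
        if now >= self.started_at + CHALLENGE_PERIOD:
            return False  # too late: the challenge period has expired
        if not proof_is_valid:
            return False
        self.status = ExitStatus.CHALLENGED
        return True

    def finalize(self, now: int) -> int:
        """Return the amount unlocked on the parent chain (0 if none)."""
        if self.status is not ExitStatus.PENDING:
            return 0
        if now < self.started_at + CHALLENGE_PERIOD:
            return 0  # still contestable
        self.status = ExitStatus.FINALIZED
        return self.amount
```

In this model, funds are released only for an exit that stays pending for the whole challenge period, and a successfully challenged exit can never be finalized.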
4.4 Upgrade Process
If the system is designed to support only one-way asset (or, more generally, state) pegs, then any blockchain upgrade process can be applied. If, on the other hand, it is designed to have a two-way peg using an exit game, chain splits must be avoided: the side chain is aware of a canonical parent chain, but the parent chain is not aware of a canonical side chain.
A few options to allow modifications to the consensus rules of the side chain are available:
Completely disallow any non-backwards compatible rule changes (i.e., hard forks). If an upgrade that requires a hard fork is needed, users can simply withdraw their funds from the side chain back to the parent chain, then deposit them again to a brand new side chain with the new consensus rules. Extending this, the new side chain can mint coins based on proofs of burning of funds on the old side chain, allowing users to bypass having to return to the parent chain.
Miner signaling [bip135] can be used to enforce changes in rules. This feature allows a blockchain to be “aware” that its rules are changing, preventing chain splits. In the event that only a minority of miners on the Ethereum chain are engaged in merged block production for the side chain, a parametrizable majority of all Ethereum hashpower can be used for signaling changes.
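The signaling option amounts to a threshold check over a window of recent parent chain blocks. The sketch below is a simplification and not the state machine defined in [bip135]: the function name `rule_change_locked_in`, the window representation, and the default 19/20 threshold are assumptions made for illustration.

```python
from typing import Sequence


def rule_change_locked_in(signals: Sequence[bool],
                          numerator: int = 19,
                          denominator: int = 20) -> bool:
    """Check whether enough blocks in a window signal for a rule change.

    `signals` holds one entry per parent chain block in the window, True if
    that block's miner signaled support. The threshold numerator/denominator
    stands in for the "parametrizable majority" and is an assumed value.
    """
    if not signals:
        return False
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return sum(signals) * denominator >= numerator * len(signals)
```

Counting over all parent chain blocks, rather than only those carrying a side chain commitment, corresponds to the case in which a majority of all Ethereum hashpower is used for signaling.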
The incentives of the proposed scaling scheme are in line with those of Nakamoto Consensus: miners ought to find it more profitable to play by the rules, which favor them with more new coins and transaction fees on both the side chain and parent chain than everyone else combined, than to undermine the system and the validity of their own wealth (adapted from [bitcoin]).
Unlike child chains that support fungible assets [plasma_mvp], which rely on users with limited computational resources to validate child chain blocks in their entirety, this scheme leverages parties that have access to large computational resources and that are inherently incentivized to validate all side chain blocks competitively: miners. This resolves the verifier’s dilemma [verifiers_dilemma] present in most child chain exit game schemes, and obviates the need for mass exits. In addition, proofs of invalid state transitions on the side chain, required for exit games, can be included in the parent blockchain by a single honest miner. Preventing this would require a sustained majority-hashpower attack for the duration of the challenge period, which would have much farther-reaching consequences on the parent chain rather than being isolated to an attack on the side chain.
5 Practical Deployment Considerations
Increasing transaction throughput by increasing the transaction capacity of blocks is not a new idea [blocksize_increase], though previous work in this area has mostly been focused on doing so on the base blockchain layer [bch] rather than through a side chain. Such attempts have been plagued by legacy code and network support, making it difficult to coordinate significant, but potentially very beneficial, breaking changes. More importantly, improving validation performance on a smart contract-enabled blockchain is more difficult than on a much simpler side chain that is only capable of financial transactions and moving assets to and from its parent chain.
Interestingly, implementing this scaling direction at the base layer would not be desirable, as a non-programmable base chain incapable of smart contracts [szabo1997] or atomic swaps [atomic_swaps] cannot be extended in a decentralized manner with additional functionality.
Proper deployment of such a system is of critical importance to its success, even more so than designing and implementing the underlying technology stack. Historically, miners have not acted rationally, i.e., as incentivized by Nakamoto Consensus [resolution_bitcoin]. Developers and users have often taken an antagonistic stance towards miners and the boogeyman of miner centralization, often without a proper understanding of the economics at play [sia_asics, sia_pow], and have on occasion even encouraged the reduction of network security [eip649]. In order to deploy the system described in this work at scale, it is important for developers, users, and miners to work together, as doing so provides maximum benefit for all parties involved.
To that end, client software should be made as simple to use as possible for miners, allowing them to run nodes for both the parent Ethereum chain and the side chain with little to no additional setup and no patching of code required. A tight integration of mining software, clients, and user frontends will encourage the incentives of Nakamoto Consensus to take precedence over other considerations.
It should finally be noted that the general scheme of using merged block production does not require PoW to be used as the rate-limiting mechanism. Indeed, since proofs of work are not re-used directly, the system can also operate under PoS, albeit potentially requiring a different set of fork choice rules [longest_chain_pos].
In this work we introduce a blockchain scaling solution that is both secure and decentralized in practice, and allows for greater transaction throughput than conventional blockchain systems deployed today. In addition, several terms that have emerged in common blockchain parlance are given proper definitions so as to enable and encourage collaboration without confusion.