Blockchain technology is evolving very fast. We are witnessing the development of more and more real-world applications which demonstrates strong interest from both industry and academia. Furthermore, blockchain technology is on its way to challenge the performance of centralized systems with different blockchain projects now reaching a throughput in the thousands of transactions per second, e.g., Solana [Solana], Avalanche [Avalanche], Polkadot [Polkadot], or Algorand [10.1145/3132747.3132757].
Since its inception, blockchain technology has mainly focused on creating very sparse and standalone networks, all decoupled from one another, trying to solve different challenges. Such heterogeneity has forged a future of blockchain leaning towards the coexistence of multiple layer-1 chains over the domination of a single network. The proliferation of application-specific blockchains and smart contract platforms hosting new instances of existing dApps and DeFi protocols will continue to accelerate the general adoption of Web3 technologies. Thus, the need for interoperability continues to grow considerably. It now appears crucial to solve the interoperability challenge, as doing so will improve overall blockchain scalability and pave the way for new business opportunities by composing applications hosted on different blockchain systems.
During the decade that followed the release of Bitcoin [Bitcoin], there has been a continuous and global effort to bring blockchain technology to an industrial level. This effort has primarily targeted some of the most important blockchain issues: scalability and interoperability.
Scalability. Blockchain scalability is closely tied to two of its most important metrics: latency (speed) and throughput (capacity). Latency represents the time a transaction takes to be inserted in a block and accepted by the network, while throughput relates to the number of transactions the network is capable of adding on-chain per unit of time. These concepts are often presented in a misleading way, which prevents the community from gauging the true performance of a blockchain system. Transaction finality, probabilistic or deterministic, has to be considered when evaluating blockchain performance. In this context, latency is defined as the time it takes for a transaction to be finalized, while throughput is defined as the number of finalized transactions per unit of time. A later phase in blockchain technology history has seen the generalization of deterministic finality, commonly achieved by means of classical BFT consensus algorithms [https://doi.org/10.48550/arxiv.2001.11965, Kwon2014TendermintC, 10.1145/2976749.2978399, https://doi.org/10.48550/arxiv.1803.05069]. Such algorithms have shown their limitations in terms of scalability, for they come with a quadratic message complexity and thus lead to much higher settlement latency as the number of validators increases.
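To make the quadratic growth concrete, the following minimal sketch counts messages under a simplified model in which every validator sends one message to every other validator in each voting round. The function name and the one-message-per-pair model are assumptions made for illustration; actual BFT protocols differ in their number of rounds and message types.

```python
def all_to_all_messages(n_validators: int, rounds: int = 1) -> int:
    """Messages exchanged when each validator sends one message to every
    other validator in each round (simplified all-to-all model)."""
    return rounds * n_validators * (n_validators - 1)


if __name__ == "__main__":
    # Quadrupling the validator set multiplies the message count by roughly 16.
    for n in (4, 16, 64, 256):
        print(f"{n:>4} validators -> {all_to_all_messages(n):>6} messages per round")
```

Under this model, the per-round message count grows as the square of the validator set size, which is the behavior the paragraph above refers to.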
Interoperability. Interoperability lies in the capacity of multiple systems to interface with each other. Exchanging assets and data between blockchains is key to the adoption of the technology and yet has historically been a reality only within an environment of trust, either internally between the exchanging parties or externally via third-party bridge administrators. In addition, specialization, a scalability enabling principle centered around the segregation of a multi-app blockchain into application-specific chains, is a straightforward call for interoperability.
For an industrial age of blockchain to emerge, we add two additional components: composability and privacy.
Composability. Composability is a design principle that allows different components within a system to be combined to meet any specific use case requirements. Within a single blockchain network like Ethereum [Ethereum], composability is atomic: Smart contract functions can invoke other contracts synchronously with the assurance that either all contract calls succeed or none does. In a context of cross-chain interoperability, composability is obtained when business logic deployed on different blockchains can interact with each other to create new value.
Privacy. Historically, privacy has remained a rarity in the blockchain scene, transactional data being accessible to all network participants by design even though private transaction protocols such as [Monero, Zcash] have allowed for keeping this data hidden while preserving transaction validity. For enterprise use, being interoperable while keeping internal data private is fundamental. Such a model is still lacking today, preventing organizations from switching from Web1-2 technologies to Web3 ones.
1.1 Our Contributions
We propose Topos, a generalized interoperability protocol designed for transmitting messages across sovereign blockchains. The Topos ecosystem is composed of a permissionless reliable broadcast primitive [10.5555/1972495] and a scalable set of decoupled public and private blockchains, named subnets. Topos ensures the validity of state transitions without relying on fraud proofs [AlBassam2018FraudAD] or on designated subsets of participants to perform validity checks [Polkadot].
In the interoperability landscape, trustlessness is defined as the absence of trust in the interoperability protocol itself and relying instead on the security of the underlying blockchains. While this is an improvement over trusted interoperability protocols, this does not permit complete trustlessness and cannot provide a higher level of security than that of the interoperated blockchains. With Topos, we decouple the validity of cross-chain transactions from the security of the underlying blockchains by replacing this coupling with a zkSTARK proof system, providing irrefutable evidence of the validity of messages.
As part of their implementation of the Universal Certificate Interface (UCI), subnets integrate the Topos zkVM (Zero-Knowledge Virtual Machine) to host an arbitrary number of applications which can exchange assets and arbitrary data with other subnets by exchanging objects called certificates. Certificates are a central component used by subnets to exchange cross-subnet messages. Cross-subnet message transmission is handled by the Transmission Control Engine (TCE), a decentralized network implementing a reliable broadcast primitive. Additionally, the protocol enforces computational integrity of all state transitions among all subnets by means of a zkSTARK proof system. Moreover, we introduce modifications to the FROST signature scheme [frost] to allow the TCE participants to authenticate incoming certificates prior to their verification and delivery. The combination of UCI and TCE provides the ecosystem with uniform security and as such, subnets do not need to rely on any trust assumptions but cryptographic assumptions for cross-subnet message passing. Finally, we present the Topos Subnet, a subnet responsible for maintaining registration on the protocol in order to minimize overall protocol complexity, while not being used for state synchronization nor cross-subnet message passing.
The rest of the paper is organized as follows. Related works are discussed in Section 1.2. In Section 2, we introduce the properties of our solution. We present in Section 3 the design considerations of our protocol and its components. Section 4 shows several use cases in which our solution is well-suited. In Section 5, we summarize the paper with concluding remarks and Section 6 is dedicated to additional discussions and future works for our solution.
1.2 Related Work
Over the years, several projects have focused on interoperability and scalability. Here we introduce some of the most notable ones.
Cosmos. [Cosmos] Cosmos is a network of sovereign blockchains called zones. Zones are endowed with Tendermint [Kwon2014TendermintC] and are connected with each other via the IBC protocol. Cosmos can only support interoperability between BFT-based blockchains and to do so employs a simple model for their inter-blockchain communication protocol by centering its design around the use of decentralized relayers and on-chain light-clients to allow connected zones to verify each other’s block headers and validate transaction inclusion proofs. The validity of cross-chain transfers is left to interpretation based on the trust zones have in each other since, as opposed to Topos, Cosmos does not use validity proofs of state transitions.
Polkadot. [Polkadot] Polkadot is a sharded system composed of a central entity called the Relay Chain along with shards called parachains. The purpose of Polkadot is to free parachains from security concerns so that developers can focus on the application layer. This is enabled by two factors: (a) an abstraction of the internals of the parachain providing a standard verification for all parachains, and (b) an active validation executed by randomly and frequently sampled Relay Chain actors. These properties reflect the so-called Shared Security. The validity of parachain block candidates is ensured by Relay Chain validators, which prevents parachains from being sovereign networks and from having a private state. In contrast, Topos allows subnets to have their own consensus, a private state, and trustless interoperability.
Avalanche. [Avalanche] Avalanche is a highly scalable blockchain platform that focuses on the deployment of blockchains with three main targets: application-specific blockchains, smart contract platforms, and digital asset platforms. In Avalanche, blockchains are deployed within subnets (which are sets of validators). Validators validate all chains in their subnet, as well as the three chains composing the Primary Network. The Avalanche platform places no bound on the number of blockchains which can participate but offers interoperability only between the chains of the same subnet. Topos does not face this limitation: Upon implementing the Topos protocol, subnets are interoperable with the whole ecosystem. Although cross-subnet interoperability is envisioned on Avalanche, no design has been revealed yet.
Chainlink. [Chainlink] Chainlink 2.0 is a framework that aims at solving the oracle problem [Oracle] by introducing the Decentralized Oracle Network (DON). Historically, oracle services introduce trust. However, Chainlink tackles this problem by filtering the off-chain data source through a BFT layer. The committee of Oracles that composes the DON signs its reports with a multi-signature scheme. By doing so, Chainlink is increasing decentralization and minimizing trust in oracle services. Furthermore, Chainlink proposes an interoperability feature with its Cross Chain Interoperability Protocol (CCIP). The global consistency of cross-chain communication in CCIP is reduced to the security of their Anti-Fraud Network, which is a dedicated DON actively watching for misbehavior across other DONs. In Topos, global consistency of cross-subnet messages is passively obtained via the TCE’s reliable broadcast protocol. In addition, validity of cross-subnet messages is ensured cryptographically in Topos whereas Chainlink relies on agreements between oracles.
LayerZero. [LayerZero] LayerZero is an interoperability protocol which decouples provision of block headers and transaction proofs to allow for trustless cross-chain communication. Each action is handled by two centralized or decentralized parties, namely an Oracle and a Relayer. The cross-chain message passing is trustless under the assumption of independence between the Oracle and the Relayer. The Oracle transports the block headers while the Relayer submits the transaction inclusion proofs. In LayerZero, cross-chain transactions are not cryptographically guaranteed to be valid as opposed to Topos’s proofs of computational integrity which enforce validity of all transactions.
An interoperability protocol should be trustless, secure, and have strong network effect. The following properties need to be maximized if the vision of an “Internet of blockchains” is to be realized. By design, the Topos protocol fulfills these properties comprehensively.
Trustless. Subnets receiving certificates and cross-subnet messages from another subnet should have guarantees as to the validity of these cross-subnet messages. These guarantees should not rely on trust assumptions in centralized entities, decentralized actors, or the interoperated subnets, but on cryptographic assumptions. Leveraging succinct zero-knowledge proofs allows for removing this trust completely from the equation and solely relying on mathematical truth.
Security. The protocol must be robust and prevent an adversary from creating conflicting certificates in an attempt to double-spend via cross-subnet messages, as it would cause consistency issues in the system.
Scalability. There should be no limit to the number of participants in the ecosystem. The protocol should be able to handle an arbitrarily large number of subnets, as well as to seamlessly scale to millions of TCE participants, by ideally ensuring logarithmic communication complexity per participant. Furthermore, the system must have very high capacity to be able to process a massive amount of cross-subnet messages.
Privacy. It should be possible for subnets to keep their internal state hidden from the rest of the ecosystem. Thus, the protocol cannot rely on having the receiving subnet nor any third-parties actively verify cross-subnet messages by accessing the state of the sending subnet. Instead, cross-subnet messages should contain indisputable evidence that these messages are correct. By design, the protocol should be able to handle any type of subnets, i.e., public and private subnets.
Authentication. It is important that data exchanged between subnets is authenticated to provide guarantees of integrity. Authentication using threshold signatures should have a public key that remains static for the whole lifespan of subnets to facilitate key management. As such, it should be possible for any actors, in and out of the protocol, to verify authenticity of cross-subnet messages.
Decentralization. The protocol should allow for permissionless participation in the TCE and open registration of subnets. Participation should not be handled by a central authority, and processes should be able to join the system at any time. To enable high levels of decentralization, it is also necessary that the entry cost for participation remains low, such that common hardware is enough to fully participate in the system.
The Topos protocol is a generalized interoperability protocol which enjoys strong network effect. Once a blockchain has implemented Topos, it becomes interoperable with all the blockchains in the ecosystem, without any overhead. Topos complies with all the properties detailed in the previous section. In this section, we will first describe all the components that compose Topos, then detail the protocol itself.
3.1 System Overview
Here, we define all the components that together make the Topos protocol.
Subnets are sovereign blockchain networks which implement the Topos protocol, devise their own consensus rules, and control their own native asset. New subnets join the ecosystem to be natively interoperable with all existing subnets without making any compromise on their sovereignty and without the need to trust any middleman. Though not a protocol requirement, subnets are expected to implement classical BFT protocols to enforce deterministic finality. This helps the subnet guarantee that the state submitted to the rest of the ecosystem is finalized, i.e., cannot be reverted, and hence prevents inconsistencies between its internal state and its submissions.
The first subnet client will be our Substrate DevKit which is Topos’s extension of the Substrate framework [Substrate]. As any Substrate native blockchain, subnets can implement their own consensus protocol and state transition function by customizing their own set of runtime FRAME pallets. Topos’s Substrate DevKit additionally adds on top of Substrate the necessary components for subnets to be compatible with the UCI. One significant addition is the integration of the Topos zkVM (see 3.1.2) as the core smart contract execution environment for subnets in the Topos ecosystem.
In later iterations of the protocol, other DevKits will be created by the Topos community as extensions of other blockchain frameworks and will allow developers familiar with any tech stack to join the Topos ecosystem.
Subnets implement the Topos zkVM, a zero-knowledge virtual machine that exposes a Turing-complete programming language whose instructions are provable with zero-knowledge proofs. dApp developers can use the Topos zkVM programming language to write any type of application, deployed on any subnet in the form of smart contracts whose executions are provable. Developers can thus leverage the composability offered by the Topos protocol by composing their applications with other zkVM-compatible applications deployed on any subnet in the Topos ecosystem.
The Topos zkVM has been conceived to offer a set of instructions efficiently verifiable with our zkSTARK proof system. This instruction set, while small and simple, remains expressive enough for developers to easily write any kind of application on subnets. We also include in the default instruction set additional operation-specific instructions (e.g., range check, curve point addition, hash evaluation, etc.), to allow programmers to execute common operations directly without the burden of writing them with the original instruction set. The Topos zkVM execution remains extremely fast to verify, maintaining the overall scalability of the system, even when extending the original instruction set with custom complex ones.
3.1.3 Universal Certificate Interface
The Universal Certificate Interface (UCI) encapsulates the concept of proving and verifying data across different subnets. This notion is key in trustless and secure interoperability: A sending subnet generates data intended for another subnet and the receiving subnet is ensured that the data is valid and authentic without the need for trust in the sending subnet or any third party. The UCI offers an abstraction of the internal structure of a subnet to guarantee these properties of validity and authentication without compromising the sovereignty and privacy of the sending subnet.
In the Topos ecosystem, the UCI exposes the interface that all subnets implement in order to be interoperable with each other, i.e., exchange certified data. This interface describes how certificates are to be constructed and authenticated by subnets.
A certificate is an authenticated object that wraps exchanged data with a proof of validity. The data subnets exchange are cross-subnet messages, i.e., cross-subnet asset transfers and remote arbitrary smart contract calls. Authentication of certificates is done with Topos’s ICE-FROST signature scheme and proofs of valid state transition are created using our zkSTARK proof system.
The structure of a certificate is described below:
subnet_id is the static ICE-FROST public key used as the unique subnet identifier;
prev_state_hash is the previous subnet state commitment (from the previous certificate);
state_hash is the current subnet state commitment;
proof is the zkSTARK proof of validity;
XS_list represents the list of included cross-subnet messages;
proof_XS_list is the list of inclusion proofs of cross-subnet messages in the proven state transition.
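The certificate fields listed above can be sketched as a simple record type. This is a minimal illustration only: the byte-string encodings, the lowercase field names xs_list and proof_xs_list, and the helper extends are assumptions made for the example and are not prescribed by the protocol.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Certificate:
    """Illustrative container mirroring the certificate fields listed above."""
    subnet_id: bytes                        # static ICE-FROST public key
    prev_state_hash: bytes                  # commitment from the previous certificate
    state_hash: bytes                       # current state commitment
    proof: bytes                            # zkSTARK proof of validity (opaque here)
    xs_list: Tuple[bytes, ...] = ()         # cross-subnet messages (opaque here)
    proof_xs_list: Tuple[bytes, ...] = ()   # one inclusion proof per message


def extends(previous: Certificate, current: Certificate) -> bool:
    """Hypothetical helper: does `current` chain onto `previous`?"""
    return (
        current.subnet_id == previous.subnet_id
        and current.prev_state_hash == previous.state_hash
    )


# Two consecutive certificates of the same (made-up) subnet.
genesis = Certificate(b"subnet-A", b"\x00", b"h1", b"proof-1")
nxt = Certificate(b"subnet-A", b"h1", b"h2", b"proof-2", (b"msg",), (b"incl",))
```

Because each certificate carries the commitment of the previous one, the certificates of a subnet form a hash-linked sequence, which the helper checks for a single pair.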
By including proofs of valid state transition in certificates, sending subnets prove the validity of all their internal transactions (including cross-subnet messages) executed since their previous certificate. This allows a receiving subnet, i.e., a subnet to which at least one cross-subnet message contained in the certificate is addressed, to verify the validity of a message without having access to the state of the sending subnet nor relying on a third party to verify the complete state transition. We envision that the actors creating these proofs will be subnet validators although the Topos protocol does not impose any requirements.
A valid state transition is defined as follows:
Definition (Valid State Transition). Let $\delta(s_i, T_i) = s_{i+1}$ be a state transition function, where $s_i$ is the $i$-th subnet state committed to in the $i$-th certificate, and $T_i$ is a set of transactions which applied to $s_i$ results in $s_{i+1}$. We say that a state transition is valid if and only if: $\forall t \in T_i$, $t$ is a transaction correctly executed by the Topos zkVM.
The zkSTARK proof included in the certificate verifies that the set of transactions between $s_i$ and $s_{i+1}$ is a valid state transition. While this does not ensure the validity of the subnet state, it guarantees the validity of its state transitions. Thus, if the state initially committed to by the subnet as per its registration and all of its subsequent state transitions are valid, then by transitivity its latest state is valid.
The certificate validation is handled by the Valid_cert predicate, defined in Algorithm 1, which calls the zkSTARK predicate (see Equation 1) on the certificate data to assert the validity of the committed state transition, checks the inclusion proofs of the cross-subnet messages in the proven state transition, and returns true if both checks succeed. In this case, the predicate provides the certificate with intrinsic validity: The certificate contains all the necessary information to prove its validity and its verification does not depend on an external state. The predicate is stateless and so trivially monotonic.
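The two checks performed by the predicate can be sketched as follows. This is a hedged illustration, not Algorithm 1 itself: the function name valid_cert, the argument order, and the two injected verifier callables are assumptions, and we assume for simplicity that inclusion proofs are checked against the new state commitment.

```python
from typing import Callable, Sequence

# Hypothetical verifier signatures. The actual zkSTARK verifier and the
# inclusion-proof format are not modeled here and are injected as callables.
StarkVerifier = Callable[[bytes, bytes, bytes], bool]      # (proof, prev_hash, new_hash)
InclusionVerifier = Callable[[bytes, bytes, bytes], bool]  # (message, incl_proof, new_hash)


def valid_cert(
    prev_state_hash: bytes,
    state_hash: bytes,
    proof: bytes,
    xs_list: Sequence[bytes],
    proof_xs_list: Sequence[bytes],
    zkstark_verify: StarkVerifier,
    verify_inclusion: InclusionVerifier,
) -> bool:
    """Stateless check: relies only on the certificate data passed in."""
    # Every cross-subnet message needs exactly one inclusion proof.
    if len(xs_list) != len(proof_xs_list):
        return False
    # Check 1: the committed state transition is proven valid.
    if not zkstark_verify(proof, prev_state_hash, state_hash):
        return False
    # Check 2: each message is included in the proven transition.
    return all(
        verify_inclusion(msg, incl, state_hash)
        for msg, incl in zip(xs_list, proof_xs_list)
    )
```

Since the function reads no external state, calling it twice on the same certificate always gives the same answer, which is the statelessness property noted above.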
3.1.3.1 ICE-FROST Signature
A signature scheme with key generation, signing, and verification algorithms $\mathsf{KeyGen}$, $\mathsf{Sign}$, and $\mathsf{Verify}$ respectively, and security parameter $\lambda$ is a $(t, n)$ threshold signature scheme if the following conditions hold:
Correctness. Any subset of the $n$ participants with cardinality at least $t$ can produce a valid signature on message $m$. A valid signature is a signature that will be verified by the $\mathsf{Verify}$ algorithm.
Unforgeability. Any polynomial-time adversary who can corrupt up to $t-1$ players and views the protocol output (signature) on input messages of their choice cannot produce a valid signature for a message that has not been submitted to the $\mathsf{Sign}$ algorithm before.
The Topos protocol employs threshold signatures to authenticate certificates, i.e., allow actors of the ecosystem to verify that a propagated certificate has been created by the correct subnet and has not been tampered with in transit. The Topos ICE-FROST signature [cryptoeprint:2021:1658] is the first to consider static private/public keys for a round-optimized Schnorr-based signature scheme [10.1007/0-387-34805-0_22]. With static public keys, the group’s established public and private keys remain the same for the lifetime of the subnets, while the signing shares of each participant are updated over time, as well as the set of group members. This ensures the long-term security of the static keys and facilitates the verification process of the generated threshold signature because a group of signers communicate their public key to the verifier only once during the subnet’s lifetime.
Dealerless threshold signature schemes usually need to run a Distributed Key Generation (DKG) [DKG] protocol each and every time the set of participants changes, resulting in a new public key. However, the TCE requires knowing the public key associated with the signature in order for the TCE participants to verify the signatures applied to certificates. A natural approach would be to include the threshold signature public key for the next certificate in the current certificate but such short-lived public keys clearly lead to large overhead and are not suited for the Topos protocol.
Our contribution to the field of threshold signatures makes long-lived static public keys possible. Topos uses ICE-FROST [cryptoeprint:2021:1658] to enforce usage of a single static key for the whole lifespan of the subnet, no matter how many times the set of validators changes. This allows for a lighter and simpler subnet key management.
In order to use a long-lived public key for each subnet, we add a share update property to our scheme. To update the shares for each validator set, participants secretly share the value “0” and send corresponding shares to other participants. These new shares are added to previous shares to randomize them without changing the value of the shared secret. Randomization of shares guarantees unforgeability of the threshold signature scheme against a static adversary, i.e., an adversary who can corrupt up to $t-1$ participants, where $t$ is the signing threshold. A dynamic adversary, on the other hand, can corrupt different participants in each validator set. Because validators’ secret shares need to be encrypted when redistributed, we need the additional property of forward secrecy. That is, an attacker that would get access to some validator decryption key would only be able to derive decryption keys between this compromised validator and future ones, but would not be able to decrypt messages encrypted and shared by previous validator sets during the shares redistribution phase, and hence would not gain knowledge of additional secret shares. This key property ensures that even if the adversary corrupts different subsets of participants in consecutive validator sets, they still cannot forge a valid signature.
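The share update idea can be illustrated with plain Shamir secret sharing. The sketch below is not ICE-FROST: it omits share encryption, verifiability, signing, and changes to the validator set, and the field modulus and function names are assumptions chosen for the example. It only shows that adding shares of the value 0 randomizes every share while leaving the shared secret unchanged.

```python
import secrets

P = 2**127 - 1  # illustrative prime field modulus (a Mersenne prime)


def share(secret: int, t: int, n: int) -> dict:
    """Shamir-share `secret` among participants 1..n with threshold t."""
    coeffs = [secret % P] + [secrets.randbelow(P) for _ in range(t - 1)]
    return {
        i: sum(c * pow(i, k, P) for k, c in enumerate(coeffs)) % P
        for i in range(1, n + 1)
    }


def reconstruct(shares: dict) -> int:
    """Lagrange interpolation at x = 0 over the given shares."""
    total = 0
    for i, y_i in shares.items():
        num, den = 1, 1
        for j in shares:
            if j != i:
                num = num * (-j) % P
                den = den * (i - j) % P
        total = (total + y_i * num * pow(den, -1, P)) % P
    return total


def refresh(shares: dict, t: int) -> dict:
    """Each participant deals a fresh sharing of 0; all dealt shares are added."""
    n = len(shares)
    updated = dict(shares)
    for _dealer in range(n):
        zero_shares = share(0, t, n)
        for i in updated:
            updated[i] = (updated[i] + zero_shares[i]) % P
    return updated
```

For example, with old = share(42, 3, 5) and new = refresh(old, 3), any three of the new shares still reconstruct 42, whereas mixing old and new shares in one reconstruction no longer yields the secret in general.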
When a subnet submits a certificate, it commits to a new state and certifies the state transition. The new state is the state committed to in the previous certificate on which the state transition is applied. To convince other subnets that the state transition is valid and consistent with the previous state, the certificate contains a proof of computational integrity, a zkSTARK proof [BenSasson2018ScalableTA].
A STARK proof guarantees that a computation has been correctly executed and has returned a certain output, and (if needed) without revealing the input. For example, a STARK proof can guarantee that:
The state $s_{i+1}$ is the state $s_i$ plus some transactions, without revealing the transactions.
The hash of the state $s_{i+1}$ is the hash of the state $s_i$ plus some transactions, without revealing the transactions nor the states.
An account $A$ on subnet $X$ made a valid (holds enough funds and signed) transfer of tokens to an account $B$ on subnet $Y$, without revealing the balance of $A$.
More formally, a STARK proof is sent by a prover $P$ to convince a verifier $V$ that it ran a certain computation $C$ with some input $x$ (and possibly obtained some output $y$). The STARK system is made of a proving algorithm $\mathsf{Prove}_C$ and a verifying algorithm $\mathsf{Verify}_C$. While $C$ is known to both $P$ and $V$, $x$ and $y$ could be partially or fully kept secret by the prover or shared between the prover and the verifier, depending on the statement to be proven. The entire process takes four steps:
$P$ runs $C$ with input $x$ and records an execution trace $T$. Broadly speaking, the trace is a 2D-matrix recording the value of all the variables of $C$ at each execution step. $P$ also saves the output $y$ if any.
$P$ executes the proving algorithm $\mathsf{Prove}_C$ on input $T$ (and $y$ if any), which returns a STARK proof $\pi$ that $C$ has been correctly executed with some input, and returned $y$ as an output (if any).
$P$ sends $\pi$ (and possibly parts of $x$, $y$, or functions of them, depending on the statement) to $V$.
$V$ executes $\mathsf{Verify}_C$ on input the proof $\pi$ and potential inputs/outputs received. It returns true if the proof has been computed from a valid execution of $C$ that returns $y$ (if any) on input $x$, and false otherwise.
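The notion of an execution trace in the first step can be illustrated with a toy computation. The sketch below is not a STARK: the checker reads the whole trace in the clear, whereas a STARK prover commits to the trace and convinces the verifier that the same constraints hold without sending it. The Fibonacci-style computation, the modulus, and the function names are assumptions made for the example.

```python
P = 2**31 - 1  # illustrative prime modulus for the toy computation


def run_with_trace(a0: int, b0: int, steps: int):
    """Run a Fibonacci-style iteration, recording one trace row (two
    registers) per execution step. Returns the trace and the output."""
    trace = [(a0 % P, b0 % P)]
    for _ in range(steps):
        a, b = trace[-1]
        trace.append((b, (a + b) % P))
    return trace, trace[-1][1]


def check_trace(trace, a0: int, b0: int, output: int) -> bool:
    """Check boundary constraints (first and last rows) and the transition
    constraint between every pair of consecutive rows."""
    if trace[0] != (a0 % P, b0 % P):
        return False
    if trace[-1][1] != output:
        return False
    return all(
        nxt[0] == cur[1] and nxt[1] == (cur[0] + cur[1]) % P
        for cur, nxt in zip(trace, trace[1:])
    )
```

An honest trace passes the check, while changing a single cell breaks at least one transition or boundary constraint.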
STARK systems are known to be doubly scalable, with a prover that runs in quasilinear time and a verifier that runs in polylogarithmic time in the size of the computation. This allows subnets to prove exponentially large computations and hence improves the overall scalability of the Topos protocol. In addition, such systems are post-quantum secure, as they rely only on symmetric primitives like hash functions, unlike their SNARK [184425, Groth16, Plonk] counterparts based on asymmetric primitives.
However, one issue with the above process is that the proving and verifying algorithms both depend on the computation $C$. In other words, a distinct pair of proving and verifying algorithms is needed for each specific computation. Not only does it require both participants to potentially store and execute multiple algorithms, but it also forces the prover $P$ to write a specific proving algorithm for every new computation it wants to prove, for example for a new smart contract. The verifier $V$ would likewise need to make sure to keep its verifying algorithm up to date. For these reasons, we adopted a general-purpose approach: Our STARK system can prove arbitrary computations with a single pair of proving and verifying algorithms that do not need to be updated if the program to prove is modified.
More precisely, the computation $C$ which is proven is the Topos zkVM execution itself. The input $x$ consists of a state hash and all the operations happening on-chain modifying this state. The output $y$ is the final state returned by the Topos zkVM after applying the provided input state transition on the input state. After a proof $\pi$ has been computed, the prover $P$ sends it to the verifier $V$, along with a hash of the final state $y$. The hash of the previous state can be retrieved from the latest verified certificate of the subnet $P$ belongs to. Only providing the state hashes allows sending subnets to keep their state private and improves scalability by reducing the overhead in data transmitted. The verification function is defined as follows:
The verification function attests to the validity of the state transition claimed by a sending subnet, from a previous state that was committed to (in the form of a hash), to a new committed state. This algorithm can only output true if the prover submitted a valid state transition as part of its (private) input $x$, i.e., corresponding to valid executions of the Topos zkVM.
Since the Topos protocol relies on cryptography for subnets to prove the validity of their state transitions to the rest of the ecosystem, it is crucial to have efficient cryptographic primitives in order to preserve high scalability. STARK proof systems, in particular the ones based on FRI (see A.2), as is the case for the one used in our protocol, have very light requirements (namely to work on a prime field with a $2^n$-th root of unity for relatively large $n$), whereas other common SNARK constructions are based, among other things, on algebraic groups, which involve complex mathematical operations that can be hard to implement and optimize. In particular, the prime field involved in STARK proof systems can be much smaller than the usual cryptographic size of 256 or 512 bits.
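The root-of-unity requirement can be checked directly on a small field. The sketch below assumes, for illustration, the 64-bit prime $p = 2^{64} - 2^{32} + 1$, for which $p - 1$ is divisible by $2^{32}$; the constant and function names are ours. It relies on 7 being a quadratic non-residue modulo this prime, which makes the derived element have order exactly $2^{32}$.

```python
P = 2**64 - 2**32 + 1   # illustrative 64-bit prime
TWO_ADICITY = 32        # largest k such that 2^k divides P - 1

assert (P - 1) % 2**TWO_ADICITY == 0
assert ((P - 1) >> TWO_ADICITY) % 2 == 1


def root_of_unity(log_n: int) -> int:
    """Return a primitive 2^log_n-th root of unity modulo P."""
    if not 0 <= log_n <= TWO_ADICITY:
        raise ValueError("the field has no root of unity of that order")
    # 7 is a quadratic non-residue mod P, so g has order exactly 2^32.
    g = pow(7, (P - 1) >> TWO_ADICITY, P)
    # Squaring g repeatedly halves its order until it reaches 2^log_n.
    return pow(g, 1 << (TWO_ADICITY - log_n), P)


w = root_of_unity(10)
assert pow(w, 1 << 10, P) == 1      # w^(2^10) = 1
assert pow(w, 1 << 9, P) == P - 1   # w^(2^9) = -1, so the order is exactly 2^10
```

Evaluation domains of size $2^n$ built from such a root are what FRI-based provers operate on, and all arithmetic stays within 64-bit field elements.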
However, blockchains always require some digital signature scheme to assert the authenticity of propagated messages and, to date, digital signatures based on elliptic curves, such as Schnorr signatures, EdDSA, or ECDSA, are the preferred ones, due to both their speed and resulting size. The underlying curves commonly used together with those schemes are all of large cryptographic sizes, and hence prevent us from fully benefiting from the mathematical structure of our proving system.
To address this, and to offer subnets the possibility to exploit the whole power of STARKs, we designed a new elliptic curve, Cheetah [Cheetah], constructed over a sextic extension of a small field with characteristic $p = 2^{64} - 2^{32} + 1$ and tailored for efficiency when proving operations over its group. A detailed security analysis and description of the deterministic process that generated this curve is available at [cryptoeprint:2022:277]. With Cheetah, the Topos zkVM execution can be proven while maintaining a small proof system base field, a crucial consideration for the efficiency of the protocol.
The STARK system at the core of Topos enables the protocol to be:
Trustless: The soundness property of STARKs ensures that it is computationally infeasible for a malicious prover to create a valid proof for an invalid statement. This means that validity of state transitions solely depends on the soundness of the STARK proof included in certificates.
Private: Instead of providing the whole computation that updated their internal state to the verification function, subnets pass only the known hash of their previous state along with the hash of their new state, and thus do not reveal anything about transactional data. The computational integrity ensured by the STARK proof system combined with zero-knowledge guarantees that no additional information about the state of subnets is revealed to verifiers; this grants full privacy to subnets.
Scalable: STARKs can prove the computational integrity of a very large number of transactions while keeping the verification cost extremely small.
3.1.4 Topos Subnet
The Topos Subnet is a blockchain network whose main purpose is to handle registration of the ecosystem actors, namely the subnets and the TCE participants, to manage TOPOS, the ecosystem’s native cryptocurrency, and to allow for governance of the protocol through on-chain voting, such that TOPOS token holders will have the ability to participate in future protocol improvements. Subnets register themselves by sending a special transaction which pays a dedicated fee denominated in TOPOS. Furthermore, the Topos Subnet is leveraged for the Sybil resistance of the TCE, requiring participants to lock a TOPOS amount in order to join the system. Finally, it enables the setup of an incentive mechanism for the TCE participants to be rewarded when following the prescribed protocol.
As for the actual implementation, the Topos Subnet is built with the Substrate framework [Substrate] and implements the hybrid BABE/GRANDPA consensus. Block production is conducted by the Ouroboros Praos-based [David2018OuroborosPA] BABE protocol [BABE]. BABE ensures liveness, which guarantees that transactions submitted by honest users will eventually be recorded as part of the Topos Subnet state. Moreover, GRANDPA [Grandpa] is leveraged as the finality gadget to ensure deterministic finality, guaranteeing that blocks can never be reverted once finalized, unlike protocols where finality is only probabilistic. Through the process of nomination and validation, an unbounded number of TOPOS token holders are economically incentivized to participate in the consensus and contribute to the security of the system.
3.1.5 Transmission Control Engine
As seen in Section 3.1.3, the UCI ensures that subnets’ state transitions are valid (guaranteed by the STARK proof) and that the certificates transporting them are authenticated (guaranteed by the ICE-FROST signature). To allow for trustless cross-subnet communication, subnets additionally rely on the Transmission Control Engine (TCE), a network of nodes that receives certificates submitted by subnets to consistently deliver them, i.e., prevent subnets from having conflicting certificates successfully processed.
The TCE implements a permissionless probabilistic protocol of causal reliable broadcast based on [Guerraoui2019TheCN]. The protocol does not involve consensus since consensus enforces total ordering on messages while it is sufficient to have causal ordering for our purposes, i.e., certificates from the same subnet do not commute, while two independent certificates from two different subnets commute. Causal ordering is needed to make sure that the protocol processed all dependencies of a specific certificate as shown in Figure 1. This results in a simpler, more efficient and more robust protocol than consensus-based solutions.
A key role of the TCE is to deal with the situation where a subnet is under attack or is controlled by an adversary, and tries to double-spend. A subnet $S$ controlled by an adversary may send the same assets twice to different subnets $S_1$ and $S_2$, i.e., send two conflicting $j$-th certificates ($C_j$ to $S_1$ and $C'_j$ to $S_2$). In other words, and as shown in Figure 2, two certificates $C_j$ and $C'_j$ are said to be conflicting if both $C_j$ and $C'_j$ are valid with respect to $C_{j-1}$ but the operations associated with the two certificates do not have a legal sequential history. Without a mechanism to prevent conflicting certificates, $S_1$ and $S_2$ would execute messages on-chain from $C_j$ and $C'_j$ respectively, in which case $S$ would successfully be able to double-spend.
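The notion of conflicting certificates can be sketched as follows. This is a deliberately simplified model, not the certificate format of the protocol: the field names `cert_id`, `subnet_id`, `prev_id`, and `target` are hypothetical, and a conflict is approximated as two distinct certificates that the same subnet chains onto the same predecessor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Certificate:
    cert_id: str    # hypothetical unique identifier of the certificate
    subnet_id: str  # subnet that issued the certificate
    prev_id: str    # identifier of the preceding certificate of that subnet
    target: str     # receiving subnet of the cross-subnet message


def conflicting(a: Certificate, b: Certificate) -> bool:
    """Simplified check: two distinct certificates conflict when the same
    subnet chains both onto the same predecessor, i.e., they compete for
    the same position in that subnet's certificate history."""
    return (
        a.cert_id != b.cert_id
        and a.subnet_id == b.subnet_id
        and a.prev_id == b.prev_id
    )


# The adversarial subnet "S" sends the same assets to "S1" and to "S2".
c_to_s1 = Certificate("c7a", "S", "c6", "S1")
c_to_s2 = Certificate("c7b", "S", "c6", "S2")
c_next = Certificate("c8", "S", "c7a", "S1")
```

Here `conflicting(c_to_s1, c_to_s2)` holds, whereas `c_next` simply extends the history and conflicts with neither.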
3.1.5.1 System Model
Here we briefly describe the system model for the TCE reliable broadcast. We consider a set $\Pi$ of $N$ processes. Processes are equipped with private and public keys and identified with the latter. At most a fraction $f$ of all processes is Byzantine, i.e., subject to arbitrary failures. We say that a process is correct if it follows the prescribed protocol. Byzantine processes cannot determine which correct processes another correct process is communicating with. Byzantine processes are under the control of the same adversary, and can take coordinated actions.
Processes can communicate with each other using the Probabilistic Reliable Broadcast (PRB) primitive [AT2, Guer]. An instance of probabilistic reliable broadcast exports two events:
.Broadcast($m$): Used by a process inside the system to broadcast a message $m$;
.Deliver($m$): Used by a process inside the system to handle the delivery of a message $m$ from sender $\sigma$.
For any $\epsilon \in [0, 1]$, we say that the protocol implementing the reliable broadcast is $\epsilon$-secure if the following properties hold:
No duplication: No correct process .Delivers more than one message.
Integrity: If a correct process .Delivers a message $m$, and the sender $\sigma$ is correct, then $m$ was previously .Broadcast by $\sigma$.
$\epsilon$-Validity: If the sender $\sigma$ is correct, and $\sigma$ .Broadcasts a message $m$, then $\sigma$ eventually .Delivers $m$ with probability at least $1 - \epsilon$.
$\epsilon$-Consistency: Every correct process that .Delivers a message .Delivers the same message with probability at least $1 - \epsilon$.
$\epsilon$-Totality: If a correct process .Delivers a message, then every correct process eventually .Delivers a message with probability at least $1 - \epsilon$.
3.1.5.2 Probabilistic Reliable Broadcast
For completeness, we recall the probabilistic reliable broadcast solution as presented in [AT2], to underline its advantages and later detail the modifications we apply to it.
The probabilistic reliable broadcast protocol at the heart of the TCE is asynchronous, permissionless and tolerant to Byzantine (arbitrary) failures. It replaces classical quorums with stochastic samples which do not need to intersect and are of much smaller size compared to quorums. The protocol has message complexity $O(\log N)$ per process, and thus an overall message complexity of $O(N \log N)$, and the sample size per process is logarithmic in the size of the system, which hints at its massive scalability capabilities. The properties of Byzantine reliable broadcast of the TCE can be violated with probability $\epsilon$, which can be made arbitrarily small.
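To give a feel for the gap between quorums and logarithmic samples, the sketch below compares both sizes for a given system size. The constant `c` is a hypothetical security parameter chosen for illustration only; the actual sample sizes and thresholds depend on the target violation probability.

```python
import math


def quorum_size(n: int) -> int:
    """Classical BFT quorum: strictly more than two thirds of n processes."""
    return (2 * n) // 3 + 1


def sample_size(n: int, c: float = 4.0) -> int:
    """Sample size growing as c * ln(n), capped at n. The constant c is a
    hypothetical security parameter, not a value given in the text."""
    return min(n, math.ceil(c * math.log(n)))


def total_messages(n: int, per_process: int) -> int:
    """Overall message count when every process contacts per_process peers."""
    return n * per_process


N = 1000
QUORUM = quorum_size(N)   # 667 peers per process
SAMPLE = sample_size(N)   # 28 peers per process with c = 4
```

With these illustrative numbers a process exchanges messages with 28 peers instead of 667, and the gap widens as the system grows.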
The algorithm is composed of three sequential phases of communication exchanges, namely, subscriptions, Echo exchanges, and Ready exchanges. Upon initialization, a correct process randomly samples three sets via a uniform random oracle. The size of each set and the associated threshold are security parameters of the protocol. At the end of the initialization, each correct process has the following:
An Echo sample and its associated Echo threshold
A Ready sample and its associated Ready threshold
A Delivery sample and its associated Delivery threshold
During the subscription phase, a process starts exchanging sample-specific subscription messages with processes in the Echo, Ready, and Delivery sets. Upon receiving sample-specific subscription messages from other processes, a process adds the corresponding message senders to new sets in the following manner:
Senders of Echo subscription messages are added to a new Echo subscription set;
Senders of Ready subscription messages are added to a new Ready subscription set.
These sets determine how processes communicate between them. A process interacts with its sets as follows:
It only listens for messages coming from peers in its Echo, Ready, and Delivery sets;
It only sends messages to peers in its Echo subscription and Ready subscription sets.
We note that each set is such that its size is $O(\log N)$.
Now we proceed with describing the algorithm implementing the probabilistic reliable broadcast. Processes can communicate directly with each other or broadcast messages using a probabilistic (not reliable) broadcast primitive, which might deliver conflicting copies of a message. In the following we refer to it as .Broadcast and .Delivery.
Upon the invocation of .Broadcast for a message , a process .Broadcasts it to all the processes in .
Upon .Delivering a correctly signed message $m$, a correct process sends an Echo message for $m$ to every process in its Echo subscription set.
A correct process sends a Ready message for $m$ (if correctly signed) to all processes in its Ready subscription set if it receives either:
at least an Echo-threshold number of Echo messages for $m$ from its Echo sample; or
at least a Ready-threshold number of Ready messages for $m$ from its Ready sample.
When a correct process receives more than a Delivery-threshold number of Ready messages for $m$ from its delivery sample for the first time, and $m$ is correctly signed, it .Delivers $m$.
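The threshold logic of the last steps can be sketched for a single process and a single message as follows. This is a minimal sketch under stated assumptions: the message names Echo and Ready, the class `PrbProcess`, and all field names are illustrative, signatures and networking are omitted, and sending a message is reduced to setting a flag.

```python
from dataclasses import dataclass, field


@dataclass
class PrbProcess:
    echo_sample: set
    ready_sample: set
    delivery_sample: set
    echo_threshold: int
    ready_threshold: int
    delivery_threshold: int
    echoes: set = field(default_factory=set)   # Echo senders seen so far
    readies: set = field(default_factory=set)  # Ready senders seen so far
    sent_ready: bool = False
    delivered: bool = False

    def on_echo(self, sender: str) -> None:
        # Only peers of the Echo sample are listened to.
        if sender in self.echo_sample:
            self.echoes.add(sender)
        self._update()

    def on_ready(self, sender: str) -> None:
        # Ready messages count if they come from the Ready or Delivery sample.
        if sender in self.ready_sample | self.delivery_sample:
            self.readies.add(sender)
        self._update()

    def _update(self) -> None:
        ready_votes = len(self.readies & self.ready_sample)
        if (len(self.echoes) >= self.echo_threshold
                or ready_votes >= self.ready_threshold):
            self.sent_ready = True  # a Ready message would be sent here
        if len(self.readies & self.delivery_sample) > self.delivery_threshold:
            self.delivered = True   # the message is delivered
```

The Ready rule uses "at least" while the delivery rule uses "more than", mirroring the wording of the steps above.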
In the context of the Topos protocol, the probabilistic reliable broadcast alone is not enough. Indeed as mentioned above, the TCE is meant to consistently deliver messages while preserving causal order among them. In the next section we provide a definition of weak causal order and discuss how to modify the PRB solution to extend it with a weak causal order property. Finally we show how the TCE employs Weak Causal Probabilistic Reliable Broadcast to broadcast certificates across the network, and how TCE nodes update their state accordingly.
3.1.5.3 Weak Causal Probabilistic Reliable Broadcast
We now introduce the Weak Causal Probabilistic Reliable Broadcast, which extends the previous broadcast primitive with an additional weak property of causal order among the delivered messages. Intuitively, the weak causal order property imposes that if a correct process delivers a message $m$, then $m$ is weakly causally ordered with respect to the previously delivered messages.
For completeness, we formally recall the definition of causal precedence [modular94]:
Definition (Causal Precedence). Let a step be the broadcast or the delivery of a message. A given set of steps induces a partial order as follows. Step $a$ causally precedes step $b$, denoted $a \rightarrow b$, if and only if:
the same process executes both $a$ and $b$, in that order; or
$a$ is the broadcast of some message $m$ and $b$ is the delivery of $m$; or
there is a step $c$, such that $a \rightarrow c$ and $c \rightarrow b$.
We now detail which kind of execution order we need to ensure for the TCE communication. In particular, we show why causal precedence is too strong for our purpose. Intuitively, we are interested in keeping the execution order among the messages .Broadcast by the same process on behalf of a subnet. We also impose the same execution order for any pair of .Delivered and .Broadcast messages by the same process. However, it is not necessary for the system to keep the execution order among messages .Delivered between two .Broadcasts from the same subnet, i.e., certificates in . For instance, as shown in Figure 1, correct processes and can .Deliver and in different orders.
To formalize the above rules, we introduce the weak causal precedence relation, defined as follows:
Definition (Weak Causal Precedence). Let a step be the broadcast or the delivery of a message. A given set of steps induces a partial order as follows. Step $a$ weakly causally precedes step $b$, denoted $a \rightarrow_w b$, if and only if:
the same process executes both $a$ and $b$, in that order, such that $a$ and $b$ are not both delivery steps; or
$a$ is the broadcast of some message $m$ and $b$ is the delivery of $m$; or
there is a step $c$, such that $a \rightarrow_w c$ and $c \rightarrow_w b$.
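The difference between the two relations can be checked on a small execution. The sketch below computes either relation from per-process step lists; the step encoding `(process, kind, message)` and the function name `precedence` are assumptions made for illustration, and the weak variant reads the first rule as "the two steps are not both deliveries", in line with the intuition given before the definition.

```python
from itertools import product

# A step is (process, kind, message) with kind in {"bcast", "deliver"};
# each process lists its steps in local execution order.


def precedence(histories: dict, weak: bool) -> set:
    """Return the set of ordered step pairs (a, b) such that a precedes b."""
    steps = [s for h in histories.values() for s in h]
    rel = set()
    for h in histories.values():
        for i, a in enumerate(h):
            for b in h[i + 1:]:
                both_deliveries = a[1] == "deliver" and b[1] == "deliver"
                if not (weak and both_deliveries):
                    rel.add((a, b))  # rule 1: same process, in order
    for a, b in product(steps, steps):
        if a[1] == "bcast" and b[1] == "deliver" and a[2] == b[2]:
            rel.add((a, b))          # rule 2: broadcast before its delivery
    changed = True
    while changed:                   # rule 3: transitive closure
        changed = False
        for (a, b), (c, d) in product(list(rel), repeat=2):
            if b == c and (a, d) not in rel:
                rel.add((a, d))
                changed = True
    return rel


HISTORIES = {
    "p": [("p", "bcast", "m1")],
    "q": [("q", "bcast", "m2")],
    "r": [("r", "deliver", "m1"), ("r", "deliver", "m2"), ("r", "bcast", "m3")],
}
STRONG = precedence(HISTORIES, weak=False)
WEAK = precedence(HISTORIES, weak=True)
```

In this execution the two deliveries at process `r` are ordered by causal precedence but not by weak causal precedence, while both deliveries still precede the later broadcast of `m3`.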
An instance of Weak Causal Probabilistic Reliable Broadcast (WCPRB) exposes two events:
.Broadcast($m$): Used by a process inside the system to broadcast a message $m$;
.Deliver($m$): Used by a process inside the system to deliver a message $m$.
For any $\epsilon \in [0, 1]$, we say that the protocol implementing the WCPRB is $\epsilon$-secure if the following properties hold: No duplication, Integrity, $\epsilon$-Validity, $\epsilon$-Consistency, $\epsilon$-Totality as defined for the probabilistic reliable broadcast, and additionally:
Weak causal order: If the .Broadcast of a message $m$ weakly causally precedes the .Broadcast of a message $m'$, no correct process .Delivers $m'$ unless it has previously delivered $m$.
The pseudo-code in Algorithm 4 specifies a solution for WCPRB. This solution employs the probabilistic reliable broadcast primitive as described in the previous section and a Valid predicate specific to the Topos system.
Before introducing the algorithm, we define the Valid predicate.
In the PRB solution, a message must be properly signed to be delivered. Notice that, in the Topos system, TCE processes broadcast certificates on behalf of subnets. Certificate production and submission is left to the discretion of the sending subnet. For that reason, we are not interested in identifying the process but the subnet that originated this certificate; this information is provided by a dedicated field inside the certificate itself. Moreover, each certificate is .Delivered only if the causally dependent certificates have also been .Delivered.
The message to be delivered must satisfy validity conditions. Specifically, the Valid predicate (see Algorithm 3) is the conjunction of the two deterministic predicates Valid_cert and Valid_deps:
The certificate validation predicate Valid_cert (see Algorithm 1) must return true;
Any preceding certificate that a subnet issued must have been validated (implied by the linkage of certificates and encompassed in Valid_cert);
The reported dependencies of the certificate must have been validated and must exist in the histories of all subnets of the dependencies, i.e., the Valid_deps predicate must output true.
The Valid_deps predicate, as defined in Algorithm 2, returns true if for each certificate submitted by a subnet in , the certificate is in . Note that the predicate is monotonic because, for a certificate message , if at time then . The combination of the -Consistency, the -Totality—both providing the agreement property—and weak causal order (enforced by the Valid_deps predicate) properties of the WCPRB defines extrinsic validity: All correct TCE nodes are guaranteed to deliver the same weakly causally ordered sets of certificates, i.e., no subnet can successfully submit conflicting certificates in order to double-spend.
When participating in the WCPRB, each TCE process locally holds the following variables:
: The local set of accepted incoming and outgoing certificates involving subnet , for all (initialized and modified by Algorithm 5);
: The local set of incoming certificates involving a subnet since its last outgoing certificate (initialized and modified by Algorithm 5);
: The local set of certificates pending for validation.
The WCPRB protocol works as follows. When a TCE process wants to .Broadcast a message $m$ through WCPRB, it verifies that Valid($m$) holds before calling the underlying PRB .Broadcast($m$). Upon the PRB .Delivery($m$) event, a TCE process does not trigger the WCPRB .Delivery of that message yet, but adds it to the set of pending certificates. Intuitively, since Valid is a stateful predicate, $m$ may not satisfy the predicate at the current time but may satisfy it after the .Delivery of other messages. Hence, new incoming messages are kept in the pending set. As soon as there exists a message $m$ in the pending set such that Valid($m$) outputs true, $m$ is removed from the pending set and delivered. An interested reader can find a sketch of the proof of correctness of the proposed solution in Appendix C.
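The buffering behavior described above can be sketched as follows. This is not Algorithm 4 itself: the class `WcprbNode` and its members are hypothetical, certificates are reduced to identifiers, and the Valid predicate is replaced by a stand-in that only checks that all declared dependencies have already been delivered.

```python
class WcprbNode:
    """Buffers PRB-delivered certificates until the Valid stand-in holds."""

    def __init__(self):
        self.delivered = []  # certificate ids delivered by WCPRB, in order
        self.pending = {}    # certificate id -> ids it depends on

    def valid(self, deps) -> bool:
        # Stand-in for the Valid predicate: every dependency was delivered.
        return all(d in self.delivered for d in deps)

    def on_prb_deliver(self, cert_id, deps) -> None:
        # A PRB delivery never triggers a WCPRB delivery directly.
        self.pending[cert_id] = set(deps)
        self._flush()

    def _flush(self) -> None:
        progress = True
        while progress:      # one delivery may unblock other pending entries
            progress = False
            for cert_id, deps in list(self.pending.items()):
                if self.valid(deps):
                    del self.pending[cert_id]
                    self.delivered.append(cert_id)
                    progress = True
```

Because the stand-in predicate is monotonic (a certificate that becomes valid stays valid), repeatedly scanning the pending set is safe regardless of the order in which certificates arrive.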
3.1.5.4 Certificate Submission and State Update
We detail how to transmit certificates in the Topos ecosystem using WCPRB. Algorithm 5 describes the submission of a certificate and its application to the local state of TCE processes.
Each TCE process locally holds the following variables:
To submit a certificate $C$, a subnet $S$ .Broadcasts an ICE-FROST signed message $m$. When a correct TCE process .Delivers a certificate, the TCE node applies the certificate to its local state. Applying a certificate means that the TCE node adds the certificate and its dependencies to the history of subnet $S$. More precisely, upon the .Delivery event for a message from $S$, a correct process $p$ always updates its history set but updates its set of incoming certificates only if $p$ belongs to a subnet addressed by the cross-subnet message. Notice that, if $p$ does not belong to any subnet, then $p$ never updates its set of incoming certificates.
The state of the TCE (see Equation 2) is defined as the union of all the sets of the TCE participants. In other words, and as shown in the equation below, it is the set of all certificates that have been validated, .Delivered, and applied. The state of each TCE node is local and converges with probability .
Overall, the TCE adds multiple key properties to the Topos protocol:
Security: The TCE enforces a weak causal ordering of certificates under asynchrony [10.1145/167088.167105] and is a more robust primitive than consensus and atomic broadcast since both of them are impossible to solve in the asynchronous model even with one crash failure [10.1145/3149.214121].
Scalability and Decentralization: With a per-node communication and computation complexity logarithmic in the size of the system, the WCPRB protocol can sustain a very high number of TCE participants—which increases decentralization—while preserving high throughput.
Discussion. For clarity of presentation, we used the PRB protocol as a black box. However, the solution defined in Section 3.1.5.2 would not prevent the .Delivery of ill-formed messages (i.e., messages $m$ such that Valid_cert($m$) is false). Notice that this is not a problem, as those messages are not delivered by the WCPRB solution defined in Algorithm 4. Indeed, if Valid_cert($m$) is false then Valid($m$) is false (see Algorithm 3).
However, to prevent TCE processes from .Delivering through PRB messages that will never be .Delivered through WCPRB, we can move the Valid_cert check to the reception of new messages during the PRB protocol (see Section 3.1.5.2). That is, at the reception of a message $m$, each correct process checks that the message is correctly signed and that the certificate carried by $m$ is well-formed (Valid_cert($m$) is true) before processing it, and discards it otherwise.
3.1.5.5 Sybil Resistance
The TCE establishes an agreement on a set of operations among processes with equal weight. Processes are equal because they must construct samples of peers selected uniformly at random. The TCE tolerates up to a threshold of Byzantine processes in the system. The Sybil attack [10.1007/3-540-45748-8_24], i.e., the capability of an adversary to freely create identities in order to exceed this threshold, is the main threat that the system has to tolerate. Sybil resistance is relatively easy to achieve in a permissioned system, in contrast to permissionless systems where membership is free. The TCE is a permissionless system and as such it is crucial to enforce that the number of Byzantine processes remains below the threshold. Notable approaches are Proof of Work (the adversary cannot acquire more computational power for free) and Proof of Stake (the adversary cannot hold more assets for free). The Topos approach follows the latter, which implies the management of an asset. To defend itself against Sybil attacks, the TCE therefore leverages the Topos Subnet to ensure that a majority of reliable broadcast participants follow the protocol, such that it is not possible to inconsistently deliver cross-subnet messages, i.e., to double-spend.
Processes wishing to participate in the TCE must submit a special transaction on the Topos Subnet which records their intention to join the TCE. This transaction includes a fixed amount, denominated in TOPOS, as well as an identifier of the participant and it locks the TOPOS amount on the Topos Subnet. Non-free registration of participants provides the basis for a Sybil resistance mechanism in the TCE: Participants communicate only with peers registered on the Topos Subnet.
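The registration mechanism can be pictured with the toy model below. It is a sketch under stated assumptions rather than the Topos Subnet logic: the class `TceRegistry`, its methods, and the stake value are hypothetical, and balances are plain integers.

```python
class TceRegistry:
    """Toy model of TCE registration on the Topos Subnet."""

    def __init__(self, required_stake: int):
        self.required_stake = required_stake  # hypothetical fixed TOPOS amount
        self.locked = {}                      # participant id -> locked amount

    def register(self, participant: str, balance: int) -> int:
        """Lock the stake and return the participant's remaining balance."""
        if participant in self.locked:
            raise ValueError("participant already registered")
        if balance < self.required_stake:
            raise ValueError("insufficient TOPOS balance")
        self.locked[participant] = self.required_stake
        return balance - self.required_stake

    def accept_peer(self, participant: str) -> bool:
        """A TCE node only communicates with registered peers."""
        return participant in self.locked

    def sybil_cost(self, identities: int) -> int:
        """TOPOS amount an adversary must lock to control that many identities."""
        return identities * self.required_stake
```

Since each identity requires its own locked stake, the cost of an attack grows linearly with the number of identities the adversary needs in order to exceed the Byzantine threshold.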
3.2 Protocol Overview
To this day, interoperability protocols have fallen into disjoint categories depending on their design goals. Trusted interoperability protocols (e.g., [Wormhole, AvalancheBridge, PolygonPoSBridge]) have relied on external verifiers, namely administrators of centralized protocols or incentivized relayers in decentralized protocols, to bridge the interoperated chains. They allow for cross-chain message passing at the cost of trust in verifying entities that are external to the blockchains implementing the protocol (depending on an auxiliary, often much weaker, cryptoeconomic security). On the other hand, trustless protocols have found solutions to remove the need for trust in third parties: Some protocols (e.g., [LayerZero]) have centered their design on the multiplication of non-colluding verifying networks, while others (e.g., [Cosmos]) have removed the need for external verification by solely depending on the chain’s own actors to natively verify data across the chains. These models, although on the right path towards trustless interoperability, still impose trust in the consensus security of the interoperated chains and as such do not permit true trustlessness.
Topos innovates by introducing the very first solution that cryptographically enforces the validity of cross-subnet messages without the need for trust in any external verifiers or in consensus security. At its core, Topos allows subnets to exchange messages with each other trustlessly and safely. Uniform security (see Section 3.2.5) is a key innovation in the cross-chain interoperability landscape and will pave the way for a new era of secure communication between decoupled blockchains.
Generally speaking, the Topos protocol is built on three major pillars enabled by the components exposed in the previous section.
The TCE reliable broadcast protocol allows for consistent delivery of causally ordered subnet certificates.
Certificates include a zkSTARK validity proof of the committed state transition; thus every node in the TCE network can attest to the validity of cross-subnet messages without the need to trust the sending subnet.
Certificates are authenticated by means of an ICE-FROST signature; receiving subnets can thereby be ensured that delivered certificates were not tampered with.
3.2.1 Cross-Subnet Message
A cross-subnet message represents a request initiated by a user from a subnet to execute a transaction in a remote subnet. It takes the form of a function call of a dedicated protocol-level smart contract, namely the Topos Core contract, on the sending subnet and is to be interpreted on the receiving subnet as another function to call. The Topos Core contract function to call on the sending subnet depends on the type of message requested:
Asset transfer: An asset is burnt/locked on the sending subnet and equivalently minted on the receiving one.
transferAsset(
    subnet_id: Identifier of the receiving subnet,
    asset_id: Identifier of the transferred asset,
    recipient_addr: Recipient’s address on the receiving subnet,
    amount: Amount to be transferred
)
Arbitrary contract call: A contract on the receiving subnet is called from the sending subnet.
callArbitraryContract(
    subnet_id: Identifier of the receiving subnet,
    contract_addr: Address of the smart contract,
    func_name: Name of the function to call,
    func_args: Arguments to pass to the function call
)
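As an illustration, the two message types can be modeled as plain records. The parameter names follow the two function signatures above; the Python types, the class names, and the helper `describe` are assumptions made for this sketch and do not reflect the Topos Core contract interface.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class TransferAsset:
    subnet_id: str       # identifier of the receiving subnet
    asset_id: str        # identifier of the transferred asset
    recipient_addr: str  # recipient's address on the receiving subnet
    amount: int          # amount to be transferred


@dataclass(frozen=True)
class CallArbitraryContract:
    subnet_id: str       # identifier of the receiving subnet
    contract_addr: str   # address of the smart contract
    func_name: str       # name of the function to call
    func_args: Tuple = ()  # arguments to pass to the function call


CrossSubnetMessage = Union[TransferAsset, CallArbitraryContract]


def describe(msg: CrossSubnetMessage) -> str:
    """Return the action the receiving subnet is asked to perform."""
    if isinstance(msg, TransferAsset):
        return f"mint {msg.amount} of {msg.asset_id} to {msg.recipient_addr}"
    return f"call {msg.func_name} on {msg.contract_addr}"
```

For example, `describe(TransferAsset("subnet-b", "TOK", "0xabc", 10))` yields the minting request that the receiving subnet interprets.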
Topos enables interoperability of subnets via the following transmission flow of cross-subnet messages (see Figure 3). Once a new cross-subnet message emitted by a user is part of the canonical chain of the subnet, it becomes ready for certification as per the rules of the UCI: it is batched with an arbitrary number of transactions to form a new state transition whose validity is to be proven in a new authenticated certificate. Once created, the certificate is delivered throughout the TCE network via the reliable broadcast primitive and eventually collected by the subnet it is addressed to. Thanks to the validity and authentication properties guaranteed by the UCI and the consistent delivery ensured by the TCE, the receiving subnet can trustlessly and securely interpret the cross-subnet message and execute the requested transaction locally.
3.2.2 Scalability and Decentralization
In order to verify the zkSTARK proof included in a certificate, TCE nodes do not need access to the subnet state. The zkSTARK verifier only accesses the hash of the state committed to in the previous certificate of the same subnet, and the hash of the new state committed to in the new certificate. This means that even though the size of the subnets’ state transitions can be extremely large, the verification is nearly optimal and the storage requirement for TCE nodes is kept very low. Keeping the state of the TCE small is paramount to ensure that newly joining nodes can synchronize quickly and that the burden of storing the certificates stays low, even with the system processing a large number of cross-subnet messages. While the size of the TCE state grows linearly with the number of certificates, the overhead of storing new certificates remains acceptable.
Another essential advantage of having stateless verification resides in the fact that since the cost of storing the state of the TCE is kept low, it is possible for actors with low-cost hardware to participate in the TCE, thus increasing the decentralization of the TCE.
3.2.3 Composability
As detailed earlier in the paper, composability is a design principle under which various applications can compose their value by invoking each other’s functions. In the Topos ecosystem, composability is ensured in two different manners.
3.2.3.1 Atomic Composability
Within a single subnet network, developers can deploy smart contracts that invoke other contracts synchronously, i.e., comprise contract-to-contract calls that are executed one after the other and only if the previous operation was successfully completed (see Figure 4). If a single operation fails, the whole transaction is reverted. In this context, composability is described as atomic because either all operations or none are executed. Commonly found in traditional databases, atomicity allows subnets to safely transition their state and prevents them from facing a corrupted state introduced by composed contract calls that fail in the middle of their execution.
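The all-or-nothing behavior can be sketched with a copy-and-commit pattern. This is a generic illustration, not how a subnet's execution environment is implemented; the function names and the dictionary-based state are assumptions.

```python
import copy


def execute_atomically(state: dict, operations) -> dict:
    """Apply all operations or none: work on a copy, commit only on success."""
    working = copy.deepcopy(state)
    try:
        for op in operations:
            op(working)
    except Exception:
        return state  # revert: the original state is left untouched
    return working


def debit(account: str, amount: int):
    def op(s: dict) -> None:
        if s.get(account, 0) < amount:
            raise ValueError("insufficient balance")
        s[account] -= amount
    return op


def credit(account: str, amount: int):
    def op(s: dict) -> None:
        s[account] = s.get(account, 0) + amount
    return op
```

If the second call of a composed transaction fails, the credit applied by the first call is discarded together with the working copy, so no partially applied state is ever exposed.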
3.2.3.2 Asynchronous Composability
In addition to atomic intra-subnet composability, the Topos protocol permits inter-subnet asynchronous composability, i.e., the capability of different applications deployed on multiple subnets to invoke each other. As we have seen previously, users of a sending subnet can emit cross-subnet asset transfers or remotely invoke arbitrary smart contracts from different subnets by calling functions of the Topos Core contract. To obtain composability across subnets, developers can atomically compose their smart contracts with the Topos Core contract, i.e., programmatically execute cross-subnet asset transfers or remote contract calls as part of their own smart contract functions. Then, subnets can include calls to these composed smart contracts in certificates for receiving subnets to learn about these new types of cross-subnet messages (see Figure 5).
Asynchronous composability is enabled in the Topos ecosystem by the UCI and the TCE, and is provided to any applications deployed on any subnets.
3.2.4 Incentive System Design
Topos will create economic incentives for the various actors of its ecosystem by means of its native token, TOPOS.
In blockchain systems, some actors are selfish and want to take advantage of the system. These actors deviate from the protocol whenever doing so leads to a greater individual gain, for example to earn more rewards than they would by following the prescribed protocol. Such actors are called rational. The Topos ecosystem aims at remaining tolerant to Byzantine faults when the participants are rational.
In the following sections, we briefly discuss the incentive system designs that will be implemented in Topos to align the behavior of rational actors with the prescribed behavior.
3.2.4.1 TCE Incentives
Topos needs to prevent the verifier’s dilemma [10.1145/2810103.2813659] in which, instead of maintaining the common good, correct processes choose not to verify certificates because their verification is more computationally expensive than verifying transactions, or than doing nothing at all. Even if requiring the reliable broadcast participants to stake an amount of TOPOS is sufficient to provide guarantees against Sybil attacks, it is not sufficient on its own to incentivize TCE participants to follow the prescribed protocol. In fact, the system can still be subject to many predicaments, such as the verifier’s dilemma. Thus, to ensure proper verification and execution of certificates, TCE participants who correctly followed the protocol should be rewarded accordingly. Such rewards come from applying a special fee to cross-subnet messages. Fees associated with cross-subnet messages are denominated in TOPOS and are collected by TCE participants with respect to their work. Without such economic incentives, undesired situations can happen. For example, no one verifies certificates and as such invalid certificates can be spread in the system, or certificates could stay pending for a long time, increasing the end-to-end cross-subnet communication latency. Topos’s incentive model will give guarantees that all certificates are processed since they potentially contain a large number of cross-subnet messages. To ensure the performance of the system, a desirable objective of the incentive model is to process the certificates as quickly as possible. Verifying and executing certificates should be an interesting and lucrative activity incentivizing rational TCE participants to behave well, i.e., the incentive model should not reward participants who do not contribute to the system. This is ensured by the proof-of-activity mechanism introduced below.
Since the communication in the TCE is not synchronous, assessing whether a node does not participate is generally impossible for one cannot distinguish if a node is slow or if it is not working. The alternative approach is to prove that a node was active. Notice that this approach cannot, however, account for slow nodes. A proof-of-activity for a node or a set of nodes proves that their work has been seen and was considered by sufficiently many TCE participants. Since messages are signed, the proof-of-activity can consist of the set of messages delivered by a node. However, exchanging such sets of messages will induce a high communication overhead and will require too much storage on the Topos subnet; therefore, to be practical, the proof-of-activity will rely on aggregation techniques. The technical details are deferred to a subsequent paper.
3.2.4.2 Cross-Subnet Fee
Cross-subnet fees must also be paid in order for the requested transaction to be executed on the receiving subnet by its validators. To determine these fees, the transaction originator (a participant of the sending subnet) can ask a service to estimate the fee required to execute the given transaction. The service computes and returns the result based on the fee calculation of the receiving subnet. One possibility is for this service to be exposed by a system of decentralized oracles like Chainlink [Chainlink]. Internally, the estimation provider will estimate the resources consumed by processing the transaction requested in the cross-subnet message based on the current receiving subnet state.
As a general observation, if the cross-subnet fee were not greater than that of regular transactions, the reward collected by the validator of the receiving subnet would be smaller (because it is shared with the TCE participants), hence it would not be in the subnet validators’ best interest to process certificates in the first place.
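This observation can be made concrete with a small calculation. The sketch assumes that a fixed fraction `tce_share` of each cross-subnet fee goes to TCE participants; this split, like the function names, is purely hypothetical since the fee model is not specified here.

```python
def validator_reward(fee: float, tce_share: float) -> float:
    """Part of a cross-subnet fee kept by the receiving subnet's validator,
    assuming a fraction tce_share goes to TCE participants (hypothetical)."""
    return fee * (1.0 - tce_share)


def min_cross_subnet_fee(regular_fee: float, tce_share: float) -> float:
    """Smallest cross-subnet fee for which the validator earns at least
    as much as from a regular transaction paying regular_fee."""
    return regular_fee / (1.0 - tce_share)


REGULAR_FEE = 10.0
TCE_SHARE = 0.25
BREAK_EVEN = min_cross_subnet_fee(REGULAR_FEE, TCE_SHARE)
```

With these illustrative values, a cross-subnet fee equal to the regular fee leaves the validator with 7.5 instead of 10, and the fee must exceed roughly 13.33 before processing the certificate becomes the better option.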
3.2.5 Uniform Security
As we have seen throughout the paper, the Topos protocol allows for trustless interoperability. A sending subnet is responsible for proving that transactions are valid executions of the Topos zkVM, signing certificates that include cross-subnet messages, and sending the certificates to the TCE for broadcast in the ecosystem, while a receiving subnet is responsible for correctly applying the cross-subnet messages, i.e., submitting the requested transactions in their network.
Topos’s uniform security is realized by the combined properties of the UCI and the TCE:
Intrinsic validity of certificates, ensured by the UCI’s Valid_cert predicate;
Extrinsic validity of certificate messages, ensured by the WCPRB’s agreement property and Valid_deps predicate implemented by the TCE.
The computational integrity of subnets’ state transitions is fully decoupled from the consensus security of the related subnets and is entirely ensured by zkSTARK proofs. It is computationally infeasible for an adversary to forge a proof of validity, i.e., convince a verifier that an invalid transaction was correctly executed by the Topos zkVM. In this setup, receiving subnets benefit from an unparalleled level of security for they are assured that certified state transitions are valid. Put another way, if the UCI certificate validation predicate Valid_cert outputs true, cross-subnet messages are guaranteed to be intrinsically valid.
The certificate messages that are delivered by TCE nodes are guaranteed not to be conflicting with each other while they form weakly causally ordered sets which capture the weak causal precedence between messages. This is achieved if the TCE’s Valid_deps predicate outputs true, which triggers the .Delivery when intrinsic validity is also verified. It follows that it is infeasible for malicious subnets to successfully double-spend and deceive honest receiving subnets into executing conflicting cross-subnet messages.
As soon as a certificate is delivered, intrinsic validity and extrinsic validity are enforced by the Topos protocol. This provides the ecosystem with uniform security. The safety of executing cross-subnet messages internally on receiving subnets is independent of the security of sending subnets. Relying on cryptographic primitives, rather than on bridge verifiers and/or blockchain consensus, to prove the validity of messages is a fundamental innovation in the field of blockchain interoperability. It is infeasible to create a certificate containing an invalid state transition, or to create conflicting certificates in order to double-spend across the ecosystem, even in the presence of a malicious subnet, e.g., one in which more than 2/3 of the validators are controlled by an adversary. Note that transaction censorship remains an issue which needs to be addressed by subnets directly.
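The interplay of the two validity checks can be illustrated with a minimal Python sketch. It is not the Topos implementation: the `Certificate` fields (`prev_id`, `proof_ok`) and the `TceNode` class are hypothetical, `proof_ok` stands in for zkSTARK and signature verification, and `valid_deps` is reduced to "the certificate extends the latest delivered certificate of its subnet", which is a simplification of the weak causal precedence check.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Certificate:
    subnet: str      # sending subnet identifier
    cert_id: str     # identifier of this certificate
    prev_id: str     # certificate this one extends (hypothetical field)
    proof_ok: bool   # stand-in for zkSTARK and signature verification


def valid_cert(cert):
    """Intrinsic validity: simplified stand-in for the Valid_cert predicate."""
    return cert.proof_ok


@dataclass
class TceNode:
    head: dict = field(default_factory=dict)      # subnet -> latest delivered id
    delivered: list = field(default_factory=list)

    def valid_deps(self, cert):
        """Extrinsic validity (simplified): extend the latest delivered certificate."""
        return self.head.get(cert.subnet, "genesis") == cert.prev_id

    def try_deliver(self, cert):
        # Delivery happens only when both predicates hold.
        if valid_cert(cert) and self.valid_deps(cert):
            self.head[cert.subnet] = cert.cert_id
            self.delivered.append(cert.cert_id)
            return True
        return False


node = TceNode()
c1 = Certificate("A", "c1", "genesis", True)
c1_conflict = Certificate("A", "c1-bis", "genesis", True)  # conflicting certificate
c_bad = Certificate("A", "c2", "c1", False)                # invalid proof
results = [node.try_deliver(c) for c in (c1, c1_conflict, c_bad)]
print(results)  # [True, False, False]
```

In this toy model, the conflicting certificate is rejected on extrinsic grounds and the certificate with an invalid proof on intrinsic grounds, regardless of how many validators of subnet A are malicious.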
Secure Cross-Subnet Asset Transfer.
In the practical case of a cross-subnet asset transfer, a sending subnet submits a certificate containing a proof of the validity of the asset transfer transaction. This transaction checks that the balance of the sender is sufficient to allow the transfer and then locks or burns the assets to be transferred. Once delivered and verified, the certificate gives full assurance to the receiving subnet that the balance check and the lock/burn operations were successfully carried out on the sending subnet. No malicious entity can therefore pretend that assets were locked or burnt in the context of a cross-subnet asset transfer, which prevents arbitrary minting of tokens on the receiving subnet.
An additional innovation brought by the Topos protocol is that uniform security allows for the first trustless burn-mint asset transfer model: Receiving subnets do not need to trust sending subnets to have correctly burnt assets before minting. More secure than the lock-mint model, burning assets on sending subnets ensures that no authority can steal user assets sent across multiple chains. By enabling a trustless burn-mint asset transfer model, Topos paves the way for a new asset bridge paradigm in which tokens can be natively and frictionlessly deployed and managed on any blockchain.
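The burn-mint flow can be pictured with a toy ledger. This is only a sketch under stated assumptions: the `Subnet` class, the message format, and the `certified` flag (standing in for a delivered and verified certificate) are hypothetical and not part of the protocol specification.

```python
class Subnet:
    def __init__(self, name, balances):
        self.name = name
        self.balances = dict(balances)

    def supply(self):
        return sum(self.balances.values())


def burn(subnet, sender, amount):
    """Burn on the sending subnet; a message is produced only if the balance check passes."""
    if subnet.balances.get(sender, 0) < amount:
        raise ValueError("insufficient balance")
    subnet.balances[sender] -= amount
    # 'certified' stands in for a delivered certificate with a valid proof.
    return {"from": subnet.name, "amount": amount, "certified": True}


def mint(subnet, recipient, message):
    """Mint on the receiving subnet, gated on the verified certificate."""
    if not message["certified"]:
        raise ValueError("certificate not verified")
    subnet.balances[recipient] = subnet.balances.get(recipient, 0) + message["amount"]


a = Subnet("A", {"alice": 100})
b = Subnet("B", {})
total_before = a.supply() + b.supply()
msg = burn(a, "alice", 30)
mint(b, "bob", msg)
total_after = a.supply() + b.supply()
print(total_before, total_after)  # 100 100
```

The total supply across both subnets is unchanged, and no asset remains locked under the control of a third party on the sending side.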
3.2.6 Finality and Reorganization
With certificates being stored on the TCE, it is guaranteed that the submitted commitments to the state of subnets are immutable and hence that subnets’ states cannot be reverted to states prior to the ones committed to in their latest delivered certificates without creating conflicts. In the event that a subnet reorgs to such a state, the TCE would prevent the delivery of new certificates committing to a new state, as these certificates would conflict with the latest delivered certificate. This is guaranteed by the monotonicity of the TCE message predicate: If for any message $m$, $\mathrm{Valid}(m)$ is true at time $t$, then it remains true at any time $t' > t$. Once a certificate message is delivered via the WCPRB primitive, the certificate is considered to be final.
4 Use Cases
The Topos protocol suits a variety of use cases. Three of them are outlined below to illustrate what Topos makes possible.
4.1 Subnets as Layer-2 to Interoperate Layer-1
As we have seen previously, in the Topos ecosystem subnets are general-purpose blockchains: They can host any type of application and prove their state transitions using the Topos zkVM, a virtual machine that allows for the execution of arbitrary provable computation. One typical use case that we envision is that of layer-2 (L2) protocols that scale layer-1 (L1) blockchain networks by delegating the execution of transactions to an offchain network and by relying on validity proofs to update the L2 state view on the L1, i.e., zk-rollups [Rollup]. L2 protocols separate the execution and settlement layers. Traditionally, the two layers are combined: Transactions (execution layer) are executed locally by a participating node when importing a block that has been validated by the network consensus rules (settlement layer). The Topos ecosystem enables the concept of layered scalability (see Figure 6) where participating subnets are execution layers which delegate state settlement to external L1 networks. One may notice that L1 networks can greatly benefit from the instantiation of new zk-rollup subnets for they can delegate part of their execution layer and hence better scale by settling many more transactions per second. Subnets are L2 zk-rollups scaling existing secure L1s (e.g., Ethereum [Ethereum], Avalanche [Avalanche]). In this configuration, settlement happens on the L1 networks where subnets publish their proofs of valid state transition.
A key consequence is that Topos indirectly enables interoperability between decoupled L1 blockchains: Any L1 chain that hosts bridge smart contracts bridging assets with a Topos subnet is de facto compatible with all other subnets in the Topos ecosystem, some of which are zk-rollups of other L1 networks and as such are capable of routing cross-subnet messages back to their assigned L1 chain.
By design, L2 networks offer much cheaper transaction fees than L1 blockchains, for they are parallel networks that do not face similar levels of congestion (if they did, the protocol operators could simply spawn a new instance of the protocol and delegate part of their execution workload to it). For that reason, dApps have been making the move to L2, and so have their L1 users. By moving to Topos subnets, L1 users might enjoy lower fees and additionally gain, for free, interoperability with all other subnets in the ecosystem and in fine with all the other bridged L1 networks. For existing L1s, the Topos protocol thus offers interoperability together with scalability at no extra cost.
Practically, the data flow is the following (see Figure 7):
To send funds to an L2 subnet, users deposit assets into a bridge contract on the underlying L1 blockchain. An equivalent number of wrapped assets is then minted on the L2 subnet for the user once the lock transaction is processed (classic lock-mint model found in most bridge protocols).
Users can initiate (wrapped) asset transfers to any other subnet in the ecosystem. They can also use their wrapped assets to pay fees when submitting requests for general-purpose computation execution, either locally on their subnet, or remotely on any other subnet (remote arbitrary transaction).
A large number of subnet transactions are batched and proven by a zkSTARK proof which is transmitted to the L1 blockchain for L2 state settlement and to the rest of the Topos ecosystem via Topos certificates.
The TCE reliably broadcasts certificates throughout the ecosystem. Once certificates are delivered to all TCE nodes, the validators of receiving subnets can process the included requests for remote transaction execution and submit requested transactions locally in their subnet network.
4.2 Decentralized Finance (DeFi)
In the past few years, DeFi, one of the predominant use cases of web3 technologies today, has attracted a previously unseen level of user activity and commitment, with global TVL (Total Value Locked) reaching amounts in the hundreds of billions of USD. The problem is that the locked value remains isolated in silos, inaccessible because user-friendly and secure exchange options are still lacking. In most cases, liquidity, staking, loans, and other forms of DeFi value remain locked in individual protocols, i.e., smart contracts, on individual blockchains. As a result, crypto assets often lie idle in users’ wallets, as there is limited scope for their use within individual blockchains. In turn, DeFi is unable to realize its full potential.
To unlock the true potential of decentralized finance, it is essential to allow for seamless composability of platforms and applications across different blockchain ecosystems, so that assets can move frictionlessly across blockchain networks and be used in any DeFi protocol on any blockchain. Trustless bridges are the best fit for DeFi because they are non-custodial and, from a security viewpoint, rely only on the security of the interoperated blockchains. Because Topos’ innovations give stronger security guarantees than other so-called trustless interoperability protocols, value can move freely and securely from one DeFi protocol to another. The properties provided by Topos greatly enhance DeFi’s potential to establish itself as a ubiquitous substitute for traditional financial systems.
4.3 Enterprise Adoption
The two major problems preventing enterprise adoption of blockchain technologies are the lack of interoperability and the lack of privacy [ey-public:2019].
Lack of interoperability.
Companies and organizations build value by assembling proprietary data and by controlling which part of it they expose in their products and services. Enterprises often prosper by composing value with other companies and to that end have traditionally, in the Web1-2 era, used ubiquitous infrastructure and technologies such as authenticated RESTful APIs to interface their products and services with their partners’ without friction or extra cost. Unfortunately, Web3 technologies have yet to create such standards. For enterprises to use Web3 and blockchain technologies, there is a need for a frictionless interoperability model in which blockchains running proprietary business logic and storing proprietary data can exchange and capture value openly, without being required to expose their private information.
Lack of privacy.
Privacy is another challenge of blockchain technologies. One of the greatest strengths of the technology is the transparency that comes from having a distributed record of transaction history that is public and easy to verify. Yet, it poses a threat to the privacy of organisations and users. Enterprises which want to protect their trade secrets and other sensitive information are therefore reluctant to embrace even the most prominent permissionless blockchain protocols and hence have favored the private/permissioned blockchain architecture.
Topos allows an unbounded heterogeneous set of public and private blockchains to interoperate with each other while preserving the privacy of their internal state. As such, Topos solves for both blockchain interoperability and privacy, and hence is the springboard for adoption of the technology by enterprises.
5 Conclusion
In this paper, we introduced the design of Topos, the first interoperability protocol that is truly trustless and decentralized while preserving the privacy of the participating networks. Topos allows cross-subnet messages to be safely exchanged without the need to rely on third parties to guarantee the validity of the cross-subnet messages. Instead, we showed that this trust can be replaced by proofs of computational integrity, providing cryptographic guarantees. Another benefit of using such proofs is that verification is extremely fast and barely grows with the size of the computation, which allows for significant throughput gains. Furthermore, a novel threshold signature scheme is introduced to facilitate authentication of messages across the Topos ecosystem. Such a primitive makes it practical to manage public keys as, unlike with other threshold signature schemes, the public key remains static for the whole lifespan of subnets. In addition, we introduced a probabilistic reliable broadcast primitive to ensure consistent delivery of cross-subnet messages even in the presence of Byzantine actors. The primitive replaces classical quorums with stochastic samples while keeping the per-node communication and computation overhead very small even when the network size increases. This leads to a massively scalable and high-throughput protocol.
Topos is a secure, trustless, and decentralized interoperability protocol with the aim to realize the vision of an “Internet of Blockchains”.
6 Discussion and Future Work
6.1 Subnet Recovery
Subnets are sovereign networks which can be exposed to network instability, or worse, to attacks which can threaten their integrity. An example of such an attack is one in which a supermajority of consensus participants comes under the control of an adversary. In this situation, an arbitrary state can be finalized as part of the canonical chain. Once recovered from the attack, the subnet can restart from a valid pre-attack state only if it has access to this state. Note that at the level of the Topos ecosystem, the subnet can continue to participate only if no certificate was emitted during the attack; otherwise, even after a recovery, the subnet would never be able to emit a certificate which is valid with respect to the subnet’s history stored on the TCE.
One valuable extension of the Topos protocol can be found in leveraging data availability on the TCE to offer a recovery feature for any subnet in the ecosystem. Subnets that were compromised can query the TCE for a proof of availability, retrieve their latest committed state, and reboot from this state.
TCE nodes store certificates which, out of subnets’ transactional data, expose only cross-subnet messages. For a data availability layer on the TCE to allow for subnet recovery, TCE nodes need to store subnets’ states in clear (preventing subnets from using Topos’s privacy feature). This means that additional data must be stored, which impacts the decentralization of the TCE network. A naive solution would be to leverage a distributed storage network, e.g., IPFS [Benet2014IPFSC], but it is not sufficient to simply store the state on it: A proof is needed that guarantees high availability of the storage layer and that the state is actually available. However, the successive states do not need to be available at all times, for it is unnecessary to keep past data stored on that layer. Only the commitment to the state needs to be stored permanently on the data availability layer for new participants of the TCE to be able to verify previous zkSTARKs. Past states can therefore be completely discarded; only the current state needs to be stored on the data availability layer.
Ultimately, subnets can choose to include their states in certificates (giving up privacy) or not [Validium], and TCE nodes only store the latest state and remove previous ones as new ones are submitted, keeping storage per subnet constant.
An alternative would be to run a data availability service on a given subnet, e.g., the Topos Subnet, with an economic incentive to store data, such that the service is well decentralized, hence minimizing trust in the data availability layer, in the sense that if the Topos Subnet is available so is the state of other subnets. A drawback of this approach is that past states cannot be discarded since they are stored on-chain.
6.2 Confidentiality of Cross-Subnet Transactions
Since subnets in the Topos ecosystem are sovereign blockchain networks, they can choose their level of confidentiality for internal transactions. However, for interoperability, transactions between subnets must follow specific rules. In Topos’s current design, the protocol specifies proofs of validity of these transactions and of the authenticity of the certificates they are included in, but cross-subnet messages are visible in clear in the certificates. A path for future improvement is to provide confidentiality for cross-subnet messages without losing existing properties. A few approaches could achieve this goal, all of which require substantial research. One such approach is to use zkSTARKs for hiding message data.
6.3 Recursive STARKs
As detailed in this paper, the Topos protocol introduces the UCI for subnets to speak a common language in order to, in fine, exchange data (via certificates) that is compliant with the protocol. In the current form of the protocol, certificates are created when subnets fill their batch of transactions and create a zkSTARK proving the validity of all transactions of the batch. Consequently, a subnet in which participants exchange a lot of internal transactions but submit very few cross-subnet messages will lead to the creation and propagation in the TCE of certificates that have little value in terms of interoperability in the ecosystem.
In order to relieve subnets and the TCE from this unnecessary workload, a future improvement of the Topos protocol is to leverage recursive zkSTARKs to compose proofs that are pending for insertion in certificates and ultimately create certificates only when subnets decide they have enough value in certifying some state transition (see Figure 8). Subnets will batch and prove sets of transactions as currently designed, but will accumulate the proofs and recursively compose them in a single proof for an upcoming certificate. Eventually, this will introduce flexibility and allow subnets to devise their own lineup of certificates.
6.4 Alternative to Topos Subnet
The current Topos solution relies on the Topos Subnet for the following key operations: (i) subnet registration to the system; (ii) management of the participation in the TCE (for Sybil attack mitigation); (iii) TOPOS asset management. Even though the security of the Topos system does not rest entirely on the Topos Subnet, a failure of the latter would expose the ecosystem to potential vulnerabilities. For that reason, we aim to implement in the future a solution that does not depend on any dedicated subnet. This means that the three key operations described above would have to be implemented at the TCE level. Intuitively, all the information totally ordered in the Topos Subnet would have to be replicated at the TCE level, and in the absence of consensus this information would appear in a different order to different TCE nodes.
However, preliminary research shows that partial ordering of sets of messages is sufficient to implement all three operations. Indeed, all these operations are dependent on a token and we know that causal reliable broadcast (that the TCE implements) is sufficient to implement a cryptocurrency [AT2], even in the case where the reliable broadcast protocol implements an inflation mechanism [https://doi.org/10.48550/arxiv.2105.04966].
Acknowledgments
We would like to thank Sara Tucci-Piergiovanni and Thibault Rieutord at the CEA List and Robin Salen, Travis Baumbaugh, Alonso Gonzalez, and Hamy Ratonanina, our colleagues at Toposware for their helpful feedback and reviews which we have incorporated into the current version of the paper.
Appendix A STARK Proof System
The only assumption required by STARK proof systems is that the hash functions to be used for commitments are collision-resistant. This allows for simpler, leaner, post-quantum and trustless proving systems than other SNARK systems.
The STARK proof construction can be decomposed into four stages.
Algebraic Intermediate Representation (AIR). First there is need for an algebraic representation of the problem. Consider the set $\mathcal{P} = \{P_1, \dots, P_k\}$ of multivariate polynomials in the variables $X = (X_1, \dots, X_w)$ and $Y = (Y_1, \dots, Y_w)$, $\mathcal{P} \subset \mathbb{F}[X, Y]$, where $X$ and $Y$ represent the states of the current and next computation step respectively. That is, for two correct consecutive state vectors $x$ and $y$, we have that $(x, y)$ is a correct solution for the system $\mathcal{P}$, i.e., $P_i(x, y) = 0$ for all $i$. For efficiency of both prover and verifier, we need a minimal AIR, which means minimizing:
$w$, the state-width;
$T$, the machine cycle count (that may depend in general on arbitrary input, but is here linear in the size of the input set).
To be able to extend and commit to the trace efficiently, the number of steps is increased so that the number of rows reaches the next power of 2, $n = 2^{\lceil \log_2 T \rceil}$, for efficient FFTs.
We often talk of execution trace for a program, which can be seen as a matrix in which:
each row is describing the state of the computation at a given step;
each column tracks the content of a register over time.
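As an illustration of an execution trace and its transition constraints, the following sketch builds a two-register Fibonacci trace and checks that every pair of consecutive rows satisfies the constraint polynomials. The field modulus, the Fibonacci machine, and all function names are assumptions chosen for the example; they are not taken from the Topos zkVM.

```python
P = 2**31 - 1  # toy prime field modulus (assumption for this example)


def next_power_of_two(n):
    """Smallest power of two greater than or equal to n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


def build_trace(steps):
    """Two-register Fibonacci trace: each row is the state (a, b) at one step."""
    rows = [(1, 1)]
    for _ in range(steps - 1):
        a, b = rows[-1]
        rows.append((b, (a + b) % P))
    return rows


def constraints(x, y):
    """Transition constraints on current state x and next state y; all must be 0."""
    return [(y[0] - x[1]) % P, (y[1] - x[0] - x[1]) % P]


def check_trace(rows):
    return all(c == 0 for x, y in zip(rows, rows[1:]) for c in constraints(x, y))


T = 6                       # cycle count needed by the computation
n = next_power_of_two(T)    # padded number of rows
trace = build_trace(n)      # padding here simply runs the machine for n steps
print(n, trace[-1], check_trace(trace))  # 8 (21, 34) True
```

Here the state-width is 2 and each column of `trace` is the history of one register; padding by running extra steps is only one possible choice.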
Extension of the trace and commitment. We can view any column of the execution trace as a polynomial over a certain domain (generated by ). We can then consider the same polynomial over the domain , where is a root of . This is referred to as the Low Degree Extension [10.1145/103418.103428]. The evaluation of a column polynomial on makes a code word of a Reed-Solomon code [doi:10.1137/0108018] of some rate ( with the ratio between the original trace domain and the augmented LDE domain). That is, . To prevent forging of proofs, it is important that the prover cannot change these values later on. Rather than sending all of these points (since we want succinctness and zero knowledge), the prover creates a commitment
to the values with a Merkle tree structure (the leaves being a grouping of evaluations of all the polynomials at a given LDE point). The commitment, along with the public inputs as part of the AIR program, is used to seed a public coin to allow drawing of random values to make the protocol non-interactive with the Fiat-Shamir heuristic.
If RAPs (see below) are being used, auxiliary trace segments can be computed after the previous trace portion has been committed to.
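The commitment and public-coin steps described above can be sketched as follows. This is a minimal illustration, assuming SHA-256 as the collision-resistant hash and a toy field of size 97; the helper names (`merkle_layers`, `open_leaf`, `verify_leaf`, `draw_coin`) are hypothetical and the leaf encoding is arbitrary.

```python
import hashlib


def h(*parts):
    return hashlib.sha256(b"".join(parts)).digest()


def merkle_layers(leaves):
    """All layers of a Merkle tree over a power-of-two number of leaves."""
    layers = [[h(leaf) for leaf in leaves]]
    while len(layers[-1]) > 1:
        prev = layers[-1]
        layers.append([h(prev[i], prev[i + 1]) for i in range(0, len(prev), 2)])
    return layers


def open_leaf(layers, index):
    """Authentication path (sibling hashes) for the leaf at `index`."""
    path = []
    for layer in layers[:-1]:
        path.append(layer[index ^ 1])
        index //= 2
    return path


def verify_leaf(root, leaf, index, path):
    node = h(leaf)
    for sibling in path:
        node = h(node, sibling) if index % 2 == 0 else h(sibling, node)
        index //= 2
    return node == root


def draw_coin(seed, counter, modulus):
    """Fiat-Shamir style coin: hash the commitment into a field element."""
    return int.from_bytes(h(seed, counter.to_bytes(4, "big")), "big") % modulus


# Each leaf groups the evaluations of all column polynomials at one LDE point.
rows = [(i, i * i % 97) for i in range(8)]
leaves = [repr(r).encode() for r in rows]
layers = merkle_layers(leaves)
root = layers[-1][0]
alpha = draw_coin(root, 0, 97)   # random value derived from the commitment
path = open_leaf(layers, 5)
print(verify_leaf(root, leaves[5], 5, path))  # True
```

Because the coin is derived from the root, the prover cannot choose the committed values after seeing the randomness.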
Constraint composition polynomial and consistency proof. Similarly, the constraint polynomials may be composed with the column polynomials ( for ) and evaluated at the points of the LDE. However, instead of creating separate evaluations for each constraint, random coins are drawn and used to create a random linear combination of the constraint polynomials. In this combination, the degrees of all constraint polynomials are augmented to all be (the next power of following the maximum degree). Due to its higher degree, more coefficients are needed to specify it. These are arranged in several columns (), which are committed to analogously to the trace polynomials. This commitment is once again used to seed the randomness, and from this a random out-of-domain point is sampled. The prover then provides the values necessary to evaluate the constraint composition polynomial in two ways: directly through the column values committed to, and indirectly by evaluating the constraint polynomials at the corresponding points of the trace polynomials and performing the same linear combination as before. These values are added to the proof and the randomization of the public coin.
Fast RS Interactive Oracle Proof of Proximity (FRI). The underlying idea of FRI [BenSasson2017FastRI] is to apply a degree reduction similar to the one performed during the Inverse Fast Fourier Transform (splitting a polynomial into two instances over even and odd powers of a variable), and to bind the prover’s responses in these reductions by evaluations of the function $f$ over points of the subset $L_0$. More formally, represent $L_0$ as $\langle \omega \rangle$ (s.t. $\omega$ generates a multiplicative group of order $2^k$) and let $f_0$ be the function known by the prover, of degree less than $d$. The verifier will sample a random $\alpha$, and ask the prover to compute $f_1 = g + \alpha h$, where $f_1$ will have degree less than $d/2$ for any $\alpha$ chosen by the verifier. Here, $g, h$ are two polynomial functions such that $f_0(x) = g(x^2) + x\,h(x^2)$. That is, they are functions with interpolants which are used to compute the original interpolant of $f_0$. If $f_0$ is $\delta$-far from the Reed-Solomon code $RS_0$ of low-degree polynomials over $L_0$, then the resulting $f_1$ will be $\delta'$-far from the corresponding code over the squared domain for some $\delta'$. This process is repeated for a number of layers until the polynomial $f_r$ is reached which should be constant (or of low enough degree that it can be checked directly in constant time).
If the original $f_0$ was far from any polynomial in $RS_0$, then (with high probability) $f_r$ is not constant. This property of FRI is used to succinctly prove that a certain polynomial is of low degree. That polynomial is known as the DEEP composition polynomial (from Domain Extension for Eliminating Pretenders). It is a polynomial constructed to be of low degree only if the values previously supplied by the prover (for evaluation of the constraint composition polynomial) are consistent with the polynomials previously committed to. Due to its structure, low-degreeness of the DEEP composition polynomial also implies that the trace polynomials and column polynomials for the constraint composition are of suitably low degree.
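The folding step at the heart of FRI can be sketched in a few lines. The example below is a toy, assuming the field of integers modulo 97, a domain generated by an element of order 8, and fixed challenges in place of verifier randomness; the function names are hypothetical. It splits a polynomial into even and odd parts using only its evaluations and checks that the last layer is constant.

```python
P = 97       # toy prime field (assumption)
OMEGA = 64   # element of multiplicative order 8 modulo 97
assert pow(OMEGA, 8, P) == 1 and pow(OMEGA, 4, P) == P - 1


def evaluate(coeffs, x):
    """Horner evaluation of a polynomial given by low-to-high coefficients."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc


def fold_evals(evals, domain, alpha):
    """From f on the domain, compute g + alpha*h on the squared domain,
    where f(x) = g(x^2) + x*h(x^2)."""
    half = len(domain) // 2
    inv2 = pow(2, -1, P)
    out = []
    for i in range(half):
        x, fx, fmx = domain[i], evals[i], evals[i + half]  # domain[i + half] == -x
        g = (fx + fmx) * inv2 % P
        h = (fx - fmx) * inv2 * pow(x, -1, P) % P
        out.append((g + alpha * h) % P)
    return out


def fri_last_layer(evals, domain, alphas):
    for alpha in alphas:
        evals = fold_evals(evals, domain, alpha)
        domain = [x * x % P for x in domain[: len(domain) // 2]]
    return evals


f = [3, 1, 4, 1]                                   # degree < 4
domain = [pow(OMEGA, i, P) for i in range(8)]      # evaluation domain of size 8
evals = [evaluate(f, x) for x in domain]
last = fri_last_layer(evals, domain, alphas=(5, 7))
print(last)  # [71, 71]
```

With two folds, a polynomial of degree less than 4 collapses to a constant. If one evaluation is altered so that the values no longer come from a polynomial of degree less than 4, the last layer stops being constant for these challenges.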
To verify a proof given by a prover, the verifier must perform the following steps.
Read the commitment to the execution trace over the LDE domain, updating the public coin and drawing from it random coefficients used by the prover to compute the composition polynomial.
If RAPs (see below) are being used, intermediate random coins (for use in permutation arguments) are drawn after the previous columns have been committed to.
Read the commitment to the constraint composition polynomial evaluations (over the LDE domain), use that to update the public coin, and sample the out-of-domain point .
Evaluate the constraints at the provided out-of-domain point based on prover-supplied trace values. Compute the evaluations of the constraint composition polynomial at the same point from the column values. Check for consistency between values. Reseeding is done after each read.
Perform the FRI protocol: Draw coefficients for computing the DEEP composition polynomial and instantiate a FRI verifier for the layer commitments provided in the channel. Draw query positions for the LDE domain, read the evaluations of the trace and constraint polynomials at those positions. Use those to compute evaluations of the DEEP composition polynomial and verify that these are from a low-degree polynomial.
A.4 Randomized AIR with Preprocessing (RAPs)
An additional feature desired for efficiency in STARKs is known as Randomized AIR with Preprocessing (RAP). With RAPs, additional columns of the trace are committed to with access to random coins based on the original columns. This allows use of the Schwartz-Zippel lemma to show that, for instance, two columns are permutations of each other. This can be done by checking
$$\prod_{i} (a_i + \gamma) = \prod_{i} (b_i + \gamma),$$
where $a_i$ and $b_i$ are the $i$-th entry in each column, and $\gamma$ is randomly chosen after those values have been committed to. With high probability, this only holds if the two sets $\{a_i\}$ and $\{b_i\}$ are permutations of each other [Plonk]. By supplying a known permutation $\sigma$ to the verifier, they can run a check that
$$\prod_{i} (a_i + \beta\, i + \gamma) = \prod_{i} (b_i + \beta\, \sigma(i) + \gamma)$$
(where $\beta$ is randomly chosen along with $\gamma$), which indicates that $b_i = a_{\sigma(i)}$ for all $i$. This is useful for enforcing equality over great distances in the execution trace, and is referred to as copy constraints with RAPs.
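The grand-product checks used for permutation and copy constraints can be tried out numerically. The sketch below follows the product form popularized by [Plonk]; the field modulus and the fixed challenge values are assumptions made so that the example is reproducible, whereas in a real protocol the challenges are drawn from the public coin after the columns are committed to.

```python
P = 2**61 - 1  # toy prime field modulus (assumption)


def grand_product(values, gamma):
    acc = 1
    for v in values:
        acc = acc * (v + gamma) % P
    return acc


def is_permutation(a, b, gamma):
    """Checks prod(a_i + gamma) == prod(b_i + gamma)."""
    return grand_product(a, gamma) == grand_product(b, gamma)


def copy_check(a, b, sigma, beta, gamma):
    """Checks prod(a_i + beta*i + gamma) == prod(b_i + beta*sigma(i) + gamma)."""
    lhs = rhs = 1
    for i in range(len(a)):
        lhs = lhs * (a[i] + beta * i + gamma) % P
        rhs = rhs * (b[i] + beta * sigma[i] + gamma) % P
    return lhs == rhs


a = [10, 20, 30, 40]
sigma = [2, 0, 3, 1]
b = [a[s] for s in sigma]          # b_i = a_{sigma(i)}
beta, gamma = 3, 7                 # fixed challenges for reproducibility

print(is_permutation(a, b, gamma))                     # True
print(is_permutation(a, [30, 10, 40, 21], gamma))      # False
print(copy_check(a, b, sigma, beta, gamma))            # True
print(copy_check(a, b, [0, 1, 2, 3], beta, gamma))     # False
```

The first check only establishes that the two columns hold the same multiset of values; the second additionally ties each value to its position under the permutation.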
Appendix B ICE-FROST Signature
The ICE-FROST protocol [cryptoeprint:2021:1658] is our own adaptation of the FROST protocol [frost]. The goal is to allow a subnet to generate signatures with a t-out-of-n threshold in a decentralized environment without any single trusted or semi-trusted party, and in the potential presence of malicious actors. Compared to the original FROST, our construction makes the key generation robust: enough honest actors can agree on the group’s public key even in the presence of malicious parties and without any rerun. In addition, honest actors can reliably identify misbehaving participants and exclude them from the scheme. At any point in time, honest actors maintain the same list of honest participants and can ignore any message from other parties.
For completeness, below is the detailed summary of the protocol.
is a group of prime order in which the DDH problem is hard. is a generator of that group.
The threshold and the participants are chosen by the subnet.
Each participant has or receives a unique id.
Each participant has access to a function. Each message published using is automatically signed and available to everyone.
Each participant is given an index between 1 and . For simplicity we assume that the participants receive indices to but we only need them to be unique and non-zero.
Honest participants want to sign a message agreed upon externally to the scheme.
B.2 Key Generation phase
Let be a hash function whose output is in .
Let and be symmetric encryption and decryption functions.
Let be a key derivation function compatible with and .
Every participant samples random values , and uses these values as coefficients to define a degree polynomial .
Every computes a proof of knowledge to the corresponding secret by calculating , such that , , , , with being a context string to prevent replay attacks.
Every samples randomly and computes .
Every computes a proof of knowledge to the secret key by calculating , such that , , , , with being a context string to prevent replay attacks.
Every participant computes a public commitment , where , .
Every broadcasts , , , .
Upon receiving , , , from participant , , participant verifies by checking , where , and by checking , where . On failure, broadcasts and excludes from its list of participants.
If the number of remaining participants for is below a certain value decided by the subnet, the key generation is aborted. If not, the remaining participants advance to Round 2. For simplicity, we will still refer to the remaining participants as (), even though some may have been eliminated at the last step of Round 1.
Each does the following. For each , :
Compute a Diffie-Hellman key and a symmetric key .
Upon receiving from participant , , participant does the following:
Compute and .
Verify the share by checking . If the share is incorrect, initiate the procedure .
Participants resolve all complaints with the procedure . If the number of remaining participants is below a certain value decided by the subnet, the key generation is aborted. For simplicity, we will still refer to the remaining participant as ( in ), even though some may have been eliminated at the previous step.
Each calculates their long-lived private signing share by computing , stores securely, and deletes each .
Each calculates their public verification share , and the group’s public key . Any participant can compute the verification share of any other participant by calculating . Each then broadcast .
computes a proof that is well-formed, which is a proof of knowledge of such that is of the form , . To do so it proceeds as follow:
It computes , , where and .
It computes .
The proof is .
broadcasts the message
Verify the proof by checking , where and . If the proof is valid, go to step 2. Else, broadcast , exclude from the list of participants and terminate the procedure.
If there is an entry published by , go to step 3. Else, broadcast , exclude from the list of participants and terminate the procedure.
Compute . Verify the decrypted share by checking . If the share is correct, broadcast and exclude from the list of participants. Else, broadcast and exclude .
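The share verification used during key generation relies on a public commitment to the coefficients of each participant's secret polynomial. The sketch below shows this check in isolation for a single dealer. It is not ICE-FROST itself: the toy group (order-1019 subgroup of the integers modulo 2039), the function names, and the choice of a polynomial of degree t-1 (as in the original FROST) are assumptions, and the encryption of shares and the complaint procedure are omitted.

```python
import random

P, Q, G = 2039, 1019, 4   # toy group: G generates the order-Q subgroup of Z_P^*


def poly_eval(coeffs, x):
    """Evaluate the secret polynomial at x, modulo the group order."""
    return sum(c * pow(x, j, Q) for j, c in enumerate(coeffs)) % Q


def commit(coeffs):
    """Public commitment: one group element G^a_j per coefficient a_j."""
    return [pow(G, a, P) for a in coeffs]


def verify_share(share, index, commitment):
    """Checks G^share == prod_j commitment[j]^(index^j)."""
    rhs = 1
    for j, phi in enumerate(commitment):
        rhs = rhs * pow(phi, pow(index, j, Q), P) % P
    return pow(G, share, P) == rhs


rng = random.Random(1)
t, n = 3, 5
coeffs = [rng.randrange(Q) for _ in range(t)]       # polynomial of degree t - 1
commitment = commit(coeffs)
shares = {i: poly_eval(coeffs, i) for i in range(1, n + 1)}

print(all(verify_share(s, i, commitment) for i, s in shares.items()))  # True
print(verify_share((shares[2] + 1) % Q, 2, commitment))                # False
```

A recipient who obtains a share that fails this check has evidence of misbehaviour against the published commitment, which is what a complaint builds on.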
B.3 Signing phase
We assume that a key generation phase has been successfully completed. The remaining participants now each hold a secret share, and the group’s public key is . Let , be hash functions whose outputs are in .
The subnet selects randomly , , the index of signing participants. The signing participants are , .
Each , , samples single-use nonces .
Each broadcasts where and .
Each constructs , computes the binding values , , then derives the group commitment and the challenge .
Each computes their response using their long-lived secret share by computing using to determine the Lagrange coefficient .
Each deletes from their local storage, and then broadcasts .
Each does the following:
Upon receiving from participant , , , verify the validity of the response by checking . On failure, broadcast , exclude from the list of participants and go to step 5.
If all responses are correct, compute the group’s response .
Broadcast the signature along with and terminate the procedure.
If no signature has been generated and some participants have been excluded, go back to round 1 step 2 with the same minus the excluded participants. If the resulting set has less than members, abort the signature generation.
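The algebra behind the signing rounds can be checked with a small FROST-style sketch. It makes several simplifying assumptions that are not part of ICE-FROST: shares come from a single trusted dealer instead of the distributed key generation, all signers run in one process, the group is a toy one, hashing uses SHA-256 reduced modulo the group order, and individual response verification and participant exclusion are left out. All names are hypothetical.

```python
import hashlib
import random

P, Q, G = 2039, 1019, 4   # toy group: G generates the order-Q subgroup of Z_P^*


def H(*parts):
    data = "|".join(str(p) for p in parts).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q


def lagrange_at_zero(i, signers):
    """Lagrange coefficient of signer i for interpolation at 0."""
    num = den = 1
    for j in signers:
        if j != i:
            num = num * j % Q
            den = den * (j - i) % Q
    return num * pow(den, -1, Q) % Q


def sign(message, shares, signers, group_key, rng):
    # Each signer samples two single-use nonces and publishes commitments.
    nonces = {i: (rng.randrange(1, Q), rng.randrange(1, Q)) for i in signers}
    B = [(i, pow(G, d, P), pow(G, e, P)) for i, (d, e) in sorted(nonces.items())]
    rho = {i: H(i, message, B) for i in signers}        # binding values
    R = 1
    for i, D, E in B:
        R = R * D * pow(E, rho[i], P) % P               # group commitment
    c = H(R, group_key, message)                        # challenge
    z = 0
    for i in signers:
        d, e = nonces[i]
        lam = lagrange_at_zero(i, signers)
        z = (z + d + e * rho[i] + lam * shares[i] * c) % Q   # sum of responses
    return R, z


def verify(message, signature, group_key):
    R, z = signature
    c = H(R, group_key, message)
    return pow(G, z, P) == R * pow(group_key, c, P) % P


rng = random.Random(7)
t, n = 3, 5
coeffs = [rng.randrange(1, Q) for _ in range(t)]
secret = coeffs[0]
shares = {i: sum(c * pow(i, j, Q) for j, c in enumerate(coeffs)) % Q
          for i in range(1, n + 1)}
group_key = pow(G, secret, P)
signers = [1, 3, 5]
signature = sign("certificate", shares, signers, group_key, rng)
print(verify("certificate", signature, group_key))  # True
```

The resulting pair verifies as an ordinary Schnorr signature under the group's public key, so a verifier does not need to know which participants signed.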
Appendix C WCPRB Proof of Correctness
Algorithm 4 solves Weak Causal Probabilistic Reliable Broadcast.
To prove that Algorithm 4 solves Weak Causal Probabilistic Reliable Broadcast, we show that it satisfies each property of the abstraction. For readability, we recall the definition of each property before its proof.
No duplication: No correct process delivers more than one message. The proof follows from the No duplication property of the PRB and from the fact that when a message is .Delivered, it is removed from the set.
Integrity: If a correct process delivers a message , and the sender is correct, then was previously broadcast by . The proof follows from the Integrity property of the PRB. Indeed, in Algorithm 4, a message must first have been .Delivered by PRB before it can be delivered.
-Validity: If the sender is correct, and broadcasts a message , then eventually delivers with probability at least (1 - ). The proof follows from the -Validity property of the PRB and considering the following:
If broadcasts then at time and .Broadcast. Since , then by the -Validity of the PRB, .Delivers message , therefore, is placed in and being valid it will be removed from and .Delivered by .
-Consistency: Every correct process that delivers a message delivers the same message with probability at least . The proof follows from the -Consistency property of the PRB.
-Totality: If a correct process delivers a message, then every correct process eventually delivers a message with probability at least .
If delivers then .Delivers and . If .Delivers then by the -Totality of the PRB, every other correct process .Delivers and places it in with probability . We now have to prove that the message in will eventually be delivered, i.e., at . If at at that time, it means that has not yet delivered the messages delivered by with which . Thanks to the -Totality of PRB, will eventually .Deliver the same set of messages that .Delivered, at that time, , and then will .Deliver .
Weak causal order: If a correct process .Delivers a message then weakly causally precedes all the previously .Delivered messages. The proof simply follows from the definition of the Valid predicate and the check on before triggering .Deliver().
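The delivery pattern that these proofs rely on (a message .Delivered by PRB is buffered, then removed and delivered once it is valid) can be sketched as below. Algorithm 4 and the Valid predicate are not reproduced in this appendix, so the sketch makes a loud assumption: each message carries a hypothetical set `deps` of message identifiers, and a message counts as valid once all of them have been delivered. The class and method names are illustrative only.

```python
class PendingDelivery:
    """Buffers PRB-delivered messages until a (hypothetical) validity check passes."""

    def __init__(self, deliver):
        self.pending = {}        # msg_id -> frozenset of dependency ids
        self.delivered = set()   # ids already delivered by this layer
        self.deliver = deliver   # upper-layer delivery callback

    def on_prb_deliver(self, msg_id, deps):
        # No duplication: ignore messages already buffered or delivered.
        if msg_id in self.delivered or msg_id in self.pending:
            return
        self.pending[msg_id] = frozenset(deps)
        self._flush()

    def _valid(self, msg_id):
        # Assumed stand-in for the Valid predicate: all dependencies delivered.
        return self.pending[msg_id] <= self.delivered

    def _flush(self):
        # Keep delivering until no buffered message is valid.
        progress = True
        while progress:
            progress = False
            for msg_id in sorted(self.pending):
                if self._valid(msg_id):
                    del self.pending[msg_id]
                    self.delivered.add(msg_id)
                    self.deliver(msg_id)
                    progress = True
                    break

out = []
layer = PendingDelivery(out.append)
layer.on_prb_deliver("b", {"a"})   # buffered: "a" has not been delivered yet
layer.on_prb_deliver("a", set())   # delivers "a", which then unblocks "b"
layer.on_prb_deliver("a", set())   # duplicate, ignored
```

The sketch mirrors the Totality argument: a buffered message stays pending only while some dependency is missing, and it is delivered as soon as the underlying broadcast delivers that dependency.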