Thinkey: A Scalable Blockchain Architecture

04/09/2019 ∙ by Shan Chen, et al. ∙ Tencent QQ

This paper presents Thinkey, an efficient, secure, infinitely scalable and decentralized blockchain architecture. It ensures system correctness and liveness by a multi-layer structure. In particular, the system is based on a double-chain architecture and uses a multi-layer consensus protocol to guarantee consistency. Thinkey also uses a novel account model which is based on Actor Model to support the complex logic in the multi-chain structure. Experiment results show that the proposed Thinkey architecture can achieve higher throughput as the number of nodes increases.


1 Introduction

While the centralized Internet has promoted the rapid development of cyber systems and their communications, it also introduces many problems, such as disputed data ownership, information opaqueness, monopoly, and lack of trust. Today, people increasingly expect trustworthy public computing infrastructures, in which an entity's behavior is always carried out in the expected way. Such trustworthy infrastructures usually comprise trustworthy hardware, systems, and software.

In such a context, a single entity can no longer meet people's need for trust. People have started to create trustworthy distributed computing platforms that involve participation from individual computing nodes (Bitcoin [19]) but are not controlled by any committee or individual. These computing nodes should be in the hands of diverse parties or individuals, running completely open-source programs with specifications open to everyone.

Blockchain is a trusted distributed computing platform that fulfills the above requirements. Recently, studies on blockchain-related technologies have been gaining popularity and attracting attention from researchers in different fields, such as cryptography, distributed systems, networking, economics, and social science. The field originated with the work of Nakamoto [19], who proposed Bitcoin, the first blockchain protocol, and has since expanded into many research directions, including consensus, privacy-preserving computation, mechanism design, networking, and data storage. It has also found numerous applications in areas such as financial systems, supply chains, and healthcare.

As a distributed platform composed of thousands of individual computing nodes without any central controller, blockchain systems face many challenges. One of the most critical is achieving permissionless consensus among the distributed computing nodes in an efficient and scalable way. This is particularly true as the system continuously grows with an increasing number of clients and amount of data, which requires (1) high efficiency in handling multiple concurrent transactions in parallel, and (2) scalability, i.e., maintaining high efficiency as the system scales up. However, the performance of current blockchain platforms is still far from satisfactory. For example, the average transaction rate of Bitcoin has been limited to below 10 transactions per second (Tx/s). More importantly, when the processing limit is reached, blockchain platforms may have to continue by discarding some of their clients' transactions, which undermines availability and significantly increases usage costs.

To improve system efficiency and scalability in support of large-scale transactions (including smart contract transactions), the sharding approach has been proposed as a feasible solution: it divides transactions into multiple packages that can be handled in parallel by different committees of participating nodes [4]. The idea of network and transaction sharding was first proposed in Elastico [17] and developed further in Zilliqa [24], OmniLedger [15], Chainspace [1], and RapidChain [27]. However, current sharding solutions are still at an early stage and cannot yet handle the following two main challenges well: (1) how to split the state into multiple shards and support cross-shard transactions efficiently; and (2) how to select nodes so as to ensure the trustworthiness of each committee.

1.1 Overview of Thinkey

In order to solve the challenges discussed above, we have proposed and implemented Thinkey, a novel blockchain infrastructure that is fully decentralized, trusted, and scalable. In particular, our structure is based on a double-chain architecture consisting of a root chain and transaction chains. The workloads are partitioned and handled by parallel transaction chains, while the root chain acts as the leader and coordinator of the entire system. On each transaction chain, transactions are packaged into blocks and a Byzantine consensus algorithm is used to reach agreement. The structure has great flexibility and scalability and can be dynamically adjusted, so that no single chain becomes a performance bottleneck of the entire system. In addition, as the number of chains increases, the throughput of the entire system increases linearly without generating too many redundant messages.

In addition, to efficiently handle cross-chain transactions, we propose a message-driven protocol based on the Actor Model [10]. In particular, as the state of the entire network is partitioned by chains, and each chain is responsible for its own piece, a transaction involving multiple chains appears with increasing probability as the number of chains grows, so efficient handling of cross-chain transactions is essential. In this work, we apply the Actor Model to our system, which allows complex logic to be implemented in an asynchronous and lock-free manner, keeping chains concurrent and fully utilized. In this model, each account represents an actor. We decouple a transaction that involves a set of accounts into multiple steps expressed as messages. Each message is sent to a unique recipient and is received and executed by the corresponding chain. Each chain is incentivized to handle messages, which eventually ensures atomicity.

Our Contribution. Particularly, this paper mainly makes the following contributions.

  1. A four-layer blockchain framework has been proposed and implemented. In particular, the proposed layered architecture enables the abstraction and separation of diverse system functionalities and allows each layer to focus on its own concerns. This design strengthens the blockchain system's compatibility and extensibility, providing a new approach to blockchain framework development.

  2. A practical double-layer chain architecture that partitions transactions and states into parallel chains is proposed. The structure has great flexibility and scalability and can be dynamically adjusted, so that no single chain becomes a performance bottleneck of the entire system.

  3. A new model of accounts in the multi-chain structure is proposed to process cross-chain transactions in an efficient and scalable way.

1.2 Related Work

Blockchain technology has been successfully proven by Bitcoin [19] and Ethereum [26] through several years of operation, and shows great potential in various applications. However, performance and scalability issues remain the main obstacles hindering the rapid development of blockchain systems. Researchers from different domains are working to improve blockchain performance and scalability. For example, Bitcoin-NG [6] scales Bitcoin without changing the consensus protocol: it uses the same proof-of-work to elect a leader, who then commits multiple microblocks. ByzCoin [14] extends Bitcoin-NG to achieve high transaction throughput through a variant of PBFT [3]. Thunderella [21] combines a fast optimistic path with a slower fallback chain to confirm transactions quickly in optimistic conditions. By replacing proof-of-work with proof-of-stake, the committee can be chosen based on stake [7, 5]. Conflux [16] uses a DAG structure to generate blocks concurrently. However, these algorithms still require all nodes to validate and process all transactions.

Elastico [17] was the first secure sharding protocol for permissionless blockchains, where nodes are assigned to committees through proof-of-work and each committee is responsible for managing a subset of transactions (a shard) consistently through PBFT [3]. A final committee collates the sets of transactions received from the committees into a final block and broadcasts it. To achieve acceptable performance, its committees must be small, resulting in a high probability of committee failure. Zilliqa [24] improves the election and consensus protocols for better security. OmniLedger [15] introduces a cross-shard commit protocol that atomically handles transactions affecting multiple shards. Chainspace [1] removes client involvement from the cross-shard protocol and adds support for smart contracts. A recent study, RapidChain [27], further optimizes these sharding protocols and is resilient to Byzantine faults from up to a 1/3 fraction of its participants.

2 Blockchain: A New Generation of Computing

Blockchain is a trusted distributed computing platform that can act as a shared computational resource. Clients can view this platform as a new generation of computing facility to which they can send requests (transactions). When the platform receives a request, it first checks the legality of the request, runs it on some nodes, and returns the results to the client. Through this process, data and messages are transmitted on the platform to manage computing nodes and resources in a distributed way. In particular, a valid blockchain design should have the following properties. (a) Safety: the result is correct. (b) Liveness: each valid request is processed within some fixed (small) time duration. Here we assume that the platform has a unified trusted interface that allows clients to send requests and receive results. In addition, a consensus protocol is required to ensure that the contents running on different computing nodes are identical.¹

¹ We call the platform a trusted computing platform if it satisfies safety and liveness, even if it is centralized. Currently, however, no single entity can satisfy these two properties, so we need a distributed system; the consensus protocol ensures that the distributed system satisfies them.

A close model is the classical state machine replication (SMR) [22, 3]. However, unlike the SMR model, which operates under the permissioned setting [20], the blockchain platform allows any node to join without permission from an authority, which is called permissionless. In the permissionless context, the set of nodes is untrustworthy, which introduces a challenging issue: an attacker can trivially mount a "Sybil attack" by simply spawning lots of computing nodes, easily ensuring that it controls a majority of the nodes in the consensus. To prevent Sybil attacks, a common method is to use proof-of-work (PoW) [19] or proof-of-stake (PoS) [13].

2.1 System Model

In this section, we discuss the blockchain model proposed in this work in detail. Formally, we specify the proposed blockchain model by a 4-tuple $(N, R, P, F)$. Here, $N$ is the set of all nodes in the system, $R$ is the set of requests, $P$ is the consensus protocol, and $F$ is the set of all possible behavior errors (or failures) of the system. The model proceeds for infinitely many rounds. In each round $r$, a set of available nodes and a set of requests are given; we use $N_r$ and $R_r$ to denote the current sets of available nodes and requests, respectively. The goal is to generate a consistent log for each node in $N_r$ to process the set of requests $R_r$.

For each round $r$, we use $T_r$ to denote the duration of round $r$ and $t_x$ to denote the confirmation time of a request $x \in R_r$. The average confirmation time is then $\bar{t}_r = \frac{1}{|R_r|} \sum_{x \in R_r} t_x$. We define the throughput of round $r$ as the number of requests processed per second, which is equal to $|R_r| / T_r$. Furthermore, to implement the consensus protocol in the proposed blockchain infrastructure, we introduce the “log”, a sequence of timestamped requests arranged in a defined order (see Section 2.2 for more details). Any two given nodes run the same requests and reach the same state if and only if they have the same log.

2.2 System Design Goals

2.2.1 The Consensus

The consensus protocol $P$ is the core of the distributed computing platform. A consensus protocol is an algorithm for a set of interactive Turing machines (also called nodes) to interact with one another. Consider an interactive Turing machine $M$ and a set of requests $R$. As discussed above, a log for $M$ is a sequence of timestamped requests arranged in a defined order. Usually, these requests are arranged in a linear order; they can also be arranged in parallel, or even in the form of a directed acyclic graph (DAG). Since every directed acyclic graph has a topological ordering, we assume that the order of the log is linear. Formally, we have

$L = (x_1, t_1), (x_2, t_2), \ldots$

where $x_i$ denotes a request in $R$ and $t_i$ denotes the time of $x_i$. A timed execution of a machine $M$ is defined to be a sequence

$s_0, (x_1, t_1), s_1, (x_2, t_2), s_2, \ldots$

where $s_i$ represents a state of the machine and $s_0$ is the initial state. We require that the sequence of successive times in the execution be nondecreasing. In the consensus protocol $P$, each node maintains a local linear log that satisfies the following conditions.

  • Consistency: at any point in the execution, all honest nodes have consistent logs, either their logs are identical, or one node’s log is a prefix of the other’s.

  • Synchronism: whenever a node sees some request in its log, the same request will appear in every other node's log within some fixed (small) amount of time.

The goal of the system is to generate a consistent log for each node in $N_r$ to process the set of requests $R_r$.

2.2.2 Reliability and Security

Failures and attacks can occur in many systems, especially in distributed systems consisting of heterogeneous individual computing nodes. To analyze the reliability and security of a system, it is crucial to correctly model the failures and attacks faced by the system. We use $F$ to specify possible failures of nodes and the network, as well as the attack model.

The analysis of reliability is based on the assumption of the failure set $F$. Under a regular system setting, the design should ensure that the system achieves the desired performance, with system failure occurring only with negligible probability.

When malicious attacks exist, system security may be at risk. We design our system against typical attacks, including denial-of-service attacks, Sybil attacks, and Byzantine behavior of nodes.

2.2.3 The Scalability

Throughput is an important system performance metric, representing the system's ability to handle requests. However, this indicator alone is not adequate: when the system reaches its processing limit, it has to discard excess requests submitted by clients, making the system much less usable. Therefore, we need another performance metric to evaluate the system's limit, namely scalability.

The term scalability is often used in the blockchain literature to indicate that system throughput can increase monotonically as the number of computing nodes grows. This definition is not precise and makes it difficult to evaluate and compare different blockchain systems. In addition, scalability is often multifaceted; it can refer to (1) load scalability, the ability of a system to accommodate heavier or lighter loads; (2) functional scalability, the ability to enhance the system's capabilities by adding new functions; and (3) generation scalability, the ability to scale up by adopting new generations of components.

In this paper, we present a framework for analyzing the scalability of blockchain systems, somewhat similar to that for distributed systems [12]. Given a configuration $C$ of the system, we evaluate the performance and cost of the system by

$E(C) = \dfrac{\lambda(C) \cdot q(C)}{\mathrm{cost}(C)}$    (1)

where $\lambda(C)$ and $q(C)$ denote the throughput and the quality of service (QoS) of the system, respectively, and $\mathrm{cost}(C)$ denotes the overall system costs, including node costs, network bandwidth costs, and so on. For a blockchain, the QoS is mainly determined by the average confirmation time $\bar{t}$ and the target confirmation time $t^*$. To normalize the QoS to the interval $[0, 1]$, we set $q = \min\{1, t^*/\bar{t}\}$. Then we have $q = 1$ when $\bar{t} \le t^*$ and $q < 1$ when $\bar{t} > t^*$.

Given an initial configuration $C_1$, we can scale up the configuration of the system to $C_k$ by a scaling factor $k$. A policy $\pi$ specifies how to scale up the configuration. For example, when the initial configuration has $n$ nodes and the policy scales up the number of nodes by $k$, the number of nodes in configuration $C_k$ is $kn$. Then, we can quantitatively calculate the scalability as

$\Psi(k) = \dfrac{E(C_k)}{E(C_1)}$    (2)

If $\Psi(k)$ equals $1$ or monotonically increases as $k$ increases, we say the system has perfect scalability under the policy $\pi$.
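As an illustration of Equations (1) and (2), the following Go sketch evaluates the metric for a hypothetical initial configuration and a scaled-up counterpart under the reconstructed notation above; all numbers are illustrative.

```go
package main

import "fmt"

// qos returns the normalized quality of service q = min(1, tTarget/tAvg),
// which equals 1 when the average confirmation time meets the target.
func qos(tAvg, tTarget float64) float64 {
	if tAvg <= tTarget {
		return 1.0
	}
	return tTarget / tAvg
}

// efficiency implements Equation (1): E(C) = throughput * QoS / cost.
func efficiency(throughput, tAvg, tTarget, cost float64) float64 {
	return throughput * qos(tAvg, tTarget) / cost
}

func main() {
	// Initial configuration C1: 1000 Tx/s, 3 s confirmation (5 s target), cost 100.
	e1 := efficiency(1000, 3, 5, 100)
	// Scaled configuration Ck (k = 4): an ideal sharded system quadruples
	// throughput and cost while keeping the confirmation time stable.
	ek := efficiency(4000, 3, 5, 400)
	// Equation (2): scalability is the ratio of the two efficiencies.
	fmt.Printf("Psi(4) = %.2f\n", ek/e1) // 1.00 indicates perfect scalability
}
```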

Figure 1: Speedup under four parameter settings (a)-(d).

2.2.4 The Speedup

Recall from Section 2.2.3 that the scalability of a system is defined relative to a scale-up policy $\pi$. There are two ways to improve system performance: increasing the performance of each node, and increasing the number of nodes. Similar to Amdahl's law [2, 11], a formula for the theoretical speedup of parallel systems, we define a speedup for blockchain systems. Suppose that we use $k$ times the budget to build the platform, an $\alpha$ part to improve the performance of each node, and the remaining $1 - \alpha$ part to increase the number of nodes. Here we assume that all nodes are identical. Using $\alpha k$ times the budget to improve node performance increases it by a factor of $f(\alpha k)$, where $f$ can be an arbitrary function; because of hardware limitations, $f$ is usually sublinear (for example, $f(x) = \sqrt{x}$). The number of nodes can then be expanded by a factor of $(1 - \alpha) k$. For the request set, assume that a proportion $p$ can be processed in parallel, while the remaining part can only be serialized ($p = 0$ in Bitcoin). Then the speedup of the entire system can be expressed as:

$S(k) = \dfrac{1}{\dfrac{1-p}{f(\alpha k)} + \dfrac{p}{f(\alpha k)(1-\alpha)k}}$    (3)

and the scalability metric of Equation (2) can be approximated accordingly. As shown in Figure 1, a larger parallel proportion $p$ leads to a higher speedup, which agrees with our intuition. The results inspire us to consider the performance of the entire system rather than focusing on node efficiency alone. In particular, the design of the system should support more parallelism through architectural and protocol design.
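The following Go sketch evaluates Equation (3) for several values of the parallel proportion $p$ under a sublinear $f$; it reproduces the qualitative behavior shown in Figure 1 (larger $p$ yields higher speedup). The parameter choices are illustrative.

```go
package main

import (
	"fmt"
	"math"
)

// speedup implements Equation (3) under the reconstruction above: a k-times
// budget is split into a share alpha for node performance (improved by a
// factor f(alpha*k)) and a share 1-alpha for node count (expanded by
// (1-alpha)*k); a fraction p of the requests is parallelizable.
func speedup(p, alpha, k float64, f func(float64) float64) float64 {
	perf := f(alpha * k)
	nodes := (1 - alpha) * k
	return 1 / ((1-p)/perf + p/(perf*nodes))
}

func main() {
	sqrt := math.Sqrt // a sublinear f, reflecting hardware limitations
	for _, p := range []float64{0, 0.5, 0.9, 0.99} {
		fmt.Printf("p=%.2f  S(100)=%.1f\n", p, speedup(p, 0.5, 100, sqrt))
	}
}
```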

Besides, some further optimizations can improve the performance and scalability of a blockchain system, for example, using a DAG instead of a linear log, or a PoS protocol instead of a PoW protocol.

3 The Architecture of Thinkey

In this section, we focus on the design of a blockchain infrastructure with high scalability without weakening reliability and security.

3.1 A Four-layer Encapsulation System Structure

Based on the above requirements, we introduce the high-level structure of Thinkey, a four-layer design that supports more parallel execution. As shown in Figure 2, the structure has four layers. The first is the task layer, which is mainly responsible for partitioning requests and nodes and assigning different requests to specific committees for processing. The second is the process layer, which processes the assigned requests and produces logs. The third is the data layer, where the log and request data generated by each committee are aggregated, according to a specific encoding method, into a unified log. The fourth is the network layer, the basic layer that establishes connections between nodes and provides communication. Next, we discuss each layer in detail.

Figure 2: Overview of the high-level structure of Thinkey. 1. Task layer: assigns requests to specific processing committees. 2. Process layer: processes requests and generates logs. 3. Data layer: synchronizes the logs of committees to achieve consistency. 4. Network layer: provides communication between nodes.

Task layer. All requests are first sent to the task layer, where they are divided and assigned to different committees for parallel processing. As not all requests can be processed in parallel, they need to be divided according to their types. In addition, all active nodes are registered at the task layer. These nodes are divided into committees and randomly allocated to process different requests. We need to ensure that each committee is trustworthy: the proportion of malicious nodes inside each committee must not exceed a certain threshold set by the system.

Process layer. The process layer consists of a set of committees, each containing a set of nodes. When a committee receives a request, it processes the request, reaches consensus, and generates a log. As each committee is guaranteed to be trustworthy by the task layer, this layer only needs to consider how to reach consensus within a committee as quickly as possible.

Data layer. The goal of the system is for each node to produce a consistent log. Therefore, an aggregation algorithm is required to integrate all the logs generated by the nodes of a committee into a unified log. An encoding method is also required to reduce the storage burden of each node. In addition, as nodes join and leave different committees from time to time, the corresponding data at the data layer must be synchronized.

Network layer. This layer provides the foundation of the entire system by enabling communication among computing nodes. Within the network layer, we can build a multi-layer network and establish a dedicated consensus network for each committee.

In summary, in our architecture, a request generated by a client first passes through the task layer. After classification at the task layer, the request is sent to a committee at the process layer. The committee processes the request and sends the results and logs to the data layer. The data layer aggregates all the logs and returns the results to the client.

3.2 A Double-layer Chain Architecture

From an implementation perspective, our chain structure is based on a double-layer chain architecture consisting of a root chain and transaction chains. Each (root or transaction) chain is an individual blockchain with its own state. As illustrated in Figure 3, transaction chains (the horizontal chains) are anchored to the root chain (the vertical chain).

The root chain acts as the leader and coordinator of the entire system. In particular, it serves as an entry point and source of trust to the transaction chains; records metadata and digests of confirmed blocks of each transaction chain; produces random seeds used in committee elections (of all chains); and records the election results.

The workload of the system is shared by the transaction chains. The cross-chain operations are handled using a message-driven protocol based on the actor model (see Section 5).

Figure 3: The basic double-layer architecture of Thinkey. The root chain records metadata and digests of confirmed blocks of each transaction chain, produces random seeds, and records the election results.

All nodes in the system maintain the state of the root chain. By updating and verifying blocks of the root chain, a node can verify (any part of) a block of a transaction chain that has been included in the root chain. This structure is advantageous in the following aspects.

  • To join the system, a node only needs to obtain the current state of the root chain, either from a trusted source or by reconstructing it from the genesis block, which is a small workload compared to synchronizing all the data of the entire system.

  • The consensus of each (root or transaction) chain is carried out independently and in parallel, dramatically reducing the usage of network and processing power.

  • The root chain can serve as a coordinator of the system, which provides synchrony across transaction chains and allows system topology to be dynamic.

  • A node can verify a transaction initiated from another transaction chain, using the digest in the root chain and a Merkle proof. Therefore, a block producer of a transaction chain does not need any information from other transaction chains to process inter-chain transactions.

Furthermore, the root chain and transaction chains can be sharded further (see Figure 4). Here, the root chain and transaction chains define rules and distribute transactions evenly across their sub-chains. Each parent chain then only needs to process a small number of transactions and record the digest of each sub-chain.

Figure 4: Sharding for root chain and transaction chain in the double-chain architecture of Thinkey

To sum up, such a layered sharding structure has great flexibility and scalability and can be dynamically adjusted, so that no single chain becomes a performance bottleneck of the entire network. In addition, as the number of chains increases, the throughput of the entire system increases linearly without generating too many redundant messages.

4 Our Protocol

In this section, we present our protocol in more detail. In the basic double-layer architecture (see Figure 3), let $C_0$ denote the root chain and let $C_1, \ldots, C_m$ denote the transaction chains.

4.1 Committee Selection

To resist the Sybil attack against the permissionless system, we use a proof-of-stake based election algorithm. The following security properties are desirable for the selection algorithm.

  • Assuming an honest majority (of stake) among all participants, at least a $2/3$ proportion of the committee members elected in each election are honest, except with negligible probability. Furthermore, the algorithm should be fair in the sense that the probability of each participant being chosen is (roughly) proportional to the amount of stake committed by the participant.

  • The committee members should be fluid and unpredictable so that the adversary cannot attack the system by corrupting committee members (assuming corruption takes longer than the life span of a committee).

In Thinkey, we achieve the above properties through the following processes. First, since only the root chain is listened to by all nodes and the chains are not synchronized with one another, a transaction chain puts a signal on the root chain when it needs to elect its next committee. The election uses the first seed generated after the signal appears on the root chain.

Next, the election process is managed by the root chain. In particular, the randomness used in the election is provided by the random seed generated on the root chain $C_0$. The committee of the root chain periodically generates a random seed, which is included in a block of $C_0$. Our current implementation generates random seeds using DFINITY's random beacon protocol [8], based on BLS threshold signatures.

In addition, nodes willing to participate in the consensus need to register on the root chain by sending a special type of transaction. The transaction also specifies the amount of the stake, which will be transferred to a specific stake account and be frozen until the node quits and withdraws the stake. These nodes then monitor the election signals in the root chain. When a node sees an election signal and the next seed, with its secret key, it can compute whether it is elected to the next committee of the chain that sent the signal.
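To make the election check concrete, the following Go sketch shows the computation a registered node might run when it sees an election signal and the next seed. The paper does not specify the exact sortition function, so this sketch is an illustrative assumption: HMAC-SHA256 stands in for the verifiable random function a real deployment would need so that others can check the proof of election.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// elected sketches a stake-weighted election check. A node evaluates a keyed
// pseudorandom function on the seed (HMAC-SHA256 here, standing in for a
// VRF) and compares the result against a threshold proportional to its
// stake, so the election probability is roughly committeeSize*stake/totalStake.
func elected(secretKey, seed []byte, stake, totalStake, committeeSize uint64) bool {
	mac := hmac.New(sha256.New, secretKey)
	mac.Write(seed)
	h := binary.BigEndian.Uint64(mac.Sum(nil)[:8])
	r := float64(h) / float64(^uint64(0)) // uniform in [0, 1)
	threshold := float64(committeeSize) * float64(stake) / float64(totalStake)
	return r < threshold
}

func main() {
	seed := []byte("seed-from-root-chain-block")
	fmt.Println(elected([]byte("node-secret-key"), seed, 100, 10_000, 50))
}
```

The stake-proportional threshold matches the fairness property above: doubling a node's committed stake roughly doubles its chance of being elected.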

Once the committee selection is completed, nodes elected to a transaction chain will join the network of the chain and start receiving the blocks of the chain. They also start to synchronize the state of the transaction chain. The received blocks and the state can all be verified using the digests on the root chain. Each node sends an “elected” transaction containing its public key and a proof that it has been elected, so that the current committee and the other elected members are notified.

The elected nodes will set up a small consensus network that will be used for the communication within the committee. Having a dedicated network reduces the delay and bandwidth consumption of both point-to-point communications and broadcasts among committee members. On the other hand, if not set up properly, the network may be less stable and more vulnerable to attacks. It needs to be ensured that the network topology is robust and node information is exchanged securely using encryption.

By the time the next epoch begins, the new committee should have generated the keys for threshold signature (the public key should have been written to a block); have synchronized and updated the current state of the chain; and have established connections in the new consensus network.

4.2 Consensus in Committees

We assume a partially synchronous communication model within the committee, for which there exist efficient Byzantine-fault-tolerant algorithms. Here we present a variant of PBFT tailored for this setting.

Figure 5: Overview of committee consensus.

The execution of a node can be divided into rounds. Each round consists of three stages: proposal, preparation, and confirmation (see Figure 5). The state transitions are event-driven. In order to preserve system liveness in the presence of network failures or malicious attacks, a timeout may be triggered by the local clock.

  1. Proposal stage: The leader of the committee broadcasts the proposed block to other committee members.

  2. Preparation stage: Upon receiving the proposed block, each committee member broadcasts a message containing its signature of the block. If a timeout is triggered before the proposed block is received, the committee member signs and broadcasts a special message (which indicates the leader is faulty) to other committee members.

  3. Confirmation stage: At the end of the preparation stage, each committee member signs and broadcasts a package of the signatures received in the preparation stage. Signature aggregation can be used to significantly reduce the size of messages in the confirmation stage.

Based on the messages received in the confirmation stage, each committee member can decide whether agreement on the block has been reached, and broadcasts the agreed block or an empty block together with a proof of its decision.
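As a sketch of how these transitions fit together, the following Go snippet models the three stages as an event-driven state machine for a single committee member. Networking, signature verification, aggregation, and timeout handling are all elided, and the quorum value is an illustrative placeholder (e.g., $2f+1$ for a committee of $3f+1$ members).

```go
package main

import "fmt"

// Stage enumerates the stages of a consensus round (Figure 5).
type Stage int

const (
	Proposal Stage = iota
	Preparation
	Confirmation
	Decided
)

// round is a simplified, single-member view of the event-driven transitions.
type round struct {
	stage    Stage
	prepares int // preparation-stage signatures collected so far
	quorum   int // e.g. 2f+1 for a committee of 3f+1 members
}

func (r *round) onProposal() { r.stage = Preparation }

func (r *round) onPrepare() {
	r.prepares++
	if r.prepares >= r.quorum {
		r.stage = Confirmation // next: broadcast the signature package
	}
}

func (r *round) onConfirmPackages() { r.stage = Decided }

func main() {
	r := &round{stage: Proposal, quorum: 3}
	r.onProposal() // leader's proposed block received
	for i := 0; i < 3; i++ {
		r.onPrepare() // committee members' signatures received
	}
	r.onConfirmPackages() // enough signature packages: agreement reached
	fmt.Println("decided:", r.stage == Decided)
}
```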

Punishment of malicious nodes. When an explicitly misbehaving node is detected (a node that sends different messages to different nodes at the same stage), the round aborts by outputting an empty block. However, the misbehaving node is punished economically by a large amount, making such attacks unsustainable.

Optimization. In the case when the number of signatures received at the preparation stage implies that most honest committee members have received the same proposed block, a committee member may reach an “early consensus”: the member can output the block before the confirmation stage with the signatures being the proof of agreement (of a different form compared to a regular agreement). Note that the node still needs to participate in the confirmation stage.

4.3 Cross-Chain Message and Verification

In a multi-chain system, cross-chain operations are unavoidable: each chain needs to process messages generated by other chains. There are two kinds of cross-chain messages in our system. The first is a message $m_1$ from a transaction chain $C_i$ to the root chain $C_0$ that contains the digest of a block of $C_i$; this message provides the final confirmation of the block, and there is only one such message for each block. The second is an outer relay message $m_2$ from a chain $C_i$ to another chain $C_j$ (see Section 5 for more details). Before $m_2$ is sent to $C_j$, it is recorded in a block of $C_i$.

Before processing these messages, we must verify them. There are two approaches to verifying a message generated by a chain: (1) verifying a signature; (2) verifying the message hash. Both approaches are useful, depending on the type of message. The message $m_1$ from $C_i$ to $C_0$ carries a signature of the committee members of $C_i$ that can be used to verify its authenticity. Since the committee membership of $C_i$ is recorded on $C_0$, each node of $C_0$ has the public keys of the current committee members of $C_i$ and can verify the signature of $m_1$.

The message $m_2$ from $C_i$ to $C_j$ comes with a proof $\pi$ by which it can be verified. Before $m_2$ is sent to $C_j$, it is recorded in a block $B$ of $C_i$, which contains a Merkle tree over all outer relay messages. The proof $\pi$ consists of the hash values of the sibling nodes on the path from the entry of $m_2$ to the Merkle tree root, and the root hash is included in the digest of $B$. Since the digest of each block of $C_i$ is recorded on the root chain $C_0$, and each node of $C_j$ is also a node of $C_0$, each node of $C_j$ can obtain the root hash and verify $m_2$ using the proof $\pi$.

The reason for not using a signature to verify $m_2$ is to guard against the situation where the committee members of $C_i$ are not completely reliable: the messages generated by a block only take effect once the block is finally confirmed on $C_0$. However, for a cross-shard message within the same transaction chain (see Figure 4), verification can be faster: it suffices to wait until the digest is recorded on the parent transaction chain $C_i$, rather than on the root chain $C_0$.
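A minimal sketch of the proof check in Go is given below. It assumes SHA-256 hashing and, for brevity, a fixed sibling order; a real proof would also encode whether each sibling lies to the left or to the right of the path.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// verifyMerkle checks a message hash against a Merkle root using the proof:
// the hashes of the sibling nodes on the path from the leaf to the root.
func verifyMerkle(leaf [32]byte, proof [][32]byte, root [32]byte) bool {
	h := leaf
	for _, sib := range proof {
		h = sha256.Sum256(append(h[:], sib[:]...))
	}
	return bytes.Equal(h[:], root[:])
}

func main() {
	// Two outer relay messages form a tiny tree; prove membership of m1.
	m1 := sha256.Sum256([]byte("outer relay message m1"))
	m2 := sha256.Sum256([]byte("outer relay message m2"))
	root := sha256.Sum256(append(m1[:], m2[:]...))
	fmt.Println(verifyMerkle(m1, [][32]byte{m2}, root)) // true
}
```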

4.4 Network Layer

The decentralized nature of P2P networks makes them an ideal basis for the underlying communication of blockchain systems. As the network often becomes the bottleneck of high-throughput blockchains, P2P technology has attracted increasing attention for blockchain networks. Because blockchains require a large number of data broadcasting operations, efficiency and redundancy are key issues that every blockchain P2P network design must address.

Conventional blockchain P2P networks (such as Bitcoin's and Ethereum's) are often based on unstructured designs. When broadcasting, a partial-broadcasting mechanism is adopted to alleviate the message-redundancy problem. For example, when broadcasting a large message $M$, a node $u$ randomly selects some nodes in its connection list for full transmission; to each remaining node $v$, only the profile of $M$ (the hash of $M$) is transmitted. Only when $v$ needs the message and cannot find it in its own database does it pull the message from $u$. In the experimental part, we show through comparison of several parameter settings that this design can significantly reduce the redundancy of message transmission.
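The following Go sketch illustrates the partial-broadcasting mechanism under the notation above: the full message $M$ goes to $k$ randomly chosen neighbors and only its hash to the rest. The pull path for nodes that later need the body is omitted.

```go
package main

import (
	"fmt"
	"math/rand"
)

// broadcast sends the full message to k randomly chosen neighbors and only
// the message hash to the rest; a neighbor that needs the body and cannot
// find it locally pulls it from the sender (pull path not shown).
func broadcast(neighbors []string, msg, hash string, k int) {
	perm := rand.Perm(len(neighbors)) // random selection of recipients
	for i, idx := range perm {
		if i < k {
			fmt.Printf("send FULL %q to %s\n", msg, neighbors[idx])
		} else {
			fmt.Printf("send HASH %s to %s\n", hash, neighbors[idx])
		}
	}
}

func main() {
	broadcast([]string{"n1", "n2", "n3", "n4"}, "block #42", "0xab12", 1)
}
```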

Figure 6: Data redundancy.

However, this solution still has its limitations: an unstructured network becomes inadequate when higher throughput and efficient point-to-point transmission are required. In such cases, a structured network (such as a DHT) can be used for further optimization. In a structured P2P network, nodes follow certain rules during the networking phase (Chord [23], Kademlia [18]), which provides directional guidance for data transfer and can greatly reduce the number of useless transfers. However, in blockchains, and especially in public-chain projects, nodes are assumed to join and exit arbitrarily, and networks with frequent changes incur large costs for structure maintenance. Therefore, one of the challenges in adopting structured P2P networks is dealing with the instability and uncertainty of complex networks. In Thinkey, we use a combination of structured and unstructured methods: some scenarios (such as point-to-point transmission) use a structured P2P approach to reduce redundancy and improve efficiency, while unstructured transmission serves as a guarantee of stability.

Figure 6 shows the impact of the partial-broadcasting mechanism on message redundancy. We assume that each node is randomly connected to 10 nodes. The horizontal axis shows the number of complete copies of a message received by each node when nodes send the complete message to all of their neighbors or to progressively fewer of them, down to a single neighbor. For each case, we count the redundancy of the network as it expands from 100 nodes to 1,000 nodes. It can be seen that (1) system redundancy is significantly reduced as the number of full-message transmissions decreases; when a node broadcasts the complete message to only one neighbor, each node receives the same complete message less than twice on average; and (2) when the number of neighbors connected to each node is constant, the amount of redundancy is independent of the size of the network.

5 A New Model of Account in the Multi-Chain Structure

For a multi-chain system, the existing account models for single-chain systems (UTXO or the Ethereum account model) are not well suited to the new setting, especially when dealing with a large number of cross-chain operations. Based on the theory of the Actor Model [10], we propose a new account model, which allows us to implement complex logic on the multi-chain system in an asynchronous and lock-free manner. In this model, we decouple a transaction involving a set of accounts into multiple steps expressed as messages. Each message is received by a unique recipient and executed by the corresponding chain. All messages are eventually executed, completing the transaction correctly.

5.1 Actor Model

The Actor Model was introduced by Hewitt [10] in 1973 as a conceptual model of concurrent computation. The model has been used both as a framework for understanding concurrency and as the theoretical basis for several practical implementations of concurrent systems, such as electronic mail systems, web services, and objects with locks in Java. See [9] for more details.

Figure 7: Actor Model

An actor is the universal primitive of concurrent computation. When receiving a message, an actor can concurrently:

  1. send a finite number of messages to other actors;

  2. create a finite number of new actors;

  3. designate the behavior to be used when the next message is received.

There is no assumed sequence to the above actions and they could be carried out in parallel. Each actor has a mailbox to receive messages from other actors (see Figure 7). Messages are sent asynchronously to an actor and each actor processes messages sequentially with no restriction on message arrival order. Multiple actors can run at the same time. It is worth noting that actors are completely isolated from each other and they will never share memory. Each actor maintains a private state that can never be changed directly by another actor.
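The actor pattern maps naturally onto lightweight concurrency primitives. The following Go sketch is a minimal illustration, not Thinkey code: the mailbox is a channel, the private state is owned by a single goroutine, and messages are therefore processed sequentially without locks.

```go
package main

import (
	"fmt"
	"sync"
)

// actor owns a private state and a mailbox; because only its own goroutine
// touches the state, no locking is needed.
type actor struct {
	mailbox chan string
	state   int
}

func (a *actor) run(wg *sync.WaitGroup) {
	defer wg.Done()
	for msg := range a.mailbox { // messages handled one at a time
		a.state++
		fmt.Printf("processed %q, state=%d\n", msg, a.state)
	}
}

func main() {
	var wg sync.WaitGroup
	a := &actor{mailbox: make(chan string, 8)}
	wg.Add(1)
	go a.run(&wg)        // actors run concurrently with one another
	a.mailbox <- "hello" // sends are asynchronous
	a.mailbox <- "world"
	close(a.mailbox)
	wg.Wait()
}
```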

5.2 Thinkey Account

We adopt the Actor Model as the basic framework for our account model. In Thinkey, the account structure mainly contains the following fields:

Address: The unique identifier of the blockchain account.

Balance: The account’s current balance.

Nonce: A scalar value equal to the number of external messages sent from this address.

Code: The programmatic logic for processing messages.

Storage: The internal state of the account which can be empty.

Each account is controlled by a private key. In its code, the account defines its own methods for the messages it receives; it is allowed to send messages to other accounts, to create new accounts, and to modify its internal state. For some specific messages, there are general processing methods that are identical for every account (“Withdraw” and “Deposit” in Section 5.4). Each account can also customize the methods for other messages.

There are two types of messages: external messages and relay messages. An external message is created by an account, which signs it with its private key. A relay message is produced by an account that executes a send command during execution, somewhat similar to the internal message in Ethereum; the biggest difference is that relay message execution is asynchronous in our model and synchronous in Ethereum, so messages in our model support cross-chain propagation. A message in Thinkey mainly contains the following fields:

From: The address of the messages’ sender.

To: The address of the recipient of the message.

Nonce: A scalar value equal to the number of external messages sent by the sender, or nil for relay messages.

Input: A tuple specifying the input data of the message call.

Verification data: A signature identifying the sender for an external message, or a proof for a relay message.

An external message can be verified by its signature and nonce; a relay message can be verified by its proof (see Section 4.3).
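To make the two structures concrete, the following Go sketch declares them as plain structs. The concrete field types (fixed-size addresses, byte-slice code and proof fields) are illustrative assumptions rather than Thinkey's actual encoding.

```go
package main

// Account sketches the Thinkey account structure described above; Code is
// shown as an opaque byte slice rather than an executable representation.
type Account struct {
	Address [20]byte          // unique identifier of the account
	Balance uint64            // current balance
	Nonce   uint64            // number of external messages sent
	Code    []byte            // logic for processing messages
	Storage map[string][]byte // internal state, possibly empty
}

// Message sketches the message structure; exactly one of Signature or Proof
// is set, depending on whether it is an external or a relay message.
type Message struct {
	From      [20]byte
	To        [20]byte
	Nonce     uint64   // zero (nil-equivalent) for relay messages
	Input     [][]byte // input data of the message call
	Signature []byte   // identifies the sender of an external message
	Proof     [][]byte // Merkle proof for a relay message (Section 4.3)
}

func main() {}
```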

5.3 Thinkey Block

For each block on a chain $C$, a message on the block has one of three statuses.

  1. Input messages. These messages are not yet confirmed, and their recipients are on $C$. They can be external messages or relay messages produced by other chains.

  2. Inter relay messages. These messages are generated during the execution of the block, and their receivers are on the same chain $C$. Thus they are confirmed within the block.

  3. Outer relay messages. These are the messages generated during the execution of the entire block, and their receivers are on other chains. These messages will be confirmed by other chains.

We describe an example, shown in Figure 8. There are three accounts $a_1, a_2, a_3$ on the chain $C$ and three accounts $b_1, b_2, b_3$ on other chains. In particular, there are three input messages ($m_1, m_2, m_3$), received by the accounts $a_1$, $a_2$, and $a_3$ respectively.

Figure 8: An example of processing for the accounts in our model

For each account $a$, we use $O_a$ to denote the order in which its messages are processed and new messages are generated. For example,

$O_{a_1} = (m_1, m_4, m_6)$

indicates that account $a_1$ processes messages $m_1$, $m_4$, and $m_6$ in that order. Suppose an inter relay message $m_4$ is generated by executing $m_1$, and an outer relay message $m_5$ is generated by executing $m_2$; then the block has the inter relay message set $\{m_4\}$ and the outer relay message set $\{m_5\}$.

On receiving a block proposed by a leader, a committee member needs to verify the block to defend against a malicious leader. In our system, a node verifies three parts:

  1. Validity of each input message.

  2. Validity of the processing order $O_a$ for each account $a$; these can be verified independently and in parallel.

  3. Validity of the overall processing message order.

To verify the processing message order, we build a directed graph as follows. For two events $e_1$ and $e_2$, the expression $e_1 \to e_2$ indicates that the time of $e_1$ precedes the time of $e_2$ in the frame of reference of every relativistic observer. Let $r(m)$ denote the event that an account receives a message $m$, and $s(m)$ denote the event that an account sends a message $m$. Then we have $s(m) \to r(m)$. Each $O_a$ also induces such relations; for example, $O_{a_1} = (m_1, m_4, m_6)$ gives $r(m_1) \to r(m_4) \to r(m_6)$. Based on these relationships, we can build a directed graph (see Figure 9).

Figure 9: The directed graph of the processing message order

The processing message order is valid if and only if the directed graph does not contain a cycle.
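Since validity reduces to acyclicity, a block verifier can run a standard depth-first search. The following Go sketch checks a happens-before graph built from events named as above; the graph encoding is illustrative.

```go
package main

import "fmt"

// hasCycle reports whether the directed happens-before graph contains a
// cycle, using depth-first search with three colors. The processing order
// is valid iff this returns false.
func hasCycle(adj map[string][]string) bool {
	const (
		white = 0 // unvisited
		gray  = 1 // on the current DFS path
		black = 2 // fully explored
	)
	color := map[string]int{}
	var dfs func(v string) bool
	dfs = func(v string) bool {
		color[v] = gray
		for _, w := range adj[v] {
			if color[w] == gray || (color[w] == white && dfs(w)) {
				return true // back edge found: a cycle exists
			}
		}
		color[v] = black
		return false
	}
	for v := range adj {
		if color[v] == white && dfs(v) {
			return true
		}
	}
	return false
}

func main() {
	// s(m4) -> r(m4) since a message is sent before it is received,
	// and r(m1) -> s(m4) since a1 generates m4 while processing m1.
	g := map[string][]string{
		"s(m4)": {"r(m4)"},
		"r(m1)": {"s(m4)"},
		"r(m4)": {},
	}
	fmt.Println(hasCycle(g)) // false: the order is valid
}
```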

5.4 A Payment Application

In this section, we present a payment process, similar to Bitcoin, as an example of Thinkey's potential applications. We define an account with a local state balance, as shown below. There are two kinds of messages, “Withdraw” and “Deposit”, which represent withdrawal and deposit respectively. A simplified code sample, in the style of the Elixir language [25], describing the payment process is shown in Algorithm 1.

1:  balance ← v
2:  while receive message m do
3:     if m = ⟨Withdraw, bill, to⟩ then
4:        if balance ≥ bill then
5:           balance ← balance − bill
6:           send ⟨Deposit, bill⟩ to to
7:        end if
8:     else if m = ⟨Deposit, bill⟩ then
9:        balance ← balance + bill
10:     end if
11:  end while
Algorithm 1 The account module for the payment application

As shown in Figure 10, we use an example to demonstrate the process. Suppose there are three accounts, Alice, Bob, and Charlie, with balances $v_A$, $v_B$, and $v_C$ respectively. There are two messages $m_1$ and $m_2$, signed by Alice and Bob respectively. Message $m_1$ is a transfer of amount $x_1$ from Alice to Bob, and message $m_2$ is a transfer of amount $x_2$ from Bob to Charlie. Following the code in Algorithm 1, message $m_1$ is processed on Alice's chain as follows.

  1. An unconfirmed message $m_1$ is picked up by the committee members of Alice's chain, provided its signature is valid, when the members construct a new block.

  2. When the balance of Alice is not less than the transfer amount (Line 4 in Algorithm 1 is satisfied), and Alice and Bob belong to the same chain, the committee members record the message processing procedure on the block: processing $m_1$ produces a Deposit message $m_1'$ (Line 6 in Algorithm 1). Meanwhile, $m_1'$ is marked as an inter relay message and executed within the block. In this case, the balance of Alice is deducted by $x_1$ (Line 5 in Algorithm 1) and the balance of Bob is increased by $x_1$ (Line 9 in Algorithm 1).

  3. When Line 4 in Algorithm 1 is satisfied but Alice and Bob are on different chains, the processing procedure is likewise recorded on the block. Meanwhile, the committee members mark $m_1'$ as an outer relay message on the block and send it to the destination chain, Bob's chain. In this case, the withdraw operation for Alice is executed (Lines 5-6 in Algorithm 1). The outer relay message is executed in the same way on Bob's chain.

  4. When the balance of Alice is less than the transfer amount (Line 4 in Algorithm 1 is not satisfied), $m_1$ is recorded on the block without effect. In this case, the balances of Alice and Bob do not change.

Figure 10: A transfer example

For each account $a$, we use $O_a$ to denote the order in which its messages are processed and new messages are generated. When the three accounts are on the same chain, we obtain the orders $O_{Alice}$, $O_{Bob}$, and $O_{Charlie}$ shown in Figure 10, and record the message processing procedures in the block. Nodes can then reach the same state by replaying these procedures. The only remaining task is to reach consensus on the block, because the reception order of messages can affect behavior; for example, the order in which Bob processes the deposit message $m_1'$ and the withdraw message $m_2$ can be reversed, yielding another valid set of message processing procedures. When the three accounts are on different chains, the message processing procedures $O_{Alice}$, $O_{Bob}$, and $O_{Charlie}$ are recorded on the corresponding chains respectively, which completes the two transfers.

We remark that the transfer of balance is confirmed at the first stage. As long as the message is included in a block during the first phase, Bob can be certain that the transfer has been confirmed; he can simply wait for settlement, and the value being transferred can already be used. Through this process, cross-chain transfers achieve the same confirmation speed as intra-chain transfers.

5.5 CryptoKitties Application

In the previous section, we presented a payment application; moreover, our model can support arbitrarily complex logic. We take CryptoKitties as another example, a popular application on Ethereum that has attracted many users. CryptoKitties is an electronic pet game based on Ethereum smart contracts, in which players can buy and sell kitties. However, kitty auctions can congest the Ethereum network, which undermines the system's availability and significantly increases usage costs. With our account model, we propose a multi-chain solution to such network congestion.

Figure 11: A multi-chain solution for CryptoKitties

In particular, our design involves three types of accounts: a Collector, multiple Auctioneers, and a Recorder, as shown in Figure 11. The Collector is responsible for collecting all auction requests from sellers, the Auctioneers auction the kitties, and the Recorder tallies the auction results.

Processing is kicked off by an Auctioneer requesting a kitty from the Collector. When the Auctioneer receives the kitty, it auctions it and sends the auction result to the Recorder. Finally, the Recorder informs the Collector whether the kitty has been sold. Auctioneers can be placed on different chains, so the workload of the auctions can be partitioned across multiple independent, parallel chains.

It is possible that a single kitty attracts a large crowd of bidders, which we call the “single kitty hotspot” issue. In this case, the auction of the hot kitty can be held by several different Auctioneers, and a bidder can choose any Auctioneer to bid with. Finally, all the partial auction results are summarized and sent to the Recorder, which selects the winning bid for the hot kitty according to the rules.

5.6 Optimization and Discussions

We use two optimizations to reduce the cost of communication and the storage of accounts. The first optimization avoids duplicated storage: for some specific messages, there are general processing methods that are identical for every account; these public methods are provided by the system and need not be defined by each account. The second optimization reduces communication costs by merging batches of messages of the same type. For example, if there are 10 “Deposit” messages sent to the same account, they can be combined into one “Deposit” message.
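A minimal Go sketch of the message-merging optimization is shown below; the message shape is an illustrative stand-in for Thinkey's “Deposit” messages.

```go
package main

import "fmt"

type deposit struct {
	to     string
	amount uint64
}

// mergeDeposits combines all Deposit messages addressed to the same account
// into a single message, reducing cross-chain communication.
func mergeDeposits(msgs []deposit) []deposit {
	sums := map[string]uint64{}
	for _, m := range msgs {
		sums[m.to] += m.amount
	}
	out := make([]deposit, 0, len(sums))
	for to, amt := range sums {
		out = append(out, deposit{to, amt})
	}
	return out
}

func main() {
	msgs := []deposit{{"alice", 5}, {"alice", 7}, {"bob", 3}}
	fmt.Println(mergeDeposits(msgs)) // e.g. [{alice 12} {bob 3}]
}
```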

The “single account hotspot” issue, where a single account is involved in a great number of messages, can be easily resolved by co-design with applications at the upper layer, similar to the “single kitty hotspot” issue discussed in Section 5.5: popular accounts can be placed on multiple chains.

6 System Analysis and Experiment

In this section, we analyze the security, performance, and complexity of Thinkey.

6.1 Security

Let $n$ be the number of nodes, $c$ be the expected number of nodes in a committee, and $k$ be the number of committees. The number of malicious nodes is $\beta n$. We say a committee fails when more than a $\tau$ fraction of the nodes in the committee are malicious. Without loss of generality, we set $\tau = 1/3$. Suppose nodes are assigned to committees by a completely random oracle $H$. Fix a committee $i$ and let $A_i$ be the event that the fraction of malicious nodes in the committee exceeds $\tau$. Then for each $i$, we have

$\Pr[A_i] \le e^{-c \cdot D(\tau \,\|\, \beta)}$    (4)

where $D(\cdot \| \cdot)$ denotes the relative entropy between two Bernoulli distributions.

By the union bound, we have

$\Pr\left[\bigcup_{i=1}^{k} A_i\right] \le k \cdot e^{-c \cdot D(\tau \,\|\, \beta)}$    (5)

We can ensure that the probability of this event occurring is negligible under an adequate parameter setting.
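As a numerical sanity check on this analysis, the following Go sketch evaluates the per-committee failure probability under a binomial approximation of the random assignment (each committee seat is malicious independently with probability $\beta$) and applies the union bound of Equation (5). All parameter values are illustrative.

```go
package main

import (
	"fmt"
	"math"
)

// committeeFailProb approximates Pr[A_i]: the probability that a committee
// of size c contains more than a 1/3 fraction of malicious nodes, modeling
// each seat as an independent Bernoulli(beta) draw.
func committeeFailProb(c int, beta float64) float64 {
	p := 0.0
	for x := c/3 + 1; x <= c; x++ { // tail of the binomial distribution
		lp := lnChoose(c, x) + float64(x)*math.Log(beta) +
			float64(c-x)*math.Log(1-beta)
		p += math.Exp(lp)
	}
	return p
}

// lnChoose computes log(n choose k) via the log-gamma function to avoid
// overflow for large committees.
func lnChoose(n, k int) float64 {
	ln, _ := math.Lgamma(float64(n + 1))
	lk, _ := math.Lgamma(float64(k + 1))
	lnk, _ := math.Lgamma(float64(n - k + 1))
	return ln - lk - lnk
}

func main() {
	// With beta = 0.2 and committees of 500 nodes, the union bound over
	// k = 10 committees keeps the overall failure probability negligible.
	per := committeeFailProb(500, 0.2)
	fmt.Printf("per-committee: %.3g, union bound (k=10): %.3g\n", per, 10*per)
}
```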

6.2 Experiment

In the experiments, we simulate a blockchain system comprising 400 nodes to analyze the performance of our protocol. Each node runs on a single CPU core with 2 GB of RAM. Our system is implemented in Go.

Figure 12: The relationship between TPS and the number of shards.

6.2.1 Scalability

In the first experiment, we analyze the relationship between throughput, measured in transactions per second (TPS), and the number of shards in the system. All the accounts involved in the transactions are randomly generated. As shown in Figure 12, the linearly increasing curve shows that Thinkey achieves its scalability design goal through the proposed sharding scheme: as the number of chains (shards) increases, the TPS increases linearly, instead of stagnating as in single-chain systems such as Bitcoin.

Figure 13: The relationship between the block time and the block size.

6.2.2 Confirmation Latency

The confirmation latency is determined by the block time (the time interval between two adjacent blocks). In the second experiment, we analyze the relationship between the block time and the block size (the number of messages in a block). As shown in Figure 13, when the block size is fixed, the distribution of the block time roughly follows a Gaussian distribution with small fluctuations. In addition, when the block size increases, the block time also increases monotonically. This result indicates that the TPS cannot be increased simply by increasing the block size; combined with the result of the previous section, the best way to scale out the TPS is the sharding scheme.

Figure 14: The relationship between the epoch time, the number of shards and the ratio of cross-chain transactions.

6.2.3 Cross-chain Overhead

In a multi-chain system, cross-chain operations are unavoidable; in particular, when the number of chains is large, the ratio of cross-chain transactions (the percentage of cross-chain transactions among all transactions) will be high. In the third experiment, we analyze the relationship between the epoch time and the ratio of cross-chain transactions for several different numbers of shards. As shown in Figure 14, the epoch time increases when either the number of shards or the ratio of cross-chain transactions increases. However, when the number of shards is fixed, the flat curve shows that the overhead of cross-chain transactions is small, which achieves our design goal of handling cross-chain transactions efficiently and scalably through the new account model.

7 Conclusion

We proposed and implemented Thinkey, a distributed blockchain architecture that enables efficient, secure, and highly scalable transaction handling. We proposed a four-layer encapsulation structure and a double-layer chain architecture. In particular, the four-layer structure enables the abstraction and separation of diverse system functionalities and allows each layer to focus on its own functionality. The double-layer chain architecture lets the system scale out linearly by partitioning its workload. Additionally, committee selection and cross-chain verification are key protocols of our system, ensuring its efficiency and security. We also proposed a novel account model to support arbitrarily complex logic in a multi-chain structure. In our experiments, we demonstrate that the system's throughput scales linearly with the number of chains.

References

  • [1] Al-Bassam, M., Sonnino, A., Bano, S., Hrycyszyn, D., and Danezis, G. Chainspace: A sharded smart contracts platform. arXiv preprint arXiv:1708.03778 (2017).
  • [2] Amdahl, G. M. Validity of the single processor approach to achieving large scale computing capabilities. In Proceedings of the April 18-20, 1967, spring joint computer conference (1967), ACM, pp. 483–485.
  • [3] Castro, M., Liskov, B., et al. Practical byzantine fault tolerance. In OSDI (1999), vol. 99, pp. 173–186.
  • [4] Corbett, J. C., Dean, J., Epstein, M., Fikes, A., Frost, C., Furman, J. J., Ghemawat, S., Gubarev, A., Heiser, C., Hochschild, P., et al. Spanner: Google’s globally distributed database. ACM Transactions on Computer Systems (TOCS) 31, 3 (2013), 8.
  • [5] Daian, P., Pass, R., and Shi, E. Snow white: Robustly reconfigurable consensus and applications to provably secure proofs of stake. Tech. rep., Technical Report. Cryptology ePrint Archive, Report 2016/919, 2017.
  • [6] Eyal, I., Gencer, A. E., Sirer, E. G., and Van Renesse, R. Bitcoin-ng: A scalable blockchain protocol. In NSDI (2016), pp. 45–59.
  • [7] Gilad, Y., Hemo, R., Micali, S., Vlachos, G., and Zeldovich, N. Algorand: Scaling byzantine agreements for cryptocurrencies. In Proceedings of the 26th Symposium on Operating Systems Principles (2017), ACM, pp. 51–68.
  • [8] Hanke, T., Movahedi, M., and Williams, D. Dfinity technology overview series, consensus system. arXiv preprint arXiv:1805.04548 (2018).
  • [9] Hewitt, C. Actor model of computation: scalable robust information systems. arXiv preprint arXiv:1008.1459 (2010).
  • [10] Hewitt, C., Bishop, P., and Steiger, R. A universal modular actor formalism for artificial intelligence. In Proceedings of the 3rd International Joint Conference on Artificial Intelligence (1973), Morgan Kaufmann Publishers Inc., pp. 235–245.
  • [11] Hill, M. D., and Marty, M. R. Amdahl’s law in the multicore era. Computer 41, 7 (2008).
  • [12] Jogalekar, P., and Woodside, M. Evaluating the scalability of distributed systems. IEEE Transactions on parallel and distributed systems 11, 6 (2000), 589–603.
  • [13] Kiayias, A., Russell, A., David, B., and Oliynykov, R. Ouroboros: A provably secure proof-of-stake blockchain protocol. In Annual International Cryptology Conference (2017), Springer, pp. 357–388.
  • [14] Kogias, E. K., Jovanovic, P., Gailly, N., Khoffi, I., Gasser, L., and Ford, B. Enhancing bitcoin security and performance with strong consistency via collective signing. In 25th USENIX Security Symposium (USENIX Security 16) (2016), pp. 279–296.
  • [15] Kokoris-Kogias, E., Jovanovic, P., Gasser, L., Gailly, N., Syta, E., and Ford, B. Omniledger: A secure, scale-out, decentralized ledger via sharding. In 2018 IEEE Symposium on Security and Privacy (SP) (2018), IEEE, pp. 583–598.
  • [16] Li, C., Li, P., Xu, W., Long, F., and Yao, A. C.-c. Scaling nakamoto consensus to thousands of transactions per second. arXiv preprint arXiv:1805.03870 (2018).
  • [17] Luu, L., Narayanan, V., Zheng, C., Baweja, K., Gilbert, S., and Saxena, P. A secure sharding protocol for open blockchains. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security (2016), ACM, pp. 17–30.
  • [18] Maymounkov, P., and Mazieres, D. Kademlia: A peer-to-peer information system based on the xor metric. In International Workshop on Peer-to-Peer Systems (2002), Springer, pp. 53–65.
  • [19] Nakamoto, S. Bitcoin: A peer-to-peer electronic cash system, 2008.
  • [20] Pass, R., and Shi, E. Rethinking large-scale consensus. In Computer Security Foundations Symposium (CSF), 2017 IEEE 30th (2017), IEEE, pp. 115–129.
  • [21] Pass, R., and Shi, E. Thunderella: Blockchains with optimistic instant confirmation. In Annual International Conference on the Theory and Applications of Cryptographic Techniques (2018), Springer, pp. 3–33.
  • [22] Schneider, F. B. Implementing fault-tolerant services using the state machine approach: A tutorial. ACM Computing Surveys (CSUR) 22, 4 (1990), 299–319.
  • [23] Stoica, I., Morris, R., Karger, D., Kaashoek, M. F., and Balakrishnan, H. Chord: A scalable peer-to-peer lookup service for internet applications. ACM SIGCOMM Computer Communication Review 31, 4 (2001), 149–160.
  • [24] Team, T. Z. The zilliqa technical whitepaper.
  • [25] Thomas, D. Programming Elixir ≥ 1.6: Functional |> Concurrent |> Pragmatic |> Fun. Pragmatic Bookshelf, 2018.
  • [26] Wood, G. Ethereum: A secure decentralised generalised transaction ledger. Ethereum project yellow paper 151 (2014), 1–32.
  • [27] Zamani, M., Movahedi, M., and Raykova, M. Rapidchain: scaling blockchain via full sharding. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security (2018), ACM, pp. 931–948.