Coded State Machine -- Scaling State Machine Execution under Byzantine Faults

06/26/2019 ∙ by Songze Li, et al. ∙ University of Southern California, University of Illinois at Urbana-Champaign, University of Washington

We introduce an information-theoretic framework, named Coded State Machine (CSM), to securely and efficiently execute multiple state machines on untrusted network nodes, some of which are Byzantine. The standard method of solving this problem is using State Machine Replication, which achieves high security at the cost of low efficiency. We propose CSM, which achieves the optimal linear scaling in storage efficiency, throughput, and security simultaneously with the size of the network. The storage efficiency is scaled via the design of Lagrange coded states and coded input commands that require the same storage size as their origins. The computational efficiency is scaled using a novel delegation algorithm, called INTERMIX, which is an information-theoretically verifiable matrix-vector multiplication algorithm of independent interest. Using INTERMIX, the network nodes securely delegate their coding operations to a single worker node, and a small group of randomly selected auditor nodes verify its correctness, so that computational efficiency can scale almost linearly with the network size, without compromising on security.


1 Introduction

A state machine is a program that executes a state transition function $f$. In each round $t$, given an input command $x_t$ and the current state $s_t$, the machine uses $f$ to compute an output $y_t$, and transitions its state to $s_{t+1}$. In this paper, we consider the problem of securely implementing $K$ independent and identical state machines over a network of $N$ untrusted compute nodes, some of which are subject to Byzantine faults. Such a model has a wide range of applications. For instance, multiple financial institutions manage their users' accounts over a data center comprised of commodity hardware, or a blockchain system maintains multiple ledgers (shards) of users' transactions over a peer-to-peer network.

For such a system, there are three critical performance metrics: 1) security, defined as the maximum number of malicious nodes that can be tolerated; 2) storage efficiency, defined as the total number of state machines that can be supported given that each node has the storage size of a single state; and 3) throughput, defined as the total number of commands processed per unit time. Information theoretically, all three metrics scale at most linearly with the network size $N$.

However, simultaneous linear scaling of all three metrics has never been achieved in the literature. The canonical approach to dealing with faulty nodes/processors is state machine replication (SMR) (see, e.g., [2, 26, 43, 27, 11]). The most classic variant is full replication, where all $K$ state machines are replicated at all $N$ nodes. By applying the majority rule, full replication achieves a linearly-scaling security of $\Theta(N)$, but a storage efficiency of only $1$, and a throughput of only $1/C_f$, where $C_f$ is the computational complexity of the state transition function $f$ (details in Section 3.1). An alternative approach is partial replication, where each state machine is replicated at a different disjoint subset of $N/K$ nodes. Partial replication improves the storage efficiency and throughput to $K$ and $K/C_f$, respectively, which scale linearly with $N$ by letting $K$ scale linearly with $N$. However, since each state machine is only handled by a set of $N/K$ nodes at any time regardless of the allocation strategy [25, 36], the security drops by a factor of $K$ to $\Theta(N/K)$ once the adversary identifies this set and corrupts it. A key theoretical question in this context is whether this tradeoff between efficiency and security is fundamental or can be circumvented by careful algorithm design.

                              Security    Storage efficiency   Throughput
Full Replication              $N/2$       $1$                  $1/C_f$
Partial Replication           $N/(2K)$    $K$                  $K/C_f$
Information-Theoretic Limit   $N/2$       $N$                  $N/C_f$
Coded State Machine (CSM)     $cN$        $\Theta(N)$          $\Theta(N/(C_f + C_c))$
Table 1: Performance comparison of the proposed CSM with state machine replication and the information-theoretic limits in synchronous networks. Here $c$ is some constant. The state transition function $f$ is an arbitrary multivariate polynomial with constant degree $d$. $C_f$ and $C_c$ are the computational complexities of $f$ and of the coding operations per node in CSM, respectively.

The main contribution of this paper is to demonstrate that there is no fundamental tradeoff between (storage and throughput) efficiency scaling and security scaling in state machine operation. In particular, we propose "Coded State Machine" (CSM), which simultaneously achieves linear scaling for both storage efficiency and security, for both synchronous and partially synchronous networks (see Table 1). CSM also achieves an almost linear scaling for throughput under synchronous networks through the development of an interactive matrix-vector multiplication verification protocol named INTERMIX (specifically, INTERMIX helps to reduce the coding complexity $C_c$ per node in Table 1 to a quantity polylogarithmic in $N$).

Figure 1: A block diagram illustration of main operations in Coded State Machine.

CSM works on a general class of state transition functions that are multivariate polynomials. The key idea behind CSM is to have each node execute the state transition function on a coded state and a coded command, so that the coded outputs can be used to decode the original outputs. More specifically, CSM generates each coded state/command by evaluating a Lagrange polynomial at some point, such that the output of $f$ can be viewed as evaluating another polynomial of higher degree at the same point. Given enough evaluations (some of them could be erroneous) of this new polynomial, we employ efficient noisy polynomial interpolation algorithms (e.g., Reed-Solomon decoding) to decode the original outputs, providing the security guarantees of CSM. Moreover, since a coded state has the same size as an original state, CSM retains the optimal storage efficiency. Finally, to reduce the encoding/decoding overhead for throughput scaling, we utilize INTERMIX to delegate the coding operations of all nodes to a single worker node, where fast polynomial interpolation/evaluation algorithms can be applied to minimize computational complexity. We illustrate the main operation flow of CSM in Figure 1.

Related Works. Conventional SMR research focuses on designing optimal protocols for the consensus phase (where the nodes aim to agree on which input command submitted by the clients should be used) that tolerate the maximum number of faulty processors  [28, 27, 37, 17, 11, 10]. CSM uses the same consensus protocols to decide on the input commands.

Concepts from coding theory have been utilized earlier to provide fault tolerance to state machine execution. Building upon prior work [20, 3], the fused state machine was proposed in [4]. There, given $n$ primary finite state machines, additional fused replica machines are constructed to correct Byzantine faults among them. The fused replicas are components of the closed partition lattice [22, 29] of the reachable product states of the primary machines, and are selected such that the minimum distance between any two product states is large enough to correct the target number of faults. Compared with the proposed CSM, while the fused state machine also achieves security scaling, the storage size at each replica increases with the number of primary machines $n$, as opposed to the constant storage size of a single state in CSM. When the state machine operations are write and read, i.e., the state machine system emulates a fault-tolerant shared memory, erasure codes have been exploited to minimize the storage costs given the coexistence of multiple versions of the data object, while ensuring that a reader can decode an atomically consistent version [23, 1, 8, 13, 9, 47].

The coding design of CSM is inspired by recent developments in coded computing [32, 31, 30, 16, 49, 46, 48], which leverage tools from coding theory to inject computation redundancy in order to improve latency, security, and privacy of distributed computing applications. In particular, as similarly done in Lagrange Coded Computing [48] to create coded data batches, CSM creates and processes coded states/commands using the Lagrange polynomial. However, in contrast to one-shot computation on static data in [48], the requirement of dynamically updating the local coded state that is compatible with the upcoming state transition poses new challenges in the design of CSM.

Compared to the existing verifiable matrix-vector multiplication schemes [18, 51, 19, 50, 42], INTERMIX has two advantages. First, it is information-theoretically secure, i.e., it is secure even if the nodes have unlimited computation power. Second, INTERMIX enables almost every node in the network (with the exception of a constant number of auditors) to verify the correctness of the results in constant time. By contrast, in the existing schemes the verification complexity is at least linear in the size of the output. INTERMIX has the shortcoming that it is an interactive algorithm. However, the small number of required interactions (logarithmic in the number of nodes) allows us to trade a low communication overhead in exchange for the aforementioned desirable properties.

2 System Description and Problem Formulation

We define a deterministic state machine (SM) as a collection of an input alphabet $\mathcal{X}$, an output alphabet $\mathcal{Y}$, a state space $\mathcal{S}$, and a deterministic state transition function $f: \mathcal{S} \times \mathcal{X} \rightarrow \mathcal{S} \times \mathcal{Y}$, for some vector spaces $\mathcal{X}$, $\mathcal{Y}$, and $\mathcal{S}$ over some field $\mathbb{F}$. The state of the state machine evolves in discrete rounds. At round $t$, given the current state $s_t$ and the input command $x_t$, the state machine computes the state transition function $(s_{t+1}, y_t) = f(s_t, x_t)$ to generate an output response $y_t$, and transitions into the next state $s_{t+1}$.

We consider operating $K$ independent state machines with the same state transition function $f$ on $N$ unreliable compute nodes (e.g., hosting database services on commodity machines in a datacenter). There are a number of clients, who submit their queries/commands to the state machines for processing. In particular, at time $t$, each SM $k$ has state $s_t^{(k)}$, and an input command $x_t^{(k)}$ is selected from the pool of commands submitted to SM $k$. Then, SM $k$ executes $f$ by computing $(s_{t+1}^{(k)}, y_t^{(k)}) = f(s_t^{(k)}, x_t^{(k)})$, returns the output result $y_t^{(k)}$ to the client who submitted $x_t^{(k)}$, and transitions into the next state $s_{t+1}^{(k)}$.
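To make the model concrete, here is a minimal runnable sketch of one round of operation (our own toy example; the transition function, values, and names are illustrative, not from the paper):

```python
# A minimal sketch of the deterministic state machine model above, with a
# toy degree-1 polynomial transition f(s, x) = (s + x, s + x): a bank-balance
# update whose next state and returned output are both the updated balance.

def f(state, command):
    """Polynomial state transition: returns (next_state, output)."""
    nxt = state + command
    return nxt, nxt

def run_round(states, commands):
    """Execute one round of K independent state machines."""
    outputs = []
    for k in range(len(states)):
        states[k], y = f(states[k], commands[k])
        outputs.append(y)
    return states, outputs

states = [100, 250, 40]                  # K = 3 machines
states, outputs = run_round(states, [-30, 10, 5])
print(states)                            # [70, 260, 45]
print(outputs)                           # [70, 260, 45]
```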

Figure 2: Illustration of operating $K$ state machines over $N$ nodes, one of which (node 2) is malicious. Malicious nodes can compromise the system security in the consensus phase, in the execution phase when honest nodes try to decode the computation results, and when delivering the decoded outputs to the clients.

2.1 Network and Failure Models

We consider a fully connected network between the clients and the compute nodes; Figure 2(a) shows a subset of clients continuously and concurrently submitting their commands to be processed at SM $k$ by broadcasting them to all compute nodes. For the communication between the nodes, we study two settings:

Synchronous network, with a fixed and known upper bound on the communication latency between any pair of nodes.

Partially synchronous network, with unbounded communication delay until an unknown global stabilization time (GST), after which the network becomes synchronous. Here a node expecting a message cannot distinguish between a failed message sender and a slow network since it does not know if GST has been reached.

To implement the state machine operations over the $N$ nodes, at round $t$, each node $i$ locally stores some (possibly coded) data $z_t^{(i)}$, for some vector space $\mathcal{Z}$ over $\mathbb{F}$, which is generated by some function over the states $s_t^{(1)}, \dots, s_t^{(K)}$. With these locally stored data, the implementation proceeds in two phases: the consensus phase, and the execution phase.

Consensus Phase. For each round index $t$, all network nodes run some consensus protocol, via exchanging messages (possibly over many iterations), to reach an agreement on a set of input commands $x_t^{(1)}, \dots, x_t^{(K)}$ to process (the consensus setting is standard in the literature, e.g., [15]). For each $k$, we label the index of the client who submits $x_t^{(k)}$ as $c_t^{(k)}$.

Execution Phase. As shown in Figure 2(b), each node $i$ computes locally some intermediate result $g_t^{(i)}$, as some function of its stored data $z_t^{(i)}$ and the commands agreed in the consensus phase, and multicasts $g_t^{(i)}$ to a subset of other nodes. Using its local computation result and the results received from other nodes, each node $i$ recovers the state transition functions for a subset of state machines via some function, obtaining estimates $(\hat{s}_{t+1}^{(k)}, \hat{y}_t^{(k)})$. Then, each node returns each of its computed outputs $\hat{y}_t^{(k)}$ to the intended client $c_t^{(k)}$, and updates its local storage to $z_{t+1}^{(i)}$ by computing some function $h$ on the current storage $z_t^{(i)}$ and the updated states. That is,

$$z_{t+1}^{(i)} = h\left(z_t^{(i)}, \{\hat{s}_{t+1}^{(k)}\}_k\right). \qquad (1)$$

Finally, for each $k$, after receiving the computed outputs from different nodes, client $c_t^{(k)}$ decides on the output $y_t^{(k)}$.

We consider an untrusted network where a subset of the nodes are subject to authenticated Byzantine faults. That is, a faulty node can exhibit arbitrary behaviors that deviate from the above described protocol, but all messages between nodes are cryptographically signed, and hence impersonating others’ messages is easily detectable.

We say that a computation scheme, specified by the nodes' storage design and the node operations in the consensus and execution phases, is $t$-secure if for any subset $\mathcal{H} \subseteq \{1, \dots, N\}$ with $|\mathcal{H}| \geq N - t$ such that the nodes in $\mathcal{H}$ are honest, and the nodes outside $\mathcal{H}$ are subject to authenticated Byzantine faults, the scheme achieves:

Validity: the command $x_t^{(k)}$ selected in the consensus phase is indeed submitted by some client to SM $k$ before the start of round $t$, for all $k$.

Consistency: for each round index $t$, no two honest nodes in $\mathcal{H}$ decide on different values for $x_t^{(k)}$.

Correctness: for each honest node $i \in \mathcal{H}$, the estimates $(\hat{s}_{t+1}^{(k)}, \hat{y}_t^{(k)})$ recovered at node $i$ equal $f(s_t^{(k)}, x_t^{(k)})$, for all $k$; and the client output equals the true output $y_t^{(k)}$, for all $k$.

Liveness: all clients’ commands are executed.

2.2 Performance Metrics

A computation scheme is characterized by its security and efficiency.

Security ($\beta$) is the maximum value of $t$ such that the computation scheme is $t$-secure.

Storage efficiency ($\gamma$) is the ratio between the required memory size to store all $K$ states and the size of the data stored at each node, i.e., $\gamma = K|s|/|z|$, where $|s|$ and $|z|$ denote the sizes of a single state and of the per-node storage. Given a fixed storage size at each node, $\gamma$ indicates the maximum number of state machines the scheme can securely support.

Throughput ($\lambda$) is the average number of input commands that can be securely processed per unit of operation at each node. That is, $\lambda = K / \max_i C^{(i)}$, where $C^{(i)}$ denotes the computational complexity of the functions computed at node $i$, measured in the number of additions and multiplications in $\mathbb{F}$. Here we focus on the regime where the computation latency is dominated by operations in the execution phase. Since the consensus phase of later rounds can be performed in parallel with the execution phase of the current round, we do not consider the computational complexity of the consensus phase in the throughput definition. Also, we consider the case where all operations are carried out in memory, and no disk I/O is needed.

3 State Machine Replication and Information-Theoretic Limits

In this section, we analyze state machine replication (SMR) and the information-theoretic limits.

Full vs. Partial Replication. A classic SMR scheme is full replication, where the state of every state machine is replicated across all $N$ nodes. For each round $t$, the nodes altogether run some consensus algorithm to reach an agreement on the value of the vector of inputs $(x_t^{(1)}, \dots, x_t^{(K)})$. It is clear that the validity requirement is satisfied since each node knows all the commands submitted to all state machines.

Synchronous network. We use the Byzantine generals protocol [28] in the consensus phase, where a unique set of commands is proposed by a leader node and disseminated across the network. With the protection of digital signatures, the consistency requirement can be satisfied for an arbitrary number of malicious nodes. In the execution phase, each honest node executes the agreed commands, and sends the outputs to the intended clients. For a state transition function with constant complexity $C_f$, the full replication scheme achieves a constant throughput of $1/C_f$. Each client waits for $t+1$ matching responses from the nodes before it accepts the output result. Hence, the total number of nodes needs to be at least $2t+1$. This promises a security of $\beta = \lfloor (N-1)/2 \rfloor$.

Partially synchronous network. We employ the PBFT protocol [11, 10] in the consensus phase, which requires at least $3t+1$ nodes to tolerate $t$ faults. The execution phase is the same as in the synchronous setting. In partially synchronous networks, the security drops to $\beta = \lfloor (N-1)/3 \rfloor$.

Alternatively, we can partially replicate each state machine at a disjoint subset of $N/K$ nodes. This yields a storage efficiency of $\gamma = K$, and a throughput of $\lambda = K/C_f$. Partial replication has the same security as full replication on $N/K$ nodes. That is, $\beta \approx N/(2K)$ in a synchronous network, and $\beta \approx N/(3K)$ in a partially synchronous network.

Information-Theoretic Upper Bounds. Since the aggregated storage size of the entire network has to be at least the size of all $K$ state machines, the storage efficiency $\gamma \leq N$. To process $K$ input commands, the state transition function has to be executed at least $K$ times across the network, and thus the throughput $\lambda \leq N/C_f$. Finally, the maximum number of malicious nodes any computation scheme can tolerate cannot exceed half of the network size. Thus, the security $\beta \leq N/2$. By letting $K$ scale linearly with $N$, the upper bounds on all three metrics scale linearly with $N$. However, both the full replication scheme and the partial replication scheme make tradeoffs between these scalings. Next, we present our main results showing that this tradeoff is not fundamental, and we can simultaneously achieve optimal scaling for security, storage, and throughput.
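For instance, the throughput bound follows from a one-line counting argument (our own rendering of the step above):

```latex
% Total work per round is at least K C_f operations spread over N nodes,
% so the busiest node performs at least K C_f / N operations, giving
\lambda \;=\; \frac{K}{\max_i C^{(i)}} \;\le\; \frac{K}{K C_f / N} \;=\; \frac{N}{C_f}.
```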

4 Main Results

We focus on a general class of state transition functions that are multivariate polynomials of maximum degree $d$. For instance, updating the balance of a bank account is a linear (degree-1) function of the current balance and the incoming deposit/withdrawal. Moreover, when we represent the variables in a state machine as bit streams, we can model any function as a polynomial using the following result [52]: any Boolean function of $n$ binary inputs can be represented by a polynomial of degree at most $n$ with at most $2^{n-1}$ terms. See Appendix A for an explicit construction of this polynomial.

Theorem 1 (synchronous setting).

Consider a system of $N$ compute nodes connected by synchronous networks, up to a $\mu$ fraction of which (for some constant $\mu < 1/2$) are subject to authenticated Byzantine faults. Over this system, there exist computation schemes that support operating up to $K = \Theta(N)$ state machines with a state transition function of degree $d$, and simultaneously achieve

$$\gamma = \Theta(N), \qquad (2)$$
$$\beta = \Theta(N). \qquad (3)$$

Additionally, for broadcast networks (i.e., malicious nodes cannot send different messages to different nodes), we can also simultaneously achieve

$$\lambda = \tilde{\Theta}(N/C_f), \qquad (4)$$

where $\tilde{\Theta}$ hides polylogarithmic factors in $N$.
Theorem 2 (partially synchronous setting).

Consider a system of $N$ compute nodes connected by partially synchronous networks, up to a $\mu$ fraction of which (for some constant $\mu < 1/3$) are subject to authenticated Byzantine faults. Over this system, there exist computation schemes that support operating up to $K = \Theta(N)$ state machines with a state transition function of degree $d$, and simultaneously achieve

$$\gamma = \Theta(N), \qquad (5)$$
$$\beta = \Theta(N). \qquad (6)$$

We prove the storage and security scaling results of Theorems 1 and 2 in Section 5. In particular, we present a coded computation scheme named "Coded State Machine" (CSM), which simultaneously achieves storage and security scaling with the network size. In Section 6, we complete the proof of Theorem 1 by developing an information-theoretically verifiable matrix-vector multiplication algorithm (abbreviated as INTERMIX), which significantly slashes the computational complexity of CSM and helps to achieve throughput scaling.

Remark 1.

The proposed CSM simultaneously achieves the information-theoretically optimal storage efficiency and security to within constant multiplicative gaps, for both synchronous and partially synchronous settings. It also achieves optimal throughput to within a logarithmic factor for synchronous networks.

Remark 2.

In sharp contrast to SMR, each CSM node stores a coded state that is a unique linear combination of the states, whose size is the same as a single state variable. In the execution phase, each node generates a coded input command by linearly combining the incoming commands, and then computes the state transition function directly on the coded state and command. Using these coded computation results from all nodes, a subset of which may be erroneous, each node recovers the output results and the updated states via error-correcting decoding.

Remark 3.

For the special case of a linear state transition function (i.e., $d = 1$), codes designed for distributed storage (see, e.g., [12, 40]) can also be used to achieve similar scaling as CSM. However, CSM is designed for a much more general class of arbitrary multivariate polynomials, which cannot be handled by state-of-the-art storage codes.

5 Description of Coded State Machine

CSM simultaneously achieves optimal scaling of security, storage efficiency, and throughput. It has two key components: Coded State and Coded Execution. We first briefly discuss these two components, and then describe CSM in detail.

Figure 3: Illustration of the key components of Coded State Machine. (a) Coded State: each node $i$ stores a coded state $\tilde{s}_t^{(i)} = u_t(\alpha_i)$ generated by evaluating a polynomial $u_t$ at the point $\alpha_i$, where $u_t(\omega_k) = s_t^{(k)}$ for all $k$. (b) Coded Execution: for a polynomial state transition function $f$, each honest node $i$ computes an intermediate result $f(\tilde{s}_t^{(i)}, \tilde{x}_t^{(i)})$, which can be viewed as evaluating a polynomial of higher degree at the point $\alpha_i$. Given $N$ computation results, a subset of which may be erroneous (one in this case), Reed-Solomon decoding is used to recover this polynomial, which is then evaluated at $\omega_1, \dots, \omega_K$ to obtain the final output results.

Coded State. CSM views a (possibly coded) state variable as the evaluation of a polynomial at a point. As illustrated in Figure 3(a), having specified a polynomial $u_t$ of degree $K-1$ using the uncoded states $s_t^{(1)}, \dots, s_t^{(K)}$ via Lagrange interpolation at $K$ distinct points $\omega_1, \dots, \omega_K$, CSM generates $N$ coded states by evaluating $u_t$ at $N$ distinct points $\alpha_1, \dots, \alpha_N$, and stores $\tilde{s}_t^{(i)} = u_t(\alpha_i)$ at node $i$. This coding design was recently proposed in [48] for distributed computation of a polynomial over disjoint data.

Coded Execution. In the execution phase, each node $i$ generates a coded command $\tilde{x}_t^{(i)}$ from the agreed commands in the same way as the state encoding, and directly feeds $\tilde{s}_t^{(i)}$ and $\tilde{x}_t^{(i)}$ into the state transition function $f$. For any multivariate polynomial $f$ of degree $d$, the result $f(\tilde{s}_t^{(i)}, \tilde{x}_t^{(i)})$ can be viewed as evaluating a composite polynomial $p_t(z) = f(u_t(z), v_t(z))$ of degree $(K-1)d$ at the point $\alpha_i$ (see Figure 3(b)). From the $N$ intermediate computation results, a subset of which may be erroneous, CSM exploits efficient noisy interpolation techniques like Reed-Solomon decoding to recover $p_t$, and evaluates it at $\omega_k$ to obtain the output $f(s_t^{(k)}, x_t^{(k)})$, for all $k$.
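As a toy illustration with our own numbers (not from the paper), take $K = 2$ machines and the total-degree-$2$ transition $f(s, x) = s \cdot x$, encoded at the points $\omega_1 = 0$ and $\omega_2 = 1$:

```latex
% Encoding polynomials (degree K - 1 = 1):
u_t(z) = s_t^{(1)}(1 - z) + s_t^{(2)} z, \qquad
v_t(z) = x_t^{(1)}(1 - z) + x_t^{(2)} z.
% Node i's result f(u_t(\alpha_i), v_t(\alpha_i)) is an evaluation of
p_t(z) = u_t(z)\, v_t(z), \qquad \deg p_t = (K-1)d = 2,
% so any (K-1)d + 1 = 3 correct results determine p_t, and with e corrupted
% results Reed-Solomon decoding succeeds whenever N >= (K-1)d + 1 + 2e.
```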

5.1 Coded State

To generate the coded states stored at the nodes, we first pick $K$ arbitrary distinct elements $\omega_1, \dots, \omega_K$ from the field $\mathbb{F}$, one for each state machine; and then pick $N$ arbitrary distinct elements $\alpha_1, \dots, \alpha_N$ from $\mathbb{F}$, one for each node.

Given the states $s_t^{(1)}, \dots, s_t^{(K)}$ at round $t$, we construct the Lagrange interpolation polynomial $u_t(z) = \sum_{k=1}^{K} s_t^{(k)} \prod_{j \neq k} \frac{z - \omega_j}{\omega_k - \omega_j}$. Here evaluating $u_t$ at $\omega_k$ recovers the state $s_t^{(k)}$.

Then, the coded state stored at node $i$ is generated by evaluating $u_t$ at the point $\alpha_i$, i.e., for all $i \in \{1, \dots, N\}$,

$$\tilde{s}_t^{(i)} = u_t(\alpha_i) = \sum_{k=1}^{K} s_t^{(k)} \ell_k(\alpha_i), \qquad (7)$$

where $\ell_k(\alpha_i) = \prod_{j \neq k} \frac{\alpha_i - \omega_j}{\omega_k - \omega_j}$ is the coefficient for the state $s_t^{(k)}$ at node $i$.

When $\mathbb{F}$ is finite, the field size needs to be at least $N$ for this state encoding to be feasible. For a small field (e.g., the binary field), we can overcome this difficulty by using a field extension and applying CSM over the extended field (see details in Appendix A).

Since each coded state has the same size as an uncoded state, the CSM scheme has a storage efficiency of $\gamma = K$.
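The following sketch (our own illustration; the prime field, evaluation points, and state values are arbitrary choices) implements the encoding in (7):

```python
# Lagrange state encoding as in (7), over a small prime field GF(p).
p = 97

def lagrange_coeffs(omegas, alpha):
    """Return [l_1(alpha), ..., l_K(alpha)] over GF(p)."""
    coeffs = []
    for k, ok in enumerate(omegas):
        num, den = 1, 1
        for j, oj in enumerate(omegas):
            if j != k:
                num = num * (alpha - oj) % p
                den = den * (ok - oj) % p
        coeffs.append(num * pow(den, -1, p) % p)  # modular inverse of den
    return coeffs

def encode(states, omegas, alphas):
    """Coded state at node i: sum_k s^(k) * l_k(alpha_i) mod p."""
    return [sum(s * c for s, c in zip(states, lagrange_coeffs(omegas, a))) % p
            for a in alphas]

states = [10, 20, 30]                  # K = 3 uncoded states
omegas = [0, 1, 2]                     # one point per state machine
alphas = [3, 4, 5, 6, 7, 8, 9]         # one point per node (N = 7)
print(encode(states, omegas, alphas))  # one field element per node
```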

Remark 4.

The coefficients $\ell_k(\alpha_i)$ in (7) used to generate coded states are independent of both the state transition function $f$ and the round index $t$. Therefore, the state encoding of CSM is universally applicable to all types of state transition functions at each round of operation.

5.2 Coded Execution Phase

The consensus phase is performed in the same way as SMR in Section 3, after which all honest nodes have agreed on the input commands $x_t^{(1)}, \dots, x_t^{(K)}$. As the first step in the execution phase, each node $i$ uses the same set of coefficients $\ell_k(\alpha_i)$ as in (7) to compute a coded command $\tilde{x}_t^{(i)} = \sum_{k=1}^{K} x_t^{(k)} \ell_k(\alpha_i)$, which is the evaluation of a polynomial $v_t$ at the point $\alpha_i$. Then, node $i$ applies the state transition function directly on $\tilde{x}_t^{(i)}$ and its locally stored $\tilde{s}_t^{(i)}$ to obtain $f(\tilde{s}_t^{(i)}, \tilde{x}_t^{(i)})$, and broadcasts the result to all other nodes. Given that up to $\mu N$ nodes are malicious, up to a $\mu$ fraction of the computation results are erroneous.

                        Input Consensus           Decoding                   Output Delivery
Synchronous             unbounded (signatures)    $\frac{N-(K-1)d-1}{2}$     $\frac{N-1}{2}$
Partially Synchronous   $\frac{N-1}{3}$           $\frac{N-(K-1)d-1}{3}$     $\frac{N-1}{3}$
Table 2: Upper bounds on the number of malicious nodes ($\mu N$) to achieve consensus on input commands, successful decoding, and secure delivery of output results.

Synchronous network. Each node waits for a fixed interval to receive all the computation results $f(\tilde{s}_t^{(i)}, \tilde{x}_t^{(i)})$. For a function $f$ that is a multivariate polynomial of constant degree $d$, the result at node $i$ can be viewed as the evaluation of a univariate polynomial $p_t(z) = f(u_t(z), v_t(z))$ of degree $(K-1)d$ at $\alpha_i$. Given $N$ such evaluation results at distinct $\alpha_i$'s, up to $\mu N$ of which might be arbitrarily erroneous, each node can recover the coefficients of $p_t$ following the procedure of decoding a Reed-Solomon code with dimension $(K-1)d+1$ and length $N$ (see, e.g., [41]). This decoding can be successful if and only if the number of errors is bounded as $\mu N \leq \frac{N - (K-1)d - 1}{2}$. After decoding, each node locally reconstructs the polynomial $p_t$, and evaluates it at $\omega_k$ to obtain $f(s_t^{(k)}, x_t^{(k)})$, for all $k$. We note that the reconstructed polynomials at all honest nodes are identical even when the malicious nodes deliberately send different computation results to different nodes (i.e., in the presence of equivocation).

Finally, each node sends the recovered output $\hat{y}_t^{(k)}$ to the intended client $c_t^{(k)}$, and updates its local storage to the new coded state $\tilde{s}_{t+1}^{(i)} = \sum_{k=1}^{K} s_{t+1}^{(k)} \ell_k(\alpha_i)$. In Table 2, we summarize the upper bounds on $\mu N$ for the system to achieve consensus on input commands, successful decoding, and secure delivery of output results for synchronous networks, among which the one for successful decoding is the most restrictive. Assuming a $\mu$ fraction of the nodes are malicious, i.e., $\mu N$ malicious nodes, this computing system can securely support up to $K = \lfloor \frac{(1-2\mu)N - 1}{d} \rfloor + 1$ state machines.

Partially synchronous network. In this case, since each of the $\mu N$ malicious nodes may refrain from sending any messages, the remaining honest nodes should start decoding upon receiving $(1-\mu)N$ computation results to ensure liveness. However, since a receiver node cannot distinguish between a missing message held by a faulty node and a delayed message sent by an honest node, a node may proceed to the decoding step with up to $\mu N$ of the received results being erroneous. In this case, each node needs to decode a Reed-Solomon code with dimension $(K-1)d+1$ and length $(1-\mu)N$, and successful decoding requires $\mu N \leq \frac{(1-\mu)N - (K-1)d - 1}{2}$. Assuming a $\mu$ fraction of the nodes are malicious, this system can securely support up to $K = \lfloor \frac{(1-3\mu)N - 1}{d} \rfloor + 1$ state machines. This completes the proof of Theorem 2.
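To make the decoding step concrete, here is a small self-contained sketch (our own illustration; a real deployment would use a fast Reed-Solomon decoder such as Berlekamp-Welch rather than this brute-force subset search, which is only practical for tiny parameters):

```python
# Brute-force noisy polynomial interpolation over GF(p): interpolate from
# every subset of deg + 1 points and keep a candidate matching at least
# N - e of the received evaluations. Unique whenever N >= deg + 1 + 2e.
from itertools import combinations

p = 97

def interp(points):
    """Lagrange-interpolate; return the polynomial as an evaluation oracle."""
    def eval_at(z):
        total = 0
        for k, (xk, yk) in enumerate(points):
            num, den = 1, 1
            for j, (xj, _) in enumerate(points):
                if j != k:
                    num = num * (z - xj) % p
                    den = den * (xk - xj) % p
            total = (total + yk * num * pow(den, -1, p)) % p
        return total
    return eval_at

def decode(alphas, results, deg, e):
    """Return a polynomial oracle consistent with >= N - e received results."""
    pts = list(zip(alphas, results))
    for subset in combinations(pts, deg + 1):
        cand = interp(list(subset))
        if sum(1 for a, r in pts if cand(a) == r) >= len(pts) - e:
            return cand
    raise ValueError("too many errors to decode")

# Degree-2 composite polynomial p_t(z) = 3z^2 + 2z + 1, N = 7 nodes, e = 2.
alphas = [1, 2, 3, 4, 5, 6, 7]
results = [(3 * a * a + 2 * a + 1) % p for a in alphas]
results[0], results[4] = 55, 13    # two malicious nodes report garbage
p_t = decode(alphas, results, deg=2, e=2)
print([p_t(w) for w in (0, 1)])    # evaluate at the omega points: [1, 6]
```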

6 Throughput Scaling via Verifiable Computing

As defined in Section 2, the throughput of CSM is inversely proportional to $C_{\mathrm{enc}} + C_{\mathrm{dec}} + C_{\mathrm{upd}}$, where $C_{\mathrm{enc}}$, $C_{\mathrm{dec}}$, and $C_{\mathrm{upd}}$ respectively represent the computational complexities of encoding and processing the input commands, decoding the state transition functions, and updating the coded states. In this section we describe a protocol that allows for throughput scaling by reducing this overall complexity to quasilinear in $N$. Our protocol comprises two main components. First, we describe INTERMIX, an efficient tool for information-theoretically verifiable matrix-vector multiplication, which is at the core of all our encoding and decoding operations. INTERMIX consists of a single worker that performs the computation, and a small randomly elected committee of auditors that are in charge of examining the worker's output. Once an honest auditor detects a fraud, he interactively enforces the worker to produce an inconsistent result that can be checked in constant time by every node in the network. Next, we show that by delegating the entire encoding and decoding operations to a single node in the network, the overall complexity can be reduced to quasilinear in $N$. Computing the state transition on the coded states and commands will still be done distributedly at each node, as before. We will show that the correctness of all the operations performed by this single worker can be checked by every node in the network, by using INTERMIX as a blackbox verification module. A block diagram of the proposed model is illustrated in Figure 4. We note that the correctness of the results in this section relies on the additional assumptions of a synchronous and broadcast (no equivocation) network.

Figure 4: A block-diagram of the proposed centralized computation model for throughput scaling. INTERMIX is used for verifying the correctness of the results.

6.1 INTERMIX: Efficiently Verifiable Matrix-Vector Multiplication

Suppose we have a network of $N$ nodes, a matrix $A \in \mathbb{F}^{N \times m}$, and a vector $x \in \mathbb{F}^m$. Node $i$ is interested in computing $A_i x$, where $A_i$ represents the $i$'th row of $A$. Our objective is to delegate the entire computation to only one node (the worker), while the remaining nodes verify the correctness of the results. Our approach is to randomly choose $a$ nodes to audit the worker, where $a$ is a constant large enough that the probability that none of the auditors is honest is negligible. Each auditor will repeat the computation of $Ax$ and compare it with the result returned by the worker. An (honest) auditor who detects a fraud in the computation will interactively enforce the worker to produce an inconsistent result which can be checked in constant time by the remaining nodes in the network. The algorithm has the following components.

Figure 5: Illustration of INTERMIX for verifiable matrix vector multiplication. An honest auditor can interactively enforce the worker to provide an inconsistent result that can be checked in constant time. We assume that the remaining nodes (the commoners) can overhear the entire conversation.

  • Random Committee/Leader Election: A worker and a committee of $a$ auditors are chosen randomly. Given that at most a $\mu$ fraction of the nodes in the network are dishonest, we choose $a$ auditors at random in order to ensure, with probability at least $1-\epsilon$, that at least one auditor is honest; since the probability that all $a$ auditors are dishonest is at most $\mu^a$, taking $a \geq \log \epsilon / \log \mu$ suffices. An easy way to do this is to allow each node to self-elect with probability $a/N$. If an adversary decides to conduct a DoS attack by imposing unnecessary audits, he will be banned from the next rounds of the algorithm.

    In a dynamic setting, where banning is ineffective, the local coin-toss algorithm can be replaced by any off-the-shelf distributed random number generating algorithm (such as [45, 39]). Given that we do not need fresh randomness at every round of computation, we only run such an algorithm occasionally to amortize its overhead. The committee can still be updated at every round by relying on a standard pseudorandom number generator.

    We can further hinder a dynamic adversary who wishes to corrupt the auditors after the election process, by keeping the identities of the auditors secret with the help of Verifiable Random Functions (VRF) [14, 35]. Accordingly, an auditor remains anonymous until he decides to conduct an audit, at which point he presents a proof of having been elected.

    Remark 5.

    It is important to note that the auditors are stateless, which makes it possible to change the committee at every round of computation, without imposing significant communication overhead on the network. To make a comparison, random sortition algorithms [25, 36] are more prone to dynamic adversaries, due to the prohibitively large communication overhead associated with frequently updating the allocation of the nodes to the state machines.

  • Computation (Auditors and the Worker): If the worker is honest, he computes $r = Ax$ and broadcasts this result to the network. If he is dishonest, he broadcasts an arbitrary $r \neq Ax$. Each auditor repeats the computation of $Ax$.

  • Interactive Localization of the Fraud (Auditors and the Worker): If an (honest) auditor does not detect a fraud in the computation, he will inform the rest of the network (henceforth referred to as the commoners) that the result is correct. Otherwise, he will aim at proving to the commoners that the result returned by the worker is wrong. He will accomplish this task through a set of interactive queries to the worker as follows.

    Note that if $r \neq Ax$, there must exist at least one index $i$ such that $r_i \neq A_i x$. If there are multiple such indices, the auditor will choose one arbitrarily, and will aim at proving to the commoners that $r_i \neq A_i x$, which is sufficient to prove that $r \neq Ax$. To fulfill this goal, the auditor first breaks $A_i$ into two consecutive chunks of equal size named $A_i^{(1)}$ and $A_i^{(2)}$. Similarly, the vector $x$ is broken into $x^{(1)}$ and $x^{(2)}$. The auditor asks the worker to compute $w_1 = A_i^{(1)} x^{(1)}$ as well as $w_2 = A_i^{(2)} x^{(2)}$. If $w_1 + w_2 \neq r_i$, the auditor raises an alert to draw the attention of the commoners to this fact. A commoner can simply check that $w_1 + w_2 \neq r_i$ and is convinced of the fraud. On the other hand, if $w_1 + w_2 = r_i$, we must have either $w_1 \neq A_i^{(1)} x^{(1)}$ or $w_2 \neq A_i^{(2)} x^{(2)}$. The auditor proceeds to compute both $A_i^{(1)} x^{(1)}$ and $A_i^{(2)} x^{(2)}$ in order to locate the fraud. Once this is done, the auditor can focus on the corresponding half of the vector and repeat the algorithm, until an inconsistency of constant size is detected. This algorithm is summarized below (a runnable sketch follows the end of this list).

    Input: $A$, $x$, and the worker's claimed result $r$.
    Output: One bit indicating whether the auditing process has succeeded or failed, plus a string that localizes the failure.
    if $r = Ax$ then return True
    Choose $i$ such that $r_i \neq A_i x$. Set $v \leftarrow r_i$, $a \leftarrow A_i$, $u \leftarrow x$, $\ell \leftarrow$ length($x$).
    while $\ell > 1$ do
        Split $a$ into equal halves $a^{(1)}, a^{(2)}$, and $u$ into $u^{(1)}, u^{(2)}$.
        Request $w_1 = a^{(1)} u^{(1)}$ and $w_2 = a^{(2)} u^{(2)}$ from the worker. Receive $w_1, w_2$.
        if $w_1 + w_2 \neq v$ then
            return False    (commoners check $w_1 + w_2 \neq v$ in constant time)
        else
            Choose $j \in \{1, 2\}$ such that $w_j \neq a^{(j)} u^{(j)}$.
            Set $a \leftarrow a^{(j)}$, $u \leftarrow u^{(j)}$, $v \leftarrow w_j$, $\ell \leftarrow \ell/2$.
        end if
    end while
    return False    (commoners check $v \neq a \cdot u$, a scalar product, in constant time)
    Algorithm 1: Algorithm run by an honest auditor
  • Verification (Commoners): If all the auditors acknowledge the correctness of the result, the commoners will accept the result provided by the worker. If an auditor returns False in Algorithm 1 within the loop, the commoners will check in constant time whether $w_1 + w_2 = v$. If this equality does not hold, the commoners will conclude that the result is incorrect. Finally, if the algorithm returns False in the last stage, the commoners will check in constant time whether $v = a \cdot u$ holds or not.
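The following runnable sketch (our own illustration over the integers; the paper works over a finite field, and all class and function names here are hypothetical) walks through the interactive localization against a worker that lies about one output coordinate:

```python
# Sketch of INTERMIX fraud localization: an honest auditor halves the
# suspect inner product until a constant-size inconsistency surfaces.

def inner(a, u):
    return sum(ai * ui for ai, ui in zip(a, u))

class DishonestWorker:
    """Corrupts one coordinate of Ax, then answers sub-queries truthfully.
    (Lying in the sub-queries only moves where the inconsistency appears.)"""
    def __init__(self, A, x):
        self.A, self.x = A, x
    def claimed_result(self):
        r = [inner(row, self.x) for row in self.A]
        r[1] += 5                      # the lie
        return r
    def query(self, a, u):
        return inner(a, u)

def audit(A, x, r, worker):
    """Honest auditor: accept, or reduce the fraud to an O(1) check."""
    truth = [inner(row, x) for row in A]
    if r == truth:
        return True, None
    i = next(i for i in range(len(r)) if r[i] != truth[i])
    a, u, v = list(A[i]), list(x), r[i]
    while len(u) > 1:
        h = len(u) // 2
        w1, w2 = worker.query(a[:h], u[:h]), worker.query(a[h:], u[h:])
        if w1 + w2 != v:               # commoners recheck this in O(1)
            return False, ("sum mismatch", w1, w2, v)
        if w1 != inner(a[:h], u[:h]):
            a, u, v = a[:h], u[:h], w1
        else:
            a, u, v = a[h:], u[h:], w2
    return False, ("scalar mismatch", a[0], u[0], v)  # check a*u != v in O(1)

A = [[1, 2, 3, 4], [5, 6, 7, 8]]
x = [1, 1, 1, 1]
worker = DishonestWorker(A, x)
print(audit(A, x, worker.claimed_result(), worker))
```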

Correctness. If the worker is honest, an honest auditor will observe that $r = Ax$ and acknowledge its correctness. If all the auditors return True, the commoners will accept the result as correct. The action space of a dishonest auditor is quite limited. Firstly, he can impose unnecessary audits on the honest worker despite observing that $r = Ax$. Furthermore, he can return False despite detecting no inconsistency in the responses to the audits. By doing so, he can merely worsen the complexity of the system: a commoner can verify in constant time that there is no inconsistency in the outputs of the worker and dismiss the malicious auditor's alert.

(Information-Theoretic) Soundness. Suppose the worker is dishonest, i.e., $r \neq Ax$. Remember that we chose $a$ large enough that the probability of having no honest auditor is at most $\epsilon$. Therefore, with probability at least $1-\epsilon$, there will be an honest auditor who will be able to localize the fraud of the worker following the interactive method described in Algorithm 1. Furthermore, relying on the assumptions of a broadcast and synchronous network, if a worker chooses not to respond to an auditor, the commoners can detect this violation of the protocol and decide that the worker is malicious. Note that the soundness of INTERMIX does not rely on any assumption on the computational power of the worker. In other words, INTERMIX is information-theoretically secure, since even a computationally unbounded worker is unable to compromise the soundness of the algorithm.

Overall Computational Complexity of the Verification Scheme. Let us compute the overall complexity under the worst-case assumption that all the auditors conduct queries to the worker. One auditor can increase the complexity of the worker and himself by $\sum_{j=1}^{\log m} 2\,C_{\mathrm{ip}}(m/2^j) = O(m)$, where $C_{\mathrm{ip}}(\ell)$ represents the complexity of computing the inner product of two vectors of length $\ell$, and is equal to $O(\ell)$. As a result, in the worst-case scenario where all the auditors conduct queries, the overall added complexity due to inner-product computation is $O(am)$. Each auditor also performs $O(\log m)$ comparisons between his locally computed results and the results returned by the worker.

An auditor can also increase the complexity of the commoners by returning False in Algorithm 1. In this case, a commoner will conduct one comparison operation to check if the auditor's alert is valid or not. In the worst-case scenario where all the auditors return False, the overall complexity of the commoners increases by $O(aN)$.

Therefore, the worst-case computational complexity of INTERMIX is $O((a+1)\,C_{Ax} + am + aN)$, where $C_{Ax}$ denotes the computational complexity of multiplying $A$ by $x$. We observe that as $N$ grows large, and unless the matrix $A$ has a very simple structure, the overall complexity of INTERMIX is dominated by the complexity of the centralized computation of $Ax$.

6.2 Centralization of Encoding/Decoding for Throughput Scaling

To reduce the coding complexity using the idea of verifiable computing, we must address a fundamental question: if all encoding/decoding operations are performed at a single node, can we achieve a per-node complexity that scales sub-linearly with $N$? To answer this question, let us look at all the coding operations in CSM, i.e., (i) encoding of input commands, (ii) decoding of output results/next states, and (iii) updating coded states. We present computation schemes that perform each of the above coding operations at a single node with complexity quasilinear in $N$, i.e., sub-linear complexity per node on average. We will use INTERMIX as a verification module throughout this section, wherever the need for trusted computation arises.

Encoding of input commands. As we saw in Section 5.2, for the encoding of the input commands, we pick $K$ distinct elements $\omega_1, \dots, \omega_K$, and $N$ distinct elements $\alpha_1, \dots, \alpha_N$. At each round $t$, we perform a Lagrange polynomial interpolation using the points $(\omega_k, x_t^{(k)})$ to construct a polynomial $v_t$ of degree $K-1$. For each node $i$, the coded command is generated as $\tilde{x}_t^{(i)} = v_t(\alpha_i)$. We consider computing the coded commands of all nodes at a single node, which we call the worker. The encoding process proceeds in two steps: 1) polynomial interpolation, 2) multi-point polynomial evaluation.

Step 1: Polynomial interpolation. Given the input commands $x_t^{(1)}, \dots, x_t^{(K)}$, the worker performs Lagrange interpolation to find the polynomial $v_t(z) = \sum_{j=0}^{K-1} c_j z^j$ of degree $K-1$ that passes through all the points $(\omega_k, x_t^{(k)})$, where $c_j$ is the coefficient for the term $z^j$. This operation can be done with a computational complexity (operation count) quasilinear in $K$ using fast polynomial arithmetic (see, e.g., [24]).

Step 2: Multi-point polynomial evaluation. Having obtained the coefficients $c_0, \dots, c_{K-1}$, the worker evaluates $v_t$ at the points $\alpha_1, \dots, \alpha_N$ to compute the coded inputs. Specifically, $\tilde{x}_t^{(i)} = v_t(\alpha_i) = \sum_{j=0}^{K-1} c_j \alpha_i^j$. This operation can be done using fast polynomial arithmetic with computational complexity quasilinear in $N$ (see, e.g., [24, 34]). Hence, the total computational complexity to encode the input commands is quasilinear in $N$. Define the matrix $G$ with entries $G_{i,j} = \alpha_i^j$, so that the vector of coded commands equals $Gc$, where $c = (c_0, \dots, c_{K-1})$. The committee of auditors will use this second equality, i.e., the fact that $(\tilde{x}_t^{(1)}, \dots, \tilde{x}_t^{(N)}) = Gc$, to interactively verify the correctness of the results with the INTERMIX algorithm.
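In matrix form (our own notation), the single product that INTERMIX is asked to verify is:

```latex
\begin{pmatrix} \tilde{x}_t^{(1)} \\ \tilde{x}_t^{(2)} \\ \vdots \\ \tilde{x}_t^{(N)} \end{pmatrix}
=
\underbrace{\begin{pmatrix}
1 & \alpha_1 & \alpha_1^2 & \cdots & \alpha_1^{K-1} \\
1 & \alpha_2 & \alpha_2^2 & \cdots & \alpha_2^{K-1} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & \alpha_N & \alpha_N^2 & \cdots & \alpha_N^{K-1}
\end{pmatrix}}_{G \,\in\, \mathbb{F}^{N \times K}}
\begin{pmatrix} c_0 \\ c_1 \\ \vdots \\ c_{K-1} \end{pmatrix}.
```

Each node $i$ is interested only in its own row $G_i$ of this product (its own coded command), which is exactly the per-row setting that INTERMIX is designed for.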

Updating coded states. At the end of round $t$, each node $i$ updates its local coded state to $\tilde{s}_{t+1}^{(i)} = u_{t+1}(\alpha_i)$. The centralized state update can be done similarly to encoding the input commands. Here, the worker interpolates the polynomial $u_{t+1}$, as defined in Section 5.1, and evaluates it at $\alpha_1, \dots, \alpha_N$. The auditors take advantage of the fact that the vector of updated coded states equals $G$ times the coefficient vector of $u_{t+1}$ to help the commoners verify the correctness of the results in constant time, via INTERMIX.

Decoding of the output results/new states. Consider a state transition function $f$ of degree $d$. The coded computation result at an honest node $i$ can be viewed as evaluating a polynomial $p_t$ of degree $(K-1)d$ at the point $\alpha_i$. That is, $p_t(\alpha_i) = \sum_{j=0}^{(K-1)d} q_j \alpha_i^j$, where $q_j$ is the coefficient of the term $z^j$ in $p_t$. After receiving the computation results from all $N$ nodes, up to a $\mu$ fraction of which are erroneous, a worker node decodes the coefficients of $p_t$ (say, using the Berlekamp-Welch algorithm).

Having decoded $p_t$, the worker node is required to broadcast these coefficients to the rest of the network, and then evaluates $p_t$ at the points $\omega_1, \dots, \omega_K$ to recover the output results and the next set of states. That is, the worker node computes

$$\big(f(s_t^{(1)}, x_t^{(1)}), \dots, f(s_t^{(K)}, x_t^{(K)})\big) = \big(p_t(\omega_1), \dots, p_t(\omega_K)\big). \qquad (8)$$

The computational complexity of this step is quasilinear in $K$ using fast multi-point evaluation.

Two steps are required to verify the correctness of the decoded results. Firstly, the worker needs to prove that his decoding of the coefficients $q_0, \dots, q_{(K-1)d}$ is correct. Secondly, he needs to prove that (8) is computed correctly. This second step can be directly accomplished via INTERMIX applied to the matrix of powers of the points $\omega_1, \dots, \omega_K$ and the vector of decoded coefficients. We now describe how INTERMIX can be used for verifiable polynomial interpolation in the presence of errors.

We know from the principles of error-correcting codes that the polynomial $p_t$ is the correct interpolation of the received points $(\alpha_i, r_i)$ if and only if there exists a set $H \subseteq \{1, \dots, N\}$ of size at least $(1-\mu)N$ such that $p_t(\alpha_i) = r_i$ for all $i \in H$. We will require the worker to broadcast this set together with the decoded coefficients $q_0, \dots, q_{(K-1)d}$. Let us without loss of generality assume that $H = \{1, \dots, |H|\}$. It must hold that

$$\begin{pmatrix} r_1 \\ \vdots \\ r_{|H|} \end{pmatrix} = \begin{pmatrix} 1 & \alpha_1 & \cdots & \alpha_1^{(K-1)d} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & \alpha_{|H|} & \cdots & \alpha_{|H|}^{(K-1)d} \end{pmatrix} \begin{pmatrix} q_0 \\ \vdots \\ q_{(K-1)d} \end{pmatrix}. \qquad (9)$$

Now, we can apply INTERMIX to verify the correctness of (9), and subsequently verify the correctness of the decoding operation.

6.3 Evaluating the Throughput

We are now ready to characterize the throughput of CSM using INTERMIX, and establish the final claim of Theorem 1. We saw in this section that the complexities of the encoding and decoding operations are reduced to quasilinear in $N$, given that the entire computation is delegated to only one node. Furthermore, each intermediate result $f(\tilde{s}_t^{(i)}, \tilde{x}_t^{(i)})$ can be computed locally in constant time, given that the polynomial $f$ has a fixed degree that does not grow with $N$. Therefore, the overall computational complexity of CSM is $\tilde{O}(N) + O(N C_f)$, where the first term indicates the complexity of the auditors and the worker, and the second term indicates the aggregate complexity of the remaining nodes. The throughput of CSM is computed as $\lambda = \Theta\big(N/(C_f + \mathrm{polylog}\,N)\big) = \tilde{\Theta}(N/C_f)$. This completes the proof of the last statement about throughput scaling in Theorem 1.

7 Discussions

In this section, we discuss engineering considerations and future research directions raised by CSM.

Blockchain Applications. One motivation for working on CSM is that (sharded) blockchain systems are best represented using the state machine formalism. While the current instantiations of blockchains are based on full replication, future proposals are most similar to partial replication [33], both of which make severe security-efficiency tradeoffs [44]. The results of this paper on coded state machines can be used as a stepping stone towards scalable and secure blockchain designs.

Random Allocation vs. CSM. An alternate architecture for scaling security and efficiency simultaneously is to randomly allocate nodes into groups that process distinct state machines. In this method, the fraction of adversaries in each group will be representative of the fraction of adversaries in the entire network. However, a dynamic adversary who observes this grouping can perform post-facto corruptions of the small number of nodes in a group, thus making security proportional to the group size. One possible solution is to rotate the group allocations periodically [25], but this rotation cannot be very frequent, since it requires each node to re-download the data corresponding to different state machines. In CSM we avoid this tradeoff completely, and full security is guaranteed against a dynamic adversary as well.

Verifiable Computing vs. INTERMIX. In order to scale throughput, there have been several existing proposals for verifiable computing [21, 7, 38, 6, 5], where one node does the computation and other nodes verify the computation in sub-linear time. These methods are only secure under cryptographic assumptions, are not yet practically scalable, and potentially require a trusted setup. By contrast, INTERMIX is information-theoretically secure, is practical, and does not require a trusted setup. However, compared with some other verifiable computation approaches, the interactive nature of INTERMIX is a disadvantage.

Degree Dependence. The proposed CSM is efficient only when the degree of the polynomial is a constant. As a future direction of research, we aim at generalizing the results to the scenario where the state machine can be represented as a low-depth arithmetic circuit. While low-depth circuits can still have high polynomial degrees, their algebraic structures can be potentially exploited to design efficient computation schemes.

Distinct State Machines. Our present approach in CSM assumes that the $K$ distinct state machines have the same state transition function and differ only in their sequences of inputs. A future direction of research will be to generalize CSM to the case of distinct state transition functions.

References

  • [1] Aguilera, M. K., Janakiraman, R., and Xu, L. Using erasure codes efficiently for storage in a distributed system. In 2005 International Conference on Dependable Systems and Networks (DSN’05) (2005), IEEE, pp. 336–345.
  • [2] Alsberg, P. A., and Day, J. D. A principle for resilient sharing of distributed resources. In Proceedings of the 2nd international conference on Software engineering (1976), IEEE Computer Society Press, pp. 562–570.
  • [3] Balasubramanian, B., and Garg, V. K. Fault tolerance in distributed systems using fused data structures. IEEE transactions on parallel and distributed systems 24, 4 (2013), 701–715.
  • [4] Balasubramanian, B., and Garg, V. K. Fault tolerance in distributed systems using fused state machines. Distributed computing 27, 4 (2014), 287–311.
  • [5] Ben-Sasson, E., Bentov, I., Horesh, Y., and Riabzev, M. Scalable, transparent, and post-quantum secure computational integrity. Cryptol. ePrint Arch., Tech. Rep 46 (2018), 2018.
  • [6] Ben-Sasson, E., Chiesa, A., Tromer, E., and Virza, M. Succinct non-interactive zero knowledge for a von neumann architecture. In USENIX Security Symposium (2014), pp. 781–796.
  • [7] Bitansky, N., Canetti, R., Chiesa, A., and Tromer, E. From extractable collision resistance to succinct non-interactive arguments of knowledge, and back again. In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference (2012), ACM, pp. 326–349.
  • [8] Cachin, C., and Tessaro, S. Optimal resilience for erasure-coded byzantine distributed storage. In International Conference on Dependable Systems and Networks (DSN’06) (2006), IEEE, pp. 115–124.
  • [9] Cadambe, V. R., Wang, Z., and Lynch, N. Information-theoretic lower bounds on the storage cost of shared memory emulation. In Proceedings of the 2016 ACM Symposium on Principles of Distributed Computing (2016), ACM, pp. 305–313.
  • [10] Castro, M., and Liskov, B. Practical byzantine fault tolerance and proactive recovery. ACM Transactions on Computer Systems (TOCS) 20, 4 (2002), 398–461.
  • [11] Castro, M., Liskov, B., et al. Practical byzantine fault tolerance. In OSDI (1999), vol. 99, pp. 173–186.
  • [12] Dimakis, A. G., Ramchandran, K., Wu, Y., and Suh, C. A survey on network codes for distributed storage. Proceedings of the IEEE 99, 3 (2011), 476–489.
  • [13] Dobre, D., Karame, G., Li, W., Majuntke, M., Suri, N., and Vukolić, M. Powerstore: proofs of writing for efficient and robust storage. In Proceedings of the 2013 ACM SIGSAC conference on Computer & communications security (2013), ACM, pp. 285–298.
  • [14] Dodis, Y., and Yampolskiy, A. A verifiable random function with short proofs and keys. In International Workshop on Public Key Cryptography (2005), Springer, pp. 416–431.
  • [15] Dolev, D., Dwork, C., and Stockmeyer, L. On the minimal synchronism needed for distributed consensus. Journal of the ACM (JACM) 34, 1 (1987), 77–97.
  • [16] Dutta, S., Cadambe, V., and Grover, P. Short-dot: Computing large linear transforms distributedly using coded short dot products. In NIPS (2016), pp. 2100–2108.
  • [17] Dwork, C., Lynch, N., and Stockmeyer, L. Consensus in the presence of partial synchrony. Journal of the ACM (JACM) 35, 2 (1988), 288–323.
  • [18] Elkhiyaoui, K., Önen, M., Azraoui, M., and Molva, R. Efficient techniques for publicly verifiable delegation of computation. In Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security (2016), ACM, pp. 119–128.
  • [19] Fiore, D., and Gennaro, R. Publicly verifiable delegation of large polynomials and matrix computations, with applications. In Proceedings of the 2012 ACM conference on Computer and communications security (2012), ACM, pp. 501–512.
  • [20] Garg, V. K. Implementing fault-tolerant services using state machines: Beyond replication. In International Symposium on Distributed Computing (2010), Springer, pp. 450–464.
  • [21] Gennaro, R., Gentry, C., and Parno, B. Non-interactive verifiable computing: Outsourcing computation to untrusted workers. In Annual Cryptology Conference (2010), Springer, pp. 465–482.
  • [22] Hartmanis, J. Algebraic Structure Theory of Sequential Machines (Prentice-Hall International Series in Applied Mathematics). Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1966.
  • [23] Hendricks, J., Ganger, G. R., and Reiter, M. K. Low-overhead byzantine fault-tolerant storage. In ACM SIGOPS Operating Systems Review (2007), vol. 41, ACM, pp. 73–86.
  • [24] Kedlaya, K. S., and Umans, C. Fast polynomial factorization and modular composition. SIAM Journal on Computing 40, 6 (2011), 1767–1802.
  • [25] Kokoris-Kogias, E., Jovanovic, P., Gasser, L., Gailly, N., and Ford, B. Omniledger: A secure, scale-out, decentralized ledger. IACR Cryptology ePrint Archive 2017 (2017), 406.
  • [26] Lamport, L. The implementation of reliable distributed multiprocess systems. Computer networks 2, 2 (1978), 95–114.
  • [27] Lamport, L. The part-time parliament. ACM Transactions on Computer Systems (TOCS) 16, 2 (1998), 133–169.
  • [28] Lamport, L., Shostak, R., and Pease, M. The byzantine generals problem. ACM Transactions on Programming Languages and Systems (TOPLAS) 4, 3 (1982), 382–401.
  • [29] Lee, D., and Yannakakis, M. Closed partition lattice and machine decomposition. IEEE Transactions on Computers 51, 2 (2002), 216–228.
  • [30] Lee, K., Lam, M., Pedarsani, R., Papailiopoulos, D., and Ramchandran, K. Speeding up distributed machine learning using codes. IEEE Transactions on Information Theory 64, 3 (2018), 1514–1529.
  • [31] Li, S., Maddah-Ali, M. A., and Avestimehr, A. S. A unified coding framework for distributed computing with straggling servers. IEEE Workshop on Network Coding and Applications (Sept. 2016).
  • [32] Li, S., Maddah-Ali, M. A., Yu, Q., and Avestimehr, A. S. A fundamental tradeoff between computation and communication in distributed computing. IEEE Transactions on Information Theory 64, 1 (Jan. 2018).
  • [33] Luu, L., Narayanan, V., Zheng, C., Baweja, K., Gilbert, S., and Saxena, P. A secure sharding protocol for open blockchains. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security (2016), ACM, pp. 17–30.
  • [34] MacWilliams, F. J., and Sloane, N. J. A. The theory of error-correcting codes. Elsevier, 1977.
  • [35] Micali, S., Rabin, M., and Vadhan, S. Verifiable random functions. In 40th Annual Symposium on Foundations of Computer Science (Cat. No. 99CB37039) (1999), IEEE, pp. 120–130.
  • [36] Naor, M., and Wieder, U. A simple fault tolerant distributed hash table. In International Workshop on Peer-to-Peer Systems (2003), Springer, pp. 88–97.
  • [37] Oki, B. M., and Liskov, B. H. Viewstamped replication: A new primary copy method to support highly-available distributed systems. In Proceedings of the seventh annual ACM Symposium on Principles of distributed computing (1988), ACM, pp. 8–17.
  • [38] Parno, B., Howell, J., Gentry, C., and Raykova, M. Pinocchio: Nearly practical verifiable computation. Communications of the ACM 59, 2 (2016), 103–112.
  • [39] Popov, S. On a decentralized trustless pseudo-random number generation algorithm. Journal of Mathematical Cryptology 11, 1 (2017), 37–43.
  • [40] Rashmi, K. V., Shah, N. B., and Kumar, P. V. Optimal exact-regenerating codes for distributed storage at the msr and mbr points via a product-matrix construction. IEEE Transactions on Information Theory 57, 8 (2011), 5227–5239.
  • [41] Roth, R. Introduction to coding theory. Cambridge University Press, 2006.
  • [42] Sahraei, S., and Avestimehr, A. S. Interpol: Information theoretically verifiable polynomial evaluation. arXiv preprint arXiv:1901.03379 (2019).
  • [43] Schneider, F. B. Implementing fault-tolerant services using the state machine approach: A tutorial. ACM Computing Surveys (CSUR) 22, 4 (1990), 299–319.
  • [44] Skidanov, A. Unsolved problems in sharding. https://medium.com/nearprotocol/unsolved-problems-in-blockchain-sharding-2327d6517f43.
  • [45] Syta, E., Jovanovic, P., Kogias, E. K., Gailly, N., Gasser, L., Khoffi, I., Fischer, M. J., and Ford, B. Scalable bias-resistant distributed randomness. In IEEE Symposium on Security and Privacy (SP) (2017), IEEE, pp. 444–460.
  • [46] Tandon, R., Lei, Q., Dimakis, A. G., and Karampatziakis, N. Gradient coding: Avoiding stragglers in distributed learning. In Proceedings of the 34th International Conference on Machine Learning (Aug. 2017), pp. 3368–3376.
  • [47] Wang, Z., and Cadambe, V. R. Multi-version coding—an information-theoretic perspective of consistent distributed storage. IEEE Transactions on Information Theory 64, 6 (2018), 4540–4561.
  • [48] Yu, Q., Li, S., Raviv, N., Kalan, S. M. M., Soltanolkotabi, M., and Avestimehr, A. S. Lagrange coded computing: Optimal design for resiliency, security, and privacy. In NIPS Systems for ML Workshop (2018).
  • [49] Yu, Q., Maddah-Ali, M. A., and Avestimehr, A. S. Polynomial codes: an optimal design for high-dimensional coded matrix multiplication. In NIPS (2017), pp. 4406–4416.
  • [50] Zhang, X., Jiang, T., Li, K.-C., Castiglione, A., and Chen, X. New publicly verifiable computation for batch matrix multiplication. Information Sciences (2017).
  • [51] Zhang, Y., and Blanton, M. Efficient secure and verifiable outsourcing of matrix multiplications. In International Conference on Information Security (2014), Springer, pp. 158–178.
  • [52] Zou, Y. M. Representing boolean functions using polynomials: more can offer less. In International Symposium on Neural Networks (2011), Springer, pp. 290–296.

Appendix A Field Extension for General Boolean Functions

Using the construction of [52, Theorem 2], we can represent any arbitrary Boolean function $f$ whose inputs are $n$ binary variables as a multivariate polynomial of degree at most $n$ as follows. For each vector $v = (v_1, \dots, v_n) \in \{0,1\}^n$, we define the indicator polynomial $m_v(x) = \prod_{i=1}^{n} \sigma_i(x_i)$, where $\sigma_i(x_i) = x_i$ if $v_i = 1$, and $\sigma_i(x_i) = 1 - x_i$ if $v_i = 0$. Next, we partition $\{0,1\}^n$ into two disjoint subsets $A_0$ and $A_1$ as follows.

$$A_0 = \{v \in \{0,1\}^n : f(v) = 0\}, \qquad A_1 = \{v \in \{0,1\}^n : f(v) = 1\}. \qquad (10)$$
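As an illustration, the following sketch assumes the representation sums the indicator $m_v$ over $v \in A_1$, so that $f(x) = \sum_{v \in A_1} m_v(x)$ for every Boolean input $x$ (our own rendering; see [52] for the exact construction and the term-count optimization):

```python
# Represent a Boolean function as a polynomial by summing the indicator
# m_v over all v with f(v) = 1; evaluating on Boolean inputs reproduces f.
from itertools import product

def m_v(v, x):
    """Indicator polynomial: 1 iff x == v on Boolean inputs, else 0."""
    out = 1
    for vi, xi in zip(v, x):
        out *= xi if vi == 1 else (1 - xi)
    return out

def poly_from_truth_table(f, n):
    """Return the polynomial representation of f as a callable."""
    A1 = [v for v in product([0, 1], repeat=n) if f(*v) == 1]
    return lambda *x: sum(m_v(v, x) for v in A1)

maj = lambda a, b, c: int(a + b + c >= 2)   # 3-input majority function
poly = poly_from_truth_table(maj, 3)
assert all(poly(*v) == maj(*v) for v in product([0, 1], repeat=3))
print(poly(1, 0, 1))                         # 1
```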