Wendy, the Good Little Fairness Widget

07/16/2020 ∙ by Klaus Kursawe, et al.

The advent of decentralized trading markets introduces a number of new challenges for consensus protocols. In addition to the `usual' attacks – a subset of the validators trying to prevent agreement – there is now the possibility of financial fraud, which can abuse properties not normally considered critical in consensus protocols. We investigate the issues of attackers manipulating or exploiting the order in which transactions are scheduled in the blockchain. More concretely, we look into relative order fairness, i.e., ways we can assure that the relative order of transactions is fair. We show that one of the more intuitive definitions of fairness is impossible to achieve. We then present Wendy, a group of low overhead protocols that can implement different concepts of fairness. Wendy acts as an additional widget for an existing blockchain, and is largely agnostic to the underlying blockchain and its security assumptions. Furthermore, it is possible to apply the protocol only for a subset of the transactions, and thus run several independent fair markets on the same chain.


1 Introduction

The advent of decentralized trading markets introduces a number of new challenges for consensus protocols [12, 13]. Classically, consensus layer protocols are only required to maintain consistency of the blockchain. While additional requirements have been investigated in the past – for example causal order or censorship resilience – very little attention has been given to the fairness of the order of events, making it possible to execute frontrunning or rushing attacks. While some blockchains attempt to make such attacks harder, for example by using a randomized leader election protocol, others can be easily manipulated by a single corrupt validator or a well targeted denial of service attack. In addition to allowing questionable behavior, this can also be a potential regulatory issue, if exchanges are required to prevent some forms of fraud.

In this paper, we investigate the issues of attackers manipulating or exploiting the order in which transactions are scheduled in the blockchain. More concretely, we look into relative order fairness, i.e., ways we can assure that the relative order of transactions is fair. We show that one of the more intuitive definitions of fairness is impossible to achieve, and present several alternatives.

Our approach integrates with existing blockchains without any change or non-standard assumption on the blockchain implementation – the only requirement is that there is some known set of parties (resp. validators) through which fairness is defined. This allows us to combine several variations of fairness with different blockchains, have different degrees of fairness for sets of transactions running on the same blockchain, and even change the configuration on the fly without needing to break the chain. This setup can also come in especially handy if one wants to formally verify the protocols – it is vital here to have small, independently verifiable components and to not need to formally verify dozens of variations of the same protocol (a glimpse at the difficulty of formally verifying consensus protocols can be found in [21]).

2 Model and Architecture

We assume a two-fold model. For one, there is an underlying blockchain that takes blocks as an input and produces a distributed ledger of blocks. For the purpose of the fairness widget, we do not require any assumptions on the blockchain regarding participants, timing, or finality. What we do require is that the blockchain has some form of validity function that evaluates if a block is valid, and that can include the validity conditions for the fairness widget. We also assume that all validators that can propose new blocks for the blockchain that include fairness-relevant transactions are known and can receive a broadcast from the validators participating in the pre-protocol.

There is no requirement that all blocks in the blockchain are subjected to relative fairness. If, for example, Ethereum was the underlying blockchain, the fairness protocol could be required for all transactions touching some specific smart contract, without putting any requirements onto other transactions. To this end, all transactions that require fairness contain a fairness label, and only transactions with the same label need to be fair with respect to each other. Similarly, not all validators need to participate – it is possible that only a subset of the validators propose fairness relevant blocks, though this would slow down the fair transactions.

The fairness pre-protocol itself requires a stricter model. Our model extends the system model and definitions of Cachin, Kursawe, Petzold, and Shoup [8]. Thus, we assume that the number of byzantine corrupted parties is less than a third of all parties (i.e., t < n/3), though we will also show how to expand the approach to a more flexible model [19]. These parties could be a subset of the validators of the underlying blockchain, or a completely independent set of parties. We work in the fully asynchronous model, i.e., we assume that an attacker has complete control over the time and order of message delivery, but is not allowed to completely drop a message. Furthermore, we assume that messages are authenticated, and that all participants can sign messages as well as verify each other’s signatures. In addition to the classical byzantine nodes, we also assume rogue traders might try to game the system to get an unfair advantage, especially to get ahead in performing a transaction. These traders can collaborate with any number of other traders as well as with up to a third of the validators; in fact, formally we assume that all traders are under the control of the adversary.
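To make the quorum arithmetic used throughout the paper concrete, the following small Python sketch (ours, not part of the protocol specification) computes the usual thresholds for the t < n/3 committee model: the quorum size n - t a party can safely wait for, and the size t + 1 of any set that is guaranteed to contain at least one honest party.

    # Illustrative sketch only: threshold arithmetic for the t < n/3 committee model.
    def thresholds(n: int) -> dict:
        t = (n - 1) // 3              # largest number of byzantine parties tolerated
        return {
            "n": n,
            "t": t,
            "quorum": n - t,          # votes a party can safely wait for
            "honest_witness": t + 1,  # any t + 1 reports include at least one honest party
        }

    print(thresholds(4))  # {'n': 4, 't': 1, 'quorum': 3, 'honest_witness': 2}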

The validators receive external requests from the traders. We make no assumption on the timing or the order, which is under complete control of the adversary. The blockchain protocol then delivers the requests, i.e., puts them into a block while satisfying the basic properties of atomic broadcast. In practice, to optimize bandwidth, the protocol would likely not use the requests themselves, but hashes thereof. For the sake of presentation, we will use the term request even when a hash would be sufficient. Messages are sent by a simple multicast with no requirements on consistency or safety. While there might be some room for optimization if intelligent gossiping protocols are used, our only requirement for the communication layer is that messages between honest parties eventually arrive. An alternative model in the literature is the GST model going back to Dwork, Lynch, and Stockmeyer [15], which in some interpretations does allow for some message loss. In this model, the adversary is allowed to arbitrarily delay or drop messages until a time called the global stabilization time, after which she needs to deliver all messages within a known timeout. In this model, protocols essentially try to not violate safety before GST, and then assure liveness after. While we don’t model our protocols in this setting, they work in it just as well as long as lost messages are resent.

As mentioned above, the goal of our design is not to build a new blockchain that includes fairness, but to build a module that can be added to existing blockchain designs. To this end, we provide a pre-protocol that is run by the validators in parallel to the actual blockchain. The pre-protocol outputs valid blocks that assure relative order fairness. While these blocks can be generated by every validator, in most consensus implementations, blocks are proposed by only one or very few parties. To this end, we define a set of designated leader(s) which execute the part of the protocol that generates blocks. The leader part does not involve any communication though, and thus could be executed by every participant without additional communication effort. In addition, we need to modify the block-validity function – proposed blocks are not valid unless it is also verified that the fairness conditions have been satisfied. To be able to use more established formal definitions, we assume that our protocol communicates with an atomic broadcast subprotocol; for all practical purposes, this is equivalent to a blockchain in our context. We make no assumptions on how the underlying atomic broadcast protocol is implemented, and what – if any – timing assumptions it uses. In fact, our pre-protocol can work in a completely different model than the underlying blockchain – while our model has a voting/quorum based approach in mind, the blocks generated by the fairness pre-protocol can as well be processed by a Nakamoto style implementation such as Ethereum or Ouroboros, not unlike the approach that Casper is taking to add finality [7]. We do, however, assume that the participants in the fairness protocol know and recognize each other. While it thus would be logical to assume the same for the blockchain protocol, this is not strictly necessary – it is possible to use the fairness protocols presented here to add relative order fairness to (some) bitcoin transactions, as long as it is possible to enforce our new validity condition for that chain and assure that the underlying blockchain only accepts the blocks we generate in the order we generated them.

As we envision a blockchain that handles a diversity of transactions, relative order fairness only needs to be assured for subsets – it is not necessarily required that a request related to a technology stock market is treated fairly with respect to a request related to crop prices in Australia. Thus, every transaction has a market-identifier, and only transactions that have the same market-identifier need to be fair with respect to each other. As we provide different fairness models, it is also possible to use different fairness pre-protocols for different markets. There is even a possibility that a single request has several market identifiers and thus is delivered in a relatively fair way with respect to several, otherwise independent markets. The main issue with this model is that it adds quite some complexity if we want to have different fairness protocols for different markets. While there is no fundamental issue with this, we do not include this property for our protocols in this paper for the sake of (relative) simplicity.

2.1 Related Work

The only work we are aware of that looks at relative fairness is parallel work by Kelkar, Zhang, Goldfeder, and Juels [17]. They also identify the impossibility of strict fairness and resort to block fairness. While our approach is to weaken the fairness condition to circumvent the impossibility of block fairness, they define a concept of weak liveness while maintaining the stronger fairness condition, and define a set of protocols (both synchronous and asynchronous) that provide block-order fairness. The price for the stronger fairness is that there is no limit on when requests are delivered or how big a block becomes, though their protocols could easily be adapted to one of our models. Their approach also differs in the architecture – while we aim to have a module to be combined with existing atomic broadcast protocols, their work presents a full consensus protocol.

The concept of causality in state machine replication was first introduced by Birman and Reiter [24], with the example of preventing stock trading fraud. The definition was later refined by Cachin, Kursawe, Petzold, and Shoup [8], and again by Duan, Reiter, and Zhang [14]. While the details in the definitions do matter for meaningful proofs and avoiding less straightforward attacks, the basic idea of these definitions is the same; a message is processed by the protocols in a way that its position in the ordering is fixed before any participant learns of its content. While this is sufficient to prevent some financial fraud – especially if we also allow the sender of a request to remain anonymous until the transaction is scheduled – the protection offered by commit and reveal is not sufficient. Especially in cases of high volatility, traders can still get an advantage if they can schedule transactions faster than their peers.

The notion of fairness has been used in different contexts in the literature. In the context of block delivery, the concept was formally introduced in [8], though some extent of fairness is already provided by earlier protocols such as Castro and Liskov's BFT protocol [11]. In this definition, a protocol is essentially fair if the time between honest parties being aware of a request and that request being delivered is bounded. This concept is somewhat similar (and sometimes used as a synonym) to censorship resilience [23], though that term as well has now taken on a multitude of meanings in the literature, and usually does not rule out an unfair delay in delivering a request. In terms of relative order fairness, fair protocols at least give an upper bound on the level of unfairness – while it is possible that requests are processed in a different order than they arrived, the number of requests that can rush ahead of a particular request is limited. In [22], a different fairness definition is used – here, fairness requires that all validators get an equal opportunity to get their transactions into the blockchain. This is a different model than we assume, as we want to achieve fairness for transactions coming from external participants, while this protocol assures fairness between the validators. There is some relation though, as fairness between validators assures that the dishonest validators cannot dominate the blockchain, and thus requests seen by all honest validators are processed somewhat fast.

The proof-of-work model has a different approach to fairness. Essentially, if the majority of miners are honest, and the number of transactions is smaller than the maximum the network can handle, the probability that some winning miner will process a given transaction soon is relatively high (though there is no strict upper bound). This effect is diluted by an economic argument though – if (as is the case in Ethereum and Bitcoin) it is possible to pay miners for preferred treatment, the delay until a particular request is delivered can become fairly high. In terms of relative fairness, this feature makes these blockchains unfair by design – it is explicitly built in that clients who pay more can get preferred treatment.

Some of the more recent protocols [6, 1] frequently exchange the leader even in the absence of observable misbehavior. This makes it harder for an attacker to impose controlled unfairness, as it is harder to assure that a corrupted validator is in charge of scheduling when the adversary needs it, though it might be possible to remove an honest leader with a limited denial of service attack. An additional countermeasure is to choose the next leader randomly, removing another level of control from the adversary. Fully randomized protocols [8, 23] also make it harder for an attacker to control the level of unfairness. Nevertheless, an attacker can still cause unfairness to a large extent, and – while the unfairness is harder to control – the protocols are not necessarily relatively fair, i.e., they do not preserve the order in which requests arrive when delivering them.

3 Relative Fairness

The term fairness has found numerous definitions in the atomic broadcast and blockchain literature. The most relevant definition of (absolute) fairness for our context requires one of the following:

  • every request eventually gets scheduled

  • every request gets scheduled within a bounded time or number of implementation related messages

Additional constraints depend on the model used, e.g., requests only need to be scheduled within a bounded time after GST (Global Stabilisation Time) [15].

For many consensus protocols, satisfying this definition of fairness does not come naturally. Especially for leader-based protocols, a leader can easily suppress a message. There are a number of countermeasures against this. In [11, 20], replicas watch a leader and dispose of it if it is dishonest; other protocols [1, 2] change the leader frequently, in the hope that an honest leader will eventually handle all outstanding requests. With the exception of [20], no protocol can give strong bounds on when a message is actually scheduled – the time until a message gets scheduled depends on the accuracy of the timing assumptions (or the arrival of GST) and is thus dependent on an out-of-protocol factor. Leaderless protocols [8, 23] tend to have better implicit fairness protection; while they tend to be a little slower than leader based ones (at least in a well-behaved network), the decreased effort to assure fairness can give those protocols an edge in a trading blockchain.

As we are anyhow sorting transactions into blocks (this comes rather naturally for a blockchain), we define fairness on the level of blocks, though it is possible to use logical blocks that encompass several blockchain blocks. In addition to relative fairness, this also assures fairness as defined above. An unoptimized pre-protocol that each party would follow (based on a leader-based atomic broadcast protocol for simplicity) is sketched after the following definition.

Definition 1 (Block Fairness).

After a request has been seen by all honest parties, it will be scheduled in the next block; if it has not been seen by at least one honest party, it will not be scheduled in the next block.

This is relatively easy to implement – before the ordering protocol starts, every validator sends around a list of all requests they have seen; a valid proposal for a block then consists of all transactions that appear in at least t + 1 out of n - t of these lists.
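As an illustration, the following sketch (our own, with the t + 1 and n - t thresholds as reconstructed above) aggregates the validators' lists into such a block-fair proposal:

    # Sketch: build a block-fair proposal from n - t validator reports.
    # Each report is the set of request ids one validator claims to have seen.
    from typing import Iterable, Set

    def block_fair_proposal(reports: Iterable[Set[str]], t: int) -> Set[str]:
        counts = {}
        for report in reports:
            for request in report:
                counts[request] = counts.get(request, 0) + 1
        # a request backed by t + 1 reports was seen by at least one honest party
        return {r for r, c in counts.items() if c >= t + 1}

    # n = 4, t = 1: collect n - t = 3 reports; only "r1" reaches t + 1 = 2 reports.
    reports = [{"r1", "r2"}, {"r1"}, {"r1", "r3"}]
    assert block_fair_proposal(reports, t=1) == {"r1"}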

In the setting we envision for our blockchain, even this stronger definition of fairness is insufficient. In addition to the requirements of absolute fairness, we also want relative fairness, which better captures the intuitive meaning of the word – if one request is sent before another request, it would be fair if it is also scheduled first.

Definition 2 (Relative Fairness).

A byzantine fault tolerant total ordering protocol is called relatively fair if the following holds: If all honest parties receive request r before request r', then r is delivered before r'.

Unfortunately, we can show that this definition of fairness is not only impossible, but inherently contradictory even if only one party is corrupt.

Proof (sketch). Suppose we have parties p_1, …, p_n, and requests r_1, …, r_n. Then let p_i get the requests in the order r_i, r_{i+1}, …, r_n, r_1, …, r_{i-1}. Now for every i, the only party that sees r_i before r_{i-1} is party p_i; all other parties see r_{i-1} before r_i.

If all parties are honest, then there is no mandated message order – no two requests will have been seen in the same order by all honest parties. However, if party p_i is dishonest, then r_i must be scheduled after r_{i-1}, as p_i is the only party to see r_i before r_{i-1} (all honest parties then saw r_{i-1} first).

As the honest parties following the protocol do not know who is dishonest, the outcome of the ordering protocol must be correct independently of which party is dishonest. Thus, for every i, r_{i-1} must be scheduled before r_i, which closes a cycle r_1, r_2, …, r_n, r_1 and is a contradiction.
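The rotated schedules used in this proof sketch can be spelled out programmatically; the following small check (ours, based on the reconstruction above) confirms that for every i, party p_i is indeed the only one that sees r_i before r_{i-1}, so the constraints induced by "p_i might be the corrupt one" form a cycle.

    # Sketch of the construction in the proof: party p_i receives the requests
    # in the rotated order r_i, r_{i+1}, ..., r_n, r_1, ..., r_{i-1}.
    def rotated_schedules(n: int):
        return {i: [(i - 1 + k) % n + 1 for k in range(n)] for i in range(1, n + 1)}

    def sees_before(order, a, b):
        return order.index(a) < order.index(b)

    n = 4
    sched = rotated_schedules(n)
    for i in range(1, n + 1):
        prev = n if i == 1 else i - 1
        # p_i is the only party that sees r_i before r_{i-1} ...
        witnesses = [p for p, order in sched.items() if sees_before(order, i, prev)]
        assert witnesses == [i]
    # ... so if p_i is faulty, r_{i-1} must be delivered before r_i for every i,
    # which is a cyclic (and thus unsatisfiable) set of ordering constraints.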

One way out would be to only require r and r' to be in the same block. However, even that might not be possible, and there is another weakness in this definition: the corrupt parties might see r long before any honest party would see it, and thus our protocol essentially cannot schedule anything seen by corrupt parties only; it seems hardly fair if validators cannot get a message scheduled that every client can schedule. We leave it to further work to find further definitions for relative fairness that are efficiently achievable and might serve some use cases better.

Definition 3 (Relative Fairness, second attempt).

A byzantine fault tolerant total ordering protocol is called relatively fair if the following holds: If all honest parties receive request r before request r', then r is delivered in the same block as r' or earlier.

Unfortunately, we can show that this is also impossible:

Proof (sketch). In the above proof, we have shown that there exists a schedule in which the required order of messages depends on which party is faulty, thus requiring the protocol to take into account a parameter that is not known to an honest party. In this proof, we build on that construction to design a schedule that would create a block of unlimited size.

For this outline, we assume n = 4 and t = 1. Consider two schedules as used above, i.e., schedule A:

p_1: a_1, a_2, a_3, a_4
p_2: a_2, a_3, a_4, a_1
p_3: a_3, a_4, a_1, a_2
p_4: a_4, a_1, a_2, a_3

and schedule B:

p_1: b_1, b_2, b_3, b_4
p_2: b_2, b_3, b_4, b_1
p_3: b_3, b_4, b_1, b_2
p_4: b_4, b_1, b_2, b_3

Both schedules are split into three segments, A_1, A_2, A_3 and B_1, B_2, B_3, respectively.

We now link those two schedules to one combined schedule with the segment order A_1, B_1, A_2, B_2, A_3, B_3.

By the design of schedules A and B, to achieve fairness, a_1, a_2, a_3, and a_4 must be in the same block. The same holds for b_1, b_2, b_3, and b_4. The argument for this is equivalent to the previous proof; as it is not known to the honest parties who is honest and who is not, the requirement could imply that a_4 has been seen by all honest parties before a_1 (if p_1 is corrupt), a_1 before a_2 (if p_2 is corrupt), a_2 before a_3, and a_3 before a_4. Thus, all those messages need to be scheduled in the same block.

In the combined schedule, we also have all honest parties see some a-request before some b-request. Thus, that a-request must be scheduled in the same or an earlier block than the b-request. Similarly, some b-request needs to be in the same or an earlier block than some a-request. As the a-requests, and respectively the b-requests, all must be in the same block, this means all messages have to be scheduled in the same block.

If we combine the segments the other way around, i.e., in the order B_1, A_1, B_2, A_2, B_3, A_3, we get the same result: some b-request is seen by all parties before some a-request, and some a-request is seen by all parties before some b-request, meaning that still the messages of both schedules need to be in the same block.

We can now repeat this construction. Suppose we have a third schedule C, split into segments C_1, C_2, C_3 in the same structure as schedule A, and a fourth schedule D, split into D_1, D_2, D_3 in the same structure as schedule B. Then consider the schedule

A_1, B_1, C_1, A_2, B_2, C_2, A_3, B_3, C_3.

By the above argument, all messages in A and B need to be in the same block; the addition of the messages from C does not affect that argument. Similarly, all messages in B and C need to be in the same block; this is unaffected by A. In the same way, we can add D in a way that it needs to be in the same block as C:

A_1, B_1, C_1, D_1, A_2, B_2, C_2, D_2, A_3, B_3, C_3, D_3.

This construction can be arbitrarily repeated, leading to an infinite sequence of messages that all need to be in the same block.  
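To make the chaining step concrete, the following sketch (ours; the exact segmentation of the schedules is an assumption, since the original figure is not reproduced here) generates the interleaved segment orders used above. Any two consecutive schedules X and Y appear in the pattern X_1, Y_1, X_2, Y_2, X_3, Y_3, which is what forces their requests into the same block.

    # Sketch: interleave k schedules, each split into three segments, so that any
    # two consecutive schedules X, Y appear as X1, Y1, X2, Y2, X3, Y3.
    def interleaved_segments(num_schedules: int, segments: int = 3):
        order = []
        for seg in range(1, segments + 1):
            for sched in range(num_schedules):
                order.append(chr(ord("A") + sched) + str(seg))
        return order

    print(interleaved_segments(2))  # ['A1', 'B1', 'A2', 'B2', 'A3', 'B3']
    print(interleaved_segments(4))  # adding C and D keeps every adjacent pair interleaved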

A notable property of our result is that we do not need a corrupted party to actually act in any bad way – it is enough that there is some party that has the label 'corrupt', and no one knows which one it is. While we have not worked out the proof, the impossibility likely persists even if we only require fairness in runs where no party actually is corrupt. To assure liveness in an asynchronous system, the protocol still needs to progress on partial inputs, which means it misses some information that might be relevant to define a valid order. We did not investigate this further at this point, as we prefer to have a protocol that offers somewhat weaker fairness, but maintains robustness in the face of a byzantine adversary.

There are subtle differences in the underlying model that impact what the construction actually means. In some models – essentially the cryptographically sound ones that assume a polynomially bounded adversary [8, 17] – one assumes that the number of incoming (and adversary generated) requests is somehow bounded, i.e., at some point the protocol terminates for good. In this model, our construction does not strictly violate liveness – what happens is that, to satisfy fairness, all requests will be delivered in the one and only block the protocol ever schedules just prior to termination. For those models, we do not prove impossibility of relative block fairness, but impossibility of any meaningful efficiency guarantees – in the worst case, relative fairness is reached by treating all parties equally badly. If we assume a model that allows for infinite protocol runs, this last point in time does not exist, and the protocol cannot guarantee to deliver anything.

The other interesting modeling aspect is the amount of asynchrony required. In the schedule above, every request is seen by the last honest party after only a bounded number of other requests have been delivered in between; no message needs to be withheld for an arbitrarily long time. This implies that we do not need a fully asynchronous system. For a consensus between n parties, if δ is the time interval between the first honest party becoming aware of a request and the last honest party doing so, then the adversary only needs to show the honest parties a bounded number of other requests during δ. Thus, our construction is also possible in most synchronous systems, as long as the adversary can generate or access sufficiently many requests in the given time-span and has the power to freely determine a schedule in which an honest party sees any set of consecutive requests.

Thus, if we bound the number of requests the adversary is allowed to show to honest parties in between the times when the first honest party saw a particular request and the last honest party saw it, the impossibility result still holds.

Theorem 1.

There exists a schedule such that, to achieve relative block fairness, all requests any honest party has ever seen need to be scheduled in the same block. Consequently, no block can be delivered with this schedule while new requests can be generated.

Furthermore, once an honest party has seen a request r, the schedule requires only a bounded number of other requests to be processed until the last honest party sees r. Thus, an infinite schedule can also be generated in a partially synchronous model.

4 Circumventing the impossibility

We first show a protocol that can guarantee fairness, but does not overcome the liveness issues mentioned above, i.e., it is possible for an adversary to prevent termination. For ease of description, we describe a somewhat wasteful version of the protocol which re-submits all requests that did not make it into a block to the next block; in a real implementation, this would be handled in a more efficient way. Also, the protocol as described sends a lot of signatures repeatedly; that, too, can be optimized in an implementation.

We describe our protocol as a pre-protocol to the atomic broadcast. The pre-protocol generates a proposal for a block that can then be proposed as the next block for the atomic broadcast protocol, alongside validation information that allows verifying that the block was properly generated. To this end, we assume an atomic broadcast protocol following the definition of [8]. In addition, we need an external validity property, i.e., there is a validation function such that an honest party only accepts an output with added validation information if the validation function holds. The leader role can be taken by one party, or by every party intending to construct a valid proposal. For simplicity, we also assume that the protocol is re-invoked upon termination by the atomic broadcast protocol, and that the framework assures that messages linked to undelivered requests are replayed to the next incarnation of the pre-protocol in the same order, while messages linked to delivered requests are ignored. The reason to structure the protocol this way (rather than having an infinite loop that invokes the atomic broadcast protocol and takes care of messages itself) lies in the modular architecture we want to allow – the fairness pre-protocol is an optional add-on to the atomic broadcast, and thus should be a pre-protocol invoked by the atomic broadcast rather than the other way around, and it must be possible for one atomic broadcast protocol to use different pre-protocols for different markets.

One issue with this approach is that fairness in the traditional sense – if every instance of the pre-protocol terminates, then every request that is seen by all honest parties is also delivered (preferably within a bounded time) in some block – is no longer a property of the pre-protocol, but of the combination. This can, however, easily be derived from relative fairness if we show that every terminated instance of the pre-protocol delivers a non-empty block:

  • By assumption, messages that have not been delivered are treated by the next incarnation of the pre-protocol as if they arrived at the same time in the same order

  • The protocol guarantees progress, i.e., at least one request is delivered into a block on each terminating incarnation

  • By the relative fairness requirement, for every request that has been seen by all honest parties, there is a finite number of requests that can be scheduled in an earlier block.


We say that a request r' blocks another request r if, given the current information, it cannot be excluded that r' needs to be in the same or an earlier block than r to achieve relative block fairness. More precisely (see the sketch after this list), r' blocks r if r and r' share a market-identifier, and it is not the case that t + 1 parties

  • have reported to have seen r before r', i.e., assigned it a lower sequence number, or

  • have reported to have seen r and all requests with a lower sequence number, but not r'.
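A minimal sketch of this blocking predicate (ours; the t + 1 threshold and the vote format are our reconstruction, and the market-identifier check is omitted for brevity) could look as follows:

    # Sketch: does r_prime still block r, given the votes seen so far?
    # votes[p] is the list of requests party p reported, in sequence-number order;
    # a report that contains r but not r_prime means p had not seen r_prime yet.
    from typing import Dict, List

    def blocks(r_prime: str, r: str, votes: Dict[str, List[str]], t: int) -> bool:
        cleared = 0
        for order in votes.values():
            if r not in order:
                continue
            if r_prime not in order or order.index(r) < order.index(r_prime):
                cleared += 1   # this party saw r before (or without) r_prime
        # once t + 1 parties cleared it, at least one honest party saw r first
        return cleared < t + 1

    votes = {"p1": ["r", "x"], "p2": ["x", "r"], "p3": ["r"]}
    assert not blocks("x", "r", votes, t=1)   # two parties (>= t + 1) cleared it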

Lemma 2.

If a request r' does not block a request r, then r' is not required to be in the same or an earlier block than r by the requirements of relative block fairness.

Proof. To be required to be in the same or an earlier block, all honest parties need to have seen r' before r. If t + 1 parties report to have seen r' after r, at least one of them is honest, and thus not all honest parties have seen r' before r.

Widget Neverending Wendy for block b and protocol instance ID
All parties:

  • let c be the counter of incoming requests, starting at 0.

  • while no valid proposal has been seen as the proposal for atomic broadcast for block b do

    • for all known and unscheduled requests r, in the order of receiving the requests, send the signed message (ID, b, r, c) to all parties, where c is the sequence number of that request.

  • end while

Additional protocol for the leader(s):

  • wait until the first request r is contained in the signed and valid votes from n - t parties; add r to the block B

  • while any request r' blocks any request r in B,

    • if request r' has at least n - t votes, add r' to B

  • end while

  • The proposal for the next block of the atomic broadcast is B, validated by all signed votes for requests in B.
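For concreteness, a self-contained sketch of the leader's part over one snapshot of votes (ours; thresholds and the blocking rule as reconstructed above) might look as follows; in the real widget the leader keeps waiting for further votes instead of returning None.

    # Sketch of the Neverending-Wendy leader over one snapshot of votes.
    # votes[p] lists the requests party p has voted for, in sequence-number order.
    from typing import Dict, List, Optional, Set

    def cleared(r_prime: str, r: str, votes: Dict[str, List[str]], t: int) -> bool:
        # t + 1 parties reported r before (or without) r_prime: r_prime no longer blocks r
        count = sum(1 for order in votes.values()
                    if r in order and (r_prime not in order
                                       or order.index(r) < order.index(r_prime)))
        return count >= t + 1

    def leader_step(votes: Dict[str, List[str]], n: int, t: int,
                    first_request: str) -> Optional[Set[str]]:
        block = {first_request}
        candidates = {r for order in votes.values() for r in order}

        def vote_count(r):
            return sum(1 for order in votes.values() if r in order)

        changed = True
        while changed:
            changed = False
            for r_prime in sorted(candidates - block):
                if any(not cleared(r_prime, r, votes, t) for r in block):
                    if vote_count(r_prime) >= n - t:
                        block.add(r_prime)
                        changed = True
                    else:
                        return None   # a blocker lacks n - t votes: keep waiting
        return block                  # proposal B, validated by the collected votes

    votes = {"p1": ["b", "a"], "p2": ["b", "a"], "p3": ["a", "b"]}
    assert leader_step(votes, n=4, t=1, first_request="a") == {"a", "b"}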

The following defines what a valid vote and a valid block look like:

Definition 4 (Vote-Validity).

A vote is considered valid if it has the proper format, and only once all requests with a lower sequence number from that voter have been received.

Definition 5 (Block-Validity).

A block is valid if it contains a nonempty set of requests with n - t valid votes each; a vote for a request r is valid if it contains the signed votes for all requests for that block with a lower sequence number. Furthermore, for every r in the block B, if there is a request r' in the vote validation that had at least n - t votes with a lower sequence number than r, then r' needs to be in B, accompanied by n - t validation votes.

Theorem 3.

The protocol Neverending Fairness guarantees safety, i.e., if a block is sent to the atomic broadcast protocol, and there are requests r and r' such that all honest parties have seen r before r', then r is in the same or an earlier block than r'.

Proof. If the leader is honest, it will place at least one request in the block B. By the protocol logic, B will be delivered once no request outside of B blocks any request in B.

As the validity proof contains all the history that led to the definition of the block, every valid block has to satisfy the conditions for relative block fairness. If the leader is dishonest, the only possible misbehavior (apart from deliberately not terminating the pre-protocol) is to suggest different valid blocks to different parties. This, however, is easily caught by the atomic broadcast protocol. Other dishonest parties can report different orders to different leaders (if several exist). This also is caught by the atomic broadcast protocol (which in this case should select one of those blocks as the next one), as it would require contradictory signatures that then provably expose the corrupt party.

Theorem 4.

If some honest party submits a request r, the protocol Neverending Fairness terminates.

Proof. Void, because the theorem is wrong.  

As we have shown in the previous section, it is possible for an adversary to construct a schedule in which an arbitrary number of messages needs to be put into the same block; thus, an adversary with sufficient influence on message ordering can keep the protocol processing one block forever.

Consequently, we also cannot quantify the absolute fairness – once a request is seen by all honest parties, there is no upper bound on when it is delivered. The only statement we can make is about the block it will be contained in (which depends on the number of undelivered earlier requests), but not on the time or communication effort until that block is delivered.

Lemma 5.

If a request r' does not block a request r, then r' is not required to be in the same or an earlier block than r by the requirements of relative block fairness.

Proof. To be required to be in the same or an earlier block, all honest parties need to have seen r' before r. If t + 1 parties report to have seen r' after r, at least one of them is honest, and thus not all honest parties have seen r' before r.

4.1 Armageddon

If the protocol terminates due to lack of usage (i.e., there are no more requests to be scheduled), then the impossibility result no longer holds – in the worst case scenario, the protocol only schedules one block after the genesis block, which then contains all transactions (one could argue that such behavior may hasten the end-of-time scenario as users abandon the system). What is left to show is that all requests that an honest party has seen actually are delivered. This model also assumes that the adversary cannot keep the protocol running forever by generating its own transactions. This is usually the case, as (a) forever is a very long time and a concept that does not exist in a cryptographically strict model, and (b) transactions usually cost money to incentivise the validators, so such an adversary would have to spend an unlimited amount of money to prevent protocol termination.

If the protocol terminates while still in operation due to validators opting out, a weaker form of liveness is required – while the protocol should have created all the blocks it could before, it cannot be expected to deliver every single request in that setting. While we do not quantify which messages can get lost under these conditions, [17] provides the formalism to cleanly define such end-time scenarios.

4.2 Relative Synchrony Assumption

One reason why the impossibility result works is that we allow the adversary to completely control the schedule, i.e., the order in which all parties see all requests. This is an unrealistically strong adversary; it is usually defined that way as it is rather hard to model a realistic worst case network attack. In the following, we define an adversary who is almost that strong, but has a (small) failure probability. For this definition, we assume that there is some form of global time, which is unknown to the individual parties.

Definition 6 (Probabilistic Adversary Failures).

After every time the adversary delivers a message, all undelivered messages between honest parties are, in a random order, each delivered with a probability ρ. If, as a result of such a message, an honest party generates another message, that message is added at a random position to the pool of messages to be delivered with probability ρ.

While this definition invalidates the impossibility result and allows for an algorithm to achieve relative fairness, we still run into practical issues. If ρ is unknown (analogous to failure detectors, where it is unknown when a party is rightfully suspected), then we have no known upper bound for the block size and, relatedly, the latency. Even if ρ is known, the maximum possible blocksize can be prohibitively large for any practical implementation. In addition, the adversary can improve the schedule shown in the previous section to add more resilience. For example, the adversary could (using twice as many transactions) interweave two such schedules in parallel, and thus tolerate a delivery error in one schedule; to force termination, delivery errors need to affect both schedules within a short time, which would then happen only with probability ρ². If the adversary has enough messages to operate with, the resilience can thus be made arbitrarily high.

While this model is probably pretty close to reality in that a realistic adversary will not have complete control over message delivery for a very long time, it is also unsatisfying in that ρ is extremely hard to determine (and probably neither the same for all messages nor independent for each message). Furthermore, a more detailed analysis would have to be made on how an attacker can create even more error-resilient schedules with fewer messages, i.e., how many delivery failures need to coincide to terminate the protocol. Thus, while we can show termination within this model, more work is required to refine the model to the point that we can also make quantitative statements on expected block sizes and latency.

Note that this definition also adds enough synchrony to allow for deterministic byzantine agreement, as the adversary will (eventually) fail to prevent termination.

It remains an open question how much synchrony (in terms of limited message delay) would be needed to circumvent our impossibility result. While we expect that simply having known timeouts is not sufficient – our construction only requires requests to be seen in a bad order relative to each other, and also works if all parties see a given request within a limited time interval – the exact benefit of various synchrony assumptions is still open work (for some further work on this, see Kelkar, Zhang, Goldfeder, and Juels [17]).

4.3 Probabilistic Relative Block Fairness

Definition 7 (Probabilistic Relative Block Fairness).

A byzantine fault tolerant total ordering protocol is called probabilistically relatively block fair if the following holds: There is a fixed probability P such that, if all honest parties receive request r before request r', then r is delivered in the same block as r' or earlier with at least probability P.

This definition allows a protocol to, at some point, stop assuring fairness and put the already processed messages into the next block, even if that means that some messages are scheduled unfairly. To achieve termination at the price of sacrificing some level of fairness, we can set a threshold on the block size and artificially terminate the protocol once the number of requests in the block under construction exceeds it. This means that an adversary with sufficient network control can cause a limited amount of unfairness (i.e., scheduling some requests out of a fair order); however, the majority of all requests will be scheduled fairly, and causing an unfair order does require a very high level of network control by the adversary. Of course, the cut-off point can also be defined using other factors, e.g., a timeout, the number of requests in the queue, etc.

We can strengthen this approach by adding a random factor. In that setting, once the threshold is exceeded, we use a common coin [9] to determine when the protocol stops. This could be done in a way that the result is unpredictable even for the leader – after each request added to the block beyond the threshold, the leader can request a coin from all other parties defining whether or not she should stop at that point. Thus, while an adversary with extensive network control can cause an unfair scheduling, she has no influence on who is treated unfairly. Communication overhead can be managed by piggybacking the coin shares onto the voting messages; furthermore, as the attacker gains little apart from a small slowdown of the protocol, one could hope that most economic attackers would not attempt such an attack, and thus in most cases the protocol terminates before reaching the threshold. While this allows the timing model to remain unchanged, the required maximum blocksize is linked to the acceptable unfairness probability; if that probability is to be very small (e.g., one in a million), the number of messages per block that the protocol needs to be capable of handling is correspondingly high.
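The cutoff-and-coin rule can be summarized in a few lines (ours; the coin below is a plain placeholder, whereas the protocol would use an unpredictable threshold common coin [9]):

    import random

    # Sketch: keep building the block below the threshold; above it, flip a
    # (placeholder) common coin after every further addition and stop if it says so.
    def should_stop(block_size: int, threshold: int,
                    coin=lambda: random.random() < 0.5) -> bool:
        if block_size <= threshold:
            return False
        return coin()

    assert not should_stop(50, threshold=100)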

4.4 Fairness using Local Clocks

We now present a different definition of fairness that is slightly weaker, but that allows for much stronger liveness guarantees.

Definition 8 (Timed Relative Fairness).

Suppose that all parties have access to a local clock. If there is a time T such that all honest parties saw (according to their local clock) request r before T and request r' after T, then r must be scheduled before r'.

Note that there is no need for the local clocks to be synchronized at all; the only formal requirement is that the clock always counts forward and that no two timestamps are the same. Obviously, the definition does make more practical sense if the clocks are roughly in sync. Using GPS as a time source with a hardening layer to prevent GPS spoofing (e.g., [5]) and robust synchronisation protocols [4] should be more than sufficient to make this approach practical.

For our protocol, it is sufficient to assure that if r needs to be scheduled before r', then r is in an earlier or the same block. As the timestamps are included in the block, the ordering of requests inside a block can be performed locally after the block is delivered.

Widget Clocked-Wendy for block b and protocol instance ID
All parties:

  • let c be a counter for incoming requests, starting at 0

  • while no valid proposal has been seen as the proposal for atomic broadcast for block b do

    • for all known and unscheduled requests r, in the order of the timestamps on the requests, send the message (ID, b, r, timestamp(r), c) to all parties, where c is the sequence number of that request.

  • end while

Additional protocol for the leader(s):

  • wait until the first request r is contained in the signed lists of n - t validators; add r to the block B

  • let R' be the set of requests r' for which a vote for r' with a smaller timestamp than a vote for r was received

    • wait until there is a set of n - t parties from which valid votes for all requests in R' are received

    • for all r' in R', if at least t + 1 timestamps of the votes for r' are smaller than the median of the timestamps of the votes for r, add r' to B

  • The proposal for the next block of the atomic broadcast is B, validated by the corresponding signed votes for the requests in B.
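The leader's decision rule can be sketched as follows (ours; the median is taken over the n - t collected timestamps, and the t + 1 threshold is our reading of the listing above):

    import statistics
    from typing import Dict

    # Sketch: should r_prime go into the same (or an earlier) block as r?
    # timestamps[x][p] is party p's local timestamp in its vote for request x.
    def must_precede_or_join(r_prime: str, r: str,
                             timestamps: Dict[str, Dict[str, float]], t: int) -> bool:
        median_r = statistics.median(timestamps[r].values())
        below = sum(1 for ts in timestamps[r_prime].values() if ts < median_r)
        return below >= t + 1   # t + 1 votes below r's median

    ts = {"r":  {"p1": 10.0, "p2": 12.0, "p3": 11.0},
          "r2": {"p1": 3.0,  "p2": 4.0,  "p3": 20.0}}
    assert must_precede_or_join("r2", "r", ts, t=1)   # two timestamps below 11.0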

Since the fairness condition changed, the validity of a vote and of a block also look different.

Definition 9 (Timestamped Vote-Validity).

A vote is valid if it has the proper format, and if the sequence numbers match the order of the timestamps on the requests from that party. Once a party mismatches the timestamps and the sequence numbers, i.e., there are two requests r and r' such that r has a lower sequence number and a higher timestamp than r', this and all following votes from that party are considered invalid. Furthermore, a vote is only considered valid once all requests with a lower sequence number from that voter have been received.

Definition 10 (Timestamped Block-Validity).

A block is valid if it contains a nonempty set of requests with n - t valid votes each; a vote for a request r is valid if it contains the signed votes for all requests for that block with a lower sequence number. Furthermore, for every r in the block B, if there is a request r' in the vote validation that obtained n - t votes with a lower sequence number than r, then r' needs to be in B, accompanied by n - t validation votes.

Theorem 6.

(Safety) If a request r is scheduled in a block B, and there is a request r' such that there is a time T such that all honest parties saw r' before T and r after T, then r' is in B or an earlier block.

Proof.

Assume without loss of generality that every timestamp is unique. This can easily be assured locally by a sufficiently high time resolution, and by ordering votes by party identifier if two votes have the exact same timestamp.

Suppose that, at the end of the pre-protocol, request r has been scheduled in block B, and that r' has not been scheduled in B or an earlier block. Let m be the median of the timestamps of the n - t votes for r.

  1. As n - t ≥ 2t + 1, at least t + 1 of the parties that voted for r timestamped r before or at m.

  2. As all honest parties saw r only after T, at most t of the parties that voted for r timestamped r before T.

Suppose that, by the requirements of timed relative fairness, we have to schedule r' before r. As at least n - 2t of the n - t parties that issued votes are honest, this implies that

  3. at least n - 2t of the votes for r' carry timestamps before T, while at most t of the votes for r carry timestamps before T.

By (2), at most t timestamps for r are smaller than T, and by (1) at least t + 1 are smaller than or equal to m; thus, T is smaller than or equal to m. By (3), at least n - 2t ≥ t + 1 timestamps for r' are smaller than T, and therefore also smaller than or equal to m. Thus, r' satisfies the condition to be added to B, which contradicts the assumption that it has not been scheduled. Therefore, it is not possible that r' needs to be scheduled before r.
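A small numeric check of this counting argument for n = 4, t = 1 (so n - t = 3 votes per request), under the reconstruction above:

    import statistics

    # All honest parties timestamp r' before T and r after T; at most t = 1 vote per
    # request comes from the corrupt party and may carry an arbitrary timestamp.
    T = 100.0
    r_stamps  = [101.0, 102.0, 5.0]    # at most t = 1 timestamp for r lies below T
    rp_stamps = [90.0, 95.0, 150.0]    # at least n - 2t = 2 timestamps for r' lie below T

    m_r  = statistics.median(r_stamps)    # 101.0, so T <= m_r
    m_rp = statistics.median(rp_stamps)   # 95.0
    below = sum(1 for ts in rp_stamps if ts <= m_r)
    assert m_rp < T <= m_r and below >= 2  # r' clears the median test and joins the block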

Theorem 7.

If some honest party sees some request, any honest leader will terminate the protocol with a proposal.

Proof. As every party sends every request it sees for the first time to all other parties, every request that is seen by some honest party is seen – and sent to the leader(s) – by all honest parties. Thus, there is some request r that is contained in the signed lists of n - t parties. Once a leader gets n - t votes for some request r for the first time, there is a finite number of requests r' for which the leader received a vote before r. As the leader has seen those votes and is honest, it also forwarded the corresponding requests to all other parties, and thus will eventually receive n - t votes for them. Therefore, the waiting statement always terminates for all requests in R'.

Note: We only need successful termination if an honest leader exists. All atomic broadcast protocols we are aware of either have a single leader which is replaced if a liveness problem occurs, or use more than t parties in a leader-like function simultaneously and thus guarantee that there is some honest leader.

4.5 Optimizations

The two protocols described above can also be combined. The joint protocol would act like the neverending protocol up to the cutoff threshold; however, instead of aborting the protocol and allowing for plain unfairness, it switches to the weaker timed definition of fairness once the threshold is exceeded. That approach allows for much more aggressive thresholds, as the fallback protocol is no longer unfair, but still fair under a slightly weaker definition.
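The switching rule itself is simple; a sketch (ours, with a hypothetical cutoff parameter):

    # Sketch: never-ending fairness below the cutoff, timed fairness above it.
    def hybrid_mode(block_size: int, cutoff: int) -> str:
        return "never-ending" if block_size <= cutoff else "timed"

    assert hybrid_mode(10, cutoff=1000) == "never-ending"
    assert hybrid_mode(1500, cutoff=1000) == "timed"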

4.5.1 Latency and performance impact

Introducing any kind of relative fairness always has a latency impact. If no fairness is required, every incoming request can be processed as soon as it arrives. Relative fairness, no matter how it is defined, requires leaders to wait until they can decide whether there are other requests with a higher priority. The mostly fair protocol allows parameterising the trade-off between latency and unfairness: the lower the cutoff parameter, the faster the worst case of the protocol, but also the easier it is for an adversary to cause an unfairly scheduled transaction. In the benign case, however, the latency overhead should be reasonably small.

One (small) speed increase can be achieved by parallelizing the leader part of the protocols. Instead of waiting for the first request to add to the block and then sticking to it, the protocol can be run in parallel for all requests that have been reported by enough parties. In that case, the first instance that terminates its while-condition wins and defines the next block. It is also possible to cut the vote threshold in the neverending fairness protocol by using a more sophisticated blocking function.

Another parallelization approach would be that the first part of the protocol where all parties broadcast their orders is permanently performed, independently of the state of the second phase or the atomic broadcast. Thus, in most cases, once the atomic broadcast starts processing the next block, enough votes should have arrived to terminate the pre-protocol quite rapidly. This approach also has an interesting impact on the overall architecture – rather than having a simple API to call the pre-protocol, some part of it now needs to permanently run in the background. Alternatively, to save overhead, this could also be included as a piggyback in the gossiping protocol.

An additional approach to optimize the Neverending protocol is to allow requests to be removed from the block B again. Recall that a request is added to B if it has received n - t votes and still blocks a request already in B. This is necessary as we can no longer rely on getting more votes concerning this request, and to guarantee progress, this request now needs to be treated as if we knew that it has to be in the same block as the one it blocks. However, as additional votes come in, it is possible that it unblocks again. In this case, it and all requests that had been added to B because of it can be removed from B again, potentially releasing the block earlier.

For the timed protocol, a similar approach can be taken. For this protocol, we have the advantage that for each request r, there is a finite number of requests that are blocking it. This blockage is released either once the corresponding request has sufficiently many timestamps smaller than the median timestamp on r (in which case we know that if any other request needs to be scheduled before it, that request also needs to be scheduled before r), or once it got n - t timestamps of which at most t are smaller than the median of r (in which case it can and will be scheduled after r). To fully optimize latency, we also need to constantly verify whether new incoming votes increase the median of a subset of votes for r, as a higher median increases the possibility that another request can be decided before it obtained n - t votes.

With this modification, we believe that the protocols have optimal latency within our modular architecture, i.e., it is not possible to hand a block over to the atomic broadcast protocol earlier. The (informal) argument for the block fairness protocol goes as follows (from the point of view of a leader):

  • Every request r that got n - t votes gets its own candidate block B_r, i.e., a potential block containing r and all other requests that have to be in the same block as r. By our fairness condition, we cannot deliver any request that has seen fewer than n - t votes, as it is possible that another request that is unknown at this point has votes that prioritize it over r and thus has to be in the same block. Therefore, for every request that can be in the next block, the protocol maintains such a candidate block B_r.

  • At any point in time, B_r is minimal; the only requests in B_r are requests that either have to be in the same block as r, or might have to according to the information available.

  • B_r cannot be finalized while it contains a request that is blocked by another request with fewer than n - t votes, as that request might still be blocked by a yet unseen request. Thus, the protocol finalizes B_r at the earliest possible occasion.

A similar argument holds for the timed protocol; again, the protocol maintains a separate candidate block B_r for all eligible requests, and decides about all other requests at the earliest opportunity – either once it is clear that they need to be in the same block, or once enough votes are seen to conclude they do not need to be.

If we want to further optimize latency, we could open up the modularity of our approach. Most voting-based atomic broadcast protocols start with the leader(s) broadcasting the content of the next block (or a hash thereof). Due to the pre-protocol, we already know that n - t parties have seen the content of the requests in that block. Optimizing the interplay between the fairness pre-protocol, the atomic broadcast, and the underlying gossip/multicast protocol is thus certainly promising, but out of the scope of this paper. It also is possible to integrate our protocol more deeply with the blockchain implementation. With some modifications it could, for example, replace the first phase of the ABC protocol from Cachin, Kursawe, Petzold and Shoup [8]. As our goal is a modular approach, though, we will not follow that path at this point.

4.5.2 The combined protocol

There is a set D of transactions that are ready for the atomic broadcast layer to use. For ease of presentation, we assume that the communication layer is aware of D, and omits any voting messages associated with any transaction in D. Furthermore, there is a queue Q through which the protocol communicates with the atomic broadcast. The atomic broadcast protocol takes the requests in Q from one or several leaders, adds a block to the blockchain, and then deletes the scheduled requests from the queues of all leaders.

This version of the protocol is defined as a permanent service that takes in requests, and outputs blocks for the atomic broadcast protocol.

Widget Hybrid-Wendy for protocol instance ID
All parties:

  • let c be the counter of incoming requests, starting at 0.

  • while true do

    • for all first seen and unscheduled requests r, in the order of the timestamps on the requests, send the message (ID, r, timestamp(r), c) to all parties, where c is the sequence number of that request.

  • end while

Additional protocol for the leader(s):

  • while true do

    • once a request r is contained in the signed and valid votes from n - t parties, set B_r to {r}

    • while for any B_r some request r' blocks a request in B_r, and no B_r exceeds the cutoff threshold

      • if request r' has at least n - t votes, add r' to B_r

      • if a request r' no longer blocks any other request in B_r, remove r' from B_r

    • end while

    • for all B_r for which no request in B_r is blocked by a request r' outside of B_r,

      • add B_r to the queue Q, validated by all signed votes for requests in B_r.

      • add all requests in B_r to D, and remove them from all other sets B_r'

    • if there is a B_r that exceeds the cutoff threshold

      • set all B_r to the empty set

      • while all B_r are empty

        • for all requests r contained in the signed lists of n - t validators, set B_r to {r}

      • for all r relating to a nonempty B_r,

        • let R' be the set of requests for which a vote with a smaller timestamp than a vote for r was received

        • let m be the largest median of any set of n - t votes received for r

        • once, for all requests in R', either n - t valid votes or t + 1 votes with timestamps smaller than m are received,

          • for all r' in R', if at least t + 1 timestamps of the votes for r' are smaller than m, add r' to B_r

      • add B_r to Q, validated by all signed votes for requests in B_r.

      • add all requests in B_r to D, and remove them from all B_r'

    • end if

  • end while

4.6 Fairness and Advanced Staking

While the protocols described above are relatively model-independent, they are described in the classical committee model, i.e., we have n parties with one vote each, of which up to t < n/3 can suffer from byzantine corruptions. This model translates easily into a stake-based model, where voting power is related to the stake parties have. To allow our results to be applicable to more diverse staking models, we consider the hybrid-adversary-structure model [19]. In short, this model generalizes the threshold model by replacing the thresholds with the corresponding properties that are required to perform the proofs; for example, the t + 1 threshold is replaced by sets of parties of which at least one is honest, while n - t corresponds to the largest sets of parties we can afford to wait for without having to rely on potentially corrupt parties. This allows us not only to model weighted votes, but also to take into account additional properties, e.g., requiring a sufficient fraction of the stake in a sufficient number of defined geographic regions to be honest. In addition, the hybrid model allows a trade-off between crash and byzantine corruptions, allowing a higher number of overall failures if some of them are crash-only (which is a more likely scenario in reality). In the proofs for our protocols, the two aforementioned properties are the only ones we need, and – while working out the details remains future work – we expect that the proofs can be generalized in a (relatively) straightforward way. Thus, any staking model that can be formulated within the hybrid adversary structures is compatible with the relative-fairness protocols.

Another model of interest is the choice of a random subset of validators, as done for example in Algorand [16]. While we do not expect this to cause a fundamental issue for our protocols, some care needs to be taken on the interfaces, as the subsets should be somewhat synchronised between the fairness pre-protocol and the atomic broadcast. This, too, will be a subject of future work.

5 Conclusion

We have shown that relative fairness is one of the many desirable properties that are impossible to achieve in a byzantine fault tolerant setting. We have mitigated this by providing slightly weaker definitions of what fair means. We have presented several protocols to achieve relative order fairness under these definitions, as well as a hybrid version that can switch between two levels of fairness to avoid the impossibility result. Our protocols are (largely) blockchain agnostic, and can be added to any protocol that provides a known set of validators. Furthermore, our protocols have optimal resiliency in the asynchronous model (i.e., t < n/3) and optimal latency in terms of message passing rounds within our architectural model.

References

  • [1] I. Abraham, G. Gueta, and D. Malkhi (2018) Hot-stuff the linear, optimal-resilience, one-message BFT devil. CoRR abs/1803.05069. External Links: Link, 1803.05069 Cited by: §2.1, §3.
  • [2] Y. Amoussou-Guenou, A. D. Pozzo, M. Potop-Butucaru, and S. Tucci Piergiovanni (2019) Dissecting tendermint. See Networked systems - 7th international conference, NETYS 2019, marrakech, morocco, june 19-21, 2019, revised selected papers, Atig and Schwarzmann, pp. 166–182. External Links: Link, Document Cited by: §3.
  • [3] M. F. Atig and A. A. Schwarzmann (Eds.) (2019) Networked systems - 7th international conference, NETYS 2019, marrakech, morocco, june 19-21, 2019, revised selected papers. Lecture Notes in Computer Science, Vol. 11704, Springer. External Links: Link, Document, ISBN 978-3-030-31276-3 Cited by: 2.
  • [4] C. Badertscher, P. Gazzi, A. Kiayias, A. Russell, and V. Zikas (2019) Ouroboros chronos: permissionless clock synchronization via proof-of-stake. Note: Cryptology ePrint Archive, Report 2019/838, https://eprint.iacr.org/2019/838 Cited by: §4.4.
  • [5] A. Bondavalli, F. Brancati, A. Ceccarelli, L. Falai, and M. Vadursi (2013) Resilient estimation of synchronisation uncertainty through software clocks. IJCCBS 4 (4), pp. 301–322. External Links: Link, Document Cited by: §4.4.
  • [6] E. Buchman, J. Kwon, and Z. Milosevic (2018) The latest gossip on BFT consensus. CoRR abs/1807.04938. External Links: Link, 1807.04938 Cited by: §2.1.
  • [7] V. Buterin and V. Griffith (2017) Casper the friendly finality gadget. CoRR abs/1710.09437. External Links: Link, 1710.09437 Cited by: §2.
  • [8] C. Cachin, K. Kursawe, F. Petzold, and V. Shoup (2001) Secure and efficient asynchronous broadcast protocols. See Advances in cryptology - CRYPTO 2001, 21st annual international cryptology conference, santa barbara, california, usa, august 19-23, 2001, proceedings, Kilian, pp. 524–541. External Links: Link, Document Cited by: §2.1, §2.1, §2.1, §2, §3, §3, §4.5.1, §4.
  • [9] C. Cachin, K. Kursawe, and V. Shoup (2005) Random oracles in constantinople: practical asynchronous byzantine agreement using cryptography. J. Cryptology 18 (3), pp. 219–246. External Links: Link, Document Cited by: §4.3.
  • [10] L. Caires, G. F. Italiano, L. Monteiro, C. Palamidessi, and M. Yung (Eds.) (2005) Automata, languages and programming, 32nd international colloquium, ICALP 2005, lisbon, portugal, july 11-15, 2005, proceedings. Lecture Notes in Computer Science, Vol. 3580, Springer. External Links: Link, Document, ISBN 3-540-27580-0 Cited by: 20.
  • [11] M. Castro and B. Liskov (1999) Practical byzantine fault tolerance. See Proceedings of the third USENIX symposium on operating systems design and implementation (osdi), new orleans, louisiana, usa, february 22-25, 1999, Seltzer and Leach, pp. 173–186. External Links: Link Cited by: §2.1, §3.
  • [12] P. Daian, S. Goldfeder, T. Kell, Y. Li, X. Zhao, I. Bentov, L. Breidenbach, and A. Juels (2019) Flash boys 2.0: frontrunning, transaction reordering, and consensus instability in decentralized exchanges. CoRR abs/1904.05234. External Links: Link, 1904.05234 Cited by: §1, §4.4.
  • [13] G. Danezis, D. Hrycyszyn, B. Mannering, T. Rudolph, and D. Šiška (2018) Vega protocol: a liquidity incentivising trading protocol for smart financial products.. External Links: Link Cited by: §1.
  • [14] S. Duan, M. K. Reiter, and H. Zhang (2017) Secure causal atomic broadcast, revisited. In 2017 47th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), Vol. , pp. 61–72. Cited by: §2.1.
  • [15] C. Dwork, N. Lynch, and L. Stockmeyer (1988-04) Consensus in the presence of partial synchrony. J. ACM 35 (2), pp. 288–323. External Links: ISSN 0004-5411, Link, Document Cited by: §2, §3.
  • [16] Y. Gilad, R. Hemo, S. Micali, G. Vlachos, and N. Zeldovich (2017) Algorand: scaling byzantine agreements for cryptocurrencies. In Proceedings of the 26th Symposium on Operating Systems Principles, SOSP ’17, New York, NY, USA, pp. 51–68. External Links: ISBN 9781450350853, Link, Document Cited by: §4.6.
  • [17] M. Kelkar, F. Zhang, S. Goldfeder, and A. Juels (2020) Order-fairness for byzantine consensus. Note: Cryptology ePrint Archive, Report 2020/269, https://eprint.iacr.org/2020/269 Cited by: §2.1, §3, §4.1, §4.2.
  • [18] J. Kilian (Ed.) (2001) Advances in cryptology - CRYPTO 2001, 21st annual international cryptology conference, santa barbara, california, usa, august 19-23, 2001, proceedings. Lecture Notes in Computer Science, Vol. 2139, Springer. External Links: Link, Document, ISBN 3-540-42456-3 Cited by: 8.
  • [19] K. Kursawe and F. C. Freiling (2005) Byzantine fault tolerance on general hybrid adversary structures. Technical report RWTH Aachen. Cited by: §2, §4.6.
  • [20] K. Kursawe and V. Shoup (2005) Optimistic asynchronous atomic broadcast. See Automata, languages and programming, 32nd international colloquium, ICALP 2005, lisbon, portugal, july 11-15, 2005, proceedings, Caires et al., pp. 204–215. External Links: Link, Document Cited by: §3.
  • [21] M. Kwiatkowska and G. Norman (2002) Verifying randomized byzantine agreement. In Proc. Formal Techniques for Networked and Distributed Systems (FORTE’02), volume 2529 of LNCS, pp. 194–209. Cited by: §1.
  • [22] K. Lev-Ari, A. Spiegelman, I. Keidar, and D. Malkhi (2019) FairLedger: A fair blockchain protocol for financial institutions. CoRR abs/1906.03819. External Links: Link, 1906.03819 Cited by: §2.1.
  • [23] A. Miller, Y. Xia, K. Croman, E. Shi, and D. Song (2016) The honey badger of bft protocols. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, CCS ’16, New York, NY, USA, pp. 31–42. External Links: ISBN 978-1-4503-4139-4, Link, Document Cited by: §2.1, §2.1, §3.
  • [24] M. K. Reiter and K. P. Birman (1994-05) How to securely replicate services. ACM Trans. Program. Lang. Syst. 16 (3), pp. 986–1009. External Links: ISSN 0164-0925, Link, Document Cited by: §2.1.
  • [25] M. I. Seltzer and P. J. Leach (Eds.) (1999) Proceedings of the third USENIX symposium on operating systems design and implementation (osdi), new orleans, louisiana, usa, february 22-25, 1999. USENIX Association. External Links: ISBN 1-880446-39-1 Cited by: 11.