Paxos is a family of protocols for solving consensus in a network of unreliable processors with unreliable communication. Consensus is the process of deciding on one result among a group of participants. Paxos protocols play an important role in our daily life. For example, Google uses the Paxos algorithm in its Chubby distributed lock service to keep replicas consistent in case of failure (Burrows, 2006). VMware uses a Paxos-based protocol within the NSX Controller. Amazon Web Services uses Paxos-like algorithms extensively to power its platform (Newcombe et al., 2015). The key safety property of Paxos is consistency: processors cannot decide on different values.
Given the importance of Paxos, verifying the safety of distributed protocols like it is an ongoing research challenge. The systems and programming languages communities have had several recent successes in verifying the safety of Paxos-like protocols in projects such as IronFleet (Hawblitzel et al., 2015), Verdi (Wilcox et al., 2015), and PSync (Dragoi et al., 2016). (IronFleet and PSync also verify certain liveness properties.)
1.1. Main Results
This work aims to increase the level of automation in the verification of distributed protocols, in the hope that it will eventually lead to wider adoption of formal verification in this domain. Like IronFleet, Verdi, and PSync, we require the user to supply inductive invariants for the protocols. We aim to automate the process of checking the inductiveness of the user-supplied invariants. The goal is that the system reliably produces, in finite time, either a proof that the invariant is inductive or a comprehensible counterexample to induction (CTI), i.e., a concrete transition of the protocol from a state that satisfies the given invariant to a state that does not. (Such a CTI indicates either a bug in the protocol itself or an inadequate invariant, e.g., one that is too weak or too strong.) This task seems very difficult, since these protocols are usually expressed in rich programming languages in which automatically checking inductive invariants is both undecidable and very hard in practice. In fact, the IronFleet project observed that the undecidability of the reasoning performed by Z3 (de Moura and Bjørner, 2008) is a major hurdle in its verification process.
1.1.1. Criteria for Automatic Deductive Verification
We aim for an automated deductive verification technique that achieves three goals:
- Natural:
Making the invariants readable even for users who are not experts in the tools.
- Completeness:
Making sure that if the invariant is inductive then the solver is guaranteed to prove it.
- Finite Counterexamples:
Guaranteeing that if the invariant is not inductive then the solver can display a concrete counterexample to induction with a finite number of nodes, which can be diagnosed by users.
These goals are highly ambitious. Expressing the verification conditions in a decidable logic with a small model property (e.g., EPR (Piskac et al., 2010)) will guarantee Completeness and Finite Counterexamples. However, it is not clear how to model complex protocols like Paxos in such logics. Consensus protocols such as Paxos often require higher-order reasoning about sets of nodes (majority sets or quorums), combined with complex quantification. In fact, some researchers conjectured that decidable logics are too restrictive to be useful.
Furthermore, we aim to obtain natural invariants. We chose to verify the designs of the protocols rather than their implementations, since the invariants are more natural and since we wanted to avoid dealing with low-level implementation issues. In the future, we plan to use refinement to synthesize efficient low-level implementations. Systems such as Alloy (Jackson, 2006) and TLA (Lamport, 2002) have already been used for finding bugs in protocols and inductive invariants (e.g., by Amazon (Newcombe et al., 2015)). Like our approach, they verify and identify faults in the designs rather than in the actual implementations. However, in contrast to our approach, they cannot automatically produce proofs of inductiveness (Completeness).
1.1.2. A Reusable Verification Methodology
In this work, we develop a novel reusable verification methodology based on Effectively Propositional logic (EPR) for achieving the above goals. Our methodology allows the expression of complex protocols and systems, while guaranteeing that the verification conditions are expressed in EPR. EPR provides both decidability and finite counterexamples, and is supported by existing solvers (e.g., Z3 (de Moura and Bjørner, 2008), iProver (Korovin, 2008), VAMPIRE (Riazanov and Voronkov, 2002), CVC4 (Barrett et al., 2011)). We have used our methodology to verify the safety of Paxos, and several of its variants, including Multi-Paxos, Vertical Paxos, Fast Paxos, Flexible Paxos and Stoppable Paxos. To the best of our knowledge, this work is the first to verify these protocols using a decidable logic, and, in the case of Vertical Paxos, Fast Paxos, and Stoppable Paxos, it is also the first mechanized safety proof.
We have also compared our methodology to a traditional approach based on a state-of-the-art interactive theorem prover—Isabelle/HOL (Nipkow et al., 2002). Our comparison shows that the inductive invariants used are very similar in both approaches (Natural), and that our methodology allows more reliable and predictable automation: an interactive theorem prover can discharge proof obligations to theorem provers using undecidable theories, but these often fail due to the undecidability. In such cases, it requires an experienced expert user to prove the inductive invariant. In contrast, with our methodology all the verification conditions are decidable and therefore checking them is fully automated.
First-order uninterpreted abstraction
The first phase in our verification process is expressing the system and invariant in (undecidable) many-sorted first-order logic over uninterpreted structures. This is in contrast to SMT which allows the use of interpreted theories such as arithmetic and the theory of arrays. The use of theories is natural specifically for handling low level aspects such as machine arithmetic and low level storage. However, SMT leads to inherent undecidability with quantifiers which are used to model unbounded systems. In contrast to SMT, we handle concepts, such as arithmetic and set cardinalities, using abstraction expressible in first-order logic, e.g., a totally ordered set instead of the natural numbers. This involves coming up with domain knowledge encoded as first-order axioms (e.g. a first-order formula expressing transitivity of a total order).
We are encouraged by the simplicity of our abstractions and the fact that they are precise enough to prove complex protocols. We also note that using first-order logic has led us to axioms and invariants that elegantly capture the essence of the protocols. This is also enabled by the fact that we are modeling high-level protocols and not their low level implementations.
At the end of this phase, the verification conditions are in general first-order logic. This is already useful, as it allows the use of resolution-based theorem provers (e.g., SPASS (Weidenbach et al., 2009) and VAMPIRE (Riazanov and Voronkov, 2002)). Yet, at this stage the verification conditions are still undecidable, and solvers are not guaranteed to terminate.
One way to obtain decidability is to restrict quantifier alternations. We examine the quantifier alternation graph of the verification condition, which connects sorts that alternate in quantification. When this graph contains cycles, solvers such as Z3 often diverge into infinite loops while instantiating quantifiers. This issue is avoided when the graph is acyclic, in which case the verification condition is essentially in EPR. Therefore, the second phase of our methodology provides a systematic way to soundly eliminate the cycles.
Eliminating quantifier alternations using derived relations
The most creative part in our methodology is adding derived relations and rewriting the code to break the cycles in the quantifier alternation graph. The main idea is to capture an existential formula by a derived relation, and then to use the derived relation as a substitute for the formula, both in the code and in the invariant, thus eliminating some quantifier alternations. The user is responsible for defining the derived relations and performing the rewrites. The system automatically generates update code for the derived relations, and automatically checks the soundness of the rewrites. For the generation of update code, we exploit the locality of updates, as relations (used for defining the derived relations) are updated by inserting a single entry at a time. We identify a class of formulas for which this automatic procedure is always possible and use this class for verifying the Paxos protocols.
We are encouraged by the fact that the transformations needed in this step are reusable across all Paxos variants we consider. Furthermore, the transformations maintain the simplicity and readability of both the code and the inductive invariants.
1.2. Summary of the rest of the paper
In Section 2 we present the technical background on using first-order logic to express transition systems, and on the EPR fragment. We then develop our general methodology for EPR-based verification in Section 3. Section 4 reviews the Paxos consensus algorithm, which is the basis for all Paxos-like protocols. We present our model of the Paxos consensus algorithm as a transition system in first-order logic in Section 5, and continue to verify it using EPR by applying our methodology in Section 6. In Section 7, we describe our verification of Multi-Paxos using EPR. We briefly discuss the verification of Vertical Paxos, Fast Paxos, Flexible Paxos, and Stoppable Paxos in Section 8. In Section 9 we report on our implementation and experimental evaluation. We discuss related work in Section 10, and Section 11 concludes the paper. More details about the verification of Vertical Paxos, Fast Paxos, Flexible Paxos, and Stoppable Paxos appear in Appendix A. Appendix B contains a worked out comparison of the proof of Paxos using our methodology to a proof using the Isabelle/HOL interactive proof assistant.
2. Background: Verification using EPR
In this section we present the necessary background on the formalization of transition systems using first-order logic, as well as on the EPR fragment of first-order logic.
2.1. Transition Systems in First Order Logic
We model transition systems using many-sorted first-order logic. We use a vocabulary $\Sigma$, which consists of sorted constant symbols, function symbols and relation symbols, to capture the state of the system, and formulas over $\Sigma$ to capture sets of states and transitions. Formally, given a vocabulary $\Sigma$, a state is a first-order structure over $\Sigma$. We sometimes use axioms, in the form of closed first-order formulas over $\Sigma$, to restrict the set of states to those that satisfy all the axioms. A transition system is a pair $(\mathit{INIT}, \mathit{TR})$, where INIT is the initial condition given by a closed formula over $\Sigma$, and TR is the transition relation given by a closed formula over $\Sigma \uplus \Sigma'$, where $\Sigma$ is used to describe the source state of the transition and $\Sigma' = \{a' \mid a \in \Sigma\}$ is used to describe the target state. The set of initial states and the set of transitions of the system consist of the states and the pairs of states that satisfy INIT and TR, respectively. We define the set of reachable states of a transition system in the usual way. A safety property is expressed by a closed formula $P$ over $\Sigma$. The system is safe if all of its reachable states satisfy $P$.
In the paper, we use the relational modeling language (RML) (Padon et al., 2016) to express transition systems. An RML program consists of actions, each of which is loop-free code that is executed atomically and corresponds to a single transition. RML commands include non-deterministic choice, sequential composition, and updates to constant symbols, function symbols and relation symbols (representing the system's state), where updates are expressed by first-order formulas. In addition, conditions in RML are expressed using assume commands. RML programs naturally translate to formulas $(\mathit{INIT}, \mathit{TR})$, where TR is a disjunction of the transition relation formulas associated with each action (see (Padon et al., 2016) for details of the translation). As such, we use models, programs and transition systems interchangeably throughout the paper. We note that RML is Turing-complete, and remains so when INIT and TR are restricted to the EPR fragment.
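A closed first-order formula INV over $\Sigma$ is an inductive invariant for a transition system $(\mathit{INIT}, \mathit{TR})$ if $\mathit{INIT} \models \mathit{INV}$ and $\mathit{INV} \wedge \mathit{TR} \models \mathit{INV}'$, where $\mathit{INV}'$ results from substituting every symbol in INV by its primed version. These requirements ensure that an inductive invariant represents a superset of the reachable states. Given a safety property $P$, an inductive invariant INV proves that the transition system is safe if $\mathit{INV} \models P$. Equivalently, INV proves safety of $(\mathit{INIT}, \mathit{TR})$ for $P$ if the following formulas are unsatisfiable: (i) $\mathit{INIT} \wedge \neg \mathit{INV}$, (ii) $\mathit{INV} \wedge \mathit{TR} \wedge \neg \mathit{INV}'$, and (iii) $\mathit{INV} \wedge \neg P$. We refer to these formulas as the verification condition of INV. When $\mathit{INV} \wedge \mathit{TR} \wedge \neg \mathit{INV}'$ is satisfiable, and $(s, s') \models \mathit{INV} \wedge \mathit{TR} \wedge \neg \mathit{INV}'$, we say that the transition $(s, s')$ is a counterexample to induction (CTI).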
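To make the three verification conditions concrete, the following Python sketch checks them by exhaustive enumeration over a bounded domain, as a stand-in for the solver. The toy system (a single unary relation of decided values, and the predicates `init`, `tr_guarded`, `tr_unguarded`, `agreement`) is hypothetical and not part of the paper. Bounded enumeration can exhibit a CTI, but by itself it does not prove inductiveness for unbounded domains.

```python
from itertools import chain, combinations

# Hypothetical toy system: the state is one unary relation `decided`
# over a bounded domain of values, represented as a frozenset.
VALUES = (0, 1, 2)


def all_states():
    """Enumerate every state over the bounded domain (all subsets of VALUES)."""
    subsets = chain.from_iterable(
        combinations(VALUES, k) for k in range(len(VALUES) + 1))
    return [frozenset(s) for s in subsets]


def init(s):
    return len(s) == 0  # INIT: nothing decided yet


def tr_guarded(s, t):
    # TR: decide some value v, but only if nothing has been decided yet
    return len(s) == 0 and any(t == s | {v} for v in VALUES)


def tr_unguarded(s, t):
    # Buggy TR: decide any value at any time
    return any(t == s | {v} for v in VALUES)


def agreement(s):
    return len(s) <= 1  # P: at most one value is decided


def check_vcs(init, tr, inv, prop):
    """Look for a model of each verification condition (i)-(iii).

    Returns None if all three are unsatisfiable over the bounded domain,
    otherwise a tag and the offending state or transition."""
    states = all_states()
    for s in states:                      # (i) INIT and not INV
        if init(s) and not inv(s):
            return ("init", s)
    for s in states:                      # (ii) INV and TR and not INV'
        if inv(s):
            for t in states:
                if tr(s, t) and not inv(t):
                    return ("cti", (s, t))
    for s in states:                      # (iii) INV and not P
        if inv(s) and not prop(s):
            return ("safety", s)
    return None


print(check_vcs(init, tr_guarded, agreement, agreement))    # None
print(check_vcs(init, tr_unguarded, agreement, agreement))  # CTI from {0} to {0, 1}
```

Here the candidate invariant is the safety property itself: it is inductive for the guarded transition relation, while the unguarded one yields a CTI, a transition from an invariant state to a non-invariant one.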
2.2. Extended Effectively Propositional Logic (EPR)
The effectively-propositional (EPR) fragment of first-order logic, also known as the Bernays-Schönfinkel-Ramsey class, is restricted to relational first-order formulas (i.e., formulas over a vocabulary that contains constant symbols and relation symbols but no function symbols) with an $\exists^*\forall^*$ quantifier prefix in prenex normal form. Satisfiability of EPR formulas is decidable (Lewis, 1980). Moreover, formulas in this fragment enjoy the finite model property, meaning that a satisfiable formula is guaranteed to have a finite model. The size of this model is bounded by the total number of existential quantifiers and constants in the formula. The reason is that, given an $\exists^*\forall^*$-formula, we can obtain an equisatisfiable quantifier-free formula by Skolemization, i.e., replacing the existentially quantified variables by constants, and then instantiating the universal quantifiers with all constants. While EPR allows neither function symbols nor any quantifier alternation other than $\exists^*\forall^*$, it can easily be extended to allow stratified function symbols and quantifier alternations (as formalized below). The extension maintains both the finite model property and the decidability of the satisfiability problem.
The quantifier alternation graph
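Let $\varphi$ be a formula in negation normal form over a many-sorted signature $\Sigma$ with a set of sorts $\mathcal{S}$. We define the quantifier alternation graph of $\varphi$ as a directed graph $(V, E)$ where the set of vertices is the set of sorts, $V = \mathcal{S}$, and the set of directed edges, called $\forall\exists$ edges, is defined as follows.
Function edges: let $f$ be a function in $\Sigma$ from sorts $s_1, \ldots, s_k$ to sort $s$. Then there is a $\forall\exists$ edge from $s_i$ to $s$ for every $1 \le i \le k$.
Quantifier edges: let $\exists x{:}s$ be an existential quantifier that resides in the scope of the universal quantifiers $\forall x_1{:}s_1, \ldots, \forall x_k{:}s_k$ in $\varphi$. Then there is a $\forall\exists$ edge from $s_i$ to $s$ for every $1 \le i \le k$.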
Intuitively, the quantifier edges correspond to the edges that would arise as function edges if Skolemization is applied.
A formula is stratified if its quantifier alternation graph is acyclic. The extended EPR fragment consists of all stratified formulas. This fragment maintains the finite model property and the decidability of EPR. The reason for this is that, after Skolemization, the vocabulary of a stratified formula can only generate a finite set of ground terms. This allows complete instantiation of the universal quantifiers in the Skolemized formula, as in EPR. In the sequel, whenever we say a formula is in EPR, we refer to the extended EPR fragment.
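A minimal Python sketch of the stratification check, assuming the function signatures and the existential quantifiers (each with the sorts of its enclosing universal quantifiers) have already been extracted from a formula in negation normal form. It builds the quantifier alternation graph and tests it for cycles; the sort names in the usage lines are illustrative only.

```python
from collections import defaultdict


def alternation_graph(functions, existentials):
    """Build the quantifier alternation graph as a dict: sort -> set of sorts.

    functions: (argument_sorts, result_sort) for each function symbol.
    existentials: (enclosing_universal_sorts, existential_sort) for each
    existential quantifier of the formula in negation normal form."""
    edges = defaultdict(set)
    for arg_sorts, result in functions:
        for s in arg_sorts:
            edges[s].add(result)          # function edge
    for universals, existential in existentials:
        for s in universals:
            edges[s].add(existential)     # quantifier edge
    return dict(edges)


def is_stratified(edges):
    """True iff the graph is acyclic (depth-first search with three colors)."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = defaultdict(lambda: WHITE)

    def visit(u):
        color[u] = GRAY
        for v in edges.get(u, ()):
            if color[v] == GRAY:
                return False               # back edge: a cycle
            if color[v] == WHITE and not visit(v):
                return False
        color[u] = BLACK
        return True

    return all(color[u] != WHITE or visit(u) for u in edges)


# forall n:node. exists q:quorum ...  together with  forall q:quorum. exists n:node ...
cyclic = alternation_graph([], [(("node",), "quorum"), (("quorum",), "node")])
# a function node -> value, and forall r:round. exists n:node ...
acyclic = alternation_graph([(("node",), "value")], [(("round",), "node")])
print(is_stratified(cyclic), is_stratified(acyclic))  # False True
```

An acyclic graph means only finitely many ground terms can be built after Skolemization, which is what makes complete instantiation possible.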
3. Methodology for Decidable Verification
In this section we explain the general methodology that we follow in our efforts to verify Paxos using decidable reasoning. While this paper focuses on Paxos and its variants, the methodology is more general and can be useful for verifying other systems as well.
3.1. Modeling in Uninterpreted First-Order Logic
The first step in our verification methodology is to express the protocol as a transition system in many-sorted uninterpreted first-order logic. This step involves some abstraction, since protocols usually employ concepts that are not directly expressible in uninterpreted first-order logic.
3.1.1. Axiomatizing Interpreted Domains
One of the challenges we face is modeling an interpreted domain using uninterpreted first-order logic. Distributed algorithms often use values from interpreted domains, the most common example being the natural numbers. These domains are usually not precisely expressible in uninterpreted first-order logic.
To express an interpreted domain, such as the natural numbers, in uninterpreted first-order logic, we add a sort that represents elements of the interpreted domain, and uninterpreted symbols to represent the interpreted symbols (e.g., a binary relation $\le$). We capture part of the intended interpretation of the symbols by introducing axioms to the model. The axioms are a finite set of first-order logic formulas that are valid in the interpreted domain. By adding them to the model, we allow the proof of verification conditions to rely on these axioms. By using only axioms that are valid in the interpreted domain, we guarantee that any invariant proved for the first-order model is also valid for the actual system.
One important example of axioms expressible in first-order logic is the axiomatization of total orders. In many cases, natural numbers are used as a way to enforce a total order on a set of elements. In such cases, we can add a binary relation $\le$, along with the axioms listed in Fig. 1, which precisely capture the properties of a total order.
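Fig. 1 is not reproduced in this section, so the following sketch assumes the standard axioms of a (non-strict) total order: reflexivity, transitivity, antisymmetry, and totality. It evaluates them on a finite interpretation of the order relation, which is the kind of structure a solver returns as a finite counterexample; a finite check like this cannot establish that the axioms are valid over the natural numbers.

```python
from itertools import product


def satisfies_total_order_axioms(elems, le):
    """Check the four total-order axioms on a finite structure.

    elems: the universe of the sort; le: set of pairs (a, b) meaning a <= b."""
    def rel(a, b):
        return (a, b) in le

    reflexive = all(rel(x, x) for x in elems)
    transitive = all(rel(x, z) or not (rel(x, y) and rel(y, z))
                     for x, y, z in product(elems, repeat=3))
    antisymmetric = all(x == y or not (rel(x, y) and rel(y, x))
                        for x, y in product(elems, repeat=2))
    total = all(rel(x, y) or rel(y, x) for x, y in product(elems, repeat=2))
    return reflexive and transitive and antisymmetric and total


rounds = range(4)
natural_order = {(a, b) for a in rounds for b in rounds if a <= b}
print(satisfies_total_order_axioms(rounds, natural_order))   # True

divides = {(a, b) for a in (1, 2, 3) for b in (1, 2, 3) if b % a == 0}
print(satisfies_total_order_axioms((1, 2, 3), divides))      # False: 2 and 3 are incomparable
```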
3.1.2. Expressing Higher-Order Logic
Another hurdle to using first-order logic is the fact that algorithms and their invariants often use sets and functions as first class values, e.g. by quantifying over them, sending them in a message, etc. Consider an algorithm in which messages contain a set of nodes as one of the message fields. Then, the set of messages sent so far (which may be part of the state of the system) is a set of tuples, where one of the elements in the tuples is itself a set of nodes. Similarly, messages may contain maps, which are naturally modeled by functions (e.g., a message may contain a map from nodes to values). In such cases, the invariants needed to prove the algorithms will usually include higher-order quantification.
While higher-order logic cannot be fully reduced to first-order logic, it is well known that we can partly express higher-order concepts in first-order logic in the following way.
Suppose we want to express quantification over sets of nodes. We add a new sort called nodeset, and a binary relation $\mathit{member} \subseteq \mathit{node} \times \mathit{nodeset}$. We then use $\mathit{member}(n, s)$ instead of $n \in s$, and express quantification over sets of nodes as quantification over nodeset. Typically, we will need to add first-order assumptions or axioms to correctly express the algorithm and to prove its inductive invariant. For example, the algorithm may set $s$ to the empty set as part of a transition. We can translate this in the transition relation using $\forall n{:}\mathit{node}.\, \neg \mathit{member}(n, s')$ (where $s'$ is the value of $s$ after the transition).
Functions can be encoded as first-order elements in a similar way. Suppose messages in the algorithm contain a map from nodes to values. In this case, we can add a new first-order sort called map, and a function symbol $\mathit{apply}: \mathit{map} \times \mathit{node} \to \mathit{value}$. Then, we can use $\mathit{apply}(m, n)$ instead of $m(n)$, and replace quantification over functions with quantification over the first-order sort map. As before, we may need to add axioms that capture some of the intended second-order meaning of the sort map.
While this encoding is sound (as long as we only use axioms that are valid in the higher-order interpretation), it cannot be made complete due to the limitation of first-order logic. However, we did not experience this incompleteness to be a practical hurdle for verification in first-order logic.
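The following Python sketch shows a hypothetical finite structure for the encoding of sets and maps described above; the sort names, the relation `member`, the function `apply`, and all element names are illustrative. It also shows why extra axioms may be needed: nothing in the encoding forces two distinct nodeset elements to denote different sets.

```python
# Hypothetical finite structure: sets of nodes are opaque elements of sort
# nodeset, related to nodes only through the binary relation `member`;
# maps are elements of sort map, read through the function `apply`.
NODES = ("n1", "n2", "n3")
NODESETS = ("s0", "s1", "s2", "s3")
MAPS = ("m1",)

member = {("n1", "s1"), ("n2", "s1"), ("n2", "s2"), ("n3", "s2")}
apply = {("m1", "n1"): "v1", ("m1", "n2"): "v2", ("m1", "n3"): "v1"}


def members(s):
    """The nodes n with member(n, s): the set that the element s stands for."""
    return frozenset(n for n in NODES if (n, s) in member)


def is_empty(s, member_rel):
    """forall n:node. not member(n, s), the formula used to express emptiness."""
    return all((n, s) not in member_rel for n in NODES)


def extensional():
    """Do distinct nodeset elements denote distinct sets? Not implied by the encoding."""
    denoted = [members(s) for s in NODESETS]
    return len(set(denoted)) == len(denoted)


# A transition that empties s1: the post-state relation has no member(n, s1) facts.
member_after = {(n, s) for (n, s) in member if s != "s1"}
print(is_empty("s1", member_after), extensional())  # True False (s0 and s3 are both empty)
```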
3.1.3. Semi-Bounded Verification
Given a transition system in first-order logic with a candidate inductive invariant, it may still be undecidable to check the resulting verification condition. However, bounded verification is decidable, and extremely useful for debugging the model before continuing with the efforts of unbounded verification. Contrary to the usual practice of bounding the number of elements in each sort for bounded verification, we use the quantifier alternation graph to determine only a subset of the sorts to bound in order to make verification decidable. We call this procedure semi-bounded verification, and it follows from the observation that whenever we make a sort bounded, we can remove its node from the quantifier alternation graph. When the resulting graph becomes acyclic, satisfiability is decidable without bounding the sizes of the remaining sorts.
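The section does not prescribe how the subset of sorts to bound is chosen. As one possible choice (our assumption), the sketch below searches for a smallest set of sorts whose removal leaves the quantifier alternation graph acyclic. The brute-force search is exponential in the number of sorts, so it is only practical for graphs with a handful of sorts.

```python
from itertools import combinations


def is_acyclic(edges, removed=frozenset()):
    """Kahn's algorithm on the graph with the `removed` sorts deleted."""
    nodes = (set(edges) | {v for vs in edges.values() for v in vs}) - set(removed)
    indegree = {u: 0 for u in nodes}
    for u in nodes:
        for v in edges.get(u, ()):
            if v in nodes:
                indegree[v] += 1
    ready = [u for u in nodes if indegree[u] == 0]
    visited = 0
    while ready:
        u = ready.pop()
        visited += 1
        for v in edges.get(u, ()):
            if v in nodes:
                indegree[v] -= 1
                if indegree[v] == 0:
                    ready.append(v)
    return visited == len(nodes)  # every node popped: no cycle remains


def sorts_to_bound(edges):
    """Smallest set of sorts whose removal leaves the graph acyclic."""
    sorts = sorted(set(edges) | {v for vs in edges.values() for v in vs})
    for k in range(len(sorts) + 1):
        for candidate in combinations(sorts, k):
            if is_acyclic(edges, candidate):
                return set(candidate)


graph = {"node": {"quorum"}, "quorum": {"node"}, "round": {"value"}}
print(sorts_to_bound(graph))  # {'node'}
```

Bounding node breaks the only cycle, so quorum, round and value can stay unbounded while the check remains decidable.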
3.2. Transformation to EPR Using Derived Relations
The second step in our methodology for decidable unbounded verification is to transform the model expressed in first-order logic to a model that has an inductive invariant whose verification condition is in EPR, and is therefore decidable to check. The methodology is manual, but following it ensures soundness of the verification process. The key idea is to use derived relations to simplify the transition relation and the inductive invariant. Derived relations extend the state of the system and are updated in its transitions. Derived relations are somewhat analogous to ghost variables. However there are two key differences. First, derived relations are typically not used to record the history of an execution. Instead, they capture properties of the current state in a way that facilitates verification using EPR. Second, derived relations are not only updated in the transitions, but can also affect them.
The transformation of the model using derived relations is conducted in steps, as detailed below. The various steps are depicted in Fig. 2. The inputs provided by the user are depicted by rectangles, while the automated procedures are depicted as hexagons, and their outputs are depicted as ellipses. As illustrated by the figure, the user is guided by the quantifier alternation graph of the verification conditions.
In the sequel, we fix a model $M$ over a vocabulary $\Sigma$ and let INIT and TR denote its initial condition and transition relation, respectively.
(1) Defining a derived relation
In the first and most creative part of the process, the user identifies an existentially quantified formula $\psi(\bar{x})$ that will be captured by a derived relation $r$. The selection of $\psi$ is guided by the quantifier alternation graph of the verification condition, with the purpose of eliminating cycles it contains. Quantifier alternations in the verification condition originate both from the model and the inductive invariant. As we shall see, using $r$ will allow us to eliminate some quantifier alternations. As an example for demonstrating the next steps, consider a program defined with a binary relation $R$, and let $r$ be a derived relation capturing the formula $\psi(x) = \exists y.\, R(x, y)$.
(2) Tracking $\psi$ by $r$
This step automatically extends the model $M$ into a model $M_r$ over vocabulary $\Sigma_r = \Sigma \uplus \{r\}$, which makes the same transitions as before, but also updates $r$ to capture $\psi$. Formally, the transformed model $M_r = (\mathit{INIT}_r, \mathit{TR}_r)$ over $\Sigma_r$ is obtained by adding: (i) an initial condition that initializes $r$, and (ii) update code that modifies $r$ whenever the relations mentioned in $\psi$ are modified. The initial condition and update code are automatically generated in a way that guarantees that the following formula is an invariant of $M_r$:
$$\forall \bar{x}.\; r(\bar{x}) \leftrightarrow \psi(\bar{x})$$
We call this invariant the representation invariant of $r$. Our scheme for automatically obtaining $\mathit{INIT}_r$ and $\mathit{TR}_r$, and the class of formulas that it supports, are discussed in Section 3.3. In our example, suppose that initially $R$ is empty. Then, the resulting model would initialize $r$ to be empty as well. For an action that inserts a pair $(a, b)$ into $R$, the resulting model would contain update code that inserts $a$ into $r$.
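A minimal Python sketch of the tracking step for the running example, assuming $\psi(x) = \exists y.\, R(x, y)$ and, as stated earlier, that $R$ is only modified by inserting a single pair at a time. The class and method names are hypothetical. The point is that inserting $(a, b)$ can only change the truth of $\psi$ at $a$, so the generated update is a single insertion into $r$.

```python
import random


class TrackedModel:
    """Toy model with a binary relation R and a derived relation r that
    tracks psi(x) = exists y. R(x, y)."""

    def __init__(self):
        self.R = set()  # INIT: R is empty ...
        self.r = set()  # ... so the generated initial condition makes r empty too

    def insert_R(self, a, b):
        """Action that inserts the pair (a, b) into R."""
        self.R.add((a, b))
        # Generated update code: inserting one pair can only make psi(a) true,
        # and cannot change psi(x) for any x != a.
        self.r.add(a)

    def representation_invariant(self, domain):
        """forall x. r(x) <-> exists y. R(x, y), checked over a finite domain."""
        return all((x in self.r) == any((x, y) in self.R for y in domain)
                   for x in domain)


# Usage: apply random insertions and check the representation invariant each time.
rng = random.Random(0)
model = TrackedModel()
for _ in range(20):
    model.insert_R(rng.randrange(5), rng.randrange(5))
    assert model.representation_invariant(range(5))
```

Supporting deletions from R would need a different update, since r(a) would have to be recomputed from the remaining pairs; that case is outside this sketch.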
(3) Rewriting the transitions using $r$
In this step, the user exploits $r$ to eliminate quantifier alternations in the verification condition by rewriting the system's transitions, obtaining a model $M' = (\mathit{INIT}', \mathit{TR}')$ defined over $\Sigma_r$. The idea is to rewrite the transitions in a way that ensures that the reachable states are unchanged, while eliminating quantifier alternations. This is done by rewriting some program conditions used in assume commands in the code (e.g., to use $r(\bar{x})$ instead of $\psi(\bar{x})$, but other rewrites are also possible). The vocabulary of the model does not change further in this step, nor does the initial condition (i.e., $\mathit{INIT}' = \mathit{INIT}_r$).
While the rewrites are performed by the user, we automatically check that the effect of the modified commands on the reachable states remains the same (under the assumption of the representation invariant). Suppose the user rewrites assume $\varphi$ to assume $\varphi'$. The simplest way to ensure this has the same effect on the reachable states is to check that the following rewrite condition is valid: $(\forall \bar{x}.\, r(\bar{x}) \leftrightarrow \psi(\bar{x})) \to (\varphi \leftrightarrow \varphi')$. This condition guarantees that $\varphi$ and $\varphi'$ are equivalent in any reachable state, due to the representation invariant. In some cases, the rewrite is such that $\varphi'$ is syntactically identical to $\varphi[r(\bar{x})/\psi(\bar{x})]$ (i.e., $\varphi$ with every occurrence of $\psi(\bar{x})$ replaced by $r(\bar{x})$), which makes the rewrite condition trivial.
However, to allow greater flexibility in rewriting the code, we allow using an EPR check to verify the rewrite condition, and also relax the condition given above in two ways. First, we observe that it suffices to verify the equivalence of the subformulas of $\varphi$ that were modified by the rewrite. Formally, if $\varphi'$ is syntactically identical to $\varphi[\varphi'_1/\varphi_1, \ldots, \varphi'_n/\varphi_n]$, then to establish the rewrite condition, it suffices to prove that for every $1 \le i \le n$ the following formula is valid: $(\forall \bar{x}.\, r(\bar{x}) \leftrightarrow \psi(\bar{x})) \to (\varphi_i \leftrightarrow \varphi'_i)$. (The case where $\varphi$ was completely modified is captured by the case where $n = 1$, $\varphi_1 = \varphi$, and $\varphi'_1 = \varphi'$.) Second, and more importantly, recall that we are only interested in preserving the transitions from reachable states of the system. Thus, we allow the user to provide an auxiliary invariant $I$ (by default $I = \mathit{true}$), which is used to prove that the reachable transitions remain unchanged after the transformation. Technically, this is done by automatically checking that:
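- $I$ is an inductive invariant of $M_r$, and
- the following rewrite condition holds for every $1 \le i \le n$:
$$\big(I \wedge (\forall \bar{x}.\, r(\bar{x}) \leftrightarrow \psi(\bar{x})) \wedge \gamma\big) \to (\varphi_i \leftrightarrow \varphi'_i)$$
where $\gamma$ captures additional conditions that guard the modified assume command ($\gamma$ is automatically computed from the program).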
These conditions guarantee that $\varphi$ and $\varphi'$ are equivalent whenever the modified assume command is executed. To ensure that these checks can be done automatically, we require that the corresponding formulas are in EPR. We note that verifying $I$ for $M_r$ can be possible in EPR even in cases where verifying safety of $M_r$ is not in EPR, since $I$ can be weaker (and contain fewer quantifier alternations) than an invariant that proves safety.
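In our example, suppose the program contains the command assume $\exists y.\, R(x, y)$. Then we could rewrite it to assume $r(x)$. For a more sophisticated example, suppose that the program contains a command assume $\varphi$, where $\varphi$ is not equivalent to $r(x)$ in general, and suppose this command is guarded by the condition $\gamma$ (i.e., the assume only happens if $\gamma$ holds). Suppose further that we can verify that an auxiliary formula $I$ is an invariant of the original system. Then we could still rewrite the assume command as assume $r(x)$, provided that $I \wedge (\forall x.\, r(x) \leftrightarrow \exists y.\, R(x, y)) \wedge \gamma \models \varphi \leftrightarrow r(x)$.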
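The following sketch checks a guarded rewrite condition by enumerating every structure over a two-element domain, as a bounded stand-in for the EPR validity check. The relations `S` and `T`, the guard `T(x)`, and the auxiliary invariant $\forall x, y.\, T(x) \wedge R(x, y) \to S(y)$ are hypothetical, chosen only to illustrate the sophisticated case: rewriting assume $\exists y.\, R(x, y) \wedge S(y)$ to assume $r(x)$ is sound under the auxiliary invariant and the guard, but not without them.

```python
from itertools import chain, combinations, product

D = (0, 1)  # bounded domain: the check covers only structures of this size
PAIRS = list(product(D, D))


def subsets(xs):
    xs = list(xs)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(xs, k) for k in range(len(xs) + 1))]


def rep_inv(R, r):
    """Representation invariant: forall x. r(x) <-> exists y. R(x, y)."""
    return all((x in r) == any((x, y) in R for y in D) for x in D)


def aux_inv(R, S, T):
    """Hypothetical auxiliary invariant I: forall x, y. T(x) and R(x, y) -> S(y)."""
    return all(y in S or not (x in T and (x, y) in R) for x in D for y in D)


def phi(R, S, x):
    """Original assume condition: exists y. R(x, y) and S(y)."""
    return any((x, y) in R and y in S for y in D)


def phi_rewritten(r, x):
    """Rewritten assume condition: r(x)."""
    return x in r


def find_rewrite_violation(use_aux_invariant):
    """Search for a structure where the premises hold but phi and r(x) differ."""
    for R, r, S, T in product(subsets(PAIRS), subsets(D), subsets(D), subsets(D)):
        for x in D:
            guard = x in T  # gamma: the assume is only reached when T(x) holds
            premises = rep_inv(R, r) and guard and (
                not use_aux_invariant or aux_inv(R, S, T))
            if premises and phi(R, S, x) != phi_rewritten(r, x):
                return (set(R), set(r), set(S), set(T), x)
    return None


print(find_rewrite_violation(True))   # None: the rewrite holds on every such structure
print(find_rewrite_violation(False))  # some counterexample to the rewrite
```

Without the auxiliary invariant, a structure such as $R = \{(0, 0)\}$, $S = \emptyset$, $T = \{0\}$ satisfies the premises while $\exists y.\, R(0, y) \wedge S(y)$ is false and $r(0)$ is true.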
(4) Providing an inductive invariant
Finally, the user proves the safety of the transformed model $M'$ by providing an inductive invariant $\mathit{INV}'$ for it, whose verification condition will be in EPR. Usually this is composed of: (i) Using $r$ in the inductive invariant as a substitute for $\psi$. The point here is that using $\psi$ would introduce quantifier alternations, and using $r$ instead avoids them. In our example, the safety proof might require a property in which $\exists y.\, R(x, y)$ appears in the scope of a universal quantifier over $x$, and using $r$ we can express this property with $r(x)$ in place of $\exists y.\, R(x, y)$. (ii) Letting the inductive invariant express some properties that are implied by the representation invariant. Note that expressing the full representation invariant would typically introduce quantifier alternations that break stratification. However, some properties implied by it may still be expressible while keeping the verification condition in EPR. In our example, we may add $\forall x, y.\, R(x, y) \to r(x)$ to the inductive invariant. Note that adding $\forall x.\, r(x) \to \exists y.\, R(x, y)$ to the inductive invariant would take the verification condition outside of EPR.
Given and , we can now automatically derive the verification conditions in EPR and check that they hold. The following theorem summarizes the soundness of the approach:
Theorem 3.1 (Soundness).
Let $M$ be a model over vocabulary $\Sigma$, and $P$ be a safety property over $\Sigma$. If $M'$ is a model obtained by the above procedure, and $\mathit{INV}'$ is an inductive invariant for it such that $\mathit{INV}' \models P$, then $P$ holds in all reachable states of $M$.
Proof. Let $B = \{(s, s') \mid s \in \mathcal{R}(M),\ s' \in \mathcal{R}(M'),\ s = s'|_{\Sigma}\}$, where $\mathcal{R}(M)$ and $\mathcal{R}(M')$ denote the reachable states of $M$ and $M'$, respectively, and $s'|_{\Sigma}$ denotes the projection of a state $s'$ (defined over $\Sigma_r$) to $\Sigma$. Steps 2 and 3 of the transformation above ensure that $B$ is a bisimulation relation between $M$ and $M'$, i.e., every transition possible in the reachable states of one of these systems has a corresponding transition in the other. This ensures that $M'$ has the same reachable states as $M$, up to the addition of relation $r$. Therefore, any safety property expressed over $\Sigma$ that is verified to hold in $M'$ also holds in $M$. ∎
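As a sanity check of the projection argument in the proof, the following sketch computes the reachable states of a hypothetical toy model $M$ and of its transformed version $M'$ by breadth-first search over a three-element domain, and compares them. In $M'$, the guard of action A2 has been rewritten from $\exists z.\, R(x, z)$ to $r(x)$, and $r$ is maintained by the generated update code; the actions and the domain are illustrative, not taken from the paper.

```python
from collections import deque

D = (0, 1, 2)  # hypothetical bounded domain


def succ_M(R):
    """Successors in the original model M (state: binary relation R)."""
    for b in D:                               # A1: insert (0, b)
        yield R | {(0, b)}
    for x in D:                               # A2: assume exists z. R(x, z); insert (y, x)
        if any((x, z) in R for z in D):
            for y in D:
                yield R | {(y, x)}


def succ_M_prime(state):
    """Successors in the transformed model M' (state: R plus derived r)."""
    R, r = state
    for b in D:                               # A1 plus generated update of r
        yield (R | {(0, b)}, r | {0})
    for x in D:                               # A2 with the guard rewritten to r(x)
        if x in r:
            for y in D:
                yield (R | {(y, x)}, r | {y})


def reachable(init, succ):
    """Breadth-first search over the (finite) reachable state space."""
    seen, frontier = {init}, deque([init])
    while frontier:
        s = frontier.popleft()
        for t in succ(s):
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    return seen


reach_M = reachable(frozenset(), succ_M)
reach_M_prime = reachable((frozenset(), frozenset()), succ_M_prime)
# Projecting away r yields exactly the reachable states of M ...
print({R for R, _ in reach_M_prime} == reach_M)  # True
# ... and r satisfies its representation invariant in every reachable state.
print(all(r == frozenset(x for x in D if any((x, y) in R for y in D))
          for R, r in reach_M_prime))  # True
```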
As shown in the proof of Theorem 3.1, the transformed model is bisimilar to the original model. While this ensures that both are equivalent w.r.t. the safety property, note that we check safety by checking inductiveness of a candidate invariant. Unlike safety, inductiveness is not necessarily preserved by the transformation. Namely, given a candidate inductive invariant $\mathit{INV}'$ which is not inductive for $M'$, the counterexample to induction cannot in general be transformed to the original model, as it might depend on the derived relations and the rewritten assume commands. An example of this phenomenon appears in Section 6.2.
Using the methodology
Our description above explains a final, successful verification using the proposed methodology. As always, obtaining it involves a series of attempts, where in each attempt the user provides the verification inputs and gets a counterexample. Each counterexample guides the user to modify the verification inputs, until verification eventually succeeds. As depicted in Fig. 2, with the EPR verification methodology, the user provides 5 inputs and can obtain 3 kinds of counterexamples. The inputs are the model, the derived relations, the rewrites, the auxiliary invariant for proving the soundness of the rewrites, and finally the inductive invariant for the resulting model. The possible counterexamples are a counterexample to induction (CTI) for the auxiliary invariant and the original model, a counterexample to the soundness of a rewrite, or a CTI for the inductive invariant of the transformed model. After obtaining any of the 3 kinds of counterexamples, the user can modify any one of the 5 inputs. For example, a CTI for the inductive invariant of the transformed model may be eliminated by changing the inductive invariant itself, but it may also be overcome by an additional rewrite, which in turn requires an auxiliary invariant for its soundness proof. Indeed, we shall see an example of this in Section 6.2.
The task of managing the inter-dependence between the 5 verification inputs may seem daunting, and indeed it requires some expertise and creativity from the user. This is expected, since the inputs from the user reduce the undecidable problem of safety verification to decidable EPR checks. This burden on the user is eased by the fact that for every input, the user always obtains an answer from the system, either in the form of successful verification, or in the form of a finite counterexample, which is displayed graphically and guides the user towards the solution. Furthermore, our experience shows that most of the creative effort is reusable across similar protocols. In the verification of all the variants of Paxos we consider in this work, we use the same two derived relations and very similar rewrites (as explained in Sections 6 and 8).
Incompleteness of EPR verification
While the transformation using a given set of derived relations and rewrites results in a bisimilar transition system, the methodology for EPR verification is not complete. This is expected, as there can be no complete proof system for safety in a formalism that is Turing-complete. For the EPR verification methodology, the incompleteness can arise from several sources. It may happen that after applying the transformation, the resulting transition system, while safe, cannot be verified with an inductive invariant that results in EPR verification conditions. Another potential source for incompleteness is our requirement that the rewrites should also be verified in EPR. It can be the case that a certain (sound) rewrite leads to a system that can be verified using EPR, but the soundness of the rewrite itself cannot be verified using EPR. Another potential source of incompleteness can be the inability to express sufficiently powerful axioms about the underlying domain. We note that the three mentioned issues interact with each other, as it may be the case that a certain axiom is expressible in first-order logic, but it happens to introduce a quantifier alternation cycle, when considered together with either the inductive invariant or the verification conditions for the rewrites.
We consider developing a proof-theoretic understanding of which systems can and cannot be verified using EPR to be an intriguing direction for future investigation. We are encouraged by the fact that in practice, the proposed methodology has proven itself to be powerful enough to verify Paxos and its variants considered in this work.
Multiple derived relations
For simplicity, the description above considered a single derived relation. In practice, we usually add multiple derived relations, where each one captures a different formula. The methodology remains the same, and each derived relation allows us to transform the model and eliminate more quantifier alternations, until the resulting model can be verified in EPR. In this case, the resulting inductive invariant may include properties implied by the representation invariants of several relations and relate them directly. For example, suppose we add the following derived relations: defined by , and defined by . Then, the inductive invariant may include the property: .
Overapproximating the reachable states
Our methodology ensures that the transformed model is bisimilar to the original model. It is possible to generalize our methodology and only require that the modified model simulates the original model, which maintains soundness. This may allow more flexibility both in the update code and in the manual rewrites performed by the user.
3.3. Automatic Generation of Update Code
In this subsection, we describe a rather naïve scheme for automatic generation of initial condition and update code for derived relations, which suffices for verification of the Paxos variants considered in this paper. We refer the reader to, e.g., (Paige and Koenig, 1982; Reps et al., 2010), for more advanced techniques for generation of update code for derived relations.
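We limit the formula that defines a derived relation $d$ to have the following form:
$$\psi(x_1, \ldots, x_n) = \exists y_1, \ldots, y_m.\ r(z_1, \ldots, z_k) \land \varphi(x_1, \ldots, x_n, y_1, \ldots, y_m)$$
where $\varphi$ is a quantifier-free formula, $r$ is a relation symbol, and $z_i \in \{x_1, \ldots, x_n, y_1, \ldots, y_m\}$ for every $1 \le i \le k$. Note that $r$ occurs positively, and that it depends on some (possibly none) of the variables $x_1, \ldots, x_n$ and on all of the variables $y_1, \ldots, y_m$. Our scheme further requires that the relations appearing in $\varphi$ are never modified, and that $r$ is initially empty and only updated by inserting a single tuple at a time. (These restrictions can be relaxed, e.g., to support removal of a single tuple or addition of multiple tuples. However, such updates were not needed for verification of the protocols considered in this paper, so for simplicity of the presentation we do not handle them.)
Since $r$ is initially empty, the initial condition for $d$ is that it is empty as well, i.e.:
$$\forall x_1, \ldots, x_n.\ \neg d(x_1, \ldots, x_n)$$
The only updates allowed for $r$ are insertions of a single tuple $(t_1, \ldots, t_k)$ by a command of the form:
$$r(t_1, \ldots, t_k) := \mathit{true}$$
For such an update, we generate the following update for $d$:
$$d(x_1, \ldots, x_n) := d(x_1, \ldots, x_n) \lor \exists y_1, \ldots, y_m.\ \Big(\bigwedge_{i=1}^{k} z_i = t_i\Big) \land \varphi(x_1, \ldots, x_n, y_1, \ldots, y_m)$$
Since every $y_j$ occurs among $z_1, \ldots, z_k$, the equalities $z_i = t_i$ determine the value of each $y_j$, so the existential quantifier can be eliminated by substituting the corresponding $t_i$ for each $y_j$. Because $\varphi$ is quantifier-free, the update code therefore translates to a purely universally quantified formula, so it does not introduce any quantifier alternations.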
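Lemma 3.2.
The above scheme results in a model that maintains the representation invariant $\forall x_1, \ldots, x_n.\ d(x_1, \ldots, x_n) \leftrightarrow \psi(x_1, \ldots, x_n)$.
Proof. The representation invariant is an inductive invariant of the resulting model. Initiation is trivial, since both $r$ and $d$ are initially empty, so $\psi$ is initially false everywhere. Consecution follows from the following equivalence, which is valid in first-order logic (writing $\bar{x}, \bar{y}, \bar{z}, \bar{t}$ for the tuples of variables and terms above, and recalling that the relations in $\varphi$ are never modified):
$$\exists \bar{y}.\ \big(r(\bar{z}) \lor \bar{z} = \bar{t}\big) \land \varphi(\bar{x}, \bar{y}) \;\leftrightarrow\; \big(\exists \bar{y}.\ r(\bar{z}) \land \varphi(\bar{x}, \bar{y})\big) \lor \big(\exists \bar{y}.\ \bar{z} = \bar{t} \land \varphi(\bar{x}, \bar{y})\big)$$
The left-hand side is $\psi$ after the insertion into $r$, and the right-hand side is $d$ after the generated update, assuming $d \leftrightarrow \psi$ held before. ∎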
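The scheme and Lemma 3.2 can be exercised on a finite domain. The following Python sketch is illustrative only: the class `DerivedRelation`, the helper `evaluate_psi`, and the encoding of $\varphi$ as a predicate over a variable binding are hypothetical names of ours, not part of the verification tool. After each insertion into $r$, it applies the generated update and checks that $d$ still coincides with a direct evaluation of $\psi$.

```python
from itertools import product


class DerivedRelation:
    """Derived relation d(xs) defined by psi(xs) = exists ys. r(zs) and phi(xs, ys)."""

    def __init__(self, xs, ys, zs, phi):
        # Scheme restriction: every existential variable occurs among the zs,
        # and every z is one of the xs or ys.
        assert set(ys) <= set(zs) <= set(xs) | set(ys)
        self.xs, self.ys, self.zs, self.phi = xs, ys, zs, phi
        self.tuples = set()  # initial condition: r is empty, hence so is d

    def on_insert(self, t, domain):
        """Generated update for r(t) := true:
        d(xs) := d(xs) or exists ys. (zs = t and phi(xs, ys))."""
        binding = {}
        for z, value in zip(self.zs, t):
            if binding.setdefault(z, value) != value:
                return  # zs = t is unsatisfiable: a repeated variable needs two values
        # Every y is now fixed by zs = t, so only the xs absent from zs remain free.
        free = [x for x in self.xs if x not in binding]
        for values in product(domain, repeat=len(free)):
            env = {**binding, **dict(zip(free, values))}
            if self.phi(env):
                self.tuples.add(tuple(env[x] for x in self.xs))


def evaluate_psi(rel, r_tuples, domain):
    """Evaluate psi directly, i.e., the right-hand side of the representation invariant."""
    result = set()
    for xvals in product(domain, repeat=len(rel.xs)):
        for yvals in product(domain, repeat=len(rel.ys)):
            env = {**dict(zip(rel.xs, xvals)), **dict(zip(rel.ys, yvals))}
            if tuple(env[z] for z in rel.zs) in r_tuples and rel.phi(env):
                result.add(xvals)
                break
    return result


# Example over a single integer domain (an assumption), shaped like left_round:
# psi(n, r) = exists r1, rmax, v. join_ack_msg(n, r1, rmax, v) and r < r1
domain = range(3)
left = DerivedRelation(xs=("n", "r"), ys=("r1", "rmax", "v"),
                       zs=("n", "r1", "rmax", "v"), phi=lambda e: e["r"] < e["r1"])
join_ack_msg = set()
for msg in [(0, 2, 0, 1), (1, 1, 0, 0), (0, 1, 0, 0)]:
    join_ack_msg.add(msg)        # r(t) := true
    left.on_insert(msg, domain)  # generated update for d
    assert left.tuples == evaluate_psi(left, join_ack_msg, domain)
print(sorted(left.tuples))  # prints [(0, 0), (0, 1), (1, 0)]
```

Note that `on_insert` never enumerates the existential variables: the equalities with the inserted tuple fix them, which is the reason the generated update stays alternation-free.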
4. Introduction to Paxos
A popular approach for implementing distributed systems is state-machine replication (SMR) (Schneider, 1990), where a (virtual) centralized sequential state machine is replicated across many nodes (processors), providing fault-tolerance and exposing to its clients the familiar semantics of a centralized state machine. SMR can be thought of as repeatedly agreeing on a command to be executed next by the state machine, where each time agreement is obtained by solving a consensus problem. In the consensus problem, a set of nodes each propose a value and then reach agreement on a single proposal.
The Paxos family of protocols is widely used in practice for implementing SMR. Its core is the Paxos consensus algorithm (Lamport, 1998, 2001). A Paxos-based SMR implementation executes a sequence of Paxos consensus instances, with various optimizations. The rest of this section explains the Paxos consensus algorithm (whose verification in EPR we discuss in Sections 5 and 6). We return to the broader context of SMR in Section 8.
We consider a fixed set of nodes, which operate asynchronously and communicate by message passing, where every node can send a message to every node. Messages can be lost, duplicated, and reordered, but they are never corrupted. Nodes can fail by stopping, but otherwise faithfully execute their algorithm. A stop failure of a node can be captured by a loss of all messages to and from this node. Nodes must solve the consensus problem: each node has a value to propose and all nodes must eventually decide on a unique value among the proposals.
Paxos consensus algorithm
We assume that nodes in the Paxos consensus algorithm can all propose values, vote for values, and learn about decisions. The algorithm operates in a sequence of numbered rounds in which nodes can participate. At any given time, different nodes may operate in different rounds, and a node stops participating in a round once it has started participating in a higher round. Each round is associated with a single node that is the owner of that round. This association from rounds to nodes is static and known to all nodes.
Every round represents a possibility for its owner to propose a value to the other nodes and get it decided on by having a quorum of the nodes vote for it in the round. Quorums are sets of nodes such that any two quorums intersect (e.g., sets consisting of a strict majority of the nodes). To avoid the algorithm being blocked by the stop failure of a node which made a proposal, any node can start one of its rounds and make a new proposal in it at any time (in particular, when other rounds are still active) by executing the following two phases:
- phase 1:
The owner $n$ of round $r$ starts the round by communicating with the other nodes to have a majority of them join round $r$, and to determine which values are choosable in rounds lower than $r$, i.e., values that might have been or can still be decided in rounds lower than $r$.
- phase 2:
If a value $v$ is choosable in some round $r' < r$, then, in order not to contradict a potential decision in $r'$, node $n$ proposes $v$ in round $r$. If no value is choosable in any $r' < r$, then $n$ proposes a value of its choice in round $r$. If a majority of nodes vote in round $r$ for $n$'s proposal, then it becomes decided.
Note that it is possible for different values to be proposed in different rounds, and also for several decisions to be made in different rounds. Safety is guaranteed by the fact that (by definition of choosable) a value $v$ can be decided in a round $r$ only if it is choosable in $r$, and that if a value $v$ is choosable in round $r$, then a node proposing in any round $r' > r$ will only propose $v$. The latter relies on the property that choosable values from prior rounds cannot be missed. Next, we describe in more detail which messages the nodes exchange and how a node makes sure not to miss any choosable value from prior rounds.
Phase 1a: The owner $n$ of round $r$ sends a “start-round” message, requesting all nodes to join round $r$.
Phase 1b: Upon receiving a request to join round $r$, a node will only join if it has not yet joined a higher round. If it agrees to join, it responds with a “join-acknowledgment” message that also contains its maximal vote so far, i.e., its vote in the highest round prior to $r$, or $\bot$ if no such vote exists. By sending the join-acknowledgment message, the node promises that it will not join or vote in any round smaller than $r$.
Phase 2a: After $n$ receives join-acknowledgment messages from a quorum of the nodes, it proposes a value for round $r$ by sending a “propose” message to all nodes. Node $n$ selects the value by taking the maximal vote reported by the nodes in their join-acknowledgment messages, i.e., the value that was voted for in the highest round prior to $r$ by any of the nodes whose join-acknowledgment messages formed the quorum. As we will see, out of all proposals from lower rounds, only this value can be choosable in any $r' < r$. If all of these nodes report that they have not voted in any prior round, then $n$ may propose any value.
Phase 2b: Upon receiving a propose message proposing value $v$ for round $r$, a node ignores it if it already joined a round higher than $r$; otherwise it votes for it by sending a vote message to all nodes. Whenever a quorum of nodes vote for a value in some round, this value is considered to be decided. Nodes learn this by observing the vote messages.
Note that a node can successfully start a new round or get a value decided only if at least one quorum of nodes is responsive. When quorums are taken to be sets consisting of a strict majority of the nodes, this means that Paxos tolerates the failure of at most $\lfloor (N-1)/2 \rfloor$ nodes, where $N$ is the total number of nodes. Moreover, Paxos may be caught in a live-lock if nodes keep starting new rounds before any value has a chance to be decided on.
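As a quick sanity check of this arithmetic, the following Python sketch enumerates the strict-majority quorums of small clusters, confirms that any two of them intersect, and computes how many stop failures still leave one quorum fully responsive. Numbering nodes $0, \ldots, N-1$ and the function names are our own illustrative choices.

```python
from itertools import combinations


def majority_quorums(n):
    """All sets of nodes 0..n-1 that contain a strict majority of the nodes."""
    smallest = n // 2 + 1
    return [frozenset(c) for k in range(smallest, n + 1)
            for c in combinations(range(n), k)]


def pairwise_intersect(quorums):
    """True iff every two quorums (including a quorum with itself) share a node."""
    return all(q1 & q2 for q1 in quorums for q2 in quorums)


def tolerated_failures(n):
    # Progress needs one fully responsive quorum, i.e., n // 2 + 1 live nodes.
    return n - (n // 2 + 1)


for n in range(1, 8):
    assert pairwise_intersect(majority_quorums(n))
    assert tolerated_failures(n) == (n - 1) // 2
print([tolerated_failures(n) for n in range(1, 8)])  # prints [0, 0, 1, 1, 2, 2, 3]
```

For even $N$, two halves of the nodes would be disjoint, which is why quorums must be strict majorities rather than any half.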
5. Paxos in First-Order Logic
The first step of our verification methodology is to model the Paxos consensus algorithm as a transition system in many-sorted first-order logic over uninterpreted domains. This section explains our model, listed in Fig. 3, as well as its safety proof via an inductive invariant.
5.1. Model of the Protocol
Our model of Paxos involves some abstraction. Since each round $r$ has a unique owner that exclusively proposes in $r$, we abstract away the owner node and treat the round itself as the proposer. We also abstract the mechanism by which nodes receive the values up for proposal, and allow them to propose arbitrary values.
Additional abstractions are needed, as some aspects of the protocol cannot be fully expressed in uninterpreted first-order logic. One such aspect is the fact that round numbers are integers, since arithmetic cannot be fully captured in first-order logic. Another aspect that must be abstracted is the use of sets of nodes, since quantification over sets is also beyond first-order logic. We model these aspects according to the principles of Section 3.1:
Sorts and Axioms
We use the following four uninterpreted sorts: (i) node - to represent nodes of the system, (ii) value - to represent the values subject to the consensus algorithm, (iii) round - to model the rounds of Paxos, and (iv) quorum - to model sets of nodes with pairwise intersection in a first-order abstraction. While nodes and values are naturally uninterpreted, the rounds and the quorums are uninterpreted representations of interpreted concepts: integers and sets of nodes that intersect pairwise, respectively. We express some features that come from the desired interpretation using relations and axioms.
For rounds, we include a binary relation $\le$ and axiomatize it to be a total order (Fig. 1). Our model also includes a constant $\bot$ of sort round, which represents a special round that is not considered an actual round of the protocol, and instead serves as a special value used in the join-acknowledgment (1b) message when a node has not yet voted for any value. Accordingly, every action assumes that the round it involves is not $\bot$.
The quorum sort is used to represent sets of nodes that contain strictly more than half of the nodes. As explained in Section 3.1, we introduce a membership relation $\mathit{member}$ between nodes and quorums. An important property for Paxos is that any two quorums intersect. We capture this with an axiom in first-order logic (Fig. 3).
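For concreteness, with $\mathit{member}(n, q)$ denoting that node $n$ belongs to quorum $q$, one way to state the intersection axiom is the formula below. It captures only the intersection property, not the majority-size requirement itself, and its $\forall\exists$ prefix is what yields an edge from quorum to node in the quantifier alternation graph.

```latex
\forall q_1, q_2 : \mathit{quorum}.\ \exists n : \mathit{node}.\ \mathit{member}(n, q_1) \land \mathit{member}(n, q_2)
```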
The state of the protocol consists of the set of messages the nodes have sent. We represent these using relations, where each tuple in a relation corresponds to a single message. The relations start_round_msg, join_ack_msg, propose_msg, vote_msg correspond to the 1a, 1b, 2a, 2b phases of the algorithm, respectively. In modeling the algorithm, we assume all messages are sent to all nodes, so the relations do not contain destination fields. Note that recording messages via relations (i.e., sets) is an abstraction of the network but it is consistent with the messaging model we assume, in which messages may be lost, duplicated, and reordered. The decision relation captures the decisions learned by the nodes.
The different atomic steps taken by the nodes in the protocol are modeled using actions. The start_round action models phase 1a of the protocol, sending a start round message to all nodes. The join_round action models the receipt of a start round message and the transmission of a join-acknowledgment (1b) message. The propose action models the receipt of join-acknowledgment (1b) messages from a quorum of nodes, and the transmission of a propose (2a) message which proposes a value for a round. The vote action models the receipt of a propose (2a) message by a node, and voting for a value by sending a vote (2b) message. Finally, the learn action models learning a decision by node $n$, when the value is voted for by a quorum of nodes.
In these actions, sending a message is expressed by inserting the corresponding tuple into the corresponding relation. Different conditions (e.g., not joining a round if a higher round was already joined, properly reporting the previous votes, or appropriately selecting the proposed value) are expressed using assume statements. To prepare a join-acknowledgment message in join_round, as well as to propose a value in propose, a node needs to compute the maximal vote (performed by it or reported to it, respectively). This is done by a max operation (Fig. 3), which operates with respect to the order on rounds, and returns the round $\bot$ and an arbitrary value in case the set is empty. The operation $(r_{\max}, v_{\max}) := \max\{(r', v') \mid \varphi(r', v')\}$ is syntactic sugar for an assume of the following formula:
$$\big(r_{\max} = \bot \land \forall r', v'.\ \neg\varphi(r', v')\big) \lor \big(r_{\max} \neq \bot \land \varphi(r_{\max}, v_{\max}) \land \forall r', v'.\ \varphi(r', v') \to r' \le r_{\max}\big) \tag{3}$$
Note that if $\varphi$ is a purely existentially quantified formula, then eq. 3 is alternation-free.
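To illustrate what eq. 3 demands, the following Python sketch computes the max operation over a finite set of (round, value) pairs and checks the result against the two disjuncts of eq. 3. Encoding $\bot$ as `None` and rounds as integers are assumptions of this sketch, and the function names are hypothetical.

```python
BOTTOM = None  # stands for the special round ⊥ (an encoding assumption)


def max_op(pairs, arbitrary_value="any"):
    """(r_max, v_max) := max{(r', v') | phi(r', v')}, with phi given as a finite set."""
    if not pairs:
        return BOTTOM, arbitrary_value  # the value is unconstrained when the set is empty
    return max(pairs, key=lambda pair: pair[0])  # order by the round component


def satisfies_eq3(pairs, r_max, v_max):
    """Evaluate the assumed formula (eq. 3) on a finite set of (round, value) pairs."""
    if r_max is BOTTOM:
        return not pairs  # first disjunct: no pair satisfies phi
    # Second disjunct: (r_max, v_max) satisfies phi and no round exceeds r_max.
    return (r_max, v_max) in pairs and all(r <= r_max for r, _ in pairs)


votes = {(1, "a"), (3, "b"), (2, "a")}
print(max_op(votes))  # prints (3, 'b')
assert satisfies_eq3(votes, *max_op(votes))
assert satisfies_eq3(set(), *max_op(set()))
assert not satisfies_eq3(votes, 2, "a")  # a non-maximal round is rejected
```

The assume formula does not fix a unique answer when two pairs share the maximal round; in the Paxos model this cannot cause ambiguity for proposals, since there is a unique proposal per round.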
5.2. Inductive Invariant
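The key safety property we wish to verify about Paxos is that only a single value can be decided (it can be decided at multiple rounds, as long as it is the same value). This is expressed by the following universally quantified formula:
$$\forall n_1, n_2, r_1, r_2, v_1, v_2.\ \mathit{decision}(n_1, r_1, v_1) \land \mathit{decision}(n_2, r_2, v_2) \to v_1 = v_2 \tag{4}$$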
While the safety property holds in all the reachable states of the protocol, it is not inductive. That is, assuming that it holds is not sufficient to prove that it still holds after an action is taken. For example, consider a state $s$ in which $\mathit{decision}(n_1, r_1, v_1)$ holds and there is a quorum $q$ of nodes such that, for every node $n$ in $q$, $\mathit{vote\_msg}(n, r_2, v_2)$ holds, with $v_2 \neq v_1$. Note that the safety property holds in $s$. However, a learn action introduces a transition from $s$ to a state $s'$ in which both $\mathit{decision}(n_1, r_1, v_1)$ and $\mathit{decision}(n_2, r_2, v_2)$ hold, violating the safety property. This counterexample to induction does not indicate a violation of safety, but it indicates that the safety property needs to be strengthened in order to obtain an inductive invariant. We now describe such an inductive invariant.
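The counterexample to induction above can be replayed on an explicit state. In the following Python sketch, three nodes with strict-majority quorums, the concrete values, and the function names are assumptions of ours; the state is deliberately unreachable (node 0 holds a decision without any supporting votes), which is exactly why the CTI does not indicate a real bug.

```python
from itertools import combinations

NODES = (0, 1, 2)
QUORUMS = [set(c) for c in combinations(NODES, 2)]  # strict majorities of 3 nodes


def safe(decision):
    """Eq. (4): all decisions, by any node in any round, carry the same value."""
    return len({v for (_, _, v) in decision}) <= 1


def learn(state, n, r, v):
    """learn action: node n records (r, v) once some quorum has voted for v in r."""
    decision, vote_msg = state
    assert any(all((m, r, v) in vote_msg for m in q) for q in QUORUMS), "not enabled"
    return decision | {(n, r, v)}, vote_msg


# State s: node 0 decided "a" in round 1, while quorum {0, 1} voted "b" in round 2.
s = ({(0, 1, "a")}, {(0, 2, "b"), (1, 2, "b")})
assert safe(s[0])
s_next = learn(s, 1, 2, "b")
assert not safe(s_next[0])  # eq. (4) holds in s but not after one learn step
```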
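Our inductive invariant contains, in addition to the safety property, the following rather simple statements that are maintained by the protocol and are required for inductiveness:
$$\forall r, v_1, v_2.\ \mathit{propose\_msg}(r, v_1) \land \mathit{propose\_msg}(r, v_2) \to v_1 = v_2 \tag{5}$$
$$\forall n, r, v.\ \mathit{vote\_msg}(n, r, v) \to \mathit{propose\_msg}(r, v) \tag{6}$$
$$\forall r, v.\ \big(\exists n.\ \mathit{decision}(n, r, v)\big) \to \exists q.\ \forall n.\ \mathit{member}(n, q) \to \mathit{vote\_msg}(n, r, v) \tag{7}$$
Equation 5 states that there is a unique proposal per round. Equation 6 states that a vote for $v$ in round $r$ is cast only when a proposal for $v$ has been made in round $r$. Equation 7 states that a decision for $v$ is made in round $r$ only if a quorum of nodes have voted for $v$ in round $r$. In addition, the inductive invariant restricts the join-acknowledgment messages so that they faithfully represent the maximal vote (up to the joined round), or $\bot$ if there are no votes so far, and also asserts that there are no actual votes at round $\bot$:
$$\forall n, r, r', v.\ \mathit{join\_ack\_msg}(n, r, r', v) \land r' \neq \bot \to r' < r \land \mathit{vote\_msg}(n, r', v) \tag{8}$$
$$\forall n, r, r', v, r'', v'.\ \mathit{join\_ack\_msg}(n, r, r', v) \land r' \neq \bot \land r' < r'' \land r'' < r \to \neg\mathit{vote\_msg}(n, r'', v') \tag{9}$$
$$\forall n, r, v, r', v'.\ \mathit{join\_ack\_msg}(n, r, \bot, v) \land r' < r \to \neg\mathit{vote\_msg}(n, r', v') \tag{10}$$
$$\forall n, v.\ \neg\mathit{vote\_msg}(n, \bot, v) \tag{11}$$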
The properties stated so far are rather straightforward, and are usually not even mentioned in paper proofs or explanations of the protocol. The key to the correctness argument of the protocol is the observation that when the owner of round $r$ proposes a value in $r$, it cannot miss any value that is choosable at a lower round: whenever a value $v$ is proposed at round $r$, then in all rounds $r' < r$, no other value is choosable. The property that no $v' \neq v$ is choosable at $r'$ is captured in the inductive invariant by the requirement that in any quorum of nodes, there must be at least one node that has already left round $r'$ (i.e., joined a higher round), and did not vote for $v'$ at $r'$ (and hence will also not vote for it in the future). Formally, this is:
$$\forall r_1, r_2, v_1, v_2, q.\ r_1 < r_2 \land \mathit{propose\_msg}(r_2, v_2) \land v_1 \neq v_2 \to \exists n, r_3, r_4, v_3.\ \mathit{member}(n, q) \land \mathit{join\_ack\_msg}(n, r_3, r_4, v_3) \land r_1 < r_3 \land \neg\mathit{vote\_msg}(n, r_1, v_1) \tag{12}$$
The fact that this property is maintained by the protocol is obtained by the proposal mechanism and the interaction between phase 1 and phase 2 (see Appendix B for a detailed explanation).
Equations 4–12 define an inductive invariant that proves the safety of the Paxos model of Fig. 3. However, the verification condition for this inductive invariant contains cyclic quantifier alternations, and is therefore outside of EPR. We now review the quantifier alternations in the verification condition, which originate both from the model and from the inductive invariant.
In the model, the axiomatization of quorums (Fig. 3) introduces a $\forall\exists$-edge from quorum to node. In addition, the assumption in the propose action that join-acknowledgment messages were received from a quorum of nodes, $\forall n.\ \mathit{member}(n, q) \to \exists r', v.\ \mathit{join\_ack\_msg}(n, r, r', v)$ (Fig. 3), introduces $\forall\exists$-edges from node to round and from node to value.
In the inductive invariant, only eqs. 7 and 12 include quantifier alternations (the rest are universally quantified). Equation 7 has quantifier structure $\forall\exists\forall$ (the local existential quantifier in $\exists n.\ \mathit{decision}(n, r, v)$ does not affect the quantifier alternation graph). Note that the inductive invariant appears both positively and negatively in the verification condition, so eq. 7 adds $\forall\exists$-edges from round to quorum and from value to quorum (from the positive occurrence), as well as an edge from quorum to node (from the negative occurrence). While the latter coincides with the edge that comes from the quorum axiomatization, the former edges close a cycle in the quantifier alternation graph (e.g., round to quorum to node to round). Equation 12 has a $\forall^*\exists^*$ prefix whose universally quantified variables range over rounds, values, and quorums, and whose existentially quantified variables range over nodes, rounds, and values. Thus, it introduces 9 edges in the quantifier alternation graph, including self-loops at round and value. In conclusion, while the presented model in first-order logic has an inductive invariant in first-order logic, the resulting verification condition is outside of EPR.
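The cycle argument can be replayed mechanically. The following Python sketch is our own: the edge sets are transcribed from the discussion above, each edge is written as a pair (sort of the universal variable, sort of the existential variable), and cycles are detected with Kahn's topological-sort algorithm. It is a check of the graph only, not of the verification condition itself.

```python
from collections import defaultdict


def has_cycle(edges):
    """Kahn's algorithm on a graph over sorts; True iff some edges form a cycle."""
    succ, indegree, sorts = defaultdict(set), defaultdict(int), set()
    for src, dst in edges:  # edges is a set, so no edge is counted twice
        sorts |= {src, dst}
        succ[src].add(dst)
        indegree[dst] += 1
    ready = [s for s in sorts if indegree[s] == 0]
    removed = 0
    while ready:
        s = ready.pop()
        removed += 1
        for t in succ[s]:
            indegree[t] -= 1
            if indegree[t] == 0:
                ready.append(t)
    return removed < len(sorts)  # leftover sorts lie on a cycle or are reachable from one


model = {("quorum", "node"),                       # quorum axiom
         ("node", "round"), ("node", "value")}     # propose condition
eq7 = {("round", "quorum"), ("value", "quorum"),   # positive occurrence
       ("quorum", "node")}                         # negative occurrence
eq12 = {(a, b) for a in ("round", "value", "quorum")
        for b in ("node", "round", "value")}

print(len(eq12), has_cycle(model), has_cycle(model | eq7), has_cycle(eq12))
# prints 9 False True True
```

The model's own edges are acyclic, adding eq. 7 closes the cycle through quorum, node, and round, and eq. 12 is cyclic on its own because of its self-loops.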
6. Paxos in EPR
The quantifier alternation graph of the model of Paxos described in Section 5 contains cycles. To obtain a safety proof of Paxos in EPR, we apply the methodology described in Section 3 to transform this model in a way that eliminates the cycles from the quantifier alternation graph. The resulting changes to the model are presented in Fig. 5, and the rest of this section explains them step by step.
6.1. Derived Relation for Left Rounds
We start by addressing the quantifier alternation that appears in eq. 12 as part of the inductive invariant. We observe that the following existentially quantified formula appears both as a subformula there and in the conditions of the join_round and vote actions (Fig. 3):
$$\psi_1(n, r) = \exists r', r'', v.\ \mathit{join\_ack\_msg}(n, r', r'', v) \land r < r'$$
This formula captures the fact that node $n$ has joined a round higher than $r$, which means it promises to never participate in round $r$ in any way, i.e., it will neither join nor vote in round $r$. We add a derived relation called left_round to capture $\psi_1$, so that $\mathit{left\_round}(n, r)$ captures the fact that node $n$ has left round $r$. The formula $\psi_1$ is in the class of formulas handled by the scheme described in Section 3.3, and thus we obtain the initial condition and update code for left_round. The result appears in Fig. 5.
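Instantiating the scheme of Section 3.3 by hand shows where this code comes from. In the following derivation (ours), left_round is defined by $\exists r', r'', v.\ \mathit{join\_ack\_msg}(n, r', r'', v) \land r < r'$, and $n_0, r_0, r_m, v_0$ are illustrative names for the components of the tuple that join_round inserts into join_ack_msg. The equalities fix $r', r'', v$, so the existential quantifier disappears:

```latex
\begin{aligned}
&\forall N, R.\ \neg\mathit{left\_round}(N, R) && \text{(initial condition)}\\
&\mathit{left\_round}(N, R) := \mathit{left\_round}(N, R) \lor \exists r', r'', v.\ (N = n_0 \land r' = r_0 \land r'' = r_m \land v = v_0 \land R < r') && \text{(generated update)}\\
&\mathit{left\_round}(N, R) := \mathit{left\_round}(N, R) \lor (N = n_0 \land R < r_0) && \text{(after eliminating } r', r'', v\text{)}
\end{aligned}
```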
Rewriting (steps 3+4)
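Using the left_round relation, we rewrite the conditions of the join_round and vote actions (Fig. 5). These rewrites are trivially sound, as explained in Section 3.2 (with a trivial rewrite condition). We also rewrite eq. 12 as:
$$\forall r_1, r_2, v_1, v_2, q.\ r_1 < r_2 \land \mathit{propose\_msg}(r_2, v_2) \land v_1 \neq v_2 \to \exists n.\ \mathit{member}(n, q) \land \mathit{left\_round}(n, r_1) \land \neg\mathit{vote\_msg}(n, r_1, v_1) \tag{13}$$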
6.2. Derived Relation for Joined Rounds
Equation 13 and the condition of the propose action each introduce quantifier alternations that are stratified when viewed separately, but together they form cycles. Equation 13 introduces $\forall\exists$-edges from round and value to node, while the propose condition introduces edges from node to round and value. The propose condition expresses the fact that every node in the quorum $q$ has joined round $r$ by sending a join-acknowledgment (1b) message to round $r$. However, because the join-acknowledgment message contains two more fields (representing the node’s maximal vote so far), the condition existentially quantifies over them. To eliminate this existential quantification and remove the cycles, we add a derived relation, called joined_round, that captures the projection of join_ack_msg onto its first two components, given by the formula:
$$\psi_2(n, r) = \exists r', v.\ \mathit{join\_ack\_msg}(n, r, r', v)$$
This binary relation over nodes and rounds records the sending of join-acknowledgment messages, ignoring the maximal vote so far reported in the message. Thus, $\mathit{joined\_round}(n, r)$ captures the fact that node $n$ has agreed to join round $r$. The formula $\psi_2$ is in the class of formulas handled by the scheme of Section 3.3, and thus we obtain the initial condition and update code for joined_round, as it appears in Fig. 5.
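The same hand instantiation of the scheme of Section 3.3 works for joined_round, defined by $\exists r', v.\ \mathit{join\_ack\_msg}(n, r, r', v)$; here the quantifier-free part $\varphi$ is simply $\mathit{true}$. In this derivation (ours), $n_0, r_0, r_m, v_0$ are illustrative names for the components of the tuple inserted into join_ack_msg:

```latex
\begin{aligned}
&\forall N, R.\ \neg\mathit{joined\_round}(N, R) && \text{(initial condition)}\\
&\mathit{joined\_round}(N, R) := \mathit{joined\_round}(N, R) \lor \exists r', v.\ (N = n_0 \land R = r_0 \land r' = r_m \land v = v_0) && \text{(generated update)}\\
&\mathit{joined\_round}(N, R) := \mathit{joined\_round}(N, R) \lor (N = n_0 \land R = r_0) && \text{(after eliminating } r', v\text{)}
\end{aligned}
```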
Rewriting (steps 3+4): first attempt
We rewrite the condition of the propose action to use joined_round instead of $\exists r', v.\ \mathit{join\_ack\_msg}(n, r, r', v)$. (The rewrite condition is again trivial, ensuring soundness.) The result appears in Fig. 5 and is purely universally quantified. When considering the transformed model and the candidate inductive invariant given by the conjunction of eqs. 4–11 and 13, the resulting quantifier alternation graph is acyclic. This means that the verification condition is in EPR and hence decidable to check. However, it turns out that this candidate invariant is not inductive, and the check yields a counterexample to induction.
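The acyclicity claim can be spot-checked with Python's standard `graphlib` module (Python 3.9 or later). The edge list is our own transcription of the discussion: the quorum axiom, eq. 7, and eq. 13 contribute $\forall\exists$-edges, while the rewritten propose condition and the remaining conjuncts are universal and contribute none. This sketches the graph check only, not the EPR decision procedure.

```python
from graphlib import CycleError, TopologicalSorter


def alternation_order(edges):
    """Order the sorts consistently with the forall-exists edges, or return None if cyclic."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, set())
        graph.setdefault(dst, set()).add(src)  # TopologicalSorter expects predecessors
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError:
        return None


transformed = {
    ("quorum", "node"),                        # quorum axiom; also eq. (13)
    ("round", "quorum"), ("value", "quorum"),  # eq. (7), positive occurrence
    ("round", "node"), ("value", "node"),      # eq. (13)
}
order = alternation_order(transformed)
print(order is not None, order.index("quorum") < order.index("node"))  # prints True True

# Restoring the edges of the original propose condition brings a cycle back.
assert alternation_order(transformed | {("node", "round"), ("node", "value")}) is None
```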
The counterexample is depicted in Fig. 6. It shows a propose action that leads to a violation of eq. 13. The example contains a single node $n_1$ (which forms a quorum by itself) that has voted for value $v_1$ in round $r_1$, and yet a different value $v_2$ is proposed for a later round $r_2$ (based on the quorum composed only of $n_1$), which leads to a violation of eq. 13. The propose action is enabled since $\mathit{joined\_round}(n_1, r_2)$ holds. However, an arbitrary value is proposed, since join_ack_msg is empty. The root cause for the counterexample is that the inductive invariant does not capture the connection between joined_round and join_ack_msg, so it allows a state in which a node has joined round $r_2$ according to joined_round (i.e., $\mathit{joined\_round}(n_1, r_2)$ holds), but has not joined it according to join_ack_msg (i.e., $\exists r', v.\ \mathit{join\_ack\_msg}(n_1, r_2, r', v)$ does not hold). Note that the counterexample is spurious, in the sense that it does not represent a reachable state. However, for a proof by an inductive invariant, we must eliminate this counterexample nonetheless.
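The following Python sketch mirrors the description of this counterexample on an explicit state; it does not reproduce Fig. 6 itself. The names `n1`, `r1`, `r2`, `v1`, `v2`, the integer rounds, and the helper functions are illustrative assumptions. It applies propose as it stands after the first rewrite (guarded by joined_round, with the max taken over join_ack_msg) and evaluates eq. 13 before and after.

```python
BOTTOM = None  # encodes the special round ⊥ (assumption)

# Pre-state of the counterexample: node n1 alone forms a quorum.
n1, r1, r2, v1, v2 = "n1", 1, 2, "v1", "v2"
ROUNDS, VALUES = (r1, r2), (v1, v2)
quorums = [{n1}]
vote_msg = {(n1, r1, v1)}
joined_round = {(n1, r2)}  # joined_round says n1 joined r2 ...
join_ack_msg = set()       # ... but no join-acknowledgment message exists
left_round = {(n1, r1)}    # n1 left r1, yet it voted for v1 there
propose_msg = set()


def propose(r, q, arbitrary):
    """propose after the first rewrite: guard uses joined_round, max uses join_ack_msg."""
    assert all((m, r) in joined_round for m in q)     # enabled
    assert not any(pr == r for pr, _ in propose_msg)  # no proposal in round r yet
    reports = [(rm, v) for (m, rr, rm, v) in join_ack_msg
               if m in q and rr == r and rm is not BOTTOM]
    value = max(reports, key=lambda p: p[0])[1] if reports else arbitrary
    return propose_msg | {(r, value)}


def eq13(proposals):
    """Eq. (13) evaluated over the finite state above."""
    return all(
        any((m, ra) in left_round and (m, ra, va) not in vote_msg for m in q)
        for (rb, vb) in proposals
        for ra in ROUNDS if ra < rb
        for va in VALUES if va != vb
        for q in quorums
    )


assert eq13(propose_msg)                 # holds (vacuously) before the step
after = propose(r2, {n1}, arbitrary=v2)  # no reports, so the value is arbitrary
print(after)                             # prints {(2, 'v2')}
assert not eq13(after)                   # violated after the step
# The pre-state breaks the representation invariant of joined_round:
assert joined_round != {(m, rr) for (m, rr, _, _) in join_ack_msg}
```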
Rewriting (steps 3+4): second attempt
One obvious way to eliminate the counterexample discussed above is to add the representation invariant of joined_round to the inductive invariant. However, this would result in a cyclic quantifier alternation, causing the verification condition to be outside of EPR. Instead, we eliminate this counterexample by rewriting the code of the propose action, relying on an auxiliary invariant to verify the rewrite, as explained in Section 3.2. We observe that the mismatch between joined_round and join_ack_msg is only problematic in this example because node $n_1$ voted in $r_1$. While the condition of the propose action is supposed to ensure that the max operation considers the past votes of all nodes in the quorum, a scenario in which joined_round is inconsistent with join_ack_msg makes it possible for the propose action to overlook past votes, which is what happens in this counterexample. Our remedy is therefore to rewrite the max operation (which is implemented by an assume command, as explained before) to consider the votes directly, by referring to the vote messages instead of the join-acknowledgment messages that report them. We first formally state the rewrite and then justify its correctness.
The key to the correctness of this change is that a join-acknowledgment message from node $n$ to round $r$ contains its maximal vote prior to round $r$, and once the node has sent this message, it will never vote in rounds smaller than $r$. Therefore, while the original propose action considers the maximum over votes reflected by join-acknowledgment messages from a quorum, looking at the actual votes from the quorum in rounds prior to $r$ (as captured by the vote_msg relation) yields the same maximum.
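The argument can be illustrated by random testing. The following Python sketch uses a simplified generator of ours rather than the full protocol: each node votes in some rounds below $r$, then sends its join-acknowledgment for $r$ reporting its maximal vote so far, and afterwards votes only in rounds $r$ and above; each round has a single proposed value. The three nodes, the quorum $\{0, 1\}$, the round bounds, and the function names are all assumptions. In every generated state, the maximum over the join-acknowledgment messages of the quorum equals the maximum over its actual votes below $r$.

```python
import random


def max_by_round(pairs):
    """Pair with the highest round, or None for an empty collection."""
    return max(pairs, key=lambda p: p[0], default=None)


def run_trial(rng, nodes=(0, 1, 2), r=4, last_round=6):
    """Build one state that respects the join/vote discipline, then compare both maxima."""
    proposal = {rr: rng.choice("ab") for rr in range(1, last_round + 1)}  # one value per round
    vote_msg, join_ack_msg = set(), set()
    for n in nodes:
        for rr in range(1, r):               # votes cast before joining round r
            if rng.random() < 0.5:
                vote_msg.add((n, rr, proposal[rr]))
        prior = max_by_round([(vr, v) for (m, vr, v) in vote_msg if m == n])
        join_ack_msg.add((n, r) + (prior or (None, None)))  # (None, None) encodes ⊥
        for rr in range(r, last_round + 1):  # after joining r, only rounds >= r
            if rng.random() < 0.5:
                vote_msg.add((n, rr, proposal[rr]))
    q = {0, 1}  # a hypothetical quorum (a majority of three nodes)
    via_acks = max_by_round([(rm, v) for (m, rr, rm, v) in join_ack_msg
                             if m in q and rr == r and rm is not None])
    via_votes = max_by_round([(vr, v) for (m, vr, v) in vote_msg if m in q and vr < r])
    return via_acks, via_votes


rng = random.Random(0)
for _ in range(500):
    via_acks, via_votes = run_trial(rng)
    assert via_acks == via_votes  # both formulations of the max agree
```

Random testing of this kind only illustrates the claim on small states; the actual justification is the EPR check of the rewrite condition described next.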
Formally, we establish the rewrite condition of step 3, given by eq. 2, using an auxiliary invariant $I_{\mathit{aux}}$, defined as the conjunction of eqs. 5, 6 and 8–11. This invariant captures the connection between join_ack_msg and vote_msg explained above. First, $I_{\mathit{aux}}$ is inductive for the original model, and its verification condition is in EPR (the resulting quantifier alternation graph is acyclic). Second, we prove that under the assumption of $I_{\mathit{aux}}$ and the condition of the propose action that every node of the quorum $q$ has sent a join-acknowledgment message for round $r$ (Fig. 3), the max operation over the join-acknowledgment messages of $q$ is equivalent to the max operation over the votes of the nodes of $q$ in rounds prior to $r$ (recall that both translate to assume commands according to eq. 3). This check is also in EPR. In conclusion, we are able to establish the rewrite condition using two EPR checks: one for proving the inductiveness of $I_{\mathit{aux}}$, and one for proving eq. 2.
After the above rewrite, the conjunction of eqs. 4–11 and 13 is still not an inductive invariant, due to a counterexample to induction in which a node has joined a higher round according to joined_round, but has not left a lower round according to left_round. As before, the counterexample is inconsistent with the representation invariants. However, this time the counterexample (and another similar one) can be eliminated by strengthening the inductive invariant with the following facts, which are implied by the representation invariants of joined_round and left_round:
Both are purely universally quantified and therefore do not affect the quantifier alternation graph.