Complexity and Information in Invariant Inference

October 27, 2019 · Yotam M. Y. Feldman et al.

This paper addresses the complexity of SAT-based invariant inference, a prominent approach to safety verification. We consider the problem of inferring an inductive invariant of polynomial length given a transition system and a safety property. We analyze the complexity of this problem in a black-box model, called the Hoare-query model, which is general enough to capture algorithms such as IC3/PDR and its variants. An algorithm in this model learns about the system's reachable states by querying the validity of Hoare triples. We show that in general an algorithm in the Hoare-query model requires an exponential number of queries. Our lower bound is information-theoretic and applies even to computationally unrestricted algorithms, showing that no choice of generalization from the partial information obtained in a polynomial number of Hoare queries can lead to an efficient invariant inference procedure in this class. We then show, for the first time, that approaches such as PDR, which utilize rich Hoare queries, can be exponentially more efficient than approaches such as ICE learning, which only utilize inductiveness checks of candidates. We do so by constructing a class of transition systems for which a simple version of PDR with a single frame infers invariants in a polynomial number of queries, whereas every algorithm using only inductiveness checks and counterexamples requires an exponential number of queries. Our results also shed light on connections and differences with the classical theory of exact concept learning with queries, and imply that learning from counterexamples to induction is harder than classical exact learning from labeled examples. This demonstrates that the convergence rate of Counterexample-Guided Inductive Synthesis depends on the form of counterexamples.







1. Introduction

The inference of inductive invariants is a fundamental technique in safety verification, and the focus of many works (e.g. McMillan, 2003; Bradley, 2011; Eén et al., 2011; Cousot and Cousot, 1977; Srivastava et al., 2013; Alur et al., 2015; Fedyukovich and Bodík, 2018; Dillig et al., 2013). The task is to find an assertion I that holds in the initial states of the system, excludes all bad states, and is closed under transitions of the system, namely, the Hoare triple {I} δ {I} is valid, where δ denotes one step of the system. Such an I overapproximates the set of reachable states and establishes their safety.
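The three conditions above can be made concrete with a minimal explicit-state sketch (not the paper's symbolic, SAT-based setting; all names and the toy counter system are illustrative):

```python
def is_inductive_invariant(inv, init, bad, delta):
    """inv, init, bad: sets of states; delta: set of (pre, post) pairs."""
    initiation = init <= inv                       # Init => Inv
    safety = not (inv & bad)                       # Inv => not Bad
    consecution = all(post in inv                  # {Inv} delta {Inv}
                      for (pre, post) in delta if pre in inv)
    return initiation and safety and consecution

# Toy system: a counter over {0,...,7} that increments by 2 (mod 8),
# starting at 0; the bad state is 7. "x is even" is an inductive invariant.
states = range(8)
init = {0}
bad = {7}
delta = {(x, (x + 2) % 8) for x in states}
even = {x for x in states if x % 2 == 0}

assert is_inductive_invariant(even, init, bad, delta)
# "not Bad" alone is safe but not closed under transitions (5 steps to 7):
assert not is_inductive_invariant(set(states) - bad, init, bad, delta)
```

The second assertion illustrates why safety alone does not suffice: the candidate ¬Bad admits a counterexample to induction.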

The advance of SAT-based reasoning has led to the development of successful algorithms inferring inductive invariants using SAT queries. A prominent example is IC3/PDR (Bradley, 2011; Eén et al., 2011), which has led to a significant improvement in the ability to verify realistic hardware systems. Recently, this algorithm has been extended and generalized to software systems (e.g. Hoder and Bjørner, 2012; Komuravelli et al., 2014; Bjørner and Gurfinkel, 2015; Karbyshev et al., 2015; Cimatti et al., 2014).

Successful SAT-based inference algorithms are typically tricky and employ many clever heuristics. This is in line with the inherent asymptotic complexity of invariant inference, which is hard even with access to a SAT solver 

(Lahiri and Qadeer, 2009). However, the practical success of inference algorithms calls for a more refined complexity analysis, with the objective of understanding the principles on which these algorithms are based. This paper studies the asymptotic complexity of SAT-based invariant inference through the decision problem of polynomial length inference in the black-box Hoare-query model, as we now explain.

Inference of polynomial-length CNF

Naturally, inference algorithms succeed when the invariant they infer is not too long. Therefore, this paper considers the complexity of inferring invariants of polynomial length. We follow the recent trend in invariant inference, advocated in (McMillan, 2003; Bradley, 2011), to search for invariants in rich syntactical forms, beyond those usually considered in template-based invariant inference (e.g. Jeannet et al., 2014; Colón et al., 2003; Sankaranarayanan et al., 2004; Srivastava and Gulwani, 2009; Srivastava et al., 2013; Alur et al., 2015), with the motivation of achieving generality of the verification method and potentially improving the success rate. We thus study the inference of invariants expressed in Conjunctive Normal Form (CNF) of polynomial length. Interestingly, our results also apply to inferring invariants in Disjunctive Normal Form.

The Hoare-query model

Our study of SAT-based methods focuses on an algorithmic model called the Hoare-query model. The idea is that the inference algorithm is not given direct access to the program, but performs queries on it. In the Hoare-query model, algorithms repeatedly choose and query for the validity of Hoare triples {α} δ {β}, where δ is the transition relation denoting one step of the system, inaccessible to the algorithm except via such Hoare queries. The check itself is implemented by an oracle, which in practice is a SAT solver. This model is general enough to capture algorithms such as PDR and its variants, and leaves room for other interesting design choices, but does not capture white-box approaches such as abstract interpretation (Cousot and Cousot, 1977). The advantage of this model for a theoretical study is that it enables an information-based analysis, which (i) sidesteps open computational complexity questions, and therefore results in unconditional lower bounds on the computational complexity of SAT-based algorithms captured by the model, and (ii) grants meaning to questions about generalization from partial information we discuss later.
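The query interface can be sketched as follows, with the oracle implemented by brute force over an explicit transition relation rather than a SAT solver (names are illustrative, not from the paper's formalism):

```python
def hoare_query(delta, alpha, beta):
    """alpha, beta: predicates on states; delta: set of (pre, post) pairs.
    Returns (True, None) if every delta-step from a state satisfying alpha
    lands in a state satisfying beta, else (False, counterexample)."""
    for (pre, post) in delta:
        if alpha(pre) and not beta(post):
            return False, (pre, post)
    return True, None

# The mod-8 counter: steps of +2, so evenness is preserved.
delta = {(x, (x + 2) % 8) for x in range(8)}
even = lambda x: x % 2 == 0

ok, cex = hoare_query(delta, even, even)            # an inductiveness query
assert ok and cex is None
ok, cex = hoare_query(delta, lambda x: True, even)  # weaker precondition
assert not ok and cex[0] % 2 == 1                   # counterexample has an odd pre-state
```

The algorithm learns about δ only through such answers, which is exactly the partial information the lower bounds quantify.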


This research addresses two main questions related to the core ideas behind PDR, and theoretically analyzes them in the context of the Hoare-query model:

  1. These algorithms revolve around the question of generalization: from observing concrete states (to be excluded from the invariant), the algorithm seeks to produce assertions that hold for all reachable states. The different heuristics in this context are largely understood as clever ways of performing this generalization. The situation is similar in interpolation-based algorithms, only that generalization is performed from bounded safety proofs rather than states. How should generalization be performed to achieve efficient invariant inference?

  2. A key aspect of PDR is the form of SAT checks it uses, as part of relative inductiveness checks, of Hoare triples {α} δ {β} in which in general α ≠ β (for the PDR-savvy: β typically corresponds to a candidate clause, and α is derived from the previous frame). Repeated queries of this form are potentially richer than presenting a series of candidate invariants, where the check is {I} δ {I}. Is there a benefit in using relative inductiveness beyond inductiveness checks?

We analyze these questions in the foundational case of Boolean programs, which is applicable to infinite-state systems through predicate abstraction (Graf and Saïdi, 1997; Flanagan and Qadeer, 2002; Lahiri and Qadeer, 2009), and also a core part of other invariant inference techniques for infinite-state (e.g. Hoder and Bjørner, 2012; Komuravelli et al., 2014; Karbyshev et al., 2017).

In Section 6, we answer question 1 with an impossibility result, by showing that no choice of generalization can lead to an inference algorithm using only a polynomial number of Hoare queries. Our lower bound is information-theoretic, and holds even with unlimited computational power, showing that the problem of generalization is chiefly a question of information gathering.

In Section 7, we answer question 2 in the affirmative, by showing an exponential gap between algorithms utilizing rich checks and algorithms that perform only inductiveness checks

. Namely, we construct a class of programs for which a simple version of PDR can infer invariants efficiently, but every algorithm learning solely from counterexamples to the inductiveness of candidates requires an exponential number of queries. This result shows, for the first time theoretically, the significance of relative inductiveness checks as the foundation of PDR’s mechanisms, in comparison to a machine learning approach pioneered in the ICE model 

(Garg et al., 2014, 2016) that infers invariants based on inductiveness checks only.

Our results also clarify the relationship between the problem of invariant inference and the classical theory of exact concept learning with queries (Angluin, 1987). In particular, our results imply that learning from counterexamples to induction is harder than learning from positive & negative examples (Section 8), providing a formal justification to the existing intuition (Garg et al., 2014). This demonstrates that the convergence rate of learning in Counterexample-Guided Inductive Synthesis (e.g. Solar-Lezama et al., 2006; Jha et al., 2010; Jha and Seshia, 2017) depends on the form of examples. We also establish impossibility results for directly applying algorithms from concept learning to invariant inference.

The contributions of the paper are summarized as follows:

  • We define the problem of polynomial-length invariant inference, and show it is Σ₂ᴾ-complete (Section 4), strengthening the hardness result of template-based abstraction by Lahiri and Qadeer (2009).

  • We introduce the Hoare-query model, a black-box model of invariant inference capable of modeling PDR (Section 5), and study the query complexity of polynomial-length invariant inference in this model.

  • We show that in general an algorithm in this model requires an exponential number of queries to solve polynomial-length inference, even though Hoare queries are rich and versatile (Section 6).

  • We also extend this result to a model capturing interpolation-based algorithms (Section 6.2).

  • We show that Hoare queries are more powerful than inductiveness queries (Section 7). This also proves that ICE learning cannot model PDR, and that the extension of the model by Vizel et al. (2017) is necessary.

  • We prove that exact learning from counterexamples to induction is harder than exact learning from positive & negative examples, and derive impossibility results for translating some exact concept learning algorithms to the setting of invariant inference (Section 8).

2. Overview

Coming up with inductive invariants is one of the most challenging tasks of formal verification—it is often referred to as the “Eureka!” step. This paper studies the asymptotic complexity of automatically inferring CNF invariants of polynomial length, a problem we call polynomial-length inductive invariant inference, in a SAT-based black-box model.

Consider the dilemmas Abby faces when she attempts to develop an algorithm for this problem from first principles. Abby is excited about the popularity of SAT-based inference algorithms. Many such algorithms operate by repeatedly performing checks of Hoare triples of the form {α} δ {β}, where α and β are a precondition and postcondition (resp.) chosen by the algorithm in each query and δ is the given transition relation (loop body). A SAT solver implements the check. We call such checks Hoare queries, and focus in this paper on black-box inference algorithms in the Hoare-query model: algorithms that access the transition relation solely through Hoare queries.

init x₁ … xₙ = 0 … 0
axiom exactly one of z₁, …, zₙ is true

function add-double(w̄, ȳ) = w̄ + 2ȳ

while *
   input ȳ
   if z₁:
        x₁ … xₙ := add-double(x₁ … xₙ, ȳ)
   if z₂:
        x₂ … xₙ x₁ := add-double(x₂ … xₙ x₁, ȳ)
   ...
   if zₙ:
        xₙ x₁ … xₙ₋₁ := add-double(xₙ x₁ … xₙ₋₁, ȳ)
   assert x₁ … xₙ ≠ 1 … 1

Figure 1. An example propositional transition system for which we would like to infer an inductive invariant. The state is over x₁, …, xₙ, z₁, …, zₙ. The variables ȳ are inputs and can change arbitrarily in each step. z₁, …, zₙ are immutable, with the assumption that exactly one of them is true.

Fig. 1 displays one example program that Abby is interested in inferring an inductive invariant for. In this program, a number x̄, represented by the n bits x₁, …, xₙ, is initialized to zero, and at each iteration incremented by an even number that is decided by the input variables ȳ (all computations are mod 2ⁿ). The representation of the number using the bits x₁, …, xₙ is determined by another set of bits z₁, …, zₙ, which are all immutable, and only one of them is true: if z₁, the number is represented by x₁ … xₙ; if z₂, the least-significant bit (lsb) shifts and the representation is x₂ … xₙ x₁; and so on. The safety property is that x̄ is never equal to the number with all bits 1. Intuitively, this holds because the number is always even. An inductive invariant states this fact, taking into account the differing representations, by stating that the lsb (as chosen by z̄) is always 0: ⋀ᵢ₌₁ⁿ (¬zᵢ ∨ ¬xᵢ). Of course, Abby aims to verify many systems, of which Fig. 1 is but one example.
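The following explicit-state sketch checks this reasoning on a small instance. The encoding of the rotating representation (the state is a pair (x, i), where i is the index of the unique true zᵢ and xᵢ is the lsb) is our assumed reading of Fig. 1:

```python
from itertools import product

n = 3

def value(x, i):
    # bit x[(i + j) % n] has weight 2**j, so x[i] is the least-significant bit
    return sum(x[(i + j) % n] << j for j in range(n))

def encode(v, i):
    x = [0] * n
    for j in range(n):
        x[(i + j) % n] = (v >> j) & 1
    return tuple(x)

states = [(x, i) for x in product((0, 1), repeat=n) for i in range(n)]
init = {(tuple([0] * n), i) for i in range(n)}
bad = {(tuple([1] * n), i) for i in range(n)}
# one loop iteration adds an even number 2*y, keeping z (here: i) fixed
delta = {((x, i), (encode((value(x, i) + 2 * y) % 2 ** n, i), i))
         for (x, i) in states for y in range(2 ** n)}
# the n-clause invariant: for each i, z_i -> not x_i (the chosen lsb is 0)
inv = {(x, i) for (x, i) in states if x[i] == 0}

assert init <= inv and not (inv & bad)
assert all(post in inv for (pre, post) in delta if pre in inv)
```

The three assertions are exactly initiation, safety, and consecution for the invariant described in the text.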

2.1. Example: Backward-Reachability with Generalization

Algorithm 1 Backward-reachability
1: procedure Backward-Reach(Init, δ, Bad)
2:     I ← ¬Bad
3:     while {I} δ {I} not valid do
4:         (σ, σ′) ← a counterexample to induction
5:         φ ← Block(σ, I)
6:         I ← I ∧ φ
7:     return I

Algorithm 2 Naive Block
1: procedure Block-Cube(σ, I)
2:     return ¬cube(σ)

Algorithm 3 Generalization with Init-Step Reachability
1: procedure Block-PDR-1(σ, I)
2:     c ← cube(σ)
3:     for each literal ℓ in c do
4:         if Init ⟹ ¬(c ∖ ℓ) and {Init} δ {¬(c ∖ ℓ)} is valid then
5:             c ← c ∖ ℓ
6:     return ¬c

How should Abby’s algorithm go about finding inductive invariants? One known strategy is that of backward reachability, in which the invariant is strengthened to exclude states from which bad states may be reachable. (Our results are not specific to backward-reachability algorithms; we use them here for motivation and illustration.) Alg. 1 is an algorithmic backward-reachability scheme: it repeatedly checks for the existence of a counterexample to induction (a transition of δ from a state σ satisfying the candidate I to a state σ′ violating it), and strengthens the invariant to exclude the pre-state σ using the formula Block returns.

Alg. 1 depends on the choice of Block. The most basic approach is that of Alg. 2, which excludes exactly the pre-state σ, by conjoining to the invariant the negation of the cube of σ (the cube is the conjunction of all literals that hold in the state; the only state that satisfies cube(σ) is σ itself, and thus σ is the only state excluded from the candidate in this approach). For example, when Alg. 1 needs to block a pre-state of a bad state, Alg. 2 does so by conjoining to the invariant the negation of that state’s cube, a formula that every other state satisfies.
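The combination of Alg. 1 with this naive Block can be sketched over explicit states, where "conjoining the negation of the cube" amounts to removing a single state from the candidate (names are illustrative):

```python
def backward_reach_naive(init, bad, delta, states):
    """Alg. 1 with the naive Block of Alg. 2, on explicit state sets."""
    inv = set(states) - set(bad)            # candidate I starts as "not Bad"
    while True:
        cex = next(((pre, post) for (pre, post) in delta
                    if pre in inv and post not in inv), None)
        if cex is None:                     # no counterexample to induction
            return inv if init <= inv else None  # None: Init itself was excluded
        inv.discard(cex[0])                 # Block: exclude exactly the pre-state

# Counter over {0..7}, step +2 mod 8, bad state 7: the loop must remove
# 5, then 3, then 1, one state per iteration, before converging.
states = range(8)
inv = backward_reach_naive({0}, {7}, {(x, (x + 2) % 8) for x in states}, states)
assert inv == {0, 2, 4, 6}
```

On the systems of Fig. 1, the analogous run enumerates the exponentially many odd states one by one, which is exactly the inefficiency discussed next.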

Alas, Alg. 1 with blocking by Alg. 2 is not efficient. In essence it operates by enumerating and excluding the states backward-reachable from bad. The number of such states is potentially exponential, making Alg. 2 unsatisfactory. For instance, the example of Fig. 1 requires the exclusion of all states in which x̄ is odd, for every choice of lsb, a number of states exponential in n. The algorithm would thus require an exponential number of queries to arrive at a (CNF) inductive invariant, even though a CNF invariant with only n clauses exists (as above).

Efficient inference hence requires Abby to exclude more than a single state at each time, namely, to generalize from a counterexample—as real algorithms do. What generalization strategy could Abby choose that would lead to efficient invariant inference?

2.2. All Generalizations are Wrong

One simple generalization strategy Abby considers appears in Alg. 3, based on the standard ideas in IC3/PDR (Bradley, 2011; Eén et al., 2011) and subsequent developments (e.g. Hoder and Bjørner, 2012; Komuravelli et al., 2014). It starts with the cube of the pre-state (as in Alg. 2) and attempts to drop literals, resulting in a smaller conjunction, which many states satisfy; all these states are excluded from the candidate in line 6 of Alg. 1. Hence with this generalization Alg. 1 can exclude many states in each iteration, overcoming the problem with the naive algorithm above. Alg. 3 chooses to drop a literal from the conjunction if no state reachable in at most one step from Init satisfies the conjunction even when that literal is omitted (line 4 of Alg. 3); we refer to this algorithm as PDR-1, since it resembles PDR with a single frame.

For example, when in the example of Fig. 1 the algorithm attempts to block a state in which both zᵢ and xᵢ hold, Alg. 3 minimizes the cube to zᵢ ∧ xᵢ, because no state reachable in at most one step satisfies zᵢ ∧ xᵢ, but this is no longer true when another literal is omitted. Conjoining the invariant with ¬(zᵢ ∧ xᵢ) (in line 6 of Alg. 1) produces a clause of the invariant, ¬zᵢ ∨ ¬xᵢ. In fact, our results show that PDR-1 finds the aforementioned invariant in a polynomial number of queries.
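The literal-dropping loop of Alg. 3 can be sketched over explicit states, with "reachable in at most one step from Init" computed directly (a brute-force stand-in for the Hoare query; names and the 2-bit toy system are illustrative):

```python
def block_pdr1(sigma, init, delta):
    """Drop literals from cube(sigma) while no state reachable from Init
    in at most one step satisfies the weakened conjunction."""
    one_step = set(init) | {post for (pre, post) in delta if pre in init}
    cube = dict(enumerate(sigma))           # var index -> value: cube(sigma)
    for v in list(cube):
        trial = {u: b for u, b in cube.items() if u != v}
        if not any(all(s[u] == b for u, b in trial.items()) for s in one_step):
            cube = trial                    # safe to drop the literal on v
    return cube                             # its negation is the learned clause

# 2-bit toy system: states (x0, x1), init (0, 0), transitions flip x1 only,
# so no state with x0 = 1 is reachable in at most one step.
init = {(0, 0)}
delta = {((a, b), (a, 1 - b)) for a in (0, 1) for b in (0, 1)}
cube = block_pdr1((1, 1), init, delta)
assert cube == {0: 1}    # generalized to "x0 = 1"; learned clause: not x0
```

The literal on x1 is dropped because the weakened cube "x0 = 1" still excludes every state reachable in at most one step; the literal on x0 cannot be dropped.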

Yet there is a risk in over-generalization, that is, of dropping too many literals and excluding too many states. In Alg. 1, generalization must never exclude a reachable state, or the candidate would no longer overapproximate the reachable states and could never become an inductive invariant. Alg. 3 takes the smallest conjunction it can while retaining every state reachable in at most one step; it is of course possible (and plausible) that some states are reachable in two steps but not in one. Alg. 1 with the generalization in Alg. 3 might fail in such cases.

The necessity of generalization on the one hand, and the problem of over-generalization on the other, lead in practice to complex heuristic techniques. Instead of simple backward-reachability with generalization per Alg. 1, PDR avoids committing to a particular generalization (Eén et al., 2011) by maintaining a sequence of frames, which are (in some sense) a sequence of candidate invariants. The clauses resulting from generalization are used to strengthen frames according to a bounded reachability analysis; Alg. 3 corresponds to generalization in the first frame.

Overall, the study of backward-reachability and the PDR-1 generalization leaves us with the question: Is there a choice of generalization that can be used—in any way—to achieve an efficient invariant inference algorithm?

In a non-interesting way, the answer is yes, there is a “good” way to generalize: use Alg. 1 with the following generalization strategy: upon blocking a pre-state σ, compute an inductive invariant of polynomial length, and return the clause of the invariant that excludes σ (such a clause exists because σ is backward-reachable from bad states, and thus excluded from the invariant). This terminates in a polynomial number of steps.

Such generalization is clearly unattainable. It requires (1) perfect information of the transition system, and (2) solving a computationally hard problem, since we show that polynomial-length inference is Σ₂ᴾ-hard (Thm. 4.2). What happens when generalization is computationally unbounded (an arbitrary function), but operates based on partial information of the transition system? Is there a generalization from partial information, be it computationally intractable, that facilitates efficient inference? If such a generalization exists we may wish to view invariant inference heuristics as approximating it in a computationally efficient way.

Similar questions arise in interpolation-based algorithms, only that generalization is performed not from a concrete state, but from a bounded unreachability proof. Still it is challenging to generalize enough to make progress but not too much as to exclude reachable states (or include states from which bad is reachable).

2.2.1. Our results

Our first main result in this paper is that in general, there does not exist a generalization scheme from partial information leading to efficient inference based on Hoare queries. Technically, we prove that even a computationally unrestricted generalization from information gathered from Hoare queries requires an exponential number of queries. This result applies to any generalization strategy and any algorithm using it that can be modeled using Hoare queries, including Alg. 1 as well as more complex algorithms such as PDR. We also extend this lower bound to a model capturing interpolation-based algorithms (Thm. 6.6).

These results are surprising because a priori it would seem possible, using unrestricted computational power, to devise queries that repeatedly halve the search space, yielding an invariant within a polynomial number of queries (the number of candidates is only exponential because we are interested in invariants of at most polynomial length). We show that this is impossible to achieve using Hoare queries.

2.3. Inference Using Rich Queries

So far we have established strong impossibility results for invariant inference based on Hoare queries in the general case, even with computationally unrestricted generalization. We now turn to shed some light on the techniques that inference algorithms such as PDR employ in practice. One of the fundamental principles of PDR is the incremental construction of invariants relying on rich Hoare queries. PDR-1 demonstrates a simplified realization of this principle. When PDR-1 considers a clause to strengthen the invariant with, it checks the reachability of the states violating that individual clause from Init, rather than checking the candidate invariant as a whole. This is the Hoare query in line 4 of Alg. 3, in which, crucially, the precondition is different from the postcondition. The full-fledged PDR is similar in this regard, strengthening a frame according to reachability from the previous frame via relative induction checks (Bradley, 2011).

In contrast, Alg. 1 with the naive Block of Alg. 2 is fundamentally different, and uses only inductiveness queries {I} δ {I}, a specific form of Hoare queries where the precondition and postcondition are the same. Algorithms performing only inductiveness checks can in fact be very sophisticated, traversing the domain of candidates in clever ways. This approach was formulated in the ICE learning framework for learning inductive invariants (Garg et al., 2014, 2016), later extended to general Constrained-Horn Clauses (Ezudheen et al., 2018), in which algorithms present new candidates based on positive, negative, and implication examples returned by a “teacher” in response to incorrect candidate invariants. (Our formulation focuses on implication examples—counterexamples to inductiveness queries—and strengthens the algorithm with full information about the set of initial and bad states instead of positive and negative examples, resp.) The main point is that such algorithms do not perform queries other than inductiveness, and choose the next candidate invariant based solely on the counterexamples to induction showing that the previous candidates were unsuitable.

The contrast between the two approaches raises the question: Is there a benefit to invariant inference in Hoare queries richer than inductiveness? For instance, to model PDR in the ICE framework, Vizel et al. (2017) extended the framework with relative inductiveness checks, but the question whether such an extension is necessary remained open.

2.3.1. Our results

Our second significant result in this paper is showing an exponential gap between the general Hoare-query model and the more specific inductiveness-query model. To this end, we construct a class of transition systems, including the example of Fig. 1, for which (1) PDR-1, which is a Hoare-query algorithm, infers an invariant in a polynomial number of queries, but (2) every inductiveness-query algorithm requires an exponential number of queries, that is, an exponential number of candidates before it finds a correct inductive invariant. This demonstrates that analyzing the reachability of clauses separately can offer an exponential advantage in certain cases. This also proves that PDR cannot be cast in the ICE framework, and that the extension by Vizel et al. (2017) is necessary and strictly increases the power of inference with a polynomial number of queries. To the best of our knowledge, this is not only the first lower bound on ICE learning demonstrating such an exponential gap (also see the discussion in Section 9), but also the first polynomial upper bound on PDR for a class of systems.

We show this separation on a class of systems constructed using a technical notion of maximal systems for monotone invariants. These are systems for which there exists a monotone invariant (namely, an invariant in which propositional variables appear only negatively) with a linear number of clauses, and the transition relation includes all transitions allowed by this invariant. The system in Fig. 1 is an example of a maximal system: it allows every transition between states satisfying the invariant (namely, between all even x̄’s with the same representation), and also every transition between states violating the invariant (namely, between all odd x̄’s with the same representation). (Transitions violating the axiom or modifying z̄ are excluded in this modeling.) The success of PDR-1 relies on the small diameter (every reachable state is reachable in one step) and the fact that the invariant is monotone. However, we show that for inductiveness-query algorithms this class is as hard as the class of all programs admitting monotone invariants, whose hardness is established from the results of Section 2.2.1. For example, from the perspective of inductiveness-query algorithms, the example of Fig. 1, which is a maximal program as explained above, is as hard as any system that admits its invariant (and also respects the axiom and leaves z̄ unchanged). This is because an inductiveness-query algorithm can only benefit from having fewer transitions and hence fewer counterexamples to induction, whereas maximal programs include as many transitions as possible. If an inductiveness-query algorithm is to infer an invariant for the example of Fig. 1, it must also be able to infer an invariant for all systems whose transitions are a subset of the transitions of this example. This includes systems with an exponential diameter, as well as systems admitting other invariants, potentially exponentially long. This example illustrates our lower bound construction, which takes all maximal programs for monotone-CNF invariants.
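The maximal-system construction can be sketched directly: given an invariant (as a predicate on states), include every transition between two states satisfying it and every transition between two states violating it. All names here are illustrative:

```python
from itertools import product

def maximal_delta(states, inv):
    """All transitions allowed by inv: within the invariant, or within its
    complement, but never crossing between the two."""
    return {(s, t) for s in states for t in states if inv(s) == inv(t)}

# A monotone invariant over 3 bits: "not x0 or not x1" (variables occur negatively).
states = list(product((0, 1), repeat=3))
inv = lambda s: not (s[0] and s[1])
delta = maximal_delta(states, inv)
good = {s for s in states if inv(s)}

# inv is inductive for the maximal system...
assert all(t in good for (s, t) in delta if s in good)
# ...and stays inductive for every subsystem (fewer transitions can only
# remove counterexamples to induction), e.g. any fixed subset of delta:
delta_sub = set(sorted(delta)[:10])
assert all(t in good for (s, t) in delta_sub if s in good)
```

The second assertion is the crux of the reduction sketched above: any algorithm that relies only on counterexamples to induction cannot distinguish the maximal system from its exponentially many subsystems.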

In our lower bound we follow the existing literature on the analysis of inductiveness-query algorithms, which focuses on the worst-case notion w.r.t. potential examples (strong convergence in Garg et al. (2014)). An interesting direction is to analyze inductiveness-query algorithms that exercise some control over the choice of counterexamples to induction, or under probabilistic assumptions on the distribution of examples.

2.4. A Different Perspective: Exact Learning of Invariants with Hoare Queries

This paper can be viewed as developing a theory of exact learning of inductive invariants with Hoare queries, akin to the classical theory of concept learning with queries (Angluin, 1987). The results outlined above are consequences of natural questions about this model: The impossibility of generalization from partial information (Section 2.2.1) stems from an exponential lower bound on the Hoare-query model. The power of rich Hoare queries (Section 2.3.1) is demonstrated by an exponential separation between the Hoare- and inductiveness-query models, in the spirit of the gap between concept learning using both equivalence and membership queries and concept learning using equivalence queries alone (Angluin, 1990).

The similarity between invariant inference (and synthesis in general) and exact concept learning has been observed before (e.g. Jha et al., 2010; Garg et al., 2014; Jha and Seshia, 2017; Alur et al., 2015; Bshouty et al., 2017). Our work highlights some interesting differences and connections between invariant learning with Hoare queries and concept learning with equivalence and membership queries. This comparison yields (im)possibility results for translating algorithms from concept learning with queries to invariant inference with queries. Another outcome is the third significant result of this paper: a proof that learning from counterexamples to induction is inherently harder than learning from examples labeled as positive or negative, formally corroborating the intuition advocated by Garg et al. (2014). More broadly, the complexity difference between learning from labeled examples and learning from counterexamples to induction demonstrates that the convergence rate of learning in Counterexample-Guided Inductive Synthesis (e.g. Jha et al., 2010; Jha and Seshia, 2017) depends on the form of examples. The proof of this result builds on the lower bounds discussed earlier, and is discussed in Section 8. (It may also be interesting to note that one potential difference between classical learning and invariant inference, mentioned by Löding et al. (2016), does not seem to manifest in the results discussed in Section 2.2.1: the transition systems in the lower bound for inductiveness queries in Corollary 7.11 have a unique inductive invariant, and still the problem is hard.)

3. Background

3.1. States, Transition Systems, and Inductive Invariants

In this paper we consider safety problems defined via formulas in propositional logic. Given a propositional vocabulary Σ that consists of a finite set of Boolean variables, we denote by P(Σ) the set of well-formed propositional formulas defined over Σ. A state σ is a valuation to Σ. For a state σ, the cube of σ, denoted cube(σ), is the conjunction of all literals that hold in σ. A transition system is a triple TS = (Init, δ, Bad) such that Init, Bad ∈ P(Σ) define the initial states and the bad states, respectively, and δ ∈ P(Σ ⊎ Σ′) defines the transition relation, where Σ′ = {x′ | x ∈ Σ} is a copy of the vocabulary used to describe the post-state of a transition. A class of transition systems, denoted 𝒯, is a set of transition systems. A transition system TS is safe if all the states that are reachable from the initial states via steps of δ satisfy ¬Bad. An inductive invariant for TS is a formula I ∈ P(Σ) such that Init ⟹ I, I ∧ δ ⟹ I′, and I ⟹ ¬Bad, where I′ denotes the result of substituting each x′ for x in I, and φ₁ ⟹ φ₂ denotes the validity of the formula φ₁ → φ₂. In the context of propositional logic, a transition system is safe if and only if it has an inductive invariant. When I is not inductive, a counterexample to induction is a pair of states (σ, σ′) such that σ ⊨ I, (σ, σ′) ⊨ δ, and σ′ ⊭ I (where the valuation to Σ′ is taken from σ′).

The classes CNF_k, DNF_k, and Mon-CNF_k

CNF_k is the set of propositional formulas in Conjunctive Normal Form (CNF) with at most k clauses (disjunctions of literals). DNF_k is likewise for Disjunctive Normal Form (DNF), where k is the maximal number of cubes (conjunctions of literals). Mon-CNF_k is the subset of CNF_k in which all literals are negative.

3.2. Invariant Inference Algorithms

In this section we briefly provide background on inference algorithms that motivate our theoretical development in this paper. The main results of the paper do not depend on familiarity with these algorithms or their details; this (necessarily incomplete) “inference landscape” is presented here for context and motivation for defining the Hoare-query model (Section 5), studying its complexity and the feasibility of generalization (Section 6), and analyzing the power of Hoare queries compared to inductiveness queries (Section 7). We allude to specific algorithms in motivating each of these sections.


IC3/PDR maintains a sequence of formulas , called frames, each of which can be understood as a candidate inductive invariant. The sequence is gradually modified and extended throughout the algorithm’s run. It is maintained as an approximate reachability sequence, meaning that (1) , (2) , (3) , and (4) . These properties ensure that overapproximates the set of states reachable in steps, and that the approximations contain no bad states. (We emphasize that does not imply that a bad state is unreachable in any number of states.) The algorithm terminates when one of the frames implies its preceding frame (), in which case it constitutes an inductive invariant, or when a counterexample trace is found. In iteration , a new frame is added to the sequence. One way of doing so is by initializing to true, and strengthening it until it excludes all bad states. Strengthening is done by blocking bad states: given a bad state , the algorithm strengthens to exclude all ’s pre-states—states that satisfy —one by one (thereby demonstrating that is unreachable in steps). Blocking a pre-state from frame is performed by a recursive call to block its own pre-states from frame , and so on. If this process reaches a state from Init, the sequence of states from the recursive calls constitutes a trace reaching Bad from Init, which is a counterexample to safety. Alternatively, when a state is successfully found to be unreachable from in one step, i.e., is unsatisfiable, frame is strengthened to reflect this fact. Aiming for efficient convergence (see Section 2.1), PDR chooses to generalize, and exclude more states. A basic form of generalization is performed by dropping literals from as long as the result is still unreachable from , i.e., is still unsatisfiable. This is very similar to PDR-1 above (Section 2.2), where was always . 
Often inductive generalization is used, dropping literals as long as F_{i-1} ∧ c ∧ δ ⟹ c', reading that c is inductive relative to F_{i-1}, which can drop more literals than basic generalization. A core optimization of PDR is pushing, in which a frame F_i is "opportunistically" strengthened with a clause c from F_{i-1}, if F_{i-1} is already sufficiently strong to show that ¬c is unreachable in one step, i.e., F_{i-1} ∧ δ ⟹ c'.
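To make the generalization step concrete, here is a minimal explicit-state sketch in Python (the helper names and the explicit representation of δ as a set of state pairs are ours for illustration; a real implementation would pose each one-step check to a SAT solver). It drops literals from a blocked cube as long as the negation of the cube still holds after one step from the previous frame:

```python
def hoare(delta, pre, post):
    """True iff every delta-successor of a state satisfying `pre` satisfies `post`.
    `delta` is an explicit set of (state, state) pairs; states are bool tuples."""
    return all(post(t) for (s, t) in delta if pre(s))

def generalize(delta, frame, cube):
    """Basic (non-inductive) generalization: drop literals from the blocked
    cube while its negation is still implied after one step from `frame`."""
    cube = dict(cube)  # variable index -> required value
    for var in list(cube):
        val = cube.pop(var)
        not_cube = lambda s, c=dict(cube): not all(s[v] == b for v, b in c.items())
        if not hoare(delta, frame, not_cube):
            cube[var] = val  # the literal was needed; restore it
    return cube
```

For example, with a single transition from (False, False) to (False, True) and the blocked cube {x0: True, x1: True}, the literal on x1 can be dropped, so the generalized cube blocks all states with x0 = True.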

For a more complete presentation of PDR and its variants as a set of abstract rules that may be applied nondeterministically see e.g. Hoder and Bjørner (2012); Gurfinkel and Ivrii (2015). The key point from the perspective of this paper is that the algorithm and its variants access the transition relation in a very specific way: checking whether some state σ is unreachable in one step of δ from the set of states satisfying a formula F_i (or those satisfying F_i ∧ ¬σ), and obtaining a counterexample when it is reachable (see also Vizel et al. (2017)). Crucially, other operations (e.g., maintaining the frames, checking whether F_{i+1} ⟹ F_i, etc.) do not use δ. We will return to this point when discussing the Hoare-query model, which can capture IC3/PDR (Section 5).


The ICE framework (Garg et al., 2014, 2016), later extended to general Constrained Horn Clauses (Ezudheen et al., 2018), is a learning framework for inferring invariants from positive, negative, and implication counterexamples. We now review the framework using the original terminology and notation; later in the paper we will use a related formulation that emphasizes the choice of candidates (in Section 7.1).

In ICE learning, the teacher holds an unknown target description (P, N, I), where P, N ⊆ D are sets of positive and negative examples over a domain D, and I ⊆ D × D is a set of implication pairs. The learner's goal is to find a hypothesis H s.t. P ⊆ H, H ∩ N = ∅, and for each (x, y) ∈ I, if x ∈ H then y ∈ H. The natural way to cast inference in this framework is, given a transition system TS and a set of candidate invariants ℒ, to take D as the set of program states, P a set of reachable states including Init, N a set of states including Bad from which a safety violation is reachable, I the set of transitions of TS, and H ∈ ℒ. Iterative ICE learning operates in rounds. In each round, the learner is provided with a sample—(P̂, N̂, Î) s.t. P̂ ⊆ P, N̂ ⊆ N, Î ⊆ I—and outputs a hypothesis H. The teacher returns that the hypothesis is correct, or extends the sample with an example showing that H is incorrect. The importance of implication counterexamples is that they allow implementing a teacher using a SAT/SMT solver without "guessing" what a counterexample to induction indicates (Garg et al., 2014; Löding et al., 2016). Examples of ICE learning algorithms include Houdini (Flanagan and Leino, 2001) and symbolic abstraction (Reps et al., 2004; Thakur et al., 2015), as well as designated algorithms (Garg et al., 2014, 2016). Theoretically, the analysis of Garg et al. (2014) focuses on strong convergence of the learner, namely, that the learner can always reach a correct concept, no matter how the teacher chooses to extend samples between rounds. In this work, we will be interested in the number of rounds the learner performs. We will say that the learner is strongly-convergent with round-complexity t(·) if for every ICE teacher, the learner finds a correct hypothesis in at most t(n) rounds, provided that one exists. We extend this definition to a class of target descriptions in the natural way.
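The round structure above can be sketched over a finite explicit domain (all names here are illustrative; a real teacher poses its checks to a SAT/SMT solver, and the loop may diverge when no hypothesis in the learner's class fits the target):

```python
def teacher_check(H, P, N, I):
    """Return None if H is adequate for the target (P, N, I),
    else a counterexample to add to the learner's sample."""
    for x in P - H:
        return ('pos', x)
    for x in H & N:
        return ('neg', x)
    for (x, y) in I:
        if x in H and y not in H:
            return ('imp', (x, y))
    return None

def propose(sample):
    """A simple learner: the smallest set containing the sample's positive
    examples, closed under the sample's implication pairs."""
    sP, sN, sI = sample
    H = set(sP)
    changed = True
    while changed:
        changed = False
        for (x, y) in sI:
            if x in H and y not in H:
                H.add(y)
                changed = True
    return H

def ice_learn(P, N, I, propose):
    """Iterative ICE learning: rounds of hypothesis and counterexample."""
    sample = (set(), set(), set())
    while True:
        H = propose(sample)
        cex = teacher_check(H, P, N, I)
        if cex is None:
            return H
        kind, ex = cex
        sP, sN, sI = sample
        if kind == 'pos':
            sP = sP | {ex}
        elif kind == 'neg':
            sN = sN | {ex}
        else:
            sI = sI | {ex}
        sample = (sP, sN, sI)
```

With P = {0}, N = {3}, and implications {(0, 1), (1, 2)}, this converges in a few rounds to the hypothesis {0, 1, 2}, which contains the positives, avoids the negatives, and is closed under the implications.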


The idea of interpolation-based algorithms, first introduced by McMillan (2003), is to generalize proofs of bounded unreachability into elements of a proof of unbounded reachability, utilizing Craig interpolation. Briefly, this works as follows: encode bounded reachability from a set of states S in k steps, and use a SAT solver to find that this cannot reach Bad. When efficient interpolation is supported in the logic and solver, the SAT solver can produce an interpolant I: a formula representing a set of states that (i) overapproximates the set of states reachable from S in one step, and still (ii) cannot reach Bad in k − 1 steps (any split of the k steps is possible). Thus I overapproximates concrete reachability from S without reaching a bad state, although both these facts are known in only a bounded number of steps. The hope is that I would be a useful generalization to include as part of the invariant. The original algorithm (McMillan, 2003) sets some k as the current unrolling bound, starts with S = Init, obtains an interpolant I, sets S = S ∨ I and continues in this fashion, until an inductive invariant is found, or Bad becomes reachable in k steps from S, in which case k is incremented and the algorithm is restarted. The use of interpolation and generalization from bounded unreachability has been used in many works since (e.g. Vizel and Grumberg, 2009; McMillan, 2006; Jhala and McMillan, 2007; Henzinger et al., 2004; Vizel et al., 2013). Combining ideas from interpolation and PDR has also been studied (e.g. Vizel and Gurfinkel, 2014). The important point for this paper is that many interpolation-based algorithms only access the transition relation when checking bounded reachability (from some set of states to some set of states), and extracting interpolants when the result is unreachable. We will return to this point when discussing the interpolation-query model, which aims to capture interpolation-based algorithms (Section 6.2).
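The loop above can be rendered as a toy explicit-state sketch (all names ours). Here the "interpolant" is taken to be the exact one-step post-image—the strongest admissible choice—so the loop degenerates to exact forward reachability; a real implementation extracts an overapproximating formula from the solver's refutation:

```python
def post(delta, S):
    """One-step image of the state set S under the transition relation delta."""
    return {t for (s, t) in delta if s in S}

def reaches_bad(delta, S, bad, k):
    """Bounded check: can a state in `bad` be reached from S within k steps?"""
    cur = set(S)
    for _ in range(k):
        if cur & bad:
            return True
        cur |= post(delta, cur)
    return bool(cur & bad)

def itp_infer(delta, init, bad, max_k=16):
    """McMillan-style loop: accumulate interpolants until an inductive set is
    found, restarting with a larger unrolling bound when Bad gets too close."""
    k = 1
    while k <= max_k:
        S = set(init)
        while True:
            if reaches_bad(delta, S, bad, k):
                if S == set(init):
                    return ('unsafe', None)  # a real trace from Init to Bad
                k += 1
                break  # possibly spurious: restart with a larger bound
            I = post(delta, S)  # stand-in interpolant: exact one-step image
            if I <= S:
                return ('safe', S)  # S is inductive and avoids Bad
            S |= I
    return ('unknown', None)
```

On the three-state loop 0 → 1 → 2 → 0 with Bad = {5} this returns a safe inductive set; on the chain 0 → 1 → 5 it restarts once and reports unsafety.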

4. Polynomial-Length Invariant Inference

In this section we formally define the problem of polynomial-length invariant inference for CNF formulas, which is the focus of this paper. We then relate the problem to the problem of inferring DNF formulas with polynomially many cubes via duality (see Appendix A), and focus on the case of CNF in the rest of the paper.

Our object of study is the problem of polynomial-length inference:

Definition 4.1 (Polynomial-Length Inductive Invariant Inference).

The polynomial-length inductive invariant inference problem (invariant inference for short) for a class of transition systems 𝒯 and a polynomial p is the problem: Given a transition system TS ∈ 𝒯 over a vocabulary Σ, decide whether there exists an inductive invariant I for TS in CNF, where the number of clauses in I is at most p(|Σ|).


In the sequel, when considering the polynomial-length inductive invariant inference problem of a transition system TS, we denote by Σ the vocabulary of TS and n = |Σ|. Further, we denote by ℒ the class of CNF formulas over Σ with at most p(n) clauses.


The complexity of polynomial-length inference is measured in n. Note that the invariants are required to be of length polynomial in n.

ℒ is a rich class of invariants. Inference in more restricted classes can be solved efficiently. For example, when only conjunctive candidate invariants are considered, and 𝒯 is the set of all propositional transition systems, the problem can be decided in a polynomial number of SAT queries through the Houdini algorithm (Flanagan and Leino, 2001; Lahiri and Qadeer, 2009). Similar results hold also for CNF formulas with a constant number of literals per clause (by defining a new predicate for each of the polynomially-many possible clauses and applying Houdini), and for CNF formulas with a constant number of clauses (by translating them to DNF formulas with a constant number of literals per cube and applying the dual procedure). However, a restricted class of invariants may miss invariants for some programs and reduces the generality of the verification procedure. Hence in this paper we are interested in the richer class of polynomially-long CNF invariants. In this case the problem is no longer tractable even with a SAT solver:
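The Houdini procedure mentioned above can be sketched in a few lines over an explicit-state toy system (names and representation ours; the real algorithm poses each check as a SAT query): start from the conjunction of all candidate predicates and repeatedly drop any conjunct violated at initialization or in one step from a state satisfying the current conjunction.

```python
def houdini(candidates, init_states, delta):
    """Compute the largest inductive conjunction of the candidate predicates.
    `delta` is an explicit set of (state, state) transition pairs."""
    preds = set(candidates)
    conj = lambda s: all(p(s) for p in preds)  # reads the current `preds`
    changed = True
    while changed:
        changed = False
        for p in list(preds):
            violated_init = any(not p(s) for s in init_states)
            violated_step = any(conj(s) and not p(t) for (s, t) in delta)
            if violated_init or violated_step:
                preds.remove(p)
                changed = True
    return preds
```

For instance, over the even cycle 0 → 2 → 4 → 0 with candidates "even", "less than 4", and "nonnegative", the second candidate is dropped (the step 2 → 4 violates it) and the conjunction of the other two is returned as the largest inductive conjunction.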

Theorem 4.2 ().

Let 𝒯 be the set of all propositional transition systems. Then polynomial-length inference for 𝒯 is Σ₂^P-complete, where Σ₂^P is the second level of the polynomial-time hierarchy.

We defer the proof to Section 6.1.1.

We note that polynomial-length inference can be encoded as specific instances of template-based inference; the Σ₂^P-hardness proof of Lahiri and Qadeer (2009) uses more general templates and therefore does not directly imply the Σ₂^P-hardness of polynomial-length inference. Lower bounds on polynomial-length inference entail lower bounds for template-based inference.

Remark 4.1 ().

In the above formulation, an efficient procedure for deciding safety does not imply that polynomial-length inference is tractable, since the program may be safe while all its inductive invariants are too long. To overcome this technical quirk, we can consider a promise problem (Goldreich, 2006) variant of polynomial-length inference:

Given a transition system TS,

  • (Completeness) If TS has an inductive invariant I ∈ ℒ, the algorithm must return yes.

  • (Soundness) If TS is not safe the algorithm must return no.

Other cases, including the case of safety with an invariant outside ℒ, are not constrained. An algorithm deciding safety thus also solves this problem. All the results of this paper apply both to the standard version above and the promise problem: upper bounds on the standard version trivially imply upper bounds on the promise problem, and in our lower bounds we use transition systems that are either (i) safe and have an invariant in ℒ, or (ii) unsafe.

5. Invariant Inference with Queries and the Hoare Query Model

In this paper we study algorithms for polynomial-length inference through black-box models of inference with queries. In this setting, the algorithm accesses the transition relation through (rich) queries, but cannot read the transition relation directly. Our main model is of Hoare-query algorithms, which query the validity of a postcondition from a precondition in one step of the system. Hoare-query algorithms faithfully capture a large class of SAT-based invariant inference algorithms, including PDR and related methods.

A black-box model of inference algorithms facilitates an analysis of the information of the transition relation the algorithm acquires. The advantage is that such an information-based analysis sidesteps open computational complexity questions, and therefore results in unconditional lower bounds on the computational complexity of SAT-based algorithms captured by the model. Such an information-based analysis is also necessary for questions involving unbounded computational power and restricted information, in the context of computationally-unrestricted bounded-reachability generalization (see Section 6.3).

In this section we define the basic notions of queries and query-based inference algorithms. We also define the primary query model we study in the paper: the Hoare-query model. In the subsequent sections we introduce and study additional query models: the interpolation-query model (Section 6.2), and the inductiveness-query model (Section 7.1).

Inference with queries

We model queries of the transition relation in the following way: A query oracle is an oracle that accepts a transition relation δ, as well as additional inputs, and returns some output. The additional inputs and the output, together also called the interface of the oracle, depend on the query oracle under consideration. A family of query oracles is a set of query oracles with the same interface. We consider several different query oracles, representing different ways of obtaining information about the transition relation.

Definition 5.1 (Inference algorithm in the query model).

An inference algorithm from queries, denoted 𝒜([δ]), is defined w.r.t. a query oracle 𝒪 and is given:

  • access to the query oracle ,

  • the set of initial states (Init) and bad states (Bad);

  • the transition relation δ, encapsulated—hence the notation [δ]—meaning that the algorithm cannot access δ (not even read it) except for extracting its vocabulary; [δ] can only be passed as an argument to the query oracle 𝒪.

𝒜 solves the problem of polynomial-length invariant inference for the transition system defined by Init, δ, Bad.

The Hoare-query model

Our main object of study in this paper is the Hoare-query model of invariant inference algorithms. It captures SAT-based invariant inference algorithms querying the behavior of a single step of the transition relation at a time.

Definition 5.2 (Hoare-Query Model).

For a transition relation δ and input formulas α, β, the Hoare-query oracle 𝓗 returns 𝓗(δ, α, β) = false if α ∧ δ ∧ ¬β' is satisfiable (i.e., the Hoare triple {α} δ {β} does not hold); otherwise it returns true.

An algorithm in the Hoare-query model, also called a Hoare-query algorithm, is an inference-from-queries algorithm expecting the Hoare-query oracle.

Intuitively, a Hoare-query algorithm gains access to the transition relation δ exclusively by repeatedly choosing α, β and calling 𝓗(δ, α, β).

If we are using a SAT solver to compute the Hoare query, i.e., to check the satisfiability of α ∧ δ ∧ ¬β', then when the answer is false, the SAT solver will also produce a counterexample: a pair of states σ, σ' such that σ ⊨ α, (σ, σ') ⊨ δ, and σ' ⊭ β.

We observe that, using additional Hoare queries, a Hoare-query algorithm can do the same:

Lemma 5.3 ().

Whenever 𝓗(δ, α, β) = false, a Hoare-query algorithm can find σ, σ' such that σ ⊨ α, (σ, σ') ⊨ δ, and σ' ⊭ β using O(n) Hoare queries.


For each variable x ∈ Σ, check whether the query remains false when x is conjoined to α; if it does, continue with α ∧ x, and otherwise flip the literal and continue with α ∧ ¬x (the query with ¬x must then be false, since some counterexample exists and its pre-state does not satisfy x). After n queries, α pins down a single pre-state σ. Similarly, for each x ∈ Σ, disjoin a literal over x to β while keeping the query false; the post-state of any remaining counterexample must falsify all added literals, pinning down σ'. ∎
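The literal-by-literal pinning in the proof can be made concrete over an explicit-state toy (the representation—states as boolean tuples, δ as a set of pairs—and the helper names are ours): each step issues one more Hoare query while keeping the answer false.

```python
def hoare(delta, pre, post):
    """Hoare-query oracle: true iff {pre} delta {post} is valid."""
    return all(post(t) for (s, t) in delta if pre(s))

def find_cex(delta, alpha, beta, n):
    """Given hoare(delta, alpha, beta) == False, extract a counterexample
    (sigma, sigma') with 2n further Hoare queries, as in Lemma 5.3."""
    assert not hoare(delta, alpha, beta)
    # Pin the pre-state: strengthen alpha one literal at a time.
    pre_vals = {}
    for i in range(n):
        stronger = lambda s, a=alpha, i=i: a(s) and s[i]
        if not hoare(delta, stronger, beta):
            alpha, pre_vals[i] = stronger, True
        else:  # every remaining counterexample's pre-state has s[i] == False
            alpha = lambda s, a=alpha, i=i: a(s) and not s[i]
            pre_vals[i] = False
    # Pin the post-state: weaken beta one literal at a time.
    post_vals = {}
    for i in range(n):
        weaker = lambda t, b=beta, i=i: b(t) or t[i]
        if not hoare(delta, alpha, weaker):
            beta, post_vals[i] = weaker, False
        else:  # every remaining counterexample's post-state has t[i] == True
            beta = lambda t, b=beta, i=i: b(t) or not t[i]
            post_vals[i] = True
    sigma = tuple(pre_vals[i] for i in range(n))
    sigma_p = tuple(post_vals[i] for i in range(n))
    return sigma, sigma_p
```

With two transitions and the postcondition "some bit is set", the unique violating transition is recovered exactly.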

Example: PDR as a Hoare-query algorithm

The Hoare-query model captures the prominent PDR algorithm, facilitating its theoretical analysis. As discussed in Section 3.2, PDR accesses the transition relation via checks of unreachability in one step and counterexamples to those checks. These operations are captured in the Hoare-query model by checking 𝓗(δ, α, β) for the algorithm's choice of α, β, and obtaining a counterexample, using a polynomial number of Hoare queries, if one exists (Lemma 5.3). Furthermore, the Hoare-query model is general enough to express a broad range of PDR variants that differ in the way they use such checks but still access the transition relation only through such queries.

The Hoare-query model is not specific to PDR. It also captures algorithms in the ICE learning model (Garg et al., 2014), as we discuss in Section 7.1, and as a result can model algorithms captured by the ICE model (see Section 3.2). In Section 7.2 we show that the Hoare-query model is in fact strictly more powerful than the ICE model.

Remark 5.1 ().

Previous black-box models for invariant inference encapsulated access also to Init and Bad (Garg et al., 2014). In our model we encapsulate only access to δ, since (1) it is technically simpler, and (2) a simple transformation can make Init and Bad uniform across all programs, embedding the differences in the transition relation; indeed, our constructions of classes of transition systems in this paper are such that Init and Bad are the same in all transition systems that share a vocabulary, hence they may be inferred from the vocabulary. (Unrestricted access to Init and Bad is stronger, thus lower bounds on our models apply also to models restricting this access.)


Focusing on information, we do not impose computational restrictions on the algorithms, and only count the number of queries the algorithm performs to reveal information about the transition relation. In particular, when establishing lower bounds on the query complexity, we even consider algorithms that may compute non-computable functions. However, whenever we construct algorithms demonstrating upper bounds on query complexity, these algorithms in fact have polynomial time complexity, and we note this when relevant.

Given a query oracle and an inference algorithm that uses it, we analyze the number of queries the algorithm performs as a function of n, in a worst-case model w.r.t. the possible transition systems over Σ in the class of interest.

The definition is slightly complicated because, later in the paper, we consider query models in which more than one oracle exists, i.e., an algorithm may use any oracle from a family of query oracles. In this case, we analyze the query complexity of an algorithm in a worst-case model w.r.t. the possible query oracles in the family as well.

Formally, the query complexity is defined as follows:

Definition 5.4 (Query Complexity).

For a class of transition systems 𝒯, the query complexity of a (possibly computationally unrestricted) inference algorithm 𝒜 w.r.t. a query oracle family 𝕆 is defined as

q(n) = max { #queries(𝒜, 𝒪, TS) : 𝒪 ∈ 𝕆, TS ∈ 𝒯, |Σ(TS)| = n },

where #queries(𝒜, 𝒪, TS) is the number of times the algorithm accesses the oracle given this oracle and the input. (These numbers might be infinite.)

The query complexity in the Hoare-query model is defined w.r.t. the singleton family consisting of the Hoare-query oracle 𝓗.

Remark 5.2 ().

In our definition, query complexity is a function of the size of the vocabulary, n, but not of the size of the representation of the transition relation δ. This reflects the fact that an algorithm in the black-box model does not access δ directly. In Appendix B we discuss the complexity w.r.t. the size of δ as well. The drawback of such a complexity measure is that learning δ itself becomes feasible, undermining the black-box model. Efficiently learning δ is possible when using unlimited computational power and exponentially-long queries. However, whether the same holds when using unlimited computational power with only polynomially-long queries is related to open problems in classical concept learning.

6. The Information Complexity of Hoare-Query Algorithms

In this section we prove an information-based lower bound on Hoare-query invariant inference algorithms, and also extend the results to algorithms using interpolation, another SAT-based operation. We then apply these results to study the role of information in generalization as part of inference algorithms.

6.1. Information Lower Bound for Hoare-Query Inference

We show that a Hoare-query inference algorithm requires exponentially many Hoare queries in the worst case to decide whether a CNF invariant of length polynomial in n exists. (Recall that n is a shorthand for |Σ|, the size of the vocabulary of the input transition system.) This result applies even when allowing the choice of queries to be inefficient, and when allowing the queries to use exponentially-long formulas. It provides a lower bound on the time complexity of actual algorithms, such as PDR, that are captured by the model. Formally:

Theorem 6.1 ().

Every Hoare-query inference algorithm deciding polynomial-length inference for the class of all propositional transition systems has query complexity exponential in n.

The rest of this section proves a strengthening of this theorem: it holds for a specific class of transition systems (which we construct next), for any class of invariants that includes monotone CNF, and for computationally-unrestricted algorithms:

Theorem 6.2 ().

Every Hoare-query inference algorithm 𝒜, even computationally-unrestricted, deciding invariant inference for the class of transition systems of Section 6.1.1, and for any class of target invariants ℒ that contains the monotone CNF formulas (here we extend the definition of polynomial-length invariant inference to a general class ℒ instead of polynomial-length CNF), has query complexity exponential in n.

(That classes containing monotone CNF are already hard becomes important in Section 7.)

6.1.1. A Hard Class of Transition Systems

In this section we construct a hard class of transition systems, on which we prove our hardness results.

The problem

The construction follows the Σ₂^P-complete problem ∃∀-SAT from classical computational complexity theory. In this problem, the input is a quantified Boolean formula ∃x̄. ∀ȳ. φ(x̄, ȳ), where φ is a Boolean (quantifier-free) formula, and the problem is to decide whether the quantified formula is true, namely, whether there exists a Boolean assignment to x̄ s.t. φ is true for every Boolean assignment to ȳ.
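For concreteness, the ∃∀-SAT decision problem admits a direct brute-force check (exponential-time, for illustration only; all names here are ours):

```python
from itertools import product

def exists_forall(phi, n1, n2):
    """Decide  exists x (n1 bits). forall y (n2 bits). phi(x, y)  by enumeration.
    `phi` is a predicate on two boolean tuples; this takes O(2**(n1+n2)) time."""
    assignments = lambda n: product([False, True], repeat=n)
    return any(all(phi(x, y) for y in assignments(n2)) for x in assignments(n1))
```

For example, ∃x. ∀y. (x ∨ y) is true (choose x = true), while ∃x. ∀y. y is false.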

The class of transition systems.

For each , we define . Finally, .

For each formula ∃x̄. ∀ȳ. φ(x̄, ȳ), where x̄, ȳ are disjoint sequences of variables and φ is a quantifier-free formula over the variables x̄, ȳ, we define a transition system TS_φ. Intuitively, it iterates through the valuations of x̄ lexicographically, and for each such valuation it iterates lexicographically through the valuations of ȳ and checks whether all assignments to ȳ satisfy φ. If no such valuation of x̄ is found, this is an error. More formally,

  1. .

  2. .

  3. .

  4. : evaluate , and perform the following changes (at a single step): If the result is false, set to true. If and is still false, set to false. If in the pre-state , increment lexicographically, reset to false, and set ; otherwise increment lexicographically. If in the pre-state , set to . (Intuitively, is false as long as no falsifying assignment to has been encountered for the current , is true as long as we have not yet encountered a for which there is no falsifying assignment.)

We denote the resulting class of transition systems .

The following lemma relates the ∃∀-SAT problem for φ to the invariant inference problem of TS_φ:

Lemma 6.3 ().

Let φ be as above. Then TS_φ is safe iff it has an inductive invariant in ℒ iff the formula ∃x̄. ∀ȳ. φ is true.


There are two cases:

  • If ∃x̄. ∀ȳ. φ is true, let x̄* be the first valuation of x̄ that realizes the existential quantifier. Then the following is an inductive invariant for TS_φ:


    where the lexicographic constraint is expressed by the following recursive definition on :

    and (or true if ).

    : Note that can be written in CNF with at most clauses: in the first case a literal is added to each clause, and in the second another clause is added. Thus can be written in CNF with at most clauses. Further, the literals of appear only negatively in , and hence also in . The other literals () also appear only negatively in . Hence, is monotone.

    is indeed an inductive invariant: initiation and safety are straightforward. For consecution, consider a valuation to in a pre-state satisfying the invariant. (We abuse notation and refer to the valuation by .) There are three cases:

    • If , then (i) is retained by a step, (ii) holds after a step, (iii) still holds unless the transition is from the last evaluation for to , in which case is turned to false.

    • If , the invariant guarantees that in the pre-state is false, and thus remains false after a step. also remains false and thus the rest of the invariant also holds in the post-state.

    • If , the invariant guarantees that in the pre-state either is false or is false. If is false the same reasoning of the previous case applies. Otherwise, we have that is false. By the definition of all valuations for results in , so remains false after a step, and once we finish iterating through we set to false immediately.

    The claim follows.

  • If ∃x̄. ∀ȳ. φ is not true, then TS_φ is not safe (and thus does not have an inductive invariant of any length). This is because for every valuation of x̄ a violating valuation of ȳ is found, and so after iterating through all possible valuations of x̄ the system reaches an error.

Before we turn to prove Thm. 6.2 and establish a lower bound on the query complexity in the Hoare model, we note that this construction also yields the computational hardness mentioned in Section 4:

Proof of Thm. 4.2.

The upper bound is straightforward: guess an invariant in ℒ and check it. For the lower bound, use the reduction outlined above: given ∃x̄. ∀ȳ. φ, construct TS_φ. Note that the vocabulary size n is polynomial in the number of variables of φ, and the invariant, when it exists, is of polynomial length. (For an arbitrary polynomial p, the vocabulary can be enlarged, e.g., by adding to Init initializations of fresh variables that are not used elsewhere, to ensure the existence of an invariant of length at most p(n).) The reduction is polynomial, as the construction of TS_φ from φ is polynomial in the size of φ: note that lexicographic incrementation can be performed with a propositional formula of polynomial size. ∎

6.1.2. Lower Bound’s Proof

We now turn to prove Thm. 6.2. Given an algorithm with polynomial query complexity, the proof constructs two transition systems: one that has a polynomial-length invariant and one that does not, such that all the queries the algorithm performs do not distinguish between them. The construction uses the path the algorithm takes when all Hoare queries return false as much as possible. Intuitively, such responses are less informative and rule out fewer transition relations, because they merely indicate the existence of a single counterexample to a Hoare triple, as opposed to the result true, which indicates that all transitions satisfy a property.

Proof of Thm. 6.2.

Let 𝒜 be a computationally unbounded Hoare-query algorithm. We show that the number of Hoare queries performed by 𝒜 on transition systems from the class with vocabulary size n is exponential in n in the worst case. To this end, we show that if 𝒜 performs fewer than exponentially many queries, then there exist two formulas φ1, φ2 over the same variables such that

  • all the Hoare queries performed by 𝒜 on the transition relations of TS_φ1 and TS_φ2 return the same result, even though

  • 𝒜 should return different results when run on TS_φ1 and TS_φ2, since TS_φ1 has an invariant in ℒ and TS_φ2 does not have an invariant (of any length).

We begin with some notation. When running 𝒜 on input TS_φ, denote the queries 𝒜 performs and their results by (α_1, β_1, r_1), ..., (α_m, β_m, r_m). We call an index i sat if r_i = false, i.e., the queried Hoare triple does not hold and a counterexample transition exists. We say that a formula ψ query-agrees with φ if all the queries return the same results on the transition relation of TS_ψ as on that of TS_φ. We say that ψ sat-query-agrees with φ if for every i such that r_i = false on TS_φ, the result of query i is also false on TS_ψ.

We first find a formula φ such that the sequence of queries 𝒜 performs when executing on TS_φ is maximally satisfiable: if ψ sat-query-agrees with φ, then ψ (completely) query-agrees with φ on the queries, that is,


We construct this sequence iteratively (and define φ accordingly) by always choosing φ so that the result of the next query is false as long as this is possible while remaining consistent with the results of the previous queries: Initially, choose some arbitrary φ. At each point i, consider the first i − 1 queries 𝒜 performs on TS_φ. If 𝒜 terminates without performing another query, we are done: the desired formula is the current φ. Otherwise let (α_i, β_i) be the next query. Amongst the formulas that query-agree with φ on the first i − 1 queries, choose one on which the result of the i-th query is false, if possible; if no such formula exists, keep the current φ. The dependency of 𝒜 on its input is solely through the results of the queries, so 𝒜 performs the same initial queries when given the new formula. The result is a maximally satisfiable sequence, for if some formula differs in a query in which the result is false instead of true, we would have taken that formula instead.

Let φ be such a formula, with a maximally satisfiable sequence of queries. For every sat index i, take a counterexample transition (σ_i, σ'_i). A single transition of TS_φ depends on the value of φ on at most one assignment to its variables, so there exists a valuation v_i on which this transition depends, such that


as well. It follows that


Let v_1, ..., v_ℓ be the valuations derived from the sat queries (concerning indexing, v_i is defined iff i is a sat index). We say that a formula ψ valuation-agrees with φ on v_1, ..., v_ℓ if ψ(v_i) = φ(v_i) for all i. Since the sequence of queries is maximally satisfiable, if ψ valuation-agrees with φ then ψ query-agrees with φ, namely, all queries return the same results. As the dependency of 𝒜 on its input is solely through the results of the queries, it follows that 𝒜 performs the same queries on TS_ψ as it does on TS_φ and returns the same answer.

It remains to argue that if the number of queries is small, then there exist two formulas φ1, φ2 that valuation-agree with φ on v_1, ..., v_ℓ but differ in the correct result 𝒜 should return: ∃x̄. ∀ȳ. φ1 is true, and so TS_φ1 has an invariant in ℒ (Lemma 6.3), whereas ∃x̄. ∀ȳ. φ2 is not, and so TS_φ2 does not have an invariant of any length or form (Lemma 6.3). This is possible because the number of constraints imposed by valuation-agreeing with φ on v_1, ..., v_ℓ is less than the number of possible valuations of ȳ for every valuation of x̄, and vice versa:


is true on all valuations except for some of , and since there exists some such that for all , is not one of these valuations (recall that bits). Dually,


is false on all valuations except for some of , and since for every