1. Introduction
The inference of inductive invariants is a fundamental technique in safety verification, and the focus of many works (e.g. McMillan, 2003; Bradley, 2011; Eén et al., 2011; Cousot and Cousot, 1977; Srivastava et al., 2013; Alur et al., 2015; Fedyukovich and Bodík, 2018; Dillig et al., 2013). The task is to find an assertion $I$ that holds in the initial states of the system, excludes all bad states, and is closed under transitions of the system, namely, the Hoare triple $\{I\}\,\delta\,\{I\}$ is valid, where $\delta$ denotes one step of the system. Such an $I$ overapproximates the set of reachable states and establishes their safety.
The advance of SAT-based reasoning has led to the development of successful algorithms inferring inductive invariants using SAT queries. A prominent example is IC3/PDR (Bradley, 2011; Eén et al., 2011), which has led to a significant improvement in the ability to verify realistic hardware systems. Recently, this algorithm has been extended and generalized to software systems (e.g. Hoder and Bjørner, 2012; Komuravelli et al., 2014; Bjørner and Gurfinkel, 2015; Karbyshev et al., 2015; Cimatti et al., 2014).
Successful SAT-based inference algorithms are typically tricky and employ many clever heuristics. This is in line with the inherent asymptotic complexity of invariant inference, which is hard even with access to a SAT solver (Lahiri and Qadeer, 2009). However, the practical success of inference algorithms calls for a more refined complexity analysis, with the objective of understanding the principles on which these algorithms are based. This paper studies the asymptotic complexity of SAT-based invariant inference through the decision problem of polynomial-length inference in the black-box Hoare-query model, as we now explain.
Inference of polynomial-length CNF
Naturally, inference algorithms succeed when the invariant they infer is not too long. Therefore, this paper considers the complexity of inferring invariants of polynomial length. We follow the recent trend in invariant inference, advocated in (McMillan, 2003; Bradley, 2011), to search for invariants in rich syntactical forms, beyond those usually considered in template-based invariant inference (e.g. Jeannet et al., 2014; Colón et al., 2003; Sankaranarayanan et al., 2004; Srivastava and Gulwani, 2009; Srivastava et al., 2013; Alur et al., 2015), with the motivation of achieving generality of the verification method and potentially improving the success rate. We thus study the inference of invariants expressed in Conjunctive Normal Form (CNF) of polynomial length. Interestingly, our results also apply to inferring invariants in Disjunctive Normal Form.
The Hoare-query model
Our study of SAT-based methods focuses on an algorithmic model called the Hoare-query model. The idea is that the inference algorithm is not given direct access to the program, but performs queries on it. In the Hoare-query model, algorithms repeatedly choose $\alpha, \beta$ and query for the validity of Hoare triples $\{\alpha\}\,\delta\,\{\beta\}$, where $\delta$ is the transition relation denoting one step of the system, inaccessible to the algorithm except via such Hoare queries. The check itself is implemented by an oracle, which in practice is a SAT solver. This model is general enough to capture algorithms such as PDR and its variants, and leaves room for other interesting design choices, but does not capture white-box approaches such as abstract interpretation (Cousot and Cousot, 1977). The advantage of this model for a theoretical study is that it enables an information-based analysis, which (i) sidesteps open computational complexity questions, and therefore results in unconditional lower bounds on the computational complexity of SAT-based algorithms captured by the model, and (ii) grants meaning to questions about generalization from partial information, which we discuss later.
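The black-box access described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the class name `HoareOracle`, the explicit enumeration of a finite state space, and the toy counter system are hypothetical and not taken from the paper, and a brute-force loop stands in for the SAT solver. The oracle answers only yes or no here.

```python
class HoareOracle:
    """Black-box access to a hidden transition relation via Hoare queries.

    Hypothetical illustration: states are enumerated explicitly and the
    check is brute force, standing in for a SAT solver.
    """

    def __init__(self, states, step):
        self._states = list(states)  # finite state space (assumption)
        self._step = step            # hidden relation: step(s, t) -> bool
        self.queries = 0             # number of Hoare queries asked so far

    def query(self, pre, post):
        """Return True iff every step from a state satisfying `pre`
        ends in a state satisfying `post`."""
        self.queries += 1
        return all(post(t)
                   for s in self._states if pre(s)
                   for t in self._states if self._step(s, t))


# Toy system (hypothetical): a counter that always adds 2 modulo 8.
oracle = HoareOracle(range(8), lambda s, t: t == (s + 2) % 8)
even = lambda s: s % 2 == 0

print(oracle.query(even, even))                          # same pre/post
print(oracle.query(lambda s: s < 4, lambda s: s < 6))    # pre differs from post
print(oracle.queries)
```

The algorithm never inspects `step` directly; counting calls to `query` is what a query-complexity bound in this model measures.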
Results
This research addresses two main questions related to the core ideas behind PDR, and theoretically analyzes them in the context of the Hoare-query model:

(1) These algorithms revolve around the question of generalization: from observing concrete states (to be excluded from the invariant), the algorithm seeks to produce assertions that hold for all reachable states. The different heuristics in this context are largely understood as clever ways of performing this generalization. The situation is similar in interpolation-based algorithms, only that generalization is performed from bounded safety proofs rather than states. How should generalization be performed to achieve efficient invariant inference?

(2) A key aspect of PDR is the form of SAT checks it uses, as part of relative inductiveness checks, of Hoare triples $\{\alpha\}\,\delta\,\{\beta\}$ in which in general $\alpha \neq \beta$ [Footnote 1: For the PDR-savvy: $\beta$ is typically a candidate clause, and $\alpha$ is derived from the previous frame.]. Repeated queries of this form are potentially richer than presenting a series of candidate invariants, where the check is $\{\alpha\}\,\delta\,\{\alpha\}$. Is there a benefit in using relative inductiveness beyond inductiveness checks?
We analyze these questions in the foundational case of Boolean programs, which is applicable to infinite-state systems through predicate abstraction (Graf and Saïdi, 1997; Flanagan and Qadeer, 2002; Lahiri and Qadeer, 2009), and is also a core part of other invariant inference techniques for infinite-state systems (e.g. Hoder and Bjørner, 2012; Komuravelli et al., 2014; Karbyshev et al., 2017).
In Section 6, we answer question 1 with an impossibility result, by showing that no choice of generalization can lead to an inference algorithm using only a polynomial number of Hoare queries. Our lower bound is information-theoretic, and holds even with unlimited computational power, showing that the problem of generalization is chiefly a question of information gathering.
In Section 7, we answer question 2 in the affirmative, by showing an exponential gap between algorithms utilizing rich checks and algorithms that perform only inductiveness checks. Namely, we construct a class of programs for which a simple version of PDR can infer invariants efficiently, but every algorithm learning solely from counterexamples to the inductiveness of candidates requires an exponential number of queries. This result shows, for the first time theoretically, the significance of relative inductiveness checks as the foundation of PDR’s mechanisms, in comparison to a machine learning approach pioneered in the ICE model (Garg et al., 2014, 2016) that infers invariants based on inductiveness checks only.
Our results also clarify the relationship between the problem of invariant inference and the classical theory of exact concept learning with queries (Angluin, 1987). In particular, our results imply that learning from counterexamples to induction is harder than learning from positive & negative examples (Section 8), providing a formal justification to the existing intuition (Garg et al., 2014). This demonstrates that the convergence rate of learning in Counterexample-Guided Inductive Synthesis (e.g. Solar-Lezama et al., 2006; Jha et al., 2010; Jha and Seshia, 2017) depends on the form of examples. We also establish impossibility results for directly applying algorithms from concept learning to invariant inference.
The contributions of the paper are summarized as follows:

We introduce the Hoare-query model, a black-box model of invariant inference capable of modeling PDR (Section 5), and study the query complexity of polynomial-length invariant inference in this model.

We show that in general an algorithm in this model requires an exponential number of queries to solve polynomial-length inference, even though Hoare queries are rich and versatile (Section 6).

We also extend this result to a model capturing interpolation-based algorithms (Section 6.2).

We prove that exact learning from counterexamples to induction is harder than exact learning from positive & negative examples, and derive impossibility results for translating some exact concept learning algorithms to the setting of invariant inference (Section 8).
2. Overview
Coming up with inductive invariants is one of the most challenging tasks of formal verification; it is often referred to as the “Eureka!” step. This paper studies the asymptotic complexity of automatically inferring CNF invariants of polynomial length, a problem we call polynomial-length inductive invariant inference, in a SAT-based black-box model.
Consider the dilemmas Abby faces when she attempts to develop an algorithm for this problem from first principles. Abby is excited about the popularity of SAT-based inference algorithms. Many such algorithms operate by repeatedly performing checks of Hoare triples of the form $\{\alpha\}\,\delta\,\{\beta\}$, where $\alpha, \beta$ are a precondition and postcondition (resp.) chosen by the algorithm in each query and $\delta$ is the given transition relation (loop body). A SAT solver implements the check. We call such checks Hoare queries, and focus in this paper on black-box inference algorithms in the Hoare-query model: algorithms that access the transition relation solely through Hoare queries.
Fig. 1 displays one example program that Abby is interested in inferring an inductive invariant for. In this program, a number $x$, represented by $n$ bits, is initialized to zero, and at each iteration incremented by an even number that is decided by the input variables $y$ (all computations are mod $2^n$). The representation of the number $x$ using the bits $x_1, \ldots, x_n$ is determined by another set of bits $c_1, \ldots, c_n$, which are all immutable, and only one of them is true: if $c_1 = \mathit{true}$, the number $x$ is represented by $x_1, x_2, \ldots, x_n$; if $c_2 = \mathit{true}$, the least-significant bit (lsb) shifts and the representation is $x_2, x_3, \ldots, x_n, x_1$; and so on. The safety property is that $x$ is never equal to the number with all bits 1. Intuitively, this holds because the number $x$ is always even. An inductive invariant states this fact, taking into account the differing representations, by stating that the lsb (as chosen by $c$) is always 0: $I = (c_1 \rightarrow \neg x_1) \wedge \ldots \wedge (c_n \rightarrow \neg x_n)$. Of course, Abby aims to verify many systems, of which Fig. 1 is but one example.
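To make the example concrete, the following Python sketch enumerates a small instance by brute force and compares the reachable states with the invariant described above. It is an illustration under assumptions, not the program of Fig. 1 itself: a state is encoded as a pair `(x, c)` where `x` is a tuple of bits and `c` is the index of the bit acting as lsb (so the "exactly one selector bit" axiom is built into the encoding), and all function names are hypothetical.

```python
from itertools import product


def states(n):
    """All states (x, c): x is a tuple of n bits, c is the index of the lsb."""
    return [(x, c) for x in product((False, True), repeat=n) for c in range(n)]


def value(state):
    """Number encoded by the bits, read cyclically starting at the lsb x[c]."""
    x, c = state
    n = len(x)
    return sum(int(x[(c + j) % n]) << j for j in range(n))


def is_init(state):
    return value(state) == 0


def is_bad(state):
    return all(state[0])          # the number with all bits 1


def step(s, t):
    """The representation is unchanged and the number grows by an even amount."""
    n = len(s[0])
    return s[1] == t[1] and ((value(t) - value(s)) % (1 << n)) % 2 == 0


def invariant(state):
    x, c = state
    return not x[c]               # the lsb selected by c is 0


def reachable(n):
    """Forward reachability by explicit search from the initial states."""
    all_states = states(n)
    frontier = [s for s in all_states if is_init(s)]
    seen = set(frontier)
    while frontier:
        s = frontier.pop()
        for t in all_states:
            if t not in seen and step(s, t):
                seen.add(t)
                frontier.append(t)
    return seen


reach = reachable(3)
print(reach == {s for s in states(3) if invariant(s)})
print(any(is_bad(s) for s in reach))
```

In this encoding the invariant holds in exactly the states with an even value, which is why it excludes the all-ones state for every choice of representation.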
2.1. Example: Backward-Reachability with Generalization
How should Abby’s algorithm go about finding inductive invariants? One known strategy is that of backward reachability, in which the invariant is strengthened to exclude states from which bad states may be reachable [Footnote 2: Our results are not specific to backward-reachability algorithms; we use them here for motivation and illustration.]. Alg. 1 is an algorithmic backward-reachability scheme: it repeatedly checks for the existence of a counterexample to induction (a transition $\sigma, \sigma'$ of $\delta$ from $I$ to $\neg I$), and strengthens the invariant to exclude the pre-state $\sigma$ using the formula Block returns.
Alg. 1 depends on the choice of Block. The most basic approach is that of Alg. 2, which excludes exactly the pre-state, by conjoining to the invariant the negation of the cube of $\sigma$ (the cube is the conjunction of all literals that hold in the state; the only state that satisfies $\mathit{cube}(\sigma)$ is $\sigma$ itself, and thus it is the only one to be excluded from $I$ in this approach). For example, when Alg. 1 needs to block a state that reaches a bad state, Alg. 2 does so by conjoining to the invariant the negation of that state's cube, and the cube is a formula that no other state satisfies.
Alas, Alg. 1 with blocking by Alg. 2 is not efficient. In essence it operates by enumerating and excluding the states backward-reachable from bad. The number of such states is potentially exponential, making Alg. 2 unsatisfactory. For instance, the example of Fig. 1 requires the exclusion of all states in which $x$ is odd for every choice of lsb, a number of states exponential in $n$. The algorithm would thus require an exponential number of queries to arrive at a (CNF) inductive invariant, even though a CNF invariant with only $n$ clauses exists (as above).
Efficient inference hence requires Abby to exclude more than a single state at each time, namely, to generalize from a counterexample, as real algorithms do. What generalization strategy could Abby choose that would lead to efficient invariant inference?
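The blow-up of state-by-state blocking can be observed on small instances with the sketch below. It is a hedged illustration rather than the paper's Alg. 1 and Alg. 2 verbatim: the candidate invariant is represented by the set of states it excludes, counterexamples to induction are found by brute force instead of a SAT query, and `fig1_system` is a hypothetical explicit-state encoding of the example (bits `x` plus the index `c` of the lsb).

```python
from itertools import product


def fig1_system(n):
    """Hypothetical explicit-state encoding: state = (bits x, index c of lsb)."""
    states = [(x, c) for x in product((False, True), repeat=n) for c in range(n)]
    is_init = lambda s: not any(s[0])
    is_bad = lambda s: all(s[0])
    # Adding an even number keeps the representation and the parity (the lsb).
    step = lambda s, t: s[1] == t[1] and s[0][s[1]] == t[0][t[1]]
    return states, is_init, is_bad, step


def naive_backward_reach(states, is_init, is_bad, step):
    """Backward reachability that blocks exactly one pre-state per iteration."""
    excluded = {s for s in states if is_bad(s)}       # candidate I = not Bad
    iterations = 0
    while True:
        # Counterexample to induction: a step from I to not I.
        cti = next(((s, t) for s in states if s not in excluded
                    for t in excluded if step(s, t)), None)
        if cti is None:
            return excluded, iterations               # I is now inductive
        pre_state = cti[0]
        if is_init(pre_state):
            raise ValueError("unsafe: an initial state reaches a bad state")
        excluded.add(pre_state)                       # conjoin not cube(pre_state)
        iterations += 1


for n in range(2, 6):
    _, iterations = naive_backward_reach(*fig1_system(n))
    print(n, iterations)
```

In this encoding each iteration removes a single odd-valued state, so the iteration count grows with the number of such states rather than with the number of clauses in the final invariant.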
2.2. All Generalizations are Wrong
One simple generalization strategy Abby considers appears in Alg. 3, based on the standard ideas in IC3/PDR (Bradley, 2011; Eén et al., 2011) and subsequent developments (e.g. Hoder and Bjørner, 2012; Komuravelli et al., 2014). It starts with the cube (as Alg. 2) and attempts to drop literals, resulting in a smaller conjunction, which many states satisfy; all these states are excluded from the candidate in line 6 of Alg. 1. Hence with this generalization Alg. 1 can exclude many states in each iteration, overcoming the problem with the naive algorithm above. Alg. 3 chooses to drop a literal from the conjunction if no state reachable in at most one step from Init satisfies the conjunction even when that literal is omitted (line 4 of Alg. 3); we refer to this algorithm as PDR-1, since it resembles PDR with a single frame.
For example, when in the example of Fig. 1 the algorithm attempts to block a state backward-reachable from bad, Alg. 3 minimizes the cube of that state to a smaller conjunction, because no state reachable in at most one step satisfies this conjunction, but this is no longer true when another literal is omitted. Conjoining the invariant with the negation of the minimized cube (in line 6 of Alg. 1) produces a clause of the invariant. In fact, our results show that PDR-1 finds the aforementioned invariant in a polynomial number of queries.
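The literal-dropping step can be sketched in Python as follows. This is a minimal illustration under assumptions, not the paper's Alg. 3 verbatim: states are dictionaries over hypothetical variable names `x1..x3` and `c1..c3`, only states satisfying the "exactly one selector bit" axiom are enumerated, the set of states reachable in at most one step is computed by brute force rather than queried through a SAT solver, and negative literals are tried first (the order of attempts is a choice of this sketch).

```python
from itertools import product

N = 3
XS = [f"x{i}" for i in range(1, N + 1)]
CS = [f"c{i}" for i in range(1, N + 1)]


def all_states():
    """States satisfying the axiom: exactly one selector bit c_i is true."""
    for bits in product((False, True), repeat=N):
        for k in range(N):
            state = dict(zip(XS, bits))
            state.update({c: (j == k) for j, c in enumerate(CS)})
            yield state


def lsb(state):
    """The x bit selected by the unique true c_i."""
    return next(state[x] for x, c in zip(XS, CS) if state[c])


def is_init(state):
    return not any(state[x] for x in XS)


def step(s, t):
    # Same representation, same parity (an even number was added).
    return all(s[c] == t[c] for c in CS) and lsb(s) == lsb(t)


STATES = list(all_states())
# States reachable from Init in at most one step.
REACH1 = [t for t in STATES
          if is_init(t) or any(is_init(s) and step(s, t) for s in STATES)]


def satisfies(state, cube):
    return all(state[var] == val for var, val in cube.items())


def block_pdr1(state):
    """Drop literals from cube(state) while no state in REACH1 satisfies it."""
    cube = dict(state)
    for var in sorted(state, key=lambda v: state[v]):   # negative literals first
        trial = {v: val for v, val in cube.items() if v != var}
        if not any(satisfies(r, trial) for r in REACH1):
            cube = trial
    return cube


odd_state = dict(x1=True, x2=True, x3=False, c1=True, c2=False, c3=False)
print(block_pdr1(odd_state))
```

Negating the returned conjunction gives a clause over the remaining variables; dropping any further literal would make the conjunction satisfiable by a state reachable in at most one step.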
Yet there is a risk in overgeneralization, that is, of dropping too many literals and excluding too many states. In Alg. 1, generalization must not return a formula that some reachable states satisfy, or the candidate would exclude reachable states and would not be an inductive invariant. Alg. 3 chooses to take the strongest conjunction that does not exclude any state reachable in at most one step; it is of course possible (and plausible) that some states are reachable in two steps but not in one. Alg. 1 with the generalization in Alg. 3 might fail in such cases.
The necessity of generalization, on the one hand, and the problem of overgeneralization, on the other, lead in practice to complex heuristic techniques. Instead of simple backward-reachability with generalization per Alg. 1, PDR never commits to a particular generalization (Eén et al., 2011) through a sequence of frames, which are (in some sense) a sequence of candidate invariants. The clauses resulting from generalization are used to strengthen frames according to a bounded reachability analysis; Alg. 3 corresponds to generalization in the first frame.
Overall, the study of backward-reachability and the PDR-1 generalization leaves us with the question: Is there a choice of generalization that can be used, in any way, to achieve an efficient invariant inference algorithm?
In an uninteresting way, the answer is yes, there is a “good” way to generalize: Use Alg. 1, with the following generalization strategy: Upon blocking a pre-state $\sigma$, compute an inductive invariant of polynomial length, and return the clause of the invariant that excludes $\sigma$ [Footnote 3: Such a clause exists because $\sigma$ is backward-reachable from bad states, and thus excluded from the invariant.]; this terminates in a polynomial number of steps.
Such generalization is clearly unattainable. It requires (1) perfect information of the transition system, and (2) solving a computationally hard problem, since we show that polynomial-length inference is hard (Thm. 4.2). What happens when generalization is computationally unbounded (an arbitrary function), but operates based on partial information of the transition system? Is there a generalization from partial information, be it computationally intractable, that facilitates efficient inference? If such a generalization exists we may wish to view invariant inference heuristics as approximating it in a computationally efficient way.
Similar questions arise in interpolation-based algorithms, only that generalization is performed not from a concrete state, but from a bounded unreachability proof. Still it is challenging to generalize enough to make progress but not so much as to exclude reachable states (or include states from which bad is reachable).
2.2.1. Our results
Our first main result in this paper is that in general, there does not exist a generalization scheme from partial information leading to efficient inference based on Hoare queries. Technically, we prove that even a computationally unrestricted generalization from information gathered from Hoare queries requires an exponential number of queries. This result applies to any generalization strategy and any algorithm using it that can be modeled using Hoare queries, including Alg. 1 as well as more complex algorithms such as PDR. We also extend this lower bound to a model capturing interpolation-based algorithms (Thm. 6.6).
These results are surprising because a priori it would seem possible, using unrestricted computational power, to devise queries that repeatedly halve the search space, yielding an invariant with a polynomial number of queries (the number of candidates is only exponential because we are interested in invariants up to polynomial length). We show that this is impossible to achieve using Hoare queries.
2.3. Inference Using Rich Queries
So far we have established strong impossibility results for invariant inference based on Hoare queries in the general case, even with computationally unrestricted generalization. We now turn to shed some light on the techniques that inference algorithms such as PDR employ in practice. One of the fundamental principles of PDR is the incremental construction of invariants relying on rich Hoare queries. PDR-1 demonstrates a simplified realization of this principle. When PDR-1 considers a clause to strengthen the invariant, it checks the reachability of that individual clause from Init, rather than the invariant as a whole. This is the Hoare query in line 4 of Alg. 3, in which, crucially, the precondition is different from the postcondition. The full-fledged PDR is similar in this regard, strengthening a frame according to reachability from the previous frame via relative induction checks (Bradley, 2011).
The algorithm in Alg. 2 is fundamentally different, and uses only inductiveness queries $\{\alpha\}\,\delta\,\{\alpha\}$, a specific form of Hoare queries where the precondition and postcondition are the same. Algorithms performing only inductiveness checks can in fact be very sophisticated, traversing the domain of candidates in clever ways. This approach was formulated in the ICE learning framework for learning inductive invariants (Garg et al., 2014, 2016) (later extended to general Constrained-Horn Clauses (Ezudheen et al., 2018)), in which algorithms present new candidates based on positive, negative, and implication examples returned by a “teacher” in response to incorrect candidate invariants [Footnote 4: Our formulation focuses on implication examples (counterexamples to inductiveness queries) and strengthens the algorithm with full information about the set of initial and bad states instead of positive and negative examples (resp.).]. The main point is that such algorithms do not perform queries other than inductiveness, and choose the next candidate invariant based solely on the counterexamples to induction showing the previous candidates were unsuitable.
The contrast between the two approaches raises the question: Is there a benefit to invariant inference in Hoare queries richer than inductiveness? For instance, to model PDR in the ICE framework, Vizel et al. (2017) extended the framework with relative inductiveness checks, but the question whether such an extension is necessary remained open.
2.3.1. Our results
Our second significant result in this paper is showing an exponential gap between the general Hoare-query model and the more specific inductiveness-query model. To this end, we construct a class of transition systems, including the example of Fig. 1, for which (1) PDR-1, which is a Hoare-query algorithm, infers an invariant in a polynomial number of queries, but (2) every inductiveness-query algorithm requires an exponential number of queries, that is, an exponential number of candidates before it finds a correct inductive invariant. This demonstrates that analyzing the reachability of clauses separately can offer an exponential advantage in certain cases. This also proves that PDR cannot be cast in the ICE framework, and that the extension by Vizel et al. (2017) is necessary and strictly increases the power of inference with a polynomial number of queries. To the best of our knowledge, this is not only the first lower bound on ICE learning demonstrating such an exponential gap (also see the discussion in Section 9), but also the first polynomial upper bound on PDR for a class of systems.
We show this separation on a class of systems constructed using a technical notion of maximal systems for monotone invariants. These are systems for which there exists a monotone invariant (namely, an invariant in which propositional variables appear only negatively) with a linear number of clauses, and the transition relation includes all transitions allowed by this invariant. The system in Fig. 1 is an example of a maximal system: it allows every transition between states satisfying the invariant (namely, between all even $x$’s with the same representation), and also every transition between states violating the invariant (namely, between all odd $x$’s with the same representation) [Footnote 5: Transitions violating the axiom or modifying it are excluded in this modeling.]. The success of PDR-1 relies on the small diameter (every reachable state is reachable in one step) and the fact that the invariant is monotone. However, we show that for inductiveness-query algorithms this class is as hard as the class of all programs admitting monotone invariants, whose hardness is established from the results of Section 2.2.1. For example, from the perspective of inductiveness-query algorithms, the example of Fig. 1, which is a maximal program as explained above, is as hard as any system that admits its invariant (and also respects the axiom and leaves $c$ unchanged). This is because an inductiveness-query algorithm can only benefit from having fewer transitions and hence fewer counterexamples to induction, whereas maximal programs include as many transitions as possible. If an inductiveness-query algorithm is to infer an invariant for the example of Fig. 1, it must also be able to infer an invariant for all systems whose transitions are a subset of the transitions of this example. This includes systems with an exponential diameter, as well as systems admitting other invariants, potentially exponentially long.
This program illustrates our lower bound construction, which takes all maximal programs for monotone-CNF invariants.
In our lower bound we follow the existing literature on the analysis of inductiveness-query algorithms, which focuses on the worst-case notion w.r.t. potential examples (strong convergence in Garg et al. (2014)). An interesting direction is to analyze inductiveness-query algorithms that exercise some control over the choice of counterexamples to induction, or under probabilistic assumptions on the distribution of examples.
2.4. A Different Perspective: Exact Learning of Invariants with Hoare Queries
This paper can be viewed as developing a theory of exact learning of inductive invariants with Hoare queries, akin to the classical theory of concept learning with queries (Angluin, 1987). The results outlined above are consequences of natural questions about this model: The impossibility of generalization from partial information (Section 2.2.1) stems from an exponential lower bound on the Hoare-query model. The power of rich Hoare queries (Section 2.3.1) is demonstrated by an exponential separation between the Hoare- and inductiveness-query models, in the spirit of the gap between concept learning using both equivalence and membership queries and concept learning using equivalence queries alone (Angluin, 1990).
The similarity between invariant inference (and synthesis in general) and exact concept learning has been observed before (e.g. Jha et al., 2010; Garg et al., 2014; Jha and Seshia, 2017; Alur et al., 2015; Bshouty et al., 2017). Our work highlights some interesting differences and connections between invariant learning with Hoare queries, and concept learning with equivalence and membership queries. This comparison yields (im)possibility results for translating algorithms from concept learning with queries to invariant inference with queries. Another outcome is the third significant result of this paper: a proof that learning from counterexamples to induction is inherently harder than learning from examples labeled as positive or negative, formally corroborating the intuition advocated by Garg et al. (2014). More broadly, the complexity difference between learning from labeled examples and learning from counterexamples to induction demonstrates that the convergence rate of learning in Counterexample-Guided Inductive Synthesis (e.g. Jha et al., 2010; Jha and Seshia, 2017) depends on the form of examples. The proof of this result builds on the lower bounds discussed earlier, and is discussed in Section 8 [Footnote 6: It may also be interesting to note that one potential difference between classical learning and invariant inference, mentioned by Löding et al. (2016), does not seem to manifest in the results discussed in Section 2.2.1: the transition systems in the lower bound for inductiveness queries in Corollary 7.11 have a unique inductive invariant, and still the problem is hard.].
3. Background
3.1. States, Transition Systems, and Inductive Invariants
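In this paper we consider safety problems defined via formulas in propositional logic. Given a propositional vocabulary $\Sigma$ that consists of a finite set of Boolean variables, we denote by $\mathcal{F}(\Sigma)$ the set of well-formed propositional formulas defined over $\Sigma$. A state is a valuation to $\Sigma$. For a state $\sigma$, the cube of $\sigma$, denoted $\mathit{cube}(\sigma)$, is the conjunction of all literals that hold in $\sigma$. A transition system is a triple $\mathit{TS} = (\mathit{Init}, \delta, \mathit{Bad})$ such that $\mathit{Init}, \mathit{Bad} \in \mathcal{F}(\Sigma)$ define the initial states and the bad states, respectively, and $\delta \in \mathcal{F}(\Sigma \uplus \Sigma')$ defines the transition relation, where $\Sigma' = \{x' \mid x \in \Sigma\}$ is a copy of the vocabulary used to describe the post-state of a transition. A class of transition systems, denoted $\mathcal{P}$, is a set of transition systems. A transition system TS is safe if all the states that are reachable from the initial states via steps of $\delta$ satisfy $\neg\mathit{Bad}$. An inductive invariant for TS is a formula $I \in \mathcal{F}(\Sigma)$ such that $\mathit{Init} \Rightarrow I$, $I \wedge \delta \Rightarrow I'$, and $I \Rightarrow \neg\mathit{Bad}$, where $I'$ denotes the result of substituting each $x \in \Sigma$ for $x' \in \Sigma'$ in $I$, and $\varphi \Rightarrow \psi$ denotes the validity of the formula $\varphi \rightarrow \psi$. In the context of propositional logic, a transition system is safe if and only if it has an inductive invariant. When $I$ is not inductive, a counterexample to induction is a pair of states $\sigma, \sigma'$ such that $\sigma, \sigma' \models I \wedge \delta \wedge \neg I'$ (where the valuation to $\Sigma'$ is taken from $\sigma'$).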
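The definitions above can be exercised on a tiny system with a brute-force Python sketch. Everything named here is an illustrative assumption: states are dictionaries from variable names to Booleans, formulas are Python predicates, the two-variable system over `p` and `q` is hypothetical, and the checks enumerate states instead of calling a SAT solver.

```python
from itertools import product


def all_states(variables):
    """All valuations of the given Boolean variables."""
    return [dict(zip(variables, bits))
            for bits in product((False, True), repeat=len(variables))]


def cube(state):
    """Conjunction of all literals that hold in the state, as a string."""
    return " & ".join(v if val else f"~{v}" for v, val in sorted(state.items()))


def find_cti(states, step, inv):
    """Return a counterexample to induction (s, t), or None if inv is closed."""
    for s in states:
        if inv(s):
            for t in states:
                if step(s, t) and not inv(t):
                    return s, t
    return None


def is_inductive_invariant(states, is_init, is_bad, step, inv):
    """Check initiation, safety, and consecution by enumeration."""
    return (all(inv(s) for s in states if is_init(s))
            and not any(is_bad(s) for s in states if inv(s))
            and find_cti(states, step, inv) is None)


# Hypothetical toy system over {p, q}: every step sets p and keeps q.
STATES = all_states(["p", "q"])
is_init = lambda s: not s["p"] and not s["q"]
is_bad = lambda s: s["p"] and s["q"]
step = lambda s, t: t["p"] and t["q"] == s["q"]

print(is_inductive_invariant(STATES, is_init, is_bad, step, lambda s: not s["q"]))
cti = find_cti(STATES, step, lambda s: not is_bad(s))
print(cube(cti[0]), "->", cube(cti[1]))
```

The candidate "not Bad" is safe but not closed under transitions, and the returned pair of states is a witness of exactly that failure.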
The classes $\text{CNF}_n$, $\text{DNF}_n$ and $\text{Mon-CNF}_n$
$\text{CNF}_n$ is the set of propositional formulas in Conjunctive Normal Form (CNF) with at most $n$ clauses (disjunctions of literals). $\text{DNF}_n$ is likewise for Disjunctive Normal Form (DNF), where $n$ is the maximal number of cubes (conjunctions of literals). $\text{Mon-CNF}_n$ is the subset of $\text{CNF}_n$ in which all literals are negative.
3.2. Invariant Inference Algorithms
In this section we briefly provide background on inference algorithms that motivate our theoretical development in this paper. The main results of the paper do not depend on familiarity with these algorithms or their details; this (necessarily non-comprehensive) “inference landscape” is presented here for context and motivation for defining the Hoare-query model (Section 5), studying its complexity and the feasibility of generalization (Section 6), and analyzing the power of Hoare queries compared to inductiveness queries (Section 7). We allude to specific algorithms in motivating each of these sections.
IC3/PDR
IC3/PDR maintains a sequence of formulas $F_0, F_1, \ldots$, called frames, each of which can be understood as a candidate inductive invariant. The sequence is gradually modified and extended throughout the algorithm’s run. It is maintained as an approximate reachability sequence, meaning that (1) $\mathit{Init} \Rightarrow F_0$, (2) $F_i \Rightarrow F_{i+1}$, (3) $F_i \wedge \delta \Rightarrow F'_{i+1}$, and (4) $F_i \Rightarrow \neg\mathit{Bad}$. These properties ensure that $F_i$ overapproximates the set of states reachable in $i$ steps, and that the approximations contain no bad states. (We emphasize that $F_i \Rightarrow \neg\mathit{Bad}$ does not imply that a bad state is unreachable in any number of steps.) The algorithm terminates when one of the frames implies its preceding frame ($F_i \Rightarrow F_{i-1}$), in which case it constitutes an inductive invariant, or when a counterexample trace is found. In iteration $N$, a new frame $F_N$ is added to the sequence. One way of doing so is by initializing $F_N$ to true, and strengthening it until it excludes all bad states. Strengthening is done by blocking bad states: given a bad state $\sigma_b$, the algorithm strengthens $F_{N-1}$ to exclude all of $\sigma_b$’s pre-states (states that satisfy $F_{N-1} \wedge \delta \wedge (\mathit{cube}(\sigma_b))'$) one by one, thereby demonstrating that $\sigma_b$ is unreachable in $N$ steps. Blocking a pre-state $\sigma_a$ from frame $i$ is performed by a recursive call to block its own pre-states from frame $i-1$, and so on. If this process reaches a state from Init, the sequence of states from the recursive calls constitutes a trace reaching Bad from Init, which is a counterexample to safety. Alternatively, when a state $\sigma$ is successfully found to be unreachable from $F_{i-1}$ in one step, i.e., $F_{i-1} \wedge \delta \wedge (\mathit{cube}(\sigma))'$ is unsatisfiable, frame $F_i$ is strengthened to reflect this fact. Aiming for efficient convergence (see Section 2.1), PDR chooses to generalize, and exclude more states. A basic form of generalization is performed by dropping literals from $\mathit{cube}(\sigma)$ as long as the result $t$ is still unreachable from $F_{i-1}$, i.e., $F_{i-1} \wedge \delta \wedge t'$ is still unsatisfiable. This is very similar to PDR-1 above (Section 2.2), where $F_{i-1}$ was always $\mathit{Init}$. Often inductive generalization is used, dropping literals as long as $F_{i-1} \wedge \neg t \wedge \delta \Rightarrow \neg t'$, reading that $\neg t$ is inductive relative to $F_{i-1}$, which can drop more literals than basic generalization.
A core optimization of PDR is pushing, in which a frame $F_{i+1}$ is “opportunistically” strengthened with a clause $c$ from $F_i$, if $F_i$ is already sufficiently strong to show that $\neg c$ is unreachable in $i+1$ steps.
For a more complete presentation of PDR and its variants as a set of abstract rules that may be applied nondeterministically see e.g. Hoder and Bjørner (2012); Gurfinkel and Ivrii (2015). The key point from the perspective of this paper is that the algorithm and its variants access the transition relation in a very specific way, checking whether some $\sigma$ is unreachable in one step of $\delta$ from the set of states satisfying a formula $F$ (or those satisfying $F \wedge \neg\mathit{cube}(\sigma)$), and obtaining a counterexample when it is reachable (see also Vizel et al. (2017)). Crucially, other operations (e.g., maintaining the frames, checking whether $F_i \Rightarrow F_{i-1}$, etc.) do not use $\delta$. We will return to this point when discussing the Hoare-query model, which can capture IC3/PDR (Section 5).
ICE
The ICE framework (Garg et al., 2014, 2016) (later extended to general Constrained-Horn Clauses (Ezudheen et al., 2018)) is a learning framework for inferring invariants from positive, negative and implication counterexamples. We now review the framework using the original terminology and notation; later in the paper we will use a related formulation that emphasizes the choice of candidates (in Section 7.1).
In ICE learning, the teacher holds an unknown target $(P, N, R)$, where $P, N \subseteq D$ and $R \subseteq D \times D$ are sets of examples. The learner’s goal is to find a hypothesis $H \in C$ s.t. $P \subseteq H$, $N \cap H = \emptyset$, and for each $(x, y) \in R$, $x \in H \implies y \in H$. The natural way to cast inference in this framework is, given a transition system $(\mathit{Init}, \delta, \mathit{Bad})$ and a set of candidate invariants $\mathcal{L}$, to take $D$ as the set of program states, $P$ a set of reachable states including Init, $N$ a set of states including Bad from which a safety violation is reachable, $R$ the set of transitions of $\delta$, and $C = \mathcal{L}$. Iterative ICE learning operates in rounds. In each round, the learner is provided with a sample $(E, B, I)$ s.t. $E \subseteq P$, $B \subseteq N$, $I \subseteq R$, and outputs a hypothesis $H \in C$. The teacher returns that the hypothesis is correct, or extends the sample with an example showing that $H$ is incorrect. The importance of implication counterexamples is that they allow implementing a teacher using a SAT/SMT solver without “guessing” what a counterexample to induction indicates (Garg et al., 2014; Löding et al., 2016). Examples of ICE learning algorithms include Houdini (Flanagan and Leino, 2001) and symbolic abstraction (Reps et al., 2004; Thakur et al., 2015), as well as designated algorithms (Garg et al., 2014, 2016). Theoretically, the analysis of Garg et al. (2014) focuses on strong convergence of the learner, namely, that the learner can always reach a correct concept, no matter how the teacher chooses to extend samples between rounds. In this work, we will be interested in the number of rounds the learner performs. We will say that the learner is strongly-convergent with round-complexity $r$ if for every ICE teacher, the learner finds a correct hypothesis in at most $r$ rounds, provided that one exists. We extend this definition to a class of target descriptions in the natural way.
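The round structure of iterative ICE learning can be sketched in Python as follows. This is a hedged toy, not an algorithm from the literature: the learner simply proposes the first candidate consistent with the current sample, the teacher is one particular brute-force teacher over an explicit finite state space, and the counter system, the candidate list, and all function names are hypothetical.

```python
def consistent(h, sample):
    """Does hypothesis h agree with the positive, negative and implication examples?"""
    positives, negatives, implications = sample
    return (all(h(s) for s in positives)
            and not any(h(s) for s in negatives)
            and all(h(t) for s, t in implications if h(s)))


def teacher(h, states, is_init, is_bad, step):
    """Return None if h is an inductive invariant, else one counterexample."""
    for s in states:
        if is_init(s) and not h(s):
            return ("positive", s)
    for s in states:
        if is_bad(s) and h(s):
            return ("negative", s)
    for s in states:
        for t in states:
            if h(s) and step(s, t) and not h(t):
                return ("implication", (s, t))
    return None


def ice_learn(candidates, states, is_init, is_bad, step):
    """Propose consistent candidates until the teacher accepts; count rounds.

    Raises StopIteration if no candidate is consistent with the sample.
    """
    sample = ([], [], [])
    slot = {"positive": 0, "negative": 1, "implication": 2}
    rounds = 0
    while True:
        rounds += 1
        name, h = next((name, h) for name, h in candidates if consistent(h, sample))
        answer = teacher(h, states, is_init, is_bad, step)
        if answer is None:
            return name, rounds
        kind, example = answer
        sample[slot[kind]].append(example)


# Hypothetical toy system: a counter that adds 2 modulo 8; 7 is the bad state.
STATES = list(range(8))
CANDIDATES = [("true", lambda s: True),
              ("not-bad", lambda s: s != 7),
              ("even", lambda s: s % 2 == 0)]
print(ice_learn(CANDIDATES, STATES, lambda s: s == 0, lambda s: s == 7,
                lambda s, t: t == (s + 2) % 8))
```

Each counterexample returned by the teacher makes the current hypothesis inconsistent with the extended sample, so the learner never proposes the same candidate twice; the number of rounds is the quantity that round-complexity bounds, taken over all possible teachers rather than the single teacher used here.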
Interpolation
The idea of interpolation-based algorithms, first introduced by McMillan (2003), is to generalize proofs of bounded unreachability into elements of a proof of unbounded reachability, utilizing Craig interpolation. Briefly, this works as follows: encode a bounded reachability from a set of states in steps, and use a SAT solver to establish that this cannot reach Bad. When efficient interpolation is supported in the logic and solver, the SAT solver can produce an interpolant : a formula representing a set of states that (i) overapproximates the set of states reachable from in steps, and still (ii) cannot reach Bad in steps (any choice is possible). Thus overapproximates concrete reachability from without reaching a bad state, although both these facts are known in only a bounded number of steps. The hope is that would be a useful generalization to include as part of the invariant. The original algorithm (McMillan, 2003) sets some as the current unrolling bound, starts with , obtains an interpolant with , sets and continues in this fashion, until an inductive invariant is found, or Bad becomes reachable in steps from , in which case is incremented and the algorithm is restarted. Interpolation and generalization from bounded unreachability have been used in many works since (e.g. Vizel and Grumberg, 2009; McMillan, 2006; Jhala and McMillan, 2007; Henzinger et al., 2004; Vizel et al., 2013). Combining ideas from interpolation and PDR has also been studied (e.g. Vizel and Gurfinkel, 2014). The important point for this paper is that many interpolation-based algorithms only access the transition relation when checking bounded reachability (from some set of states to some set of states ), and extracting interpolants when the result is unreachable. We will return to this point when discussing the interpolation-query model, which aims to capture interpolation-based algorithms (Section 6.2).
4. Polynomial-Length Invariant Inference
In this section we formally define the problem of polynomial-length invariant inference for CNF formulas, which is the focus of this paper. We then relate the problem to the problem of inferring DNF formulas with polynomially many cubes via duality (see Appendix A), and focus on the case of CNF in the rest of the paper.
Our object of study is the problem of polynomial-length inference:
Definition 4.1 (Polynomial-Length Inductive Invariant Inference).
The polynomial-length inductive invariant inference problem (invariant inference for short) for a class of transition systems and a polynomial is the problem: Given a transition system over , decide whether there exists an inductive invariant for TS, where .
Notation.
In the sequel, when considering the polynomial-length inductive invariant inference problem of a transition system , we denote by the vocabulary of and . Further, we denote .
Complexity
The complexity of polynomial-length inference is measured in . Note that the invariants are required to be polynomial in .
is a rich class of invariants. Inference in more restricted classes can be solved efficiently. For example, when only conjunctive candidate invariants are considered, and is the set of all propositional transition systems, the problem can be decided in a polynomial number of SAT queries through the Houdini algorithm (Flanagan and Leino, 2001; Lahiri and Qadeer, 2009). Similar results hold also for CNF formulas with a constant number of literals per clause (by defining a new predicate for each of the polynomially-many possible clauses and applying Houdini), and for CNF formulas with a constant number of clauses (by translating them to DNF formulas with a constant number of literals per cube and applying the dual procedure). However, a restricted class of invariants may miss invariants for some programs and reduces the generality of the verification procedure. Hence in this paper we are interested in the richer class of polynomially-long CNF invariants. In this case the problem is no longer tractable even with a SAT solver:
Theorem 4.2.
Let be the set of all propositional transition systems. Then polynomial-length inference for is $\Sigma_2^P$-complete, where $\Sigma_2^P$ is the second level of the polynomial-time hierarchy.
We defer the proof to Section 6.1.1.
We note that polynomial-length inference can be encoded as specific instances of template-based inference; the hardness proof of Lahiri and Qadeer (2009) uses more general templates and therefore does not directly imply the hardness of polynomial-length inference. Lower bounds on polynomial-length inference entail lower bounds for template-based inference.
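For contrast with the hard case, the Houdini procedure for conjunctive invariants mentioned above can be sketched in a few lines. This is a minimal sketch under strong assumptions: the transition system is given as an explicit list of state pairs, so enumeration stands in for the SAT queries of the real algorithm, and the function name `houdini` and the encoding of predicates as Python callables are hypothetical.

```python
def houdini(preds, init, bad, trans):
    """Return the names of the predicates in the strongest conjunctive
    inductive invariant over `preds`, or None if that conjunction still
    admits a bad state.

    preds: dict mapping a name to a predicate on states.
    Assumption: init, bad and trans are explicit finite collections;
    the actual algorithm asks a SAT solver instead of enumerating.
    """
    # Keep only predicates that hold in every initial state.
    kept = {n for n, p in preds.items() if all(p(s) for s in init)}

    def holds(state):
        return all(preds[n](state) for n in kept)

    changed = True
    while changed:
        changed = False
        for (s, t) in trans:
            if holds(s):
                # Drop every predicate violated by this counterexample
                # to induction.
                failing = {n for n in kept if not preds[n](t)}
                if failing:
                    kept -= failing
                    changed = True
    if any(holds(s) for s in bad):
        return None
    return kept
```

Predicates are only ever removed, so the loop runs at most once per predicate plus one final pass; this monotone shrinking is what keeps the number of solver calls polynomial in the number of candidate predicates.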
Remark 4.1.
In the above formulation, an efficient procedure for deciding safety does not imply polynomial-length inference is tractable, since the program may be safe, but all inductive invariants may be too long. To overcome this technical quirk, we can consider a promise problem (Goldreich, 2006) variant of polynomial-length inference:
Given a transition system ,

(Completeness) If TS has an inductive invariant , the algorithm must return yes.

(Soundness) If TS is not safe the algorithm must return no.
Other cases, including the case of safety with an invariant outside , are not constrained. An algorithm deciding safety thus also solves this problem. All the results of this paper apply both to the standard version above and the promise problem: upper bounds on the standard version trivially imply upper bounds on the promise problem, and in our lower bounds we use transition systems that are either (i) safe and have an invariant in , or (ii) unsafe.
5. Invariant Inference with Queries and the Hoare Query Model
In this paper we study algorithms for polynomial-length inference through black-box models of inference with queries. In this setting, the algorithm accesses the transition relation through (rich) queries, but cannot read the transition relation directly. Our main model is that of Hoare-query algorithms, which query the validity of a postcondition from a precondition in one step of the system. Hoare-query algorithms faithfully capture a large class of SAT-based invariant inference algorithms, including PDR and related methods.
A black-box model of inference algorithms facilitates an analysis of the information of the transition relation the algorithm acquires. The advantage is that such an information-based analysis sidesteps open computational complexity questions, and therefore results in unconditional lower bounds on the computational complexity of SAT-based algorithms captured by the model. Such an information-based analysis is also necessary for questions involving unbounded computational power and restricted information, in the context of computationally-unrestricted bounded-reachability generalization (see Section 6.3).
In this section we define the basic notions of queries and query-based inference algorithms. We also define the primary query model we study in the paper: the Hoare-query model. In the subsequent sections we introduce and study additional query models: the interpolation-query model (Section 6.2), and the inductiveness-query model (Section 7.1).
Inference with queries
We model queries of the transition relation in the following way: A query oracle is an oracle that accepts a transition relation , as well as additional inputs, and returns some output. The additional inputs and the output, together also called the interface of the oracle, depend on the query oracle under consideration. A family of query oracles is a set of query oracles with the same interface. We consider several different query oracles, representing different ways of obtaining information about the transition relation.
Definition 5.1 (Inference algorithm in the query model).
An inference algorithm from queries, denoted , is defined w.r.t. a query oracle and is given:

access to the query oracle ,

the set of initial states (Init) and bad states (Bad);

the transition relation , encapsulated—hence the notation —meaning that the algorithm cannot access (not even read it) except for extracting its vocabulary; can only be passed as an argument to the query oracle .
solves the problem of polynomiallength invariant inference for .
The Hoare-query model
Our main object of study in this paper is the Hoare-query model of invariant inference algorithms. It captures SAT-based invariant inference algorithms querying the behavior of a single step of the transition relation at a time.
Definition 5.2 (Hoare-Query Model).
For a transition relation and input formulas , the Hoare-query oracle, , returns false if ; otherwise it returns true.
An algorithm in the Hoare-query model, also called a Hoare-query algorithm, is an inference from queries algorithm expecting the Hoare-query oracle.
Intuitively, a Hoare-query algorithm gains access to the transition relation, , exclusively by repeatedly choosing , and calling .
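To make the encapsulation concrete, here is a minimal explicit-state sketch of a Hoare-query oracle. It assumes states are tuples of bits, formulas are Python predicates over states, and the transition relation is a finite set of state pairs; the class name `HoareOracle` and its interface are hypothetical, and the enumeration replaces the SAT call a real implementation would make. The oracle answers true exactly when every transition from a precondition state lands in a postcondition state.

```python
class HoareOracle:
    """Black-box access to a transition relation via Hoare queries.

    Assumption: the relation is stored as explicit (pre, post) pairs and
    is private by convention; callers may only read the vocabulary size
    `n` and call `query`.
    """

    def __init__(self, n, transitions):
        self.n = n                          # number of state bits
        self._delta = frozenset(transitions)

    def query(self, alpha, beta):
        """True iff the one-step Hoare triple {alpha} delta {beta} is valid."""
        return all(beta(t) for (s, t) in self._delta if alpha(s))


def is_inductive_invariant(oracle, init_states, bad_states, inv):
    """Check a candidate using a single Hoare query for consecution."""
    return (all(inv(s) for s in init_states)
            and not any(inv(s) for s in bad_states)
            and oracle.query(inv, inv))
```

Since Init and Bad are given to the algorithm directly, only the consecution check touches the oracle; initiation and safety are evaluated without any query.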
If we are using a SAT solver to compute the Hoare-query, , then when the answer is false, the SAT solver will also produce a counterexample pair of states such that .
We observe that using binary search, a Hoare-query algorithm can do the same:
Lemma 5.3.
Whenever , a Hoare-query algorithm can find such that using Hoare queries.
Proof.
For each , if , conjoin it to , else to , and check whether is still false. If it is, continue to ; otherwise flip and continue to . ∎
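The bit-by-bit search in this proof can be sketched as follows. This is one possible reading of the argument, under assumptions: states are bit tuples of length `n`, formulas are Python predicates, and the helper `make_hoare_oracle` is a hypothetical explicit-state stand-in for the oracle that also counts calls. The search first pins down the pre-state by strengthening the precondition one bit at a time, then pins down the post-state by weakening the postcondition one bit at a time, using 2n queries in total.

```python
def make_hoare_oracle(delta):
    """Explicit-state Hoare oracle with a call counter (illustrative only)."""
    calls = {"count": 0}

    def query(alpha, beta):
        calls["count"] += 1
        return all(beta(t) for (s, t) in delta if alpha(s))

    return query, calls


def find_counterexample(query, n, alpha, beta):
    """Assuming query(alpha, beta) is False, return a transition (pre, post)
    with alpha(pre) and not beta(post), using 2*n Hoare queries."""
    pre = ()
    for i in range(n):
        def pre_with_one(s, fixed=pre, i=i):
            return alpha(s) and s[:i] == fixed and s[i] == 1
        # If the triple still fails with bit i forced to 1, keep 1;
        # otherwise some counterexample must have bit i equal to 0.
        pre += (1,) if not query(pre_with_one, beta) else (0,)

    def exactly_pre(s):
        return alpha(s) and s == pre

    post = ()
    for i in range(n):
        def post_except_one(t, fixed=post, i=i):
            # Holds everywhere except on beta-violating states that
            # extend the fixed prefix with bit i equal to 1.
            return beta(t) or t[:i] != fixed or t[i] != 1
        post += (1,) if not query(exactly_pre, post_except_one) else (0,)
    return pre, post
```

The invariant of both loops is that the current, narrowed Hoare triple is still invalid, so a counterexample consistent with the bits fixed so far always exists.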
Example: PDR as a Hoare-query algorithm
The Hoare-query model captures the prominent PDR algorithm, facilitating its theoretical analysis. As discussed in Section 3.2, PDR accesses the transition relation via checks of unreachability in one step and counterexamples to those checks. These operations are captured in the Hoare-query model by checking or (for the algorithm’s choice of ), and obtaining a counterexample using a polynomial number of Hoare queries, if one exists (Lemma 5.3). Furthermore, the Hoare-query model is general enough to express a broad range of PDR variants that differ in the way they use such checks but still access the transition relation only through such queries.
The Hoare-query model is not specific to PDR. It also captures algorithms in the ICE learning model (Garg et al., 2014), as we discuss in Section 7.1, and as a result can model algorithms captured by the ICE model (see Section 3.2). In Section 7.2 we show that the Hoare-query model is in fact strictly more powerful than the ICE model.
Remark 5.1.
Previous black-box models for invariant inference encapsulated access also to (Garg et al., 2014). In our model we encapsulate only access to , since (1) it is technically simpler, and (2) a simple transformation can make uniform across all programs, embedding the differences in the transition relation; indeed, our constructions of classes of transition systems in this paper are such that are the same in all transition systems that share a vocabulary, hence may be inferred from the vocabulary. (Unrestricted access to is stronger, thus lower bounds on our models apply also to models restricting access.)
Complexity.
Focusing on information, we do not impose computational restrictions on the algorithms, and only count the number of queries the algorithm performs to reveal information of the transition relation. In particular, when establishing lower bounds on the query complexity, we even consider algorithms that may compute noncomputable functions. However, whenever we construct algorithms demonstrating upper bounds on query complexity, these algorithms in fact have polynomial time complexity, and we note this when relevant.
Given a query oracle and an inference algorithm that uses it, we analyze the number of queries the algorithm performs as a function of , in a worst-case model w.r.t. possible transition systems over in the class of interest.
The definition becomes slightly more complicated when considering, as we do later in the paper, query models in which more than one oracle exists, i.e., an algorithm may use any oracle from a family of query oracles. In this case, we analyze the query complexity of an algorithm in a worst-case model w.r.t. the possible query oracles in the family as well.
Formally, the query complexity is defined as follows:
Definition 5.4 (Query Complexity).
For a class of transition systems , the query complexity of (a possibly computationally unrestricted) w.r.t. a query oracle family is defined as
(1) 
where is the number of times the algorithm accesses given this oracle and the input. (These numbers might be infinite.)
The query complexity in the Hoare-query model is .
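As an informal illustration of this worst-case measure, the following sketch counts Hoare queries for a toy algorithm over a small, explicitly listed class of transition relations. Everything here is an assumption made for illustration: formulas are finite sets of states, the class is a Python list, and the names `CountingHoareOracle`, `enumerate_and_check` and `worst_case_queries` are hypothetical. The definition in the text ranges over all systems of a given vocabulary size, which the finite list only approximates.

```python
class CountingHoareOracle:
    """Explicit-state Hoare oracle that records how often it is queried."""

    def __init__(self, transitions):
        self._delta = frozenset(transitions)
        self.count = 0

    def query(self, pre, post):
        # pre and post are sets of states standing in for formulas.
        self.count += 1
        return all(t in post for (s, t) in self._delta if s in pre)


def enumerate_and_check(oracle, init, bad, candidates):
    """Toy inference algorithm: test candidates in order, one query each."""
    for cand in candidates:
        if init <= cand and not (bad & cand) and oracle.query(cand, cand):
            return True
    return False


def worst_case_queries(algorithm, systems, init, bad, candidates):
    """Maximum number of queries over the listed transition relations."""
    worst = 0
    for transitions in systems:
        oracle = CountingHoareOracle(transitions)
        algorithm(oracle, init, bad, candidates)
        worst = max(worst, oracle.count)
    return worst
```

Only calls to `query` are counted; the work done inside the algorithm between queries is free, matching the decision to leave the algorithm computationally unrestricted.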
Remark 5.2.
In our definition, query complexity is a function of the size of the vocabulary , but not of the size of the representation of the transition relation . This reflects the fact that an algorithm in the black-box model does not access directly. In Appendix B we discuss the complexity w.r.t. as well. The drawback of such a complexity measure is that learning itself becomes feasible, undermining the black-box model. Efficiently learning is possible when using unlimited computational power and exponentially-long queries. However, whether the same holds when using unlimited computational power with only polynomially-long queries is related to open problems in classical concept learning.
6. The Information Complexity of Hoare-Query Algorithms
In this section we prove an information-based lower bound on Hoare-query invariant inference algorithms, and also extend the results to algorithms using interpolation, another SAT-based operation. We then apply these results to study the role of information in generalization as part of inference algorithms.
6.1. Information Lower Bound for Hoare-Query Inference
We show that a Hoare-query inference algorithm requires Hoare queries in the worst case to decide whether a CNF invariant of length polynomial in exists. (Recall that is a shorthand for , the size of the vocabulary of the input transition system.) This result applies even when allowing the choice of queries to be inefficient, and when allowing the queries to use exponentially-long formulas. It provides a lower bound on the time complexity of actual algorithms, such as PDR, that are captured by the model. Formally:
Theorem 6.1.
Every Hoare-query inference algorithm deciding polynomial-length inference for the class of all propositional transition systems has query complexity of .
The rest of this section proves a strengthening of this theorem, for a specific class of transition systems (which we construct next), for any class of invariants that includes monotone CNF, and for computationally-unrestricted algorithms:
Theorem 6.2.
Every Hoare-query inference algorithm , even computationally-unrestricted, deciding invariant inference for the class of transition systems (Section 6.1.1) and for any class of target invariants s.t. , has query complexity of . (Here we extend the definition of polynomial-length invariant inference to instead of .)
(That classes containing are already hard becomes important in Section 7.)
6.1.1. A Hard Class of Transition Systems
In this section we construct , a hard class of transition systems, on which we prove hardness results.
The problem
The construction of follows the complete problem of from classical computational complexity theory. In this problem, the input is a quantified Boolean formula where is a Boolean (quantifierfree) formula, and the problem of is to decide whether the quantified formula is true, namely, there exists a Boolean assignment to s.t. is true for every Boolean assignment to .
The class .
For each , we define . Finally, .
Let . For each formula , where , are variables and is a quantifierfree formula over the variables , we define a transition system . Intuitively, it iterates through lexicographically, and for each it iterates lexicographically through and checks if all assignments to satisfy . If no such is found, this is an error. More formally,

.

.

.

: evaluate , and perform the following changes (at a single step): If the result is false, set to true. If and is still false, set to false. If in the prestate , increment lexicographically, reset to false, and set ; otherwise increment lexicographically. If in the prestate , set to . (Intuitively, is false as long as no falsifying assignment to has been encountered for the current , is true as long as we have not yet encountered a for which there is no falsifying assignment.)
We denote the resulting class of transition systems .
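The behaviour of such a system can be explored with a small simulation. This is a sketch of one reading of the construction, with several loudly stated assumptions: the three flag names `falsified`, `no_witness` and `error` are hypothetical stand-ins for the flag variables of the construction, the outer and inner counters are stored as integers rather than bit vectors, and the error flag is assumed to be updated only on the very last step of the iteration, using the already updated value of `no_witness`. The brute-force function `qbf2_true` is included only to compare against.

```python
from itertools import product


def qbf2_true(phi, k):
    """Brute-force check of: exists y. forall x. phi(y, x)."""
    vals = list(product((0, 1), repeat=k))
    return any(all(phi(y, x) for x in vals) for y in vals)


def step(phi, k, state):
    """One transition; state = (y, x, falsified, no_witness, error)."""
    y, x, falsified, no_witness, error = state
    last = 2 ** k - 1

    def bits(v):
        return tuple((v >> (k - 1 - i)) & 1 for i in range(k))

    # Record whether the current inner assignment falsifies phi.
    falsified = falsified or not phi(bits(y), bits(x))
    if x == last:
        if not falsified:
            no_witness = False      # current y satisfies phi for every x
        if y == last:
            error = no_witness      # assumption: error is set only here
        # Move to the next y and restart the inner iteration.
        return ((y + 1) % (2 ** k), 0, False, no_witness, error)
    return (y, x + 1, falsified, no_witness, error)


def is_safe(phi, k):
    """Simulate one full pass (4**k steps) and report whether error stays false."""
    state = (0, 0, False, True, False)
    for _ in range(4 ** k):
        state = step(phi, k, state)
        if state[4]:
            return False
    return True
```

Under these assumptions the simulated system reaches the error flag exactly when no outer assignment works for all inner assignments, which is the correspondence the next lemma states for the actual construction.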
The following lemma relates the problem for to the inference problem of :
Lemma 6.3.
Let . Then is safe iff it has an inductive invariant in iff the formula is true.
Proof.
There are two cases:

If is true, let be the first valuation for that realizes the existential quantifiers. Then the following is an inductive invariant for :
(2) where the lexicographic constraint is expressed by the following recursive definition on :
and (or true if ).
: Note that can be written in CNF with at most clauses: in the first case a literal is added to each clause, and in the second another clause is added. Thus can be written in CNF with at most clauses. Further, the literals of appear only negatively in , and hence also in . The other literals () also appear only negatively in . Hence, is monotone.
is indeed an inductive invariant: initiation and safety are straightforward. For consecution, consider a valuation to in a prestate satisfying the invariant. (We abuse notation and refer to the valuation by .) There are three cases:

If , then (i) is retained by a step, (ii) holds after a step, (iii) still holds unless the transition is from the last evaluation for to , in which case is turned to false.

If , the invariant guarantees that in the prestate is false, and thus remains false after a step. also remains false and thus the rest of the invariant also holds in the poststate.

If , the invariant guarantees that in the pre-state either is false or is false. If is false the same reasoning as in the previous case applies. Otherwise, we have that is false. By the definition of all valuations for result in , so remains false after a step, and once we finish iterating through we set to false immediately.
The claim follows.


If is not true, then is not safe (and thus does not have an inductive invariant of any length). This is because for every valuation of a violating is found, turning to true, and never turns to false, so after iterating through all possible ’s will become true.
∎
Before we turn to prove Thm. 6.2 and establish a lower bound on the query complexity in the Hoare model, we note that this construction also yields the computational hardness mentioned in Section 4:
Proof of Thm. 4.2.
The upper bound is straightforward: guess an invariant in and check it. For the lower bound, use the reduction outlined above: given , construct . Note that the vocabulary size, , is , and the invariant, when it exists, is of length at most . (For an arbitrary polynomial , e.g., with , enlarge , e.g., by adding to Init initialization of fresh variables that are not used elsewhere, to ensure existence of an invariant of length .) The reduction is polynomial as the construction of (and ) from is polynomial in and : note that lexicographic incrementation can be performed with a propositional formula of polynomial size. ∎
6.1.2. Lower Bound’s Proof
We now turn to prove Thm. 6.2. Given an algorithm with polynomial query complexity, the proof constructs two transition systems: one that has a polynomial-length invariant and one that does not, and yet all the queries the algorithm performs do not distinguish between them. The construction uses the path the algorithm takes when all Hoare queries return false as much as possible. Intuitively, such responses are less informative and rule out fewer transition relations, because they merely indicate the existence of a single counterexample to a Hoare triple, as opposed to the result true, which indicates that all transitions satisfy a property.
Proof of Thm. 6.2.
Let be a computationally unbounded Hoare-query algorithm. We show that the number of Hoare queries performed by on transition systems from with is at least . To this end, we show that if over performs fewer than queries, then there exist two formulas over such that

all the Hoare queries performed by on and (the transition relations of and , respectively) return the same result, even though

should return different results when run on and , since has an invariant in and does not have an invariant (of any length).
We begin with some notation. Running on input , we abbreviate by . Denote the queries performs and their results by . We call an index sat if . We say that query-agrees with if for all . We say that sat-query-agrees with if for every such that it holds that .
We first find a formula over such that the sequence of queries performs when executing on is maximally satisfiable: if sat-query-agrees with , then (completely) query-agrees with on the queries, that is,
(3) 
We construct this sequence iteratively (and define accordingly) by always taking so that the result of the next query is false as long as this is possible while remaining consistent with the results of the previous queries: Initially, choose some arbitrary . At each point , consider the first queries performs on , . If terminates without performing another query, we are done: the desired is . Otherwise let be the next query. Amongst formulas that query-agree on the first queries, namely, for all , choose one such that if possible; if such does not exist take e.g. . The dependency of on is solely through the results of the queries to , so performs the same initial queries when given . The result is a maximally satisfiable sequence, for if a formula differs in query in which the result is false instead of true we would have taken such a as .
Let be such a formula with a maximally satisfiable sequence of queries performs on , . For every sat , take a counterexample . The single transition of depends on the value of on at most one assignment to , so there exists a valuation such that
(4) 
as well. It follows that
(5) 
Let be the valuations derived from the sat queries (concerning indexing, iff for some ). We say that a formula valuation-agrees with on if for all ’s. Since the sequence of queries is maximally satisfiable, if valuation-agrees with on then query-agrees with , namely, for all . As the dependency of on is solely through the results , it follows that performs the same queries on as it does on and returns the same answer.
It remains to argue that if then there exist two formulas that valuation-agree with on but differ in the correct result should return: is true, and so has an invariant in (Lemma 6.3), whereas is not, and so does not have an invariant of any length or form (Lemma 6.3). This is possible because the number of constraints imposed by valuation-agreeing with on is less than the number of possible valuations of for every valuation of and vice versa:
(6) 
is true on all valuations except for some of , and since there exists some such that for all , is not one of these valuations (recall that bits). Dually,
(7) 
is false on all valuations except for some of , and since for every