# Are There Good Mistakes? A Theoretical Analysis of CEGIS

Counterexample-guided inductive synthesis (CEGIS) is used to synthesize programs from a candidate space of programs. The technique is guaranteed to terminate with the correct program if the space of candidate programs is finite, but it may or may not terminate if the candidate space of programs is infinite. In this paper, we perform a theoretical analysis of the counterexample-guided inductive synthesis technique. We investigate whether the set of candidate spaces for which the correct program can be synthesized using CEGIS depends on the counterexamples used in inductive synthesis, that is, whether there are good mistakes which would increase the synthesis power. In particular, we ask whether the use of minimal counterexamples instead of arbitrary counterexamples expands the set of candidate spaces of programs for which inductive synthesis can successfully synthesize a correct program. We consider two kinds of counterexamples: minimal counterexamples and history bounded counterexamples. The history bounded counterexample used in any iteration of CEGIS is bounded by the examples used in previous iterations of inductive synthesis. We examine the relative change in the power of inductive synthesis in both cases. We show that the synthesis technique using minimal counterexamples (MinCEGIS) has the same synthesis power as CEGIS, but the synthesis technique using history bounded counterexamples (HCEGIS) has different power than CEGIS, with neither dominating the other.


## 1 Introduction

Automatic synthesis of programs has been one of the holy grails of computer science for a long time. It has found many practical applications such as generating optimal code sequences [28, 21], optimizing performance-critical inner loops, generating general-purpose peephole optimizers [4, 5], automating repetitive programming, and filling in low-level details after the higher-level intent has been expressed [34]. A traditional view of program synthesis is that of synthesis from complete specifications. One approach is to give a specification as a formula in a suitable logic [26, 27, 14]. Another is to write the specification as a simpler, but possibly far less efficient, program [28, 34, 21]. While these approaches have the advantage of completeness of specification, such specifications are often unavailable, difficult to write, or expensive to check against using automated verification techniques. This has led to the proposal of the oracle-guided synthesis approach [20], in which the complete specification is not available. All these different variants of automated synthesis techniques share some common characteristics. They are iterative inductive synthesis techniques which require some kind of validation engine to validate the candidate programs produced at intermediate iterations; these validation engines identify counterexamples, aka mistakes, which are subsequently used for inductive synthesis in the next iteration. We collectively refer to such synthesis techniques as counterexample-guided inductive synthesis, aka CEGIS.

In this paper, we conduct a theoretical study of CEGIS by examining the impact of using different kinds of validation engines which provide different kinds of counterexamples. CEGIS has been successfully used across domains and has been applied to areas such as integer program synthesis and controller design, where the candidate set of designs is not finite and the synthesis technique is not guaranteed to always succeed. This raises the interesting question of whether the power of CEGIS can be improved by considering validation engines which provide better counterexamples than an arbitrary counterexample.

We consider two kinds of counterexamples in this paper.

• First, we consider minimal counterexamples instead of arbitrary counterexamples. For a predefined ordering on the examples, we require that the validation engine provide a counterexample which is minimal. This defines an alternative synthesis technique: minimal counterexample guided inductive synthesis (MinCEGIS), in which the validation engine returns minimal counterexamples.

This choice of counterexamples is motivated by the literature on debugging. (Practically, this would mean replacing satisfiability-solving-based verification engines with engines using Boolean optimization, such as maximum satisfiability solving techniques.) Significant effort has been devoted to improving validation engines to produce counterexamples which aid debugging by localizing the error. The use of counterexamples in CEGIS is conceptually an iterative repair process, and hence it is natural to extend successful error localization and debugging techniques to inductive synthesis. The use of minimal counterexamples is inspired specifically by [29, 9].

• Second, we consider history bounded counterexamples, where the counterexample produced by the validation engine must be smaller than a previously seen positive example. This defines another alternative synthesis technique: history bounded counterexample guided inductive synthesis (HCEGIS), in which the validation engine returns history bounded counterexamples.

This choice of counterexamples is also motivated by the literature on debugging. In particular, [13, 36] use the distance of the counterexample from a correct example to help debug programs. If the counterexample is very close to a correct example, then error localization is more accurate. We use a similar notion and force the counterexamples produced by the validation engine to be close to some previously seen correct example.
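
The common loop underlying these variants can be sketched concretely. The following is a minimal toy model, entirely our own construction rather than the paper's formalism: a finite space of threshold programs over a small integer domain, with the verification engine supplied as a parameter so that the arbitrary and minimal variants differ only in which counterexample they report.

```python
# Toy CEGIS loop with a pluggable verification engine. The candidate space,
# specification, and all names below are illustrative, not from the paper.

def candidates():
    # candidate programs: "x <= t" for thresholds t in 0..10 (a finite space)
    return [(t, lambda x, t=t: x <= t) for t in range(11)]

def arbitrary_check(prog, spec, domain):
    # CEGIS-style verifier: return *some* counterexample, or None if correct
    for x in domain:
        if prog(x) != spec(x):
            return x
    return None

def minimal_check(prog, spec, domain):
    # MinCEGIS-style verifier: return the *minimal* counterexample
    bad = [x for x in domain if prog(x) != spec(x)]
    return min(bad) if bad else None

def cegis(spec, domain, check):
    examples = {}                       # accumulated counterexamples x -> spec(x)
    while True:
        # inductive step: first candidate consistent with all examples so far
        t, prog = next((t, p) for t, p in candidates()
                       if all(p(x) == v for x, v in examples.items()))
        cex = check(prog, spec, domain)
        if cex is None:
            return t                    # verified: synthesis succeeded
        examples[cex] = spec(cex)       # the mistake guides the next iteration
```

For example, `cegis(lambda x: x <= 3, range(11), minimal_check)` converges to the threshold 3; because the candidate space here is finite, termination is guaranteed for either verifier.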

For each of these variants of CEGIS, we analyze whether it expands the set of candidate spaces of programs on which a synthesizer terminates with the correct program. We prove the following in the paper.

1. MinCEGIS successfully terminates with the correct program on a candidate space if and only if CEGIS also successfully terminates with the correct program. So, there is no increase or decrease in the power of synthesis from using minimal counterexamples.

2. HCEGIS can synthesize programs from some program classes where CEGIS fails to synthesize the correct program. But contrariwise, HCEGIS also fails at synthesizing programs from some program classes where CEGIS can successfully synthesize a program. Thus, their synthesis powers are not equivalent, and neither dominates the other.

Thus, neither of the two kinds of counterexamples considered in the paper is a strictly good mistake. History bounded counterexamples can enable synthesis in additional classes of programs, but they also lead to a loss of some synthesis power.

## 2 Motivating Example

In this section, we present a simple example that illustrates why it is non-intuitive to estimate the change in the power of synthesis when we consider alternative kinds of counterexamples. Consider synthesizing a program which takes as input a tuple of two integers (x, y) and outputs 1 if the tuple lies in a specific rectangle (defined by the diagonal points (−1, −1) and (1, 1)) and 0 otherwise.

The target program is:

 if ((−1 ≤ x && x ≤ 1) && (−1 ≤ y && y ≤ 1)) op = 1 else op = 0

The candidate program space is the space of all possible rectangles in ℤ × ℤ, where ℤ denotes the set of integers, that is,

 if ((αx ≤ x && x ≤ βx) && (αy ≤ y && y ≤ βy)) op = 1 else op = 0

where αx, βx, αy, βy are the parameters that need to be discovered by the synthesis engine.

Now, consider a radial ordering of the tuples which uses the distance from the origin as the ordering index. If we consider synthesis using minimal counterexamples, it is clear that we can learn the rectangle: starting with an initial candidate program that always outputs 1 for all tuples, a validation engine producing minimal counterexamples would discover the rectangle boundaries. One possible sequence of minimal counterexamples consists of points just outside the rectangle boundary. Since the boundary points form a finite set, MinCEGIS will terminate with the correct program. But if the counterexamples are arbitrary, as in CEGIS, it is not obvious whether the rectangle can still be learnt. Our paper proves that CEGIS can also learn such a rectangle.
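
This behavior can be checked on a small grid. The sketch below is our own illustration, not the paper's construction; we assume squared distance from the origin as the radial index, and the grid bounds and names are ours. The first minimal counterexample to the initial always-1 candidate lies just outside the rectangle.

```python
# Illustrative check: under a radial ordering by (squared) distance from the
# origin, the minimal counterexample to the "always output 1" candidate for
# the target rectangle [-1,1] x [-1,1] sits just outside its boundary.

GRID = [(x, y) for x in range(-3, 4) for y in range(-3, 4)]  # finite slice of Z x Z

def target(p):
    x, y = p
    return 1 if -1 <= x <= 1 and -1 <= y <= 1 else 0

def always_one(p):
    return 1                                    # initial candidate: op = 1 everywhere

def minimal_cex(prog):
    bad = [p for p in GRID if prog(p) != target(p)]
    # minimal under the radial ordering (squared distance from the origin)
    return min(bad, key=lambda p: p[0] ** 2 + p[1] ** 2) if bad else None

cex = minimal_cex(always_one)
# the minimal mistake is a point at squared radius 4, e.g. (2, 0) or (0, -2)
```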

The question of the synthesis power of techniques using different kinds of counterexamples is non-trivial when the space of programs is not finite. Even termination of an inductive synthesis technique is not guaranteed when the candidate space of programs is infinite. Thus, the question of comparing the relative power of these synthesis techniques is interesting.

## 3 Related Work

Automated synthesis of systems using counterexamples has been widely studied in the literature [33, 35, 20, 17, 8], as discussed in Section 1. While the applications of CEGIS to different domains have been very extensively investigated, theoretical characterization of the approach independent of the application domain has received limited attention. To the best of our knowledge, this is the first attempt at a theoretical investigation into how the nature of counterexamples in CEGIS impacts the power of the inductive synthesis technique.

The inductive generalization used in CEGIS is similar to algorithmic learning from examples [12, 11, 22, 10, 30]. This relation between the two fields has been previously identified in [20]. A learning procedure is provided with strings from a formal language, and the task of the learner is to identify the formal grammar for the language. Learning is an iterative inductive inference process. In each iteration, the learning procedure is provided a string. The string is either in the language, that is, it is a positive example, or the string is not in the language, that is, it is a negative example. Based on the examples, the learning procedure proposes a formal grammar in each iteration. The learning procedure is said to be able to learn a formal language if the learner converges to the correct grammar of the formal language after a finite number of iterations. Algorithmic learning techniques can be classified along the following three dimensions:

1. Nature of examples: Examples could be restricted to only positive examples, or they could include negative examples too.

2. Memory of the learner: The memory of the learner could be allowed to grow without bound, or it could be restricted to a finite size.

3. Communication of examples to the learner: The examples could be provided to the learner arbitrarily, or as responses to specific kinds of queries from the learner, such as membership or subset queries.

We discuss the known theoretical results for algorithmic learning across these dimensions and identify how the results presented in this paper extend these existing results.

Gold [12] considered the problem of learning formal languages from examples. Similar inductive generalization techniques have been studied elsewhere in the literature as well [19, 38, 6, 2]. The examples are provided to the learner as an infinite stream. The learner is assumed to have unbounded memory and can store all the examples. This model is unrealistic in a practical setting but provides useful theoretical understanding of inductive generalization. Gold defined a class of languages to be identifiable in the limit if there is a learning procedure which identifies the grammar of the target language from the class of languages using a stream of input strings. Languages learnable using only positive examples are called text learnable, and languages which require both positive and negative examples are termed informant learnable. We examine the known results for both text learnable and informant learnable classes of languages. None of the standard classes of formal languages is identifiable in the limit from text, that is, from only positive examples [12]. This includes regular languages, context-free languages and context-sensitive languages. It is also known that no class of languages containing all finite languages and at least one infinite language over the same vocabulary can be learnt purely from positive examples. We can illustrate this infeasibility of identifying languages from positive examples with a simple example.

Consider a vocabulary V, and let V∗ be the set of all strings that can be formed using the vocabulary V. Let the strings in V∗ be enumerated as x1, x2, x3, …. Let us consider the set of languages

 L1 = V∗ − {x1}, L2 = V∗ − {x2}, …

Now, a simple algorithm to learn languages from positive examples can guess the language to be Li if xi is the string with the smallest index not seen so far as a positive example. This algorithm can be used to inductively identify the correct language using just positive examples. But now, suppose we add to our class of languages a new language L0 which contains all the strings over the vocabulary, that is,

 L0 = V∗, L1 = V∗ − {x1}, L2 = V∗ − {x2}, …

The above algorithm would fail to identify this class of languages.

In fact, no algorithm using positive examples alone would be able to inductively identify this class of languages. The key intuition is that if the data is all positive, no finite trace of positive data can distinguish whether the currently guessed language is the target language or merely a subset of the target language. Now, if we consider the presence of negative counterexamples, the learning or synthesis algorithm can begin with the first guess V∗. If there are no counterexamples, then V∗ is the correct language. If a counterexample xi is obtained, then the next guess is V∗ − {xi}, and this is definitely the correct language.
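
The positive-example learner described above can be sketched directly; the representation of strings by their indices and the function name below are our own illustrative choices.

```python
# Sketch of the learner: strings are represented by indices 1, 2, 3, ...;
# language L_i is "everything except string i", and the learner guesses L_i
# for the smallest index i not yet seen as a positive example.

def guess(seen):
    i = 1
    while i in seen:
        i += 1
    return i          # current guess: L_i = V* - {x_i}

# A text for the target L_3 eventually stabilizes the guess at 3:
assert guess({1, 2, 4, 5, 6}) == 3
# But for the target V* itself, every finite set of positive examples leaves
# some index unseen, so the guess is always some proper subset L_i:
assert guess({1, 2, 3}) == 4
```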

A detailed survey of classical results in learning from positive examples is presented in [25]. The results summarize learning power under different limitations, such as the inputs having certain noise, that is, a string not in the target language might be provided as a positive example with a small probability. Learning using positive as well as negative examples has also been well studied in the literature; detailed surveys are presented in [18] and [23]. In contrast to this line of work, CEGIS is a practical inductive generalization technique which restricts the memory of the synthesis engine or learner. At any step, the synthesis engine only has the candidate design and the response from the verifier, which can be stored in finite memory. Further, in contrast to learning from an infinite stream of positive and negative examples, the inductive generalization in CEGIS relies on using counterexamples. The positive and negative examples used in CEGIS are not arbitrary; rather, they depend on the counterexample-generating verifier and the intermediate candidate programs proposed by the synthesis engine.

Another related line of work is that of techniques using iterative algorithmic learning with restricted memory of the learner [24, 37]. The learner or synthesis engine can use only finite memory, but these techniques rely on the availability of an infinite stream of positive examples in addition to negative examples. While the stream is not explicitly stored due to the finite memory constraint, it can be used for synthesizing intermediate concepts. In contrast, CEGIS relies on using positive examples which are derived from the specification with respect to the counterexamples. These techniques thus differ from CEGIS in the dimension of how counterexamples are communicated to the learner or synthesis engine.

Angluin [3] considered a learning environment similar to CEGIS with respect to the communication of counterexamples to the learner or synthesis engine. Angluin's learning model consists of a teacher or oracle which responds to queries from the learner. The teacher in Angluin's setting is analogous to the verifier in CEGIS, and the learner is the synthesis engine. Similar learning models have also been proposed in [32, 15, 7, 16]. But these works focus on complexity analysis of learning techniques using different kinds of queries, such as membership queries, verification or equivalence queries, and subset queries. In contrast, we restrict ourselves to verification queries and investigate the impact of substituting arbitrary-counterexample-producing verifiers with more powerful verifiers which produce counterexamples that are minimal or bounded.

Verification techniques have been adapted to provide more meaningful counterexamples [13, 29, 36, 9] for the purpose of aiding design debugging. The key idea is that more powerful verification engines, which provide not just any arbitrary counterexample but rather a simpler counterexample with respect to some metric, can be used for better debugging. These simpler or minimal counterexamples provide the most information to help localize bugs in a faulty design. If a counterexample trace is close to a correct trace and differs from it in a minimal way, then it can be used more effectively to localize the source of the bug and fix it. It is natural to consider extending this use of minimal counterexamples from debugging to enabling more powerful synthesis. In this work, we conduct a theoretical analysis of using these more powerful verification engines and of using the counterexamples they produce to aid synthesis.

## 4 Notation

In this section, we define some preliminary notation used in our definition and analysis of CEGIS and MinCEGIS. ℕ represents the set of natural numbers. A language L denotes a subset of the natural numbers ℕ. min(L) denotes the minimal element of the set L. The union of sets is denoted by ∪, and the intersection of sets is denoted by ∩.

A sequence σ is a mapping from ℕ to ℕ. We denote a prefix of length n of a sequence σ by σ[n]; so, σ[n] is a mapping from {0, 1, …, n − 1} to ℕ. σ[0] is an empty sequence, also denoted by Λ. We denote the set of natural numbers in the range of σ[n] by range(σ[n]), that is, range(σ[n]) = {σ(i) | i < n}. The set of all sequences is denoted by SEQ.

We extend natural numbers to pairs. Let ⟨·, ·⟩ be any bijective computable function from ℕ × ℕ to ℕ which is monotonically increasing in both of its arguments. Similarly, pairs can be extended to n-tuples. Assuming the existence of such a bijective mapping, tuples can also be used in place of natural numbers as elements of a language. A language in this case would be a subset of such tuples.
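
One standard instance of such a mapping, given here purely for illustration, is the Cantor pairing function: it is computable, bijective from ℕ × ℕ to ℕ, and monotonically increasing in each argument.

```python
# Cantor pairing: an illustrative bijection N x N -> N that is monotonically
# increasing in both arguments, together with its inverse.

def pair(x, y):
    return (x + y) * (x + y + 1) // 2 + y

def unpair(z):
    w = int(((8 * z + 1) ** 0.5 - 1) // 2)   # recover the diagonal w = x + y
    y = z - w * (w + 1) // 2
    return w - y, y

# Tuples can then stand in for naturals: pair(pair(a, b), c) encodes (a, b, c).
```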

We also use standard definitions from computability theory [31]. A set of natural numbers L is called a computable or recursive language if there is a program, that is, a computable total function f, such that for any natural number x, f(x) = 1 if x ∈ L and f(x) = 0 if x ∉ L. We denote the complement of a language L by L̄. We denote the union of two languages L1 and L2 by L1 ∪ L2, and their intersection by L1 ∩ L2. Also, for convenience, we use a program P to denote the language L(P) that it identifies, using the one-to-one mapping between languages and the programs that identify them. Thus, we distinguish only between semantically different programs and not between syntactically different programs which identify the same language.

The languages are sets of natural numbers. The natural numbers correspond to indexed elements of the language or to valid input-output traces of the program. Without loss of generality, the natural ordering of the natural numbers is used as the ordering of elements in the set. In practice, this will correspond to some user-provided ordering on the elements of the language. For example, for a program manipulating strings, we can choose alphabetical ordering, and for a program operating on numerical tuples, we can choose lexicographic ordering. We define a minimum operator min(L) which uses this natural ordering to report the minimum element of the language L. If the ordering is not total, min(L) denotes one of the minimal elements of the language with respect to the given partial ordering.
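
For a partial ordering, the minimum operator picks one of the minimal elements. The small sketch below is our own modeling, with `leq` an assumed comparison predicate; it computes the full set of minimal elements, any of which min(L) may return.

```python
# Minimal elements of a finite language L under a (possibly partial) order.
# leq(a, b) is assumed to mean "a precedes or equals b" in the ordering.

def minimal_elements(L, leq):
    # an element is minimal if no distinct element precedes it
    return {x for x in L if not any(y != x and leq(y, x) for y in L)}

# Example: under divisibility on {2, 3, 4, 6}, both 2 and 3 are minimal.
divides = lambda a, b: b % a == 0
```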

Given a sequence of non-empty languages L1, L2, …, the sequence L = L1, L2, … is said to be an indexed family of languages if and only if there exists a recursive function f such that f(i, x) = 1 if and only if x ∈ Li. We denote the corresponding set of programs by P1, P2, …. For brevity, we refer to Pi also by its index i. Intuitively, the indexed family defines the encoding of the candidate program space, similar to sketches in [34] and the component interconnection encoding in [20]. The index i is used to index into this encoding to select a particular program Pi. Pi(x) denotes the output of the program Pi on input x.

Inductive synthesis consists of synthesis engines, each of which identifies the correct program using a set of examples from the target language drawn from a given indexed family of languages. So, the overall synthesis problem is as follows. Let P be the class of candidate programs corresponding to an indexed family of languages L. Now, given some target language L from L, the synthesis engine receives a set of examples from L. The synthesis task is to identify the program P corresponding to L from the candidate programs P. CEGIS is a particular kind of inductive synthesis technique in which examples are obtained as counterexamples produced through iterative validation of inductively produced intermediate conjecture programs. We use the notation developed in this section to formally define concepts useful for the theoretical analysis of CEGIS in the next section.

## 5 Definitions

In this section, we present some definitions. A trace is a sequence of examples from the target language L. The formal definition of a trace is as follows:

• Trace τ: A trace τ for a language L is a sequence with range(τ) = L. τ[n] denotes the prefix of the trace of length n. τ(n) denotes the n-th element of the trace.

Counterexample guided inductive synthesis (CEGIS) techniques employ a verifier to provide counterexamples. So, we define verifiers for a language formally below and then give a formal definition of a CEGIS engine, denoted by Tcegis. Intuitively, the verifier returns a counterexample if the languages are different and returns ⊥ if they are equivalent. We use the one-way difference instead of the symmetric difference between sets for ease of presentation.

• A verifier CHECK for a language L is a non-deterministic mapping from languages to ℕ ∪ {⊥} such that CHECK(L′) = x for some x ∈ L′ − L if and only if L′ − L ≠ ∅, and CHECK(L′) = ⊥ otherwise.

• A CEGIS engine Tcegis is defined recursively below:

 Tcegis(τ[n]) = Pn where Pi = F(Pi−1, τ(i), CHECK(Pi−1)) for i = 1, 2, …, n

where F is a recursive function that characterizes the engine and how it eliminates counterexamples, τ is a trace for the language L, and the counterexample sequence is formed by the verifier responses CHECK(Pi−1) to the intermediate candidate programs.
P0 is a predefined constant representing an initial guess of the program, which, for example, could be the program corresponding to the universal language ℕ.

Intuitively, Tcegis is provided with a trace along with a counterexample sequence formed by counterexamples to the latest conjectured languages. Thus, Tcegis receives two inputs. The counterexample is generated through a subset query.

• We say that Tcegis converges to P on τ if and only if, for all but finitely many prefixes τ[n] of τ, Tcegis(τ[n]) = P. We denote this by Tcegis(τ) → P. In other words, Tcegis(τ) → P if and only if there exists m such that for all n ≥ m, Tcegis(τ[n]) = P.

• Tcegis identifies a language L if and only if, for all traces τ of the language and corresponding counterexample sequences, Tcegis(τ) → P with L(P) = L. Tcegis identifies a language family L if and only if it identifies every L ∈ L.

We now define the set of language families that can be identified by counterexample guided synthesis engines, denoted CEGIS, formally below.

• CEGIS = {L | there exists a Tcegis which identifies L}

Now, we consider a variant of counterexample guided inductive synthesis, MinCEGIS, where we use minimal counterexamples instead of arbitrary counterexamples. We define a minimal counterexample generating verifier before defining the MinCEGIS engine. This requires an ordering of the elements in the language.

• A verifier MINCHECK for a language L is a mapping from languages to ℕ ∪ {⊥} such that MINCHECK(L′) = min(L′ − L) if and only if L′ − L ≠ ∅, and MINCHECK(L′) = ⊥ otherwise.

• A MinCEGIS engine Tmincegis is defined recursively below:

 Tmincegis(τ[n]) = Pn where Pi = F(Pi−1, τ(i), MINCHECK(Pi−1)) for i = 1, 2, …, n

where F is a recursive function that characterizes the engine and how it eliminates counterexamples, and τ is a trace for the language L.
P0 is a predefined constant representing an initial guess of the program, which, for example, could be the program corresponding to the language ℕ.

The convergence of the synthesis engine to a language and to a family of languages is defined in a similar way as for Tcegis.

• We say that Tmincegis converges to P on τ, that is, Tmincegis(τ) → P, if and only if there exists m such that for all n ≥ m, Tmincegis(τ[n]) = P.

• Tmincegis identifies a language L if and only if, for all traces τ of the language and corresponding counterexample sequences, Tmincegis(τ) → P with L(P) = L. Tmincegis identifies a language family L if and only if it identifies every L ∈ L.

• MINCEGIS = {L | there exists a Tmincegis which identifies L}

Next, we consider another variant of counterexample guided inductive synthesis, HCEGIS, where we use history bounded counterexamples instead of arbitrary counterexamples. We define a history bounded counterexample generating verifier before defining the HCEGIS engine. Unlike the previous cases, the verifier generating history bounded counterexamples is also provided with the trace seen so far by the synthesis engine. The verifier generates a counterexample smaller than the largest element in the trace. If there is no counterexample smaller than the largest element in the trace, then the verifier does not return any counterexample. From the definition below, it is clear that we only need to order elements in the language and do not need to define an ordering of ⊥ with respect to the language elements, since the comparison is done between an element of the non-empty difference L′ − L and elements in the trace.

• A verifier HCHECK for a language L is a mapping from languages and traces to ℕ ∪ {⊥} such that HCHECK(L′, τ[n]) = x where x ∈ L′ − L and x < τ(i) for some i < n, and HCHECK(L′, τ[n]) = ⊥ otherwise.

• A HCEGIS engine Thcegis is defined recursively below:

 Thcegis(τ[n]) = Pn where Pi = F(Pi−1, τ(i), HCHECK(Pi−1, τ[i])) for i = 1, 2, …, n

where F is a recursive function that characterizes the engine and how it eliminates counterexamples, and τ is a trace for the language L.
P0 is a predefined constant representing an initial guess of the program, which, for example, could be the program corresponding to the language ℕ.

The convergence of the synthesis engine to a language and to a family of languages is defined in a similar way as for Tcegis and Tmincegis.

• We say that Thcegis converges to P on τ, that is, Thcegis(τ) → P, if and only if there exists m such that for all n ≥ m, Thcegis(τ[n]) = P.

• Thcegis identifies a language L if and only if, for all traces τ of the language and corresponding counterexample sequences, Thcegis(τ) → P with L(P) = L. Thcegis identifies a language family L if and only if it identifies every L ∈ L.

• HCEGIS = {L | there exists a Thcegis which identifies L}

## 6 Main Result

In this section, we present the main results of the paper. We first compare CEGIS and MINCEGIS in the first part of the section, followed by CEGIS and HCEGIS in the second part. Since the focus of our work is to analyze the impact of changing the power of the counterexample-providing verification engine, we fix the inductive generalization function F that eliminates counterexamples. So, we vary the counterexample generating verifiers CHECK, MINCHECK and HCHECK, but F is constant in our definitions of Tcegis, Tmincegis and Thcegis. In the rest of the section, we present the two central results of this paper.

### 6.1 Synthesis Using Minimal Counterexamples

We investigate whether MINCEGIS = CEGIS and prove that it is in fact true. So, replacing a verification engine which returns arbitrary counterexamples with one which returns minimal counterexamples does not increase the power of the inductive synthesis system. This non-intuitive fact, that there is no change in the power of the synthesis technique from using minimal counterexamples, is summarized in Theorem 6.1.

###### Theorem 6.1

The power of synthesis techniques using arbitrary counterexamples and of those using minimal counterexamples is equivalent, that is, MINCEGIS = CEGIS.

• CEGIS ⊆ MINCEGIS holds trivially. MINCHECK is a special case of CHECK, and the minimal counterexample reported by MINCHECK can be treated as an arbitrary counterexample, so Tmincegis can simulate Tcegis. Intuitively, using minimal counterexamples is not worse than using arbitrary counterexamples.

The more interesting case to prove is MINCEGIS ⊆ CEGIS. For a language L, let Tmincegis converge to P on trace τ. We show that Tcegis can simulate Tmincegis and also converge to P on trace τ.

The proof idea is to simulate Tmincegis in two phases. In one phase, Tcegis finds the minimal counterexample for a candidate language L′ by iteratively calling CHECK on L′ ∩ {m} for m = 0, 1, 2, …. The minimum m for which CHECK returns a counterexample is the minimal counterexample. In the second phase, Tcegis consumes the next elements from the trace. While searching for the minimal counterexample, Tcegis needs to store the backlog of the trace as well as cache the minimal counterexamples of candidate languages.

We now present the formal description of the proof. For this simulation, we use some auxiliary variables maintained by Tcegis which store the finite information required for simulating Tmincegis. The key idea is for Tcegis to iteratively guess the minimal counterexample in multiple micro-steps and then use it to simulate one step of Tmincegis. Simulating each step of Tmincegis takes a finite number of micro-steps for Tcegis and uses finite storage.
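
The first phase, recovering the minimal counterexample through singleton subset queries to an arbitrary-counterexample verifier, can be sketched as follows. This is our own toy modeling with finite sets; the bound is just a safeguard for the sketch, since in the proof the search is started only when the candidate is known to have a counterexample.

```python
# Recovering the *minimal* counterexample of a candidate language using only a
# subset-query verifier that returns *arbitrary* counterexamples.

def check(candidate, target):
    # arbitrary-counterexample verifier (subset query): some x in candidate - target
    diff = candidate - target
    return next(iter(diff)) if diff else None

def find_minimal_cex(candidate, target, bound=100):
    # query singleton slices of the candidate in increasing order; the first m
    # on which the verifier answers must be the minimal counterexample
    for m in range(bound):
        if check(candidate & {m}, target) is not None:
            return m
    return None

# e.g. the minimal element of {1, 2, 5, 7} - {0, 1, 2, 3} is 5
```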

The first auxiliary component for this simulation is a minimal counterexample map

 lce : P → ℕ ∪ {⊤} ∪ {⊥}

Intuitively, lce maps a candidate program P (language L(P)) to the minimal counterexample as known to Tcegis so far in simulating Tmincegis. If the minimal counterexample is not known for a given program, lce maps the program to ⊤. If there is no counterexample to a given program, lce maps the program to ⊥. At any given step, only a finite number of programs have their minimal counterexamples known, and the rest are mapped to ⊤.
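
A minimal encoding of such a map, given here as our own sketch with illustrative sentinel values for the two special cases, looks as follows.

```python
# The map lce with two sentinels: TOP for "minimal counterexample not yet
# known" and BOTTOM for "no counterexample exists". Names are illustrative.
from collections import defaultdict

TOP, BOTTOM = object(), object()

lce = defaultdict(lambda: TOP)   # initially every program maps to TOP
lce[7] = 42                      # program 7: minimal counterexample is 42
lce[3] = BOTTOM                  # program 3: no counterexample exists
```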

Next, we define a mapping Tlce from programs and trace prefixes which simulates Tmincegis based on the lce known so far, that is,

 Tlce(P, τ[n]) = Pn where Pi = F(Pi−1, τ(i), lce(Pi−1)) for i = 1, 2, … and P0 = P

if lce(Pi−1) is defined (that is, not ⊤) for every i, and it is undefined if lce(Pi−1) is ⊤ for any i.

Tlce simulates Tmincegis using the same counterexamples and intermediate candidate programs for the history known through lce. If lce is ⊤ for any of the intermediate programs, Tlce is undefined. Further, we record in auxiliary variables the program currently proposed in the simulation, the last program which initiated a search for a minimal counterexample, the part of the trace already simulated, and the current candidate value m used while searching for the minimal counterexample.

Initialization: All the internal auxiliary variables are initialized as follows. The current program is set to P0, which is the same initialization as the engine Tmincegis being simulated; the simulated part of the trace is empty; and no search for a minimal counterexample is active. lce is initialized to map every program to ⊤, as no minimal counterexamples are known at the beginning.

Update: We describe the updates made in each iteration . One of the following cases is true in each iteration.
Case 1: If , that is, we are in lock-step with the synthesis algorithm with the same candidate program.
Case 1.1: If there is any counterexample for (found using the verifier for ), that is, the candidate program has a counterexample and we need to find the corresponding minimal counterexample.
Case 1.1.1: If is not , that is, the minimal counterexample for candidate program is already part of .
Let be the longest prefix for such that is defined. , ,
We use the minimal counterexample from and then advance the simulation traces ahead if can simulate the trace using minimal counterexamples from for all the intermediate candidate programs.
Case 1.1.2: If is , that is, the minimal counterexample for candidate program is not known.
,
We initialize the candidate language for searching for minimal counterexample to , that is, it is either the language consisting only of the minimal element or is empty. Since our verifier uses a subset query, empty language will return no counterexamples.
Case 1.2: If there is no counterexample for ,
Let be the longest prefix for such that is defined. , , and ,
The candidate program seen so far is subset of the target language and we consume as much of the trace as possible for which is defined.

Case 2: If , that is, the simulation is trying to find the minimum counterexample as a result of case 1.1.2.
Case 2.1: If there is any counterexample for (found using the verifier for ),
Update . Let be the longest prefix for such that is defined. , , ,
If there is a counterexample, then since the candidate language was a single-element set or empty, and the verification engine checks for containment in the target language, the only element in the language has to be the counterexample. Further, starting from step 1.1.2 and with possible increments in step 2.2, we stop with the minimal counterexample in this step and add it to the map of known minimal counterexamples.
Case 2.2: If there is no counterexample for , that is, we have not yet found the minimal counterexample.
, , and .
We increment and search for whether is in the target language. This is either empty or is a language consisting of a single element .
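The search in Cases 1.1.2 through 2.2 can be sketched as follows. This is a minimal illustration, assuming the example domain is the natural numbers and assuming a hypothetical `subset_verifier` that, given a candidate language, returns an element of the candidate that is not in the target language, or `None` when the candidate is contained in the target; neither name comes from the paper.

```python
# Sketch of the minimal-counterexample search via singleton subset queries.
# Assumption (not from the paper): examples are natural numbers, and
# `subset_verifier(cand)` returns an element of `cand` outside the target
# language, or None if `cand` is contained in the target.

def minimal_counterexample(subset_verifier, bound=10**6):
    """Find the least element outside the target by querying singleton
    candidate languages {0}, {1}, ...: Case 1.1.2 starts from the minimal
    element, Case 2.2 increments, and Case 2.1 stops at the answer."""
    x = 0                                  # start from the minimal element
    while x < bound:                       # bound only keeps the sketch total
        cex = subset_verifier({x})         # subset query on a singleton language
        if cex is not None:                # Case 2.1: {x} is not contained in
            return cex                     # the target, so x is the minimal cex
        x += 1                             # Case 2.2: contained, try the next
    return None

# Example: target language is the even numbers; the minimal counterexample is 1.
target = {n for n in range(100) if n % 2 == 0}
verifier = lambda cand: next((c for c in sorted(cand) if c not in target), None)
```

The empty-language case mentioned above is consistent with this sketch: a subset query on an empty candidate trivially returns no counterexample.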

Progress: Now, we first show progress of the simulation in parsing trace . For any , there exists such that , and is a proper prefix of . This follows from the observation that Case 2.2 can not be repeated infinitely after Case 1.1.2 since has at least one counterexample. So, case 2.1 would eventually become true and since is extended, would be defined for a longer prefix.

Correctness: Let converge on after reading prefix . From progress, after some , would be a prefix of . Since converges after reading , for . Now, is not for all intermediate programs in for . So, and for all . So, also converges to , that is, .

Thus, MinCEGIS = CEGIS.

Thus, MinCEGIS successfully terminates with the correct program on a candidate space if and only if CEGIS also successfully terminates with the correct program. So, there is no increase or decrease in the power of synthesis by using minimal counterexamples.

### 6.2 Synthesis Using History Bounded Counterexamples

We investigate whether the power of synthesis using history bounded counterexamples equals that of CEGIS, and prove that they are not equal. So, replacing a verification engine which returns arbitrary counterexamples with one which returns counterexamples bounded by history has an impact on the power of the synthesis technique. But this does not strictly increase the power of synthesis. Instead, the use of history bounded counterexamples allows programs from new classes to be synthesized, but at the same time, programs from some classes which were amenable to CEGIS can no longer be synthesized using history bounded counterexamples. The main result regarding the power of synthesis techniques using history bounded counterexamples is summarized in Theorem 6.2.

###### Theorem 6.2

The power of synthesis techniques using arbitrary counterexamples and those using history bounded counterexamples are not equivalent, and neither is more powerful than the other: each can synthesize programs from some candidate space on which the other fails.

We prove this using the following two lemmas. The first, Lemma 6.3, shows that there is a family of languages from which a program recognizing a language can be synthesized by CEGIS but not by HCEGIS. The second, Lemma 6.4, shows that there is another family of languages from which a program recognizing a language can be synthesized by HCEGIS but not by CEGIS.

###### Lemma 6.3

There is a family of languages such that, for the candidate programs corresponding to it, HCEGIS can not synthesize a program recognizing some language in the family but CEGIS can, that is,

• Consider the languages formed by upper bounding the elements by some fixed constant, that is,

 Li = { n | n ∈ N ∧ n ≤ i }

Now, consider the family of languages consisting of these. Given this family, let the target language (for which we want to synthesize a recognizing program) be one of the languages in the family.

If we obtain a trace at any point in synthesis using history bounded counterexamples, then for any intermediate program proposed by the synthesis engine, the verifier would always return no counterexample, since all the counterexamples would be larger than any element in the target language. This is a consequence of the chosen languages, in which all counterexamples to a language are larger than any positive example of the language. So, HCEGIS can not synthesize the program corresponding to the target language.

But we can easily design a synthesis engine using arbitrary counterexamples that can synthesize the program corresponding to the target language. The algorithm starts with as its initial guess. If there is no counterexample, the algorithm's next guess is . In each iteration , the algorithm guesses as long as there are no counterexamples. When a counterexample is returned by the verifier on the guess , the algorithm stops and reports the previous guess as the correct language.

Since the elements in each language are bounded by some fixed constant, the above synthesis procedure is guaranteed to terminate after finitely many iterations when identifying any language in the family. Further, the verifier did not return any counterexample up to the last accepted guess, and in the next iteration a counterexample was generated. Since the languages in the family form a monotonic chain, the last counterexample-free guess must be the target language, which is thus correctly identified.

Thus, CEGIS can synthesize a program for this family while HCEGIS can not.
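The chain-guessing procedure above can be sketched as follows. This is a minimal illustration, assuming languages Li = {0, ..., i} and a hypothetical `verifier` implementing an arbitrary-counterexample subset query against a hidden target; the function and variable names are illustrative, not from the paper.

```python
# Sketch of the CEGIS loop from the lemma, for the chain L_0 ⊂ L_1 ⊂ ...
# Assumption (not from the paper): `verifier(cand)` returns an element of
# `cand` outside the hidden target language, or None if contained.

def synthesize_bounded_language(verifier, max_iters=10**6):
    """Guess L_0, L_1, ... until the verifier produces a counterexample;
    because the L_i form a chain, the previous guess is the target."""
    i = 0
    while i < max_iters:
        candidate = set(range(i + 1))      # L_i = {0, 1, ..., i}
        if verifier(candidate) is not None:
            return i - 1                   # first counterexample occurs at L_{j+1}
        i += 1
    return None                            # not reached for a finite target bound

# Example: the hidden target is L_7 = {0, ..., 7}.
target = set(range(8))
verifier = lambda cand: next((c for c in sorted(cand) if c not in target), None)
```

Note that a history bounded verifier would never return the counterexample j+1 here, since it exceeds every positive example, which is exactly why this family separates the two techniques.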

This shows that CEGIS can be used to identify programs where HCEGIS will fail. Putting a restriction on the verifier to only produce counterexamples which are bounded by the positive examples seen so far does not strictly increase the power of synthesis.

We now show the nonintuitive result that this restriction enables synthesis of programs which can not be synthesized by CEGIS. The proof uses a diagonalization argument similar to the argument used in [25] for showing the increase in inductive synthesis power when negative examples are introduced in addition to positive examples. This argument is presented in Section 3. Recall that the set of languages considered in that case were and the language . The argument relies on the indistinguishability of and with respect to finite traces of positive examples.

In the proof below, we similarly construct a language which is not distinguishable using arbitrary counterexamples; the proof instead relies on the verifier keeping a record of the largest positive example seen so far and restricting counterexamples to those below it. We use the tuple notation introduced in Section 4 to clearly identify the diagonalization.

###### Lemma 6.4

There is a family of languages such that, for the candidate programs corresponding to it, CEGIS can not synthesize a program recognizing some language in the family but HCEGIS can, that is,

• Consider the following languages . We now construct a family of languages in which are finite and have at least one element of the form , that is,

 Lfin = { L01i | i ∈ N ∧ |L01i| is finite ∧ ∃k s.t. ⟨1,k⟩ ∈ L01i }

Now consider the languages which are subsets of . We consider only those languages such that the index of the language is also the smallest element in the language, that is, . We now build a language of pairs as follows: if , and undefined otherwise. We construct a second family of languages using these languages: if is defined for index . Now, we consider the following family of languages

 L = Lfin ∪ Ldiag

We show that there is a language in such that the program recognizing it can not be synthesized by CEGIS, but HCEGIS can synthesize all programs recognizing any language in .

The key intuition is as follows. If the examples seen by the synthesis algorithm are all of the form , then no synthesis technique can differentiate whether the language belongs to or . If the language belongs to , the synthesis engine would eventually obtain an example of the form (since each language in has at least one element of this kind and these languages are finite). While the synthesis technique using arbitrary counterexamples can not recover the previous examples, a technique with access to a verifier which produces history bounded counterexamples can recover all the previous examples.

We can easily specify an HCEGIS engine which can synthesize programs that correspond to languages in . It works as follows. If all the elements seen so far are of the form , then the synthesis algorithm picks the minimum such that has been seen as an example by the synthesis engine. The proposed program is the one corresponding to . If the proposed program is not the correct program, the verifier returns such that . This is guaranteed since the verifier returns counterexamples smaller than the examples seen so far, and we have assumed that is not correct. So, iteratively, the algorithm would eventually discover a language from . But if the language is from , then we know that all languages in are finite and have at least one element of the form . After the engine sees , for every future positive example , it queries the verifier with the singleton language having only one element . Clearly, is not in the language since it only contains elements of the form and . But the verifier returns no counterexample for if is the largest positive example seen so far. At this point, we can recover all positive examples seen previously by enumerating all and testing the candidate language with the verifier. We get a counterexample if and only if is not in the target language. Further, the target language is finite and hence, enumerating members of the language is sufficient to identify the target language after consuming a finite trace. Thus, HCEGIS can synthesize programs corresponding to any language in .
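The recovery step described above can be sketched as follows, under simplifying assumptions: examples are encoded as natural numbers (the paper's pair encoding is elided), and `hverifier` is a hypothetical history bounded verifier that only returns counterexamples no larger than the largest positive example seen so far. All names here are illustrative.

```python
# Sketch of how an HCEGIS engine recovers earlier positive examples.
# Assumption (not from the paper): `hverifier(cand, history_max)` returns
# an element of `cand` outside the target language only if that element
# is <= the largest positive example seen so far (the history bound).

def recover_membership(hverifier, history_max):
    """Test each x up to the largest positive example with the singleton
    language {x}: a counterexample comes back iff x is NOT in the target,
    so every earlier positive example can be reconstructed."""
    recovered = set()
    for x in range(history_max + 1):
        if hverifier({x}, history_max) is None:   # no counterexample: x is
            recovered.add(x)                      # in the target language
    return recovered

# Example: finite target; the largest positive example seen so far is 9.
target = {1, 4, 9}
def hverifier(cand, hist_max):
    for c in sorted(cand):
        if c not in target and c <= hist_max:     # bounded by history
            return c
    return None
```

An arbitrary-counterexample verifier offers no such guarantee: querying {x} could be answered with a counterexample even when x exceeds every example seen, so past examples cannot be reconstructed this way.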

We now prove the infeasibility of CEGIS for this class of languages. Let us assume that the family is synthesizable by CEGIS. So, there is a synthesis engine which can synthesize programs corresponding to languages in . Let us consider a trace and counterexample sequence such that the engine converges in steps, that is, . Now, is a valid counterexample sequence for any language such that . Since the engine must recognize a language from any trace and any arbitrary counterexample sequence, we choose a trace and counterexample sequence as follows. Let us consider a trace of the form . The corresponding counterexample trace discovered by the engine is followed by minimal counterexamples, if any, after observing . Now, we pick an element such that and . Since is a valid counterexample sequence for any language such that , the behavior of the engine is the same for as it is for . Thus, it can not distinguish between the two languages: and . Intuitively, CEGIS can forget some positive examples seen before observing , and there is no way to regenerate these, as can be done with HCEGIS.

Thus, HCEGIS can synthesize programs for this family while CEGIS can not.

Hence, HCEGIS can synthesize programs from some program classes where CEGIS fails to synthesize the correct program. But contrariwise, HCEGIS also fails at synthesizing programs from some program classes where CEGIS can successfully synthesize a program. Thus, their synthesis power is not equivalent, and neither dominates the other.

## 7 Discussion and Conclusion

The paper presents a formal analysis of the impact of counterexample selection on what programs can be synthesized, without any restriction on the type of program other than that it be drawn from a countable set. We have shown that the use of minimal counterexamples does not enable synthesizing programs from new spaces of candidate programs. In practice, this means that in any domain where CEGIS can be used, MinCEGIS would also be applicable, since MinCEGIS successfully terminates with the correct program on a candidate space if and only if CEGIS also successfully terminates with the correct program. So, there is no increase or decrease in the power of synthesis by using minimal counterexamples. But HCEGIS can synthesize programs from some program classes where CEGIS fails to synthesize the correct program. Contrariwise, HCEGIS also fails at synthesizing programs from some program classes where CEGIS can successfully synthesize a program. Thus, their synthesis power is not equivalent, and neither dominates the other. This paper is a first step towards the theoretical characterization of the counterexample-guided inductive synthesis technique, CEGIS.

Further analysis of CEGIS is pertinent given its widespread adoption as one of the standard paradigms for automated synthesis. We envision the following directions in which further work can be done to better understand the power of CEGIS techniques.

• Speed of convergence: CEGIS and MinCEGIS have equal synthesis power, and if one of the techniques successfully identifies a program from a given program class, the other would also be able to successfully synthesize this program. But would both techniques need the same number of counterexamples for successfully synthesizing the program? If we measure the complexity of automated synthesis by the number of counterexamples needed to synthesize a program, the comparison of the complexity of CEGIS and MinCEGIS is open.

Similarly, for the program spaces on which both CEGIS and HCEGIS terminate, can we compare the number of counterexamples needed by the two techniques to synthesize a program?

• Newer variants of counterexamples: The two new variants of counterexamples considered in this paper, namely the minimal counterexamples and the history bounded counterexamples, are not the only variants that can be used in CEGIS. The question of whether there are other variants of counterexamples which would enable synthesis in program spaces beyond the power of conventional CEGIS is open.

In particular, consider another new variant of counterexamples: the minimal counterexamples among all the counterexamples which are larger than the largest positive example seen so far. This captures another notion of a counterexample being close to correct, and it would be interesting to investigate whether it increases the power of CEGIS.
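This variant verifier can be sketched as follows, assuming an integer example domain; `variant_verifier` and its parameters are illustrative names, not from the paper.

```python
# Sketch of the proposed variant: return the minimal counterexample among
# those counterexamples larger than the largest positive example seen so far.
# Assumption (not from the paper): examples are integers, and `target` is
# given explicitly here only to make the sketch self-contained.

def variant_verifier(candidate, target, largest_positive):
    """Return min{c in candidate \\ target | c > largest_positive},
    or None if no such counterexample exists."""
    bad = [c for c in candidate if c not in target and c > largest_positive]
    return min(bad) if bad else None

# Example: counterexamples 2 and 8 exist, but only 8 exceeds the largest
# positive example 5, so 8 is the one returned.
```

This combines the two notions studied in the paper: minimality, as in MinCEGIS, applied only to the region above the history bound, complementing HCEGIS, which looks below it.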

In summary, we presented variants of