 # Synthesizing Imperative Programs from Examples Guided by Static Analysis

We present a novel algorithm that synthesizes imperative programs for introductory programming courses. Given a set of input-output examples and a partial program, our algorithm generates a complete program that is consistent with every example. Our key idea is to combine enumerative program synthesis with static analysis, which aggressively prunes the large search space while still guaranteeing that a correct solution is found if one exists. We have implemented our algorithm in a tool called Simpl and evaluated it on 30 problems used in introductory programming courses. The results show that Simpl solves the benchmark problems in 6.6 seconds on average.


## 1 Introduction

Our long-term goal is to build an intelligent tutoring system that helps students improve their programming skills. Our experience in introductory programming courses is that students who are learning to program for the first time often struggle to solve programming problems on their own. Providing guidance manually simply does not scale to the increasingly large number of students. To make matters worse, we found that even instructors sometimes make mistakes, and shy students are reluctant to ask questions. Motivated by this experience, we aim to build an automatic system that helps students improve their skills without a human teacher.

In this paper, we present a key component of the system, which automatically generates complete programs from students’ incomplete programs. The inputs of the algorithm are a partial program with constraints on variables and constants, and input-output examples that specify the program’s behavior. The output is a complete program whose behavior matches all of the given input-output examples.

The key novelty of our algorithm is to combine enumerative program synthesis with program analysis techniques. The algorithm enumerates candidate programs in increasing order of size until it finds a solution. On its own, however, this enumeration is too slow for interactive use with students because the search space of programs is huge. Our key idea for accelerating it is to perform static analysis alongside the enumerative search, in order to “statically” identify and prune out intermediate programs that can never be completed into a solution. We formalize our pruning technique and its safety property.

The experimental results show that our algorithm is remarkably effective at synthesizing introductory imperative programs. We have implemented the algorithm in a tool, Simpl, and evaluated its performance on 30 programming tasks used in introductory courses. With our pruning technique, Simpl solves each problem in 6.6 seconds on average. Without the pruning, the baseline algorithm, which already adopts well-known optimization techniques, takes 165.5 seconds on average (a 25x slowdown).

We summarize our contributions below:

• We present a new algorithm for synthesizing imperative programs from examples. To our knowledge, our work is the first to combine enumerative program synthesis and static analysis technologies.

• We demonstrate the effectiveness of our algorithm on 30 real programming problems used in introductory courses. The results show that our algorithm quickly solves these problems, including ones that most beginner-level students find hard to solve.

• We provide a tool, Simpl, which is publicly available and open-sourced (the link is hidden for double-blind reviewing).

## 2 Showcase

In this section, we showcase Simpl with four programming problems that most beginners find difficult to solve. To use Simpl, students provide (1) a partial program, (2) a set of input-output examples, and (3) the resources that Simpl may use. The resources consist of a set of integers, a set of integer-type variables, and a set of array-type variables. The goal of Simpl is to complete the partial program w.r.t. the input-output examples, using only the given resources.

#### Problem 1 (Reversing integer)

The first problem is to write a function that reverses a given integer. For example, given integer 12, the function should return 21. Suppose a partial program is given as

reverse (n){ r := 0; while(?){?}; return r;}

where ? denotes a hole that needs to be completed. Suppose further that Simpl is provided with input-output examples , integers , and integer variables .

Given this problem, Simpl produces the solution in Figure 1(a) in 2.5 seconds. Note that Simpl determines that the integer ‘1’ is unnecessary, and the final program does not contain it. Also, Simpl does not require sophisticated examples, so it can easily be used by inexperienced students.
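For readers who want to run the task, the following Python function is a transliteration of one plausible completion of the partial program above. It is our own illustration, not necessarily the program in Figure 1(a), and it assumes a non-negative input.

```python
def reverse(n: int) -> int:
    """One plausible completion of reverse(n){ r := 0; while(?){?}; return r;}.

    Assumes n >= 0. Compound expressions are used for readability; in the
    language of Section 3 each step would need single-operator assignments.
    """
    r = 0
    while n > 0:              # candidate for the loop-condition hole
        r = r * 10 + n % 10   # candidate for the body hole: append the last digit of n
        n = n // 10           # ... and drop that digit from n
    return r


print(reverse(12))  # 21
```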

#### Problem 2 (Counting)

The next problem is to write a function that counts the occurrences of each digit in an integer. The program takes an integer and an array as inputs, where each element of the array is initially 0. As output, the program returns that array, but now the element at index $i$ stores the number of occurrences of digit $i$ in the given integer. For example, when the tuple  is given, the function should output ; 0 occurs once, 1 does not occur, and 2 occurs twice in ‘220’. Suppose the partial program is given as

count(n,a){ while(?){?}; return a;}

with examples , integers , integer variables , and an array variable .

For this problem, Simpl produces the program in Figure 1(b) in 0.2 seconds. Note that Simpl uses a minimal set of resources: the variable i is not used even though it is available.
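Similarly, here is a Python sketch of one plausible completion of the counting program. It is our own illustration rather than Simpl's output in Figure 1(b), and it assumes a non-negative input and an array of ten zero-initialized cells (one per digit); the array size used in the actual examples is not stated here.

```python
def count(n: int, a: list[int]) -> list[int]:
    """One plausible completion of count(n,a){ while(?){?}; return a;}.

    Assumes n >= 0 and that `a` has ten zero-initialized cells (one per digit).
    """
    while n > 0:          # candidate for the loop-condition hole
        a[n % 10] += 1    # candidate for the body hole: count the last digit ...
        n = n // 10       # ... and drop it from n
    return a


print(count(220, [0] * 10)[:3])  # [1, 0, 2]
```

For $n = 0$ this sketch leaves every cell at 0; whether that is the intended behavior is exactly the kind of detail the input-output examples have to pin down.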

#### Problem 3 (Sum of sum)

The third problem is to compute for a given integer . Suppose the partial program

sum(n){ r := 0; while(?){?}; return r;}

is given with examples , integers , and integer-type variables .

Then, Simpl produces the program in Figure 1(c) in 37.6 seconds. Note that Simpl introduces a nested loop, which is absent from the partial program.

#### Problem 4 (Absolute sum)

The last problem is to sum the absolute values of all the elements in a given array. We provide the partial program:

abssum(a, len){ r := 0; i := 0;
while(i < len){ if(?){?} else{?}; i:=i+1;};
return r;}

where the goal is to complete the condition and bodies of the if-statement. Given a set of input-output examples , an integer , integer variables , and an array variable , Simpl produces the program in Figure 1(d) in 12.1 seconds.
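As with the previous problems, the Python function below is our own transliteration of one plausible completion, not necessarily the program in Figure 1(d). The parameter `len` is renamed to `length` to avoid shadowing Python's built-in, and each assignment uses a single binary operator, as the grammar in Section 3 requires.

```python
def abssum(a: list[int], length: int) -> int:
    """One plausible completion of the abssum partial program (`len` renamed to `length`)."""
    r = 0
    i = 0
    while i < length:
        if a[i] < 0:        # candidate for the condition hole
            r = r - a[i]    # candidate for the then-branch hole
        else:
            r = r + a[i]    # candidate for the else-branch hole
        i = i + 1
    return r


print(abssum([-1, 2, -3], 3))  # 6
```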

## 3 Problem Definition

Language We designed an imperative language that is small yet expressive enough to deal with various programming problems in introductory courses. The syntax of the language is defined by the following grammar:

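$$
\begin{array}{rcl}
\oplus & \to & + \mid - \mid * \mid / \mid \% \\
\prec & \to & = \mid > \mid < \\
l & \to & x \mid x[y] \\
a & \to & n \mid l \mid l_1 \oplus l_2 \mid l \oplus n \mid \lozenge \\
b & \to & \mathit{true} \mid \mathit{false} \mid l_1 \prec l_2 \mid l \prec n \mid b_1 \wedge b_2 \mid b_1 \vee b_2 \mid \neg b \mid \triangle \\
c & \to & l := a \mid \mathit{skip} \mid c_1; c_2 \mid \mathit{if}\ b\ c_1\ c_2 \mid \mathit{while}\ b\ c \mid \square
\end{array}
$$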

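An l-value ($l$) is a variable ($x$) or an array reference ($x[y]$). An arithmetic expression ($a$) is an integer constant ($n$), an l-value ($l$), or a binary operation ($l_1 \oplus l_2$ or $l \oplus n$). A boolean expression ($b$) is a boolean constant ($\mathit{true}$ or $\mathit{false}$), a binary relation ($l_1 \prec l_2$ or $l \prec n$), a negation ($\neg b$), or a logical conjunction ($b_1 \wedge b_2$) or disjunction ($b_1 \vee b_2$). Commands include assignment ($l := a$), skip ($\mathit{skip}$), sequence ($c_1; c_2$), conditional statement ($\mathit{if}\ b\ c_1\ c_2$), and while-loop ($\mathit{while}\ b\ c$).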
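To make the grammar concrete, the sketch below encodes it as nested Python tuples. The encoding (tags such as `"binop"` and `"chole"`) is our own hypothetical choice, not Simpl's internal representation; it is enough to check whether a program still has holes and to compute a simple size measure of the kind an enumerative search can order by.

```python
from typing import Any

# Hypothetical tuple encoding of the grammar (our own choice, not Simpl's):
#   l ::= ("var", x) | ("idx", x, y)
#   a ::= ("num", n) | l | ("binop", op, l, l_or_num) | ("ahole",)
#   b ::= ("true",) | ("false",) | ("rel", op, l, l_or_num)
#       | ("and", b, b) | ("or", b, b) | ("not", b) | ("bhole",)
#   c ::= ("assign", l, a) | ("skip",) | ("seq", c, c)
#       | ("if", b, c, c) | ("while", b, c) | ("chole",)
Node = tuple[Any, ...]
HOLES = {"ahole", "bhole", "chole"}   # the holes ◊, △, □


def has_hole(node: Node) -> bool:
    """True iff the (sub)program still contains a hole."""
    if node[0] in HOLES:
        return True
    return any(has_hole(k) for k in node[1:] if isinstance(k, tuple))


def size(node: Node) -> int:
    """Number of AST nodes, one simple size measure (the paper's exact measure may differ)."""
    return 1 + sum(size(k) for k in node[1:] if isinstance(k, tuple))


# Body of reverse(n){ r := 0; while(?){?}; return r;} (r is the output variable).
reverse_body = ("seq",
                ("assign", ("var", "r"), ("num", 0)),
                ("while", ("bhole",), ("chole",)))
print(has_hole(reverse_body), size(reverse_body))  # True 7
```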

A program $(x, c, y)$ is a command with input and output variables, where $x$ is the input variable, $c$ is the command, and $y$ is the output variable. The input and output variables $x$ and $y$ can be of either integer or array type. For brevity of presentation, we assume that the program takes a single input, but our implementation supports multiple input variables as well.

An unusual feature of the language is that it allows incomplete programs: whenever the programmer is uncertain, any arithmetic expression, boolean expression, or command can be left out as a hole ($\lozenge$, $\triangle$, or $\square$, respectively). The goal of our synthesis algorithm is to automatically complete such partial programs.

The semantics of the language is defined for programs without holes. Let $X$ be the set of program variables, which is partitioned into integer-type and array-type variables, i.e., $X = X_i \uplus X_a$. A memory state $m \in M = X \to V$ is a partial function from variables to values ($V$). A value is either an integer or an array of integers. An array is a sequence of integers. For instance, we write for the array of integers 1, 2, and 3. We write , , and for the length of , the element at index , and the array , respectively.

The semantics of the language is defined by the functions:

$$\mathcal{A}[\![a]\!] : M \to V, \qquad \mathcal{B}[\![b]\!] : M \to B, \qquad \mathcal{C}[\![c]\!] : M \to M$$

where $\mathcal{A}[\![a]\!]$, $\mathcal{B}[\![b]\!]$, and $\mathcal{C}[\![c]\!]$ denote the semantics of arithmetic expressions, boolean expressions, and commands, respectively. Figure 2 presents the denotational semantics, where $\mathit{fix}$ is a fixed point operator. Note that the semantics of holes is undefined.

Synthesis Problem A synthesis task is defined by the five components:

$$((x, c, y), E, \Gamma, X_i, X_a)$$

where $(x, c, y)$ is an incomplete program with holes, $E$ is a set of input-output examples, $\Gamma$ is a set of integers, $X_i$ is a set of integer-type variables, and $X_a$ is a set of array-type variables. The goal of our synthesis algorithm is to produce a complete command $c$ without holes such that

• $c$ uses only the constants and variables in $\Gamma$, $X_i$, and $X_a$, and

• $c$ is consistent with every input-output example:

$$\forall (v_i, v_o) \in E.\; (\mathcal{C}[\![c]\!]([x \mapsto v_i]))(y) = v_o.$$
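The sketch below is a small interpreter for the hole-free language, written against a hypothetical tuple encoding of the grammar (our own, not Simpl's), together with the consistency check required of a solution. Loops are executed directly rather than through an explicit fixed point operator, and `/` and `%` follow Python's floor semantics; the behavior on negative operands is not specified in the text, so this is an assumption. Reaching a hole raises an error, mirroring the fact that the semantics of holes is undefined.

```python
import copy
from typing import Any

Node = tuple[Any, ...]    # hypothetical encoding, e.g. ("binop", "+", ("var", "r"), ("num", 1))
Memory = dict[str, Any]   # variable -> int or list[int]

# Assumption: `/` and `%` use Python's floor semantics.
ARITH = {"+": lambda u, v: u + v, "-": lambda u, v: u - v, "*": lambda u, v: u * v,
         "/": lambda u, v: u // v, "%": lambda u, v: u % v}
REL = {"=": lambda u, v: u == v, ">": lambda u, v: u > v, "<": lambda u, v: u < v}

class HoleError(Exception):
    """Evaluation reached a hole, whose semantics is undefined."""

def eval_a(a: Node, m: Memory) -> Any:
    """A[[a]](m)"""
    tag = a[0]
    if tag == "num":
        return a[1]
    if tag == "var":
        return m[a[1]]
    if tag == "idx":                                   # x[y]
        return m[a[1]][m[a[2]]]
    if tag == "binop":
        return ARITH[a[1]](eval_a(a[2], m), eval_a(a[3], m))
    raise HoleError(tag)

def eval_b(b: Node, m: Memory) -> bool:
    """B[[b]](m)"""
    tag = b[0]
    if tag in ("true", "false"):
        return tag == "true"
    if tag == "rel":
        return REL[b[1]](eval_a(b[2], m), eval_a(b[3], m))
    if tag == "and":
        return eval_b(b[1], m) and eval_b(b[2], m)
    if tag == "or":
        return eval_b(b[1], m) or eval_b(b[2], m)
    if tag == "not":
        return not eval_b(b[1], m)
    raise HoleError(tag)

def exec_c(c: Node, m: Memory) -> Memory:
    """C[[c]](m); returns a new memory and leaves the argument untouched."""
    tag = c[0]
    if tag == "skip":
        return m
    if tag == "assign":
        value = eval_a(c[2], m)
        m = copy.deepcopy(m)
        lhs = c[1]
        if lhs[0] == "var":
            m[lhs[1]] = value
        else:                                          # ("idx", x, y): x[y] := value
            m[lhs[1]][m[lhs[2]]] = value
        return m
    if tag == "seq":
        return exec_c(c[2], exec_c(c[1], m))
    if tag == "if":
        return exec_c(c[2] if eval_b(c[1], m) else c[3], m)
    if tag == "while":
        while eval_b(c[1], m):
            m = exec_c(c[2], m)
        return m
    raise HoleError(tag)

def consistent(x: str, c: Node, y: str, examples) -> bool:
    """forall (vi, vo) in E. (C[[c]]([x -> vi]))(y) = vo"""
    return all(exec_c(c, {x: vi}).get(y) == vo for vi, vo in examples)

def seq(*cs: Node) -> Node:
    """c1; c2; ...; cn as right-nested ("seq", ...) nodes."""
    out = cs[-1]
    for c in reversed(cs[:-1]):
        out = ("seq", c, out)
    return out

def var(name: str) -> Node:
    return ("var", name)

# reverse with single-operator expressions (t is an extra temporary):
# r := 0; while (n > 0) { t := n % 10; r := r * 10; r := r + t; n := n / 10 }
body = seq(("assign", var("t"), ("binop", "%", var("n"), ("num", 10))),
           ("assign", var("r"), ("binop", "*", var("r"), ("num", 10))),
           ("assign", var("r"), ("binop", "+", var("r"), var("t"))),
           ("assign", var("n"), ("binop", "/", var("n"), ("num", 10))))
prog = seq(("assign", var("r"), ("num", 0)),
           ("while", ("rel", ">", var("n"), ("num", 0)), body))
print(consistent("n", prog, "r", [(12, 21), (123, 321)]))  # True
```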

## 4 Synthesis Algorithm

In this section, we present our synthesis algorithm that combines enumerative search with static analysis.

### 4.1 Synthesis as State-Search

We first reduce the synthesis task to a state-search problem. Consider a synthesis task $((x, c, y), E, \Gamma, X_i, X_a)$. The corresponding search problem is defined by a transition system consisting of a set of states, a transition relation ($\to$), an initial state, and a set of solution states.

• States: A state is a command, possibly with holes, as defined by the grammar in Section 3.

• Initial state: The initial state is the partial command $c$ of the synthesis task.

• Transition relation: The transition relation ($\to$) determines which states are immediately reachable from a given state. It is defined by the inference rules in Figure 3. Intuitively, a hole can be replaced by an arbitrary expression (or command) of the same type. Given a state $s$, the set of all its immediate next states is $\{s' \mid s \to s'\}$. We write $s \not\to$ when $s$ is a terminal state, i.e., a state with no holes.

• Solution states: A state $s$ is a solution iff $s$ is a terminal state and it is consistent with all input-output examples:

$$\mathit{solution}(s) \iff s \not\to \;\wedge\; \forall (v_i, v_o) \in E.\; (\mathcal{C}[\![s]\!]([x \mapsto v_i]))(y) = v_o.$$
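The transition relation can be sketched as a function that fills one hole at a time. The productions below are our assumed reading of Figure 3, which is not reproduced here: a hole is replaced by a single grammar production whose sub-terms are fresh holes, built only from the given resources. The sketch expands only the leftmost hole, which still reaches every terminal state; the tuple encoding and all names are hypothetical.

```python
from typing import Any, Sequence

Node = tuple[Any, ...]                  # hypothetical encoding, e.g. ("while", ("bhole",), ("chole",))
OPS, RELS = ("+", "-", "*", "/", "%"), ("=", ">", "<")
HOLES = ("ahole", "bhole", "chole")     # ◊, △, □


def expansions(hole: str, ints: Sequence[int], int_vars: Sequence[str],
               arr_vars: Sequence[str]) -> list[Node]:
    """One-step replacements of a hole (assumed reading of Figure 3)."""
    lvs = [("var", x) for x in int_vars] + [("idx", a, y) for a in arr_vars for y in int_vars]
    nums = [("num", n) for n in ints]
    if hole == "ahole":
        return nums + lvs + [("binop", op, lv, rv) for op in OPS for lv in lvs for rv in lvs + nums]
    if hole == "bhole":
        return ([("true",), ("false",)]
                + [("rel", op, lv, rv) for op in RELS for lv in lvs for rv in lvs + nums]
                + [("and", ("bhole",), ("bhole",)), ("or", ("bhole",), ("bhole",)),
                   ("not", ("bhole",))])
    return ([("assign", lv, ("ahole",)) for lv in lvs] + [("skip",)]
            + [("seq", ("chole",), ("chole",)), ("if", ("bhole",), ("chole",), ("chole",)),
               ("while", ("bhole",), ("chole",))])


def next_states(s: Node, ints, int_vars, arr_vars) -> list[Node]:
    """States reachable in one step by filling the leftmost hole; [] if s is terminal."""
    if s[0] in HOLES:
        return expansions(s[0], ints, int_vars, arr_vars)
    for i, child in enumerate(s):
        if isinstance(child, tuple):
            succs = next_states(child, ints, int_vars, arr_vars)
            if succs:                   # rebuild s with the expanded child at position i
                return [s[:i] + (new,) + s[i + 1:] for new in succs]
    return []


s0 = ("while", ("bhole",), ("chole",))
succ = next_states(s0, ints=[0], int_vars=["n"], arr_vars=[])
print(len(succ), succ[0])  # 11 ('while', ('true',), ('chole',))
```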

### 4.2 Baseline Search Algorithm

Algorithm 1 shows the basic architecture of our enumerative search algorithm. The algorithm initializes the workset with the initial state (line 1). Then, it picks a state $s$ of the smallest size and removes it from the workset (line 3). If $s$ is a solution state, the algorithm terminates and returns $s$ (line 5). For a non-terminal state, the algorithm attempts to prune $s$ by invoking the function $\mathit{prune}$ (line 7). If pruning fails, the next states of $s$ are added to the workset and the loop repeats. The details of our pruning technique are described in Section 4.3. For now, assume that $\mathit{prune}$ always fails.
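The loop of Algorithm 1 can be sketched generically as follows. The function and parameter names are hypothetical; `prune` defaults to always failing, as in the baseline, and `normalize` defaults to the identity as a placeholder for the code optimizations described next. A priority queue keyed on size yields a smallest state first, and a `seen` set keeps previously explored states from being revisited.

```python
import heapq
import itertools
from typing import Callable, Hashable, Iterable, Optional, TypeVar

S = TypeVar("S", bound=Hashable)


def search(init: S,
           next_states: Callable[[S], Iterable[S]],
           is_terminal: Callable[[S], bool],
           is_solution: Callable[[S], bool],
           size: Callable[[S], int],
           prune: Callable[[S], bool] = lambda s: False,
           normalize: Callable[[S], S] = lambda s: s) -> Optional[S]:
    """Smallest-first enumerative search in the spirit of Algorithm 1 (a sketch)."""
    tie = itertools.count()             # tie-breaker so the heap never compares states
    work = [(size(init), next(tie), init)]
    seen = {init}                       # previously explored states are never revisited
    while work:
        _, _, s = heapq.heappop(work)   # pick a smallest state
        if is_terminal(s):
            if is_solution(s):
                return s
            continue                    # a terminal non-solution is simply dropped
        if prune(s):                    # Section 4.3; the baseline never prunes
            continue
        for t in next_states(s):
            t = normalize(t)
            if t not in seen:
                seen.add(t)
                heapq.heappush(work, (size(t), next(tie), t))
    return None


# Toy usage: find f with f(1) = 2 and f(2) = 4 over E -> 1 | x | (E+E) | (E*E).
def expand(s: str) -> list[str]:
    i = s.index("E")                    # fill the leftmost hole
    return [s[:i] + r + s[i + 1:] for r in ("1", "x", "(E+E)", "(E*E)")]


examples = [(1, 2), (2, 4)]
found = search("E", expand,
               is_terminal=lambda s: "E" not in s,
               is_solution=lambda s: all(eval(s, {"x": xi}) == yo for xi, yo in examples),
               size=len)
print(found)  # (x+x)
```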

The baseline algorithm implicitly performs two well-known optimizations. First, it remembers previously explored states and never reconsiders them. Second, and more importantly, it normalizes states so that semantically equivalent programs also become syntactically identical. For instance, suppose is the current state. Before pushing it to the workset, we first normalize it to . To do so, we use four code optimization techniques: constant propagation, copy propagation, dead code elimination, and expression simplification [Aho et al.1986]. These two optimizations significantly improve the speed of enumerative search.

In addition, the algorithm considers only terminating programs. Our language has unrestricted loops, so the basic algorithm may synthesize non-terminating programs. To exclude them from the search space, we use syntactic heuristics to detect potentially non-terminating loops: 1) we only allow boolean expressions of the form (or ) in loop conditions, 2) the last statement of the loop body must increase (or decrease) the induction variable , and 3) and are not defined elsewhere in the loop.
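The exact loop-condition forms are not spelled out here, so the following sketch fixes one plausible reading of the three heuristics as an assumption: the condition is `i < e` (or `i > e`) for an integer variable `i` and a bound `e` that is a variable or a constant; the body ends with `i := i + k` (or `i := i - k`) for a positive constant `k`; and neither `i` nor a variable bound is assigned anywhere else in the body. The tuple encoding and all names are hypothetical.

```python
from typing import Any

Node = tuple[Any, ...]   # hypothetical encoding, e.g. ("assign", ("var", "i"), ("num", 0))


def flatten(c: Node) -> list[Node]:
    """View c1; c2; ...; cn as a flat list of statements."""
    return flatten(c[1]) + flatten(c[2]) if c[0] == "seq" else [c]


def assigned(c: Node) -> set[str]:
    """Names of variables (or arrays) assigned anywhere inside c, including nested loops."""
    if c[0] == "assign":
        return {c[1][1]}
    return set().union(*(assigned(k) for k in c[1:] if isinstance(k, tuple)))


def looks_terminating(loop: Node) -> bool:
    """Syntactic check of ("while", cond, body) under our assumed reading of the heuristics."""
    _, cond, body = loop
    # 1) condition of the form i < e or i > e, with e a variable or a constant
    if not (cond[0] == "rel" and cond[1] in ("<", ">")
            and cond[2][0] == "var" and cond[3][0] in ("var", "num")):
        return False
    i = cond[2][1]
    bound = {cond[3][1]} if cond[3][0] == "var" else set()
    step = "+" if cond[1] == "<" else "-"
    *rest, last = flatten(body)
    # 2) the last statement moves i toward the bound by a positive constant
    moves = (last[0] == "assign" and last[1] == ("var", i)
             and last[2][0] == "binop" and last[2][1] == step
             and last[2][2] == ("var", i)
             and last[2][3][0] == "num" and last[2][3][1] > 0)
    # 3) neither i nor a variable bound is assigned elsewhere in the loop
    others = set().union(*(assigned(c) for c in rest))
    return moves and not ({i} | bound) & others


inc = ("assign", ("var", "i"), ("binop", "+", ("var", "i"), ("num", 1)))
acc = ("assign", ("var", "r"), ("binop", "+", ("var", "r"), ("var", "i")))
good = ("while", ("rel", "<", ("var", "i"), ("var", "len")), ("seq", acc, inc))
bad = ("while", ("rel", "<", ("var", "i"), ("var", "len")),
       ("seq", ("assign", ("var", "i"), ("num", 0)), inc))   # resets i every iteration
print(looks_terminating(good), looks_terminating(bad))  # True False
```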

### 4.3 Pruning with Static Analysis

Now we present the main contribution of this paper: pruning with static analysis. Static analysis allows us to safely identify states that can never be completed into a solution. We first define the notion of failure states.

###### Definition 1.

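A state $s$ is a failure state, denoted $\mathit{fail}(s)$, iff no terminal state reachable from $s$ is a solution, i.e.,

$$\mathit{fail}(s) \iff \forall s'.\; \big((s \to^{*} s') \wedge s' \not\to\big) \implies \neg\mathit{solution}(s').$$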

Our goal is to detect as many failure states as possible. We observed two typical cases of failure states that often show up during the baseline search.

###### Example 1.

Consider the program in Figure 4(a) and input-output example . When the program is executed with , no matter how the hole is instantiated, the output value is at least 2 at the return statement. Therefore, the program necessarily fails to satisfy the example .

###### Example 2.

Consider the program in Figure 4(b) and input-output example . Here, we do not know the exact values of and , but we know that must hold at the end of the program. However, no such integer exists, so we conclude that the partial program is a failure state.

Static Analysis We designed a static analysis that aims to effectively identify these two types of failure states. To do so, our analysis combines a numeric and a symbolic analysis: the numeric analysis is designed to detect cases like Example 1, and the symbolic analysis cases like Example 2. The abstract domain of the analysis is defined as follows:

$$\hat{m} \in \hat{M} = X \to \hat{V}, \qquad \hat{v} \in \hat{V} = I \times S$$

An abstract memory state $\hat{m}$ maps variables to abstract values ($\hat{V}$). An abstract value $\hat{v}$ is a pair of an interval ($I$) and a symbolic value ($S$). The domain of intervals is standard [Cousot and Cousot1977]:

$$I = \big(\{\bot\} \cup \{[l, u] \mid l, u \in \mathbb{Z} \cup \{-\infty, +\infty\} \wedge l \le u\},\ \sqsubseteq_I\big).$$
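As a minimal sketch of the interval domain (our own illustration, not Simpl's code), intervals can be represented as pairs of bounds with `math.inf` for unbounded ends and `None` for $\bot$; the order, the join, and a sound abstract addition are then a few lines each.

```python
import math
from typing import Optional, Tuple

# An interval is a pair (lo, hi) with lo <= hi; bounds may be -math.inf or math.inf.
# None plays the role of bottom (the empty interval).
Interval = Optional[Tuple[float, float]]


def leq(a: Interval, b: Interval) -> bool:
    """a ⊑_I b: bottom is below everything, otherwise interval containment."""
    if a is None:
        return True
    if b is None:
        return False
    return b[0] <= a[0] and a[1] <= b[1]


def join(a: Interval, b: Interval) -> Interval:
    """a ⊔_I b: the smallest interval containing both arguments."""
    if a is None:
        return b
    if b is None:
        return a
    return (min(a[0], b[0]), max(a[1], b[1]))


def add(a: Interval, b: Interval) -> Interval:
    """Sound abstract addition; a bottom operand yields bottom."""
    if a is None or b is None:
        return None
    return (a[0] + b[0], a[1] + b[1])


print(join((1, 2), (5, 7)))          # (1, 7)
print(leq((2, 3), (0, math.inf)))    # True
print(add((0, math.inf), (1, 1)))    # (1, inf)
```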

For symbolic analysis, we define the following flat domain:

$$S = (SE^{\top}_{\bot}, \sqsubseteq_S) \quad \text{where} \quad SE \to n \mid \beta_x\ (x \in X_i) \mid SE \oplus SE$$

A symbolic expression is a constant ($n$), a symbol ($\beta_x$), or a binary operation on symbolic expressions. We introduce one symbol $\beta_x$ for each integer-type variable $x$ in the program. The symbolic domain is flat, with the partial order $\bot \sqsubseteq_S se \sqsubseteq_S \top$ for every $se \in SE$. We define the abstraction function $\alpha$ that transforms concrete values into abstract values:

$$\alpha(n) = ([n, n], n) \qquad \alpha(n_1 \ldots n_k) = ([\min\{n_1, \ldots, n_k\}, \max\{n_1, \ldots, n_k\}], \top).$$
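The abstraction function and the flat symbolic order can be sketched as follows. The representation is hypothetical: intervals are `(lo, hi)` pairs, symbolic expressions are integers or tagged tuples such as `("beta", "x")` for $\beta_x$, and `TOP`/`BOT` are sentinel tuples. Mapping an empty array to the bottom interval is our own assumption, since the minimum and maximum are undefined for $k = 0$.

```python
from typing import Any, Union

TOP, BOT = ("top",), ("bot",)   # hypothetical sentinels of the flat symbolic domain
Value = Union[int, list]        # a concrete value: an integer or an array of integers


def alpha(v: Value) -> tuple[Any, Any]:
    """α(n) = ([n, n], n);  α(n1...nk) = ([min, max], TOP)."""
    if isinstance(v, int):
        return ((v, v), v)
    if not v:                   # assumption: an empty array gets the bottom interval
        return (None, TOP)
    return ((min(v), max(v)), TOP)


def sym_leq(a: Any, b: Any) -> bool:
    """Flat order: BOT ⊑ se ⊑ TOP; distinct symbolic expressions are incomparable."""
    return a == BOT or b == TOP or a == b


def sym_join(a: Any, b: Any) -> Any:
    """Least upper bound in the flat domain."""
    if sym_leq(a, b):
        return b
    if sym_leq(b, a):
        return a
    return TOP


print(alpha(220))                   # ((220, 220), 220)
print(alpha([2, 0, 2]))             # ((0, 2), ('top',))
print(sym_join(("beta", "x"), 3))   # ('top',)
```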

The abstract semantics is defined in Figure 5 by the functions:

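$$\hat{\mathcal{A}}[\![a]\!] : \hat{M} \to \hat{V}, \qquad \hat{\mathcal{B}}[\![b]\!] : \hat{M} \to \hat{B}, \qquad \hat{\mathcal{C}}[\![c]\!] : \hat{M} \to \hat{M}$$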

where $\hat{B}$ is the abstract boolean lattice.

Intuitively, the abstract semantics over-approximates the concrete semantics of all terminal states that are reachable from the current state. This is achieved by giving holes a sound semantics: , , and . An exception is that integer variables are assigned symbols, rather than , in order to generate symbolic constraints on integer variables.

In our analysis, all elements of an array are abstracted into a single element. Hence, the definitions of and do not involve . Because an abstract array cell may represent multiple concrete cells, arrays are weakly updated by joining ($\sqcup$) the old and new values. For example, in memory state , evaluates to .
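Here is a small illustration of the weak update, with numbers of our own choosing. An abstract array is a single (interval, symbolic value) summary; reading any cell returns that summary regardless of the index, and writing joins the new value into it. The representation and names are hypothetical.

```python
from typing import Any, Dict, Optional, Tuple

Interval = Optional[Tuple[float, float]]   # None = bottom
TOP = ("top",)                             # top of the flat symbolic domain
AbsMem = Dict[str, Tuple[Interval, Any]]   # variable -> (interval, symbolic value)


def join_itv(a: Interval, b: Interval) -> Interval:
    if a is None:
        return b
    if b is None:
        return a
    return (min(a[0], b[0]), max(a[1], b[1]))


def join_sym(a: Any, b: Any) -> Any:
    """Flat join (bottom omitted for brevity): equal values stay, otherwise TOP."""
    return a if a == b else TOP


def read_cell(m: AbsMem, arr: str) -> Tuple[Interval, Any]:
    """Abstract value of arr[y]: one summary for all cells, so the index y is irrelevant."""
    return m[arr]


def weak_update(m: AbsMem, arr: str, v: Tuple[Interval, Any]) -> AbsMem:
    """Abstract effect of arr[y] := e, where e evaluates to v: join v into the summary."""
    itv, sym = m[arr]
    new = dict(m)
    new[arr] = (join_itv(itv, v[0]), join_sym(sym, v[1]))
    return new


m = {"a": ((0, 2), TOP)}                       # e.g. the abstraction of the array [2, 0, 2]
print(weak_update(m, "a", ((5, 5), 5))["a"])   # ((0, 5), ('top',))
```

The old interval $[0, 2]$ survives the write because the abstract cell may stand for concrete cells that were not overwritten.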

For while-loops, the analysis performs a sound fixed point computation. If the computation does not reach a fixed point within a fixed number of iterations, we apply widening to guarantee termination of the analysis, since the interval domain has infinite height. We use the standard widening operator of [Cousot and Cousot1977]. The functions and in Figure 5 denote a post-fixed point operator and a sound abstraction of , respectively.
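The sketch below illustrates the fixed point computation with widening on a single integer variable, for the loop `while (i < b) { i := i + 1 }`. It is our own simplified example over intervals only (symbolic values and other variables are ignored): it performs a few plain join iterations, then switches to the standard widening until a post-fixed point is reached, and finally filters by the negated loop condition.

```python
import math
from typing import Optional, Tuple

Interval = Tuple[float, float]        # (lo, hi); infinite bounds allowed


def join(a: Interval, b: Interval) -> Interval:
    return (min(a[0], b[0]), max(a[1], b[1]))


def widen(a: Interval, b: Interval) -> Interval:
    """Standard interval widening: a bound that is not stable jumps to infinity."""
    return (a[0] if b[0] >= a[0] else -math.inf,
            a[1] if b[1] <= a[1] else math.inf)


def meet(a: Interval, b: Interval) -> Optional[Interval]:
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None      # None = bottom


def analyze_loop(init: Interval, bound: int, unroll: int = 3) -> Optional[Interval]:
    """Interval of i after `while (i < bound) { i := i + 1 }`, starting from i in init.

    Joins for the first `unroll` iterations, then widens until a post-fixed point.
    """
    def step(x: Interval) -> Interval:         # F(X) = init ⊔ ((X ∩ [i < bound]) + 1)
        inside = meet(x, (-math.inf, bound - 1))   # integers: i < bound iff i <= bound - 1
        return init if inside is None else join(init, (inside[0] + 1, inside[1] + 1))

    x, k = init, 0
    while True:
        nxt = step(x)
        if x[0] <= nxt[0] and nxt[1] <= x[1]:  # F(x) ⊑ x: post-fixed point reached
            break
        x = join(x, nxt) if k < unroll else widen(x, nxt)
        k += 1
    return meet(x, (bound, math.inf))          # loop exit: filter by ¬(i < bound)


print(analyze_loop((0, 0), 10))  # (10, inf)
```

Without a narrowing pass the upper bound stays at $+\infty$; the result is sound (the concrete exit value is 10) but not exact.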

Pruning Next we describe how we prune states with the static analysis. Suppose we are given examples $E$ and a state $s$ with input ($x$) and output ($y$) variables. For each example $(v_i, v_o) \in E$, we first run the static analysis with the input $v_i$ and obtain the analysis result

$$(\mathit{itv}_s, \mathit{se}_s) = \hat{\mathcal{C}}[\![s]\!]([x \mapsto \alpha(v_i)])(y).$$

We only consider the case when $\mathit{itv}_s = [l_s, u_s]$ is not $\bot$ (when $\mathit{itv}_s = \bot$, the program is semantically ill-formed, so we simply prune out the state). Then, we obtain the interval abstraction of the output $v_o$, i.e., the interval $[l_o, u_o]$ such that $\alpha(v_o) = ([l_o, u_o], \_)$, and generate the constraints $C_s(v_i, v_o)$:

$$C_s(v_i, v_o) = (l_s \le l_o \wedge u_o \le u_s) \wedge (\mathit{se}_s \in SE \implies l_o \le \mathit{se}_s \le u_o).$$

The first (resp., second) conjunct means that the interval (resp., symbolic) analysis result must over-approximate the output example. We prune out a state $s$ iff $C_s(v_i, v_o)$ is unsatisfiable for some example $(v_i, v_o) \in E$:

###### Definition 2.

The predicate $\mathit{prune}$ is defined as follows:

$$\mathit{prune}(s) \iff C_s(v_i, v_o) \text{ is unsatisfiable for some } (v_i, v_o) \in E.$$

The unsatisfiability can be easily checked, for instance, with an off-the-shelf SMT solver. Our pruning is safe:

###### Theorem 1 (Safety).

For every state $s$, $\mathit{prune}(s) \implies \mathit{fail}(s)$.

That is, we prune out a state only when it is a failure state, which formally guarantees that the search algorithm with our pruning finds a solution if and only if the baseline algorithm (Section 4.2) does so.
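The constraint check can be sketched for an integer output as follows. This is a simplified stand-in for the SMT query, not Simpl's implementation: we assume the symbolic result is either $\top$ (`None`) or a linear expression $c \cdot \beta + d$ over a single integer symbol $\beta$, for which satisfiability of $l_o \le c \cdot \beta + d \le u_o$ reduces to checking whether an interval contains an integer. The calls mimic Examples 1 and 2 with concrete numbers of our own choosing.

```python
import math
from typing import Optional, Tuple

Interval = Optional[Tuple[float, float]]   # (l, u) with possibly infinite bounds; None = bottom
Linear = Optional[Tuple[int, int]]         # (c, d) meaning c * beta + d; None = TOP (se not in SE)


def interval_ok(itv_s: Tuple[float, float], lo: int, uo: int) -> bool:
    """First conjunct of C_s: l_s <= l_o and u_o <= u_s."""
    ls, us = itv_s
    return ls <= lo and uo <= us


def symbolic_ok(se_s: Linear, lo: int, uo: int) -> bool:
    """Second conjunct of C_s: is lo <= c * beta + d <= uo satisfiable for an integer beta?"""
    if se_s is None:               # se_s not in SE: the implication holds vacuously
        return True
    c, d = se_s
    if c == 0:
        return lo <= d <= uo
    c = abs(c)                     # substituting beta' = -beta turns c < 0 into |c| > 0
    # an integer beta must lie in [ceil((lo - d) / c), floor((uo - d) / c)]
    return -((d - lo) // c) <= (uo - d) // c


def prune_for_example(itv_s: Interval, se_s: Linear, v_o: int) -> bool:
    """True iff C_s(v_i, v_o) is unsatisfiable, for an integer output v_o."""
    if itv_s is None:              # bottom: semantically ill-formed, so prune
        return True
    lo = uo = v_o                  # interval part of α(v_o) = ([v_o, v_o], v_o)
    return not (interval_ok(itv_s, lo, uo) and symbolic_ok(se_s, lo, uo))


# Example 1 style: the output is known to be at least 2, but the example expects 1.
print(prune_for_example((2, math.inf), None, 1))              # True
# Example 2 style: the output is 2 * beta for an unknown integer beta; 5 is impossible, 6 is not.
print(prune_for_example((-math.inf, math.inf), (2, 0), 5))    # True
print(prune_for_example((-math.inf, math.inf), (2, 0), 6))    # False
```

A real implementation would hand the full constraint, possibly over several symbols, to an SMT solver, as the text suggests.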

## 5 Evaluation

Experimental setup To evaluate our synthesis algorithm, we gathered 30 introductory-level problems from several online forums (Table 1; e.g., http://www.codeforwin.in). The problems consist of tasks manipulating integers and arrays. Some problems are non-trivial for novice students; they require students to come up with various control structures, such as nested loops and combinations of loops and conditional statements. The partial programs we used are similar to those shown in Section 2; they have one boolean-expression hole ($\triangle$) and one or two command holes ($\square$). For each benchmark, we report the number of integer variables (IVars), array variables (AVars), integer constants (Ints), and examples (Exs) provided. All benchmark problems are publicly available with our tool. Experiments were conducted on a MacBook Pro with an Intel Core i7 and 16GB of memory.

Baseline Algorithm Table 1 shows the performance of our algorithm. The column “Base” shows the running time of our baseline algorithm performing enumerative search without state normalization. In that configuration, the average runtime exceeded 616 seconds, and three of the benchmarks timed out (> 1 hour). The column “Base+Opt” reports the performance of the baseline with normalization. With normalization, all benchmark problems are solved and the average speed improves by more than 3.7x, although normalization slows down some cases due to its runtime overhead.

Pruning Effectiveness On top of “Base+Opt”, we applied our static-analysis-guided pruning technique (the column “Ours”). The results show that our pruning technique is remarkably effective: it reduces the average time to 6.6 seconds, a 25x speedup over “Base+Opt”. Note also that Simpl synthesizes the desired programs from only a few examples (column Exs); no problem requires more than 4 examples.
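The speedup figures quoted in this section follow directly from the reported averages (the 165.5-second “Base+Opt” average is the one given in the introduction); the “Base” ratio is only a lower bound because its 616-second average includes timed-out benchmarks:

$$
\frac{T_{\text{Base+Opt}}}{T_{\text{Ours}}} = \frac{165.5\ \text{s}}{6.6\ \text{s}} \approx 25.1,
\qquad
\frac{T_{\text{Base}}}{T_{\text{Base+Opt}}} > \frac{616\ \text{s}}{165.5\ \text{s}} \approx 3.72.
$$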

## 6 Related Work

Computer-aided education Recently, program synthesis technology has revolutionized computer-aided education. For instance, the technology has been used in automatic problem generation [Singh et al.2012, Ahmed et al.2013, Alvin et al.2014, Polozov et al.2015], automatic grading [Alur et al.2013], and automatic solution generation [Gulwani et al.2011].

Our work applies program synthesis to an automated programming education system. A large body of work has aimed to automate programming education [Adam and Laurent1980, Soloway et al.1981, Farrell et al.1984, Johnson and Soloway1984, Murray1989, Singh et al.2013, Gulwani et al.2014, Kaleeswaran et al.2016, Kim et al.2016], focusing primarily on providing feedback on students’ programming submissions. Our system, Simpl, has the following advantages over prior work:

• Feedback on incomplete programs: Existing systems produce feedback only for complete programs; they cannot help students who do not know how to proceed further. In this case, Simpl can help by automatically generating solutions starting from incomplete solutions.

• No burden on the instructor: Existing systems require manual effort from the instructor. For example, the system in [Singh et al.2013] needs a correct implementation and a set of correction rules manually designed by the instructor. In contrast, Simpl requires nothing from the instructor.

An exception is [Farrell et al.1984], where an automatic LISP feedback system is presented. However, the system produces feedback by relying on ad-hoc rules.

Programming by example Our work differs from prior programming-by-example (PBE) techniques in two ways. First, to our knowledge, our work is the first to synthesize imperative programs with loops. Most PBE approaches focus on domain-specific languages for string transformation [Gulwani2011, Kini and Gulwani2015, Raza et al.2015, Manshadi et al.2013, Wu and Knoblock2015], number transformation [Singh and Gulwani2012], XML transformation [Raza et al.2014], and relational data extraction [Le and Gulwani2014], among others. Other work has studied the synthesis of functional programs [Albarghouthi et al.2013, Osera and Zdancewic2015, Frankle et al.2016]. Second, our algorithm differs from prior work in that we combine semantics-based static analysis with enumerative program synthesis. Existing enumerative synthesizers use pruning techniques such as type systems [Osera and Zdancewic2015, Frankle et al.2016] and deduction [Feser et al.2015], which are not applicable to our setting.

## 7 Conclusion

In this paper, we have shown that combining enumerative synthesis and static analysis is a promising way of synthesizing introductory imperative programs. The enumerative search allows us to find the smallest possible, and therefore most general, program, while the semantics-based static analysis dramatically accelerates the process in a safe way. We demonstrated its effectiveness on 30 real programming problems gathered from online forums.

## References

• [Adam and Laurent1980] Anne Adam and Jean-Pierre Laurent. Laura, a system to debug student programs. Artificial Intelligence, 15(1-2), November 1980.
• [Ahmed et al.2013] Umair Z. Ahmed, Sumit Gulwani, and Amey Karkare. Automatically generating problems and solutions for natural deduction. In IJCAI, 2013.
• [Aho et al.1986] Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1986.
• [Albarghouthi et al.2013] Aws Albarghouthi, Sumit Gulwani, and Zachary Kincaid. Recursive program synthesis. In CAV, 2013.
• [Alur et al.2013] Rajeev Alur, Loris D’Antoni, Sumit Gulwani, Dileep Kini, and Mahesh Viswanathan. Automated grading of DFA constructions. In IJCAI, 2013.
• [Alvin et al.2014] Chris Alvin, Sumit Gulwani, Rupak Majumdar, and Supratik Mukhopadhyay. Synthesis of geometry proof problems. In AAAI, 2014.
• [Cousot and Cousot1977] Patrick Cousot and Radhia Cousot. Abstract interpretation: A unified lattice model for static analysis of programs by construction or approximation of fixpoints. In POPL, 1977.
• [Farrell et al.1984] Robert G. Farrell, John R. Anderson, and Brian J. Reiser. An interactive computer-based tutor for Lisp. In AAAI, 1984.
• [Feser et al.2015] John K. Feser, Swarat Chaudhuri, and Isil Dillig. Synthesizing data structure transformations from input-output examples. In PLDI, 2015.
• [Frankle et al.2016] Jonathan Frankle, Peter-Michael Osera, David Walker, and Steve Zdancewic. Example-directed synthesis: A type-theoretic interpretation. In POPL, 2016.
• [Gulwani et al.2011] Sumit Gulwani, Vijay Anand Korthikanti, and Ashish Tiwari. Synthesizing geometry constructions. In PLDI, 2011.
• [Gulwani et al.2014] Sumit Gulwani, Ivan Radiček, and Florian Zuleger. Feedback generation for performance problems in introductory programming assignments. In FSE, 2014.
• [Gulwani2011] Sumit Gulwani. Automating string processing in spreadsheets using input-output examples. In POPL, 2011.
• [Johnson and Soloway1984] W. Lewis Johnson and Elliot Soloway. PROUST: Knowledge-based program understanding. In ICSE, 1984.
• [Kaleeswaran et al.2016] Shalini Kaleeswaran, Anirudh Santhiar, Aditya Kanade, and Sumit Gulwani. Semi-supervised verified feedback generation. In FSE, 2016.
• [Kim et al.2016] Dohyeong Kim, Yonghwi Kwon, Peng Liu, I. Luk Kim, David Mitchel Perry, Xiangyu Zhang, and Gustavo Rodriguez-Rivera. APEX: Automatic programming assignment error explanation. In OOPSLA, 2016.
• [Kini and Gulwani2015] Dileep Kini and Sumit Gulwani. FlashNormalize: Programming by examples for text normalization. In IJCAI, 2015.
• [Le and Gulwani2014] Vu Le and Sumit Gulwani. FlashExtract: A framework for data extraction by examples. In PLDI, 2014.
• [Manshadi et al.2013] Mehdi Manshadi, Daniel Gildea, and James Allen. Integrating programming by example and natural language programming. In AAAI, 2013.
• [Murray1989] William R. Murray. Automatic Program DeBugging for Intelligent Tutoring Systems. Morgan Kaufmann Publishers Inc., 1989.
• [Osera and Zdancewic2015] Peter-Michael Osera and Steve Zdancewic. Type-and-example-directed program synthesis. In PLDI, 2015.
• [Polozov et al.2015] Oleksandr Polozov, Eleanor O’Rourke, Adam M. Smith, Luke Zettlemoyer, Sumit Gulwani, and Zoran Popovic. Personalized mathematical word problem generation. In IJCAI, 2015.
• [Raza et al.2014] Mohammad Raza, Sumit Gulwani, and Natasa Milic-Frayling. Programming by example using least general generalizations. In AAAI, 2014.
• [Raza et al.2015] Mohammad Raza, Sumit Gulwani, and Natasa Milic-Frayling. Compositional program synthesis from natural language and examples. In IJCAI, 2015.
• [Singh and Gulwani2012] Rishabh Singh and Sumit Gulwani. Synthesizing number transformations from input-output examples. In CAV, 2012.
• [Singh et al.2012] Rohit Singh, Sumit Gulwani, and Sriram Rajamani. Automatically generating algebra problems. In AAAI, 2012.
• [Singh et al.2013] Rishabh Singh, Sumit Gulwani, and Armando Solar-Lezama. Automated feedback generation for introductory programming assignments. In PLDI, 2013.
• [Soloway et al.1981] Elliot M. Soloway, Beverly Woolf, Eric Rubin, and Paul Barth. MENO-II: An intelligent tutoring system for novice programmers. In IJCAI. Morgan Kaufmann Publishers Inc., 1981.
• [Wu and Knoblock2015] Bo Wu and Craig A. Knoblock. An iterative approach to synthesize data transformation programs. In IJCAI, 2015.