Algebra-based Loop Synthesis

04/24/2020
by   Andreas Humenberger, et al.
Andrey Kusnetsov

We present an algorithm for synthesizing program loops satisfying a given polynomial loop invariant. The class of loops we consider can be modeled by a system of algebraic recurrence equations with constant coefficients. We turn the task of loop synthesis into a polynomial constraint problem by precisely characterizing the set of all loops satisfying the given invariant. We prove soundness of our approach, as well as its completeness with respect to an a priori fixed upper bound on the number of program variables. Our work has applications towards program verification, as well as generating number sequences from algebraic relations. We implemented our work in the Absynth tool and report on our initial experiments with loop synthesis.

1 Introduction

The classical setting of program synthesis has been to synthesize programs from proofs of logical specifications that relate the inputs and the outputs of the program [19]. This traditional view of program synthesis has been refined to the setting of syntax-guided synthesis (SyGuS) [2]. In addition to logical specifications, SyGuS approaches consider further constraints on the program template to be synthesized, thus limiting the search space of possible solutions [10, 13, 8, 20].

One of the main challenges in synthesis remains however to reason about program loops – for example by answering the question whether there exists a loop satisfying a given loop invariant and synthesizing a loop with respect to a given invariant. We refer to this task of synthesis as loop synthesis, which can be considered as the reverse problem of loop invariant generation: rather than generating invariants summarizing a given loop as in [22, 12, 16], we synthesize loops whose functional behavior is captured by a given invariant.

Motivating Example.

We motivate the use of loop synthesis by considering the program snippet of Figure 1(a). The loop in Figure 1(a) is a variant of one of the examples from the online tutorial (https://rise4fun.com/Dafny/) of the Dafny verification framework [18]: the given program is not partially correct with respect to its pre-condition and post-condition, and the task is to revise/repair Figure 1(a) into a partially correct program using the given invariant.

Figure 1: Program repair via loop synthesis, with panels (a) Faulty loop, (b) Synthesized loop, and (c) Synthesized loop. Figures 1(b) and 1(c) are revised versions of Figure 1(a) such that the given invariant is an invariant of Figures 1(b)-1(c).

Our work introduces an algorithmic approach to loop synthesis by relying on algebraic recurrence equations and constraint solving over polynomials. In particular, using our approach we automatically synthesize Figures 1(b) and 1(c) by using the given non-linear polynomial equalities as the input invariant to our loop synthesis task. While we do not synthesize loop guards, we note that we synthesize loops such that the given invariant holds for an arbitrary (and thus unbounded) number of loop iterations. Both synthesized programs, with the loop guard as in Figure 1(a), revise Figure 1(a) into a partially correct program with respect to the given requirements.

Algebra-based Loop Synthesis.

Following the SyGuS setting, we consider additional requirements on the loop to be synthesized: we impose syntactic requirements on the form of loop expressions and guards. The imposed requirements allow us to reduce the synthesis task to the problem of generating linear recurrences with constant coefficients, called C-finite recurrences [15]. As such, we define our loop synthesis task as follows:


Problem (Loop Synthesis).

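Given a polynomial property $p(\vec{x})$ over a set $\vec{x}$ of variables, generate a loop $\mathcal{L}$ with program variables $\vec{x}$ such that

  1. $p(\vec{x}) = 0$ is an invariant of $\mathcal{L}$, and

  2. each program variable in $\mathcal{L}$ induces a C-finite number sequence.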

Our approach to synthesis is conceptually different from other SyGuS-based methods, such as [10, 8, 20]: rather than iteratively refining both the input and the solution space of synthesized programs, we take polynomial relations describing a potentially infinite set of input values and precisely capture not just one loop, but the set of all loops (i) whose invariant is given by our input polynomial and (ii) whose variables induce C-finite number sequences. That is, any instance of this set yields a loop that is partially correct by construction. Figures 1(b) and 1(c) depict two solutions of our loop synthesis task for the invariant of Figure 1.

The main steps of our approach are as follows. (i) Let $p(\vec{x})$ be a polynomial over variables $\vec{x}$ and let $s$ be an upper bound on the number of program variables occurring in the loop. If not specified, $s$ is considered to be the number of variables from $\vec{x}$. (ii) We use syntactic constraints over the loop body to be synthesized and define a loop template, as given by our programming model (7). Our programming model imposes that the functional behavior of the synthesized loops can be modeled by a system of C-finite recurrences (Section 3). (iii) By using the invariant property of $p(\vec{x}) = 0$ for the loops to be synthesized, we construct a polynomial constraint problem (PCP) characterizing the set of all loops satisfying (7) for which $p(\vec{x}) = 0$ is a loop invariant (Section 4).

Our approach combines symbolic computation techniques over algebraic recurrence equations with polynomial constraint solving. We prove that our approach to loop synthesis is both sound and complete. By completeness we mean that if there is a loop $\mathcal{L}$ with at most $s$ variables satisfying the invariant $p(\vec{x}) = 0$ such that the loop body meets our C-finite syntactic requirements, then $\mathcal{L}$ is synthesized by our method (Theorem 4.2). Moving beyond this a priori fixed bound $s$, that is, deriving an upper bound on the number of program variables from the invariant, is an interesting but hard mathematical challenge, with connections to the inverse problem of difference Galois theory [25].

We finally note that our work is not restricted to specifications given by a single polynomial equality invariant. Rather, the invariant given as input to our synthesis approach can be conjunctions of polynomial equalities – as also shown in Figure 1.

Beyond Loop Synthesis.

Our work has potential applications beyond loop synthesis – such as in generating number sequences from algebraic relations and program optimizations.

  • Generating number sequences. Our approach provides a partial solution to an open mathematical problem: given a polynomial relation among number sequences, e.g.

    (1)

    synthesize algebraic recurrences defining these sequences. There exists no complete method for solving this challenge, but we give a complete approach in the C-finite setting parameterized by an a priori bound on the order of the recurrences. For the above given relation among $f(n)$ and $f(n+1)$, our approach generates the C-finite recurrence equation $f(n+2) = f(n+1) + f(n)$, which induces the Fibonacci sequence.

  • Program optimizations. Given a polynomial invariant, our approach generates a PCP such that any solution to this PCP yields a loop satisfying the given invariant. By using additional constraints encoding a cost function on the loops to be synthesized, our method can be extended to synthesize loops that are optimal with respect to the considered costs, for example synthesizing loops that use only addition in variable updates. Consider for example Figures 1(b)-1(c): the loop body of Figure 1(b) uses only addition, whereas Figure 1(c) also uses multiplications by constants.
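As a quick sanity check of the number-sequence application, the sketch below tests a polynomial relation between consecutive Fibonacci numbers. The relation used here, the squared Cassini-type identity $(f(n+1)^2 - f(n) f(n+1) - f(n)^2)^2 = 1$, is a well-known identity chosen for illustration; that it coincides with relation (1) is an assumption, and all function names are hypothetical.

```python
def fibonacci(count):
    """Return the first `count` Fibonacci numbers f(0)=0, f(1)=1, ..."""
    values = [0, 1]
    while len(values) < count:
        values.append(values[-1] + values[-2])
    return values[:count]


def relation(fn, fn1):
    """Expanded squared Cassini-type polynomial in f(n) and f(n+1)."""
    return (fn**4 + 2 * fn**3 * fn1 - fn**2 * fn1**2
            - 2 * fn * fn1**3 + fn1**4)


def relation_holds(count):
    """Check relation(f(n), f(n+1)) == 1 for n = 0, ..., count - 1."""
    f = fibonacci(count + 1)
    return all(relation(f[n], f[n + 1]) == 1 for n in range(count))
```

A bounded check like `relation_holds(50)` only tests finitely many indices; the synthesis approach goes the other way and derives the recurrence from the relation.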

Contributions.

In summary, this paper makes the following contributions.

  • We propose an automated procedure for synthesizing loops that are partially correct with respect to a given polynomial loop invariant (Section 4). By exploiting properties of C-finite sequences, we construct a PCP which precisely captures all solutions of our loop synthesis task. We are not aware of other approaches synthesizing loops from (non-linear) polynomial invariants.

  • We prove that our approach to loop synthesis is sound and complete (Theorem 4.2). That is, if there is a loop whose invariant is captured by our given specification, our approach synthesizes this loop. To this end, we consider completeness modulo an a priori fixed upper bound on the number of loop variables.

  • We implemented our approach in the new open-source framework Absynth. We evaluated our work on a number of academic examples and considered measures for handling the solution space of loops to be synthesized (Section 5).

2 Preliminaries

Let $\mathbb{K}$ be a computable field with characteristic zero. We also assume $\mathbb{K}$ to be algebraically closed, that is, every non-constant polynomial in $\mathbb{K}[x]$ has at least one root in $\mathbb{K}$. The algebraic closure $\overline{\mathbb{Q}}$ of the field of rational numbers $\mathbb{Q}$ is such a field; $\overline{\mathbb{Q}}$ is called the field of algebraic numbers.

Let $\mathbb{K}[x_1,\dots,x_n]$ denote the multivariate polynomial ring with variables $x_1,\dots,x_n$. For a list $x_1,\dots,x_n$, we write $\vec{x}$ if the number of variables is known from the context or irrelevant. As $\mathbb{K}$ is algebraically closed, every polynomial $p \in \mathbb{K}[x]$ of degree $r$ has exactly $r$ roots, counted with multiplicity. Therefore, the following theorem follows immediately: The zero polynomial is the only polynomial in $\mathbb{K}[x]$ having infinitely many roots.

2.1 Polynomial Constraint Problem (PCP)

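A polynomial constraint $F$ is a constraint of the form $p \bowtie 0$ where $p$ is a polynomial in $\mathbb{K}[\vec{x}]$ and $\bowtie$ is a comparison relation such as $=$ or $\neq$. A clause is then a disjunction $C = F_1 \lor \dots \lor F_m$ of polynomial constraints. A unit clause is a special clause consisting of a single disjunct (i.e. $m = 1$). A polynomial constraint problem (PCP) is then given by a set of clauses $\mathcal{C}$. We say that a variable assignment $\sigma : \{x_1,\dots,x_n\} \to \mathbb{K}$ satisfies a polynomial constraint $p \bowtie 0$ if $p(\sigma(x_1),\dots,\sigma(x_n)) \bowtie 0$ holds. Furthermore, $\sigma$ satisfies a clause $F_1 \lor \dots \lor F_m$ if for some $i$, $F_i$ is satisfied by $\sigma$. Finally, $\sigma$ satisfies a clause set, and is therefore a solution of the PCP, if every clause within the set is satisfied by $\sigma$. We write $\mathcal{C} \sqsubset \mathbb{K}[\vec{x}]$ to indicate that all polynomials in the clause set $\mathcal{C}$ are contained in $\mathbb{K}[\vec{x}]$. For a matrix $M$ with entries $m_1,\dots,m_s$ we define the clause set $\mathrm{cstr}(M)$ to be $\{m_1 = 0, \dots, m_s = 0\}$.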

2.2 Number Sequences and Recurrence Relations

A sequence $(x(n))_{n=0}^{\infty}$ in $\mathbb{K}$ is called C-finite if it satisfies a linear recurrence with constant coefficients, also known as C-finite recurrence [15]. Let $c_0,\dots,c_{r-1} \in \mathbb{K}$ and $c_0 \neq 0$, then

$$x(n+r) + c_{r-1}\, x(n+r-1) + \dots + c_1\, x(n+1) + c_0\, x(n) = 0 \qquad (2)$$

is a C-finite recurrence of order $r$. The order of a sequence is defined by the order of the recurrence it satisfies. We refer to a recurrence of order $r$ also as an $r$-order recurrence, for example as a first-order recurrence when $r = 1$ or a second-order recurrence when $r = 2$. A recurrence of order $r$ and $r$ initial values define a sequence, and different initial values lead to different sequences. For simplicity, we write $x(n)$ for $(x(n))_{n=0}^{\infty}$.

Let $a \in \mathbb{K}$. The constant sequence $a$ satisfies a first-order recurrence equation $x(n+1) = x(n)$ with $x(0) = a$. The geometric sequence $a^n$ satisfies $x(n+1) = a\, x(n)$ with $x(0) = 1$. The sequence $n$ satisfies a second-order recurrence $x(n+2) = 2x(n+1) - x(n)$ with $x(0) = 0$ and $x(1) = 1$.∎
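The three standard examples of C-finite sequences (a constant sequence, a geometric sequence, and the sequence of natural numbers) can be checked mechanically. The following is a minimal sketch; the helper `satisfies` and the sample constant `a = 3/2` are illustrative assumptions, and recurrences are written in the solved form $x(n+r) = \sum_i c_i\, x(n+i)$ rather than with all terms on one side.

```python
from fractions import Fraction


def satisfies(seq, coeffs, upto=20):
    """Check x(n+r) == sum(coeffs[i] * x(n+i)) for n = 0, ..., upto - 1."""
    r = len(coeffs)
    return all(
        seq(n + r) == sum(c * seq(n + i) for i, c in enumerate(coeffs))
        for n in range(upto)
    )


a = Fraction(3, 2)  # arbitrary sample value for the constant a

constant_ok = satisfies(lambda n: a, [1])        # x(n+1) = x(n)
geometric_ok = satisfies(lambda n: a**n, [a])    # x(n+1) = a * x(n)
linear_ok = satisfies(lambda n: n, [-1, 2])      # x(n+2) = 2 x(n+1) - x(n)
```

A sequence such as $n^2$ fails the second-order recurrence of $n$, which is consistent with it needing a recurrence of higher order.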

From the closure properties of C-finite sequences [15], the product and the sum of C-finite sequences are also C-finite. Moreover, we also have the following properties:

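Theorem ([15]). Let $u(n)$ and $v(n)$ be C-finite sequences of order $r$ and $s$, respectively. Then:

  1. $u(n) + v(n)$ is C-finite of order at most $r + s$, and

  2. $u(n) \cdot v(n)$ is C-finite of order at most $r s$.∎

Theorem ([15]). Let $\omega_1,\dots,\omega_t \in \mathbb{K}$ be pairwise distinct and $p_1,\dots,p_t \in \mathbb{K}[x]$. The sequence $p_1(n)\,\omega_1^n + \dots + p_t(n)\,\omega_t^n$ is the zero sequence if and only if the sequences $p_1(n),\dots,p_t(n)$ are zero.∎

Theorem ([15]). Let $p = c_0 + c_1 x + \dots + c_k x^k \in \mathbb{K}[x]$. Then $p(n) = 0$ for all $n \in \mathbb{N}$ if and only if $c_0 = \dots = c_k = 0$.∎

Theorem ([15]). Let $u(n)$ be a sequence satisfying a C-finite recurrence of order $r$. Then, $u(n) = 0$ for all $n \in \mathbb{N}$ if and only if $u(n) = 0$ for $n \in \{0,\dots,r-1\}$.∎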
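The closure and zero-test properties can be illustrated on small cases. The sketch below is an assumption-laden example, not taken from the paper: it unrolls recurrences in the solved form $x(n+r) = \sum_i c_i\, x(n+i)$ and checks that $n + 2^n$ (sum of an order-2 and an order-1 sequence) satisfies an order-3 recurrence with characteristic polynomial $(z-1)^2(z-2)$, and that $n \cdot 2^n$ satisfies an order-2 recurrence with characteristic polynomial $(z-2)^2$.

```python
def unroll(coeffs, initial, length):
    """Unroll x(n+r) = sum(coeffs[i] * x(n+i)) from r initial values."""
    values = list(initial)
    r = len(coeffs)
    while len(values) < length:
        values.append(sum(c * values[-r + i] for i, c in enumerate(coeffs)))
    return values[:length]


# n + 2^n: x(n+3) = 4 x(n+2) - 5 x(n+1) + 2 x(n), order 3 <= 2 + 1
sum_seq = unroll([2, -5, 4], [1, 3, 6], 15)

# n * 2^n: x(n+2) = 4 x(n+1) - 4 x(n), order 2 <= 2 * 1
prod_seq = unroll([-4, 4], [0, 2], 15)

# Zero test: an order-2 recurrence with two zero initial values stays zero.
zero_seq = unroll([-4, 4], [0, 0], 15)
```

The last line mirrors the zero-test theorem: for a sequence known to satisfy a recurrence of order $r$, inspecting the first $r$ values is enough.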

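We define a system of C-finite recurrences of order $r$ and size $s$ to be of the form

$$X_{n+r} + C_{r-1} X_{n+r-1} + \dots + C_1 X_{n+1} + C_0 X_n = 0$$

where $X_n = (x_1(n), \dots, x_s(n))^{\mathsf{T}}$ and $C_i \in \mathbb{K}^{s \times s}$. Every C-finite recurrence system can be transformed into a first-order system of recurrences by increasing the size such that we get

$$X_{n+1} = B X_n \quad \text{where } B \text{ is invertible.} \qquad (3)$$

The closed form solution of a C-finite recurrence system (3) is determined by the roots $\omega_1,\dots,\omega_t$ of the characteristic polynomial of $B$, or equivalently by the eigenvalues $\omega_1,\dots,\omega_t$ of $B$. We recall that the characteristic polynomial $\chi_B$ of the matrix $B$ is defined as $\chi_B(\omega) = \det(\omega I - B)$, where $\det$ denotes the (matrix) determinant and $I$ the identity matrix. Let $m_1,\dots,m_t$ respectively denote the multiplicities of the roots $\omega_1,\dots,\omega_t$ of $\chi_B$. The closed form of (3) is then given by

$$X_n = \sum_{i=1}^{t} \sum_{j=1}^{m_i} C_{ij}\, \omega_i^n\, n^{j-1} \quad \text{with } C_{ij} \in \mathbb{K}^{s \times 1}. \qquad (4)$$

However, not every choice of the $C_{ij}$ gives rise to a solution. For obtaining a solution, we substitute the general form (4) into the original system (3) and compare coefficients. The following example illustrates the procedure for computing closed form solutions.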

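The most well-known C-finite sequence is the Fibonacci sequence satisfying a recurrence of order $2$ which corresponds to the following first-order recurrence system:

$$\begin{pmatrix} f(n+1) \\ g(n+1) \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} f(n) \\ g(n) \end{pmatrix} \qquad (5)$$

The eigenvalues of $B$ are given by $\omega_{1,2} = \frac{1}{2}(1 \pm \sqrt{5})$ with multiplicities $m_1 = m_2 = 1$. Therefore, the general solution for the recurrence system is of the form

$$\begin{pmatrix} f(n) \\ g(n) \end{pmatrix} = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} \omega_1^n + \begin{pmatrix} d_1 \\ d_2 \end{pmatrix} \omega_2^n \qquad (6)$$

By substituting (6) into (5), we get the following constraints over the coefficients:

$$\begin{pmatrix} c_1 \\ c_2 \end{pmatrix} \omega_1^{n+1} + \begin{pmatrix} d_1 \\ d_2 \end{pmatrix} \omega_2^{n+1} = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} \left( \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} \omega_1^n + \begin{pmatrix} d_1 \\ d_2 \end{pmatrix} \omega_2^n \right)$$

Bringing everything to one side yields:

$$\begin{pmatrix} \omega_1 c_1 - c_1 - c_2 \\ \omega_1 c_2 - c_1 \end{pmatrix} \omega_1^n + \begin{pmatrix} \omega_2 d_1 - d_1 - d_2 \\ \omega_2 d_2 - d_1 \end{pmatrix} \omega_2^n = 0$$

For the above equation to hold, the coefficients of the $\omega_i^n$ have to be $0$. That is, the following linear system determines $(c_1, c_2)$ and $(d_1, d_2)$:

$$\omega_1 c_1 - c_1 - c_2 = 0, \quad \omega_1 c_2 - c_1 = 0, \quad \omega_2 d_1 - d_1 - d_2 = 0, \quad \omega_2 d_2 - d_1 = 0$$

The solution space is generated by $(\omega_1, 1)^{\mathsf{T}}$ and $(\omega_2, 1)^{\mathsf{T}}$. The solution space of the C-finite recurrence system hence consists of linear combinations of

$$\begin{pmatrix} \omega_1 \\ 1 \end{pmatrix} \omega_1^n \quad \text{and} \quad \begin{pmatrix} \omega_2 \\ 1 \end{pmatrix} \omega_2^n.$$

That is, by solving the linear system

$$\begin{pmatrix} f(0) \\ g(0) \end{pmatrix} = E \begin{pmatrix} \omega_1 \\ 1 \end{pmatrix} + F \begin{pmatrix} \omega_2 \\ 1 \end{pmatrix}, \qquad \begin{pmatrix} f(1) \\ g(1) \end{pmatrix} = B \begin{pmatrix} f(0) \\ g(0) \end{pmatrix} = E \begin{pmatrix} \omega_1 \\ 1 \end{pmatrix} \omega_1 + F \begin{pmatrix} \omega_2 \\ 1 \end{pmatrix} \omega_2$$

for $E, F$ with $f(0) = 1$ and $g(0) = 0$, we get closed forms for (5):

$$f(n) = \frac{\omega_1^{n+1} - \omega_2^{n+1}}{\sqrt{5}}, \qquad g(n) = \frac{\omega_1^{n} - \omega_2^{n}}{\sqrt{5}}$$

Then $f(n)$ represents the Fibonacci sequence starting at $1$ and $g(n)$ starts at $0$. Solving for $E$ and $F$ with symbolic $f(0)$ and $g(0)$ yields a parameterized closed form, where the entries of the coefficient vectors $E\,(\omega_1, 1)^{\mathsf{T}}$ and $F\,(\omega_2, 1)^{\mathsf{T}}$ are linear functions in the symbolic initial values.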
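The closed forms from this example can be compared numerically against direct iteration. The sketch below assumes that system (5) reads $f(n+1) = f(n) + g(n)$, $g(n+1) = f(n)$ with $f(0) = 1$ and $g(0) = 0$, and that the closed forms simplify to $f(n) = (\omega_1^{n+1} - \omega_2^{n+1})/\sqrt{5}$ and $g(n) = (\omega_1^{n} - \omega_2^{n})/\sqrt{5}$. It uses floating point with rounding, so it is a plausibility check rather than an exact computation.

```python
import math

SQRT5 = math.sqrt(5.0)
OMEGA1 = (1 + SQRT5) / 2  # eigenvalue (1 + sqrt(5)) / 2
OMEGA2 = (1 - SQRT5) / 2  # eigenvalue (1 - sqrt(5)) / 2


def closed_form(n):
    """Evaluate the assumed closed forms of f(n) and g(n), rounded to integers."""
    f = (OMEGA1 ** (n + 1) - OMEGA2 ** (n + 1)) / SQRT5
    g = (OMEGA1 ** n - OMEGA2 ** n) / SQRT5
    return round(f), round(g)


def iterate(n):
    """Apply (f, g) <- (f + g, f) exactly n times, starting from (1, 0)."""
    f, g = 1, 0
    for _ in range(n):
        f, g = f + g, f
    return f, g
```

For moderate `n` the two functions agree; for very large `n` the floating-point evaluation would eventually lose precision, which is one reason symbolic closed forms are used in the derivation.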

3 Our Programming Model

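Given a polynomial relation $p(x_1,\dots,x_s) = 0$, our loop synthesis procedure generates a first-order C-finite recurrence system of the form (3) with $X_n = (x_1(n), \dots, x_s(n))^{\mathsf{T}}$, such that $p(x_1(n),\dots,x_s(n)) = 0$ holds for all $n \in \mathbb{N}$. It is not hard to argue that every first-order C-finite recurrence system corresponds to a loop with simultaneous variable assignments of the following form:

    (x_1, …, x_s) ← (a_1, …, a_s)
    while true do
        (x_1, …, x_s) ← (p_1(x_1, …, x_s), …, p_s(x_1, …, x_s))
    end                                                        (7)

The program variables $x_1,\dots,x_s$ are numeric, $a_1,\dots,a_s$ are (symbolic) constants in $\mathbb{K}$ and $p_1,\dots,p_s \in \mathbb{K}[x_1,\dots,x_s]$. For every loop variable $x_i$, we denote by $x_i(n)$ the value of $x_i$ at the $n$th loop iteration. That is, we view loop variables $x_i$ as sequences $x_i(n)$.

We call a loop (7) parameterized if at least one of $a_1,\dots,a_s$ is symbolic, and non-parameterized otherwise.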

While the output of our synthesis procedure is basically an affine program, we note that C-finite recurrence systems capture a larger class of programs. E.g. the program:

can be modeled by a C-finite recurrence system of order , which can be turned into an equivalent first-order system of size . That is, in order to synthesize a program which induces the sequences and we have to consider a recurrence system of size .∎

The recurrence system (5) in Example 2.2 corresponds to the following loop:

    (f, g) ← (1, 0)
    while true do
        (f, g) ← (f + g, f)
    end

Algebraic relations and loop invariants.

Let $p$ be a polynomial in $\mathbb{K}[z_1,\dots,z_s]$ and let $x_1(n),\dots,x_s(n)$ be number sequences. We call $p$ an algebraic relation for the given sequences if $p(x_1(n),\dots,x_s(n)) = 0$ for all $n \in \mathbb{N}$. Moreover, $p$ is an algebraic relation for a system of recurrences if it is an algebraic relation for the corresponding sequences. It is immediate that for every algebraic relation $p$ of a recurrence system, $p = 0$ is a loop invariant for the corresponding loop (7); that is, $p = 0$ holds before and after every loop iteration.
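To make the link between recurrence systems, loops, and algebraic relations concrete, the following sketch runs a loop whose body is a linear simultaneous update $X \leftarrow B X$ and tests a candidate relation on the first few iterations. The function names and the sample loop $(x, y) \leftarrow (2x, 4y)$ with relation $y - x^2 = 0$ are hypothetical illustrations, and a bounded test of this kind is not a proof of invariance.

```python
from fractions import Fraction


def run_loop(matrix, initial, iterations):
    """Yield the variable vectors X_0, X_1, ..., X_iterations of X <- B X."""
    state = [Fraction(v) for v in initial]
    for _ in range(iterations + 1):
        yield tuple(state)
        # Simultaneous assignment: every new value uses the old state.
        state = [sum(Fraction(b) * x for b, x in zip(row, state))
                 for row in matrix]


def holds_up_to(relation, matrix, initial, iterations):
    """Bounded check that relation(X_n) == 0 for n = 0, ..., iterations."""
    return all(relation(*x) == 0
               for x in run_loop(matrix, initial, iterations))


# Hypothetical loop: (x, y) <- (2x, 4y) starting from (1, 1).
B = [[2, 0], [0, 4]]
print(holds_up_to(lambda x, y: y - x * x, B, (1, 1), 30))
```

Exact rational arithmetic via `Fraction` avoids rounding issues that could make an equality test unreliable.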

4 Algebra-based Loop Synthesis

We now present our approach for synthesizing loops satisfying a given polynomial property (invariant). We transform the loop synthesis problem into a PCP as described in Section 4.1. In Section 4.2, we introduce the clause sets of our PCP which precisely describe the solutions for the synthesis of loops, in particular of non-parameterized loops. We extend this approach in Section 4.3 to parameterized loops.

4.1 Setting and Overview of Our Method

Given a constraint $p = 0$ with $p \in \mathbb{K}[x_1,\dots,x_s,y_1,\dots,y_s]$, we aim to synthesize a system of C-finite recurrences such that $p$ is an algebraic relation thereof. Intuitively, the values of loop variables $x_1,\dots,x_s$ are described by the number sequences $x_1(n),\dots,x_s(n)$ for arbitrary $n$, and $y_1,\dots,y_s$ correspond to the initial values $x_1(0),\dots,x_s(0)$. That is, we have a polynomial relation $p$ among loop variables $x_i$ and their initial values $y_i$, for which we synthesize a loop (7) such that $p = 0$ is a loop invariant of loop (7).

Our approach is not limited to invariants describing the relationship between program variables among a single loop iteration. Instead, it naturally extends to relations among different loop iterations. For instance, by considering the relation in equation (1), we synthesize a loop computing the Fibonacci sequence.

The key step in our work comes with precisely capturing the solution space for our loop synthesis problem as a PCP. Our PCP is divided into the clause sets $\mathcal{C}_{\mathrm{roots}}$, $\mathcal{C}_{\mathrm{init}}$, $\mathcal{C}_{\mathrm{coeff}}$ and $\mathcal{C}_{\mathrm{alg}}$, as illustrated in Figure 2 and explained next. Our PCP implicitly describes a first-order C-finite recurrence system and its corresponding closed form system. The one-to-one correspondence between these two systems is captured by the clause sets $\mathcal{C}_{\mathrm{roots}}$, $\mathcal{C}_{\mathrm{init}}$ and $\mathcal{C}_{\mathrm{coeff}}$. Intuitively, these constraints mimic the procedure for computing the closed form of a recurrence system (see [15]). The clause set $\mathcal{C}_{\mathrm{alg}}$ links the closed form system and the polynomial constraint $p = 0$, and ensures that $p$ is an algebraic relation of the system. Furthermore, the recurrence system is represented by the matrix $B$ and the vector $A$ of initial values, where both consist of symbolic entries. Then a solution of our PCP, which assigns values to those symbolic entries, yields a desired synthesized loop.

In what follows we only consider a unit constraint $p = 0$ as input to our loop synthesis procedure. However, our approach naturally extends to conjunctions of polynomial equality constraints.

Figure 2: Overview of the PCP describing loop synthesis. The diagram relates four components: the closed form system, the polynomial invariant, the recurrence system, and the loop.

4.2 Synthesizing Non-Parameterized Loops

We now present our work for synthesizing loops, in particular non-parameterized loops (7). That is, we aim at computing concrete initial values for all program variables. Our implicit representation of the recurrence system is thus of the form

$$X_{n+1} = B X_n, \qquad X_0 = A \qquad (8)$$

where $B \in \mathbb{K}^{s \times s}$ is invertible and $A \in \mathbb{K}^{s \times 1}$, both containing symbolic entries.

As described in Section 2.2, the closed form of (8) is determined by the eigenvalues of $B$ which we thus need to synthesize. Note that $B$ may contain both symbolic and concrete values. Let us denote the symbolic entries of $B$ by $\vec{b}$. Since $\mathbb{K}$ is algebraically closed we know that $B$ has $s$ (not necessarily distinct) eigenvalues. We therefore fix a set of distinct symbolic eigenvalues $\omega_1,\dots,\omega_t$ together with their multiplicities $m_1,\dots,m_t$ with $m_i > 0$ for $i = 1,\dots,t$ such that $\sum_{i=1}^{t} m_i = s$. We call $m_1,\dots,m_t$ an integer partition of $s$. We next define the clause sets of our PCP.

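Root constraints $\mathcal{C}_{\mathrm{roots}}$.

The clause set $\mathcal{C}_{\mathrm{roots}}$ imposes that $B$ is invertible and ensures that $\omega_1,\dots,\omega_t$ are distinct symbolic eigenvalues with multiplicities $m_1,\dots,m_t$. Note that $B$ is invertible if and only if all eigenvalues $\omega_i$ are non-zero. Furthermore, since $\mathbb{K}$ is algebraically closed, every polynomial $f(z)$ can be written as the product of linear factors of the form $z - \omega$, with $\omega \in \mathbb{K}$, such that $f(\omega) = 0$. Therefore, the equation

$$\chi_B(z) = (z - \omega_1)^{m_1} \cdots (z - \omega_t)^{m_t}$$

holds for all $z \in \mathbb{K}$, where $\chi_B(z) \in \mathbb{K}[\vec{\omega}, \vec{b}, z]$. Bringing everything to one side, we get

$$q_0 + q_1 z + \dots + q_d z^d = 0,$$

implying that the $q_i \in \mathbb{K}[\vec{\omega}, \vec{b}]$ have to be zero. The clause set characterizing the eigenvalues $\omega_i$ of $B$ is then

$$\mathcal{C}_{\mathrm{roots}} = \{q_0 = 0, \dots, q_d = 0\} \;\cup \bigcup_{1 \le i < j \le t} \{\omega_i \neq \omega_j\} \;\cup \bigcup_{1 \le i \le t} \{\omega_i \neq 0\}.$$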
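The root constraints can be checked for concrete values: the characteristic polynomial of $B$ must equal the product of the factors $(z - \omega_i)^{m_i}$ coefficient by coefficient, the chosen roots must be pairwise distinct, and none may be zero. The sketch below does this for $2 \times 2$ matrices only; the function names are hypothetical and the restriction to size 2 is an assumption made to keep the example short. In the actual PCP these conditions are imposed on symbolic entries rather than checked on concrete ones.

```python
from fractions import Fraction


def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists (lowest degree first)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def char_poly_2x2(b):
    """Coefficients of det(z*I - B) = z^2 - tr(B) z + det(B) for 2x2 B."""
    trace = b[0][0] + b[1][1]
    det = b[0][0] * b[1][1] - b[0][1] * b[1][0]
    return [Fraction(det), Fraction(-trace), Fraction(1)]


def root_constraints_hold(b, roots, multiplicities):
    """Check coefficient equalities plus distinctness and non-zero clauses."""
    product = [Fraction(1)]
    for omega, m in zip(roots, multiplicities):
        for _ in range(m):
            product = poly_mul(product, [Fraction(-omega), Fraction(1)])
    equalities = product == char_poly_2x2(b)
    distinct = len(set(roots)) == len(roots)
    nonzero = all(omega != 0 for omega in roots)
    return equalities and distinct and nonzero
```

For instance, `root_constraints_hold([[2, 1], [0, 2]], [2], [2])` tests the case of a single eigenvalue with multiplicity 2.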

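Coefficient constraints $\mathcal{C}_{\mathrm{coeff}}$.

The fixed symbolic roots/eigenvalues $\omega_1,\dots,\omega_t$ with multiplicities $m_1,\dots,m_t$ induce the general closed form solution

$$X_n = \sum_{i=1}^{t} \sum_{j=1}^{m_i} C_{ij}\, \omega_i^n\, n^{j-1} \qquad (9)$$

where the $C_{ij} \in \mathbb{K}^{s \times 1}$ are column vectors containing symbolic entries. As stated in Section 2.2, not every choice of the $C_{ij}$ gives rise to a valid solution. Instead, the $C_{ij}$ have to obey certain conditions which are determined by substituting into the original recurrence system of (8):

$$X_{n+1} = \sum_{i=1}^{t} \sum_{j=1}^{m_i} C_{ij}\, \omega_i^{n+1} (n+1)^{j-1} = B \left( \sum_{i=1}^{t} \sum_{j=1}^{m_i} C_{ij}\, \omega_i^n\, n^{j-1} \right) = B X_n$$

Bringing everything to one side yields $X_{n+1} - B X_n = 0$ and thus

$$\sum_{i=1}^{t} \sum_{j=1}^{m_i} \underbrace{\left( \sum_{k=j}^{m_i} \binom{k-1}{j-1} C_{ik}\, \omega_i - B\, C_{ij} \right)}_{D_{ij}} \omega_i^n\, n^{j-1} = 0. \qquad (10)$$

Equation (10) holds for all $n \in \mathbb{N}$. By Theorem 2.2 we then have $D_{ij} = 0$ for all $i, j$ and define

$$\mathcal{C}_{\mathrm{coeff}} = \bigcup_{i=1}^{t} \bigcup_{j=1}^{m_i} \mathrm{cstr}(D_{ij}).$$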
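The coefficient constraints can be evaluated on a concrete instance. Expanding $(n+1)^{j-1}$ with the binomial theorem and collecting the coefficient of $\omega_i^n n^{j-1}$ gives, for each eigenvalue $\omega$ with coefficient vectors $C_1,\dots,C_m$, the residual $D_j = \sum_{k=j}^{m} \binom{k-1}{j-1}\, \omega\, C_k - B\, C_j$. The sketch below computes these residuals for a single eigenvalue; the names and the sample matrix are illustrative assumptions.

```python
from fractions import Fraction
from math import comb


def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    return [sum(b * v for b, v in zip(row, vector)) for row in matrix]


def coefficient_residuals(matrix, omega, vectors):
    """Return D_1, ..., D_m for one eigenvalue omega and vectors C_1, ..., C_m.

    D_j = sum_{k=j}^{m} binom(k-1, j-1) * omega * C_k  -  B * C_j
    """
    m = len(vectors)
    size = len(matrix)
    residuals = []
    for j in range(1, m + 1):
        shifted = [
            sum(comb(k - 1, j - 1) * omega * vectors[k - 1][row]
                for k in range(j, m + 1))
            for row in range(size)
        ]
        image = mat_vec(matrix, vectors[j - 1])
        residuals.append([s - t for s, t in zip(shifted, image)])
    return residuals


# Sample instance: B has the single eigenvalue 2 with multiplicity 2, and
# X_n = C_1 * 2^n + C_2 * 2^n * n is the closed form for X_0 = (0, 1).
B = [[2, 1], [0, 2]]
C1 = [Fraction(0), Fraction(1)]
C2 = [Fraction(1, 2), Fraction(0)]
residuals = coefficient_residuals(B, 2, [C1, C2])
```

All residuals being zero is what the clause set for the coefficients demands; a wrong choice of coefficient vectors leaves at least one non-zero entry.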

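Initial values constraints $\mathcal{C}_{\mathrm{init}}$.

The constraints $\mathcal{C}_{\mathrm{init}}$ describe properties of initial values $x_1(0),\dots,x_s(0)$. We enforce that (9) equals $B^n X_0$, for $n = 0,\dots,d-1$, where $d$ is the degree of the characteristic polynomial of $B$, by

$$\mathcal{C}_{\mathrm{init}} = \mathrm{cstr}(M_0) \cup \dots \cup \mathrm{cstr}(M_{d-1})$$

where $M_i = X_i - B^i X_0$, with $X_0$ as in (8) and $X_i$ being the right-hand side of (9) where $n$ is replaced by $i$.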

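Algebraic relation constraints $\mathcal{C}_{\mathrm{alg}}$.

The constraints $\mathcal{C}_{\mathrm{alg}}$ are defined to ensure that $p$ is an algebraic relation among the $x_i(n)$. Using (9), the closed forms of the $x_i(n)$ are expressed as

$$x_i(n) = p_{i,1}\, \omega_1^n + \dots + p_{i,t}\, \omega_t^n$$

where the $p_{i,j}$ are polynomials in $\mathbb{K}[n, \vec{c}]$. By substituting the closed forms and the initial values into the polynomial $p$, we get

$$p' = p(x_1(n),\dots,x_s(n), x_1(0),\dots,x_s(0)) = q_0 + n\, q_1 + n^2 q_2 + \dots + n^k q_k \qquad (11)$$

where the $q_i$ are of the form

$$q_i = w_{i,1}^n\, u_{i,1} + \dots + w_{i,\ell}^n\, u_{i,\ell} \qquad (12)$$

with $u_{i,1},\dots,u_{i,\ell} \in \mathbb{K}[\vec{a}, \vec{c}]$ and $w_{i,1},\dots,w_{i,\ell}$ being monomials in $\mathbb{K}[\vec{\omega}]$.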

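Proposition. Let $p'$ be of the form (11). Then $p'(n) = 0$ for all $n \in \mathbb{N}$ if and only if $q_i(n) = 0$ for all $n \in \mathbb{N}$ and $i = 0,\dots,k$. ∎

Proof.

One direction is obvious and for the other assume $p'(n) = 0$ for all $n$. By rearranging we get $p' = p_1 w_1^n + \dots + p_\ell w_\ell^n$ with polynomials $p_i$ in $n$. Let $\tilde{\omega} \in \mathbb{K}^t$ be such that $\tilde{w}_i = w_i(\tilde{\omega})$ with $\tilde{w}_i \in \mathbb{K}$. Note that the $\tilde{w}_i$ are not necessarily distinct. However, consider $v_1,\dots,v_r$ to be the pairwise distinct elements of the $\tilde{w}_i$. Then we can write $p'$ as $\sum_{i=1}^{r} v_i^n (p_{i,0} + n\, p_{i,1} + \dots + n^k p_{i,k})$. By Theorems 2.2 and 2.2 we get that the $p_{i,j}$ have to be $0$. Therefore, also $v_i^n p_{i,j} = 0$ for all $i, j$. Then, for each $j = 0,\dots,k$, we have $v_1^n p_{1,j} + \dots + v_r^n p_{r,j} = 0 = q_j$. ∎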

As $p$ is an algebraic relation, we have that $p'$ should be $0$ for all $n \in \mathbb{N}$. Proposition 4.2 then implies that the $q_i$ have to be $0$ for all $n \in \mathbb{N}$.

Lemma. Let $q$ be of the form (12). Then $q = 0$ for all $n \in \mathbb{N}$ if and only if $q = 0$ for $n \in \{0,\dots,\ell-1\}$. ∎

Proof.

The proof follows from Theorem 2.2 and from the fact that $q$ satisfies a C-finite recurrence of order $\ell$. To be more precise, the $u_{i,j}$ and $w_{i,j}^n$ satisfy a first-order C-finite recurrence: as $u_{i,j}$ is constant it satisfies a recurrence of the form $x(n+1) = x(n)$, and $w_{i,j}^n$ satisfies $x(n+1) = w_{i,j}\, x(n)$. Then, by Theorem 2.2 we get that $w_{i,j}^n u_{i,j}$ is C-finite of order at most $1$, and $q$ is C-finite of order at most $\ell$. ∎

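Even though the $q_i$ contain exponential terms in $n$, it follows from Lemma 4.2 that the solutions for the $q_i$ being $0$ for all $n \in \mathbb{N}$ can be described as a finite set of polynomial equality constraints: Let $Q_i^j$ denote the polynomial constraint $w_{i,1}^j\, u_{i,1} + \dots + w_{i,\ell}^j\, u_{i,\ell} = 0$ for $q_i$ of the form (12), and let $\mathcal{C}_i = \{Q_i^0, \dots, Q_i^{\ell-1}\}$ be the associated clause set. Then the clause set ensuring that $p$ is indeed an algebraic relation is given by $\mathcal{C}_{\mathrm{alg}} = \mathcal{C}_0 \cup \dots \cup \mathcal{C}_k$.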
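The finiteness argument can be tried out on concrete numbers: an exponential sum $q(n) = \sum_{j=1}^{\ell} w_j^n u_j$ satisfies a C-finite recurrence of order at most $\ell$, so testing $n = 0,\dots,\ell-1$ decides whether it vanishes everywhere. The sketch below is a hypothetical illustration with concrete rational $w_j$ and $u_j$; in the PCP the same finitely many equalities are imposed on symbolic terms.

```python
from fractions import Fraction


def q_value(ws, us, n):
    """Evaluate q(n) = sum_j ws[j]**n * us[j]."""
    return sum(Fraction(w) ** n * Fraction(u) for w, u in zip(ws, us))


def vanishes_identically(ws, us):
    """Decide q(n) == 0 for all n by testing only n = 0, ..., len(ws) - 1."""
    return all(q_value(ws, us, n) == 0 for n in range(len(ws)))


def vanishes_brute_force(ws, us, upto=30):
    """Reference check over many indices, used only for comparison."""
    return all(q_value(ws, us, n) == 0 for n in range(upto))
```

Note that the bases need not be distinct: with `ws = [2, 2]` and `us = [1, -1]` the sum cancels for every `n`, and the two-point test detects this.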

Observe that Theorem 2.2 can be applied to (11) directly, as $p'$ satisfies a C-finite recurrence. Then by the closure properties of C-finite recurrences, the upper bound on the order of the recurrence which $p'$ satisfies is given by $r = \sum_{i=0}^{k} 2^i \ell$. That is, by Theorem 2.2, we would need to consider $p'$ with $n = 0,\dots,r-1$, which yields a non-linear system with a degree of at least $r - 1$. Note that $r$ depends on $2^i$, which stems from the fact that the sequence $n$ satisfies a recurrence of order $2$, and $n^i$ therefore satisfies a recurrence of order at most $2^i$. Thankfully, Proposition 4.2 allows us to only consider the coefficients of the $n^i$ and therefore lower the size of our constraints.∎

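Having defined the clause sets $\mathcal{C}_{\mathrm{roots}}$, $\mathcal{C}_{\mathrm{coeff}}$, $\mathcal{C}_{\mathrm{init}}$ and $\mathcal{C}_{\mathrm{alg}}$, we define our PCP as the union of these four clause sets. Note that the matrix $B$, the vector $A$, the polynomial $p$ and the multiplicities of the symbolic roots $\vec{m} = m_1,\dots,m_t$ uniquely define the clauses discussed above. We hence define our PCP to be the clause set $\mathcal{C}_{AB}^{p}(\vec{m})$ as follows:

$$\mathcal{C}_{AB}^{p}(\vec{m}) = \mathcal{C}_{\mathrm{roots}} \cup \mathcal{C}_{\mathrm{init}} \cup \mathcal{C}_{\mathrm{coeff}} \cup \mathcal{C}_{\mathrm{alg}} \qquad (13)$$

Recall that $\vec{a}$ and $\vec{b}$ are the symbolic entries in the matrices $A$ and $B$ in (8), $\vec{c}$ are the symbolic entries in the $C_{ij}$ in (9), and $\vec{\omega}$ are the symbolic eigenvalues of $B$. We then have $\mathcal{C}_{\mathrm{roots}} \sqsubset \mathbb{K}[\vec{\omega}, \vec{b}]$, $\mathcal{C}_{\mathrm{coeff}} \sqsubset \mathbb{K}[\vec{\omega}, \vec{b}, \vec{c}]$, $\mathcal{C}_{\mathrm{init}} \sqsubset \mathbb{K}[\vec{\omega}, \vec{a}, \vec{b}, \vec{c}]$ and $\mathcal{C}_{\mathrm{alg}} \sqsubset \mathbb{K}[\vec{\omega}, \vec{a}, \vec{c}]$. Hence $\mathcal{C}_{AB}^{p}(\vec{m}) \sqsubset \mathbb{K}[\vec{\omega}, \vec{a}, \vec{b}, \vec{c}]$.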

It is not difficult to see that the constraints in $\mathcal{C}_{\mathrm{alg}}$ determine the size of our PCP. As such, the degree and the number of terms in the invariant have a direct impact on the size and the maximum degree of the polynomials in our PCP. What might not be obvious is that the number of distinct symbolic roots also influences the size and the maximum degree of our PCP. The more distinct roots are considered, the higher the number of terms in (12), and therefore the more instances of (12) have to be added to our PCP.

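Let $p \in \mathbb{K}[x_1,\dots,x_s,y_1,\dots,y_s]$, $B \in \mathbb{K}^{s \times s}$ and $A \in \mathbb{K}^{s \times 1}$, and let $m_1,\dots,m_t$ be an integer partition of $s$. We then get the following theorem:

Theorem. The mapping $\sigma : \{\vec{\omega}, \vec{a}, \vec{b}, \vec{c}\} \to \mathbb{K}$ is a solution of $\mathcal{C}_{AB}^{p}(\vec{m})$ if and only if $p$ is an algebraic relation for $X_{n+1} = \sigma(B) X_n$ with $X_0 = \sigma(A)$, and the eigenvalues of $\sigma(B)$ are given by $\sigma(\omega_1),\dots,\sigma(\omega_t)$ with multiplicities $m_1,\dots,m_t$.∎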

From Theorem 4.2, we then get Algorithm 1 for synthesizing the C-finite recurrence representation of a non-parameterized loop (7): the function IntPartitions($s$) returns the set of all integer partitions of an integer $s$; and Solve($\mathcal{C}$) returns whether the clause set $\mathcal{C}$ is satisfiable and a model $\sigma$ if so. We note that the growth of the number of integer partitions is subexponential, and so is the complexity of Algorithm 1. A more precise complexity analysis of Algorithm 1 is interesting future work.

Input : A polynomial $p \in \mathbb{K}[x_1,\dots,x_s,y_1,\dots,y_s]$.
Output : A vector $A \in \mathbb{K}^{s \times 1}$ and a matrix $B \in \mathbb{K}^{s \times s}$ s.t. $p$ is an algebraic relation of $X_{n+1} = B X_n$ and $X_0 = A$, if such $A$ and $B$ exist.
A ← (a_i) ∈ K^{s×1}    // symbolic vector
B ← (b_ij) ∈ K^{s×s}   // symbolic matrix
for m_1, …, m_t ∈ IntPartitions(s) do
       sat, σ ← Solve(C_AB^p(m_1, …, m_t))
       if sat then return σ(A), σ(B)
end for
Algorithm 1 Synthesis of a non-parameterized C-finite recurrence system
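The outer loop of Algorithm 1 enumerates integer partitions, one candidate multiplicity pattern per partition. A minimal sketch of such an enumeration is given below; the generator name `integer_partitions` is hypothetical, and it does not attempt the constraint-solving step, which would require a solver for polynomial constraints.

```python
def integer_partitions(total, largest=None):
    """Yield the partitions of `total` as non-increasing tuples of positive ints."""
    if largest is None:
        largest = total
    if total == 0:
        yield ()
        return
    for first in range(min(total, largest), 0, -1):
        # The remaining parts may not exceed `first`, which avoids duplicates.
        for rest in integer_partitions(total - first, first):
            yield (first,) + rest
```

For a system of size 2 this yields the two cases of a single eigenvalue with multiplicity 2 and two distinct eigenvalues with multiplicity 1 each.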

Finally, based on Theorem 4.2 and on the property that the number of integer partitions of a given integer is finite, we obtain the following result:

Theorem. Algorithm 1 is sound, and complete w.r.t. recurrence systems of size $s$.∎

The completeness in Theorem 4.2 is relative to systems of size $s$, which is a consequence of the fact that we synthesize first-order recurrence systems. That is, there can be a recurrence system of order $> 1$ and size $s$ with an algebraic relation $p$ for which no first-order system of size $s$ has $p$ as an algebraic relation.

The precise characterization of non-parameterized loops by non-parameterized C-finite recurrence systems implies soundness and completeness for non-parameterized loops from Theorem 4.2.

We showcase our procedure in Algorithm 1 by synthesizing a loop for the invariant . That is, the polynomial constraint is given by and we want to find a recurrence system of the following form:

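$$X_{n+1} = \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix} X_n, \qquad X_0 = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} \qquad (14)$$

The characteristic polynomial of $B$ is then given by $\chi_B(\omega) = \omega^2 - (b_{11} + b_{22})\,\omega + b_{11} b_{22} - b_{12} b_{21}$ where its roots define the closed form system. Since we cannot determine the actual roots of $\chi_B$ we have to fix a set of symbolic roots. The characteristic polynomial has two (not necessarily distinct) roots: either $\chi_B$ has two distinct roots $\omega_1, \omega_2$ with multiplicities $m_1 = m_2 = 1$, or a single root $\omega_1$ with multiplicity $m_1 = 2$. Let us consider the latter case. The first clause set we define is $\mathcal{C}_{\mathrm{roots}}$ for ensuring that $B$ is invertible (i.e. $\omega_1$ is nonzero), and that $\omega_1$ is indeed a root of the characteristic polynomial with multiplicity $2$. That is, $\chi_B(\omega) = (\omega - \omega_1)^2$ has to hold for all $\omega \in \mathbb{K}$, and bringing everything to one side yields

$$(2\omega_1 - b_{11} - b_{22})\,\omega + b_{11} b_{22} - b_{12} b_{21} - \omega_1^2 = 0.$$

We then get the following clause set:

$$\mathcal{C}_{\mathrm{roots}} = \{\, 2\omega_1 - b_{11} - b_{22} = 0,\;\; b_{11} b_{22} - b_{12} b_{21} - \omega_1^2 = 0,\;\; \omega_1 \neq 0 \,\}$$

As we fixed the symbolic roots, the general closed form system is of the form

$$X_n = C_{11}\, \omega_1^n + C_{12}\, \omega_1^n\, n \quad \text{with } C_{11}, C_{12} \in \mathbb{K}^{2 \times 1}. \qquad (15)$$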

By substituting into the recurrence system we get: