Templates and Recurrences: Better Together

03/30/2020 ∙ by Jason Breck, et al. ∙ University of Wisconsin-Madison, Princeton University

This paper is the confluence of two streams of ideas in the literature on generating numerical invariants, namely: (1) template-based methods, and (2) recurrence-based methods. A template-based method begins with a template that contains unknown quantities, and finds invariants that match the template by extracting and solving constraints on the unknowns. A disadvantage of template-based methods is that they require fixing the set of terms that may appear in an invariant in advance. This disadvantage is particularly prominent for non-linear invariant generation, because the user must supply maximum degrees on polynomials, bases for exponents, etc. On the other hand, recurrence-based methods are able to find sophisticated non-linear mathematical relations, including polynomials, exponentials, and logarithms, because such relations arise as the solutions to recurrences. However, a disadvantage of past recurrence-based invariant-generation methods is that they are primarily loop-based analyses: they use recurrences to relate the pre-state and post-state of a loop, so it is not obvious how to apply them to a recursive procedure, especially if the procedure is non-linearly recursive (e.g., a tree-traversal algorithm). In this paper, we combine these two approaches and obtain a technique that uses templates in which the unknowns are functions rather than numbers, and the constraints on the unknowns are recurrences. The technique synthesizes invariants involving polynomials, exponentials, and logarithms, even in the presence of arbitrary control-flow, including any combination of loops, branches, and (possibly non-linear) recursion. For instance, it is able to show that (i) the time taken by merge-sort is O(n log(n)), and (ii) the time taken by Strassen's algorithm is O(n^log_2(7)).


1. Introduction

A large body of work within the numerical-invariant-generation literature focuses on template-based methods (Colón et al., 2003; Sankaranarayanan et al., 2004a). Such methods fix the form of the invariants that can be discovered, by specifying a template that contains unknown quantities. Given a program and some property to be proved, a template-based analyzer proceeds by finding constraints on the values of the unknowns and then solving these constraints to obtain invariants of the program that suffice to prove the property. Template-based methods have been particularly successful for finding invariants within the domain of linear arithmetic.

Many programs have important numerical invariants that involve non-linear mathematical relationships, such as polynomials, exponentials, and logarithms. A disadvantage of template-based methods for non-linear invariant generation is that (in contrast to the linear case) there is no “most general” template term, so the user must supply the set of terms that may appear in the invariant.

In this paper, we present an invariant-synthesis technique that is related to template-based methods, but sidesteps the above difficulty. Our technique is based on a concept that we call a hypothetical summary, which is a template for a procedure summary in which the unknowns are functions, rather than numbers. The constraints that we extract for these functions are recurrences. Solving these recurrence constraints allows us to synthesize terms over program variables that we can substitute in place of the unknown functions in our template and thereby obtain procedure summaries.

Whereas most template-based methods directly constrain the mathematical form of their invariants, our technique constrains the invariants indirectly, by way of recurrences, and thereby allows the invariants to have a wide variety of mathematical forms involving polynomials, exponentials, and logarithms. This aspect is intuitively illustrated by the recurrences and : although these two recurrences are outwardly similar, their solutions are more different than one would expect at first glance, in that is , whereas is . Because the unknowns in our templates are functions, we can generate a wide variety of invariants (involving polynomials, exponentials, logarithms) without specifying their exact syntactic form.

However, recurrence-based invariant-generation techniques typically have disadvantages when applied to recursive programs. Recurrences are well-suited to characterize the sequence of states that occur as a loop executes. This idea can be extended to handle linear recursion—where a recursive procedure makes only a single recursive call: each procedure-entry state that occurs “on the way down” to the base case of the recursion is paired with the corresponding procedure-exit state that occurs “on the way back up” from the base case, and then recurrences are used to describe the sequence of such state pairs. However, non-linear recursion has a different structure: it is tree-shaped, rather than linear, and thus some kind of additional abstraction is required before non-linear recursion can be described using recurrences.

We use the technique of hypothetical summaries to extend the work of (Farzan and Kincaid, 2015), (Kincaid et al., 2018), and (Kincaid et al., 2017): hypothetical summaries enable a different approach to the analysis of non-linearly recursive programs, such as divide-and-conquer or tree-traversal algorithms.¹ We show how to analyze the base case of a procedure to extract a template for a procedure summary (i.e., a hypothetical summary). By assuming that every call to the procedure, throughout the tree of recursive calls, is consistent with the template, we discover relationships (i.e., recurrence constraints) among the states of the program at different heights in the tree. We then solve the constraints and fill in the template to obtain a procedure summary. Hypothetical summaries thus provide the additional layer of abstraction that is required to apply recurrence-based invariant generation to non-linearly recursive procedures.

¹ Warning: We use the term “non-linear” in two different senses: non-linear recursion and non-linear arithmetic. Even for a loop that uses linear arithmetic, non-linear arithmetic may be required to state a loop invariant. Moreover, arithmetic expressions in the programs that we analyze are not limited to linear arithmetic: variables can be multiplied. The two uses of the term “non-linear” are essentially unrelated, and which term is intended should be clear from context. The paper primarily concerns new techniques for handling non-linear recursion, and non-linear arithmetic is handled by known methods, e.g., (Kincaid et al., 2018).

Our invariant-generation procedure is both (1) general-purpose, so it is applicable to a wide variety of tasks, and (2) compositional, so the space and time required to analyze a program fragment depend on the size of the fragment rather than the whole program. In contrast, conventional template-based methods are goal-directed (they must be tailored to a specific problem of interest, e.g., a template-based invariant generator for verification problems cannot solve quantitative problems such as resource-bound analysis) and whole-program. The general-purpose nature of our procedure also distinguishes it from recurrence-based resource-bound analyses, which, for example, cannot be applied to assertion checking.

To evaluate the applicability of our analysis to challenging numerical-invariant-synthesis tasks, we applied it to the task of generating bounds on the computational complexity of non-linearly recursive programs and the task of generating invariants that suffice to prove assertions. Our experiments show that the analysis technique is able to prove properties that (Kincaid et al., 2017) was not capable of proving, and is competitive with the output of state-of-the-art assertion-checking and resource-bound-analysis tools.

Contributions. Our work makes contributions in three main areas:

  1. We introduce an analysis method based on “hypothetical summaries.” It hypothesizes that a summary exists of a particular form, using uninterpreted function symbols to stand for unknown expressions. Analysis is performed to obtain constraints on the function symbols, which are then solved to obtain a summary.

  2. We develop a procedure-summarization technique called height-based recurrence analysis, which uses the notion of hypothetical summaries to produce bounds on the values of program variables based on the height of recursion (Section 4.1). We further develop algorithms that, when used in conjunction with height-based recurrence analysis (DepthBound, DualHeight), yield more precise summaries. Furthermore, we give an algorithm (Mutual) that generalizes height-based recurrence analysis to the setting of mutual recursion.

  3. The technique is implemented in the CHORA tool. Our experiments show that CHORA is able to handle many non-linearly recursive programs, and generate invariants that include exponentials, polynomials, and logarithms (Experiments). For instance, it is able to show that (i) the time taken by merge-sort is $O(n \log n)$, (ii) the time taken by Strassen’s algorithm is $O(n^{\log_2 7})$, and (iii) an iterative function and a non-linearly recursive function that both perform exponentiation are functionally equivalent.

Section 2 presents an example to provide intuition. Section 3 provides background on the material needed to understand the paper’s results. Related work is discussed in the Related section.

2. Overview

Figure 1. Example program subsetSum. The diagram at the bottom shows a timeline of a height-$(h+1)$ execution of subsetSumAux. $b_1(h+1)$ is related to the increase of nTicks between the pre-state (label 1) and the post-state (label 6). $b_1(h)$ is related to the increase of nTicks between (2) and (3) and also between (4) and (5), i.e., between the pre-states and post-states of height-$h$ executions.

The goal of this paper is to find numerical summaries for all the procedures in a given program. For simplicity, this section discusses the analysis of a program that contains a single procedure $P$, which is non-linearly recursive and calls no other procedures.

We use the following example to illustrate how our techniques use recurrence solving to summarize non-linearly-recursive procedures.

Example 2.1.

The function subsetSum (Figure 1) takes an array A of n integers and performs a brute-force search to determine whether any non-empty subset of A’s elements sums to zero. If it finds such a subset, it returns the number of elements in the subset; otherwise, it returns zero. The recursive function subsetSumAux works by sweeping through the array from left to right, making two recursive calls for each array element. The first call considers subsets that include the element A[i], and the second call considers subsets that exclude A[i]. The sum of the values in each subset is computed in the accumulating parameter sum. When the base case is reached, subsetSumAux checks whether sum is zero, and if so, sets found to true. At each of the two recursive call sites, the value returned by the recursive call is stored in the variable size. After found is set to true, subsetSumAux computes the size of the subset by returning size + 1 if the subset was found after the first recursive call, or returning size unchanged if the subset was found after the second recursive call.

In this paper, a state of a program is an assignment of integers to program variables. For each procedure $P$, we wish to characterize the relational semantics $\llbracket P \rrbracket$, defined as the set of state pairs $(s, s')$ such that $P$ can start executing in state $s$ and finish in state $s'$. To find an over-approximate representation of the relational semantics of a recursive procedure such as subsetSumAux, we take an approach that we call height-based recurrence analysis. In height-based recurrence analysis, we construct and solve recurrence relations to discover properties of the transition relation of a recursive procedure. To formalize our use of recurrence relations, we give the following definitions.

We define the height-bounded relational semantics $\llbracket P \rrbracket_h$ to be the subset of $\llbracket P \rrbracket$ that $P$ can achieve if it is limited to using an execution stack with a height of at most $h$ activation records. We define a height-$h$ execution of $P$ to be any execution of $P$ that uses a stack height of at most $h$, or, in other words, an execution of $P$ having recursion depth no more than $h$. Base cases are defined to be of height 1. Let $\{\tau_1, \ldots, \tau_k\}$ be a set of polynomials over unprimed and primed program variables, representing the pre-state and post-state of $P$, respectively. With each $\tau_i$ we associate a function $V_{\tau_i}$, such that $V_{\tau_i}(h)$ is defined to be the set of values $v$ such that, for some $(s, s') \in \llbracket P \rrbracket_h$, $\tau_i$ evaluates to $v$ by using $s$ and $s'$ to interpret the unprimed and primed variables, respectively.

Using subsetSumAux as an example, let $\tau = \mathit{return}'$. Then $V_\tau(1)$ denotes the set of values $\tau$ can take on in any base case of subsetSumAux. In this program, $\mathit{return}'$ is 0 in any base case, and so $V_\tau(1) = \{0\}$. Now consider an execution of height 2. In the case that found is true, $\mathit{return}'$ increases by 1 compared to the value that it has in the base case. If found is not true, then $\mathit{return}'$ remains the same. In other words, at height-2 executions, $\tau$ takes on the values 0 and 1; i.e., $V_\tau(2) = \{0, 1\}$. Similarly, $V_\tau(3) = \{0, 1, 2\}$, and so on. We approximate the value set $V_\tau(h)$ by finding a function $b(h)$ that bounds $V_\tau(h)$ for all $h$; that is, for any $v \in V_\tau(h)$, we have $v \le b(h)$. In the case of $\tau = \mathit{return}'$, a suitable bounding function is $b(h) = h - 1$. The initial step of our analysis chooses terms $\tau_1, \ldots, \tau_k$, and then, for each term $\tau_i$, tries to synthesize a function $b_i$ that bounds the set of values $\tau_i$ can take on.
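To make the value sets concrete, the Python sketch below runs a hypothetical reconstruction of subsetSumAux on small arrays and records the observed values of $\mathit{return}'$ and of the tick increase $\mathit{nTicks}' - \mathit{nTicks}$ over height-$h$ calls. Figure 1's code is not reproduced in this text, so the function is an assumption built from the prose: it charges one tick per recursive-case activation and none in the base case, sets a global found flag when the accumulated sum is zero, and (unlike the described program) does not exclude the empty subset. Under these assumptions the observed maximum of $\mathit{return}'$ should equal $h - 1$, and the tick increase should be at most $2^{h-1} - 1$, the number of internal nodes of a full binary call tree of height $h$.

```python
from itertools import product


class Globals:
    """Global state of the hypothetical program: a tick counter and a flag."""

    def __init__(self, found=False):
        self.n_ticks = 0
        self.found = found


def subset_sum_aux(A, i, n, s, g):
    """Hypothetical reconstruction of subsetSumAux from the prose description.

    Caveat: the empty subset also sets `found` (the real program excludes it).
    """
    if i >= n:                    # base case: height 1, no tick charged
        if s == 0:
            g.found = True
        return 0
    g.n_ticks += 1                # the single tick of a recursive activation
    size = subset_sum_aux(A, i + 1, n, s + A[i], g)   # subsets that include A[i]
    if g.found:
        return size + 1
    size = subset_sum_aux(A, i + 1, n, s, g)          # subsets that exclude A[i]
    return size


def observed_values(h, entries=(-1, 0, 1)):
    """Observed return' and nTicks' - nTicks over many height-h calls.

    A call at i = 0 on an array of length h - 1 uses at most h activation records.
    """
    n = h - 1
    returns, ticks = set(), set()
    for A in product(entries, repeat=n):
        for s0 in entries:                    # initial accumulated sum
            for found0 in (False, True):      # initial value of the global flag
                g = Globals(found0)
                returns.add(subset_sum_aux(list(A), 0, n, s0, g))
                ticks.add(g.n_ticks)          # n_ticks starts at 0
    return returns, ticks


for h in range(1, 6):
    returns, ticks = observed_values(h)
    print(h, max(returns), max(ticks))
```

Each printed row lists $h$, the largest observed $\mathit{return}'$, and the largest observed tick increase; the maxima are reached when found starts out true (for the return value) and when no subset sums to zero (for the ticks).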

Note that for a given term $\tau$, a corresponding bounding function may not exist. A necessary condition for a bounding function to exist for a term $\tau$ is that the set $V_\tau(1)$ must be bounded above. This observation restricts our candidate terms to those that are bounded above in the base case. (Specifically, we require the expressions to be bounded above by zero.) For example, $\mathit{return}' \le 0$ in the base case, and so $\mathit{return}'$ is a candidate term. Similarly, the term $\mathit{nTicks}' - \mathit{nTicks}$ is also bounded above by 0 in the base case, and so it is a candidate term. There are other candidate terms that our analysis would extract for this example, but for brevity they are not listed here. We discover the bounded terms $\mathit{return}'$ and $\mathit{nTicks}' - \mathit{nTicks}$ using symbolic abstraction (see Section 3).

Once we have a set of candidate terms $\tau_1, \ldots, \tau_k$, we seek corresponding bounding functions $b_1, \ldots, b_k$. Note that such functions may not exist: just because $\tau_i$ is bounded above in the base case does not mean it is bounded in all other executions. If a bounding function for a term does exist, we would like a closed-form expression for it in terms of $h$. We derive such closed-form expressions by hypothesizing that a bounding function does exist. These hypothetical functions allow us to construct a hypothetical procedure summary $\phi^{(h)}_{\mathit{call}}$ that represents a typical height-$h$ execution. For example, in the case of subsetSumAux:

$$\phi^{(h)}_{\mathit{call}} = (\mathit{nTicks}' - \mathit{nTicks} \le b_1(h)) \land (\mathit{return}' \le b_2(h))$$

Note that, although $\phi^{(h)}_{\mathit{call}}$ assumes the existence of several bounding functions (corresponding to $\tau_i$ for several values of $i$), the assumptions for different values of $i$ need not all succeed or fail together. That is, if we fail to find a bounding function $b_i$ for some $\tau_i$, this failure does not prevent us from continuing the analysis and finding other bounding functions ($b_j$, with $j \ne i$) for the same procedure.

We then build up a height-$(h+1)$ summary, $\phi^{(h+1)}_{\mathit{rec}}$, compositionally, with $\phi^{(h)}_{\mathit{call}}$ replacing the recursive calls. For example, consider the term $\mathit{nTicks}' - \mathit{nTicks}$ in the context of subsetSum. Our goal is to create a relational summary for the variable nTicks between labels 1 and 6. We do this by extending a summary for the transition between labels 1 and 2 with a summary for the transition between labels 2 and 3 (namely, our hypothetical summary), then extending the result with a summary for the paths between labels 3 and 4, and so on. Between labels 1 and 2, nTicks is increased by 1, and our hypothetical bounding function says that nTicks is increased by at most $b_1(h)$ between labels 2 and 3. Combining these summaries, we see that nTicks is increased by at most $b_1(h) + 1$ between labels 1 and 3. nTicks does not change between labels 3 and 4, so the summary between labels 1 and 4 is the same as the one between labels 1 and 3. The transition between labels 4 and 5 is a recursive call, so we again use our hypothetical summary to approximate it; once again, it says that nTicks is increased by at most $b_1(h)$. Extending our summary for the transition between labels 1 and 4 with this information allows us to conclude that nTicks is increased by at most $2b_1(h) + 1$ between labels 1 and 5. nTicks does not change between labels 5 and 6. Consequently, our summary for nTicks between labels 1 and 6 is $\mathit{nTicks}' - \mathit{nTicks} \le 2b_1(h) + 1$. Similar reasoning yields a summary for the return value, $\mathit{return}' \le b_2(h) + 1$. These formulas constitute our height-$(h+1)$ hypothetical summary, $\phi^{(h+1)}_{\mathit{rec}}$.

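If we rearrange each conjunct to place $\mathit{nTicks}' - \mathit{nTicks}$ and $\mathit{return}'$, respectively, on the left-hand side of each inequality, we obtain height-$(h+1)$ bounds on the values of $\mathit{nTicks}' - \mathit{nTicks}$ and $\mathit{return}'$. By definition, such bounds are valid expressions for $b_1(h+1)$ and $b_2(h+1)$. That is, at height $h+1$,

$$b_1(h+1) = 2\,b_1(h) + 1 \qquad (1)$$
$$b_2(h+1) = b_2(h) + 1 \qquad (2)$$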

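These equations give recursive definitions for $b_1$ and $b_2$. Solving these recurrence relations, with the initial values $b_1(1) = b_2(1) = 0$ supplied by the base case, gives us bounds on the values of $\mathit{nTicks}' - \mathit{nTicks}$ and $\mathit{return}'$ in height-$h$ executions, for all heights $h$.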
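A quick numerical check of the recurrences, as a Python sketch: it assumes that recurrence (1) is $b_1(h+1) = 2b_1(h) + 1$ for the tick increase and recurrence (2) is $b_2(h+1) = b_2(h) + 1$ for the return value, both starting from 0 at height 1, and compares the iterates with the closed forms $2^{h-1} - 1$ and $h - 1$. The helper `bounds_at_depth` is a hypothetical name; it assumes a recursion depth of $n + 1$ for an array of size $n$.

```python
def iterate(step, b_at_1, h_max):
    """Return [b(1), ..., b(h_max)] for b(h + 1) = step(b(h))."""
    values = [b_at_1]
    while len(values) < h_max:
        values.append(step(values[-1]))
    return values


H_MAX = 20
b1 = iterate(lambda b: 2 * b + 1, 0, H_MAX)   # recurrence (1): tick increase
b2 = iterate(lambda b: b + 1, 0, H_MAX)       # recurrence (2): return value

# The iterates agree with the closed-form solutions at every height checked.
assert all(b1[h - 1] == 2 ** (h - 1) - 1 for h in range(1, H_MAX + 1))
assert all(b2[h - 1] == h - 1 for h in range(1, H_MAX + 1))


def bounds_at_depth(n):
    """Hypothetical helper: evaluate both closed forms at height h = n + 1."""
    h = n + 1
    return 2 ** (h - 1) - 1, h - 1


print(bounds_at_depth(10))   # (1023, 10)
```

Iterating a recurrence only confirms a closed form on finitely many heights; the analysis itself obtains the closed forms symbolically with a recurrence solver.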

In DepthBound, we present an algorithm that determines an upper bound on a procedure’s depth of recursion as a function of the parameters to the initial call and the values of global variables. This depth of recursion can also be interpreted as a stack height that we can use as an argument to the bounding functions $b_i$. In the case of subsetSumAux, we obtain a depth bound in terms of its parameters. The solutions to the recurrences discussed above, $b_1(h) = 2^{h-1} - 1$ and $b_2(h) = h - 1$, when combined with the depth bound, yield a summary of subsetSumAux.

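When subsetSum is called with some array size $n$, the maximum possible depth of recursion that can be reached by subsetSumAux is $n + 1$. Substituting $h = n + 1$ into the solutions gives $\mathit{nTicks}' - \mathit{nTicks} \le 2^n - 1$ and $\mathit{return}' \le n$. In this way, we have established that the running time of subsetSum is exponential in $n$, and the return value is at most $n$.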

3. Background

Relational semantics.

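In the following, we give an abstract presentation of the relational semantics of programs. Fix a set Var of program variables. A state consists of an integer valuation for each program variable; let State denote the set of states. A recursive procedure $P$ can be understood as a chain-continuous (and hence monotonic) function on state relations, $P : 2^{\mathit{State} \times \mathit{State}} \to 2^{\mathit{State} \times \mathit{State}}$. The relational semantics of $P$ is given as the limit of the ascending Kleene chain of $P$:

$$\llbracket P \rrbracket = \bigcup_{h \ge 0} P^h(\emptyset)$$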

Operationally, for any $h$ we may view $P^h(\emptyset)$ as the input/output relation of $P$ on a machine with a stack limit of $h$ activation records. We can extend relational semantics to mutually recursive procedures in the natural way, by considering $P$ to be a function that takes as input (and returns) an $n$-tuple of state relations, where $n$ is the number of mutually recursive procedures.
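When the state space is finite, the Kleene chain can be computed directly. The Python sketch below is a toy model, not an example from the paper: states are pairs (x, y) with x in 0..3 and y in 0..6, and the hypothetical procedure returns when x = 0 and otherwise decrements x, increments y, and calls itself. The function `F` interprets the recursive call with a given relation, so element $h$ of the chain is the input/output relation achievable with at most $h$ activation records (restricted to the finite universe).

```python
from itertools import product

# Toy state space (not from the paper): states are pairs (x, y).
STATES = set(product(range(4), range(7)))


def F(R):
    """One unfolding of the toy procedure
         p(): if x == 0: return
              x := x - 1; y := y + 1; p()
    where the state relation R interprets the recursive call."""
    base = {(s, s) for s in STATES if s[0] == 0}
    rec = set()
    for t, u in R:
        s = (t[0] + 1, t[1] - 1)      # the pre-state whose body step reaches t
        if s in STATES:
            rec.add((s, u))
    return base | rec


def kleene_chain(F):
    """Return [F^0(empty), F^1(empty), ...] up to the first repeated element."""
    chain = [set()]
    while True:
        nxt = F(chain[-1])
        if nxt == chain[-1]:
            return chain
        chain.append(nxt)


chain = kleene_chain(F)
for h, R in enumerate(chain):
    print(h, len(R))
```

The chain is ascending, every pair in element $h$ starts from a state with $x \le h - 1$, and the final element relates each state $(x, y)$ to $(0, x + y)$.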

A transition formula $F$ is a formula over the program variables Var and an additional set of “primed” copies Var′, representing the values of the program variables before and after a computation. A transition formula can be interpreted as a property that holds of a pair of states $(s, s')$: we say that $(s, s')$ satisfies $F$ if $F$ is true when each variable $x \in \mathit{Var}$ is interpreted according to $s$, and each variable $x' \in \mathit{Var}'$ is interpreted according to $s'$. We use $\llbracket F \rrbracket$ to denote the state relation consisting of all pairs $(s, s')$ that satisfy $F$. This paper is concerned with the problem of procedure summarization, in which the goal is to find a transition formula $F$ that over-approximates a procedure $P$, in the sense that $\llbracket P \rrbracket \subseteq \llbracket F \rrbracket$.

A relational expression is a polynomial over $\mathit{Var} \cup \mathit{Var}'$ with rational coefficients. A relational expression $\tau$ can be evaluated at a state pair $(s, s')$ by using $s$ to interpret the unprimed symbols and $s'$ to interpret the primed symbols; we use $\tau(s, s')$ to denote the evaluation of $\tau$ at $(s, s')$.
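Evaluating a relational expression at a state pair amounts to substitution. In the minimal Python sketch below, the encoding is an assumption for illustration: a polynomial is a dict from monomials (tuples of variable names, with primed names ending in `'`) to rational coefficients, and a state is a dict from names to integers.

```python
from fractions import Fraction
from math import prod


def lookup(var, s, s_post):
    """Unprimed names read the pre-state s; primed names read the post-state."""
    return s_post[var[:-1]] if var.endswith("'") else s[var]


def evaluate(tau, s, s_post):
    """tau(s, s') for a polynomial given as {monomial: coefficient}.

    A monomial is a tuple of variable names; () is the constant monomial.
    """
    return sum(
        (Fraction(c) * prod(lookup(v, s, s_post) for v in mono)
         for mono, c in tau.items()),
        Fraction(0),
    )


tau = {("nTicks'",): 1, ("nTicks",): -1}              # nTicks' - nTicks
print(evaluate(tau, {"nTicks": 3}, {"nTicks": 10}))   # 7
```

Using `Fraction` keeps rational coefficients exact, which matters once the evaluated values are compared against bounds.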

Intra-procedural analysis.

The technique for procedure summarization developed in this paper uses intra-procedural summarization as a subroutine. We formalize this intra-procedural technique as a function that takes as input a control-flow graph $G = (V, E, v_{\mathit{entry}}, v_{\mathit{exit}})$ with vertices $V$, edges $E$, entry vertex $v_{\mathit{entry}}$, and exit vertex $v_{\mathit{exit}}$, and computes a transition formula that over-approximates all paths in $G$ between $v_{\mathit{entry}}$ and $v_{\mathit{exit}}$. We use $\mathit{Summary}(P, F)$ to denote a function that takes as input a recursive procedure $P$ and a transition formula $F$, and computes a transition formula that over-approximates $P$ when $F$ is used to interpret recursive calls (i.e., $P(\llbracket F \rrbracket) \subseteq \llbracket \mathit{Summary}(P, F) \rrbracket$). Summary can be implemented in terms of the intra-procedural function by replacing all call edges with $F$, and taking $G$ to be the control-flow graph of $P$.

In principle, any intra-procedural summarization procedure can be used to implement this function; our implementation uses the technique of Kincaid et al. (2018).
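The Python sketch below illustrates $\mathit{Summary}(P, F)$ over a finite state space. It is a toy stand-in for the intra-procedural analysis, which works symbolically on transition formulas rather than on explicit sets of state pairs. States are single integers x in 0..3, edge labels are explicit relations, `None` marks a call edge, and all function names are hypothetical. Passing the empty relation for calls plays the role of $\mathit{Summary}(P, \mathit{false})$, which keeps only the non-recursive paths.

```python
from collections import deque


def path_summary(edges, entry, exit_, states):
    """Pairs (s0, s) such that some CFG path from entry to exit_ maps s0 to s.

    edges is a list of (u, v, rel), where rel is a set of state pairs.
    """
    reach = {(entry, s, s) for s in states}   # (vertex, entry state, current state)
    work = deque(reach)
    while work:
        v, s0, s = work.popleft()
        for u, w, rel in edges:
            if u != v:
                continue
            for a, b in rel:
                if a == s and (w, s0, b) not in reach:
                    reach.add((w, s0, b))
                    work.append((w, s0, b))
    return {(s0, s) for v, s0, s in reach if v == exit_}


def summary(edges, entry, exit_, states, call_rel):
    """Summary(P, F): interpret every call edge (rel None) by call_rel."""
    concrete = [(u, w, call_rel if rel is None else rel) for u, w, rel in edges]
    return path_summary(concrete, entry, exit_, states)


# Toy countdown procedure on x in 0..3 (hypothetical):
#   0 -[x == 0]-> 3,   0 -[x > 0; x := x - 1]-> 1 -[call]-> 2 -[skip]-> 3
STATES = range(4)
EDGES = [
    (0, 3, {(x, x) for x in STATES if x == 0}),
    (0, 1, {(x, x - 1) for x in STATES if x > 0}),
    (1, 2, None),
    (2, 3, {(x, x) for x in STATES}),
]

base_only = summary(EDGES, 0, 3, STATES, set())   # like Summary(P, false)
R = set()
for _ in range(len(STATES)):      # each round allows one more activation record
    R = summary(EDGES, 0, 3, STATES, R)
print(sorted(base_only), sorted(R))   # [(0, 0)] [(0, 0), (1, 0), (2, 0), (3, 0)]
```

Iterating `summary` from the empty relation reproduces the Kleene chain of the procedure, one activation record per round.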

Input: Formula of the form $\exists Y.\, F$, where $F$ is satisfiable and quantifier-free
Output: Convex hull of $\exists Y.\, F$
    $H \gets \bot$;
    while there exists a model $m$ of $F \land \neg H$ do
        Let $C$ be a cube of the DNF of $F$ s.t. $m \models C$;
        $C' \gets \exists Y.\, C$;    /* Polyhedral projection */
        $H \gets H \sqcup C'$;    /* Polyhedral join */
    return $H$
Algorithm 1 The convex-hull algorithm from (Farzan and Kincaid, 2015)

Symbolic abstraction.

We use $\mathit{Abstract}(F, X)$ to denote a procedure that takes a formula $F$ and computes a set of polynomial inequations over the variables $X$ that are implied by $F$. If $F$ is expressed in linear arithmetic, then a representation of all implied linear inequations (namely, a constraint representation of the convex hull of $F$ projected onto $X$) can be computed effectively (e.g., using (Farzan and Kincaid, 2015, Alg. 2), which we show in this paper as Algorithm 1). Otherwise, we settle for a sound procedure that produces inequations implied by $F$, but not necessarily all of them (e.g., using (Kincaid et al., 2018, Alg. 3)).

In principle, the convex hull of a linear arithmetic formula $F$ can be computed as follows: write $F$ in disjunctive normal form, as $F = C_1 \lor \cdots \lor C_n$, where each $C_i$ is a conjunction of linear inequations (i.e., a convex polyhedron). The convex hull of $F$ is obtained by replacing disjunctions with the join operator of the domain of convex polyhedra, giving $C_1 \sqcup \cdots \sqcup C_n$. This algorithm can be improved by using an SMT solver to enumerate the DNF lazily, and extended to handle existential quantification by using polyhedral projection (Algorithm 1). A similar approach can be used to compute a conjunction of non-linear inequations that are implied by a formula $F$, by treating non-linear terms in the formula as additional dimensions of the space (e.g., a quadratic inequation $x^2 \le xy + 1$ is treated as the linear inequation $u \le v + 1$, where $u$ and $v$ are symbols that we associate with the terms $x^2$ and $xy$, but have no intrinsic meaning). The precision of this non-linear variant can be improved by using inference rules, congruence closure, and Gröbner-basis algorithms to deduce linear relations among the non-linear dimensions that are consequences of the non-linear theory (Kincaid et al., 2018, Alg. 3). Note that, because non-linear integer arithmetic is undecidable, this process is (necessarily) incomplete.
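A one-dimensional Python sketch of “convex hull by join”: in one dimension a convex polyhedron is an interval, so replacing each disjunction by the polyhedral join (here, the smallest enclosing interval) computes the convex hull exactly. This is a simplification for illustration only: it enumerates an explicit DNF eagerly and uses no SMT solver, lazy enumeration, or projection, and the encoding of a constraint $a x \le c$ as a pair `(a, c)` is an assumption.

```python
from fractions import Fraction
from math import inf


def cube_to_interval(cube):
    """Intersect the constraints a*x <= c of one DNF cube.

    Returns (lo, hi) with possibly infinite ends, or None if the cube is empty.
    """
    lo, hi = -inf, inf
    for a, c in cube:
        a, c = Fraction(a), Fraction(c)
        if a > 0:
            hi = min(hi, c / a)
        elif a < 0:
            lo = max(lo, c / a)       # dividing by a < 0 flips the inequality
        elif c < 0:                   # the constraint 0 <= c is false
            return None
    return (lo, hi) if lo <= hi else None


def join(i1, i2):
    """Polyhedral join in one dimension: the smallest enclosing interval."""
    if i1 is None:
        return i2
    if i2 is None:
        return i1
    return (min(i1[0], i2[0]), max(i1[1], i2[1]))


def convex_hull(dnf):
    """Replace each disjunction of the DNF by a join of the cubes' intervals."""
    hull = None                       # None plays the role of the empty polyhedron
    for cube in dnf:
        hull = join(hull, cube_to_interval(cube))
    return hull


# (1 <= x <= 2) or (5 <= x <= 7) or (x <= 0 and x >= 3)
F = [[(1, 2), (-1, -1)], [(1, 7), (-1, -5)], [(1, 0), (-1, -3)]]
lo, hi = convex_hull(F)
print(lo, hi)   # 1 7
```

In two or more dimensions the join of polyhedra is no longer a simple min/max, which is why the general algorithm relies on a polyhedra library.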

Recurrence relations.

C-finite sequences are a well-studied class of sequences defined by linear recurrence relations with constant coefficients, of which a famous example is the Fibonacci sequence. Formally,

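Definition 3.1.

A sequence $\langle a_n \rangle_{n \in \mathbb{N}}$ is C-finite of order $k$ if it satisfies a linear recurrence equation

$$a_{n+k} = c_{k-1}\, a_{n+k-1} + \cdots + c_1\, a_{n+1} + c_0\, a_n$$

where each $c_i$ is a constant.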

It is classically known that every C-finite sequence admits a closed form that is computable from its recurrence relation and takes the form of an exponential-polynomial

$$a_n = \sum_{i=1}^{r} p_i(n)\, \lambda_i^{\,n}$$

where each $p_i$ is a polynomial in $n$ and each $\lambda_i$ is a constant. In the following, it will be convenient to use a different kind of recurrence relation to present C-finite sequences, namely stratified systems of polynomial recurrences.
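For the Fibonacci sequence, the exponential-polynomial closed form is Binet's formula, in which both $p_i$ are constants ($\pm 1/\sqrt{5}$) and the $\lambda_i$ are the roots of $x^2 - x - 1$. The Python sketch below compares it with the recurrence; it uses floating point, so it is only a numerical illustration for moderate $n$.

```python
from math import sqrt


def fib(n):
    """Fibonacci numbers via the recurrence a(n+2) = a(n+1) + a(n)."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


PHI = (1 + sqrt(5)) / 2      # roots of the characteristic polynomial
PSI = (1 - sqrt(5)) / 2      # x^2 - x - 1


def fib_closed(n):
    """Binet's formula: constant polynomials times exponentials in n."""
    return (PHI ** n - PSI ** n) / sqrt(5)


print([round(fib_closed(n)) for n in range(10)])
# [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

A recurrence solver would keep $\lambda_i$ as exact algebraic numbers rather than floating-point approximations.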

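Definition 3.2.

A stratified system of polynomial recurrences is a system of recurrence equations over sequences $\langle x^{(1)}_n \rangle, \ldots, \langle x^{(m)}_n \rangle$, where each sequence $x^{(i)}$ is assigned a stratum $\sigma(i) \in \{1, 2, \ldots\}$, of the form

$$x^{(i)}_{n+1} = \sum_{j \,:\, \sigma(j) = \sigma(i)} c_{ij}\, x^{(j)}_n \;+\; p_i\big( x^{(l)}_n : \sigma(l) < \sigma(i) \big)$$

where each $c_{ij}$ is a constant, and $p_i$ is a polynomial in the sequences of strata strictly lower than $\sigma(i)$.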

Intuitively, the sequences are organized into strata (a first stratum, a second stratum, and so on); the right-hand side of the equation for a sequence can involve linear terms over the sequences in its own stratum, and additional polynomial terms over sequences of lower strata. It follows from the closure properties of C-finite sequences that each sequence in such a system is C-finite, and an exponential-polynomial closed form for each sequence can be computed from a stratified system of polynomial recurrences (Kauers and Paule, 2011). Conversely, any C-finite sequence satisfies a stratified system of polynomial recurrences, because a recurrence of order $k$ can be implemented as a system of $k$ linear recurrences among $k$ sequences (Kauers and Paule, 2011).
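As a small illustration (a system of our own, not the paper's Example 3.3), take $x$ in stratum 1 and $y, z$ in stratum 2, with $x_{n+1} = x_n + 1$, $y_{n+1} = y_n + x_n^2$, and $z_{n+1} = 2z_n + x_n$, all starting at 0. The non-linear term $x_n^2$ comes from a strictly lower stratum, as required. The Python sketch checks the exponential-polynomial closed forms $x_n = n$, $y_n = (n-1)n(2n-1)/6$, and $z_n = 2^n - n - 1$ against direct iteration.

```python
def simulate(steps):
    """Iterate the example stratified system from x = y = z = 0.

    Stratum 1:  x(n+1) = x(n) + 1
    Stratum 2:  y(n+1) = y(n) + x(n)**2      (non-linear term from stratum 1)
                z(n+1) = 2*z(n) + x(n)
    """
    x, y, z = 0, 0, 0
    trace = [(x, y, z)]
    for _ in range(steps):
        # Simultaneous update: every right-hand side reads the old values.
        x, y, z = x + 1, y + x * x, 2 * z + x
        trace.append((x, y, z))
    return trace


def closed_form(n):
    """Exponential-polynomial closed forms of x, y, and z."""
    return n, (n - 1) * n * (2 * n - 1) // 6, 2 ** n - n - 1


trace = simulate(30)
assert all(trace[n] == closed_form(n) for n in range(31))
print(trace[:5])   # [(0, 0, 0), (1, 0, 0), (2, 1, 1), (3, 5, 4), (4, 14, 11)]
```

Solving stratum by stratum is what makes such systems tractable: once $x_n$ is known in closed form, the stratum-2 equations become linear recurrences with a known inhomogeneous term.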

Example 3.3.

An example of a stratified system of polynomial recurrences with four sequences arranged into two strata is as follows:

This system has the closed-form solution

4. Technical Details

This section gives algorithms for summarizing recursive procedures using recurrence solving. Before these algorithms are applied to the procedures of a program, we compute and collapse the strongly connected components of the program's call graph and topologically sort the collapsed graph. Our analysis then processes the strongly connected components of the call graph in a single pass, in topological order of the collapsed graph with callees before callers, applying the algorithms of this section to recursive components and intra-procedural analysis to non-recursive components.
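The bottom-up processing order can be obtained with Tarjan's strongly-connected-components algorithm, which emits each component only after every component reachable from it, that is, callees before callers. The Python sketch below is one possible implementation under that assumption; the call graph encoding and the procedure names `even` and `odd` are hypothetical.

```python
def bottom_up_components(call_graph):
    """Tarjan's SCC algorithm on a call graph {procedure: [callees]}.

    A component is emitted only after every component reachable from it,
    so the returned list orders callees before their callers.
    """
    index, low, on_stack, stack, order = {}, {}, set(), [], []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in call_graph.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:            # v is the root of a component
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            order.append(component)

    for v in call_graph:
        if v not in index:
            visit(v)
    return order


def is_recursive(component, call_graph):
    """Several procedures, or one procedure that calls itself."""
    return len(component) > 1 or any(p in call_graph.get(p, ()) for p in component)


CALLS = {
    "main": ["subsetSum", "even"],
    "subsetSum": ["subsetSumAux"],
    "subsetSumAux": ["subsetSumAux", "subsetSumAux"],
    "even": ["odd"],
    "odd": ["even"],
}
for comp in bottom_up_components(CALLS):
    kind = "recursive" if is_recursive(comp, CALLS) else "non-recursive"
    print(sorted(comp), kind)
```

The recursive implementation is fine for small call graphs; a very deep call graph would need an iterative variant to stay within Python's recursion limit.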

For simplicity, Section 4.1 focuses on the analysis of strongly connected components consisting of a single recursive procedure $P$. The first step of the analysis is to apply Algorithm 2, which produces a set of inequations that describe the values of variables in $P$. Not all of the inequations found by Algorithm 2 are suitable for use in a recurrence-based analysis, so we apply Algorithm 3 to filter the set of inequations down to a subset that, when combined, forms a stratified recurrence. The next step is to give this recurrence to a recurrence solver, which results in a logical formula relating the values of variables in $P$ to the stack height $h$ that may be used by $P$. In DepthBound, we show how to (i) obtain a bound on $h$ that depends on the program state before the initial call to $P$, and (ii) combine the recurrence solution with that depth bound to create a summary of $P$. In DualHeight, we discuss how to obtain a certain class of more precise bounds (including lower bounds on the running time of a procedure). In Mutual, we show how to extend the techniques of Section 4.1 to handle programs with mutual recursion, i.e., programs whose call graphs have strongly connected components consisting of multiple procedures. In MissingBaseCases, we discuss an extension of the algorithm of Mutual that handles sets of mutually recursive procedures in which some procedures do not have base cases.

4.1. Height-Based Recurrence Analysis

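Let $\tau$ be a relational expression and let $P$ be a procedure. We use $V_\tau(h)$ to denote the set of values of $\tau$ in a height-$h$ execution of $P$:

$$V_\tau(h) = \{\, \tau(s, s') : (s, s') \in \llbracket P \rrbracket_h \,\}$$

That is, $V_\tau(h)$ consists of the values to which $\tau$ may evaluate at a state pair belonging to $\llbracket P \rrbracket_h$. We call a function $b$ a bounding function for $\tau$ in $P$ if for all $h$ and all $v \in V_\tau(h)$, we have $v \le b(h)$. Intuitively, the bounding function bounds the value of an expression in any execution that uses stack height at most $h$.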

The goal of Section 4.1 is to find a set of relational expressions and associated bounding functions. We proceed in three steps. First, we determine a set of candidate relational expressions $\tau_1, \ldots, \tau_m$. Second, we optimistically assume that there exist functions $b_1, \ldots, b_m$ that bound these expressions, and we analyze $P$ under that assumption to obtain constraints relating the values of the relational expressions to the values of the functions. Third, we rearrange the constraints into recurrence relations for each of the functions (if possible) and solve them to synthesize a closed-form expression for each $b_i$ that is suitable to be used in a summary for $P$.

We begin our analysis of $P$ by determining a set of suitable expressions $\tau_1, \ldots, \tau_m$. If a relational expression $\tau$ has an associated bounding function, then it must be the case that $V_\tau(1)$ (i.e., the set of values that $\tau$ takes on in the base case) is bounded above. Without loss of generality, we choose expressions $\tau_i$ so that $V_{\tau_i}(1)$ is bounded above by zero. (Note that if $V_\tau(1)$ is bounded above by $c$, then $V_{\tau - c}(1)$ is bounded above by zero.) To find relational expressions that have this property, we analyze the base case of $P$.

Input : A procedure $P$, and the associated vocabulary of program variables Var
Output : Height-based-recurrence summary
1 ;
2 ;
3 the number of inequations in ;
4 foreach  in  do
5    Let be the expression over such that the inequation in is ;
6    Let be a fresh uninterpreted function symbol ;
7   
8    ;
9    ;
10    ;
11    ;
12    foreach  in  do
13       ;
14       foreach inequation in  do
15         
16         
return
Algorithm 2 Algorithm for extracting candidate recurrence inequations

Selecting candidate relational expressions.

The reason for looking at expressions over program variables, as opposed to individual variables, is illustrated by subsetSum (Figure 1): the variable nTicks has a different value each time the base case executes, but the expression $\mathit{nTicks}' - \mathit{nTicks}$ is always equal to zero in the base case.

With the goal of identifying relational expressions that are bounded above by zero, Algorithm 2 begins by extracting a transition formula $\phi_{\mathit{base}}$ for the non-recursive paths through $P$ by calling $\mathit{Summary}(P, \mathit{false})$ (i.e., summarizing $P$ by using false as a summary for the recursive calls in $P$). Next, we call Abstract to compute a set of polynomial inequations that are implied by $\phi_{\mathit{base}}$, over the vocabulary of $P$: the unprimed (pre-state) and primed (post-state) copies of all global variables, along with unprimed copies of the parameters to $P$ and the variable $\mathit{return}'$, which represents the return value of $P$. Let $m$ be the number of inequations in this set. Then, for $i = 1, \ldots, m$, we rewrite the $i$th inequation in the form $\tau_i \le 0$. In the case of subsetSum, $\tau_1 = \mathit{nTicks}' - \mathit{nTicks}$ and $\tau_2 = \mathit{return}'$ have the property that $\tau_1 \le 0$ and $\tau_2 \le 0$ in the base case.

Note that there are, in general, many sets of relational expressions that are bounded above by zero in the base case. The soundness of Algorithm 2 depends only on Abstract choosing some such set. Our implementation of Abstract uses (Kincaid et al., 2018, Alg. 3), and is not guaranteed to choose the set of relational expressions that would lead to the most precise results for any given application, e.g., for a given assertion-checking or complexity-analysis problem. Intuitively, in the case that $\phi_{\mathit{base}}$ is a formula in linear arithmetic, our implementation of Abstract amounts to using the operations of the polyhedral abstract domain to find the convex hull of $\phi_{\mathit{base}}$. Each of the inequations in the constraint representation of the convex hull can then be interpreted as a relational expression that is bounded above by zero in the base case.

Generating constraints on bounding functions.

For each of the expressions $\tau_i$ that has an upper bound in the base case, we are ultimately looking to find a function $b_i(h)$ that is an upper bound on the value of that expression in any height-$h$ execution. Our way of finding such a function is to analyze the recursive cases of $P$ to look for an invariant inequation that gives an upper bound on $b_i(h+1)$ in terms of upper bounds on $b_1(h), \ldots, b_m(h)$. Such an inequation can be interpreted as a recurrence relating $b_i(h+1)$ to $b_1(h), \ldots, b_m(h)$.

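The remainder of Algorithm 2 finds such invariant inequations. The first step is to create the hypothetical procedure summary $\phi^{(h)}_{\mathit{call}} = \bigwedge_{i=1}^{m} \tau_i \le b_i(h)$, which hypothesizes that a bounding function $b_i$ exists for each expression $\tau_i$, and that the value of that function at height $h$ is an upper bound on the value of $\tau_i$. $\phi^{(h)}_{\mathit{call}}$ is a transition formula that represents a height-$h$ execution of $P$. In subsetSum, $\phi^{(h)}_{\mathit{call}}$ is:

$$\phi^{(h)}_{\mathit{call}} = (\mathit{nTicks}' - \mathit{nTicks} \le b_1(h)) \land (\mathit{return}' \le b_2(h))$$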

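Next, Algorithm 2 calls Summary, using $\phi^{(h)}_{\mathit{call}}$ as the representation of each recursive call in $P$, and the resulting transition formula is stored in $\phi^{(h+1)}_{\mathit{rec}}$. Thus, $\phi^{(h+1)}_{\mathit{rec}}$ describes a typical height-$(h+1)$ execution of $P$. In subsetSum, a simplified version of $\phi^{(h+1)}_{\mathit{rec}}$ is given in Section 2: $(\mathit{nTicks}' - \mathit{nTicks} \le 2b_1(h) + 1) \land (\mathit{return}' \le b_2(h) + 1)$.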

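The formula $\phi_{\mathit{ext}}$ is then produced by conjoining $\phi^{(h+1)}_{\mathit{rec}}$ with a formula stating that, for each $i$, $b_i(h+1) = \tau_i$. Therefore, $\phi_{\mathit{ext}}$ implies that any upper bound on $\tau_i$ must be an upper bound on $b_i(h+1)$ in any height-$(h+1)$ execution.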

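Ultimately, we wish to obtain a closed-form solution for each bounding function. The extended formula implicitly determines a set of recurrences relating the bounding functions at h+1 to those at h. However, it does not have the explicit form of a recurrence. The final lines of candidates, starting with the call to WedgeExtract, therefore abstract the extended formula to a conjunction of inequations, each of which gives an explicit relationship between a bounding function at h+1 and the bounding functions at h.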

Input : A set of candidate inequations over the function symbols
Output : A set of inequations that form a stratified recurrence
1 Let be a map from integers to integers;
2 Let and be maps that map all pairs of integers to false;
3 ;
4 foreach  in  do
5    Write as if can be written in that form with , , , ; otherwise let and continue loop;
6    For , ;
7    Let be ;
8    ;
9    foreach  do
10       foreach  do
11          if  then  ;
12          if  then  ;
13         
14         
15         
16          ;
17          repeat
18             ;
19             repeat
20                foreach  do
21                   if   then ;
22                   if   then ;
23                  
24                  
25                   until  is unchanged;
26                  foreach  do
27                      if  contains more than one such that  then
28                         Arbitrarily choose one such to remain in , and remove all other such from ;
29                        
30                         ;
31                        
32                         until ;
return
Algorithm 3 Algorithm for constructing a stratified recurrence

Extracting and solving recurrences.

The next step of height-based recurrence analysis is to identify a subset of the inequations returned by candidates that constitutes a stratified system of polynomial recurrences. This subset must meet the following three stratification criteria:

  1. Each bounding function must appear on the left-hand side of at most one inequation.

  2. If a bounding function appears on the right-hand side of an inequation, then it must also appear on some left-hand side.

  3. It must be possible to organize the bounding functions into strata so that if a bounding function b_j appears in a non-linear term on the right-hand side of the inequation for b_i, then b_j is on a strictly lower stratum than b_i.
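The three criteria can be checked mechanically. The Python sketch below uses a hypothetical encoding in which each inequation for b_lhs(h+1) is summarized by the sets of bounding functions that its right-hand side uses linearly and non-linearly. Criterion 3 as stated only constrains non-linear uses; the sketch additionally assumes, as in stratified recurrences in the sense of Kincaid et al. (2018), that a linear use must come from the same or a lower stratum. Strata are then minimal longest-path values, and a cycle through a non-linear use means no stratification exists.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical encoding: (lhs, linear_deps, nonlinear_deps) for b_lhs(h+1) <= RHS.
Inequation = Tuple[int, Set[int], Set[int]]


def stratify(ineqs: List[Inequation]) -> Optional[Dict[int, int]]:
    """Return a stratum for each bounding function, or None if a criterion fails."""
    lhs = [i for i, _, _ in ineqs]
    # Criterion 1: each bounding function is on at most one left-hand side.
    if len(lhs) != len(set(lhs)):
        return None
    defined = set(lhs)
    # Criterion 2: every function used on a right-hand side is also defined.
    for _, lin, nonlin in ineqs:
        if not (lin | nonlin) <= defined:
            return None
    # Criterion 3 (plus the assumed linear rule): an edge j -> i has weight 0
    # for a linear use (stratum j <= stratum i) and weight 1 for a non-linear
    # use (stratum j < stratum i). Bellman-Ford-style relaxation computes
    # longest paths, which are the smallest valid strata.
    edges = [(j, i, 0) for i, lin, _ in ineqs for j in lin]
    edges += [(j, i, 1) for i, _, nonlin in ineqs for j in nonlin]
    stratum = {i: 0 for i in defined}
    for _ in range(len(defined)):
        for j, i, w in edges:
            if stratum[j] + w > stratum[i]:
                stratum[i] = stratum[j] + w
    # Any further improvement reveals a cycle through a non-linear use.
    for j, i, w in edges:
        if stratum[j] + w > stratum[i]:
            return None
    return stratum
```

For example, b_1(h+1) <= b_1(h) + 1 together with b_2(h+1) <= 2*b_2(h) + b_1(h)^2 is accepted with b_1 on stratum 0 and b_2 on stratum 1, whereas two functions that use each other non-linearly are rejected.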

Algorithm 3 (filtering) computes a maximal subset of the inequations that complies with these three criteria.
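A simple greedy way to enforce the criteria is sketched below in Python, using the same hypothetical encoding of an inequation as (lhs, linear uses, non-linear uses). It is not a transcription of Algorithm 3, and we do not claim its result is maximal: it keeps the first inequation for each bounding function (Algorithm 3 also makes an arbitrary choice among duplicates), then repeatedly drops an inequation that mentions an undefined function or that uses non-linearly a function which transitively depends on it, until nothing changes.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding: (lhs, linear_deps, nonlinear_deps) for b_lhs(h+1) <= RHS.
Inequation = Tuple[int, Set[int], Set[int]]


def depends_on(uses: Dict[int, Set[int]], start: int, target: int) -> bool:
    """True if `start` is `target` or transitively uses `target` on its RHS."""
    stack, visited = [start], set()
    while stack:
        f = stack.pop()
        if f == target:
            return True
        if f in visited:
            continue
        visited.add(f)
        stack.extend(uses.get(f, set()))
    return False


def filter_stratified(cands: List[Inequation]) -> List[Inequation]:
    # Criterion 1: keep the first inequation seen for each bounding function.
    kept: List[Inequation] = []
    seen: Set[int] = set()
    for cand in cands:
        if cand[0] not in seen:
            seen.add(cand[0])
            kept.append(cand)
    while True:
        defined = {lhs for lhs, _, _ in kept}
        uses = {lhs: lin | nonlin for lhs, lin, nonlin in kept}
        bad = None
        for cand in kept:
            lhs, lin, nonlin = cand
            undefined = not (lin | nonlin) <= defined                # criterion 2
            cyclic = any(depends_on(uses, j, lhs) for j in nonlin)  # criterion 3
            if undefined or cyclic:
                bad = cand
                break
        if bad is None:
            return kept
        # Dropping one inequation may invalidate others, so re-check them all.
        kept.remove(bad)
```

Removing one inequation can leave another referring to an undefined function, which is why the loop re-examines the whole set after each removal.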

The next step of height-based recurrence analysis is to send this recurrence to a recurrence solver, such as the one described by Kincaid et al. (2018). The solution to the recurrence is a set of bounding functions. Let I be the set of indices i such that we found a recurrence for, and obtained a closed-form solution to, the bounding function b_i. Using these bounding functions, we can derive the following procedure summary, which leaves the height unconstrained.
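To show what a closed-form solution means in the simplest case, the Python sketch below solves a single first-order linear recurrence b(h+1) = a*b(h) + c with initial value b(1) = b1. This toy function, `closed_form`, is hypothetical and far narrower than the solver of Kincaid et al. (2018), which handles stratified systems of polynomial recurrences. It uses exact rational arithmetic and the standard fixed-point method: when a != 1, b(h) = p + a**(h-1) * (b1 - p) with p = c / (1 - a).

```python
from fractions import Fraction
from typing import Callable


def closed_form(a: int, c: int, b1: int) -> Callable[[int], Fraction]:
    """Closed form of b(h+1) = a*b(h) + c with b(1) = b1 (toy first-order case)."""
    fa, fc, fb1 = Fraction(a), Fraction(c), Fraction(b1)
    if fa == 1:
        # Arithmetic progression: each step adds c.
        return lambda h: fb1 + fc * (h - 1)
    # Fixed point p satisfies p = a*p + c; the deviation from p scales by a.
    p = fc / (1 - fa)
    return lambda h: p + fa ** (h - 1) * (fb1 - p)
```

For instance, the recurrence b(h+1) = 2*b(h) + 1 with b(1) = 1 yields b(h) = 2**h - 1.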

(3)

The subject of Section 4.2 is to find a formula relating the height to the pre-state of the initial call to the procedure. That formula can be combined with Eq. (3) to obtain a more precise procedure summary.

Soundness.

Roughly, the soundness of height-based recurrence analysis follows from: (i) sound extraction of the recurrence constraints used by CHORA to characterize non-linear recursion; (ii) sound recurrence solving; and (iii) soundness of the underlying framework of algebraic program analysis. Parts (ii) and (iii) rely on the soundness of prior work (Kincaid et al., 2018). Part (i) is addressed by a detailed proof in the appendix of this document. The soundness property proved there is as follows. Let P be a procedure to which candidates and filtering have been applied to obtain a stratified recurrence, and let τ_1, ..., τ_m be the relational expressions computed by candidates. Let I ⊆ {1, ..., m} be such that {b_i : i ∈ I} is the set of functions produced by solving the stratified recurrence. We show that each function b_i bounds the corresponding value set: for every i ∈ I and every height h, the value of τ_i in any height-h execution of P is at most b_i(h).

4.2. Depth-Bound Analysis

Input : A weighted control-flow graph
Output : Depth-bound formulas
1 foreach  do
2   Let be a new vertex
3    Let be a new vertex;
4    ;
5    Create a new integer-valued auxiliary variable ;
6    ;
7    foreach  do
8       } }
9       foreach call edge in  do
10          if  for some  then
11            
12             else
13               
14               
15                foreach  do
16                  
return
Algorithm 4 Algorithm for producing a depth-bound formula

In Section 4.1, we showed how to find a bounding function that gives an upper bound on the value of a relational expression in an execution of a procedure, as a function of the stack height (i.e., maximum depth of recursion) of that execution. In this section, the goal is to find bounds on the maximum depth of recursion that may occur, as a function of the pre-state (which includes the values of global variables and of the procedure's parameters) from which the procedure is called.

For example, consider SubsetSum. The algorithms of Section 4.1 determine bounds, in terms of the height, on the values of two relational expressions. The algorithm of this sub-section (depth) determines a bound on the height in terms of the parameters. These facts can be combined to form a procedure summary for SubsetSumAux that relates the return value and the increase to nTicks to the values of the parameters i and n.

The stack height required to execute a procedure often depends on the number of times that some transformation can be applied to the procedure's parameters before a base case must execute. For example, in SubsetSum, the height bound is a consequence of the fact that i is incremented by one at each recursive call until it reaches n, at which point a base case executes. Likewise, in a typical divide-and-conquer algorithm, a size parameter is repeatedly divided by some constant until it falls below some threshold, at which point a base case executes. Intuitively, the technique described in this section is designed to discover height bounds that are consequences of such repeated transformations (e.g., addition or division) applied to the procedures' parameters.
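Both kinds of repeated transformation can be illustrated by simply running them. The Python sketch below is a concrete toy, not the symbolic analysis: the hypothetical helper `recursion_depth` applies a parameter transformation once per call until a base-case test succeeds and reports the depth at which the base case runs. The base-case conditions (i >= n, and size below 2) are assumptions chosen for illustration.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def recursion_depth(x: T, is_base: Callable[[T], bool],
                    step: Callable[[T], T]) -> int:
    """Depth at which the base case runs if every call applies `step` once."""
    depth = 1                  # the initial call is at depth 1
    while not is_base(x):
        x = step(x)            # parameters passed to the next recursive call
        depth += 1
    return depth


# Additive transformation (SubsetSum-style): i grows by one until i >= n.
def additive_depth(i: int, n: int) -> int:
    return recursion_depth((i, n), lambda s: s[0] >= s[1],
                           lambda s: (s[0] + 1, s[1]))


# Divisive transformation (divide-and-conquer): the size halves until below 2.
def halving_depth(size: int) -> int:
    return recursion_depth(size, lambda s: s < 2, lambda s: s // 2)
```

With these assumptions, additive_depth(i, n) equals n - i + 1 whenever i <= n, and halving_depth(2**k) equals k + 1, i.e., a logarithmic height bound.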

To achieve this goal, we use depth (Algorithm 4), which is inspired by the algorithm for computing bounds on the depth of recursion of Albert et al. (2013). The algorithm constructs and analyzes an over-approximate depth-bounding model of the procedures that includes an auxiliary depth-counter variable. Each time the model descends to a greater depth of recursion, the depth counter is incremented. The model exits only when a procedure executes its base case. In any execution of the model, the final value of the depth counter thus represents the depth of recursion at which some procedure's base case is executed.

As input, depth takes a representation of the procedures being analyzed as a single, combined control-flow graph with two kinds of edges: (1) weighted edges, each labeled with a transition formula, and (2) call edges. Each call edge is a triple consisting of a call-site vertex, a return-site vertex, and a label naming the called procedure. We assume that if some procedure outside this set is called by a procedure in the set, then the callee has already been fully analyzed, and therefore a procedure summary for it has already been computed. Each procedure has an entry vertex, an exit vertex, and a transition formula that over-approximates its base cases. Note that the combined graph consists of several disjoint, single-procedure control-flow graphs when the set contains more than one procedure.

In the lines of Algorithm 4 that precede the path-summary step, depth constructs the depth-bounding model, represented as a new control-flow graph. The algorithm begins by creating new auxiliary entry vertices for the procedures and a new auxiliary exit vertex. The new vertex set contains the original vertices along with these new vertices. Next, depth creates a new integer-valued depth-counter variable. For each procedure, the algorithm then creates an edge from the procedure's auxiliary entry vertex to its original entry vertex, weighted with a transition formula that initializes the depth counter to one, and an edge into the auxiliary exit vertex, weighted with the procedure's base-case formula, which is a summary of the base case of the procedure.

Next, depth replaces every call edge with one or more weighted edges. Each call to a procedure outside the set being analyzed is replaced by a single edge weighted with that procedure's summary. Each call to a procedure in the set is replaced by two edges. The first edge represents descending into the callee: it goes from the call-site vertex to the callee's entry vertex, and is weighted with a formula that increments the depth counter and havocs local variables. The second edge represents skipping over the call rather than descending into the callee. It is weighted with a transition formula that havocs all global variables and the variable return, but leaves local variables unchanged.
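The construction can be illustrated by running a depth-bounding model concretely on a hypothetical toy procedure, loosely in the style of SubsetSum: P(i, n) executes its base case when i >= n and otherwise calls P(i + 1, n) twice. Instead of building transition formulas, the Python sketch below explores the model's concrete states (vertex, i, n, d), where d is the depth counter. The vertex names, the toy procedure, and the placement of the base-case edge on the original entry vertex are assumptions made for illustration. Because the toy has no global variables or return value, each skip edge only moves control to the return site.

```python
from collections import deque
from typing import List, Set, Tuple

# Hypothetical model vertices: "aux" (auxiliary entry), "entry", "call1",
# "call2", "exit" (P's own exit), and "X" (the model's auxiliary exit).
State = Tuple[str, int, int, int]  # (vertex, i, n, d)


def successors(s: State) -> List[State]:
    v, i, n, d = s
    if v == "aux":                 # initialize the depth counter to one
        return [("entry", i, n, 1)]
    if v == "entry":
        if i >= n:                 # base-case edge into the auxiliary exit
            return [("X", i, n, d)]
        return [("call1", i, n, d)]
    if v in ("call1", "call2"):
        return_site = "call2" if v == "call1" else "exit"
        descend = ("entry", i + 1, n, d + 1)  # bind the argument, increment d
        skip = (return_site, i, n, d)         # skip the call, locals unchanged
        return [descend, skip]
    return []                      # P's own exit and X have no successors


def reachable_depths(i0: int, n: int) -> Set[int]:
    """All depth-counter values with which the model can reach X."""
    seen: Set[State] = set()
    todo = deque([("aux", i0, n, 0)])
    depths: Set[int] = set()
    while todo:
        s = todo.popleft()
        if s in seen:
            continue
        seen.add(s)
        if s[0] == "X":
            depths.add(s[3])
        todo.extend(successors(s))
    return depths
```

For this toy, every start state reaches the auxiliary exit only with d = max(1, n - i + 1), so any other depth is ruled out; this is the kind of fact that the depth-bounding summary records symbolically.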

The final step of depth, the path-summary step, computes the depth-bounding summary for each procedure. Because the new control-flow graph contains no call edges, intraprocedural-analysis techniques can be used to compute transition formulas that summarize the transition relation for all paths between two specified vertices. For each procedure, the resulting formula summarizes all paths from the procedure's auxiliary entry vertex to the auxiliary exit vertex, and thus relates the final value of the depth counter to the pre-state of the initial call to the procedure.

The depth-bounding summaries can be used to establish an upper bound on the depth of recursion in the following way. Let (s, s') be a state pair in the relational semantics of a procedure P. Then there is an execution of P that starts in state s and finishes in state s', in which the maximum recursion depth is some D. (Non-terminating executions of P do not correspond to any state pair in the relational semantics; therefore, such executions are not represented in the procedure summary that we wish to construct.) Then there is a path through the depth-bounding model that corresponds to the path taken in that execution to reach some execution of a base case at the maximum recursion depth D. Therefore, if D is a possible depth of recursion when starting from state s, then the depth-bounding summary of P has a satisfying assignment in which the pre-state is s and the depth counter takes the value D. By the contrapositive, if the summary has no satisfying assignment in which the pre-state is s and the depth counter takes the value D, then no execution of P that starts in state s can have maximum recursion depth D. In this way, the depth-bounding summary can be interpreted as providing bounds on the maximum recursion depth that can occur when P is started in state s.

Once we have the depth-bound summary for some procedure, we can combine it with the closed-form solutions for bounding functions obtained by the algorithms of Section 4.1 to produce a procedure summary. Let I be the set of indices i such that we found a recurrence for the bounding function b_i. We produce a procedure summary of the form shown in Eq. (4), which uses the depth-bound summary to relate the pre-state to a height variable, whose value in turn is used as the argument of the bounding function b_i for each i ∈ I.
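Numerically, a summary of this shape says that an expression is at most b_i(h) for some height h that the depth bound permits. The Python sketch below approximates that existential choice by enumerating heights up to a finite cap and taking the largest permitted value of the bounding function. Everything in it is hypothetical: the helper names, the cap, the bounding function 2**h - 1 for a tick count, and the depth bound h <= max(1, n - i + 1).

```python
from typing import Callable, Optional


def summary_bound(b: Callable[[int], int], depth_ok: Callable[[int], bool],
                  max_h: int) -> Optional[int]:
    """Largest b(h) over the heights 1..max_h that the depth bound permits."""
    allowed = [h for h in range(1, max_h + 1) if depth_ok(h)]
    return max(b(h) for h in allowed) if allowed else None


# Hypothetical instance: ticks at height h are at most 2**h - 1, and a call
# with parameters i and n has height at most max(1, n - i + 1).
def tick_bound(i: int, n: int) -> Optional[int]:
    return summary_bound(lambda h: 2 ** h - 1,
                         lambda h: h <= max(1, n - i + 1),
                         max_h=64)
```

For an increasing bounding function, the result is simply the bounding function evaluated at the largest permitted height; for example, tick_bound(0, 4) gives 2**5 - 1 = 31.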

(4)

4.3. Finding Lower Bounds Using Two-Region Analysis

In this sub-section, we describe an extension of height-based recurrence analysis, called two-region analysis, that is able to prove stronger conclusions, such as non-trivial lower bounds on the running times of some procedures.

In Section 4.1, we discussed height-based recurrence analysis, and showed how it can find an upper bound on the increase to the variable nTicks in SubsetSum. Now, we consider the application of height-based recurrence analysis to the procedure differ, shown in the corresponding figure. differ uses the global variables x and y to (in effect) return a pair of integers. The pair returned by the procedure is formed from the x value returned by the first call and the y value returned by the second call, each incremented by one. The base case occurs when the parameter n equals zero or one, and at each call site, the parameter n is decreased by either one or two. We will apply height-based recurrence analysis and two-region analysis to look for bounds on the final values of x and y, and on their sum and difference, after differ is called with a given value of n.

For the purposes of the following discussion, we will focus on x, but the same conclusions apply to y. By applying height-based recurrence analysis to the procedure differ, we can prove that the post-state value is upper-bounded by . At the same time, the analysis also proves a lower bound on by considering the term . However, the bounding function obtained by height-based analysis is the constant function , which yields the trivial lower bound . As a result, the results of height-based recurrence analysis can only be used to prove that the difference between and