An Incremental Slicing Method for Functional Programs

09/23/2017 ∙ by Prasanna Kumar K., et al. ∙ IIT Bombay, Indian Institute of Technology Kanpur

Several applications of slicing require a program to be sliced with respect to more than one slicing criterion. Program specialization, parallelization and cohesion measurement are examples of such applications. These applications can benefit from an incremental static slicing method in which a significant part of the computation for slicing with respect to one criterion can be reused for another. In this paper, we consider the problem of incremental slicing of functional programs. We first present a non-incremental version of the slicing algorithm which does a polyvariant analysis of functions. Since polyvariant analyses tend to be costly, we compute a compact context-independent summary of each function and then use this summary at the call sites of the function. The construction of the function summary is non-trivial and helps in the development of the incremental version. The incremental method consists of a one-time pre-computation step that uses the non-incremental version to slice the program with respect to a fixed default slicing criterion and processes the results further into a canonical form. Presented with an actual slicing criterion, the incremental step involves a low-cost computation that uses the results of the pre-computation to obtain the slice. We have implemented a prototype of the slicer for a pure subset of Scheme, with pairs and lists as the only algebraic data types. Our experiments show that the incremental step of the slicer runs orders of magnitude faster than the non-incremental version. We have also proved the correctness of our incremental algorithm with respect to the non-incremental version.


1. Introduction

Program slicing refers to the class of techniques that delete parts of a given program while preserving certain desired behaviors, for example, memory, state or parts of the output. These behaviors are called slicing criteria. Applications of slicing include debugging (root-cause analysis), program specialization, parallelization and cohesion measurement. In some of these applications, however, a program has to be sliced more than once, each time with a different slicing criterion. In such situations, the existing techniques (weiser84; horwitz88; reps96; Liu:2003; Rodrigues_jucs_12_7; Silva_System_Dependence_Graph) are inefficient, as they typically analyze the program afresh for each criterion. Each round of analysis involves a fixed-point computation on the program text or on some intermediate form of the program, typically the system dependence graph (SDG) in the case of imperative languages. We thus require an incremental approach to slicing that avoids repeated fixed-point computation by reusing some of the information obtained while slicing the same program earlier with a different criterion.

Figure 1. A program in a Scheme-like language and its slices. (a) Program to compute the number of lines and characters in a string. (b) Slice of the program in (a) to compute the number of lines only. (c) Slice of the program in (a) to compute the number of characters only. The parts that are sliced away are marked with a placeholder symbol.

The example from (reps96), shown in Figure 1a, motivates the need for incremental slicing. It is a simple program in a Scheme-like language that takes a string as input and returns a pair consisting of the number of lines and the number of characters in the string. Figure 1b shows the program sliced with respect to the first component of the output pair, namely the number of lines in the string (lc). All references to the count of characters (cc), and the expressions responsible only for computing cc, have been sliced away. The same program can also be sliced to produce only the character count; the resulting program is shown in Figure 1c.

The example illustrates several aspects that are important for an effective slicing procedure. We need the ability to specify a rich set of slicing criteria to select different parts of a possibly complex output structure (the first and second components of the output pair in the example, or, say, every even element of an output list). Also notice that to compute some part of an output structure, all prefixes of the structure have to be computed. Thus, slicing criteria have to be prefix-closed. Finally, the example suggests that certain parts of the program will be present in any slice, irrespective of the specific slicing criterion (the trivial null slicing criterion, under which the whole program is sliced away, is an exception, but it can be treated separately). Thus, when multiple slices of the same program are required, a slicing procedure should strive for efficiency by minimizing recomputation related to the common parts.

In this paper, we consider the problem of incremental slicing for functional programs. We restrict ourselves to tuples and lists as the only algebraic data types. We represent our slicing criteria as regular grammars that represent prefix-closed sets of strings over the selectors 0 (car) and 1 (cdr). The slicing criterion represents the part of the output of the program in which we are interested, and we view it as a demand on the program. We first present a non-incremental slicing method, which propagates the demand represented by the slicing criterion into the program. In this, our method resembles the projection-function-based methods of (reps96; Liu:2003). However, unlike these methods, we do a context-sensitive analysis of function calls. This makes our method precise by avoiding analysis over infeasible interprocedural paths. To avoid the inefficiency of analyzing a function once for each calling context, we create a compact context-independent summary for each function. This summary is then used to step over function calls. As we shall see, it is this context-independent summary that also makes the incremental version possible in our approach.

The incremental version has a one-time pre-computation step in which the program is sliced with respect to a default criterion that is the same for all programs. The result of this step is converted into a set of automata, one for each expression in the program. This completes the pre-computation step. To decide whether an expression is in the slice for a given slicing criterion, we simply intersect the slicing criterion with the automaton corresponding to the expression. If the result is the empty set, the expression can be removed from the slice.

The main contributions of this paper are as follows:

  1. We propose a view of the slicing criterion in terms of a notion called demand (Section 3) and formulate the problem of slicing as one of propagating the demand on the main expression to all the sub-expressions of the program. The analysis for this is precise because it keeps the information for different calling contexts separate. However, it attempts to reduce the attendant inefficiency through the use of function summaries. The difficulty of creating function summaries in a polyvariant analysis, especially when the domain of analysis is unbounded, has been pointed out in (reps96).

  2. Our formulation (Section 4) allows us to derive an incremental version of the slicing algorithm that factors out computations common to all slicing criteria (Section 5) and reuses these computations. To the best of our knowledge, incremental slicing in this form has not been attempted before.

  3. We have proven the correctness of the incremental slicing algorithm with respect to the non-incremental version (Section 5.2).

  4. We have implemented a prototype slicer for a first-order version of Scheme (see the section on experimental results). We have also extended the implementation to higher-order programs (Section 6) by converting such programs to first-order form using firstification techniques (Mitchell:2009), slicing the firstified programs using our slicer, and then mapping the sliced program back to the higher-order version. The implementation demonstrates the expected benefits of incremental slicing: the incremental step is one to four orders of magnitude faster than the non-incremental version.

Figure 2. The syntax of our language

2. The target language—syntax and semantics

Figure 2 shows the syntax of our language. For ease of presentation, we restrict the language to Administrative Normal Form (ANF) (chakravarty03perspective). In this form, the arguments to functions can only be variables. To avoid dealing with scope shadowing, we assume that all variables in a program are distinct. Neither of these two restrictions affects the expressibility of our language. In fact, it is a simple matter to transform the pure subset of first-order Scheme to our language, and to map the sliced program back to Scheme. To refer to an expression e, we may annotate it with a label π, written π:e; however, the label is not part of the language. To keep the description simple, we shall assume that each program has its own unique set of labels. In other words, a label identifies both the program point and the program that contains it.

A program in our language is a collection of function definitions followed by a main expression. Applications consist of functions or operators applied to variables. Expressions are either an if expression, a let expression that evaluates an application and binds the result to a variable, or a return expression. The keyword return is used to mark the end of a function so as to initiate appropriate semantic actions during execution. The distinction between expressions and applications will become important while specifying the semantics of programs.

Figure 3. The semantics of our language

2.1. Semantics

We now present the operational semantics for our language. This is largely borrowed from (asati14; lazyliveness), and we include it here for completeness. We start with the domains used by the semantics.

A value in our language is either a number, the empty list, or a location in the heap. The heap maps each location to a pair of values denoting a cons cell. Heap locations can also be empty. Finally, an environment is a mapping from variables to values.

The dynamic aspects of the semantics, shown in Figure 3, are specified as a state transition system. The semantics of applications and the semantics of expressions are given by two separate judgement forms. The state includes a stack of continuation frames. A frame, which records a variable and the expression to continue with, signifies that if the current function returns a value v, the next expression to be evaluated is the frame's expression, and the environment for this evaluation is updated with the frame's variable bound to v. In the start state, the environment, the stack and the heap are all empty. The program terminates successfully with a result value on reaching the halt state. The semantics also uses a notation for the environment obtained by updating an environment with a new value for a variable, and one for an environment in which each variable of a list has a corresponding value.

3. Demand

We now connect slicing with a notion called demand. A demand on an expression represents the set of paths that the context of the expression may explore in the value of the expression. A demand is represented by a prefix-closed set of strings over {0, 1}. Each string in the demand, called an access path, represents a traversal over the heap. The symbol 0 stands for a single-step traversal over the heap by dereferencing the car field of a cons cell. Similarly, 1 denotes the dereferencing of the cdr field of a cons cell.

As an example, a demand of {ε, 1, 10} on the expression (cons x y) means that its context may need to visit the car field of y in the heap (corresponding to the string 10 in the demand). The example also illustrates why demands are prefix-closed: the car field of y cannot be visited without first visiting the cons cell resulting from the evaluation of (cons x y) (represented by ε) and then the cell corresponding to y (represented by 1). The absence of 0 in the demand also indicates that x is definitely not visited. Notice that to meet the demand {ε, 1, 10} on (cons x y), the access paths {ε, 0} have to be visited starting from y. Thus we can think of (cons x y) as a demand transformer that transforms the demand {ε, 1, 10} into the demand {ε, 0} on y and the empty demand (represented by ∅) on x.

The slicing problem is now modeled as follows. Viewing the slicing criterion (also a set of strings over {0, 1}) as a demand on the main expression, supplied by a context that is external to the program, we compute the demand on each expression in the program. If the demand on an expression turns out to be ∅, the expression does not contribute to the demand on the main expression and can be removed from the slice. Thus, the solution of the slicing problem lies in computing a demand transformer that, given a demand on the main expression, computes a demand environment: a mapping of each expression (represented by its program point π) to its demand. We formulate this computation as an analysis called demand analysis.

We use σ to represent demands and α to represent access paths. Given two access paths α1 and α2, we use the juxtaposition α1α2 to denote their concatenation. We extend this notation to the concatenation of a pair of demands, and even to the concatenation of a symbol with a demand: σ1σ2 denotes the demand {α1α2 | α1 ∈ σ1, α2 ∈ σ2}, and 0σ is a shorthand for {0}σ.
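The notions above (access paths, concatenation of demands, prefix-closure, and a cons expression acting as a demand transformer) can be made concrete with a small sketch. It assumes demands are finite Python sets of strings, with "" standing for ε, "0" for a car step and "1" for a cdr step; the function names are ours and not part of the paper's implementation.

```python
# A minimal sketch: demands as finite Python sets of access paths.
# "" stands for the empty path ε, "0" for a car step and "1" for a cdr step.


def concat(d1: set, d2: set) -> set:
    """Concatenation of demands: {a1 + a2 | a1 in d1, a2 in d2}."""
    return {a1 + a2 for a1 in d1 for a2 in d2}


def is_prefix_closed(demand: set) -> bool:
    """Every proper prefix of every access path must also be in the demand."""
    return all(path[:i] in demand for path in demand for i in range(len(path)))


def cons_transformer(demand: set) -> tuple:
    """Split a demand on (cons x y) into the demands on x and on y."""
    on_x = {p[1:] for p in demand if p.startswith("0")}
    on_y = {p[1:] for p in demand if p.startswith("1")}
    return on_x, on_y


sigma = {"", "1", "10"}                   # the example demand on (cons x y)
assert is_prefix_closed(sigma)
on_x, on_y = cons_transformer(sigma)
print(sorted(on_x), sorted(on_y))         # [] ['', '0']
print(sorted(concat({"0"}, {"", "1"})))   # ['0', '01']
```

Finite sets are enough to replay the example; real slicing criteria may be infinite, which is why the paper represents them as regular grammars.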


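$$\frac{\forall f,\ \forall i,\ \forall \sigma:\quad D(e_f, \sigma, \mathit{DS}) = \mathit{DE},\qquad \mathit{DS}^i_f(\sigma) = \bigcup_{\pi \in \Pi} \mathit{DE}(\pi)}{\mathit{df}_1 \ldots \mathit{df}_k \vdash^{l} \mathit{DS}}\ \text{[demand-summary]}$$
where Π represents all occurrences of the i-th formal parameter of f in its body e_f.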
Figure 4. Demand Analysis

3.1. Demand Analysis

Figure 4 shows the analysis. Given an application and a demand σ, the function D returns a demand environment that maps the expressions of the application to their demands. The third parameter to D represents context-independent summaries of the functions in the program, and will be explained shortly.

Consider the rule for the selector car. If the demand σ on (car x) is ∅, then no part of the value of (car x) is visited and the demand on x is also ∅. However, if σ is non-empty, the context of (car x) has to first dereference the value of x using the car field and then traverse the paths represented by σ. In this case, the demand on x is the set consisting of ε (start at the root of x) and 0σ (dereference using car and then visit the paths in σ). On the other hand, the rule for the constructor cons works as follows: to traverse the path 0α (alternately 1α) starting from the root of (cons x y), one has to traverse the path α starting from x (or y).

Since null? only visits the root of x to examine the constructor, a non-null demand on (null? x) translates to the demand {ε} on x. A similar reasoning also explains the rule for +. Since both x and y evaluate to integers in a well-typed program, a non-null demand on (+ x y) translates to the demand {ε} on both x and y.
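The concrete rules for car, null? and + can be sketched on finite demands as follows. This is an illustration only: demands are assumed to be finite sets of strings over "0" and "1", and cdr_rule is our assumption that cdr mirrors the car rule with the cdr step "1".

```python
# Sketch of the concrete demand rules for car, null? and + described above.
# Demands are finite sets of access paths over "0" (car) and "1" (cdr).


def car_rule(sigma: set) -> set:
    """Demand on x for (car x): ∅ stays ∅, otherwise {ε} ∪ 0σ."""
    if not sigma:
        return set()
    return {""} | {"0" + p for p in sigma}


def cdr_rule(sigma: set) -> set:
    """Assumed mirror image of car_rule, using the cdr step "1"."""
    if not sigma:
        return set()
    return {""} | {"1" + p for p in sigma}


def null_rule(sigma: set) -> set:
    """(null? x) only inspects the root cell of x."""
    return {""} if sigma else set()


def plus_rule(sigma: set) -> tuple:
    """(+ x y): both operands are integers, so only their roots are demanded."""
    root = {""} if sigma else set()
    return root, set(root)


print(sorted(car_rule({"", "1"})))   # ['', '0', '01']
print(null_rule({"", "0", "01"}))    # {''}
print(car_rule(set()))               # set()
```

Each rule maps a prefix-closed demand to a prefix-closed demand, which is the property stated for the analysis as a whole.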

The rule for a function call uses the third parameter, which represents the summaries of all functions in the program. This parameter is a set of context-independent summaries, one for each (function, parameter) pair in the program. The summary for a function f and its i-th parameter represents a transformation that describes how any demand on a call to f is transformed into the demand on its i-th parameter. The set of summaries is specified by the inference rule demand-summary. This rule gives a fixed-point property to be satisfied by the summaries, namely, the demand transformation assumed for each function in the program should be the same as the demand transformation calculated from the body of the function. Given the summaries, the rule for the function call is straightforward. Notice that the demand environment for each application also includes the demand on the application itself, apart from its sub-expressions. Operationally, the rule demand-summary is converted into a grammar (Section 4) that is parameterized with respect to a placeholder terminal representing a symbolic demand. The language generated by this grammar is the least solution satisfying the rule. The least solution corresponds to the most precise slice.

We finally discuss the rules for expressions. The rules for if and return are straightforward. The rule for let first calculates the demand environment DE of the let-body. The demand on the bound application is the union of the demands on all occurrences of the bound variable in the body. It is easy to see, by examining the rules, that the analysis results in demands that are prefix-closed. More formally, let DE be the demand environment resulting from the analysis of a program for a demand σ. Then, for every expression π in the program, DE(π) is prefix-closed.

4. Computing Context-Independent Function Summaries

A slicing method used for, say, debugging needs to be as precise as possible to avoid false errors. We therefore choose to analyze each function call separately with respect to its calling context. We now show how to obtain a context-independent summary for each function definition from the rule demand-summary. Recall that this summary is a function that transforms any demand on the result of a call to demands on the arguments. A convenient way of doing this is to express how a symbolic demand is transformed by the body of a function. Summarizing the function in this way has two benefits. It helps us to propagate a demand across several calls to a function without analyzing its body each time. Even more importantly, it is the key to our incremental slicing method.

However, notice that the rules of demand analysis require us to perform operations that cannot be done on a symbolic demand. The cons rule, for example, is defined in terms of the set of strings α such that 0α is in σ. Clearly, this requires us to know the strings in σ. Similarly, the car rule (like the rules for null? and +) requires us to know whether σ is ∅. The way out is to treat these operations symbolically as well. For this, we introduce three new symbols: the barred selectors 0̄ and 1̄, and a third symbol that tests for a non-null demand. If 0 represents selection using car, 0̄ is intended to represent a use as the left argument of cons. Thus 0̄0 should reduce to the empty string ε. Similarly, the third symbol represents the symbolic transformation of any non-null demand to {ε} and of the null demand to itself. These transformations are defined, and also made deterministic, through a simplification function.

Notice that 0̄ strips the leading 0 from the string following it, as required by the rule for cons. Similarly, the third symbol examines the string following it and replaces it by ∅ or ε; this is required by several rules. The rules for  and  in terms of the new symbols are:

and the rule for  is:

The rules for the remaining operators are modified similarly. Now the demand summaries can be obtained symbolically, with the new symbols acting as markers for the operations that should be performed on the string following them. When the final demand environments are obtained, with the given slicing criterion acting as a concrete demand on the main expression, the new symbols are eliminated using the simplification function.
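A right-to-left simplification of a single symbolic string can be sketched as below. This is a sketch under assumptions: the token names "0b", "1b" and "nn" (for 0̄, 1̄ and the third, non-null-test symbol) are hypothetical, and the reduction of a barred symbol followed by a different selector, or by nothing, to ∅ is our reading of the cons rule. The last lines check that the symbolic cons rule 0̄σ, once simplified, agrees with the concrete rule {α | 0α ∈ σ}.

```python
from typing import Iterable, List, Optional, Set

# Hypothetical token names: "0"/"1" are the selectors, "0b"/"1b" the barred
# symbols, and "nn" stands in for the third symbol (the non-null test).
BAR = {"0b": "0", "1b": "1"}


def simplify(tokens: List[str]) -> Optional[str]:
    """Simplify one symbolic string from the right.

    Returns the resulting access path, or None if the string denotes ∅.
    """
    suffix: List[str] = []                  # simplified suffix: selectors only
    for tok in reversed(tokens):
        if tok in ("0", "1"):
            suffix.insert(0, tok)
        elif tok in BAR:
            # A barred symbol must cancel the matching selector after it.
            if suffix and suffix[0] == BAR[tok]:
                suffix.pop(0)
            else:
                return None                 # mismatch or nothing to cancel
        elif tok == "nn":
            suffix = []                     # a non-null suffix collapses to ε
        else:
            raise ValueError(f"unknown token {tok!r}")
    return "".join(suffix)


def simplify_set(strings: Iterable[List[str]]) -> Set[str]:
    """Simplify every string and drop those that denote ∅."""
    results = (simplify(s) for s in strings)
    return {r for r in results if r is not None}


# The symbolic cons rule: the demand on x is 0̄σ. After simplification it
# should coincide with the concrete rule {α | 0α ∈ σ}.
sigma = {"", "0", "01", "1"}
symbolic = simplify_set(["0b"] + list(p) for p in sigma)
concrete = {p[1:] for p in sigma if p.startswith("0")}
assert symbolic == concrete == {"", "1"}
```

Processing from the right matters because an ∅ produced anywhere in the suffix makes the whole string ∅, whatever symbols precede it.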

4.1. Finding closed-forms for the summaries

Recall that the summary of f for its i-th argument is a function that describes how the demand on a call to f translates to a demand on its i-th argument. A straightforward translation of the demand-summary rule to obtain this summary is as follows. For a symbolic demand σ, compute the demand environment in e_f, the body of f. From this, calculate the demand on the i-th argument x_i of f, which is the union of the demands on all occurrences of x_i in the body of f. The summary applied to σ is equated to this demand. Since the body may contain other calls, the demand analysis within e_f makes use of the summaries in turn, so our equations may be recursive. On the whole, the summaries correspond to a set of equations, one for each argument of each function. The reader can verify that in our running example the summaries are:

As noted in (reps96), the main difficulty in obtaining a convenient function summary is to find a closed-form description of the summary instead of its recursive specification. Our solution to the problem lies in the following observation: since the demand rules always prefix symbols to the argument demand σ, we can write the summary applied to σ as Lσ, where L is a set of strings over the alphabet consisting of the selectors and the new symbols. The modified equations after doing this substitution will be:

Thus, we have,
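To illustrate how the substitution removes σ from the recursion, here is a worked derivation on a hypothetical summary equation (not taken from the running example). DS is a name we introduce for the summary, and the equation is chosen only to show the mechanics; the final step uses Arden's rule for the least solution of a right-linear language equation.

```latex
\begin{align*}
  \mathit{DS}(\sigma) &= \bar{0}\,\sigma \;\cup\; \mathit{DS}(\bar{1}\,\sigma)
    && \text{(hypothetical recursive summary)}\\
  L\,\sigma &= \bar{0}\,\sigma \;\cup\; L\,\bar{1}\,\sigma
    && \text{(substitute } \mathit{DS}(\sigma) = L\,\sigma)\\
  L &= \{\bar{0}\} \;\cup\; L\,\{\bar{1}\}
    && \text{(sufficient: the common suffix } \sigma \text{ drops out)}\\
  L &= \bar{0}\,\bar{1}^{*}
    && \text{(least solution, Arden's rule: } X = XA \cup B \Rightarrow X = BA^{*})
\end{align*}
```

Solving by hand like this only works in simple cases; in general the equations are instead turned into a grammar over the same symbols.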

4.2. Computing the demand environment for the function bodies

The demand environment for a function body is calculated with respect to a concrete demand. To start with, we consider the main expression as the body of a function, say main, whose demand is the given slicing criterion. Further, the concrete demand on a function f is the union of the demands at all call sites of f. The demand environment of the body of f is calculated using this concrete demand. If there is a call to a function g inside the body, the demand summary of g is used to propagate the demand across the call. Continuing with our example, the union of the demands on the three calls to the function of Figure 1 is the slicing criterion. Therefore, the demand on the expression at program point π is given by

(1)

At the end of this step, we have (i) a set of equations defining the demand summaries for each argument of each function, (ii) equations specifying the demand at each program point π, and (iii) an equation for the concrete demand on the body of each function.

4.3. Converting analysis equations to grammars

Notice that the equations for the summaries are still recursive. However, Equation 1 can also be viewed as a grammar in which the selectors and the new symbols are terminals and the unknown demands and summaries are non-terminals. Thus, finding the solution to the set of equations generated by the demand analysis reduces to finding the language generated by the corresponding grammar. The original equations can now be rewritten as grammar rules as shown below:

(2)

Thus, the question of whether the expression at π can be sliced for the slicing criterion σ is equivalent to asking whether the simplified language of the demand on π is empty. In fact, the simplification process itself can be captured by adding the following set of five unrestricted productions, together with one additional production, to the grammar generated earlier.

The set of five unrestricted productions shown is independent of the program being sliced and of the slicing criterion. The symbol $ marks the end of a sentence and is required to capture the rule correctly.

We now generalize. Assume that π is the program point associated with an expression e. Given a slicing criterion σ, let G_π^σ denote the grammar obtained for π: its terminals are the selectors and the new symbols, its context-free productions define the demand on π (as illustrated by Equation 2), and its non-terminals additionally include a special non-terminal. As mentioned earlier, given a slicing criterion σ, the question of whether the expression e can be sliced out of the containing program is equivalent to asking whether the language of G_π^σ is empty. We shall now show that this problem is undecidable.

Theorem 4.1.

Given a program point π and a slicing criterion σ, the problem of whether the language of G_π^σ is empty is undecidable.

Proof. Recall that the set of demands on an expression, as obtained by our analysis, is prefix-closed. Since the grammar always includes the additional production introduced above, the language of G_π^σ is non-empty if and only if it contains $ (i.e., the empty string followed by the $ symbol). We therefore have to show that the equivalent problem of whether $ belongs to the language of G_π^σ is undecidable.

Given a Turing machine M and a string w, the proof constructs a grammar G with the property that M halts on w if and only if G accepts $. Notice that G is a set of context-free productions over the same terminal set and may not necessarily be obtainable from the demand analysis of a program. However, G can be used to construct a program whose demand analysis results in a grammar that can be used instead of G to replay the earlier proof. The details can be found in Lemmas B.2 and B.3 of (lazyliveness). ∎

Figure 5. (a) and (b) show the simplification of the automaton for each of the two slicing criteria of the running example. (c) shows the canonical automaton and the corresponding completing automaton.

To get around the problem of undecidability, we use the technique of Mohri and Nederhof (mohri00regular) to over-approximate the grammar by a strongly regular grammar. We call the NFA corresponding to this grammar the Mohri-Nederhof automaton. The simplification rules can be applied to this automaton without any loss of precision. The details of the simplification process are in (karkare07liveness).

For our running example, the grammar after demand analysis is already regular, and thus remains unchanged by the Mohri-Nederhof transformation. The automata in Figures 5(a) and 5(b) correspond to the two slicing criteria and illustrate the simplification of the corresponding Mohri-Nederhof automata. It can be seen that, for one of the two criteria, the language of the simplified automaton is empty, and hence the corresponding expression can be sliced away. A drawback of this method is that, whenever the slicing criterion changes, the entire process of grammar generation, Mohri-Nederhof approximation and simplification has to be repeated. This is likely to be inefficient for large programs.

5. Incremental Slicing

We now present an incremental algorithm which avoids the repetition of computation when the same program is sliced with different criteria. This can be done by pre-computing the part of the slice computation that is independent of the slicing criterion. The pre-computed part can then be used efficiently to slice the program for a given slicing criterion.

In general, the pre-computation consists of three steps: (i) computing the demand at each expression for the fixed slicing criterion and applying the Mohri-Nederhof procedure to yield an automaton; (ii) a step called canonicalization, which applies the simplification rules to this automaton until the 0̄ and 1̄ symbols in the strings accepted by the resulting automaton occur only at the end; and, from this, (iii) constructing an automaton called the completing automaton. For the running example, the canonicalized and the completing automata are shown in Figure 5(c). We explain these steps now.

As stated earlier, the Mohri-Nederhof automaton, after some simplifications, gives the first automaton shown in Figure 5(c) (the canonicalized automaton), which we shall denote A_π. In the running example, all strings accepted by A_π end with the same barred symbol; call it b̄, for a selector b ∈ {0, 1}. It is clear that if A_π is concatenated with a slicing criterion containing a string that starts with b, the result, after simplification, will be non-empty. We call a string that starts with b a completing string for A_π. In this case, detecting a completing string was easy because all strings accepted by A_π end with b̄. Now consider the second automaton in Figure 5(c), called the completing automaton, which recognizes the language b(0 + 1)*. This automaton recognizes all completing strings for A_π and nothing else. Thus, for an arbitrary slicing criterion σ, it suffices to intersect σ with the completing automaton to decide whether the expression at π will be in the slice. In fact, it is enough for the completing automaton to recognize just the language {b} instead of b(0 + 1)*. The reason is that any slicing criterion σ is prefix-closed, and therefore σ ∩ b(0 + 1)* is empty if and only if σ ∩ {b} is empty. Our incremental algorithm generalizes this reasoning.

5.1. Completing Automaton and Slicing

To construct the completing automaton for an expression π, we saw that it is convenient to simplify the automaton to the extent that all accepted strings, after simplification, have 0̄ and 1̄ symbols only at the end. We now give a set of rules for this canonical simplification.

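Canonical simplification differs from the simplification function in that it accumulates a continuous run of 0̄ and 1̄ symbols at the end of a string. Notice that canonical simplification, like the simplification function, simplifies its input string from the right. Here is an example of simplification: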

In contrast, the simplification of the same string using the simplification function gives:

Canonical simplification satisfies two important properties:

Property 1.

The result of always has the form . Further, if , then .

Property 2.

subsumes , i.e., .
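Both properties can be illustrated with a small sketch over the alphabet {0, 1, 0̄, 1̄}. Assumptions: "0b" and "1b" are hypothetical token names for the barred symbols, the third (non-null test) symbol is left out, and the agrees check (canonicalizing first, then appending a criterion string and simplifying, gives the same result as simplifying directly) is our reading of how Property 2 is used in Section 5.2, not a restatement of it.

```python
from typing import List, Optional

# Hypothetical token names: "0"/"1" are selectors, "0b"/"1b" barred symbols.
# The third (non-null test) symbol is left out of this sketch.
BAR = {"0b": "0", "1b": "1"}


def canonicalize(tokens: List[str]) -> Optional[List[str]]:
    """Right-to-left simplification that keeps a trailing run of barred symbols.

    The result has the shape (selectors)*(barred)*, or is None for ∅.
    """
    result: List[str] = []
    for tok in reversed(tokens):
        if tok in ("0", "1"):
            result.insert(0, tok)
        elif tok in BAR:
            if result and result[0] == BAR[tok]:
                result.pop(0)               # b̄ b reduces to ε
            elif not result or result[0] in BAR:
                result.insert(0, tok)       # nothing to cancel yet: accumulate
            else:
                return None                 # b̄ c with c != b gives ∅
        else:
            raise ValueError(f"unknown token {tok!r}")
    return result


def simplify(tokens: List[str]) -> Optional[List[str]]:
    """Full simplification: a barred symbol with nothing to cancel gives ∅."""
    result: List[str] = []
    for tok in reversed(tokens):
        if tok in BAR:
            if not result or result[0] != BAR[tok]:
                return None
            result.pop(0)
        else:
            result.insert(0, tok)
    return result


def agrees(prefix: List[str], criterion: List[str]) -> bool:
    """Compare canonicalize-then-append-then-simplify with direct simplify."""
    canon = canonicalize(prefix)
    via_canon = None if canon is None else simplify(canon + criterion)
    return via_canon == simplify(prefix + criterion)


print(canonicalize(["1b", "1", "0b"]))        # ['0b']
print(simplify(["1b", "1", "0b"]))            # None
print(agrees(["1b", "1", "0b"], ["0", "1"]))  # True
```

The string 1̄10̄ shows the difference between the two functions: canonicalization keeps the trailing 0̄ for a later criterion to cancel, whereas full simplification, with nothing following, discards the string.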

Note that while we have defined canonicalization over a language, the actual canonicalization takes place over an automaton, specifically the automaton obtained after the Mohri-Nederhof transformation. The function createCompletingAutomaton in Algorithm 1 takes A_π, the canonicalized Mohri-Nederhof automaton for the fixed default slicing criterion, as input, and constructs the completing automaton for π.

Function createCompletingAutomaton(A_π)
      Data: A_π, the canonicalized automaton
      Result: the completing automaton for A_π
      /* Reverse the ‘‘bar’’ transitions: directions as well as labels */
      foreach (transition q --0̄--> q' in A_π) do
            add transition q' --0--> q
      foreach (transition q --1̄--> q' in A_π) do
            add transition q' --1--> q
      new state q_0 /* start state of the completing automaton */
      foreach (state q ∈ F) do /* F: the final states of A_π */
            add transition q_0 --ε--> q
      return the completing automaton

Function inSlice(e, σ)
      Data: expression e, slicing criterion σ
      Result: Decides whether e should be retained in the slice
      return (σ ∩ L(completing automaton for e) ≠ ∅)

Algorithm 1. Functions to create the completing automaton and the slicing function.

Recall that the strings recognized by A_π are of the form (0 + 1)*(0̄ + 1̄)*. The algorithm first computes the set of states reachable from the start state using only edges labeled 0 or 1. This set is called the frontier set. It then reverses the automaton, keeping only the edges labeled 0̄ or 1̄ (whose labels become 0 and 1 respectively), and adds a new start state with ε-edges to the old final states. Finally, all states in the frontier set are marked as final states. Since A_π is independent of the slicing criterion, the completing automaton is also independent of the slicing criterion and needs to be computed only once. It can be stored and reused whenever the program needs to be sliced. To decide whether π can be sliced out, the function inSlice described in Algorithm 1 just checks whether the intersection of the slicing criterion with the completing automaton is empty.
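The construction and the slicing test can be sketched in Python under stated assumptions: automata are plain dicts (a hypothetical representation), the canonicalized automaton has no ε-edges, and the slicing criterion is given as an ε-free NFA over 0 and 1. The emptiness check in in_slice is a standard product-automaton reachability search, which is one way to implement the test in inSlice and not necessarily the authors' implementation.

```python
from collections import deque

# Hypothetical representation: an NFA is a dict with a "start" state, a set of
# "finals", and a list of (src, label, dst) triples; label None marks an ε-edge.
# Labels: "0", "1" for selectors and "0b", "1b" for barred symbols.


def frontier(nfa):
    """States reachable from the start state using only selector edges."""
    seen = {nfa["start"]}
    work = deque(seen)
    while work:
        q = work.popleft()
        for src, lab, dst in nfa["trans"]:
            if src == q and lab in ("0", "1") and dst not in seen:
                seen.add(dst)
                work.append(dst)
    return seen


def create_completing_automaton(canon):
    """Reverse the barred edges (b̄ becomes b), drop selector edges, add a
    fresh start state with ε-edges to the old finals, and make the frontier
    states final. Assumes `canon` itself has no ε-edges."""
    start = ("fresh-start",)            # hypothetical fresh state name
    trans = [(dst, lab[0], src) for src, lab, dst in canon["trans"]
             if lab in ("0b", "1b")]
    trans += [(start, None, f) for f in canon["finals"]]
    return {"start": start, "finals": frontier(canon), "trans": trans}


def eps_closure(nfa, states):
    """All states reachable from `states` through ε-edges alone."""
    closure, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for src, lab, dst in nfa["trans"]:
            if src == q and lab is None and dst not in closure:
                closure.add(dst)
                stack.append(dst)
    return closure


def in_slice(completing, criterion):
    """True iff L(criterion) ∩ L(completing) is non-empty (product search).
    `criterion` must be an ε-free NFA over "0"/"1"."""
    init = {(p, criterion["start"])
            for p in eps_closure(completing, {completing["start"]})}
    seen, work = set(init), deque(init)
    while work:
        p, c = work.popleft()
        if p in completing["finals"] and c in criterion["finals"]:
            return True
        for s1, lab, d1 in completing["trans"]:
            if s1 != p or lab is None:
                continue
            for s2, lab2, d2 in criterion["trans"]:
                if s2 == c and lab2 == lab:
                    for p2 in eps_closure(completing, {d1}):
                        if (p2, d2) not in seen:
                            seen.add((p2, d2))
                            work.append((p2, d2))
    return False


# Toy canonicalized automaton accepting the single string 1 0̄.
canon = {"start": 0, "finals": {2}, "trans": [(0, "1", 1), (1, "0b", 2)]}
completing = create_completing_automaton(canon)
crit_0 = {"start": "a", "finals": {"a", "b"}, "trans": [("a", "0", "b")]}  # {ε, 0}
crit_1 = {"start": "a", "finals": {"a", "b"}, "trans": [("a", "1", "b")]}  # {ε, 1}
print(in_slice(completing, crit_0), in_slice(completing, crit_1))  # True False
```

Because the completing automaton depends only on the canonicalized automaton, it can be built once per expression; each new criterion then costs only the product search.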

5.2. Correctness of Incremental Slicing

We now show that the incremental slicing algorithm is correct. Recall that we use the following notation: (i) G_π^σ is the grammar generated by demand analysis (Figure 4) for an expression π in the program of interest when the slicing criterion is σ, (ii) A_π is the automaton obtained from the grammar for the fixed default criterion after the Mohri-Nederhof transformation and canonicalization, and (iii) the completing automaton for π is the one constructed from A_π by Algorithm 1. We first show that the result of the demand analysis for an arbitrary slicing criterion σ can be decomposed as the concatenation of the demand obtained for the fixed slicing criterion and σ itself.

Lemma 5.1.

For all expressions and slicing criteria , = σ.

Proof.

The proof is by induction on the structure of . Observe that all the rules of the demand analysis (Figure 4) add symbols only as prefixes to the incoming demand. Hence, the slicing criteria will always appear as a suffix of any string that is produced by the grammar. Thus, any grammar can be decomposed as for some language . Substituting for , we get . Thus = σ. ∎

Given a string over , we use the notation to stand for the reverse of in which all occurrences of  are replaced by  and  replaced by . Clearly, .

We next prove the completeness and minimality of Aπ.

Lemma 5.2.

Proof.

We first prove . Let the string . Then by Lemma 5.1, . By Property 2, this also means that . Since strings in are of the form (Property 1), this means that there is a string such that and , and . Thus can be split into two strings and , such that . Therefore . From the construction of Aπ we have and . Thus, .

Conversely, for the proof of , we assume that a string . From the construction of Aπ we have strings such that , , , is and . Thus, . Thus, is non-empty and . ∎

We now prove our main result: our slicing algorithm, represented by inSlice (Algorithm 1), returns true if and only if the simplified language of G_π^σ is non-empty.

Theorem 5.3.

Proof.

We first prove the forward implication. Let . From Lemma 5.1, . From Property 2, . Thus, there are strings such that . Further in turn can be decomposed as such that and . We also have . Thus is a prefix of .

From the construction of , we know . Further, is a prefix of and , from the prefix closed property of we have . This implies and thus returns true.

Conversely, if is true, then . In particular, . Thus, from Lemma 5.2 we have . Further, since we have .∎

6. Extension to higher-order functions
