A schema represents the statement structure of a program by replacing real functions and predicates by symbols representing them.
A schema thus defines a whole class of programs which all have the same structure. A schema is linear if it does not contain more than one occurrence of the same function or predicate symbol.
As an example, Figure 1 gives a schema, and Figure 2 shows one of the programs obtainable from the schema of Figure 1 by interpreting its function and predicate symbols.
The subject of schema theory is connected with that of program transformation and was originally motivated by the wish to compile programs effectively. Thus an important problem in schema theory is that of establishing whether two schemas are equivalent; that is, whether they always have the same termination behaviour, and give the same final value for every variable, given any initial state and any interpretation of function and predicate symbols. In Section 1.2, the history of this problem is discussed.
Schema theory is also relevant to program slicing, and this is the motivation for the main results of this paper. We define a quotient of a schema to be any schema obtained by deleting zero or more statements from . A quotient of is non-trivial if it is distinct from . Thus a quotient of a schema is not required to satisfy any semantic condition; it is defined purely syntactically. The field of program slicing is concerned with computing a quotient of a program which preserves part of the behaviour of the original program. Program slicing is used in program comprehension [2, 3], software maintenance [4, 5, 6, 7], and debugging [8, 9, 10, 11].
All program slicing algorithms take account of the structural properties of a program such as control dependence and data dependence rather than the semantics of its functions and predicates, and thus work, in effect, with linear program schemas. There are two main forms of program slicing: static and dynamic.
In static program slicing, only the program itself is used to construct a slice. Most static slicing algorithms are based on Weiser’s algorithm, which uses the data and control dependence relations of the program in order to compute the set of statements which the slice retains. An end-slice of a program with respect to a variable is a slice that always returns the same final value for as the original program, when executed from the same input. It has been proved that Weiser’s algorithm gives minimal static end-slices for linear, free, liberal program schemas. This result has recently been strengthened by allowing function-linear schemas, in which only predicate symbols are required to be non-repeating.
In dynamic program slicing, a path through the program is also used as input. Dynamic slices of programs may be smaller than static slices, since they are only required to preserve behaviour in cases where the original program follows a particular path. As originally formulated by Korel and Laski , a dynamic slice of a program is defined by three parameters besides , namely a variable set , an initial input state and an integer . The slice with respect to these parameters is required to follow the same path as up to the th statement (with statements not lying in the slice deleted from the path through the slice) and give the same value for each element of as after the th statement after execution from the initial state . Many dynamic slicing algorithms have been written [16, 17, 18, 19, 20, 15, 21, 22]. Most of these compute a slice using the data and control dependence relations along the given path through the original program. This produces a correct slice, and uses polynomial time, but need not give a minimal or even non-trivial slice even where one exists.
Our definition of a path-faithful dynamic slice (PFDS) for a linear schema comprises two parameters besides , namely a path through and a variable set, but not an initial state. This definition is analogous to that of Korel and Laski, since the initial state included in their parameter set is used solely in order to compute a path through the program in linear schema-based slicing algorithms. We prove, in effect, that it is decidable in polynomial time whether a particular quotient of a program is a dynamic slice in the sense of Korel and Laski, and that the problem of establishing whether a program has a non-trivial path-faithful dynamic slice is intractable, unless P = NP. This shows that, unless P = NP, there does not exist a tractable dynamic slicing algorithm that produces correct slices and always gives a non-trivial slice of a program where one exists.
The requirement of Korel and Laski that the path through the slice be path-faithful may seem unnecessarily strong. Therefore we define a more general dynamic slice (DS), in which the sequence of functions and predicates through which the path through the slice passes is a subsequence of that for the path through the original schema, but the path through the slice must still pass the same number of times through the program point at the end of the original path. For this less restrictive definition, we prove that it is decidable in Co-NP time whether a particular slice of a program is a dynamic slice, and the problem of establishing whether a program has a non-trivial dynamic slice is NP-hard.
We also give an example to prove that unique minimal dynamic slices (whether or not path-faithful) of a linear schema do not always exist.
The results of this paper have several practical ramifications. First, we prove that the problem of deciding whether a linear schema has a non-trivial dynamic slice is computationally hard and clearly this result must also hold for programs. In addition, since this decision problem is computationally hard, the problem of producing minimal dynamic slices must also be computationally hard. Second, we define a new notion of a dynamic slice that places strictly weaker constraints on the slice than those traditionally used and thus can lead to smaller dynamic slices. In Section 4 we explain why these (smaller) dynamic slices can be appropriate, motivating this through a problem in program testing. Naturally, this weaker notion of a dynamic slice is also directly applicable to programs. Finally, we prove that minimal dynamic slices need not be unique and this has consequences when designing dynamic slicing algorithms since it tells us that algorithms that identify and then delete one statement at a time can lead to suboptimal dynamic slices.
It should be noted that much theoretical work on program slicing and program analysis, including Müller-Olm’s study of dependence analysis of parallel programs, and work on deciding validity of relations between variables at given program points [24, 25], only considers programs in which branching is treated as non-deterministic. Such work is thus more ‘approximate’ than our own in this respect, since we take into account control dependence as part of the program structure.
1.1 Different classes of schemas
Many subclasses of schemas have been defined:
- Structured schemas,
in which goto commands are forbidden, and thus loops must be constructed using while statements. All schemas considered in this paper are structured.
- Linear schemas,
in which each function and predicate symbol occurs at most once.
- Free schemas,
where all paths are executable under some interpretation.
- Conservative schemas,
in which every assignment is of the form
- Liberal schemas,
in which two assignments along any executable path can always be made to assign distinct values to their respective variables by a suitable choice of domain.
It can be easily shown that all conservative schemas are liberal.
Paterson gave a proof that it is decidable whether a schema is both liberal and free; and since he also gave an algorithm transforming a schema into a schema such that is both liberal and free if and only if is liberal, it is clearly decidable whether a schema is liberal. However, he also proved, using a reduction from the Post Correspondence Problem, that it is not decidable whether a schema is free. It is an open problem whether freeness is decidable for the class of linear schemas.
1.2 Previous results on the decidability of schema equivalence
Most previous research on schemas has focused on schema equivalence. All results on the decidability of equivalence of schemas are either negative or confined to very restrictive classes of schemas. In particular Paterson proved that equivalence is undecidable for the class of all schemas containing at least two variables, using a reduction from the halting problem for Turing machines. Ashcroft and Manna showed that an arbitrary schema, which may include goto commands, can be effectively transformed into an equivalent structured schema, provided that statements such as are permitted; hence Paterson’s result shows that any class of schemas for which equivalence can be decided must not contain this class of schemas. Thus in order to get positive results on this problem, it is clearly necessary to define the relevant classes of schema with great care.
Positive results on the decidability of equivalence of schemas include the following. In an early result in schema theory, Ianov introduced a restrictive class of schemas, the Ianov schemas, for which equivalence is decidable. This problem was later shown to be NP-complete [30, 31]. Ianov schemas are characterised by being monadic (that is, they contain only a single variable) and having only unary function symbols; hence Ianov schemas are conservative.
Paterson proved that equivalence is decidable for a class of schemas called progressive schemas, in which every assignment references the variable assigned by the previous assignment along every legal path.
Sabelfeld proved that equivalence is decidable for another class of schemas called through schemas. A through schema satisfies two conditions: firstly, that on every path from an accessible predicate to a predicate which does not pass through another predicate, and every variable referenced by , there is a variable referenced by which defines a term containing the term defined by ; and secondly, that distinct variables referenced by a predicate can be made to define distinct terms under some interpretation.
1.3 Organisation of the paper
In Section 2 we give basic definitions of schemas. In Section 3 we define path-faithful dynamic slices and in Section 4 we define general dynamic slices. In Section 5 we give an example to prove that unique minimal dynamic slices need not exist. In Section 6 we prove complexity bounds for problems concerning the existence of dynamic slices. Lastly, in Section 7, we discuss further directions for research in this area.
2 Basic Definitions of Schemas
Throughout this paper, , , and denote fixed infinite sets of function symbols, predicate symbols, variables and labels respectively. A symbol means an element of in this paper. For example, the schema in Figure 1 has function set , predicate set and variable set . We assume a function
The arity of a symbol is the number of arguments referenced by , for example in the schema in Figure 1 the function has arity one, the function has arity zero, and has arity one.
Note that in the case when the arity of a function symbol is zero, may be thought of as a constant.
The set of terms is defined as follows:
each variable is a term,
if is of arity and are terms then is a term.
For example, in the schema in Figure 1, the variable takes the value (term) after the first assignment is executed and if we take the true branch then the variable ends with the value (term) .
We refer to a tuple , where each is a term, as a vector term. We call a predicate term if and the number of components of the vector term is .
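The recursive definition of terms can be illustrated with a minimal sketch in which a term is either a variable name or a nested tuple headed by a function symbol. The arity table `ARITY`, the variable set `VARIABLES` and every symbol name in them are hypothetical choices made for this illustration and are not taken from Figure 1.

```python
# Hypothetical symbols for illustration only: f is unary, g is a constant, h is binary.
ARITY = {"f": 1, "g": 0, "h": 2}
VARIABLES = {"u", "v", "w"}


def is_term(t):
    """Return True if t is a well-formed term over ARITY and VARIABLES."""
    if isinstance(t, str):
        # Base case: each variable is a term.
        return t in VARIABLES
    if isinstance(t, tuple) and t and t[0] in ARITY:
        # Inductive case: a function symbol applied to the right number of terms.
        symbol, *args = t
        return len(args) == ARITY[symbol] and all(is_term(a) for a in args)
    return False


def show(t):
    """Render a term in the usual f(t1, ..., tn) notation."""
    if isinstance(t, str):
        return t
    return f"{t[0]}({', '.join(show(a) for a in t[1:])})"
```

With this encoding a symbol of arity zero, such as `("g",)`, behaves as a constant, and a vector term is simply a tuple of terms.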
Schemas are defined recursively as follows.
is a schema.
Any label is a schema.
An assignment for a variable , a function symbol and an -tuple of variables, where is the arity of , is a schema.
If and are schemas then is a schema.
If and are schemas, is a predicate symbol and is an -tuple of variables, where is the arity of , then is a schema.
If is a schema, is a predicate symbol and is an -tuple of variables, where is the arity of , then the schema is a schema.
If no function or predicate symbol, or label, occurs more than once in a schema , we say that is linear. If a schema does not contain any predicate symbols, then we say it is predicate-free. If a linear schema contains a subschema , then we refer to and as the -part and -part respectively of in . For example in the schema in Figure 1 the predicate has -part and -part . If a linear schema contains a subschema , then we refer to as the body of in .
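Linearity is a purely syntactic property and can be checked by counting symbol occurrences. The sketch below assumes a hypothetical encoding of schemas as nested tuples, namely `("skip",)`, `("label", l)`, `("assign", v, f, args)`, `("seq", S1, S2)`, `("if", p, args, S1, S2)` and `("while", p, args, S)`; this encoding is an assumption of the example, not notation from the paper.

```python
from collections import Counter


def symbols(schema):
    """Yield every occurrence of a function symbol, predicate symbol or label."""
    kind = schema[0]
    if kind == "skip":
        return
    elif kind == "label":
        yield ("label", schema[1])
    elif kind == "assign":
        yield ("func", schema[2])
    elif kind == "seq":
        yield from symbols(schema[1])
        yield from symbols(schema[2])
    elif kind == "if":
        yield ("pred", schema[1])
        yield from symbols(schema[3])
        yield from symbols(schema[4])
    elif kind == "while":
        yield ("pred", schema[1])
        yield from symbols(schema[3])
    else:
        raise ValueError(f"unknown schema node: {kind}")


def is_linear(schema):
    """A schema is linear if no symbol or label occurs more than once."""
    return all(count == 1 for count in Counter(symbols(schema)).values())
```

Symbols are tagged with their kind so that a label and a function symbol sharing a name are not confused.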
Quotients of schemas are defined recursively as follows; is a quotient of every schema; if is a quotient of then is a quotient of and is a quotient of ; if is a quotient of , then is a quotient of ; and if and are quotients of schemas and respectively, then is a quotient of . A quotient of a schema is said to be non-trivial if .
Consider the schema in Figure 1. Here we can obtain a quotient by replacing the first statement by or by replacing the if statement by . It is also possible to replace either or both parts of the if statement by or any combination of these steps.
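The set of quotients of a small schema can be enumerated directly. The following sketch reads a quotient as the result of deleting zero or more statements, encodes schemas as nested tuples (a hypothetical encoding chosen for this example), and assumes that skip acts as the identity for sequential composition. The example schema is illustrative and is not claimed to be the schema of Figure 1.

```python
from itertools import product

SKIP = ("skip",)


def seq(a, b):
    """Sequential composition, treating skip as the identity (an assumption)."""
    if a == SKIP:
        return b
    if b == SKIP:
        return a
    return ("seq", a, b)


def quotients(s):
    """Return the set of quotients of s, including skip and s itself."""
    kind = s[0]
    result = {SKIP}  # skip is a quotient of every schema
    if kind in ("assign", "label"):
        result.add(s)
    elif kind == "seq":
        result |= {seq(a, b) for a, b in product(quotients(s[1]), quotients(s[2]))}
    elif kind == "if":
        _, p, args, then_part, else_part = s
        result |= {("if", p, args, a, b)
                   for a, b in product(quotients(then_part), quotients(else_part))}
    elif kind == "while":
        _, p, args, body = s
        result |= {("while", p, args, b) for b in quotients(body)}
    return result


# Hypothetical schema: u := h(); if p(w) then v := f(u) else v := g()
S = ("seq", ("assign", "u", "h", ()),
     ("if", "p", ("w",), ("assign", "v", "f", ("u",)), ("assign", "v", "g", ())))
non_trivial = quotients(S) - {S}
```

The number of quotients grows exponentially with the number of statements, so exhaustive enumeration of this kind is only practical for very small schemas.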
2.1 Paths through a schema
We will express the semantics of schemas using paths through them; therefore the definition of a path through a schema has to include the variables assigned or referenced by successive function or predicate symbols.
The set of prefixes of a word (that is, a sequence) over an alphabet is denoted by . For example, if over the alphabet , then the set consists of the words and the empty word. More generally, if is a set of words, then we define .
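As a small illustration of the prefix operator, words can be modelled as tuples of letters. The function names `prefixes` and `prefixes_of_set` are hypothetical names chosen for this sketch.

```python
def prefixes(word):
    """Return all prefixes of a word, including the empty word and the word itself."""
    return {tuple(word[:i]) for i in range(len(word) + 1)}


def prefixes_of_set(words):
    """Return the union of the prefix sets of every word in a set of words."""
    return set().union(*(prefixes(w) for w in words))
```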
For each schema there is an associated alphabet consisting of all elements of and the set of letters of the form for assignments in and for , where or occurs in . For example, the schema in Figure 1 has no labels and has alphabet
. The set of terminating paths through , is defined recursively as follows.
, for any .
is the empty word.
We sometimes abbreviate to and to .
We define to be the set containing , plus all infinite words whose finite prefixes are prefixes of terminating paths. A path through is any (not necessarily strict) prefix of an element of . As an example, if is the schema in Figure 1, which has no loops, then . In fact, in this case contains exactly two paths, defined by taking the true or false branches, and every path through is a prefix of one of these paths.
If is a quotient of a schema , and (that is, is a path through ), then is the path obtained from by deleting all letters having function or predicate symbols not lying in and all labels not occurring in . It is easily proved that in this case.
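The terminating paths of a schema, and the projection of a path onto a quotient, can be sketched as follows. Schemas are encoded as nested tuples and letters as `("assign", v, f, args)`, `("pred", p, args, value)` or `("label", l)`; this encoding, the function names and the bound `max_iter` on loop iterations are assumptions of the example. Because a schema containing a while loop has infinitely many terminating paths, the enumeration is deliberately truncated.

```python
def symbols_in(s):
    """Return the set of tagged symbols occurring in a schema."""
    kind = s[0]
    if kind == "skip":
        return set()
    if kind == "label":
        return {("label", s[1])}
    if kind == "assign":
        return {("func", s[2])}
    if kind == "seq":
        return symbols_in(s[1]) | symbols_in(s[2])
    if kind == "if":
        return {("pred", s[1])} | symbols_in(s[3]) | symbols_in(s[4])
    if kind == "while":
        return {("pred", s[1])} | symbols_in(s[3])
    raise ValueError(f"unknown schema node: {kind}")


def symbol_of(letter):
    """Return the tagged symbol carried by a letter of a path."""
    if letter[0] == "assign":
        return ("func", letter[2])
    return (letter[0], letter[1])  # ("pred", p) or ("label", l)


def terminating_paths(s, max_iter=2):
    """Yield terminating paths, entering each loop body at most max_iter times."""
    kind = s[0]
    if kind == "skip":
        yield ()
    elif kind in ("assign", "label"):
        yield (s,)
    elif kind == "seq":
        for a in terminating_paths(s[1], max_iter):
            for b in terminating_paths(s[2], max_iter):
                yield a + b
    elif kind == "if":
        _, p, args, then_part, else_part = s
        for branch, value in ((then_part, True), (else_part, False)):
            for rest in terminating_paths(branch, max_iter):
                yield (("pred", p, args, value),) + rest
    elif kind == "while":
        _, p, args, body = s

        def iterate(n):
            # Paths that pass through the loop body exactly n times.
            if n == 0:
                yield ()
                return
            for head in terminating_paths(body, max_iter):
                for tail in iterate(n - 1):
                    yield (("pred", p, args, True),) + head + tail

        for n in range(max_iter + 1):
            for prefix in iterate(n):
                yield prefix + (("pred", p, args, False),)


def project(path, quotient):
    """Delete from a path all letters whose symbol does not occur in the quotient."""
    kept = symbols_in(quotient)
    return tuple(letter for letter in path if symbol_of(letter) in kept)
```

For a loop-free schema with a single if statement this yields exactly two terminating paths, one for each branch, and projecting either of them onto a quotient gives a path through that quotient.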
2.2 Semantics of schemas
The symbols upon which schemas are built are given meaning by defining the notions of a state and of an interpretation. It will be assumed that ‘values’ are given in a single set , which will be called the domain. We are mainly interested in the case in which (the Herbrand domain) and the function symbols represent the ‘natural’ functions with respect to .
Definition 1 (states, (Herbrand) interpretations and the natural state )
Given a domain , a state is either (denoting non-termination) or a function . The set of all such states will be denoted by . An interpretation defines, for each function symbol of arity , a function , and for each predicate symbol of arity , a function . The set of all interpretations with domain will be denoted .
We call the set of terms the Herbrand domain, and we say that a function from to is a Herbrand state. An interpretation for the Herbrand domain is said to be Herbrand if the functions for each are defined as
for all -tuples of terms .
We define the natural state by for all
In the schema in Figure 1 the natural state simply maps variable to the name , variable to the name , and variable to the name . The program in Figure 2 can be produced from this schema through the interpretation that maps to , to , to , and to ; clearly this is not a Herbrand interpretation.
Observe that if an interpretation is Herbrand, this does not restrict the mappings
defined by for each .
It is well known [36, Section 4-14] that Herbrand interpretations are the only ones that need to be considered when considering many schema properties. This fact is stated more precisely in Theorem 8. In particular, our semantic slicing definitions may be defined in terms of Herbrand domains.
Given a schema and a domain , an initial state with and an interpretation we now define the final state and the associated path . In order to do this, we need to define the predicate-free schema associated with the prefix of a path by considering the sequence of assignments through which it passes.
Definition 2 (the schema )
Given a word for a schema , we recursively define the predicate-free schema by the following rules; , for , and .
Consider, for example, the path of the schema in Figure 1 that passes through the true branch of . Then this defines a word and .
Lemma 3
Let be a schema. If , the set is one of the following; a label, a singleton containing an assignment letter , a pair for a predicate of , or the empty set, and if then the last case holds.
Proof. [14, Lemma 6].
Lemma 3 reflects the fact that at any point in the execution of a program, there is never more than one ‘next step’ which may be taken, and an element of cannot be a strict prefix of another.
Definition 4 (semantics of predicate-free schemas)
Given a state , the final state and associated path
of a schema are defined
and is the empty word.
and for .
(where the vector term for ), and
For sequences of predicate-free schemas, and
This uniquely defines and if is predicate-free. In order to give the semantics of a general schema , first the path, , of with respect to interpretation, , and initial state is defined.
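For the Herbrand case, the final state of the predicate-free schema associated with a path can be computed symbolically. The sketch below assumes that letters are encoded as tuples `("assign", v, f, args)` and `("pred", p, args, value)`, that terms are nested tuples, and that the natural state maps each variable to its own name; the function names are hypothetical.

```python
def schema_of(path):
    """Keep only the assignment letters of a path, in order."""
    return tuple(letter for letter in path if letter[0] == "assign")


def herbrand_final_state(assignments, variables):
    """Execute a sequence of assignments from the natural state over the Herbrand domain."""
    state = {v: v for v in variables}  # natural state: each variable denotes itself
    for _, var, func, args in assignments:
        # Under a Herbrand interpretation, f applied to terms is the term f(t1, ..., tn).
        state[var] = (func,) + tuple(state[a] for a in args)
    return state


# Hypothetical path: u := h(); p(w) = true; v := f(u)
path = (("assign", "u", "h", ()),
        ("pred", "p", ("w",), True),
        ("assign", "v", "f", ("u",)))
final = herbrand_final_state(schema_of(path), {"u", "v", "w"})
```

Here `final["v"]` is the term f(h()), encoded as `("f", ("h",))`, while `w` keeps its initial value because the path never assigns to it.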
Definition 5 (the path )
Given a schema , an interpretation , and a state, , the path is defined by the following condition; for all , the equality holds.
In other words, the path has the following property; if a predicate expression along is evaluated with respect to the predicate-free schema consisting of the sequence of assignments preceding that predicate in , then the value of the resulting predicate term given by ‘agrees’ with the value given in . Consider, for example, the schema given in Figure 1 and the interpretation that gives the program in Figure 2. Given a state in which has a value greater than one, we obtain the path .
By Lemma 3, this defines the path uniquely.
Definition 6 (the semantics of arbitrary schemas)
If is finite, we define
(which is already defined, since is predicate-free) otherwise is infinite and we define . In this last case we may say that is not terminating.
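The path and final state determined by an interpretation and an initial state can be sketched as a small interpreter. Schemas are encoded as nested tuples and an interpretation as a dictionary from symbol names to Python callables; both are assumptions of this example, as is the step bound `max_steps`, which stands in for non-termination by returning `None` as the final state when it is exceeded.

```python
class StepLimit(Exception):
    """Raised when the step bound is reached (a stand-in for non-termination)."""


def run(schema, interp, state, max_steps=1000):
    """Return (path, final_state); final_state is None if max_steps is exceeded."""
    state = dict(state)
    path = []

    def step(letter):
        if len(path) >= max_steps:
            raise StepLimit
        path.append(letter)

    def execute(s):
        kind = s[0]
        if kind == "skip":
            return
        if kind == "label":
            step(s)
        elif kind == "assign":
            _, var, func, args = s
            step(s)
            state[var] = interp[func](*(state[a] for a in args))
        elif kind == "seq":
            execute(s[1])
            execute(s[2])
        elif kind == "if":
            _, p, args, then_part, else_part = s
            value = bool(interp[p](*(state[a] for a in args)))
            step(("pred", p, args, value))
            execute(then_part if value else else_part)
        elif kind == "while":
            _, p, args, body = s
            while True:
                value = bool(interp[p](*(state[a] for a in args)))
                step(("pred", p, args, value))
                if not value:
                    break
                execute(body)
        else:
            raise ValueError(f"unknown schema node: {kind}")

    try:
        execute(schema)
    except StepLimit:
        return tuple(path), None
    return tuple(path), state
```

Each predicate letter records the truth value obtained by evaluating the predicate in the state reached by the preceding assignments, which is the agreement condition that characterises the path.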
For convenience, if is predicate-free and is a state then we define unambiguously ; that is, we assume that the interpretation is Herbrand if is a Herbrand state. Also, if is a path through a schema, we may write to mean .
Observe that and
hold for all schemas (not just predicate-free ones).
Given a schema and , we say that passes through a predicate term if has a prefix ending in for such that holds. In this case we say that is a consequence of . For example, the path of the schema in Figure 1 passes through the predicate term since this path has no assignments to before .
Definition 7 (path compatibility and executability)
Let be a path through a schema . Then is executable if is a prefix of for some interpretation and state . Two paths through schemas are compatible if for some interpretation and state , they are prefixes of and respectively.
The justification for restricting ourselves to consideration of Herbrand interpretations and the state as the initial state lies in the fact that Herbrand interpretations are the ‘most general’ of interpretations. Theorem 8, which is virtually a restatement of [36, Theorem 4-1], expresses this formally.
Theorem 8
Let be a set of schemas, let be a domain, let be a function from the set of variables into and let be an interpretation using this domain. Then there is a Herbrand interpretation such that the following hold.
For all , the path .
If and are variables and for and , then also holds.
3 The path-faithful dynamic slicing criterion
In this section we adapt the notion of a dynamic program slice to program schemas. Dynamic program slicing is formalised in the original paper by Korel and Laski. Their definition uses two functions, and , in which denotes the first elements of a trajectory (see footnote 1) and denotes the trajectory with all elements that satisfy predicate removed. A trajectory is a path through a program, where each node is represented by a line number and so for path we have that is the corresponding trajectory.
Footnote 1: A trajectory is a path in which we do not distinguish between true and false values for a predicate. There is a one-to-one correspondence between paths and trajectories unless there is an if statement that contains only .
Korel and Laski use a slicing criterion that is a tuple in which is the program input being considered, denotes the execution of statement as the th statement in the path taken when is executed with input , and is the set of variables of interest.
Definition 9
Let be a slicing criterion of a program and the trajectory of on input . A dynamic slice of on is any executable program that is obtained from by deleting zero or more statements such that when executed on input , produces a trajectory for which there exists an execution position such that
(KL2) for all , the value of before the execution of instruction in equals the value of before the execution of instruction in ,
where is a set of instructions in .
In producing a dynamic slice all we are allowed to do is to eliminate statements. We have the requirement that the slice and the original program produce the same value for each variable in the chosen set at the specified execution position and that the path in up to followed by using input is equivalent to that formed by removing from the path all elements not in the slice. Interestingly, it has been observed that this additional constraint, that , means that a static slice is not necessarily a valid dynamic slice .
We can now give a corresponding definition for linear schemas.
Definition 10 (path-faithful dynamic slice)
Let be a linear schema containing a label , let be a set of variables and let be executable. Let be a quotient of containing . Then we say that is a -path-faithful dynamic slice (PFDS) of if the following hold.
Every variable in defines the same term after as after in .
Every maximal path through which is compatible with has as a prefix.
If the label occurs at the end of , so that for a schema , and is a -dynamic slice of , so that , then we simply say that is a -path-faithful dynamic end slice of .
Theorem 11
Let be a linear schema, let be executable, let be a set of variables and let be a quotient of containing . Then is a -PFDS of if and only if for all and every expression which is a consequence of is also a consequence of .
Proof. This follows immediately from the two conditions in Definition 10.
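The characterisation in the theorem suggests a direct check that runs in time polynomial in the size of the terms involved. The sketch below rests on an assumed reading of the condition: every variable of interest defines the same Herbrand term after the original path as after its projection onto the quotient, and every predicate term together with its truth value that the projected path passes through is also passed through by the original path. Letters are encoded as tuples, the quotient is represented only by its set of tagged symbols, and labels are ignored; all of these are assumptions of the example, and the function names are hypothetical.

```python
def symbol_of(letter):
    """Return the tagged symbol carried by a letter of a path."""
    if letter[0] == "assign":
        return ("func", letter[2])
    return (letter[0], letter[1])  # ("pred", p) or ("label", l)


def analyse(path):
    """Return (Herbrand state, predicate consequences) of a path from the natural state."""
    state, consequences = {}, set()

    def term(v):
        return state.get(v, v)  # unassigned variables keep their natural value

    for letter in path:
        if letter[0] == "assign":
            _, var, func, args = letter
            state[var] = (func,) + tuple(term(a) for a in args)
        elif letter[0] == "pred":
            _, p, args, value = letter
            consequences.add((p, tuple(term(a) for a in args), value))
    return state, consequences


def satisfies_pfds_condition(path, kept_symbols, variables):
    """Check the assumed two-part condition for the quotient given by kept_symbols."""
    projected = tuple(l for l in path if symbol_of(l) in kept_symbols)
    full_state, full_cons = analyse(path)
    proj_state, proj_cons = analyse(projected)
    same_terms = all(full_state.get(v, v) == proj_state.get(v, v) for v in variables)
    return same_terms and proj_cons <= full_cons
```

For instance, on a hypothetical path v := g(); q(v) = true; v := f(v); q(v) = false, deleting the assignment with symbol f makes the projected path pass through q(g()) = false, which is not a consequence of the original path, so the check fails even when the variable set is empty. Terms are stored as trees without sharing of subterms, so this sketch makes no claim about matching the complexity bounds proved in the paper.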
As an example of a path-faithful dynamic end slice, consider the linear schema of Figure 3. We assume that and the path
which passes twice through the body of , in each case passing through , and then leaves the body of . Thus the value of after is . Thus any -PFDS of must contain and in order that () is satisfied, and hence contains and . By Theorem 11, would also have to contain , since otherwise would be a consequence of , whereas is not a consequence of . Also, would contain the function symbol , since otherwise would be a consequence of , but not of . Thus itself is the only -PFDS of . Observe that the inclusion of the assignment has the sole effect of ensuring that for every interpretation for which , passes through instead of during its second passing through the body of , and so deleting does not alter the value of after . This suggests that our definition of a dynamic slice may be unnecessarily restrictive, and this motivates the generalisation of Definition 14.
4 A New Form of Dynamic Slicing
Path-faithful dynamic slices of schemas correspond to dynamic program slices and in order to produce a dynamic slice of a program we can produce the path-faithful dynamic slice of the corresponding linear schema. In this section we show how this notion of dynamic slicing can be weakened, to produce smaller slices, for linear schemas and so also for programs.
Consider the schema in Figure 3, the path and variable . It is straightforward to see that a dynamic slice has to retain the predicate since it controls a statement () that updates the value of and this can lead to a change in the value of on the next iteration of the loop. Thus, a dynamic slice with regards to and must retain predicate . Further, the assignment affects the value of and so the value of on the second iteration of the loop in and so a (path-faithful) dynamic slice must retain this assignment.
We can observe that in the value of the predicate on the last iteration of the loop does not affect the final value of . In addition, in the assignment only affects the value of on the last iteration of the loop and this assignment does not influence the final value of . In this section we define a type of dynamic slice that allows us to eliminate this assignment. At the end of this section we describe a context in which we might be happy to eliminate such assignments.
Proposition 12
Let be a linear schema and let be a path through .
Let be a while predicate in and let be a terminal path in the body of in . Then a word is a path in if and only if is a path in .
Let be an if predicate in , let and let be terminal paths in the -part and -part respectively of in . Then a word is a path in if and only if is a path in .
Furthermore, in both cases, one path is terminal if and only if the other is terminal.
Proof. Both assertions follow straightforwardly by structural induction from the definition of in Section 2.1.
Definition 13
Let be a linear schema, let be a label and let be paths through . Then we say that is simply -reducible to if can be obtained from by one of the following transformations, which we call simple -reductions.
Replacing a segment within by , where is a terminal path in the body of a while predicate which does not contain in its body.
Replacing a segment within by , where is a terminal path in the -part of an if predicate , does not lie in either part of and the -part of is .
If can be obtained from by applying zero or more -reductions, then we say that is -reducible to . If the condition on the label is removed from the definition then we use the terms reduction and simple reduction.
By Proposition 12, the transformations given in Definition 13 always produce paths through . Observe that if is -reducible to , then the sequence of function and predicate symbols through which passes is a subsequence of that through which passes, and pass through the label the same number of times, and the length of is not greater than that of .
Definition 14 (dynamic slice)
Let be a linear schema containing a label , let be a set of variables and let be executable. Let be a quotient of containing . Then we say that is a -dynamic slice (DS) of if every maximal path through compatible with has a prefix to which is -reducible and such that every variable in defines the same term after as after in .
If the label occurs at the end of , so that for a schema , and is a -dynamic slice of , so that , then we simply say that is a -dynamic end slice of .
Consider again the schema in Figure 3 and path . Here the quotient obtained from by deleting the assignment is a -dynamic end slice of , since the path
is simply reducible from and gives the correct final value for , and and are the only maximal paths through that are compatible with . This shows that a DS of a linear schema may be smaller than a PFDS.
One area in which it is useful to determine the dependence along a path in a program is in the application of test techniques, such as those based on evolutionary algorithms, that automate the generation of test cases to satisfy a structural criterion. These techniques may choose a path to the point of the program to be covered and then attempt to generate test data that follows the path (see, for example, [38, 39, 40, 41]). If we can determine the inputs that are relevant to this path then we can focus on these variables in the search, effectively reducing the size of the search space. Current techniques use static slicing but there is potential for using dynamic slicing in order to make the dependence information more precise and, in particular, the type of dynamic slice defined here.
5 A linear schema with two minimal path-faithful dynamic slices
Given a linear schema, a variable set and a path through , we wish to establish information about the set of all -dynamic slices, which is partially ordered by set-theoretic inclusion of function and predicate symbols. In particular, it would be of interest to obtain conditions on which would ensure that minimal slices were unique since under such conditions it may be feasible to produce minimal slices in an incremental manner, deleting one statement at a time until no more statements can be removed. As we now show, however, this is false for arbitrary linear schemas, whether or not slices are required to be path-faithful. To see this, consider the schema of Figure 4 and the slicing criterion defined by the variable and the terminal path which enters the body of 5 times as follows.
1st time; passes through and , but not through either .
2nd time; passes through , and , but not through .
3rd time; passes through , and , but not through .
4th time; passes through , , and .
5th time; passes through .
Define the quotient of by deleting the entire if statement guarded by and define analogously by interchanging the suffices 1 and 2. By Theorem 11, and are both -PFDS’s of , since will still evaluate to over the path or on paths 2–4. On the other hand, if the if statements guarded by and are both deleted, then on the 4th path, may evaluate to , since never occurs in the predicate term defined by along , hence the final value of may contain fewer occurrences of in the slice than after . Furthermore, every -DS of must contain the function symbols , , and and hence and , since the final term defined by contains these symbols, and so and are minimal -DS’s, and are also both path-faithful.
6 Decision problems for dynamic slices
In this section, we establish complexity bounds for two problems; whether a quotient of a linear schema is a dynamic slice, and whether a linear schema has a non-trivial dynamic slice. We consider the problems both with and without the requirement that dynamic slices be path-faithful.
Definition 15 (maximal common prefix of a pair of words)
The maximal common prefix of words is denoted by . For example, the maximal common prefix of the words and over the five-word alphabet is ; that is, .
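A minimal sketch of the maximal common prefix, with words modelled as tuples or strings; the function name `maximal_common_prefix` and the letters used in the usage note are hypothetical.

```python
def maximal_common_prefix(a, b):
    """Return the longest word that is a prefix of both a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return a[:n]
```

For example, `maximal_common_prefix(("x1", "x2", "x3"), ("x1", "x2", "x4"))` evaluates to `("x1", "x2")`.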
Let be a linear schema containing a label and let be paths through