Verification of Programs via Intermediate Interpretation

08/24/2017
by Alexei P. Lisitsa et al.
University of Liverpool

We explore an approach to verification of programs via program transformation applied to an interpreter of a programming language. A specialization technique known as Turchin's supercompilation is used to specialize some interpreters with respect to the program models. We show that several safety properties of functional programs modeling a class of cache coherence protocols can be proved by a supercompiler and compare the results with our earlier work on direct verification via supercompilation not using intermediate interpretation. Our approach was in part inspired by an earlier work by E. De Angelis et al. (2014-2015) where verification via program transformation and intermediate interpretation was studied in the context of specialization of constraint logic programs.




1 Introduction

We show that a well-known program specialization technique called the first Futamura projection [15, 46, 21] can be used for indirect verification of some safety properties. We consider functional programs modeling a class of non-deterministic parameterized computing systems specified in a language that differs from the object programming language treated by the program specializer. Let a specializer transforming programs written in a language L be given, together with an interpreter Int of another language M, itself implemented in L. Given a program p written in M, the task is to specialize the interpreter call Int(p, d) with respect to its first argument p, while the data d of the program p remains unknown.
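The first Futamura projection can be illustrated with a minimal, self-contained sketch (a hypothetical toy language, not the Refal setting of this paper): specializing an interpreter Int(p, d) with respect to a known program p yields a residual program mapping the still-unknown data d to Int(p, d).

```python
def interpret(program, data):
    """A tiny interpreter: 'program' is a list of (op, arg) instructions
    acting on an accumulator initialized with 'data'."""
    acc = data
    for op, arg in program:
        if op == "add":
            acc += arg
        elif op == "mul":
            acc *= arg
    return acc

def specialize(program):
    """First Futamura projection, done trivially by closure capture:
    the interpreter specialized with respect to its first argument."""
    def residual(data):
        return interpret(program, data)
    return residual

p = [("add", 1), ("mul", 3)]
residual = specialize(p)
assert residual(2) == interpret(p, 2) == 9
```

A real specializer such as a supercompiler would go further: instead of capturing p in a closure, it would emit residual code from which the interpretive overhead has been removed (here, effectively `lambda d: (d + 1) * 3`).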

Our interest in this task has been inspired by the works [4, 6]. Their authors work in terms of constraint logic programming (CLP), where the constraint language consists of linear arithmetic inequalities imposed on integer values of variables. They use partial deduction [26] and CLP program specialization [12, 5] methods for specializing an interpreter of a C-like language with respect to given programs, aiming at verification of the C-like imperative specifications with respect to postconditions that are defined in CLP and define the same functions (relations) as the corresponding C-like programs. In addition to the CLP program specialization system developed by E. De Angelis et al. and called VeriMAP [5], they also use external satisfiability modulo theories (SMT) solvers. We also mention an earlier work by J. P. Gallagher et al. [16] proposing a language-independent method for analyzing imperative programs via intermediate interpretation by a logic programming language. Note that the transformation examples given in the papers [16, 12, 5] presenting the approaches mentioned above deal with neither function nor constructor application stacks in the interpreted programs.

In this paper we focus our attention on self-sufficient methods for specialization of functional programs, aiming at proving some safety properties of the programs. We consider a program specialization method called Turchin's supercompilation [47, 46, 45, 22] and study the potential capabilities of the method for verifying the safety properties of functional programs modeling a class of non-deterministic parameterized cache coherence protocols [8]. We use an approach to functional modeling of non-deterministic computing systems first presented by these authors in [30, 31, 29]. The simple idea behind the approach is as follows. Given a program modeling a deterministic computing system whose behavior depends on and is controlled by an input parameter value, let us call for an oracle producing the input value. Then the meta-system including both the program and the external oracle becomes a non-deterministic one. And vice versa, given a non-deterministic system, one may be concerned with the behavior of the system along only one possible path of the system's evolution. In such a case, the path of interest may be given as an additional input argument of the system, forcing the system to follow that path. Dealing with an unknown value of the additional parameter, one can study any possible evolution of the system, for example, aiming at verifying some properties of the system.
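The modeling idea can be sketched in a few lines (a toy system of our own, not one of the paper's protocol models): a non-deterministic system becomes a deterministic program once the sequence of choices is passed in as an extra input parameter.

```python
def nondet_step(state):
    """The toy system may either increment or reset its counter:
    two possible successors of every state."""
    return [state + 1, 0]

def run(choices, state):
    """Deterministic model: the extra parameter 'choices' plays the role
    of the oracle, selecting one branch at each step."""
    for c in choices:
        state = nondet_step(state)[c]
    return state

# Every evolution of the non-deterministic system corresponds to some
# value of 'choices'; leaving 'choices' unknown covers all of them.
assert run([0, 0, 0], 0) == 3
assert run([0, 1, 0], 0) == 1
```

Specializing such a model with respect to an unknown `choices` parameter is exactly how the program models in Section 3 expose all evolutions of a protocol to the supercompiler.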

The viability of such an approach to verification has been demonstrated in previous works using supercompilation as a program transformation and analysis technique [30, 31, 29, 32, 23], where it was applied to safety verification of program models of parameterized protocols and Petri net models. Furthermore, functional program modeling and supercompilation have been used to specify and verify cryptographic protocols, and in the case of insecure protocols a supercompiler was utilized in an interactive search for attacks on the protocols [2, 40]. In these cases the supercompiler has been used for specializing the corresponding program models, aiming at moving the safety properties of interest from the semantic level of the models to simple syntactic properties of the residual programs produced by the supercompiler. Later this approach was extended by G. W. Hamilton for verifying a wider class of temporal properties of reactive systems [17, 18].

Given a specializer transforming programs written in a language L and used for program model verification, in this paper we study, in order to mitigate the limitation to the specification language L, the potential of the corresponding specialization method for verifying models specified in another language M. We analyze the supercompilation algorithms that allow us to remove the interpretation layer completely and to verify the safety properties indirectly. The corresponding experiments succeeded in verifying some safety properties of the series of parameterized cache coherence protocols specified, for example, in the imperative WHILE language of N. D. Jones [20]. Nevertheless, in order to demonstrate that our method is able to deal with non-imperative interpreted programs, we consider the case when the modeling language M is a non-imperative subset of the basic language L; this also allows us to simplify the presentation. In order to prove the properties of interest, some of the program models used in the experiments require one additional supercompilation step, i.e., the corresponding residual programs should be supercompiled once again. (Note that the method presented in the papers [4, 6] mentioned above sometimes requires a number of iterations of the specialization step, and this number is not known in advance.)

The considered class of cache coherence protocols effectively forms a benchmark on which various methods for parameterized verification have been tried [43, 8, 10, 14, 31, 29, 28]. In [31, 29] we applied the direct verification via supercompilation approach, without intermediate interpretation. The corresponding models of these and other parameterized protocols may be very large, and the automatic proofs of their safety properties may have very complicated structures. See, for example, the structure of the corresponding proof [32] produced by the supercompiler SCP4 [36, 37, 39] for the functional program model of the parameterized Two Consumers - Two Producers (2P/2C) protocol for multithreaded Java programs [3]. Taking that into account, the experiments presented in this paper can also be considered as a partial verification of the intermediate interpreters Int(p,d) used in the experiments; that is to say, a verification of the interpreters with respect to the subset of the input values of the argument p being the program models of the cache coherence protocols.

The program examples given in this paper were specialized by the supercompiler SCP4 [36, 37, 39], a program specializer based on the supercompilation technique. We present our interpreter examples in a variant of pseudocode for functional programs, while the real supercompilation experiments with the programs were done in the strict functional programming language Refal [49, 50], which is both the object and implementation language of the supercompiler SCP4. (The reader is welcome to execute several sample Refal programs, and even any program written by the user, directly from the electronic version of the Turchin book [49].) One of the advantages of using supercompilation, instead of other forms of partial evaluation or CLP specialization, is the use of Turchin's relation (Section 4.2; see also [48, 37, 42]) defined on function-call stacks, where the function calls are labeled by the times when they are generated by the unfold-fold loop. This relation is responsible for accurate generalization of the stack structures of the unfolded program configurations. It is based on global properties of the path in the corresponding unfolded tree rather than on the structures of two given configurations in the path. Turchin's relation both stops the loop unfolding the tree and provides guidance on how a given call-stack structure has to be generalized. Proposition 1, proven in this paper, shows that a composition of the Turchin and Higman-Kruskal relations may prevent generalization of two interpreter configurations encountered inside one big-step of the interpreter. Such prevention of generalization is crucial for optimal specialization of any interpreter with respect to a given program.

This paper assumes that the reader has basic knowledge of concepts of functional programming, pattern matching, term rewriting systems, and program specialization.

The contributions of this paper are:

(1) We have developed a method aiming at uniform reasoning on the properties of the configuration sequences encountered in specializing an interpreter of a Turing-complete language.

(2) In particular, we have proved the following statement. Consider specialization of the interpreter with respect to any interpreted program from an infinite program set that is large enough to specify a series of parameterized cache coherence protocols, controlled by a composition of the Turchin (Section 4.2) and Higman-Kruskal (Section 4.1) relations. Given a big-step of the interpreter to be processed by the unfold-fold loop, we assume that neither generalization nor folding actions were performed by this loop up to the moment considered. Then any two non-transitive (Section 4.3) big-step internal configurations are prevented from both generalization and folding.

(3) We have shown that supercompilation controlled by the composition of the relations above is able to verify some safety properties of the series of parameterized cache coherence protocols via intermediate interpretation of their program models. Note that these program specifications include both the function-call and constructor-application stacks, where the size of the first is uniformly bounded in the value of the input parameter while that of the second is not. Unlike VeriMAP [5], our indirect verification method involves no post-specialization unfold-fold transformations.

The paper is organized as follows. In Section 2 we describe the syntax and semantics of a pseudocode for a subset of the strict functional language Refal, which will be used throughout this paper. We also give the operational semantics of the subset, defining its "self-interpreter". In Section 3 we outline our approach to specifying non-deterministic systems by an example used throughout this paper. In Section 4 we briefly introduce an unfold-fold program transformation method known as Turchin's supercompilation that is used in our experiments, and describe the strategy controlling the unfold-fold loop. The corresponding relation is a composition of Turchin's relation and a variant of the Higman-Kruskal relation; this composition plays a central role in verifying the safety properties of the cache coherence protocols' models via intermediate interpretation. In Section 5 we prove, in a uniform way, a number of properties of the numerous complicated configurations generated by specialization of the self-interpreter with respect to the given program modeling a cache coherence protocol. The arguments given in that section apply to the whole series of protocols mentioned in Section 6. Developing the method behind these arguments is the main aim of the paper, and both Proposition 1 and its proof are the main results of the paper. The statement given in Proposition 1 can be applied to a wide class of interpreters of Turing-complete programming languages; it is the theoretical basis explaining why the approach suggested in this paper succeeds in verifying the safety properties of the series of cache coherence protocols via intermediate interpretation. Finally, in Section 6 we report on some other experimental results obtained by using the approach, discuss the results presented in the paper, and compare our experiments with those done by existing methods.

2 An Interpreter for a Fragment of the SCP4 Object Language

The first interpreter we consider is an interpreter of a subset of the SCP4 object language, which we aim to put between the supercompiler SCP4 and the programs modeling the cache coherence protocols to be verified. We will refer to this interpreter as a "self-interpreter".

2.1 Language

p   ::= d_1 ... d_n                                 Program
d   ::= f r_1 ... r_m                               Function Definition
r   ::= pat = exp;                                  Rule
exp ::= v                                           Variable
      | f( exp_1, ..., exp_k )                      Function Application
      | exp_1 ++ exp_2                              ++ Application
      | ( exp )                                     Constructor Application
      | sym                                         Symbol
      | []                                          Empty Expression
pat ::= an expression without function applications Patterns
v   ::= e.name                                      Variable
      | s.name                                      Symbol-Type Variable

Programs in this language are strict term-rewriting systems based on pattern matching.

The rules in a program are ordered from top to bottom for matching. To be closer to Refal, we use two kinds of variables: s-variables range over symbols (i.e., characters and identifiers, for example, 'a' and True), while e-variables range over the whole set of S-expressions. (This fragment of Refal is introduced for the sake of simplicity. The reader may think of the syntactic category of list expressions and the parenthesis constructor as Lisp equivalents. Actually, Refal does not include the cons constructor; instead, concatenation is used as an associative constructor. Thus the Refal data set is wider than the Lisp data set: the first is the set of finite sequences of arbitrary trees, while the second is the set of binary trees. See [49] for details.) Given a rule l = r, any variable of r must appear in l. Each function f has a fixed arity: the arities of the left-hand sides of all rules of f, and of any application of f, must equal the arity of f. The parenthesis constructor is used without a name; the concatenation constructor ++ is used in infix notation and may be omitted. The patterns in a function definition are not necessarily exhaustive: if no left-hand side of the function's rules matches the values assigned to a call of the function, then execution of the call is interrupted and its value is undefined. In the sequel, Exp denotes the expression set; Dat and Sym stand for the data set (i.e., the patterns containing no variables) and the symbol set, respectively. The set of names of the functions of arity n is denoted by F_n, while F stands for the union of all the F_n. The e- and s-variable sets are denoted by Var_e and Var_s, respectively, and Var denotes their union. For an expression t, Var(t) denotes the set of variables of t, and m(v, t) denotes the multiplicity of a variable v in t, i.e., the number of all occurrences of v in t. An expression is called passive if no function application occurs in it; otherwise it is called an active expression. Term stands for the term set, and s for a symbol. Given an expression t and a variable substitution θ, tθ stands for the result of applying θ to t.

2.2 Encoding

In our experiments, the protocol program models have to be input values of the interpreter argument with respect to which the interpreter is specialized. Thus the program models should be encoded in the data set of the implementation language of the interpreter. The program models used in this paper are written in a fragment of the language described in Section 2.1 in which the ++ constructor is not allowed and only unary functions may appear.

Now we define the corresponding encoding function, denoted by underlining, which maps programs, function definitions, rules, applications, variables, and symbols, structure by structure, into data expressions; the function groups the program rules belonging to the same function.

Note that any pattern is an expression.

The supercompiler SCP4, when processing programs given as input data, uses this encoding function and utilizes its properties. The image of the expression set under the encoding is a proper subset of the data set.

Int( (Call s.f e.d), e.P ) ⇒ Eval( EvalCall( s.f, e.d, e.P ), e.P );
Eval( (e.env):(Call s.f e.q):e.exp, e.P ) ⇒ Eval( EvalCall( s.f, Eval( (e.env):e.q, e.P ), e.P ), e.P ) ++ Eval( (e.env):e.exp, e.P );
Eval( (e.env):(Var e.var):e.exp, e.P ) ⇒ Subst( e.env, (Var e.var) ) ++ Eval( (e.env):e.exp, e.P );
Eval( (e.env):('*' e.q):e.exp, e.P ) ⇒ ('*' Eval( (e.env):e.q, e.P )) : Eval( (e.env):e.exp, e.P );
Eval( (e.env):s.x:e.exp, e.P ) ⇒ s.x : Eval( (e.env):e.exp, e.P );
Eval( (e.env):[], e.P ) ⇒ [];
EvalCall( s.f, e.d, (Prog s.n) ) ⇒ Matching( F, [], LookFor( s.f, Prog( s.n ) ), e.d );
Matching( F, e.old, ((e.p) : '=' : (e.exp)) : e.def, e.d ) ⇒ Matching( Match( e.p, e.d, ([]) ), e.exp, e.def, e.d );
Matching( (e.env), e.exp, e.def, e.d ) ⇒ (e.env):e.exp;
Match( (Var 'e' s.n), e.d, (e.env) ) ⇒ PutVar( (Var 'e' s.n) : e.d, (e.env) );
Match( (Var 's' s.n) : e.p, s.x : e.d, (e.env) ) ⇒ Match( e.p, e.d, PutVar( (Var 's' s.n) : s.x, (e.env) ) );
Match( ('*' e.q) : e.p, ('*' e.x) : e.d, (e.env) ) ⇒ Match( e.p, e.d, Match( e.q, e.x, (e.env) ) );
Match( s.x : e.p, s.x : e.d, (e.env) ) ⇒ Match( e.p, e.d, (e.env) );
Match( [], [], (e.env) ) ⇒ (e.env);
Match( e.p, e.d, e.fail ) ⇒ F;
PutVar( e.assign, (e.env) ) ⇒ CheckRepVar( PutV( (e.assign), e.env, [] ) );
PutV( ((Var s.t s.n) : e.val), ((Var s.t s.n) : e.pval) : e.env, e.penv ) ⇒ (Eq( e.val, e.pval )) : ((Var s.t s.n) : e.pval) : e.env;
PutV( (e.assign), (e.passign) : e.env, e.penv ) ⇒ PutV( (e.assign), e.env, (e.passign) : e.penv );
PutV( (e.assign), [], e.penv ) ⇒ (T) : (e.assign) : e.penv;
CheckRepVar( (T) : e.env ) ⇒ (e.env);
CheckRepVar( (F) : e.env ) ⇒ F;
Eq( s.x : e.xs, s.x : e.ys ) ⇒ Eq( e.xs, e.ys );
Eq( ('*' e.x) : e.xs, ('*' e.y) : e.ys ) ⇒ ContEq( Eq( e.x, e.y ), e.xs, e.ys );
Eq( [], [] ) ⇒ T;
Eq( e.xs, e.ys ) ⇒ F;
ContEq( F, e.xs, e.ys ) ⇒ F;
ContEq( T, e.xs, e.ys ) ⇒ Eq( e.xs, e.ys );
LookFor( s.f, (s.f : e.def) : e.P ) ⇒ e.def;
LookFor( s.f, (s.g : e.def) : e.P ) ⇒ LookFor( s.f, e.P );
Subst( ((Var s.t s.n) : e.val) : e.env, (Var s.t s.n) ) ⇒ e.val;
Subst( (e.assign) : e.env, e.var ) ⇒ Subst( e.env, e.var );

 

Figure 1: Self-Interpreter

2.3 The Interpreter

The self-interpreter used in the experiments is given in Figure 1.

The entry point is of the form Int( (Call s.f e.d), e.P ). Here the first argument is the application of the main function to be executed, and the second argument provides the name of a program to be interpreted. The encoded source of the program is returned by a call of function Prog whenever it is asked for by LookFor. E.g., interpretation of the program model of Synapse N+1 given in Section 3.1 starts with the application Int( (Call Main e.d), (Prog Synapse) ), where e.d is the input data given to the Synapse program. Due to the large size of the encoded programs, we omit the definition of function Prog.

EvalCall asks for the definition of function s.f, calling LookFor, and initiates matching of the data given by e.d against the patterns of the definition's rules. In order to start this pattern matching, it imitates a failure of matching the data against a previous, nonexistent pattern.

Function Matching runs over the definition rules, testing the result of matching the input data (e.d) against the current pattern considered. If the result is F, the function calls Match, asking for matching of the input data against the next rule's pattern; the environment is initialized with []. If the pattern matching succeeds, then Matching returns (e.env):e.exp, where expression e.exp is the right-hand side of the current rule and the environment e.env includes the variable assignments computed by the pattern matching. Function Match tries to match the input data given in its second argument, step by step, against the pattern given in its first argument; it computes the environment containing the variable substitution defined by the matching. If a variable is encountered, then function PutVar is called, looking for an existing assignment to the same variable and, if such an assignment exists, testing the coincidence of the new and old values assigned to the variable. The third rule of function Match deals with the tree structure, calling this function twice.
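The kind of pattern matching Match performs can be sketched in Python (our own, much simplified encoding: patterns are lists whose items are plain symbols, ("s", name) symbol-variables, or a single trailing ("e", name) expression-variable, as in the unary models used in this paper).

```python
def match(pattern, data, env=None):
    """Match 'data' (a list of symbols) against 'pattern'; return the
    environment of variable bindings on success, None on failure."""
    env = dict(env or {})
    for k, item in enumerate(pattern):
        if isinstance(item, tuple) and item[0] == "e":
            # e-variable (allowed only in tail position here):
            # binds the whole rest of the data.
            return {**env, item[1]: data[k:]}
        if k >= len(data):
            return None
        if isinstance(item, tuple) and item[0] == "s":
            # s-variable: binds a single symbol; repeated occurrences
            # must agree, mirroring the PutVar/Eq check.
            if item[1] in env and env[item[1]] != data[k]:
                return None
            env[item[1]] = data[k]
        elif item != data[k]:
            return None
    return env if len(pattern) == len(data) else None

assert match([("s", "x"), ("e", "rest")], ["I", "I", "I"]) == \
    {"x": "I", "rest": ["I", "I"]}
assert match(["A", ("s", "x")], ["B", "C"]) is None
```

This sketch omits the interpreter's tree-structure case and general e-variables, but shows the two features that drive the interpreter's control flow: ordered rules tried until a match succeeds, and consistency checking of repeated variables.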

Function Eval runs through the expression being evaluated. Its second rule deals with a variable and calls function Subst, which looks up the variable's value in the environment and replaces the variable with that value.

We intend to specialize the interpreter Int with respect to its second argument. The corresponding source code of the self-interpreter may be found at http://refal.botik.ru/protocols/Self-Int-Refal.zip.

3 Specifying Cache Coherence Protocols

We illustrate our method [31, 29] for specifying non-deterministic systems by an example used throughout this paper. The Synapse N+1 protocol definition given below is borrowed from [7]. The parameterized version of the protocol is considered, and counting abstraction is used in the specification. The protocol has to react to five external non-deterministic events by updating its state, which consists of three integer counters. The initial value of the counter invalid is parameterized (so it can be any positive integer), while the other two counters are initialized to zero. The primed state names stand for the updated state values. The empty updates mean that nothing happens.

(rh)  dirty + valid ≥ 1 → .
(wh1) dirty ≥ 1 → .
(rm)  invalid ≥ 1 → dirty′ = 0, valid′ = valid + 1, invalid′ = invalid + dirty - 1 .
(wh2) valid ≥ 1 → valid′ = 0, dirty′ = 1, invalid′ = invalid + dirty + valid - 1 .
(wm)  invalid ≥ 1 → valid′ = 0, dirty′ = 1, invalid′ = invalid + dirty + valid - 1 .

Specification of Safety Properties

No state reached by the protocol should satisfy either of the two following properties: (1) invalid ≥ 0, dirty ≥ 1, valid ≥ 1; (2) invalid ≥ 0, dirty ≥ 2, valid ≥ 0.

3.1 Program Model of the Synapse N+1 Cache Coherence Protocol

The program model of the Synapse N+1 protocol is given in Figure 2. The idea behind the program specifications modeling the reactive systems is given in the Introduction above. The finite stream of events is modeled by a parameter value. The time ticks are labeled by the events. The counters' values are specified in unary notation. The unary addition is directly defined by function Append, i.e., without reference to the corresponding macros. Function Loop exhausts the event stream, step by step, and calls Test for verifying the safety property required of the protocol; thus the model is a predicate. Note that, given input values, this partial predicate terminates, since the event stream is finite. The termination is normal if the final protocol state asked for by the input stream is a reachable one; otherwise it is abnormal.

 

Main( (e.time) : (e.is) ) ⇒ Loop( (e.time) : (Invalid I e.is) : (Dirty ) : (Valid ) );
Loop( ([]) : (Invalid e.is) : (Dirty e.ds) : (Valid e.vs) ) ⇒ Test( (Invalid e.is) : (Dirty e.ds) : (Valid e.vs) );
Loop( (s.t : e.time) : (Invalid e.is) : (Dirty e.ds) : (Valid e.vs) ) ⇒ Loop( (e.time) : Event( s.t : (Invalid e.is) : (Dirty e.ds) : (Valid e.vs) ) );
Event( rm : (Invalid I e.is) : (Dirty e.ds) : (Valid e.vs) ) ⇒ (Invalid Append( (e.ds) : (e.is) )) : (Dirty ) : (Valid I e.vs);
Event( wh2 : (Invalid e.is) : (Dirty e.ds) : (Valid I e.vs) ) ⇒ (Invalid Append( (e.vs) : (Append( (e.ds) : (e.is) )) )) : (Dirty I) : (Valid );
Event( wm : (Invalid I e.is) : (Dirty e.ds) : (Valid e.vs) ) ⇒ (Invalid Append( (e.vs) : (Append( (e.ds) : (e.is) )) )) : (Dirty I) : (Valid );
Append( ([]) : (e.ys) ) ⇒ e.ys;
Append( (s.x : e.xs) : (e.ys) ) ⇒ s.x : Append( (e.xs) : (e.ys) );
Test( (Invalid e.is) : (Dirty I e.ds) : (Valid I e.vs) ) ⇒ False;
Test( (Invalid e.is) : (Dirty I I e.ds) : (Valid e.vs) ) ⇒ False;
Test( (Invalid e.is) : (Dirty e.ds) : (Valid e.vs) ) ⇒ True;

 

Figure 2: Model of the Synapse N+1 cache coherence protocol
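The counting-abstraction model can also be rendered as a short runnable Python sketch (the names and the bounded breadth-first check are ours; the paper's method instead proves safety for all parameter values via supercompilation).

```python
def step(event, state):
    """One transition of the Synapse N+1 counting abstraction.
    States are triples (invalid, dirty, valid)."""
    invalid, dirty, valid = state
    if event == "rm" and invalid >= 1:
        return (invalid + dirty - 1, 0, valid + 1)
    if event == "wh2" and valid >= 1:
        return (invalid + dirty + valid - 1, 1, 0)
    if event == "wm" and invalid >= 1:
        return (invalid + dirty + valid - 1, 1, 0)
    # rh and wh1 have empty updates; failed guards change nothing.
    return state

def unsafe(state):
    """The two unsafe state descriptions from the safety specification."""
    invalid, dirty, valid = state
    return (dirty >= 1 and valid >= 1) or dirty >= 2

def reachable(n, depth):
    """States reachable from (n, 0, 0) within 'depth' events."""
    seen = {(n, 0, 0)}
    frontier = list(seen)
    for _ in range(depth):
        frontier = [step(e, s) for s in frontier
                    for e in ("rm", "wh2", "wm")]
        frontier = [s for s in frontier if s not in seen]
        seen.update(frontier)
    return seen

assert not any(unsafe(s) for s in reachable(5, 6))
```

Such bounded exploration only tests particular instances; the point of the paper is that supercompiling the (interpreted) model with unknown event stream and unknown parameter establishes the property for the whole parameterized family.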

4 On Supercompilation

In this paper we are interested in one particular approach to program transformation and specialization, known as supercompilation (from "supervised compilation"). Supercompilation is a powerful semantics-based program transformation technique [45, 47] with a long history going back to the 1960s-70s, when it was proposed by V. Turchin. The main idea behind a supercompiler is to observe the behavior of a functional program running on partially defined input, with the aim of defining a program that is equivalent to the original one (on the domain of the latter) but has improved properties. Given a program and its parameterized entry point, supercompilation is performed by an unfold-fold cycle that unfolds this entry point into a potentially infinite tree of all its possible computations, reducing the redundancy that could be present in the original program. It folds the tree into a finite graph of states and transitions between possible parameterized configurations of the computing system. Finally, it analyses global properties of the graph and specializes the graph with respect to these properties (without additional unfolding steps). (See also Appendix to the extended version of this paper [35].) The resulting program definition is constructed solely by meta-interpretation of the source program rather than by a step-by-step transformation of the program. The result of supercompilation may be a specialized version of the original program, taking into account the properties of partially known arguments, or just a re-formulated, and sometimes more efficient, program equivalent to the original (on its domain).

Turchin’s ideas have been studied by a number of authors for a long time and have, to some extent, been brought to the algorithmic and implementation stage [39]. From the very beginning the development of supercompilation has been conducted mainly in the context of the programming language Refal [36, 37, 38, 49]. A number of model supercompilers for subsets of functional languages based on Lisp data were implemented with the aim of formalizing some aspects of the supercompilation algorithms [22, 24, 45]. The most advanced supercompiler for Refal is SCP4 [36, 37, 39].

The verification system VeriMAP [5] by E. De Angelis et al. [4, 6] uses nontrivial properties of integers recognized by both CLP built-in predicates and external SMT solvers. We also use a nontrivial property of the configurations: the associativity of the built-in append function ++, supported by the supercompiler SCP4 itself (as well as by the real programming language in terms of which the experiments described in this paper were done), rather than by an external solver.

4.1 The Well-Quasi-Ordering on Expressions

The following relation is a variant of the Higman-Kruskal relation and is a well-quasi-ordering [19, 25] (see also [27]).

Definition 1

The homeomorphic embedding relation ⊴ is the smallest transitive relation on expressions satisfying the following properties, where x, y range over variables of the same kind, s over symbols, and t, t', t_i over terms:
(1) x ⊴ y and s ⊴ s;
(2) t ⊴ (t'), t ⊴ t_1 ++ t_2, and t ⊴ f(t_1, ..., t_n) whenever t is embedded in one of the immediate subexpressions (diving);
(3-4) (t) ⊴ (t') and f(t_1, ..., t_n) ⊴ f(t'_1, ..., t'_n) whenever t ⊴ t' and t_i ⊴ t'_i for all i (coupling).

Note that the definition takes into account the function ++, whose infix notation t_1 ++ t_2 stands for an application of the built-in append. We use the relation modulo associativity of ++ and the equalities [] ++ t = t and t ++ [] = t.

Given an infinite sequence of expressions, the embedding relation is relevant to approximating loops that increase the syntactical structures in the sequence, or, in other words, to looking for the regular, similar cases of mathematical induction on the structure of the expressions; that is to say, the cases which allow us to refer one to another by a step of the induction. An additional restriction separates the basic cases of the induction from the regular ones.
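The embedding relation on simple terms can be sketched as follows (our own term representation, far simpler than the paper's Refal expressions and ignoring the ++ equalities: variables are strings starting with "?", symbols are other strings, applications are (functor, arguments) pairs).

```python
def embedded(t1, t2):
    """Homeomorphic embedding: True iff t1 is embedded in t2."""
    # Base cases: variables embed variables; identical symbols embed.
    if isinstance(t1, str) and isinstance(t2, str):
        if t1.startswith("?") and t2.startswith("?"):
            return True
        return t1 == t2
    if isinstance(t2, tuple):
        f2, args2 = t2
        # Diving: t1 is embedded in some immediate subterm of t2.
        if any(embedded(t1, a) for a in args2):
            return True
        # Coupling: same functor and argumentwise embedding.
        if isinstance(t1, tuple):
            f1, args1 = t1
            if f1 == f2 and len(args1) == len(args2):
                return all(embedded(a, b)
                           for a, b in zip(args1, args2))
    return False

x = "?x"
assert embedded(("f", [x]), ("f", [("g", [x])]))   # diving inside g
assert not embedded(("g", [x]), ("f", [x]))        # different functors
```

In a supercompiler, a configuration embedded in a later one signals a potentially infinitely growing structure; unfolding is then stopped and generalization is considered.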

We impose this restriction on the embedding relation, modulo the equalities above, and it is easy to see that the restriction does not violate the quasi-ordering property. The restriction may be varied in obvious ways, but for our experiments its simplest form given above is used to control generalization and has turned out to be sufficient. In the sequel, the restricted relation is the one used; it is also transitive.

Definition 2

A parameterized configuration is a finite sequence of nested let-bindings of the form let v = e_1 in let v = e_2 in ... in e_n, where e_n is passive and, for every i, the variable v does not occur in any function application being a sub-expression of e_i. In the sequel, we refer to a function application given explicitly in the configuration as an upper function application.

The configurations represent the function-application stacks, in which all constructor applications not occurring in the arguments of the upper function applications are moved to the rightmost expressions. Every expression can be rewritten into an equivalent composition of configurations connected with the let-construct (see Section 5.1 for an example). Here the append ++ is treated as a complex constructor (i.e., we use nontrivial properties of configurations containing ++; see the remark in the footnote on p. 3), rather than as a function. The rightmost expression is the bottom of the stack. Since the value of v is reassigned in each let in the stack, for brevity's sake we present configurations with the variable v replaced by the bullet •; the bullet is just a placeholder. The last expression may be omitted if it equals •.

4.2 The Well-Disordering on Timed Configurations

Let a program to be specialized be given, together with a path starting at the root of the tree unfolded by the unfold-fold loop widely used in program specialization. The vertices in the path are labeled by the parameterized configurations of the program; these configurations form a sequence. Given a configuration from such a sequence and a function application from the configuration, we label the application by the time when it was generated by the unfold-fold loop. Such a labeled function application is said to be a timed application. A configuration is said to be timed if all upper function applications in the configuration are timed; all the timed applications of a timed configuration have pairwise distinct time-labels. Given two different configurations, if the unfold-fold loop copies an upper function application from the first and uses this copy in the second, then the two configurations share this timed application. In the sequel, a sequence of timed configurations generated by the unfold-fold loop is also called just a path. In this section we define a binary relation on the timed configurations in a path. The relation originates with V. F. Turchin [48] (see also [37, 41, 42]). It is not transitive (see Appendix to the extended version of this paper [35] for an example demonstrating the non-transitivity), but, like a well-quasi-ordering, it satisfies the following crucial property used by supercompilation to stop the loop unfolding the tree: for any infinite path there exist two timed configurations that are in the relation (see [48, 42]). For this reason we call the relation a well-disordering relation. In the sequel, the time-labels are denoted with subscripts.

Definition 3

Given a sequence of timed configurations ; and are elements of the sequence such that and , , where and , and , stand for function names , labeled with times and , respectively, and are passive expressions.

If , and such that (i.e., and hold), , and (i.e., ), then .

We say that such configurations , are in Turchin's relation . The longest coinciding suffix of the configurations is said to be the context, while the parts that are equal to one another modulo their time-labels are called the prefixes of the corresponding configurations.

The idea behind this definition is as follows. The function applications in the context never took part in computing the configuration in this segment of the path, while every upper function application in the prefix of took part in computing the configuration . Since the prefixes of coincide modulo their time-labels, these prefixes approximate a loop in the program being specialized. The prefix of is the entry point of this loop, while the prefix of initiates the loop iterations. The common context approximates the computations after this loop. Note that Turchin's relation does not impose any restriction on the arguments of the function applications in .

For example, consider the following two configurations and ; then holds. Here the context is , the prefix of is , and the prefix of is , where the subscripts of the application names stand for the time-labels. See also Appendix to the extended version of this paper [35] for a detailed example regarding Turchin's relation.
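To make the shape of the relation concrete, the following sketch (our own illustration; representing a configuration as a list of (name, time-label) pairs, top of the stack first, is an assumption, not the paper's notation) checks the two conditions described above: a shared context, i.e. an identical suffix of timed applications, and prefixes that coincide modulo their time-labels.

```python
def turchin_relation(c1, c2):
    """Simplified check of Turchin's relation on two timed configurations.

    A configuration is modeled as a list of (function_name, time_label)
    pairs, top of the stack first.  The relation holds when the stacks
    share a context -- an identical suffix of timed applications (same
    names AND same time-labels, i.e. the very same copies) -- and the
    remaining nonempty prefixes are equal modulo their time-labels.
    """
    # Longest common suffix with identical time-labels: the context.
    k = 0
    while (k < len(c1) and k < len(c2)
           and c1[len(c1) - 1 - k] == c2[len(c2) - 1 - k]):
        k += 1
    prefix1, prefix2 = c1[:len(c1) - k], c2[:len(c2) - k]
    # Prefixes must be nonempty and coincide modulo time-labels.
    return (len(prefix1) > 0
            and [name for name, _ in prefix1] == [name for name, _ in prefix2])
```

For instance, with c1 = [("f", 1), ("g", 2), ("h", 0)] and c2 = [("f", 3), ("g", 4), ("h", 0)], the shared context is [("h", 0)] and both prefixes read f, g, so the relation holds; swapping f and g in the second prefix breaks it. This sketch deliberately omits the ordering conditions on time-labels that the elided formulas of Definition 3 spell out.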

4.3 The Strategy Controlling the Unfolding Loop

Now we describe the main relation controlling the unfold-fold loop. That is to say, given a path starting at the root of the unfolded tree and two timed configurations in the path such that the first was generated before the second, this relation stops the loop unfolding the tree and calls the procedures responsible for folding the path. These procedures first attempt to fold by a previous configuration and, if that is impossible, attempt to generalize this configuration pair. The relation is a composition of relations and . It is denoted with and is a well-disordering (see [48, 42]).

Thus we are given two timed configurations from a path such that is generated before , and is the last configuration in the path. If the relation does not hold, then the unfold-fold loop unfolds the current configuration and goes on. If the relation holds, these configurations are of the forms (see Section 4.2 for the notation used below):
,
, where the context starts at . Let stand for the prefix of , and stand for the context of followed by .

Now we compare the prefixes as follows. If there exists such that does not hold, then is unfolded and the unfold-fold loop goes on. Otherwise, the sub-tree rooted in is removed and the specialization task defined by is decomposed into two specialization tasks corresponding to and . Then attempts are made to fold by and by . If any of these attempts fails, the corresponding configurations are generalized. Note that the context may be generalized despite the fact that it takes no part in computing the current configuration , since a narrowing of the context parameters may have occurred.

A program configuration is said to be transitive if one-step unfolding of the configuration results in a tree containing only vertices with at most one outgoing edge. For example, any function application of the form , where any , is transitive. For the sake of simplicity, in the experiments described in this paper the following strategy is used: the unfold-fold loop skips all transitive configurations encountered and removes them from the tree being unfolded. In the sequel, we refer to the strategy described in this section, including relation , as the -strategy.
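The skipping of transitive configurations can be pictured as follows. This is a minimal sketch under our own assumptions: unfold is a hypothetical one-step driving function returning the list of child configurations, and a configuration is transitive exactly when that list has one element.

```python
def skip_transitive(config, unfold):
    """Repeatedly perform one-step unfolding while the configuration is
    transitive (its unfolding yields exactly one child).  The transitive
    configurations themselves are dropped, mirroring their removal from
    the tree being unfolded; the first branching (or terminal)
    configuration is returned together with its children."""
    children = unfold(config)
    while len(children) == 1:
        config = children[0]
        children = unfold(config)
    return config, children
```

For example, with a toy unfold that maps any n > 0 to the single child n - 1 and maps 0 to two children, skip_transitive(5, unfold) skips the five deterministic steps and stops at 0, the first configuration with two outgoing edges.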

5 Indirectly Verifying the Synapse N+1 Program Model

In this section we present an application of our program verification method based on supercompilation of an intermediate interpreter. In general, the method may perform a number of program specializations (i.e., iterated ordinary supercompilation, not using any intermediate interpretation, of the residual program produced by one indirect verification), but all the cache coherence protocol program models that we have tried to verify with the supercompiler SCP4 require at most two specializations.

Given a partial program predicate modeling both a cache coherence protocol and a safety property of the protocol, we use supercompilation with the aim of turning the property, hidden in the program semantics, into a simple syntactic property of the residual program generated by supercompilation, i.e., a syntactic property that is easily recognized. In the experiments discussed in this paper we expect the corresponding residual programs to include no operator . Since the original direct program model terminates on any input data (see Section 3.1), this property means that the residual predicate never returns and always . Thus we conclude that the original program model satisfies the given safety property. In terms of the functional language presented in Section 2.1, the corresponding syntactic property is: "No rule's right-hand side contains the identifier ." (Actually, is never encountered at all in any residual program generated by repeatedly launching the supercompiler SCP4 on the cache coherence protocol models considered in this paper; i.e., the property is even simpler than the one formulated.) Given a safety property required of a protocol, in order to look for witnesses violating the property, the method above can be extended by deriving such witnesses by unfolding, using a specializer in an interactive mode. See [32, 33, 34, 40] for examples of bugged protocols and the corresponding witnesses constructed by means of the supercompiler SCP4.
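The syntactic check on the residual program is then trivial to mechanize. In the following hedged sketch, the rule representation and the name of the offending identifier are our own assumptions, not SCP4's actual output format: a residual program is a list of rewriting rules, and safety amounts to the absence of the designated identifier on every right-hand side.

```python
def residual_is_safe(rules, bad_identifier):
    """Return True when no rule's right-hand side mentions the identifier
    signaling a violated safety property.  `rules` is a list of
    (lhs, rhs) pairs of strings -- an illustrative residual-program
    representation, not SCP4's concrete syntax."""
    return all(bad_identifier not in rhs for _lhs, rhs in rules)
```

For instance, a residual predicate whose right-hand sides only ever produce True passes the check, while adding a single rule rewriting to False makes it fail.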

We can now turn to the program modeling the Synapse N+1 protocol given in Section 3.1. In order to show that the Synapse program model is safe, below we specialize the self-interpreter (see Section 2.3) with respect to the Synapse program model, rather than supercompiling the program model itself. Since the program Synapse terminates, the self-interpreter terminates when it interprets any call of the entry function of Synapse. Since Synapse is a partial predicate, the calls of the form , where takes any data, define the same partial predicate. Hence, the self-interpreter restricted to such calls is just another program model of the protocol Synapse N+1. This indirect program model is much more complicated than the direct model. We intend to show that the supercompiler SCP4 [36, 37, 39] is able to verify this model. Thus our experiments show the potential capabilities of the method for verifying safety properties of functional programs modeling some complex non-deterministic parameterized systems. In particular, the experiments can also be considered a partial verification of the intermediate interpreter used; in other words, they verify the interpreter with respect to a set of interpreted programs that specify the cache coherence protocols. This specialization by supercompilation is performed by following the usual unfold-fold cycle controlled by the -strategy described in Section 4.3. Note that this program specification includes both the function-call and constructor-application stacks, where the size of the first is uniformly bounded in the value of the input parameter, while the size of the second is not.

We start off by unfolding the initial configuration , where the value of is unknown. The safety property will be proved if supercompilation is able to recognize all rules of the interpreted program model containing the identifier as unreachable from this initial configuration.

In our earlier work [29] we gave a formal model of the verification procedure above by supercompilation. Let a program model and its safety property be given as described above, i.e., as a partial program predicate. Given an initial parameterized configuration of the partial predicate, it has been shown that the unfold-fold cycle may be seen as a series of proof attempts, by structural induction over the program configurations encountered during supercompilation, aiming at verification of the safety property. Here the initial configuration specifies the statement that we have to prove. Too many program configurations are generated by the unfold-fold cycle starting from the initial configuration given above, and the self-interpreter configurations are very large; as a consequence, it is not possible to consider all the configurations in detail. We therefore study the properties of the configurations relevant to the proof attempts, together with the method for reasoning about such properties.

5.1 On Meta-Reasoning

Let a program written in and a function application , where is its input data, be given. Let stand for the expression , where is the program name. The unfolding loop, standing alone, produces a computation path starting from . If terminates, then the path is finite. In such a case, for any , is a configuration not containing parameters, while is either a passive expression, if the partial function is defined on the given input data, or the abnormal termination sign otherwise. The unfolding iterates the function such that .

Now let us consider the following non-parameterized configuration of the self-interpreter. If terminates then the loop unfolding the configuration results in the encoded passive configuration produced by the loop unfolding .
      

Expression is not a configuration. According to the strategy described in Section 2.3, the unfolding has to decompose the expression into a sequence of configurations connected by let-variables. This decomposition results in

, where is a fresh parameter.

Hence, considered modulo the arguments, the following holds. A function-call stack element of the interpreted program is mapped to the segment of the interpreting function-call stack represented by the first configuration above; once this stack segment has been computed, its result is declared to be the value of the parameter and the last configuration is unfolded. Note that (1) these two configurations, separated by the let-construct, will be unfolded completely separately from one another, i.e., the first configuration becomes the input of the unfolding loop, while the second configuration is postponed for a future unfolding call; (2) the built-in function append is not inserted into the stack at all, since it is treated by the supercompiler as a kind of special constructor, whose properties are known to the supercompiler, which handles this special constructor on the fly. The sequence between two consecutive applications of the first rewriting rule unfolds a big-step of the interpreter, interpreting the regular step corresponding to the application of a rewriting rule of the interpreted definition.
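To fix intuitions about how one big-step of the interpreter mirrors one rewriting step of the interpreted program, here is a minimal call-by-value interpreter for a first-order rewriting language, written in Python as our own illustration. The term representation, rule format, and function names are assumptions; the paper's self-interpreter is written in the language of Section 2.1 and differs in detail.

```python
# Terms: ("ctor", name, args) for constructor applications,
#        ("call", name, args) for function calls,
#        ("var", name)        for variables in patterns and rule bodies.

def match(pat, term, env):
    """Match a pattern against a passive term, returning an extended
    environment, or None on failure."""
    if pat[0] == "var":
        return {**env, pat[1]: term}
    if (term[0] == "ctor" and pat[1] == term[1]
            and len(pat[2]) == len(term[2])):
        for p, t in zip(pat[2], term[2]):
            env = match(p, t, env)
            if env is None:
                return None
        return env
    return None

def subst(term, env):
    """Replace variables in a rule's right-hand side by their values."""
    if term[0] == "var":
        return env[term[1]]
    return (term[0], term[1], [subst(a, env) for a in term[2]])

def interpret(program, term):
    """Big-step evaluation: arguments first, then the first matching rule
    of the called function is applied -- one regular step of the
    interpreted program per big-step of the interpreter."""
    tag, name, args = term
    args = [interpret(program, a) for a in args]
    if tag == "ctor":
        return ("ctor", name, args)
    for pats, rhs in program[name]:
        env = {}
        for p, a in zip(pats, args):
            if env is not None:
                env = match(p, a, env)
        if env is not None:
            return interpret(program, subst(rhs, env))
    raise RuntimeError("abnormal termination: no rule for " + name)
```

With append defined by the usual two rules over Nil/Cons, interpret reduces append(Cons(A, Nil), Cons(B, Nil)) to Cons(A, Cons(B, Nil)). Note that a supercompiler such as SCP4 would instead treat ++ specially, as remark (2) above explains, rather than pushing it onto the stack.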

Given an expression to be interpreted by the interpreter, defines the current state of the function-call stack. Let be the configuration representing this stack state (see Section 2.3). Let be the application on the top of the stack. Then the current , corresponding to the application of the first rewriting rule, maps to the stack segment , , of the interpreter, considered modulo their arguments, and this stack segment becomes the leading segment of the interpreting function-call stack. The remainder of the interpreted stack is encoded in the arguments of , .

This remark allows us to follow the development of these two stacks in parallel. Given the following two parameterized configurations and we are going to unfold these configurations in parallel, step by step. The simpler logic of unfolding will provide hints on the logic of unfolding .

Now we consider the set of the configuration pairs that may be generated by the unfold-fold loop and are in the relation .

5.2 Internal Properties of the Interpreter Big-Step

In this section we consider several properties of the configurations generated by the unfold-fold loop inside one big-step of the self-interpreter. In order to prove indirectly that the program model is safe, we start off by unfolding the following initial configuration , where the value of is unknown. Let stand for .

Consider any configuration generated by the unfold-fold loop and initializing a big-step of the interpreter. Firstly, we assume that is not generalized and no configuration was generalized by this loop before . In such a case, is of the form
where stands for the formal syntactic argument taken from the right-hand side of the rewriting rule from which originates, , , and stands for a partially known value of the variable . Since the application is on the top of the stack, the argument includes no function application. As a consequence, the leading application has only to look for variables, calling the substitution function whenever a variable is encountered.

Thus, excluding all the transitive configurations encountered before the substitution, we consider the following configuration:

where are fresh parameters, stands for a part of above to be processed, and denotes the type and the name of the variable encountered.

We turn now to the first configuration to be unfolded. All configurations unfolded, step by step, from the first configuration are transitive (see Section 4.3), since tests only the types and names of the environment variables. The function is tail-recursive and returns the value asked for.
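Such a deterministic lookup can be sketched as follows (our own illustration; representing the environment as a list of (type, name, value) triples is an assumption, and the tail recursion is written as a loop to keep the Python idiomatic):

```python
def lookup(env, var_type, var_name):
    """Scan an environment -- a list of (type, name, value) triples --
    for the binding of a variable.  Each comparison step is deterministic,
    so each corresponds to a one-branch (transitive) configuration of the
    interpreter."""
    for t, n, v in env:
        if t == var_type and n == var_name:
            return v
    raise KeyError((var_type, var_name))
```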

We skip transforming these transitive configurations and continue with the next one.

By our assumption above, the loop unfolding this first configuration never generates a function application. So the leading configuration proceeds to look for the variables in the same way as shown above.

When is entirely processed and all variables occurring in are replaced with their partially known values from the environment, then the current configuration looks as follows:
                                        
Here the expression is , where is the substitution defined by the environment . I.e., may include parameters standing for unknown data, while does not. Any application of the function is one-step transitive. Recalling , we turn to the next configuration:
             

returns the source code of the interpreted program , while the application returns the definition of the function called by the interpreter, using the known name . Skipping the corresponding transitive configurations, we have:
                   
Here the third argument is the definition, where , , stand for the pattern, the right-hand side of the first rewriting rule of the definition, and the rest of this definition, respectively. This application transitively initiates matching the parameterized data