Structural Resolution for Abstract Compilation of Object-Oriented Languages

09/14/2017 ∙ by Luca Franceschini, et al. ∙ Università di Genova, Heriot-Watt University

We propose abstract compilation for precise static type analysis of object-oriented languages based on coinductive logic programming. Source code is translated to a logic program, then type-checking and inference problems amount to queries to be solved with respect to the resulting logic program. We exploit a coinductive semantics to deal with infinite terms and proofs produced by recursive types and methods. Thanks to the recent notion of structural resolution for coinductive logic programming, we are able to infer very precise type information, including a class of irrational recursive types causing non-termination for previously considered coinductive semantics. We also show how to transform logic programs to make them satisfy the preconditions for the operational semantics of structural resolution, and we prove this step does not affect the semantics of the logic program.


1 Introduction

Object-oriented programming is the most widespread computational paradigm in programming languages (see, for instance, the TIOBE index at http://www.tiobe.com/tiobe-index/). Statically typed languages like Java, C#, and C++ are heavily employed for large-scale software development in several domains; however, for Web and scientific applications, where less experienced developers are often involved, dynamically typed object-oriented languages like JavaScript and Python have become more popular thanks to their gentler learning curve, flexibility, and ease of use.

Although such languages favour rapid development and dynamic software adaptation, they lack the benefits of a static type system: since type errors are detected only at runtime, programs are less reliable, and debugging and testing are more challenging; furthermore, the absence of type information is a severe obstacle to more efficient implementations and more effective IDE tools.

Static typing requires more effort and knowledge from programmers, who are burdened with the task of annotating code with types. This load is bearable for simple nominal type systems, but when more accurate type analysis is required one has to resort to structural type systems, which are more verbose, more complex, and thus harder to grasp; for instance, Java wildcards are a form of structural type, integrated with the nominal type system, that ordinary programmers cannot easily tame.

From these considerations we can conclude that there is a fundamental trade-off between the benefits of static typing and those of dynamic typing. To reduce this gap, type inference, and more generally any static type analysis that does not require type annotations, is a viable solution: programmers are relieved from declaring types and keep all the benefits of a dynamic language, while also gaining an effective analysis tool for developing more reliable and maintainable software. Unfortunately, depending on the language in use and on the expressive power of the type system, type analysis of a dynamic language can be quite hard, or even impossible (that is, undecidable).

Consider the following program implementing linked lists, written in a hypothetical dynamic object-oriented language; for simplicity we adopt a Java-like syntax, but the program contains no type annotations.

    class EList extends Object {
        EList() { super(); }
        addLast(elem) { new NEList(elem, this) }
    }

    class NEList extends Object {
        head; tail;
        NEList(head, tail) { super(); this.head = head; this.tail = tail; }
        addLast(elem) { new NEList(this.head, this.tail.addLast(elem)) }
    }

Listing 1: Untyped linked lists.

EList.addLast simply creates a new list containing (only) the given element, while NEList.addLast recursively reaches the end of the list to add the element, and returns the new list. Depending on the expressive power of the underlying type system, a static analysis tool may or may not be able to analyze the expression new EList().addLast(42).addLast(false).head and compute its expected type int; however, in a dynamically typed language a tool rejecting such an expression would not be considered very useful, since it is quite natural for a dynamic language to allow manipulation of lists of heterogeneous elements. Therefore, dynamic languages call for more precise type analysis, able to support both parametric and data polymorphism: the former is the ability to pass arguments of unrelated types to the same parameter, the latter allows values of unrelated types to be assigned to the same class field. Correct type analysis of the expression above requires parametric polymorphism, because the same method addLast of class EList is invoked twice, with an argument of type int and boolean, respectively; data polymorphism is needed as well, because the two invocations of addLast assign to the field head values of type int and boolean, respectively.
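As a concrete illustration of the runtime behaviour that a precise analysis must account for, the following minimal sketch transcribes Listing 1 into Python, itself a dynamically typed object-oriented language. The method name `add_last` is our own Python-style rendering of `addLast`, not something prescribed by the paper. Running it shows that the expression builds a heterogeneous list whose `head` holds the integer 42, which matches the expected type `int`.

```python
class EList:
    """Empty list: adding an element yields a one-element list."""

    def add_last(self, elem):
        return NEList(elem, self)


class NEList:
    """Non-empty list with a head element and a tail list."""

    def __init__(self, head, tail):
        self.head = head
        self.tail = tail

    def add_last(self, elem):
        # Rebuild the list recursively, appending elem at the very end.
        return NEList(self.head, self.tail.add_last(elem))


# new EList().addLast(42).addLast(false).head
lst = EList().add_last(42).add_last(False)
print(lst.head)       # 42
print(lst.tail.head)  # False
```

The second call is dispatched to `NEList.add_last`, which in turn calls `EList.add_last` on the tail: this is why the same method receives arguments of unrelated types.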

Supporting parametric and data polymorphism requires advanced structural types, and ensuring termination of the analysis in the presence of recursive types and methods can be challenging. In this paper we investigate an improvement of abstract compilation [6] to obtain more precise type analysis of object-oriented code involving recursive method invocations.

Abstract compilation is a modular approach to type analysis that exploits logic programming; programs under analysis are abstractly compiled to logic programs, and then analysis is performed through goal resolution.

For instance, to infer the type of the expression on list classes considered above, a goal clause similar to the following one could be used:

    ← new(elist, [], E), invoke(E, addlast, [int], L1), invoke(L1, addlast, [bool], L2)

The three atoms encode the call to the constructor and the two calls to the addLast method, respectively, and the variables stand for the types resulting from each operation. The goal is then resolved with respect to the logic program generated by abstract compilation of the source program, as shown in Section 3; finally, the computed substitution binds the variables to terms encoding types, thus solving the inference problem.

To support parametric and data polymorphism in the presence of recursion, the resolution method has to support the coinductive interpretation of the generated logic program [23], based on the greatest complete Herbrand model. However, the implementation of the operational semantics of coinductive logic programming (Co-LP for short) fails to analyze some kinds of recursion successfully.

In this paper we show how this drawback can be overcome by adopting structural resolution [18] for the inference engine used by abstract compilation; thanks to the notion of productivity, structural resolution provides an operational semantics that is more expressive than Co-LP (under certain assumptions that will be discussed).

The rest of the paper is organized as follows. Section 2 introduces the necessary background on coinductive logic programming and presents structural resolution, while Section 3 is a detailed introduction to abstract compilation. Section 4 motivates the usefulness of structural resolution for abstract compilation and shows how it can improve previous results, which is the main contribution of this work. Section 5 is devoted to the notion of productivity, and shows a transformation technique that guarantees productivity for abstract compilation. Section 6 contains some concluding remarks and future work directions.

2 (Coinductive) Logic Programming

2.1 Logic Programming Preliminaries

Given a first-order signature consisting of variables, function and predicate symbols, we define terms inductively as is standard: they can be either variables or function symbols $f$ of arity $n$ applied to $n$ terms ($f(t_1, \ldots, t_n)$). Constants are function symbols of arity $0$. Atoms (or atomic formulas) have the shape $p(t_1, \ldots, t_n)$, where $p$ is a predicate symbol of arity $n$ and $t_1, \ldots, t_n$ are terms. Logic programs are finite sets of definite Horn clauses (clauses for short) $A \leftarrow B_1, \ldots, B_n$, where $A, B_1, \ldots, B_n$ are atoms; $A$ is called the head of the clause and $B_1, \ldots, B_n$ is called the body of the clause. When the body is empty ($n = 0$), the clause is a fact, whose head holds unconditionally. A goal clause has the shape $\leftarrow B_1, \ldots, B_n$.

A substitution $\theta$ is a finite (partial) mapping from variables to terms, where all variables are simultaneously substituted. Two terms $t$ and $u$ are unifiable by a substitution $\theta$ if $t\theta = u\theta$; $t$ matches $u$ by $\theta$ if $t\theta = u$. If, in these two cases, $\theta$ is additionally the most general such substitution, we say it is the most general unifier (mgu) or the most general matcher (mgm), respectively. Terms are said to be ground if they contain no variables. Given a term $t$, a substitution $\theta$ is grounding for $t$ if $t\theta$ is ground. All these definitions extend to atoms and clauses in the standard way.
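The difference between unification and matching matters later for structural resolution, so here is a minimal Python sketch of both, assuming a term encoding of our own: strings starting with an uppercase letter are variables, other strings are constants, and tuples `(functor, arg1, ...)` are compound terms. `unify` may bind variables on either side, while `match` only binds variables of the pattern (the clause head) and treats the variables of the target as constants; it assumes both terms have been renamed apart. The occurs check is omitted, as is customary in the coinductive setting, so this is an illustration rather than a general-purpose unifier.

```python
# Terms: uppercase strings are variables, other strings are constants,
# and tuples (functor, arg1, ...) are compound terms.


def is_var(t):
    return isinstance(t, str) and t[:1].isupper()


def walk(t, s):
    """Follow the bindings of s until reaching a non-variable or an unbound variable."""
    while is_var(t) and t in s:
        t = s[t]
    return t


def unify(a, b, s):
    """Most general unifier of a and b extending s, or None (no occurs check)."""
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a) or is_var(b):
        return {**s, a: b} if is_var(a) else {**s, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b) and a[0] == b[0]:
        for x, y in zip(a[1:], b[1:]):
            if (s := unify(x, y, s)) is None:
                return None
        return s
    return None


def match(p, t, s):
    """Most general matcher: binds only variables of the pattern p, so that p becomes t."""
    if is_var(p):
        if p in s:
            return s if s[p] == t else None
        return {**s, p: t}
    if isinstance(p, tuple) and isinstance(t, tuple) and len(p) == len(t) and p[0] == t[0]:
        for x, y in zip(p[1:], t[1:]):
            if (s := match(x, y, s)) is None:
                return None
        return s
    return s if p == t else None


print(unify(("f", "X", "b"), ("f", "a", "Y"), {}))  # {'X': 'a', 'Y': 'b'}
print(match(("f", "X"), ("f", "a"), {}))            # {'X': 'a'}
print(match(("f", "a"), ("f", "X"), {}))            # None: X in the target is not bindable
```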

Traditionally, the inductive semantics of logic programs is given by Herbrand models. The Herbrand base $B_P$ of a logic program $P$ is the set of all ground atoms built from function and predicate symbols in $P$. The least Herbrand model $M_P$ is the smallest subset of $B_P$ that is also a model for each clause in $P$. An atom $A$ is logically entailed from $P$ if $A\theta \in M_P$ for some grounding substitution $\theta$ for $A$.

Given a logic program $P$ and a goal clause $\leftarrow A_1, \ldots, A_n$, SLD resolution is a semi-decision procedure to check whether $A_1, \ldots, A_n$ are logically entailed from $P$ and, if so, to compute a substitution $\theta$ such that $A_i\theta\sigma \in M_P$ for every $A_i$ in the goal and for all substitutions $\sigma$ grounding for $A_i\theta$. Thus, goal clauses can be seen as queries to be solved with respect to a logic program, and the computed substitution encodes the answer (if any).

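Definition (SLD-resolution reduction). If $P$ is a logic program and $G = \leftarrow A_1, \ldots, A_k, \ldots, A_n$ is a goal clause, the SLD-resolution reduction (with respect to $P$) is given by $G \leadsto \leftarrow (A_1, \ldots, A_{k-1}, B_1, \ldots, B_m, A_{k+1}, \ldots, A_n)\theta$ if $A \leftarrow B_1, \ldots, B_m \in P$ and $\theta$ is the mgu of $A_k$ and $A$.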

A more constructive definition of the least Herbrand model can be given in terms of fixed points of a suitable function. Given a logic program $P$, the immediate consequence operator $T_P : 2^{B_P} \to 2^{B_P}$ is defined on the powerset of the Herbrand base as follows:

$$T_P(S) = \{ A\theta \in B_P \mid A \leftarrow B_1, \ldots, B_n \in P,\ \theta \text{ grounding},\ \{B_1\theta, \ldots, B_n\theta\} \subseteq S \}$$

Since $T_P$ is monotonic for any logic program $P$, by the Knaster-Tarski theorem it has a least fixed point, which is precisely the least Herbrand model $M_P$.

For the rest of the paper, we use the following syntactic conventions: function and predicate symbols start with a lowercase letter, and constants are sometimes numbers; variables start with an uppercase letter; atoms, clauses and logic programs are written as single uppercase letters.

2.2 Coinduction in Logic Programming

Under the inductive interpretation of logic programs, only terminating SLD derivations are meaningful. However, some logic programs have no terminating derivations and no natural inductive semantics, yet they can still be understood coinductively.

The following logic program defines the infinite list of zeros:

Starting from the goal , SLD resolution does not terminate. Indeed, the program above has no inductive meaning, and its least Herbrand model is empty. Still, its clause has a clear meaning.

Recall that, in the inductive interpretation, models contain only finite terms, whereas the coinductive interpretation admits both finite and infinite terms. Given a logic program $P$, the complete Herbrand base $B^{co}_P$ contains all finite and infinite atoms built on top of the function and predicate symbols in $P$. For the coinductive interpretation, the greatest complete Herbrand model $M^{co}_P$ is considered, that is, the greatest subset of $B^{co}_P$ that is also a model for $P$. In Example 2.2, the least Herbrand model is empty, whereas the greatest complete Herbrand model contains the atom describing the infinite list of zeros.

The duality between the inductive and the coinductive interpretation extends to the fixed point semantics: inductive models are the least fixed point of the immediate consequence operator while coinductive models are the greatest fixed point of the operator (extended to possibly infinite terms). The existence of the greatest fixed point is again ensured by the Knaster-Tarski theorem.
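For a finite, ground program the two fixed points can be computed simply by iterating the immediate consequence operator, starting from the empty set for the least one and from the whole Herbrand base for the greatest one. The Python sketch below does exactly that on a small hypothetical program (the atoms `p` to `u` are illustrative only). The clause `s <- s` mirrors the situation of Example 2.2: `s` is not in the least fixed point but belongs to the greatest one. With non-ground programs and infinite terms the complete Herbrand base is infinite, so this naive iteration only illustrates the definitions; it is not a decision procedure.

```python
# A hypothetical ground program: each clause is (head, body atoms).
PROGRAM = [
    ("p", ("q", "r")),  # p <- q, r
    ("q", ()),          # q <-        (a fact)
    ("r", ("q",)),      # r <- q
    ("s", ("s",)),      # s <- s      (justified only coinductively)
    ("t", ("u",)),      # t <- u      (u has no clause)
]
BASE = frozenset({"p", "q", "r", "s", "t", "u"})  # Herbrand base of PROGRAM


def t_p(program, interp):
    """Immediate consequence operator: heads of clauses whose whole body is in interp."""
    return frozenset(head for head, body in program if all(b in interp for b in body))


def iterate(program, start):
    """Apply T_P repeatedly from `start` until a fixed point is reached."""
    current = frozenset(start)
    while (nxt := t_p(program, current)) != current:
        current = nxt
    return current


least = iterate(PROGRAM, frozenset())  # bottom-up: least fixed point
greatest = iterate(PROGRAM, BASE)      # top-down: greatest fixed point
print(sorted(least))     # ['p', 'q', 'r']
print(sorted(greatest))  # ['p', 'q', 'r', 's']
```

Both loops terminate because the base is finite and $T_P$ is monotonic, so the iterates form an increasing chain from the bottom and a decreasing chain from the top.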

In the 1980s, the notion of formulas computable at infinity was introduced [22]. An infinite formula $F$ is computable at infinity if there exists a finite formula $F'$ such that $\leftarrow F'$ has an infinite (and fair) SLD derivation, and the substitutions computed in the course of this derivation yield $F$. For example, the infinite atom describing the list of zeros is computable at infinity for the program and the query of Example 2.2. In such cases we also say that the infinite SLD derivation is globally productive, in the sense that it produces an infinite term as a substitution.

Operationally, dealing with infinite terms and non-terminating derivations is a challenge. Co-LP [23] extends SLD resolution with a cycle detection mechanism that allows a derivation to be concluded when a goal unifies with a previously encountered one. Considering again Example 2.2, one resolution step reduces the initial goal to a new goal on a fresh variable, and the computed substitution binds the original variable to a list whose head is 0 and whose tail is that fresh variable. At this point, Co-LP checks whether the two goals unify, and indeed they do: the computed answer (which requires removing the occurs check, as is necessary in the coinductive setting) is the infinite term specified by a recursive equation stating that the variable equals a list with head 0 and with the variable itself as tail, that is, the infinite list of zeros.

Note that this recursive term is regular [10] (a.k.a. rational or cyclic), since it has a finite number of subterms, namely 0 and the term itself. Because Co-LP's algorithm relies on unification of looping coinductive goals, it can only handle regular terms and derivations. As a result, it does not terminate on irregular derivations; thus it is sound but not complete w.r.t. the greatest complete Herbrand model.
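The following Python sketch mimics the Co-LP loop check on two programs whose exact clauses we assume for illustration: `stream(scons(0, X)) <- stream(X)` for the list of zeros, and `from(X, scons(X, Y)) <- from(s(X), Y)` for the irrational stream discussed in Section 2.3. It only handles clauses with at most one body atom, unifies without occurs check, and uses a hypothetical `max_steps` bound to stand in for a derivation that would otherwise run forever. On the first program a cycle is found and the answer is a cyclic binding; on the second, successive goals never unify with an ancestor, so no answer is produced.

```python
from itertools import count


def is_var(t):
    return isinstance(t, str) and t[:1].isupper()


def walk(t, s):
    while is_var(t) and t in s:
        t = s[t]
    return t


def unify(a, b, s):
    """Most general unifier extending s, WITHOUT occurs check: X = f(X) is accepted."""
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a) or is_var(b):
        return {**s, a: b} if is_var(a) else {**s, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b) and a[0] == b[0]:
        for x, y in zip(a[1:], b[1:]):
            if (s := unify(x, y, s)) is None:
                return None
        return s
    return None


def rename(t, n):
    """Rename clause variables apart by suffixing them with _n."""
    if is_var(t):
        return f"{t}_{n}"
    return (t[0],) + tuple(rename(x, n) for x in t[1:]) if isinstance(t, tuple) else t


def colp_solve(program, goal, max_steps=50):
    """Co-LP-style search for clauses with at most one body atom (body None = fact)."""
    s, ancestors, fresh = {}, [], count(1)
    for _ in range(max_steps):
        # Coinductive hypothesis: succeed if the goal unifies with an ancestor goal.
        for anc in ancestors:
            if (s2 := unify(goal, anc, s)) is not None:
                return s2
        ancestors.append(goal)
        # Otherwise perform an ordinary SLD step with the first applicable clause.
        for head, body in program:
            n = next(fresh)
            if (s2 := unify(goal, rename(head, n), s)) is not None:
                if body is None:
                    return s2
                s, goal = s2, rename(body, n)
                break
        else:
            return None  # no clause applies: failure
    return None  # no cycle detected within max_steps (real Co-LP would not stop)


def show(t, s, depth):
    """Print a possibly cyclic term, cutting it off after `depth` nested constructors."""
    t = walk(t, s)
    if isinstance(t, tuple):
        if depth == 0:
            return "..."
        return f"{t[0]}(" + ", ".join(show(x, s, depth - 1) for x in t[1:]) + ")"
    return t


ZEROS = [(("stream", ("scons", "0", "X")), ("stream", "X"))]
FROM = [(("from", "X", ("scons", "X", "Y")), ("from", ("s", "X"), "Y"))]

answer = colp_solve(ZEROS, ("stream", "Q"))
print(show("Q", answer, 3))                  # scons(0, scons(0, scons(0, ...)))
print(colp_solve(FROM, ("from", "0", "Q")))  # None
```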

2.3 Structural Resolution

Structural resolution [20, 17, 12] (or S-resolution for short) offers a solution for cases in which formulas computable at infinity are not regular. Consider the following coinductive program, consisting of a single clause:

Given the query , and writing as an abbreviation for the stream constructor , here we have that the infinite atom is computable at infinity by and it is also contained in the greatest complete Herbrand model of . Coinductive reasoning on this query cannot be handled by the loop detection mechanism of Co-LP because the atom is irrational and the looping subgoals will fail to unify.

In such cases, it may still be possible to prove automatically that the SLD derivation for the query is infinite and non-failing and, moreover, computes an infinite term at infinity, even if we cannot generate its closed form. At the core of this argument is the detection of a regular pattern, a constructor, that works as a building block of the infinite term computed at infinity; in this example, the constructor is the stream constructor. We now explain how S-resolution detects such patterns.

S-resolution [20, 17, 12] stratifies the SLD-derivation steps into those performed by term-matching and those requiring full unification. Term-matching here plays the role that pattern-matching on constructors of data structures plays in functional programming.

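Definition [14]. If $P$ is a logic program and $G = \leftarrow A_1, \ldots, A_k, \ldots, A_n$ is a goal clause, then:

  • rewriting reduction: $G \to \leftarrow A_1, \ldots, A_{k-1}, B_1\theta, \ldots, B_m\theta, A_{k+1}, \ldots, A_n$ if $A \leftarrow B_1, \ldots, B_m \in P$ and $\theta$ is the mgm for $A$ against $A_k$ ($A\theta = A_k$);

  • substitution reduction: $G \hookrightarrow G\theta$ if $A \leftarrow B_1, \ldots, B_m \in P$ and $A_k$ and $A$ are unifiable via mgu $\theta$.

The S-resolution reduction with respect to $P$ reduces a goal to its rewriting normal form and then performs one substitution reduction, $G \to^{\mu} G' \hookrightarrow G'\theta$. We write $G \to^{\mu} G'$ to indicate the reduction of $G$ to its $\to$-normal form $G'$ with respect to $P$, if this normal form exists, and $G \to^{\infty}$ to indicate an infinite rewriting reduction of $G$ with respect to $P$ otherwise.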

One can show that under certain conditions, SLD-resolution reductions and structural resolution reductions are equivalent, see [14, 18]; but S-resolution has one advantage: it helps to detect the constructors from which the infinite data structure is built.

Firstly, we represent the structural resolution reductions as tree rewriting: the figure below shows how rewriting reduction steps can be represented as rewriting trees and substitution reduction steps shown horizontally as rewriting tree transitions. This separation makes it easy to see that in this derivation, the same pattern gets consumed by rewriting steps and gets added, or produced, in the substitution steps:

 

We can only detect this pattern if rewriting trees are finite, i.e. all rewriting reductions are normalising. By definition, observational productivity of an S-resolution reduction for a program and a query is in fact a conjunction of two properties [12]:

  • universal observability: normalisation of all rewriting reductions in this S-resolution reduction, and

  • existential liveness: non-termination of this S-resolution reduction.

S-resolution terminates when these two properties are satisfied. Otherwise, S-resolution generates infinite derivations lazily, showing only partial answers. For example, S-resolution will terminate lazily after the three substitution steps shown above, and will output the partial answer: .
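The alternation between rewriting (matching only) and substitution (unification) steps can be sketched in Python as follows. This is a simplified, hypothetical rendering of S-resolution under our own term encoding, not the implementation of [18, 12]: goals are lists of atoms, clauses are `(head, body)` pairs, and `s_resolve` stops after a given number of substitution steps, returning a partial answer in the lazy spirit described above. On the assumed clause `from(X, scons(X, Y)) <- from(s(X), Y)`, the constructor `scons` is produced by each substitution step and consumed by the following rewriting step. A hypothetical `limit` on rewriting steps approximates the universal observability requirement: if rewriting never reaches a normal form, the sketch raises an error instead of looping.

```python
from itertools import count

# Terms: uppercase strings are variables, other strings constants, tuples compound terms.


def is_var(t):
    return isinstance(t, str) and t[:1].isupper()


def walk(t, s):
    while is_var(t) and t in s:
        t = s[t]
    return t


def unify(a, b, s):
    """Most general unifier extending s, or None."""
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a) or is_var(b):
        return {**s, a: b} if is_var(a) else {**s, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b) and a[0] == b[0]:
        for x, y in zip(a[1:], b[1:]):
            if (s := unify(x, y, s)) is None:
                return None
        return s
    return None


def match(p, t, s):
    """Most general matcher: binds only variables of the pattern p, so that p becomes t."""
    if is_var(p):
        if p in s:
            return s if s[p] == t else None
        return {**s, p: t}
    if isinstance(p, tuple) and isinstance(t, tuple) and len(p) == len(t) and p[0] == t[0]:
        for x, y in zip(p[1:], t[1:]):
            if (s := match(x, y, s)) is None:
                return None
        return s
    return s if p == t else None


def subst(t, s):
    t = walk(t, s)  # assumes s has no cyclic bindings
    return (t[0],) + tuple(subst(x, s) for x in t[1:]) if isinstance(t, tuple) else t


def rename(t, n):
    if is_var(t):
        return f"{t}_{n}"  # rename clause variables apart
    return (t[0],) + tuple(rename(x, n) for x in t[1:]) if isinstance(t, tuple) else t


def rewrite_step(program, goal, fresh):
    """One rewriting reduction (matching only), or None if goal is in normal form."""
    for i, atom in enumerate(goal):
        for head, body in program:
            n = next(fresh)
            theta = match(rename(head, n), atom, {})
            if theta is not None:
                return goal[:i] + [subst(rename(b, n), theta) for b in body] + goal[i + 1:]
    return None


def substitution_step(program, goal, fresh):
    for atom in goal:
        for head, _ in program:
            theta = unify(atom, rename(head, next(fresh)), {})
            if theta is not None:
                return [subst(a, theta) for a in goal], theta
    return None, None


def s_resolve(program, goal, var, steps, limit=1000):
    """Alternate rewriting normal forms and substitution steps; return var's partial answer."""
    fresh, answer = count(1), {}
    for _ in range(steps):
        for _ in range(limit):  # reduce to rewriting normal form
            nxt = rewrite_step(program, goal, fresh)
            if nxt is None:
                break
            goal = nxt
        else:
            raise RuntimeError("rewriting does not normalise: not universally observable")
        if not goal:
            break  # all atoms rewritten away: success
        goal, theta = substitution_step(program, goal, fresh)
        if goal is None:
            return None  # no clause head unifies: failure
        answer.update(theta)
    return subst(var, answer)


def to_str(t):
    return f"{t[0]}({', '.join(map(to_str, t[1:]))})" if isinstance(t, tuple) else t


FROM = [(("from", "X", ("scons", "X", "Y")), [("from", ("s", "X"), "Y")])]
print(to_str(s_resolve(FROM, [("from", "0", "Q")], "Q", 3)))
# scons(0, scons(s(0), scons(s(s(0)), Y_8)))
```

The trailing variable `Y_8` is the unresolved part of the partial answer; running more substitution steps unfolds the stream further.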

One can show that observational productivity implies global productivity [18, 12]. Note that universal observability is thus a formal pre-condition for reasoning about observational productivity of S-resolution. Not every non-terminating program is globally and observationally productive.

Consider the following logic program :

The two goal clauses and lead to the following non-terminating SLD derivations, respectively:

In the first derivation, each derivation step gives a better approximation of the rational term by incrementally instantiating free variables , , , …. In the second one, the goal never changes, and there is no “real” progress. Nevertheless, both the rational atoms and belong to the greatest complete Herbrand model of the program above.

Following the idea of computations at infinity [13, 22], only the first derivation actually computes an infinite term after an infinite number of steps and only the first derivation is globally productive. To see what happens with observational productivity, note that the clause makes the program break this requirement: . On the other hand, S-resolution reductions for will be productive (again note that the constructor gets added in substitution reductions and consumed in rewriting reductions):

Thus, S-resolution will work for queries on but not . For the above derivation for , it will detect that plays a role of an infinite data structure constructor.

3 Abstract Compilation

Abstract compilation [6, 9] is a technique developed in the context of object-oriented programming to exploit the potential of logic programming for supporting advanced static type analysis, as also investigated by other authors [24, 2].

In a nutshell, the approach consists in translating the program under analysis into a logic program which abstracts the semantics of the source program; then, performing static type analysis on the program amounts to solving a goal w.r.t. the obtained logic program.

Abstraction is mainly obtained by structural types, which represent sets of values, following the semantic subtyping approach [3, 5]; boolean type constructors, such as union types, and record types allow quite precise analysis when employed in conjunction with abstract compilation; in particular, both parametric and data polymorphism are supported. Solving a goal corresponds to symbolically executing the source program with types representing sets of values.

Abstract compilation strives to reconcile compositional and global analysis: once the program under analysis has been abstractly compiled, its source code is no longer needed, as long as it remains unmodified, and classes can be abstractly compiled separately. Since analysis corresponds to goal solving, it can only be performed when the whole relevant program has been compiled; this limitation is also a feature, because it promotes precise analysis through context-sensitive data and parametric polymorphism.

Finally, abstract compilation offers interesting opportunities to fruitfully exploit compiler technology [7, 8, 4] for more precise and efficient analysis.

3.1 Abstract Compilation at Work

Let us consider the two classes implementing linked lists defined in listing 1 and show how they could be translated into a logic program to perform type analysis on them.

The translation depends on the way values are abstracted, that is, on the underlying type system. In this particular case we may use the primitive types null, int, and bool to represent the singleton value null, and the sets of integer and boolean values, respectively; then, obj(c, [f1:t1, ..., fn:tn]) represents the set of all instances created from class c having at least fields f1, ..., fn associated with values of types t1, ..., tn, respectively. To make the type system more expressive, we also introduce union types, corresponding to logical disjunction: t1 ∨ t2 represents the set of all values which have type t1 or t2.

Types represent sets of values and are the terms manipulated by the generated logic programs; for instance, referring to the classes of listing 1, the type represents all objects implementing integer linked lists of length .

Predicates are introduced for representing the different kinds of declarations and constructs of the source language. For instance, predicates new and invoke abstract object creation and method invocation, respectively, while predicate hasmeth represents method declarations. Consequently, an atom of the form new(nelist, [T1, T2], T) formally states that the invocation of the constructor of class NEList with arguments of type T1 and T2, respectively, returns a value of type T. As another example, an atom of the form invoke(T, addlast, [E], R) formally states that the invocation of method addLast on an object of type T, with an argument of type E, returns an object of type R.

The translation generates two separate kinds of Horn clauses (we use Prolog syntactic conventions: variables start with an uppercase letter, constants start with a lowercase letter, and square brackets denote lists; moreover, a binary function symbol is written in infix notation): those encoding the abstract semantics of the source programming language, which are independent of any analyzed program, and those directly derived from the code under analysis. For instance, the clause

    invoke(obj(C, F), M, A, R) ← hasmeth(C, M, [obj(C, F)|A], R)

partly specifies the abstract semantics of method invocation (two more clauses are needed to deal with union types and with inherited methods); it states that the invocation of method M on an object of type obj(C, F) with arguments of type A returns a value of type R, if the class C of the receiver object has a method M returning a value of type R when invoked on the object this of type obj(C, F) with arguments of type A. Thus, the semantics of invoke depends on the code of the declared methods. Indeed, for each method declaration, a corresponding clause for predicate hasmeth is generated; for instance, the following clause is derived from the declaration of method addLast in class EList:

    hasmeth(elist, addlast, [This, Elem], R) ← new(nelist, [Elem, This], R)

It states that class EList has a method addLast that, when invoked on the object this of type This with an argument of type Elem, returns a value of type R, provided that the constructor of class NEList returns a value of type R when invoked on arguments of type Elem and This, respectively.

Analogously, the following clause is generated from the declaration of method addLast in class NEList, where predicate fieldacc abstracts the semantics of field access:

    hasmeth(nelist, addlast, [This, Elem], R) ←
        fieldacc(This, head, H), fieldacc(This, tail, T),
        invoke(T, addlast, [Elem], N), new(nelist, [H, N], R)
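To make the reading of these clauses concrete, here is a small Python sketch that evaluates them functionally, in the direction "argument types in, result type out". The type encoding (`"int"`, `"bool"`, and tuples `("obj", class, ((field, type), ...))`) is our assumption; union types and inherited methods, which need the additional clauses mentioned above, are omitted. Unlike resolution, this one-directional evaluation cannot answer inference queries in which the receiver or argument types are unknown.

```python
ELIST_T = ("obj", "elist", ())  # type of new EList()


def new(cls, arg_types):
    """Abstract constructor call: the type of `new C(args)` given the argument types."""
    if cls == "elist":
        return ELIST_T
    if cls == "nelist":
        head, tail = arg_types
        return ("obj", "nelist", (("head", head), ("tail", tail)))
    raise LookupError(f"unknown class {cls}")


def field_acc(obj_type, field):
    """Abstract field access (fieldacc): the type stored in `field`."""
    _, _, fields = obj_type
    return dict(fields)[field]


def has_meth(cls, meth, this, arg_types):
    """One branch per hasmeth clause generated from the two addLast declarations."""
    if (cls, meth) == ("elist", "addlast"):
        (elem,) = arg_types
        return new("nelist", [elem, this])
    if (cls, meth) == ("nelist", "addlast"):
        (elem,) = arg_types
        h = field_acc(this, "head")
        t = field_acc(this, "tail")
        n = invoke(t, "addlast", [elem])
        return new("nelist", [h, n])
    raise LookupError(f"{cls} has no method {meth}")


def invoke(receiver, meth, arg_types):
    """invoke(obj(C, F), M, A, R) <- hasmeth(C, M, [obj(C, F)|A], R)."""
    _, cls, _ = receiver
    return has_meth(cls, meth, receiver, arg_types)


def fmt(t):
    if isinstance(t, tuple):
        _, cls, fields = t
        return f"obj({cls}, [" + ", ".join(f"{name}:{fmt(v)}" for name, v in fields) + "])"
    return t


# new EList().addLast(42).addLast(false).head
lst = invoke(invoke(new("elist", []), "addlast", ["int"]), "addlast", ["bool"])
print(fmt(lst))  # obj(nelist, [head:int, tail:obj(nelist, [head:bool, tail:obj(elist, [])])])
print(field_acc(lst, "head"))  # int
```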

3.2 Examples of Queries and Recursive Types

We start by showing a simple goal to typecheck the expression new EList().addLast(...), under the assumption that its argument has type int. This can be achieved by solving the goal

which succeeds, as expected, with answer:

As a more elaborate example, let us consider the expression new NEList(...,...).addLast(...), under given assumptions on the types of its three arguments; typechecking this expression corresponds to solving the goal

which succeeds for

This example shows that typechecking can succeed also for expressions which build heterogeneous lists.

For a simple example of type inference, let us consider the problem of finding a valid type assignment for the variables x and y that makes the expression x.addLast(y) well-typed; this corresponds to the goal ← invoke(X, addlast, [Y], T), which, for instance, succeeds for

The fact that the logical variable is not in the domain of the computed substitution means that any type can be safely assigned to .

In the previous examples we have only considered types specifying linked lists of fixed length; to build more interesting types, recursion is needed, and this is achieved by considering rational terms (a.k.a. regular or cyclic). For instance, the unique term defined by the solution of the unification problem T = obj(elist, []) ∨ obj(nelist, [head:int, tail:T]) corresponds to the recursive type specifying the set of all integer linked lists of arbitrary length.

All the example queries considered so far can be solved w.r.t. the standard inductive interpretation of Horn clauses, that is, the least Herbrand model, even though method addLast is recursive in class NEList; however, if recursive types are involved in queries, then the least Herbrand model is no longer sufficient to capture their intended meaning, as shown in Section 4.

4 Coinduction and Structural Resolution in Abstract Compilation

4.1 The Need for Coinduction

As already mentioned in the previous section, the intended meaning of goals and logic programs does not always coincide with the least Herbrand model. When recursive types and methods are involved, the coinductive interpretation is needed, i.e., the greatest complete Herbrand model has to be considered [22].

Let us consider the recursive method replicate from listing 4.1: if n is not positive, the method returns an empty list; otherwise, it recursively builds a list of n-1 occurrences of x, and then returns the (newly created) list where element x has been added at the beginning. We refer to listing 1 for the definitions of the classes EList and NEList.

    class ListFact extends Object {
        ListFact() { super(); }
        replicate(n, x) {
            if (n <= 0) new EList() else new NEList(x, this.replicate(n-1, x))
        }
    }

Listing 4.1: Given an integer n and an element x, replicate returns a list containing n occurrences of x.

By means of abstract compilation, method replicate would be translated to the following clause (for the sake of readability, we have applied some simplifications to the resulting clause; however, such changes do not affect its semantics):

    hasmeth(listfact, replicate, [This, int, X], E ∨ NE) ←
        new(elist, [], E), invoke(This, replicate, [int, X], R), new(nelist, [X, R], NE)

The first atom of the body corresponds to the invocation of the constructor of EList, while the other two atoms are generated from the recursive invocation and from the invocation of the constructor of NEList, respectively. Finally, the use of the conditional expression is reflected in the term E ∨ NE. Note that the Horn clause above is the only clause generated by abstract compilation for method replicate.

Let us now consider the expression new ListFact().replicate(10, 42). To infer its type (w.r.t. the classes in listings 1 and 4.1), the following goal is generated:

However, this goal does not succeed under the inductive interpretation; indeed, the SLD derivation (where colours highlight the substitutions computed by unification along the way) is non-terminating:

In the derivation above, every new recursive invocation atom yields a better approximation of the result type, but unfortunately the derivation never terminates. If we interpret the logic program obtained by abstract compilation coinductively, thus considering its greatest complete Herbrand model rather than its least one, the goal above is actually entailed by the program, with a substitution instantiating the result type variable with the following type:

The type above represents all integer lists of arbitrary length; however, such a type corresponds to an infinite term. Fortunately, there exists an equivalent type corresponding to the rational term [10] specified by the following recursive equation:
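Although the type of `replicate` is an infinite term, a finite cyclic representation is enough to work with it. The Python sketch below assumes the recursive equation has the shape T = obj(elist, []) ∨ obj(nelist, [head:int, tail:T]) and builds it as a cyclic object graph; the class names `Union` and `Obj` and the value encoding `(class, fields)` are our own. Checking whether a concrete list belongs to the set of values denoted by T terminates because the recursion follows the finite value, even though the type graph contains a cycle.

```python
class Union:
    """Union type t1 ∨ t2 ∨ ...; alternatives may be filled in later to close a cycle."""

    def __init__(self, *alts):
        self.alts = list(alts)


class Obj:
    """Object type: instances of class `cls` with at least the given typed fields."""

    def __init__(self, cls, fields):
        self.cls = cls
        self.fields = fields


def has_type(value, ty):
    """True if the concrete value belongs to the set of values denoted by ty."""
    if isinstance(ty, Union):
        return any(has_type(value, alt) for alt in ty.alts)
    if isinstance(ty, Obj):
        if not (isinstance(value, tuple) and len(value) == 2):
            return False
        cls, fields = value
        return cls == ty.cls and all(
            name in fields and has_type(fields[name], fty) for name, fty in ty.fields.items()
        )
    if ty == "int":
        return isinstance(value, int) and not isinstance(value, bool)
    if ty == "bool":
        return isinstance(value, bool)
    raise TypeError(f"unknown type {ty!r}")


# T = obj(elist, []) ∨ obj(nelist, [head:int, tail:T]), closed into a cycle.
T = Union()
T.alts = [Obj("elist", {}), Obj("nelist", {"head": "int", "tail": T})]


def replicate(n, x):
    """Concrete counterpart of ListFact.replicate; values are (class, fields) pairs."""
    return ("elist", {}) if n <= 0 else ("nelist", {"head": x, "tail": replicate(n - 1, x)})


print(has_type(replicate(10, 42), T))   # True
print(has_type(replicate(3, True), T))  # False: bool elements are not int
```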

This example shows that goals involving recursive types and methods require a coinductive interpretation of the logic program obtained by abstract compilation, in order to make static type analysis more precise.

As already illustrated in Section 2, answer substitutions with rational terms can be computed by extending SLD resolution with cycle-detection techniques [23], as proposed with Co-LP; hence, an inference engine based on Co-LP improves the results of static analysis performed with abstract compilation [6]. The Co-LP inference engine is, however, limited: it succeeds only with rational terms and derivations, and cannot handle more complex scenarios.

4.2 Structural Resolution for Abstract Compilation

Listing 4.2 shows a slightly more involved example. Suppose we add to ListFact a method buildList that, given an integer n, builds the list of integers from 1 to n by means of an auxiliary two-argument method buildList, which exploits tail recursion with an accumulator parameter acc. A more realistic Java implementation would of course avoid recursion and use a simpler and more efficient loop, for which static type analysis by abstract compilation is less problematic if the SSA intermediate form [7, 8, 4] is exploited during the compilation phase; however, in recent years object-oriented languages have increasingly adopted patterns based on functional-style programming, possibly with recursion and accumulators.

    buildList(n, acc) {
        if (n <= 0) acc else this.buildList(n-1, new NEList(n, acc))
    }

Listing 4.2: Given an integer n, buildList returns the list of integers 1, ..., n followed by acc, which is used as an accumulating parameter.

Abstract compilation of buildList would yield the following Horn clause (again, for readability we are simplifying the clause that would be automatically generated):

Suppose we want to infer the type of the expression new List().buildList(42, new EList()). Such an expression is abstractly compiled to the following goal:

The derivation for such a goal is again infinite, hence the coinductive interpretation is needed again:

However, as opposed to the previous example, in this case the derivation is not rational. Indeed, at each step of the derivation a non-equivalent type is computed both for the accumulator and the returned value, since lists of different lengths have non-equivalent types. The following countably infinite set of equations defines the computed answer substitution associated with the whole derivation:

After a closer look at the set above, we can deduce that the considered non-rational derivation succeeds because the set of equations admits a solution, although such a solution involves non-rational terms; in particular, the type of the expression new List().buildList(42, new EList()) is non-rational. As a consequence, an inference engine based on Co-LP [23] would fail to compute a type, because no cycle can be detected in the derivation.

In order to solve this problem, we propose to use structural resolution [18] as inference engine for abstract compilation. This new resolution method relies on a productivity notion which is not limited to rational trees, thus it offers the possibility to exploit a more flexible inference engine to allow more expressive static type analysis through abstract compilation.

Starting from the goal above, after a finite number of derivation steps, structural resolution is able to compute an answer substitution (depending on the implementation of structural resolution, the computed answer can be more or less precise, since the type could be "unfolded" more than once before the first answer is returned), effectively solving the task of determining the type of the expression new List().buildList(42, new EList()). It works by noticing that a pattern is consumed by the terminating rewriting reductions and is also infinitely produced in a chain of substitution reductions; hence, this pattern serves as a constructor of the infinite data structure produced at infinity.

The computed answer is only partial, since one of its variables is still unresolved. However, structural resolution ensures that the non-rational term corresponding to the computed type can be incrementally unfolded for an arbitrary number of steps: if needed, the substitution for that variable can be computed in a finite number of derivation steps, thus providing a better approximation of the type of the expression. In this sense, the use of structural resolution in conjunction with abstract compilation gives rise to a lazy type inference procedure.
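The following Python sketch illustrates why the buildList derivation defeats cycle detection and what a lazy partial answer looks like. It assumes, by analogy with the clause for replicate, that the generated clause has the shape hasmeth(listfact, buildlist, [This, int, Acc], Acc ∨ R) ← new(nelist, [int, Acc], NE), invoke(This, buildlist, [int, NE], R); this shape, the record encoding, and the helper names are our assumptions. The accumulator types are pairwise distinct, so no goal can unify with an ancestor, while `partial_answer(k)` returns the result type unfolded k times with the remaining part left as an unresolved variable `Rk`.

```python
from itertools import islice

ELIST = ("obj", "elist", ())  # type of new EList()


def nelist(head, tail):
    """Type of new NEList(head, tail) under our assumed record encoding."""
    return ("obj", "nelist", (("head", head), ("tail", tail)))


def accumulator_types():
    """Lazily yield the type of acc at each recursive call: elist, then one more int cell."""
    acc = ELIST
    while True:
        yield acc
        acc = nelist("int", acc)


def partial_answer(steps):
    """Result type after `steps` unfoldings: acc_0 ∨ ... ∨ acc_(steps-1) ∨ R<steps>."""
    result = f"R{steps}"  # the still unresolved tail of the union
    for acc in reversed(list(islice(accumulator_types(), steps))):
        result = ("or", acc, result)
    return result


def fmt(t):
    if isinstance(t, str):
        return t
    if t[0] == "or":
        return f"{fmt(t[1])} ∨ {fmt(t[2])}"
    _, cls, fields = t
    return f"obj({cls}, [" + ", ".join(f"{name}:{fmt(v)}" for name, v in fields) + "])"


print(fmt(partial_answer(2)))
# obj(elist, []) ∨ obj(nelist, [head:int, tail:obj(elist, [])]) ∨ R2
print(len(set(islice(accumulator_types(), 10))))  # 10: all accumulator types differ
```

Asking for `partial_answer(k + 1)` instead of `partial_answer(k)` corresponds to resolving one more step of the unresolved variable, which is the incremental unfolding described above.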

5 Ensuring Universal Observability of Coinductive Logic Programming

5.1 Universal Observability and Program Transformation

In the previous section we discussed how structural resolution can be useful to handle (a class of) non-terminating derivations in finite time, when derivations compute an infinite irrational term. However, this new resolution method can be successfully employed only if logic programs are universally observable.

Logic programs resulting from abstract compilation are not universally observable in the general case.

Consider for instance abstract compilation for an object-oriented language supporting nominal subtyping; the following three clauses should be generated for all source programs: