A scheme for dynamically integrating C library functions into a λProlog implementation

06/03/2019 ∙ by Duanyang Jing, et al. ∙ 0

The Teyjus system realizes the higher-order logic programming language λProlog by compiling programs into bytecode for an abstract machine and executing this translated form using a simulator for the machine. Teyjus supports a number of builtin relations that are realized through C code. In the current scheme, these relations are realized by including the C programs that implement them within the simulator and tailoring the compiler to produce instructions to invoke such code. There are two drawbacks to such an approach. First, the entire collection of library functions must be included within the system, thereby leading to a larger than necessary memory footprint. Second, enhancing the collection of builtin predicates requires changing parts of the simulator and compiler, a task whose accomplishment requires specific knowledge of these two subsystems. This project addresses these problems in three steps. First, the code for the builtin functions is moved from the simulator into a library from where relevant parts, determined by information in the bytecode file, are linked into the runtime system at load time. Second, information is associated with each library function about how it can be invoked from a λProlog program and where the C code for it is to be found. Finally, the compiler is modified to use the preceding information to include relevant linking instructions in the bytecode file and to translate invocations to builtin relations into a special instruction that calls the dynamically linked code. More generally, these ideas are capable of supporting an interface in λProlog to "foreign functions" implemented in C, a possibility that is also discussed.


1 Introduction

λProlog is a logic programming language that extends Prolog in several directions. The logical foundation of Prolog, Horn clauses, is enriched with the possibility of quantifying over function and predicate variables and of explicitly representing binding in terms. The resulting richer class of formulas is called higher-order hereditary Harrop formulas [3]. At the programming level, these extensions support additional features such as modular programming, abstract datatypes, higher-order programming, and the lambda-tree syntax approach to the treatment of bound variables in syntax [2].

Teyjus is an implementation of the λProlog language that efficiently addresses many implementation challenges posed by the new features. Underlying the Teyjus system is an abstract machine, referred to herein as the lpWAM. This abstract machine inherits from the Warren abstract machine (WAM) for Prolog [1] a basic structure for treating the unification and search operations that are intrinsic to logic programming, but it also incorporates many new mechanisms for treating the features that are unique to λProlog. The efficient implementation, together with the strong logical foundations, provides us with a powerful logic programming framework.

Not all computations fit naturally in the scheme of logic programming. Some operations, such as I/O, consist entirely of side effects. They lie outside the realm of pure logic, but are still indispensable for any practical programming language. Other computations can be described in a purely logical manner, but such descriptions can lead to rather inefficient computations. For example, arithmetic operations such as addition and division can be evaluated efficiently using underlying hardware support, whereas realizing them through a logical description can be quite costly in time and space. As with other languages, λProlog solves this problem by isolating such computations in builtin predicates whose concrete implementation can be provided by means other than logical descriptions.

The Teyjus system exploits this structure by implementing builtin predicates via C code. Its approach to doing this is based on incorporating the implementation of these predicates directly into the emulator for the lpWAM. There are two problems with this approach. First, it requires the code for all the builtin predicates to be included in the runtime footprint, regardless of whether or not the user program needs them. Second, integrating builtins in this way requires knowledge of relevant parts of the overall implementation, thereby making the task of adding new builtins more complicated.

We propose an alternative approach to implementing builtins in Teyjus. This approach requires a library builder to construct and maintain an independent collection of C code, together with information for using such code from λProlog programs. The programmer annotates her program based on the latter information. The compiler then generates instructions for dynamically linking in relevant parts of the library code and for invoking such code at the appropriate places.

The rest of this report explains the problem that we have considered and our solution to it in more detail. In the next section, we provide more specific information about the structure of the Teyjus system and its treatment of builtins. In Section 3, we outline an alternative scheme for realizing these builtins. Sections 4 and 5 then describe the implementation and evaluation of the new scheme. Section 6 concludes this report by discussing extensions to the work it presents.

2 Builtins in the Current Teyjus System

As we have noted earlier, the Teyjus implementation of λProlog is based on using an abstract machine: λProlog programs are translated into code for the lpWAM, which can then be executed to produce the desired effects. Concretely, four subsystems are used to produce this behavior: a compiler, a linker, a loader, and a simulator for the lpWAM that is written in C. The Teyjus system permits λProlog programs to be organized into a collection of interacting modules. The compiler examines each module, checks its internal consistency and ensures that it satisfies the promise determined by an associated signature [4]. If all this checks out, the compiler produces a bytecode file for the module. This bytecode file comprises two parts: metadata and instructions to be executed on the abstract machine. The metadata is used by the linker to combine the bytecode files for the different modules constituting the program into one. The metadata also provides information to the loader for creating an initial state for executing the lpWAM instructions. Once the loader has done its job, the simulator is ready to accept user queries and to carry out the computations these queries entail.

We are interested in understanding how builtins are treated within this implementation framework. The Teyjus manual identifies a collection of predefined predicates together with their types. Programmers can use these predicates directly in their code, adhering to their type declarations and assuming that they denote relations that follow the semantics described for them in the manual. These predicates are in fact implemented via C routines embedded in the simulator. The use of these predicates in λProlog programs must eventually translate into invocations of these C routines. To realize this objective, the compiler inserts an instruction of the form builtin i, where i is a numeric index into a dispatch table that contains a pointer to the appropriate C routine; executing this instruction results in a lookup and a transfer of control to the corresponding procedure. Note that the procedure itself has to convert the arguments, which have the form of λProlog data, into a representation suitable for C and, conversely, it has to transform the results obtained from the computation in C into the corresponding λProlog form; we refer to these phases as the marshalling and unmarshalling of the data. These conversions typically require access to the data space of the simulator. This is easily realized because the routines are in fact a part of the simulator.
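The dispatch-table mechanism described above can be pictured with a minimal C sketch. Every name here (builtin_fn, builtin_table, exec_builtin and the two sample routines) is hypothetical and chosen only for illustration; none is taken from the Teyjus sources.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type of a builtin routine: it takes no C arguments and
   returns nothing, because data is exchanged through simulator state. */
typedef void (*builtin_fn)(void);

/* Stand-in for the simulator state that the routines manipulate. */
static int last_called = -1;

/* Two placeholder routines; real ones would marshal and unmarshal data. */
static void bi_first(void)  { last_called = 0; }
static void bi_second(void) { last_called = 1; }

/* The dispatch table: the position of a routine is exactly the index
   that the compiler must place in the "builtin i" instruction. */
static const builtin_fn builtin_table[] = { bi_first, bi_second };

#define NUM_BUILTINS (sizeof builtin_table / sizeof builtin_table[0])

/* Executing "builtin i": look up the routine and transfer control to it. */
static void exec_builtin(size_t i)
{
    assert(i < NUM_BUILTINS);
    builtin_table[i]();
}
```

In such a design the order of entries in builtin_table is an implicit contract with the compiler, which is the kind of coupling between the two subsystems that the text goes on to discuss.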

While the treatment of builtins that we have described is quite simple and logical, it has two drawbacks. First, the code for all the builtins is explicitly integrated into the simulator, no matter which of them are actually used in the source program. Depending on how large the collection of builtins is, this could result in a memory footprint for the runtime system that is significantly larger than what is actually needed. Second, a significant amount of coordination is needed between the simulator and the compiler, and all this must be manifest in the code realizing the overall system. As a prime example of this, observe that the builtin index used by the compiler for a predefined predicate must match exactly with what is contained in the dispatch table. Some of this kind of coordination can be realized through code in the compiler and the simulator that is generated automatically from a common specification file. However, there is still a need to understand and to edit the compiler and the simulator code in relevant places to add new builtins. This requirement can be daunting to a λProlog user who wants to provide new library functions that are realized via C code, and it thus poses a barrier to the further development of the system library.

3 A New Approach to Realizing Builtin Predicates

We propose a modified approach to realizing builtin predicates that overcomes the deficiencies discussed in the previous section. The central new idea in this approach is to move the code that implements the builtins out of the simulator and into an external library from where parts of it can be linked into the user program as needed. In this way, only the builtins that are being used are loaded into memory. The decoupling also makes the C library independent of the simulator and the compiler, thereby making it easier for a developer to add new functionality to the library.

We need to solve three problems to make this idea work. First, we have to provide a means for a λProlog programmer to use functionality provided by the library without explicit coordination between the compiler and the simulator. Second, we have to provide a means for a library developer to build new functionality, which might have to access the simulator data spaces at least for the marshalling and unmarshalling aspects, without having to delve deeply into the simulator code. Finally, we have to support the possibility of including the needed parts of library code into the runtime image of a λProlog program during the linking and loading phase. We discuss solutions to each of these problems below.

3.1 A λProlog Interface to Library Functionality

Two items of information are needed to support the use of builtin predicates in λProlog programs. First, the names and types of the predicates that are so defined need to be known so that the programmer may use them in a way that allows the compiler to check their uses. Second, the location of the library and the specific code associated with a predefined predicate should be known so that the compiler can generate the necessary linking and dispatch code for the predicate.

The kind of information that is needed can be provided by extending the modules system already present in Teyjus to realize an interface to library components. Typically, a component of the library would consist of code that implements a collection of builtin predicates. To enable the use of these predicates, a λProlog signature file can be associated with each such component. This file would provide the location of the component, the types of the predicates realized, and, for each predicate, a name that identifies the entry point to the code to be invoked. Note that, unlike the case for predicates implemented via λProlog code, there would be no λProlog module file corresponding to such a signature file. Rather, the identified predicates would be realized by C code that is invoked through special instructions based on the information in the signature file. The compiler would need to know to do this, but this matter is easily handled by enhancing the modules language to suitably distinguish the inclusion of a builtin “module” from that of a vanilla λProlog module.

3.2 The Simulator Interface for Library Code

As noted already, a library developer should be able to write and compile the C source code for builtin (or external) predicates largely independently of the simulator. The main bottleneck to meeting this requirement is the need to communicate through shared spaces for both the marshalling and unmarshalling aspects as well as to ensure conformity with calling conventions.

To realize the needed independence under the mentioned constraints, a C header file can be created that includes everything an external C function might need to access from the simulator. Library developers would need to assimilate the contents of this header file to understand how to coordinate their code with the simulator and also to use any needed functionality already present in the simulator. This header file must be included in the file that constitutes the C source code for the library predicates. Doing so allows such a file to be compiled separately and to be stored in object code form in the library. The header file thus also serves as the interface between external C libraries and the run-time system.

3.3 The Dynamic Linking of Library Components

The signature files associated with relevant library components provide information about any additional object code that must be linked with the simulator to run a given λProlog program. The compiler can generate and emit metadata to the bytecode file that identifies such code. Most operating systems provide facilities for dynamically loading and linking a shared library. For instance, Unix systems provide the dlopen function, which dynamically loads a shared library and links it with the main program. The Teyjus loader can be modified to process the additional metadata to produce a runtime image that includes code realizing library functionality.

A separate issue that also needs resolution is that of realizing the appropriate dispatch corresponding to the invocation of builtin predicates in λProlog programs. Operating systems that permit the dynamic loading and linking of code must also provide a means for resolving symbol references in such code; for example, such resolution can be realized in Unix systems by using the dlsym function. Once a symbol has been resolved, it is an easy matter to realize the needed dispatch.

An important issue to address is when such symbol resolution should take place. It could be done each time an externally implemented predicate is to be invoked. However, if the same predicate is invoked multiple times in a λProlog program, this can become a costly operation. A better alternative is to carry out the resolution once, at the time of loading the bytecode file corresponding to a λProlog program, and to use the absolute address obtained through this process directly in the dispatch instructions. This approach is easy to implement. The compiler can generate a metadata block that lists the names of the external predicates used in the program, and the dispatch instructions it generates can index into this block. The loader would then determine the absolute address for each of the externally implemented predicates, store these addresses in an array, and eventually use this array to change the dispatch instructions into ones that use absolute addresses as they are loaded into the code space for the simulator.
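The one-time resolution step can be sketched in C with the POSIX dlopen and dlsym functions. This is only an illustration under stated assumptions: the function name resolve_externs and the shape of its arguments are hypothetical and do not describe the actual Teyjus loader.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Load the shared library `libname` once and resolve each of the `n`
   symbol names in `names`, storing the resulting addresses in `addrs`.
   Passing NULL as `libname` makes dlopen return a handle for the running
   program itself. Entries that cannot be resolved are set to NULL.
   Returns the number of symbols that were resolved. */
static size_t resolve_externs(const char *libname,
                              const char *const names[], size_t n,
                              void *addrs[])
{
    assert(names != NULL && addrs != NULL);

    size_t resolved = 0;
    void *handle = dlopen(libname, RTLD_NOW);

    for (size_t i = 0; i < n; i++) {
        addrs[i] = (handle != NULL) ? dlsym(handle, names[i]) : NULL;
        if (addrs[i] != NULL)
            resolved++;
    }
    return resolved;
}
```

A loader following this sketch would call resolve_externs once per library while reading the metadata block and would then consult the filled-in array, so that no symbol lookup remains on the path of an actual predicate invocation. On some systems the program must be linked with -ldl for these functions to be available.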

4 An Implementation of the New Scheme

We have implemented the new approach to supporting builtin predicates by making various modifications to the existing Teyjus system. At the outset, we had to make changes to the modules language and to the instructions to be used to compile invocations of builtin predicates. We then had to make changes to the compiler, the linker, the loader and the simulator. We describe all these modifications in more detail in subsections below.

4.1 Additions to the Modules Language

As discussed earlier, we have to add to the modules language the capability of describing predicates that are supported through externally provided C code. The chief change towards this end is the inclusion in the language of a new kind of signature file that has the following form:

sig <signame>.
#lib <clibname>.

extern type <lpname1> <cname1> <type1>.
   ...
extern type <lpnameN> <cnameN> <typeN>.

regcl <lpnames>.

The first line in this declaration sequence associates the name signame with this signature; this name can be employed in other user-defined modules to include the definitions in this signature. The second line indicates the location of the C library code that implements the builtin functions that need to be imported into the λProlog environment. This is followed by a sequence of declarations that identify a collection of builtin predicates, together with their types and entry points, that are ostensibly supported by the C library code: in a declaration of the form

extern type <lpname> <cname> <type>

<lpname> stands for the name of the predicate that can be used in Prolog programs, <cname> is a symbol that identifies the entry point and <type> provides the type of the predicate. The last line in the signature file identifies the subcollection of the predicates provided in this signature that are “register clobbering.” The lpWAM uses a common set of registers for local computations in a predicate invocation as well as for passing parameters to predicates. Given this, it is generally the case that an invocation of a predicate can destroy the values that were passed as arguments and hence the caller has to assume that these contents will not be preserved over the invocation. However, it is useful to know of particular situations in which a predicate will not destroy the values in argument registers: this information can be utilized in allocating registers in a way that minimizes data movement. Now, builtin predicates often do not modify argument register values and hence the compiler uses this as a default assumption. The last line in the signature file informs the compiler of those predicates for which it is not safe to make this assumption.

Given a signature file of this kind, the predicates declared by it can be used in the code in programmer defined modules by “accumulating” the signature. The compiler must carry out different actions when it is accumulating code that is obtained from a C-based library from what it would have to do when accumulating user-defined code. In light of this, we introduce the new declaration

accum_extern <signame>.

to support the accumulation of code from a C-based library.

4.2 Modification to the Instruction Set

The existing version of Teyjus has two instructions for invoking builtin predicates: builtin x and call_builtin x, where x is an index into the builtin dispatch table in the simulator. (There are two instructions rather than just one to accommodate last-call optimization. The details concerning when to use one or the other of these instructions are orthogonal to the focus of this project, so we do not discuss the matter further here.) In the new model, the invocation of code for builtin predicates does not happen through a dispatch table but, rather, by transferring control to an absolute address. Thus these instructions have to be modified to accommodate the new reality.

In keeping with the above observation, we have complemented the mentioned instructions with two new instructions execute_extern x and call_extern x in which x represents an absolute address. When the compiler generates these instructions, it tentatively uses an index into an array that will be built at load time from metadata in the bytecode file and that will be filled in with absolute addresses determined for the relevant builtins. The loader will eventually replace the indices in the compiled versions of these instructions with absolute addresses to make them ready for execution.

4.3 Modification to the Compiler

There are two broad changes that have to be made to the compiler to accommodate the new treatment of builtins:

  • It must be extended to treat the inclusion of builtin predicates in user programs through the new kind of signature files.

  • It must use the new instructions to compile invocations of builtin predicates in the user code.

We have added code to the compiler to process signature files for predefined predicates. As with signature files for modules implemented via λProlog definitions, this processing adds predicate names together with types associated with them to the symbol table; this step ensures that the necessary information for checking the usage of these predefined predicates in user code is in place. Entries for these predicates in the symbol table are marked in a way that indicates that a different kind of instruction must be generated to realize their invocation. In contrast to the accumulation of regular λProlog modules, the compiler does not look for code realizing the builtin predicates. Instead, it collects the library and entry point names for each predefined predicate and maintains them in a list to be used later in the compilation process.

The bytecode generation phase incorporates two changes. First, metadata is emitted for use by the linker and loader to realize the process of dynamically linking in the C code that defines the builtin predicates. The information for producing this metadata is available from the analysis of the signature file as we have just noted. Second, the new form of instructions must be generated for the invocation of builtin predicates. It should be evident from all that has been said that this step is also easy to realize.

4.4 Modification to the Linker

The linking process produces one single bytecode file from separate bytecode files compiled from different modules. Focusing only on the treatment of predefined predicates, this means that the metadata segments for such predicates that come from different bytecode files must be combined into one consolidated segment. The main complexity in doing this is that the indices in the instructions that invoke the predefined predicates must also be relativized to the metadata segment that is so generated. However, even this is not difficult to do: if the consolidated metadata segment is obtained via a linear combination of the individual segments, then the indices for the calling instructions in the separate bytecode files need only be adjusted by a fixed offset.
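The fixed-offset adjustment can be illustrated with a short C sketch. It assumes, purely for illustration, that each metadata segment is an array of entry-point names and that the operands of the calling instructions have been gathered into an array of indices; the names append_table and relocate_indices are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Append the `src_len` entries of one module's table to the consolidated
   table `dst`, which currently holds `*dst_len` of at most `dst_cap`
   entries. Returns the offset at which the appended entries begin. */
static size_t append_table(const char *dst[], size_t dst_cap, size_t *dst_len,
                           const char *const src[], size_t src_len)
{
    assert(*dst_len + src_len <= dst_cap);

    size_t offset = *dst_len;
    for (size_t i = 0; i < src_len; i++)
        dst[offset + i] = src[i];
    *dst_len += src_len;
    return offset;
}

/* Shift the module-local indices used by one module's calling
   instructions so that they refer to the consolidated table. */
static void relocate_indices(size_t indices[], size_t n, size_t offset)
{
    for (size_t i = 0; i < n; i++)
        indices[i] += offset;
}
```

For example, if a first module contributes two entries and a second module contributes one, the second module's index 0 becomes index 2 in the consolidated segment.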

4.5 Modification to the Loader

The loader reads the bytecode file produced by the linker and sets up the memory image of the run-time system. In the new model, the loader should also load the external libraries to be used and resolve the symbols in those libraries to absolute addresses. This can be realized by reading the external function table segment and making the dynamic linking call for every entry in the table. Later, when the code region is being loaded, the loader can examine every instruction and update the operands of the extern instructions to be the corresponding absolute addresses.
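The operand rewriting performed while loading the code region might look as follows. The instruction layout (struct instr), the opcode names and the function patch_externs are simplifying assumptions made for this sketch; the real lpWAM instruction encoding is different and is not described in this report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified instruction format. */
enum opcode { OP_OTHER, OP_CALL_EXTERN, OP_EXECUTE_EXTERN };

struct instr {
    enum opcode op;
    uintptr_t   operand;  /* table index before patching, address after */
};

/* Replace the index operand of every extern instruction by the absolute
   address stored at that index in `addrs`, an array that the loader has
   already filled in by resolving the external function table. */
static void patch_externs(struct instr code[], size_t n,
                          void *const addrs[], size_t n_addrs)
{
    for (size_t i = 0; i < n; i++) {
        if (code[i].op == OP_CALL_EXTERN || code[i].op == OP_EXECUTE_EXTERN) {
            assert(code[i].operand < n_addrs);
            code[i].operand = (uintptr_t)addrs[code[i].operand];
        }
    }
}
```

After this pass no instruction refers to the table any longer, so the simulator can transfer control through the operand directly.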

4.6 Modification to the Simulator

The work required of the simulator with respect to the treatment of builtins is minimal. All calls to external functions have already been resolved to absolute addresses by the loader, and the functions are augmented to properly manipulate the data representations and the relevant state of the simulator. As a result, the simulator just needs to invoke the functions. Some builtins are intrinsic to the λProlog language and it makes better sense to leave the code realizing them within the simulator. These builtins include “solve”, “not”, “eval” and the comparison predicates. As a result, the builtin dispatch table and the old instructions for invoking builtin predicates are not completely removed from the system. However, only a small number of builtin predicates are treated in the old way and hence there is only a marginal overhead in the footprint in the case that these predicates are not actually employed in the user code.

5 Building Libraries Using the New Scheme

In this section we take a more practical view of the proposed scheme and discuss, in particular, how to extend C code to produce a C-based library that realizes the original C computation in conformity with the data representation and calling conventions of the Teyjus simulator. As discussed earlier, a library developer needs to deliver two files as an external library. One is the λProlog signature file that describes the predicates implemented in the library; the other is the C implementation compiled as a shared library. Suppose a library developer wants to provide a library with various mathematical predicates, much like those declared in the C header file math.h:

sin: real -> real -> o.
cos: real -> real -> o.
tan: real -> real -> o.
  ...

We will discuss in detail how this can be realized in the new scheme.

The starting point would be to acquire the C implementation of these functions. In this example the library developer can simply use the implementation provided by the standard C math library, whose functions are declared in the header file math.h, or write their own implementation. Then the library developer would need to extend the C code to make it conform to the data representation and calling convention of the lpWAM. The simulator interface for library code is designed specifically for this purpose: it is a C header file that exposes those parts of the simulator that might be called by library code. In this case two functions would be useful:

float TJ_getReal(int i);
void TJ_returnReal(int i, float val);

The first function takes the data at argument register i and converts it to a C float representation. The second function takes a C float, converts it into a real term in lpWAM, and unifies it with the existing data at argument register i. Conceptually the first function should be called on input arguments, and the second function should be called on the argument register that holds the term to be unified with the actual return value. Now the library developer can write a wrapper function that wraps the original C function using the above two functions:

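#include <math.h>
/* The simulator interface header that declares TJ_getReal and
   TJ_returnReal must also be included here. */

void sin_wrapper(void)
{
    float a1 = TJ_getReal(1);       /* read the input from argument register 1 */
    double ret = sin((double)a1);   /* invoke the underlying C function */
    TJ_returnReal(2, (float)ret);   /* unify the result with argument register 2 */
}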

This wrapper function can serve as the entry point from the simulator to the library code; it carries out all the work involved in an invocation of the sin predicate. Because the “return” function in the end transfers control to other procedures in the simulator, the wrapper function should have void type and take no arguments. Wrappers for the other functions would follow the same pattern because of their similarity in nature. For predicates with a different arity or different argument types, the library developer just needs to find the appropriate functions in the simulator interface and call them accordingly in the wrapper function. In the end the library developer would compile the C file that includes all the wrapper functions into a shared object file and deliver it to λProlog programmers.
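One way to try out the wrapper pattern before linking against Teyjus is to compile a wrapper against mock versions of the two interface functions. The sketch below makes several assumptions: the register array arg_regs and the bodies of TJ_getReal and TJ_returnReal are stand-ins written for this example (the real TJ_returnReal performs unification rather than an assignment), and cube is a hypothetical developer-written function.

```c
#include <assert.h>

#define NUM_REGS 8

/* Mock argument registers; the real ones live inside the simulator. */
static float arg_regs[NUM_REGS];

/* Mock of TJ_getReal: read argument register i as a C float. */
float TJ_getReal(int i)
{
    assert(i >= 0 && i < NUM_REGS);
    return arg_regs[i];
}

/* Mock of TJ_returnReal: the real function unifies val with the term in
   register i; this stand-in simply overwrites the register. */
void TJ_returnReal(int i, float val)
{
    assert(i >= 0 && i < NUM_REGS);
    arg_regs[i] = val;
}

/* A developer-written C function to be exposed (hypothetical). */
static double cube(double x)
{
    return x * x * x;
}

/* Wrapper for a predicate of type real -> real -> o, following the same
   shape as sin_wrapper: fetch the input, compute, return the result. */
void cube_wrapper(void)
{
    float a1 = TJ_getReal(1);
    double ret = cube((double)a1);
    TJ_returnReal(2, (float)ret);
}
```

Placing 2.0 in mock register 1 and calling cube_wrapper leaves 8.0 in mock register 2. Such a harness only exercises the data flow of the wrapper; it says nothing about unification failure or control transfer in the real simulator.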

The λProlog signature file is the other file that the library developer needs to produce. The syntax for this file is discussed in Section 4.1. With the C code extended and compiled, the library developer now has all the information required for this signature file. As an example, suppose the compiled shared object file is named math.so; the signature file might then have the following structure for the mathematical predicates:

#sig math
#lib math

extern type sin sin_wrapper real -> real -> o.
extern type cos cos_wrapper real -> real -> o.
extern type tan tan_wrapper real -> real -> o.
  ...
#regcl sin

Section 4.1 also discussed the need to inform the compiler of those predicates that modify argument register values during invocation. Some functions in the simulator interface have this behavior. If the extended C code for a predicate calls any of these functions, the library developer must include that predicate in the list of #regcl predicates.

6 Some Ideas for Further Developments

In this report we have presented a scheme that allows λProlog builtins that are implemented through C code to be treated as components that are loosely coupled with the main Teyjus system. This scheme has the virtue of requiring a library developer to have only a rudimentary understanding of the simulator, as presented through the interface declarations, to extend existing C code into external libraries. More broadly, this scheme has the potential for providing a systematic way to integrate any external C computation into the Teyjus system. It would be of interest to explore its development in this direction.

The main issue to deal with in realizing this kind of generalization is to determine how the simulator interface can be enriched to provide a means for marshalling and unmarshalling between complex λProlog data encodings and corresponding C data structures. The current simulator interface only supports the primitive types in Teyjus, including int, real, and string, which are merely the starting point of the Teyjus type system. Structured data types introduced by kind declarations and data constructors introduced by type declarations are not supported.

To add support for structured data, we would need to compare typical data encodings in λProlog with the approach used in C. As a typical example, suppose a library developer would like to provide predicates on a pair object:

kind pair type -> type -> type.
type pr   int -> int -> pair int int.

Such an object might be represented in C by using a structure that has two integer fields:

struct pair {  int x;  int y; };

Once the data representations that one needs to translate between are known, it should be easy to write the code to effect such translations. In fact, such marshalling and unmarshalling code can even be generated automatically. Thus, the key issue is how the knowledge about the relevant data representations may be communicated to the library developer. One possible approach to doing this is to use a specification file that describes both the λProlog data objects and the corresponding C representation. The key task in this endeavor then becomes that of describing a convenient and flexible structure for such a specification file.
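To give a feel for what such translation code might look like for the pair example, here is a minimal C sketch. The term representation (struct term with an integer case and a two-argument application case) is invented for this illustration and is much simpler than the actual lpWAM term encoding; the function names pair_from_term and pair_to_term are likewise hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct pair { int x; int y; };

/* Hypothetical, simplified stand-in for a term such as (pr 3 4). */
enum term_tag { TERM_INT, TERM_APP };

struct term {
    enum term_tag tag;
    int           ival;     /* valid when tag == TERM_INT */
    const char   *head;     /* constructor name when tag == TERM_APP */
    struct term  *args[2];  /* constructor arguments when tag == TERM_APP */
};

/* Term to C: returns 1 and fills *out if t has the form pr <int> <int>,
   and returns 0 otherwise. */
static int pair_from_term(const struct term *t, struct pair *out)
{
    if (t == NULL || t->tag != TERM_APP || t->head == NULL)
        return 0;
    if (strcmp(t->head, "pr") != 0)
        return 0;
    if (t->args[0] == NULL || t->args[1] == NULL)
        return 0;
    if (t->args[0]->tag != TERM_INT || t->args[1]->tag != TERM_INT)
        return 0;
    out->x = t->args[0]->ival;
    out->y = t->args[1]->ival;
    return 1;
}

/* C to term: build (pr x y) in storage supplied by the caller. */
static void pair_to_term(const struct pair *p, struct term *app,
                         struct term *fst, struct term *snd)
{
    assert(p != NULL && app != NULL && fst != NULL && snd != NULL);
    *fst = (struct term){ .tag = TERM_INT, .ival = p->x };
    *snd = (struct term){ .tag = TERM_INT, .ival = p->y };
    *app = (struct term){ .tag = TERM_APP, .head = "pr",
                          .args = { fst, snd } };
}
```

Both functions are entirely mechanical given the shapes of the two representations, which is what makes automatic generation from a specification file plausible.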

Once a format for writing such specifications has been described, the next interesting task would be to develop a “marshalling and unmarshalling code generator” component of the Teyjus system that would read the specification file and automatically generate the code for going between the λProlog and C representations. In fact, this translation generator component can be treated as one additional layer of abstraction between the library developer and external libraries, which could take care of not only the generation of all marshalling and unmarshalling code, but also the generation of the λProlog signature file. In the end, a library developer would only need to provide a specification file that contains high-level descriptions of the λProlog data objects and the corresponding C data structures, the λProlog predicates, and the C code that is to realize the functionality; the library generator would then automatically produce the shared library object as well as the λProlog signature file.

The development of both the specification format and the component for generating the translation code seems to be a natural next step for this project. Moreover, carrying it out holds the exciting promise of transforming the scheme we have described in this report into a more general and flexible interface between λProlog and C programs. Unfortunately, playing these ideas out in detail is beyond the scope of this Master’s project and therefore has to be left to future work.

Acknowledgements

This material is based upon work partially supported by the National Science Foundation under Grant No. CCF-1617771. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation.

References

  • [1] Hassan Aït-Kaci. Warren’s Abstract Machine: A Tutorial Reconstruction. Logic Programming Research Reports and Notes. MIT Press, Cambridge, MA, 1991.
  • [2] Dale Miller and Gopalan Nadathur. Programming with Higher-Order Logic. Cambridge University Press, June 2012.
  • [3] Dale Miller, Gopalan Nadathur, Frank Pfenning, and Andre Scedrov. Uniform proofs as a foundation for logic programming. Annals of Pure and Applied Logic, 51(1–2):125–157, 1991.
  • [4] Xiaochu Qi, Andrew Gacek, Steven Holte, Gopalan Nadathur, and Zach Snow. The Teyjus system – version 2, 2015. http://teyjus.cs.umn.edu/.