A Formal Approach to the Engineering of Domain-Specific Distributed Systems

by Rocco De Nicola, et al.

We review some results regarding specification, programming and verification of different classes of distributed systems which stemmed from the research of the Concurrency and Mobility Group at University of Firenze. More specifically, we examine the distinguishing features of network-aware programming, service-oriented computing, autonomic computing, and collective adaptive systems programming. We then present an overview of four different languages, namely Klaim, Cows, Scel and AbC. For each language, we discuss design choices, present syntax and semantics, show how the different formalisms can be used to model and program a travel booking scenario, and describe programming environments and verification techniques.




1 Introduction

Since the mid-90s, we have witnessed an evolution of distributed computing towards increasingly complex systems formed by several software components featuring asynchronous interactions and operating in open-ended and non-deterministic environments. Such transformation, initially induced by the spreading of internetworking technologies, led to a paradigm shift making software components aware of the underlying network infrastructure. Network awareness, on the one hand, constrained the remote access to distributed resources and, on the other hand, enabled computation mobility, to support different kinds of optimisations.

On top of these networked systems, software components were then deployed to provide services accessible by end-users and other system components through communication endpoints. This fostered the development of sophisticated applications built by reusing and composing simpler elements. Such a service-based compositional approach abstracted from the actual distribution of the involved components over the underlying network, but required dealing with the interaction challenges posed by their heterogeneity. Interoperability was then achieved through the definition of standard protocols and suitable run-time support for programming languages that also took into account the failures that could occur in long-term interactions. Moreover, the absence of network awareness meant that there was no need for code mobility.

Later on, the need arose to reduce the maintenance cost of these web-based systems, whose size kept growing, and to extend their applicability to interact with and control the physical world, possibly in scenarios where human intervention was difficult or even impossible. It was then advocated to rely on autonomic components, which are capable of continuously monitoring their internal status and the working environment, and of adapting their behaviour accordingly. In addition to point-to-point interactions, typical of client-server protocols, more sophisticated forms of interaction could occur that simultaneously involve a dynamically determined ensemble of components. Ensembles are to be understood as collections of task-oriented or dedicated components that pool their resources and capabilities together to create a more complex system, which offers more functionality and higher performance than the mere sum of the constituent elements.

More recently, in some classes of autonomic computational systems, we have witnessed a tremendous growth in the number of interacting components, which are usually distributed, heterogeneous, decentralised and interdependent, and operate in dynamic and possibly unpredictable environments. The components form collectives by combining their behaviours to achieve specific goals or to contribute to an emerging behaviour of the global system. Collectives abstract from the identity of the single components to guarantee scalability.

The evolution of distributed computing described above corresponds to the emergence of classes of systems that characterise specific programming domains. Correspondingly, dedicated programming paradigms have been proposed, namely:

  • network-aware programming: to exploit the knowledge of the underlying infrastructure for better using network facilities and moving programs closer to the resources they want to use CodeMobility:98;

  • service-oriented computing: to allow the exploitation of loosely-coupled services as fundamental resources for developing applications and support the rapid and automatic development of open distributed systems PG-SOC:2003;

  • autonomic computing: to guarantee the self-managing characteristics of distributed computing resources, adapting to unpredictable changes while hiding intrinsic complexity from operators and users KC03;

  • collective adaptive systems programming: to model complex systems with large numbers of heterogeneous entities interacting without a specific central control, and adapting to environmental settings in pursuit of an individual or collective goal anderson2013adaptive.

Besides dealing with the distinctive aspects of each of these domains, the main challenge in engineering these classes of distributed systems is to coordinate the overall behaviour resulting from the involved distributed components while ensuring trustworthiness of the whole system. To meet this goal, many researchers have adopted a language-based approach that combines the use of formal methods techniques with model-driven software engineering. The key ingredients of the resulting methodology, which can be applied to all classes of systems described above, may be summarised as relying on:

  1. a specification language equipped with a formal semantics, which associates mathematical models to each term of the language to precisely establish the expected behaviour of systems;

  2. a set of techniques and tools, built on top of the models, to express and verify properties of interest;

  3. a programming framework together with an associated runtime environment, to actually execute the specified systems.

When specialising this methodology, a major challenge for (specification or programming) language designers is to devise appropriate abstractions and linguistic primitives to deal with the specificities of the domain under investigation. Indeed, including the distinctive aspects of the domain as first-class elements of the language makes systems design more intuitive and concise, and their analysis more effective. In fact, when the outcome of a verification activity is expressed in terms of the high-level features of a system, and not its low-level representation, system designers receive more direct feedback.

This paper reviews some of the efforts, to which the authors have contributed, in applying the outlined methodology to the classes of distributed systems mentioned above, taking as starting point process algebras and some of the verification techniques and tools developed for them. The approach was initially applied to network-aware programming and the main result was the definition of the Klaim language Klaim98, which has explicit localities, process mobility and network connections as primitive notions (Section 2). Afterwards, the approach was applied to service-oriented computing, resulting in the design of Cows COWS_JAL, whose basic constructs make it possible to express correlations between clients and services and to deal with service failures (Section 3). To deal with autonomic computing, the Scel language DLPT14 was introduced, which has explicit notions of agents' knowledge, primitives and policies for its manipulation, and an original approach to ad hoc ensemble formation (Section 4). Finally, to model and prove emergent properties of collective adaptive systems, a distilled version of Scel named AbC Alrahman3 was introduced, which has specific operators for selecting communication partners using predicates on the run-time values of relevant attributes of the agents forming the system (Section 5).

In the following parts of this paper, for each of these domain-specific languages, we discuss the design choices behind it, present its syntax and informal semantics, and provide an excerpt of the rules defining its formal operational semantics in terms of labelled transition systems by relying on the Structural Operational Semantics style SOS:Plotkin04 . For each language, we also briefly describe the programming environments that have been developed to support program execution and outline some of the techniques that have been advocated for the verification of properties of the specified systems.

Moreover, to assess the expressive power of the different formalisms and to put them to work, we show how they can be used to model a simple scenario that is instrumental in highlighting their distinguishing features. For each formalism, we also provide some code snippets showing how close the specification of the model is to its underlying implementation. The scenario considers an online travel broker that, starting from specific requirements of customers, looks for hotel rooms and flights. Customers communicate their preferences to the broker, which, after some preliminary assessments, forwards the requirements to a number of hotels and airlines. These, upon request, declare their availability and prices so that the customers can make their final choices and proceed with the booking.

The paper ends with a summary of distinguishing features of the presented languages and with a few considerations about the lessons learnt (Section 6).

2 Klaim: Kernel Language for Agents Interaction and Mobility

Network awareness indicates the ability of the software components of a distributed application to manage directly a sufficient amount of knowledge about the network environment where they are currently deployed. This capability allows components to have a highly dynamic behaviour and manage unpredictable changes of the network environment over time. This is of great importance when programming mobile components capable of disconnecting from one node of the underlying infrastructure and of reconnecting to a different node. Programmers are usually supported with primitive constructs that enable components to communicate, and to distribute and retrieve data to and from the nodes of the underlying infrastructure.

Klaim (Kernel Language for Agents Interaction and Mobility, Klaim98 ) has been specifically devised to design distributed applications consisting of several components, both stationary and mobile, deployed over the nodes of a distributed infrastructure. The Klaim programming model relies on a unique interface (i.e. set of operations) supporting component communications and data management.

Localities are the basic building blocks of Klaim for guaranteeing network awareness. They are symbolic addresses (i.e. network references) of nodes and are referred to by means of identifiers. Localities can be exchanged among the computational components and are subject to sophisticated scoping rules. They provide the naming mechanism to identify network resources and to represent the notion of administrative domain: computations at a given locality are under the control of a specific authority. This way, localities naturally support the programming of spatially distributed applications.

Klaim builds on Linda’s notion of generative communication through a single shared tuple space Gel85 and generalises it to multiple distributed tuple spaces. A tuple space is a multiset of tuples. Tuples are anonymous sequences of data items and are retrieved from tuple spaces by means of an associative selection. Interprocess communication occurs through asynchronous exchange of tuples via tuple spaces: there is no need for producers (i.e. senders) and consumers (i.e. receivers) of a tuple to synchronise.

The obtained communication model has a number of properties that make it appealing for distributed computing in general (see, e.g., Gel89 ; davies97limbo ; CCR96 ; deugo01 ). It supports time uncoupling (data life time is independent of the producer process life time), destination uncoupling (data producers do not need to know the future use or the final destination of the data) and space uncoupling (programmers need to know a single interface only to operate over the tuple spaces, regardless of the network node where the action will take place).
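As an illustration only, the generative communication model described above can be sketched in a few lines of Python. This is not the Klaim or Klava implementation: the class `TupleSpace`, the wildcard `ANY` (standing in for a formal field) and the `timeout` parameter are all names assumed for this sketch. It shows a tuple space as a multiset, a non-blocking `out`, and blocking `in_`/`read` with associative selection.

```python
import threading

ANY = object()  # hypothetical wildcard, standing in for a formal template field


class TupleSpace:
    """A multiset of tuples with Linda-style out / in / read (illustrative sketch)."""

    def __init__(self):
        self._tuples = []                      # a list keeps duplicates (multiset)
        self._changed = threading.Condition()

    def out(self, *fields):
        """Add a tuple; never blocks, so producers need not wait for consumers."""
        with self._changed:
            self._tuples.append(tuple(fields))
            self._changed.notify_all()

    def _find(self, template):
        # Associative selection: same length, and every non-wildcard field equal.
        for tup in self._tuples:
            if len(tup) == len(template) and all(
                t is ANY or t == f for t, f in zip(template, tup)
            ):
                return tup
        return None

    def _wait_for(self, template, remove, timeout):
        with self._changed:
            found = self._changed.wait_for(
                lambda: self._find(template) is not None, timeout
            )
            if not found:
                raise TimeoutError("no matching tuple")
            tup = self._find(template)
            if remove:
                self._tuples.remove(tup)       # removes a single occurrence
            return tup

    def read(self, *template, timeout=None):
        """Observer: return a matching tuple and leave it in the space."""
        return self._wait_for(template, remove=False, timeout=timeout)

    def in_(self, *template, timeout=None):
        """Mutator: return a matching tuple and withdraw it from the space."""
        return self._wait_for(template, remove=True, timeout=timeout)
```

In this sketch a producer can call `out` and terminate before any consumer exists, which is the time uncoupling mentioned above; the consumer only needs a template, not the identity of the producer.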

2.1 Syntax

The syntax of Klaim is presented in Table 1. We assume the existence of two disjoint sets: the set of localities, ranged over by $l$, and the set of locality variables, ranged over by $u$, with the distinguished variable $\mathsf{self}$ denoting the locality of the node using it. Their union gives the set of names, ranged over by $\ell$. We also assume three other disjoint sets: a set of value variables, ranged over by $x$, a set of process variables, ranged over by $X$, and a set of process identifiers, ranged over by $A$.

\begin{array}{llcll}
\text{Nets:} & N & ::= & l ::_{\rho} P & \text{(computational node)} \\
 & & \mid & l :: \langle et \rangle & \text{(located tuple)} \\
 & & \mid & N_1 \parallel N_2 & \text{(net composition)} \\
\text{Processes:} & P & ::= & \mathbf{nil} & \text{(inert process)} \\
 & & \mid & a.P & \text{(action prefixing)} \\
 & & \mid & P_1 + P_2 & \text{(choice)} \\
 & & \mid & P_1 \mid P_2 & \text{(parallel composition)} \\
 & & \mid & X & \text{(process variable)} \\
 & & \mid & A(\bar{p}) & \text{(process invocation)} \\
\text{Actions:} & a & ::= & \mathbf{out}(t)@\ell & \text{(output)} \\
 & & \mid & \mathbf{in}(T)@\ell & \text{(input)} \\
 & & \mid & \mathbf{read}(T)@\ell & \text{(read)} \\
 & & \mid & \mathbf{eval}(P)@\ell & \text{(migration)} \\
 & & \mid & \mathbf{newloc}(u) & \text{(creation)} \\
\text{Tuples:} & t & ::= & f \mid f, t & \\
\text{Tuple fields:} & f & ::= & e \mid \ell \mid P & \\
\text{Evaluated tuples:} & et & ::= & ef \mid ef, et & \\
\text{Evaluated tuple fields:} & ef & ::= & v \mid l \mid P & \\
\text{Templates:} & T & ::= & F \mid F, T & \\
\text{Template fields:} & F & ::= & f \mid\ !x \mid\ !u \mid\ !X & \\
\text{Expressions:} & e & ::= & v \mid x \mid \ldots &
\end{array}

Table 1: Klaim syntax

Nets are finite collections of nodes where processes and data can be placed. A computational node takes the form $l ::_{\rho} P$, where $\rho$ is an allocation environment and $P$ is a process. Since processes may refer to locality variables, the allocation environment acts as a name solver binding locality variables to specific localities.

Processes are the active computational units of Klaim. Each process is obtained by composing subprocesses or the inert process via action prefixing ($a.P$), nondeterministic choice ($P_1 + P_2$), parallel composition ($P_1 \mid P_2$), process variable ($X$), and parameterised process invocation ($A(\bar{p})$). Recursive behaviours are modelled via process definitions; it is assumed that each identifier $A$ has a single defining equation $A(\bar{f}) \triangleq P$. Lists of actual and formal parameters are denoted by $\bar{p}$ and $\bar{f}$, respectively.

The tuple space of a node consists of all the evaluated tuples located there. Tuples are sequences of actual fields, i.e. expressions, localities or locality variables, or processes. The precise syntax of expressions is deliberately not specified; it is just assumed that they contain, at least, basic values, ranged over by $v$, and value variables, ranged over by $x$. Templates are sequences of actual and formal fields, and are used as patterns to select tuples in a tuple space. Formal fields are identified by the $!$-tag (e.g. $!x$) and are used to bind variables to values.

2.2 Informal semantics

Nets aggregate nodes through the composition operator $\parallel$, which is both commutative and associative. Processes are concurrently executed in an interleaving fashion, either at the same computational node or at different nodes. They can perform operations borrowed from a unique interface which provides two categories of actions. The first one consists of the programming abstractions supporting data management. Three primitive behaviours are provided: adding (out), withdrawing (in) and reading (read) a tuple to/from a tuple space. Input and output actions are mutators: their execution modifies the tuple space. The read action is an observer: it checks the availability and takes note of the content of a certain tuple without removing it from the tuple space. The second category of actions refers to network awareness: the migration action (eval) activates a new process over a network node, while the creation action (newloc) generates a new network node. The latter action is the only one not indexed by a locality because it acts locally; all the other actions are tagged with the (possibly remote) locality where they will take place. Note that, in principle, each network node can provide its own implementation of the action interface. This feature can be suitably exploited to sustain different policies for data handling as done, e.g., in MetaKlaim metaklaim .

Only evaluated tuples can be added to a tuple space and templates must be evaluated before they can be used for retrieving tuples. Tuple and template evaluation amounts to computing the values of expressions and using the local allocation environment as a name solver for mapping locality variables to localities. As a consequence, the locality variables within processes in a tuple are mapped to localities by using the local allocation environment. Localities and formal fields are left unchanged by such evaluation. A pattern-matching mechanism is then used for associatively selecting (evaluated) tuples from tuple spaces according to (evaluated) templates.

Process variables support higher-order communication, namely the capability to exchange (the code of) a process and possibly execute it. This is realised by first adding a tuple containing the process to a tuple space and then retrieving/withdrawing this tuple while binding the process to a process variable.

Finally, Klaim offers two forms of process mobility. One is based on static scoping: by exploiting higher-order communication, a process moves along the nodes of a net with a fixed binding of resources determined by the allocation environments of the nodes from where, from time to time, it is going to move. The other form of mobility relies on dynamic scoping: when migrating, a process breaks the local links to resources and inherits those of the destination node.

2.3 A taste of the operational semantics

The operational semantics is only defined for well-formed nets and it is given in terms of a structural congruence and a reduction relation over nets. A net is deemed well-formed if for each node we have that and , and for any pair of nodes and , we have that implies . Notation denotes the set of locality variables mapped by the allocation environment , while denotes the set of free variables of process . A variable is free in if it is not bound and it is bound in if it occurs within a formal field of or , or is the argument of ; the scope of the binding is the process after the prefix. Actions out and eval are not binders, but their arguments may contain variables. For the sake of simplicity, we assume that, for the processes we consider, bound variables are all distinct and different from the free ones.

The structural congruence, $\equiv$, identifies syntactically different nets that intuitively represent the same net. It is defined as the smallest congruence relation over nets that satisfies a given set of laws. The most significant law is $l ::_{\rho} (P_1 \mid P_2) \equiv l ::_{\rho} P_1 \parallel l ::_{\rho} P_2$, meaning that it is always possible to transform a parallel of co-located processes into a parallel over nodes. The remaining laws express that $\parallel$ is commutative and associative, the inert process can always be safely removed/added, and a process identifier can be replaced with the body of its definition.

The reduction relation exploits two functions: one for evaluating tuples and templates, the other for selecting tuples in a tuple space. The evaluation function for tuples and templates takes as parameter the allocation environment of the node where the evaluation takes place. The main clauses of its definition are given below:


where denotes the process term obtained from by replacing any free occurrence of a locality variable that is not within the argument of an eval with . Two examples of process evaluation are and . We shall write to denote that evaluation of tuple using succeeds and returns the evaluated tuple .

For selecting an evaluated tuple from a tuple space according to an evaluated template , the pattern-matching function, , is used. This function is defined by means of a set of inference rules which intuitively state that: an evaluated template matches against an evaluated tuple if both have the same number of fields and corresponding fields do match; two values match only if they are identical, while formal fields match any value of the same type. A successful matching returns a substitution function associating the variables contained in the formal fields of the template with the values contained in the corresponding actual fields of the accessed tuple.
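The informal description of pattern matching above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the paper's inference rules: the class `Formal` and the function name `match` are chosen for this sketch, Python types stand in for Klaim field types, and, as assumed in the text, bound variables are taken to be pairwise distinct.

```python
class Formal:
    """A formal template field: binds variable `name` to a value of type `typ`."""

    def __init__(self, name, typ):
        self.name = name
        self.typ = typ


def match(template, tup):
    """Return the substitution produced by matching, or None if matching fails."""
    # Template and tuple must have the same number of fields.
    if len(template) != len(tup):
        return None
    substitution = {}
    for pattern, value in zip(template, tup):
        if isinstance(pattern, Formal):
            # A formal field matches any value of the same type and binds it.
            if not isinstance(value, pattern.typ):
                return None
            substitution[pattern.name] = value
        elif type(pattern) is not type(value) or pattern != value:
            # Two actual values match only if they are identical.
            return None
    return substitution
```

For example, `match(("usr1", Formal("sid", int)), ("usr1", 42))` yields a substitution binding `sid` to `42`, which would then be applied to the continuation of the process performing the retrieval.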

The reduction relation, , is defined as the least relation induced by a given set of inference rules. It is defined over configurations of the form , where is a finite set of localities keeping track of the localities occurring free in (that is, ). is needed to ensure global freshness of new (dynamically generated) network localities and is indeed omitted whenever a reduction does not generate any fresh locality.

The most significant rules are reported in Table 2, where we write to denote that either or is a locality variable that maps to .

Table 2: Klaim operational semantics

In rule (Out), the local allocation environment is used both to determine the name of the node where the tuple must be placed and to evaluate the argument tuple. This implies that if the argument tuple contains a field with a process , the corresponding field of the evaluated tuple contains the process resulting from the evaluation of its locality variables, that is . Hence, processes in a tuple are transmitted after the interpretation of their free locality variables through the local allocation environment. This corresponds to having a static scoping discipline for the (possibly remote) generation of tuples. (Out) requires existence of the target node at , which is left unchanged by the reduction, and that the tuple argument of out is evaluable. As a result of the reduction, the tuple resulting from the evaluation of is added to the tuple space at . A dynamic linking strategy is adopted for the eval operation, rule (Eval). In this case the locality variables of the spawned process are not interpreted using the local allocation environment: the linking of locality variables is done at the remote node. The underlying assumption that all the equations for process definitions are available everywhere greatly simplifies rule (Eval), because it permits avoiding mechanisms for code inspection to find the process definitions needed by . Rule (In) requires that the template argument of in is evaluable and that a matching tuple at the target node exists. As a result of the reduction, the matched tuple is removed from the target tuple space and the substitution returned by the pattern-matching function is applied to the continuation of the process performing the action, in order to replace the free occurrences of the variables bound by with the corresponding values of . Rule (Read) is similar: it differs from (In) only in that the accessed tuple is left in the tuple space.
Finally, in rule (New), the premise exploits the set to choose a fresh locality for naming the new node. In the continuation of the process performing the action, the locality variable argument of newloc is replaced by , so that the new locality becomes usable by the process. Notably, is not yet known to any other node in the net. Hence, it can be used by the creating process as a private name. The allocation environment of the new node is derived from that of the creating one with the obvious update for the locality variable self. Therefore, the new node inherits all the other bindings of the creating node.
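The difference between the static scoping of (Out), the dynamic linking of (Eval) and the environment inheritance of (New) can be made concrete with a small Python sketch. It is a deliberate simplification and not the formal semantics: a process is abstracted as the tuple of names its actions refer to, allocation environments are dictionaries, and the names `Node`, `resolve`, `out_process`, `eval_process` and `newloc` are assumptions of this sketch.

```python
import itertools


def resolve(name, rho):
    """Map a locality variable through rho; plain localities map to themselves."""
    return rho.get(name, name)


class Node:
    def __init__(self, locality, bindings=None):
        self.locality = locality
        # The variable `self` always denotes the node's own locality.
        self.rho = {**(bindings or {}), "self": locality}
        self.tuples = []    # processes stored as data (via out)
        self.running = []   # processes spawned here (via eval)


def out_process(net, sender, process, target):
    """(Out): locality variables are resolved with the SENDER's environment."""
    dest = net[resolve(target, sender.rho)]
    dest.tuples.append(tuple(resolve(n, sender.rho) for n in process))


def eval_process(net, sender, process, target):
    """(Eval): code is shipped as is and linked with the DESTINATION's environment."""
    dest = net[resolve(target, sender.rho)]
    dest.running.append(tuple(resolve(n, dest.rho) for n in process))


def newloc(net, creator):
    """(New): pick a fresh locality; the new node inherits the creator's bindings."""
    fresh = next(f"l{i}" for i in itertools.count() if f"l{i}" not in net)
    net[fresh] = Node(fresh, creator.rho)   # `self` is updated to the fresh name
    return fresh


net = {"broker": Node("broker", {"h": "hotel"}), "hotel": Node("hotel")}
spider = ("self", "broker")                 # search at `self`, report to broker
out_process(net, net["broker"], spider, "h")
eval_process(net, net["broker"], spider, "h")
private = newloc(net, net["broker"])
```

After these calls, the process stored as a tuple at the hotel node refers to the broker's locality in place of `self`, whereas the spawned copy refers to the hotel's own locality; the node created by `newloc` keeps the binding for `h` but maps `self` to the fresh name.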

2.4 A travel booking scenario

We illustrate some of the distinguishing features of the Klaim programming model by using the online travel booking scenario informally presented in the Introduction. The Klaim specification consists of a collection of Klaim nodes, each modelling a component of the software architecture of the scenario. For simplicity, we focus on three main components:

  • the broker component where customers enter their requests, including the date and the origin-destination of the travel;

  • the hotel and flight components that are in charge of selecting hotels and flights in compliance with customers’ requests.

The UML activity diagram displayed in Fig. 1 illustrates the flow of control inside the Klaim net implementing the travel booking scenario. The broker, after collecting requests from customers, exploits code mobility to activate some spider processes in the hotel and flight nodes. These spider processes act on behalf of the broker to find hotels and flights matching customer’s request. The exploitation of code mobility within the workflow of the application is expressed in the diagram by the (resp. ) label. Once available, the result of the search carried out by the spider processes is communicated back to the broker.

Figure 1: Travel Booking Scenario in Klaim: Sequence Diagram.

When we presented the Klaim programming model, we mainly focused on the linguistic primitives to structure distributed applications and to program behaviour. Indeed, we have deliberately not considered primitive data types. We now show how to equip Klaim with simple data types. As an example, we introduce a data type for handling non-empty sequences of locations. We represent them through (the standard) square brackets comma-separated value notation: . The unary function takes as input a location and yields as result the sequence consisting of the argument location only: . The binary function (read cons) takes as input a location and a sequence, and produces as result a new sequence whose first element is the argument location: . Sequences now appear in tuples, hence pattern-matching has to be extended accordingly. For instance, the template matches all sequences consisting of one location only. The template matches all sequences having at least two elements. Hereafter, we will apply pattern-matching to values in order to recognise the form of values and let the computation be guided accordingly. The same approach can be followed to include other data types, such as Strings, Dates, and so on.
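To make the sequence data type concrete, here is a minimal Python sketch under stated assumptions: Python lists stand in for non-empty sequences of locations, and the function names `singleton`, `cons`, `match_single`, `match_at_least_two` and `visit_order` are hypothetical, chosen only for this illustration. The two matching functions mirror the two templates mentioned above (exactly one element, and at least two elements).

```python
def singleton(location):
    """Build the sequence consisting of the given location only."""
    return [location]


def cons(location, sequence):
    """Prepend a location to a (non-empty) sequence."""
    if not sequence:
        raise ValueError("sequences are assumed to be non-empty")
    return [location] + list(sequence)


def match_single(sequence):
    """Match a one-element sequence, binding its only location to `l`."""
    return {"l": sequence[0]} if len(sequence) == 1 else None


def match_at_least_two(sequence):
    """Match a sequence with two or more elements, binding head `l` and tail `s`."""
    if len(sequence) >= 2:
        return {"l": sequence[0], "s": sequence[1:]}
    return None


def visit_order(sequence):
    """Let pattern matching guide a traversal, one location at a time."""
    order = []
    while True:
        binding = match_at_least_two(sequence)
        if binding is None:
            # Only one location left: handle it and stop.
            order.append(match_single(sequence)["l"])
            return order
        order.append(binding["l"])
        sequence = binding["s"]
```

A recursive process that activates one spider per location could follow the same case analysis as `visit_order`: one branch for the last location, one branch that handles the head and recurs on the tail.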

The structure of the nodes where the hotel and flight facilities are deployed is intuitively clear. The node hosting the hotel booking facility is presented below:

The node hosts the process , which manages the hotel booking requests activated by the broker component. Moreover, it exposes room availability through suitable tuples stored in the local tuple space .

The Klaim node that specifies the behaviour of the broker component is as follows:

The node hosts the processes and presented below, together with the local tuple space, represented by . For the sake of readability, we exploit a sort of macro-like mechanism to associate a name to a piece of Klaim specification, e.g. we write to indicate that the code of the process will replace the identifier each time this is encountered in the Klaim specification.

The handler process receives the customer request, obtained by sensing in the tuple space the tuple containing the data about the customer code, the date-origin-destination of the travel, and the location of the node where the results of the request will be stored. The handler process then activates the manager process , by emitting in the local tuple space a tuple tagged by , and gets the actual session identifier, by inspecting the tuple space. We abstract from the detailed description of process , since it deals with some low-level computational aspects specific for the considered application, rather than taking care of coordinating activities. We just assume that the omitted code creates the unique session identifier for the customer’s request, and associates to it the sequence of hotels and the sequence of airline companies that must be queried to satisfy the customer’s request.

The handler process exploits two recursive processes to activate the spider (mobile) processes in charge of finding hotels and flights.

The spider processes take full advantage of Klaim dynamic linking mobility through the eval primitive. This ensures that each spider is spawned on the remote node without evaluating its locality variables according to the allocation environment of the broker component. This programming choice implies that when the mobile code runs in the remote node of the hotel (resp. of the flight), the location self is bound to the actual address of the location where the hotel (resp. flight) component is deployed. The result of this search is then forwarded back to the tuple space of the broker.

The last part of the behaviour of the broker consists in the management of the customer’s preferences. This strongly depends on the data type used to store preferences. We outline the abstract specification of the facility matching customer’s preferences with respect to the hotel information, as the treatment of flight information is similar. For simplicity, we assume that the hotel information is stored in tuples of the form . We also assume that the customer has a loyalty card for a specific group of hotels which is stored in the tuple space of the broker and that the customer will add a distinguished tuple to the tuple space of the broker to signal the termination of the hotel booking activity.

The process senses the local tuple space of the broker to identify the information about the hotels made available by the spider processes. This information is checked against the customer’s preference (i.e., the hotel group) in order to report the presence of a reduced rate. Whenever information on the hotel meets the customer’s preferences, the tuple containing the hotel data is stored in the remote tuple space of the customer with a flag indicating the availability of the reduced rate.
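The preference-management step can be sketched in Python as follows. Several details are assumptions made only for this illustration: hotel information is taken to be stored as tuples `("hotel", sid, name, group, price)`, the termination tuple as `("hotels_done", sid)`, and every offer of the session is forwarded with a Boolean reduced-rate flag (one possible reading of the description above). The function name `manage_hotel_prefs` is likewise hypothetical.

```python
def manage_hotel_prefs(broker_tuples, sid, loyalty_group):
    """Return the tuples that would be emitted into the customer's tuple space."""
    forwarded = []
    for tup in broker_tuples:
        # The customer signals the end of the hotel booking activity.
        if tup == ("hotels_done", sid):
            break
        # Only consider hotel information belonging to this session.
        if len(tup) == 5 and tup[0] == "hotel" and tup[1] == sid:
            _, _, name, group, price = tup
            reduced_rate = (group == loyalty_group)   # loyalty card applies
            forwarded.append((sid, name, price, reduced_rate))
    return forwarded


broker_space = [
    ("hotel", 7, "Hotel Arno", "groupA", 120),
    ("hotel", 8, "Hotel Duomo", "groupA", 90),    # different session, ignored
    ("hotel", 7, "Hotel Fiesole", "groupB", 100),
    ("hotels_done", 7),
    ("hotel", 7, "Hotel Late", "groupA", 80),     # arrives after termination
]
offers = manage_hotel_prefs(broker_space, 7, "groupA")
```

In this sketch the loyalty group never leaves the broker: only the resulting flag is sent to the customer's node, in line with the confinement of customer preferences discussed in the text.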


The Klaim primitive constructs for code mobility are instrumental to support the workflow of the travel booking scenario. We have seen that the spider processes exploit dynamic linking mobility. This has the additional benefit that the preferences associated with the specific customer are confined to the location of the broker. This is a simple way to obtain a suitable form of data privacy. More sophisticated forms of security could be obtained through the use of Klaim types for access control or hierarchical Klaim nets. We refer to DFPV00 and to H-KLAIM for details.

2.5 Programming environment

X-Klaim (eXtended Klaim, BDFP98; available online at https://github.com/LorenzoBettini/xklaim) is an experimental programming language that extends Klaim with a high-level syntax for processes. It provides variable declarations, enriched operations, assignments, conditionals, and sequential and iterative process composition. The implementation of X-Klaim is based on Klava (Klaim in Java, BDP02; available online at http://music.dsi.unifi.it), a Java package that provides the run-time system for X-Klaim operations, and on a compiler, which translates X-Klaim programs into Java programs that use Klava. A renewed and enhanced version of X-Klaim is proposed in NEW_XKLAIM. The new implementation comes with Eclipse-based IDE tooling, and relies on recent powerful frameworks for the development of programming languages, in particular the Xtext framework XtextBook.

X-Klaim can be used to write the higher layer of distributed applications while Klava can be seen both as a middleware for X-Klaim programs and as a Java framework for programming according to the Klaim paradigm. By using Klava directly, the programmer is able to implement a finer grained type of mobility.

Fig. 2 lists a significant fragment of the X-Klaim implementation of the Klaim specification of the travel booking scenario presented in Section 2.4 (the X-Klaim source code for the complete scenario can be downloaded from https://bitbucket.org/tiezzi/jlamp_survey_code/src/master/Klaim/). The X-Klaim code shows how close a Klaim specification is to its X-Klaim implementation. Indeed, the syntax of the communication primitives is the same, except for formal fields, which in X-Klaim are specified as (typed) variable declarations. Notably, concurrent subprocesses of the process, composed by means of the parallel composition operator in Klaim, are activated in X-Klaim using the eval action with target self. Finally, in the definition of the network, physical localities are expressed in the standard TCP syntax host:port.

proc HandlerProc() {
    while (true) {
        in(var String usr, var String date, var String origin, var String dest,
            val PhysicalLocality res)@self
        in(usr,var Integer sid)@self
        eval({eval(new SpiderHotelProc(sid,date))@self
                 read(usr,"h",var String hpref)@self
                 eval(new ManageHotelPrefProc(usr,sid,res,hpref))@self})@self
        eval({eval(new SpiderFlightProc(sid,date,origin,dest))@self
                 read(usr,"f",var String fpref)@self
                 eval(new ManageFlightPrefProc(usr,sid,res,fpref))@self})@self
    }
}
net TravelBookingNet physical "localhost:9999" {
    node Customer logical "customer" {
        eval(new CustomerProc)@self
    }
    node Broker logical "broker"{
        …initialize broker’s tuple space…
        eval(new HandlerProc)@self
        eval(new SessionManagerProc)@self
    }
    node Hotel1 logical "hotel1"{
        eval(new HotelManagerProc("hotel1","hotelGroupA"))@self
    }
    . . .
    node Flight1 logical "flight1"{
        eval(new FlightManagerProc("flight1","flightGroupA"))@self
    }
    . . .
}
Figure 2: The process and the network of the travel booking scenario implemented in X-Klaim.
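
To make the role of formal fields in the communication primitives more concrete, the following Python sketch mimics out, read and in on a single local tuple space. It is only an illustration under simplifying assumptions: the names TupleSpace, Formal and in_ are hypothetical, the operations return None instead of suspending when no tuple matches, and nothing here corresponds to the actual X-Klaim or Klava API.

```python
class Formal:
    """Formal field: matches any value of the given type (hypothetical helper)."""

    def __init__(self, typ):
        self.typ = typ


class TupleSpace:
    """Toy local tuple space; an illustration, not the Klava run-time."""

    def __init__(self):
        self._tuples = []

    def out(self, *fields):
        # Add a tuple to the space.
        self._tuples.append(tuple(fields))

    def _match(self, template, tup):
        # Same length; actual fields must be equal, formal fields must
        # agree with the type of the candidate value.
        if len(template) != len(tup):
            return False
        for t, v in zip(template, tup):
            if isinstance(t, Formal):
                if not isinstance(v, t.typ):
                    return False
            elif t != v:
                return False
        return True

    def read(self, *template):
        # Non-destructive lookup; returns None when nothing matches
        # (a real implementation would block instead).
        for tup in self._tuples:
            if self._match(template, tup):
                return tup
        return None

    def in_(self, *template):
        # Destructive lookup: the matching tuple is withdrawn.
        tup = self.read(*template)
        if tup is not None:
            self._tuples.remove(tup)
        return tup


broker = TupleSpace()
broker.out("alice", "h", "hotelGroupA")   # customer preference
broker.out("alice", 42)                   # session identifier

_, sid = broker.in_("alice", Formal(int))
_, _, hpref = broker.read("alice", "h", Formal(str))
```

In this sketch the session tuple is consumed by in_, whereas the preference tuple stays in the space after read, which mirrors the different uses of the two primitives in the handler process.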

2.6 Verification techniques

Many verification techniques have been defined for Klaim and variants thereof. Here we only mention a few of them. In DL02:MLMA a temporal logic is proposed for specifying and verifying dynamic properties of mobile processes specified in Klaim. The inspiration for the proposal was the Hennessy-Milner Logic, but it needed significant adaptations due to the richer operating context of components. The resulting logic provides tools for establishing not only deadlock freedom, liveness and correctness with respect to given specifications (which are crucial properties for process calculi and similar formalisms), but also properties that relate to resource allocation, resource access and information disclosure (which are important issues for processes involving different actors and authorities).

An important topic deeply investigated for Klaim is the use of type systems for security DFPV00 ; DGP06 ; GP09 , devoted to controlling access to tuple spaces and the mobility of processes. In these type systems, traditional types are generalised to behavioural types. These are abstractions of process behaviours that provide information about processes' capabilities, namely the operations that processes can execute at a specific locality (downloading/consuming a tuple, producing a tuple, activating a process, and creating a new node). When using behavioural types, each Klaim node is equipped with a security policy, determined by a net coordinator, that specifies the execution privileges; the policy of a node describes the actions that processes located there can execute. By exploiting static and dynamic checks, type checking guarantees that only processes whose intentions match the rights granted to them by coordinators are allowed to proceed. An expressive language extension, called MetaKlaim, equipped with a powerful type system is described in metaklaim . MetaKlaim is a higher-order distributed process calculus equipped with staging mechanisms. It integrates MetaML (an extension of SML for multi-stage programming) and Klaim, to permit interleaving of meta-programming activities (such as assembly and linking of code fragments), dynamic checking of security policies at administrative boundaries, and traditional computational activities on a wide area network (such as remote communication and code mobility). MetaKlaim exploits a powerful type system (including polymorphic types à la System F) to deal with highly parameterised mobile components and to dynamically enforce security policies: types are metadata that are extracted from code at run-time and are used to express trustworthiness guarantees. The dynamic type checking ensures that the trustworthiness guarantees of wide area network applications are maintained also when computations interoperate with potentially untrusted components.

An alternative approach to controlling access to tuple spaces and the mobility of processes is introduced in DGHNNPP10 . It is based on Flow Logic and permits statically checking the absence of violations. Starting from an existing type system for Klaim with some dynamic checks, the insights from the Flow Logic approach are exploited to construct a type system that statically guarantees secure access to tuple spaces and safe process migration for a smooth extension of Klaim. This is the first completely static type system for controlling access devised for a tuple space-based coordination language. A static control flow analysis that extends the one proposed in LMCS17 ; icissp2019 to manage network awareness and coordination via multiple tuple spaces has been introduced in BDFG19 . The static analysis can be used to detect where and how tuples are manipulated and how messages flow among the nodes of a Klaim network. This makes it possible to identify potential security breaches in the data workflow of a distributed application. For instance, it can separate the safe paths that data inside a tuple can traverse from those that pass through a possibly untrusted node.
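
The idea of separating safe paths from those crossing an untrusted node can be illustrated with a small Python sketch. This is not the control flow analysis of BDFG19: it assumes that a graph of possible data flows between nodes is already given explicitly, and the node names and the functions simple_paths and split_paths are hypothetical.

```python
def simple_paths(graph, src, dst, path=None):
    """Yield every cycle-free path from src to dst in a directed graph."""
    path = (path or []) + [src]
    if src == dst:
        yield path
        return
    for nxt in graph.get(src, []):
        if nxt not in path:          # avoid revisiting a node
            yield from simple_paths(graph, nxt, dst, path)


def split_paths(graph, src, dst, untrusted):
    """Partition the paths from src to dst by whether they visit an untrusted node."""
    safe, suspicious = [], []
    for p in simple_paths(graph, src, dst):
        (suspicious if untrusted & set(p) else safe).append(p)
    return safe, suspicious


# Hypothetical flow graph: an edge means "data may move from one node to the other".
flows = {
    "customer": ["broker"],
    "broker": ["hotel1", "mirror"],
    "hotel1": ["results"],
    "mirror": ["results"],
    "results": ["customer_inbox"],
}

safe, suspicious = split_paths(flows, "customer", "customer_inbox", {"mirror"})
```

Here safe contains the path through hotel1, while suspicious contains the one through the node assumed to be untrusted.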

We now outline how the static methodology presented in BDFG19 can be applied to investigate the security of Klaim code. We illustrate this by resorting to the Klaim specification of the travel booking scenario. The static methodology enables us to construct an abstract graph-based model of the behaviour of the Klaim specification of the scenario. This abstract model supports a reasoning technique that makes it possible to detect (i) the paths in the network through which (a value in) a tuple of a specific node reaches another one, and (ii) the transformations that are applied to a selected value along those paths.

In the travel booking scenario, the abstract model approximates the trajectories of each piece of data. For instance, the abstract trajectory below

expresses the path of the value associated to the customer’s request. This trajectory, made of pairs of the form separated by the symbol ‘:’, encodes the data transformations generated by each of the involved components in processing the customer’s request together with the sequence of locations traversed due to the computation steps.

The abstract path above describes the capacity of the Klaim code to correctly manage the customer’s request. Instead, the following abstract path detects a suspicious trajectory, namely a trajectory that bypasses the phase where the results of the spider processes are collected together. More generally, by analysing the abstract paths derived from the model it is possible to identify crucial code structures. We refer to BDFG19 ; BDFG20 for more details.

2.7 Related work

Especially at the beginning of this century, with the manifest pervasiveness of the Internet, many researchers have considered both models and implementations of network-aware formalisms that have influenced, or have been influenced by, the work on Klaim and other Linda-based models and primitives. In BDM18 , many implementations of Linda-based models for coordinating the interactions among system components, including Klaim-based ones, are described and their efficiency is assessed. Instead, Ciatto0LOZ18 is a recent survey of coordination techniques for distributed and mobile systems, including those based on Linda and those relying on different coordination models. For references on network-aware programming and its relation with Klaim we refer the interested reader to Klaim98 ; Global04 .

Among the foundational calculi aiming at capturing the key notions of network-awareness and identifying the programming abstractions most suitable for network-aware programming, we would like to mention three different ones, namely the Distributed π-calculus (Dπ) DHR02 , the Distributed Join Calculus (DJoin) FGLMR96 ; FG02 , and the Ambient Calculus (Amb) CG00 , which were essentially proposed at the same time as Klaim.

Dπ is a variant of the π-calculus enriched with explicit locations that are used to distribute processes. Interprocess communication is binary, channel-based, synchronous and local, in the sense that only processes at the same location can exchange messages. A process willing to communicate with a remote one has first to migrate to its location.

In DJoin, a location is structured as a tree composed of the root location and its sub-locations. When a process defined at a specific location moves to a different location, the whole tree moves along with the process. Again, process communication is channel-based and there is a unique process that can receive on each channel. To synchronise, processes rely on so-called join patterns that may require pattern matching on data and simultaneous reception of messages on different channels.

Finally, in the Ambient calculus, the key notion is that of ambient that can be thought of as a bounded environment where processes cooperate. An ambient is characterised by a name, a collection of local agents and a collection of sub-ambients, and can be referred only through explicit naming. An agent moves together with the ambient containing it. Communication is local to ambients and takes place through anonymous message exchange, without resorting to channels or pattern matching.

3 Cows: Calculus for Orchestration of Web Services

Since the early 2000s, the increasing success of e-business, e-learning, e-government, and other similar systems has led the World Wide Web, initially thought of as a system for human use, to evolve towards an architecture for Service-Oriented Computing (SOC) supporting automated use. The SOC paradigm, which finds its origin in object-oriented and component-based software development, aims at enabling developers to build networks of distributed, interoperable and collaborative applications, regardless of the platform where the applications run and of the programming language used to develop them. The paradigm is based on the use of independent computational units, called services. They are loosely coupled reusable components that are built with little or no knowledge about clients and about other services involved in their operating environment.

One successful instantiation of the general SOC paradigm is given by the Web Service technology WS_W3C , which exploits the pervasiveness of the Internet and related standards. Traditional software engineering technologies, however, do not neatly fit with SOC, thus hindering its full realisation in practice. The challenges come from the necessity of dealing at once with such issues as asynchronous interactions, concurrent activities, workflow coordination, business transactions, resource usage, and security, in a setting where demands and guarantees can be very different for the many involved components.

Cows (Calculus for Orchestration of Web Services, LPT07:ESOP ; COWS_JAL ) is a formalism whose design has been influenced by the OASIS standard WS-BPEL WSBPEL for orchestration of web services. In Cows, services are computational entities capable of generating multiple instances to concurrently handle different client requests. Inter-service communication occurs through communication endpoints and relies on pattern-matching for logically correlating messages to form an interaction session by means of their identical contents. Differently from most process calculi, and from Klaim, receive activities in Cows bind neither names nor variables, and this is crucial for allowing concurrent service instances to share (part of) the state. The calculus also supports service fault and termination handling by providing activities to force termination of labelled service instances and to protect service activities from a forced termination.

3.1 Syntax

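  Services:
    $s ::= u \bullet u'!\bar{\epsilon}$   (invoke)
      $\mid\ \mathbf{kill}(k)$   (kill)
      $\mid\ g$   (receive-guarded choice)
      $\mid\ s \mid s$   (parallel composition)
      $\mid\ \{|s|\}$   (protection)
      $\mid\ [e]\,s$   (delimitation)
      $\mid\ *\,s$   (replication)
  Receive-guarded choice:
    $g ::= \mathbf{0}$   (nil)
      $\mid\ p \bullet o?\bar{w}.s$   (request processing)
      $\mid\ g + g$   (choice)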

Table 3: Cows syntax

The syntax of Cows is presented in Table 3. We use three countable disjoint sets: the set of values (ranged over by $v$), the set of ‘write once’ variables (ranged over by $x$), and the set of killer labels (ranged over by $k$). The set of values is left unspecified; however, we assume that it includes the set of partner and operation names (ranged over by $n$, $p$, $o$) mainly used to represent communication endpoints. We also use a set of expressions (ranged over by $\epsilon$), whose exact syntax is deliberately omitted; we just assume that expressions contain values and variables, and do not contain killer labels. As a matter of notation, $w$ ranges over values and variables, $u$ ranges over names and variables, and $e$ ranges over elements, i.e. killer labels, names and variables. Notation $\bar{\cdot}$ stands for tuples, e.g. $\bar{x}$ means $\langle x_1, \ldots, x_n \rangle$ (with $n \geq 0$), where variables in the same tuple are all distinct.

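Services are structured activities built from basic activities, i.e. the empty activity $\mathbf{0}$, the invoke activity $\_ \bullet \_ ! \_$, the receive activity $\_ \bullet \_ ? \_$, and the kill activity $\mathbf{kill}(\_)$, by means of prefixing $\_ . \_$, choice $\_ + \_$, parallel composition $\_ \mid \_$, protection $\{|\_|\}$, delimitation $[\_]\,\_$ and replication $*\,\_$. We write $Z \triangleq W$ to assign a name $Z$ to the term $W$.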

3.2 Informal semantics

Invoke and receive are the communication activities. The former permits invoking an operation (i.e., a functionality like a method in object-oriented programming) offered by a service, while the latter permits waiting for an invocation to arrive. Besides output and input parameters, both activities indicate an endpoint through which communication should occur.

An endpoint $p \bullet o$ can be interpreted as a specific implementation of operation $o$ provided by the service identified by the logic name $p$. The names composing an endpoint can be dealt with separately, as in an asynchronous request-response interaction, where usually the service provider statically knows the name of the operation for sending the response, but not the partner name of the requesting service it has to reply to. Partner and operation names can be exchanged in communication, thus enabling many different interaction patterns among service instances. However, dynamically received names cannot form the endpoints used to receive further invocations (as in the localised π-calculus LOCALISEDPI ). In other words, endpoints of receive activities are identified statically because the syntax only allows using names and not variables for them. This design choice reflects the current (web) service technologies that require endpoints of receive activities to be statically determined.

An invoke can proceed as soon as all expression arguments are successfully evaluated. A receive $p \bullet o?\bar{w}.s$ offers an invocable operation $o$ along a given partner name $p$; thereafter the service continues as $s$. An inter-service communication between these two activities takes place when the tuple of values $\bar{v}$, resulting from the evaluation of the invoke argument, matches the template $\bar{w}$ argument of the receive. This causes a substitution of the variables in the receive template (within the scope of the variable declarations) with the corresponding values produced by the invoke.
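
The matching step just described can be sketched in Python as follows. This is a minimal illustration, not the formal definition used in the Cows semantics: the names Var and match are hypothetical, templates and messages are plain Python tuples, and the variables within a template are assumed to be distinct, as stated for tuples in Section 3.1.

```python
class Var:
    """A 'write once' variable occurring in a receive template (hypothetical helper)."""

    def __init__(self, name):
        self.name = name


def match(template, values):
    """Return the substitution induced by matching, or None if matching fails.

    A variable matches any value; a value matches only an equal value.
    Variables in the same template are assumed to be pairwise distinct.
    """
    if len(template) != len(values):
        return None
    subst = {}
    for t, v in zip(template, values):
        if isinstance(t, Var):
            subst[t.name] = v
        elif t != v:
            return None
    return subst


template = (Var("x_cust"), "2024-05-01", Var("x_dest"))
ok = match(template, ("alice", "2024-05-01", "Rome"))
ko = match(template, ("alice", "2024-06-15", "Rome"))
```

The first call yields a substitution for x_cust and x_dest, whereas the second fails because the date value in the template differs from the one in the message.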

Communication is asynchronous, as in Klaim. This results from the syntactic constraints that invoke activities cannot be used as prefixes and choice can only be guarded by receive activities (as in the asynchronous π-calculus ACS98 ). Indeed, in service-oriented systems, communication is usually asynchronous, in the sense that (i) there may be an arbitrary delay between the sending and the receiving of a message, (ii) the order in which messages are received may differ from that in which they were sent, and (iii) a sender cannot determine if and when a sent message will be received.

The empty activity does nothing, while choice permits selecting for execution one between two alternative receives.

Execution of parallel services is interleaved. However, if several matching receives are ready to process a given invoke, only one of the receives that generate a substitution of smallest size (in terms of number of variable-value replacements) is allowed to progress (namely, execution of this receive takes precedence over that of the others). This mechanism makes it possible to model the precedence of a service instance over the corresponding service specification when both of them can process the same request, and enables a sort of blind-date conversation joining strategy SOCA .
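
The precedence mechanism can be illustrated with the following Python sketch, which selects, among the receives matching a message, those producing a substitution of smallest size. It is a simplified reading of the informal description above: Var, match and enabled_receives are hypothetical names, and receives are identified by arbitrary labels.

```python
class Var:
    """Variable in a receive template (hypothetical helper)."""

    def __init__(self, name):
        self.name = name


def match(template, values):
    """Substitution induced by matching template against values, or None."""
    if len(template) != len(values):
        return None
    subst = {}
    for t, v in zip(template, values):
        if isinstance(t, Var):
            subst[t.name] = v
        elif t != v:
            return None
    return subst


def enabled_receives(receives, values):
    """Labels of the matching receives whose substitution has minimal size."""
    matches = {}
    for label, template in receives.items():
        subst = match(template, values)
        if subst is not None:
            matches[label] = subst
    if not matches:
        return []
    smallest = min(len(s) for s in matches.values())
    return sorted(l for l, s in matches.items() if len(s) == smallest)


receives = {
    # Service definition: every field is still a variable.
    "definition": (Var("x_cust"), Var("x_data")),
    # Existing instance: the customer field has already been instantiated.
    "instance": ("alice", Var("x_data")),
}

for_alice = enabled_receives(receives, ("alice", "flight-123"))
for_bob = enabled_receives(receives, ("bob", "flight-9"))
```

A message for alice is taken by the existing instance (one replacement instead of two), while a message for bob can only be handled by the definition, which would then create a new instance.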

Delimitation is the only binding construct: $[e]\,s$ binds the element $e$ in the scope $s$. According to its first argument, delimitation is used for three different purposes: (i) to regulate the range of application of substitutions produced by communication, when the delimited element is a variable; (ii) to generate fresh names, when the delimited element is a name; (iii) to confine the effect of a kill activity, when the delimited element is a killer label. The scope of names can be dynamically extended, in order to model the communication of private names, as done with the restriction operator in the π-calculus PICALC . Instead, the scope of killer labels cannot be dynamically extended, because the activities whose termination would be forced by the execution of a kill need to be statically determined.

The kill activity forces immediate termination of all the concurrent activities not enclosed within the protection operator. To faithfully model fault and termination handling of SOC applications, kill activities are executed eagerly with respect to the communication activities enclosed within the delimitation of the corresponding killer label.

Finally, the replication construct $*\,s$ makes it possible to spawn in parallel as many copies of $s$ as necessary. This, for example, is exploited to implement recursive behaviours and to model business process definitions, which can create multiple instances to serve several requests simultaneously.

3.3 A taste of the operational semantics

Table 4: Cows operational semantics (selected rules)

The operational semantics of Cows is defined only for closed services, i.e. services without free variables and killer labels. As usual, the semantics is formally given in terms of a structural congruence and of a labelled transition relation. The former identifies syntactically different services that intuitively represent the same service. Its definition is standard, except for the scope extension laws, which allow extending the scope of names (as in the π-calculus) and variables, thus enabling possible communication, but prevent extending the scope of killer labels.

We report in Table 4 an excerpt of the operational rules defining the labelled transition relation. We comment on the rules below.

A service invocation can proceed only if the expressions in the argument can be evaluated (rule (inv)). To this aim, we use the evaluation function that takes a closed expression and returns the corresponding value. This function is not explicitly defined, since the exact syntax of expressions is deliberately not specified. A receive activity offers an invocable operation along a given partner name (rule (rec)). Communication can take place when two parallel services perform matching receive and invoke activities (rule (com)). We use here the partial function for performing pattern-matching on semi-structured data (à la Klaim). Pattern-matching permits to determine if a receive and an invoke over the same endpoint can synchronise. When tuples and do match, returns a substitution for the variables in ; otherwise, it is undefined. Substitutions are functions mapping variables to values and are written as collections of pairs of the form . Application of substitution to , written , has the effect of replacing every free occurrence of in with , for each . The label of a communication transition indicates the generated substitution (for subsequent application), rather than a silent action as in most process calculi. When the delimitation of a variable argument of a receive involved in a communication is encountered, i.e. the whole scope of the variable is determined, the delimitation is removed and the substitution for is applied to the term (rule (del)). Variable disappears from the term and cannot be reassigned a value (for this reason Cows’s variables are deemed ‘write once’). We use to denote the union of substitutions and when they have disjoint domains.

Execution of parallel services is interleaved but, if several matching receives are ready to process a given invoke, only one of the receives that generate a substitution of smallest size (in terms of number of variable-value replacements) is allowed to progress (namely, execution of this receive takes precedence over that of the others). This mechanism makes it possible to model the precedence of a service instance over the corresponding service specification when both of them can process the same request (we refer to COWS_JAL , Sec. 3.2, for a complete account of this feature), and enables a sort of blind-date conversation joining strategy SOCA . For the sake of presentation, we have omitted this precedence mechanism here, thus presenting a simplified version of the operational rules concerning the parallel composition operator.

Activity forces termination of all unprotected parallel activities (rules (kill) and (par)) inside the innermost enclosing . Termination of a service is achieved by means of function , which returns the service obtained by only retaining the protected activities inside . The delimitation stops the killing effect by turning the transition label into (rule (del)). Such delimitation, whose existence is ensured by the assumption that the semantics is only defined for closed services, prevents a single service from being able to stop all the other parallel services, which would be unreasonable in a service-oriented setting (as services are loosely coupled and organised in different administrative domains). Critical activities can be protected from killing by putting them into a protection ; this way, behaves like (rule (prot)). Similarly, behaves like (rule (del)), except when the transition label contains , in which case must correspond either to a communication assigning a value to (rule (del)) or to a kill activity for (rule (del)), or when a free kill activity for is active in , in which case only actions corresponding to kill activities can be executed. Predicate is used to check the absence of a free kill activity: it holds true if either is not a killer label, or and cannot immediately perform a free kill activity . In this way, kill activities are executed eagerly with respect to the activities enclosed within the delimitation of the corresponding killer label.
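
The function that retains only the protected activities of a service can be sketched in Python over a small term representation. This is an illustration based on the informal description above, not the formal definition: the tuple encoding of terms and the name retain_protected are assumptions made for this sketch only.

```python
# Hypothetical encoding of Cows terms as nested tuples:
#   ("nil",)                     empty activity
#   ("invoke", p, o, args)       invoke activity
#   ("receive", p, o, tmpl, s)   receive followed by continuation s
#   ("kill", k)                  kill activity
#   ("par", s1, s2)              parallel composition
#   ("protect", s)               protection
#   ("delim", e, s)              delimitation
#   ("repl", s)                  replication
NIL = ("nil",)


def retain_protected(s):
    """Return the term obtained from s by keeping only protected activities."""
    tag = s[0]
    if tag == "protect":
        return s                                   # protected: kept as is
    if tag == "par":
        return ("par", retain_protected(s[1]), retain_protected(s[2]))
    if tag == "delim":
        return ("delim", s[1], retain_protected(s[2]))
    if tag == "repl":
        return ("repl", retain_protected(s[1]))
    return NIL                                     # unprotected activity: dropped


handler = ("protect", ("invoke", "cust", "fail", ()))
service = ("par", ("invoke", "flights", "search", ()), handler)
after_kill = retain_protected(service)
```

In the example, the unprotected invoke is replaced by the empty activity, whereas the protected fault handler survives unchanged.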

3.4 A travel booking scenario

We provide here, in an incremental way, the Cows specification of our travel brokering scenario.

At a high level of abstraction, the travel broker service is rendered in Cows as:

The replication operator is used here to specify that the service is persistent, i.e. capable of creating multiple instances to serve several requests simultaneously. The delimitation operator specifies the scope of the variables arguments of the subsequent receive activity on operation , used to receive a request message from a customer. Besides dates and destination of the travel, this message contains the partner name that the customer will use to receive the response, which will be sent by the service by means of the invoke activity on operation . Booking of hotel and flight is here abstracted by the (unspecified) expression .

A customer of the broker service is specified as follows:

The customer behaviour mirrors that of the broker: it starts with an invoke and then waits for a response message containing the travel data.

The overall specification of the scenario is simply the parallel composition of the two components: . Whenever prompted by a client request, the broker service creates an instance to serve that specific request, and is immediately ready to concurrently serve other possible requests. Therefore, the resulting Cows term after such a computational step is the following:

The created service instance (highlighted by a grey background) is represented as a service running in parallel with the other terms. Notably, the variables of the invoke activity are instantiated (i.e., replaced) by the corresponding values exchanged in the communication. This invoke activity can now synchronise with the receive activity of the customer, whose execution will then continue as with replaced by the value resulting from the evaluation of the expression.

Let us now consider a more refined specification, where the role of the expression is played by the interactions with services for flights and hotels searching.

Figure 3: Travel Booking Scenario in Cows: Sequence Diagram.

The interactions between a customer, the (refined) broker and the searching services are described by the UML sequence diagram in Fig. 3. The figure highlights that the broker service interacts in parallel with the flights and hotels searching services, and that it replies to the customer after both parallel interactions complete. The refined specification of the broker is the following:

After the reception of a customer request, the service contacts in parallel the two searching services (by invoking the operation ). When the responses from both services are available, the broker service combines them and replies to the customer. To this aim, a private endpoint is exploited: the reception of a message from a searching service triggers a signal (i.e., an internal message) along the private endpoint, and two such signals are necessary to trigger the invoke activity for replying to the customer. Suitable expression functions could be used in this last invoke activity for filtering the results produced by the searching services. Notice that the scope of variable (resp. ) includes not only the continuation (resp. ) of the service performing the receive, but also the activity for sending the response to the customer. This differs from most process calculi and makes it easy to express variables shared among parallel activities within the same service instance, which is a feature typically supported in SOC.
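
The fork-and-join pattern followed by the refined broker can be approximated in Python with two concurrent requests whose results are combined once both are available. This sketch does not reproduce the Cows term: futures play the role of the signals on the private endpoint, and the functions search_flights, search_hotels and broker are hypothetical stand-ins for the searching services and the broker.

```python
from concurrent.futures import ThreadPoolExecutor


def search_flights(dates, dest):
    # Stand-in for the flight searching service.
    return f"flight to {dest} ({dates})"


def search_hotels(dates, dest):
    # Stand-in for the hotel searching service.
    return f"hotel in {dest} ({dates})"


def broker(dates, dest):
    """Contact both searching services in parallel and reply when both answer."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        flight = pool.submit(search_flights, dates, dest)
        hotel = pool.submit(search_hotels, dates, dest)
        # result() blocks until the corresponding reply is available, so the
        # combined response is built only after both interactions complete.
        return {"flight": flight.result(), "hotel": hotel.result()}


reply = broker("1-7 May", "Rome")
```

As in the specification, the two results remain visible to the code that builds the final response, which corresponds to the sharing of variables among parallel activities of the same instance.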

The behaviour of the above service is of particular interest when it is included in a scenario with multiple customers (the specifications of customers and searching services are omitted, we just assume that they follow the communication protocol established by the broker specification):

After a certain number of computational steps have taken place, we can obtain a system configuration where one instance of the broker service has been created for each customer, and both instances have sent their requests to the searching services and are waiting for replies. Now, to send the values resulting from the processing of the request of the first customer, the flight searching service has to perform an invoke activity of the form . However, the broker service has two instances waiting for such a message along the endpoint . In order to deliver the message to the proper instance, i.e. the one serving the request of the first customer, the message correlation mechanism is used. In fact, in SOC, it is up to each single message to provide a form of context that enables services to associate the message with the appropriate instance. This is achieved by embedding values, called correlation data, in the message itself. Pattern-matching is the mechanism used by the Cows semantics for locating correlation data. In our example, these data are the customer’s partner name, the travel dates and the destination, which have instantiated the corresponding variables in the receive activity within the broker instance serving . While the receive of the instance serving the first customer is enabled, the one within the other broker instance is not, as it has been instantiated with unmatchable values.
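
Message correlation can be illustrated with a short Python sketch in which each broker instance is characterised by the correlation data it has been instantiated with, and an incoming message is delivered only to the instances whose data agree with it. The dictionary-based encoding, the field names and the function deliver are assumptions made for this illustration.

```python
def deliver(instances, message):
    """Return the identifiers of the instances whose correlation data match the message."""
    return [
        iid
        for iid, correlation in instances.items()
        if all(message.get(key) == value for key, value in correlation.items())
    ]


# Two instances of the broker, one per customer (hypothetical data).
instances = {
    "instance_1": {"cust": "customer1", "dates": "1-7 May", "dest": "Rome"},
    "instance_2": {"cust": "customer2", "dates": "3-9 June", "dest": "Oslo"},
}

# Reply of the flight searching service for the first customer: the
# correlation data travel inside the message together with the result.
message = {"cust": "customer1", "dates": "1-7 May", "dest": "Rome", "flight": "F123"}

receivers = deliver(instances, message)
```

Only the instance serving the first customer is selected; the other one carries correlation values that do not match those embedded in the message.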

Finally, let us provide further details of the broker specification, in order to add fault and compensation handling activities (highlighted by a grey background):

Now, when a positive response from a searching service is received, a compensation handler is installed. This consists of an invoke activity on operation , triggered by a signal, devoted to cancelling the booking. If a negative response on (resp. ) is received, the normal execution of the service is immediately terminated (by means of the activity), the activity compensating the hotel (resp. flight) booking is activated, if installed, and a signal is emitted. This last signal triggers the execution of the fault handler, consisting of an invoke activity for notifying the customer that the requested booking has failed. Notably, fault and compensation activities are enclosed within protection blocks, in order to protect them from the killing effect of the activities.
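
The interplay between compensation and fault handling can be sketched in Python as follows. The sketch is deliberately simplified with respect to the Cows specification: the two responses are processed in a fixed order rather than in parallel, handlers are plain strings appended to a log, and the function book_trip is hypothetical.

```python
def book_trip(flight_ok, hotel_ok):
    """Simulate one broker instance and return the log of performed actions."""
    log = []
    compensations = []  # handlers installed so far, most recent last

    for name, ok in (("flight", flight_ok), ("hotel", hotel_ok)):
        if ok:
            log.append(f"booked {name}")
            # Positive response: install the handler that undoes this booking.
            compensations.append(f"cancel {name}")
        else:
            # Negative response: stop the normal flow, run the installed
            # compensations, then run the fault handler.
            log.extend(reversed(compensations))
            log.append("notify customer: booking failed")
            return log

    log.append("reply to customer")
    return log


success = book_trip(True, True)
hotel_failure = book_trip(True, False)
```

When the hotel booking fails after the flight has been booked, the flight is cancelled before the customer is notified; when both succeed, no compensation runs.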


Most of the distinguishing features of Cows find their full application in the final specification of the travel booking scenario. Let us focus on the service. The replication operator is used to allow the broker service to create multiple instances. In particular, an instance is created for each received customer request. Pattern-matching (on the correlation values replacing variables , , and ) is then used to associate each message from the searching services to the appropriate broker instance. The delimitation operator is used for different purposes: to define the scope of the correlation variables; to make the endpoint private; to share variables and among the parallel terms within the scope of the inner delimitation operator, and to limit the scope of the actions. The protection operator, instead, is used to protect the fault and compensation handlers from the killing effect.

3.5 Programming environment

To effectively program SOC applications, Cows, originally conceived as a process calculus, has been extended with high-level features, such as standard control flow constructs (i.e., sequentialisation, assignment, conditional choice, iteration) and a scope activity explicitly defining fault and compensation handlers. The implementation of the resulting orchestration language, called Blite LapadulaPT12 , is based on a software tool ACR supporting rapid and easy development of SOC applications via the translation of service orchestrations written in Blite into executable WS-BPEL programs. More specifically, a Blite program given as input to this tool also includes a declarative part, containing the variable types and the physical service bindings, necessary for generating the corresponding WSDL document and the process deployment descriptor. These files, together with the one containing the WS-BPEL code, are organised in a package that can be deployed and executed in a WS-BPEL engine.

In Fig. 4 we report the relevant code of the Blite implementation of the Cows specification of the travel booking scenario presented in Section 3.4 (the Blite source code for the complete scenario can be downloaded from https://bitbucket.org/tiezzi/jlamp_survey_code/src/master/COWS/blite_code/). Despite the use of a different notation, the invoke (inv) and receive (rcv) primitives of Blite act similarly to those of Cows. To ease the programming task, Blite also provides high-level features for sequential (seq _ ; …; _ qes) and parallel (flw _ | …| _ wlf) composition. These permit avoiding the interactions along the private endpoint . The last line of the listing shows a deployment definition, which associates the correlation set {x_cust, x_dates, x_dest} to the broker service. The declarative part of this Blite program, specifying the configuration data necessary to produce the corresponding WS-BPEL program, is omitted.