That's Enough: Asynchrony with Standard Choreography Primitives

11/23/2017
by   Luís Cruz-Filipe, et al.
SDU

Choreographies are widely used for the specification of concurrent and distributed software architectures. Since asynchronous communications are ubiquitous in real-world systems, previous works have proposed different approaches for the formal modelling of asynchrony in choreographies. Such approaches typically rely on ad-hoc syntactic terms or semantics for capturing the concept of messages in transit, yielding different formalisms that have to be studied separately. In this work, we take a different approach, and show that such extensions are not needed to reason about asynchronous communications in choreographies. Rather, we demonstrate how a standard choreography calculus already has all the needed expressive power to encode messages in transit (and thus asynchronous communications) through the primitives of process spawning and name mobility. The practical consequence of our results is that we can reason about real-world systems within a choreography formalism that is simpler than those hitherto proposed.


1 Introduction

Today, concurrent and distributed systems are widespread. Multi-core hardware and large-scale networks represent the norm rather than the exception. However, programming such systems is challenging, because it is difficult to correctly program the intended interactions among components executed concurrently (e.g., services). Empirical investigations of bugs in concurrent and distributed software [LLLG16, LPSZ08] reveal that most errors are due to: deadlocks (e.g., a component that was supposed to be ready for interaction at a given time is actually not); violations of atomicity intentions (e.g., a component is performing some action when not intended to); or violations of ordering intentions (some components perform the right actions, but not when intended). If the design and implementation of a concurrent system are initially difficult, they get even harder as the system evolves and has to be maintained. Without proper tool support, introducing new actions at components may have unexpected side effects.

To mitigate this problem, choreographies can be used as high-level formal specifications of the intended interactions among components [BB11, BPMN, BGGLZ06, CHY12, HYC16, LGMZ08, QZCY07, WSCDL].

Example 1.

We use a choreography to define a scenario where a buyer, Alice, purchases a product from a seller through her bank.

In Line 1, the buyer communicates the title of the book that Alice wishes to buy to the seller. The seller then sends the price of the book to both the buyer and the bank. In Line 4, the buyer sends the price she expects to pay to the bank, which confirms that it is the same amount requested by the seller (stored internally at the bank). If so, the bank notifies both the buyer and the seller of the successful transaction (Line 5) and the seller sends the book to the buyer (Line 6). Otherwise, the bank notifies the buyer and the seller of the failure (Line 7) and the choreography terminates.

Choreographies are the foundations of an emerging development paradigm, called Choreographic Programming [M13:phd, M15], where an automatic projection procedure is used to synthesise a set of compliant local implementations (the implementations of the individual components) from a choreography [CHY12, LGMZ08, QZCY07]. This procedure is formally proven to be correct, preventing deadlocks, ordering errors, and atomicity violations. This ensures, critically, that updates to either the choreography or the local implementations do not introduce bugs and that developers always know what communications their systems will enact (by looking at the choreography). In the previous example, the implementation inferred for, e.g., Alice would be: send the book title to the seller; receive the price from the seller; send the price to the bank for confirmation; await the success/failure notification from the bank; in case of success, receive the book from the seller.
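As an informal illustration of projection (our sketch, not the paper's formal procedure), a choreography given as a flat sequence of interactions can be projected onto one role's local send/receive actions. All role names below (`alice`, `seller`, `bank`) are hypothetical stand-ins for the process names of Example 1:

```python
# Sketch only: a choreography as a list of (sender, receiver, message)
# interactions; projection keeps each role's own sends and receives.
def project(choreography, role):
    """Return the local program of `role` as a list of send/receive actions."""
    actions = []
    for sender, receiver, msg in choreography:
        if sender == role:
            actions.append(("send", receiver, msg))
        elif receiver == role:
            actions.append(("recv", sender, msg))
        # interactions not involving `role` are skipped
    return actions

# The success branch of Example 1, with hypothetical role names.
chor = [
    ("alice", "seller", "title"),   # Line 1: Alice sends the book title
    ("seller", "alice", "price"),   # Line 2: seller quotes the price to Alice
    ("seller", "bank", "price"),    # Line 3: ... and to the bank
    ("alice", "bank", "price"),     # Line 4: Alice asks the bank to pay
    ("bank", "alice", "ok"),        # Line 5: bank notifies success
    ("bank", "seller", "ok"),       # Line 5: ... to both parties
    ("seller", "alice", "book"),    # Line 6: seller ships the book
]

alice = project(chor, "alice")
```

Tracing `alice` yields exactly the local behaviour listed above: send the title, receive the price, send the price, await the notification, receive the book.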

Choreography languages come in all sizes and flavours, with different sets of primitives inspired by practical applications, such as adaptation [PGGLM15, PGLMG14], channel mobility [CM13, chor:website], or web services [BPMN, CHY12, WSCDL]. However, this multiplicity makes it increasingly difficult to reuse available theory and tools, because of the differences and redundancies among these models. For this reason, we previously introduced the model of Core Choreographies (CC) [ourstuff], a minimal and representative theoretical model of Choreographic Programming. In CC, components are modelled as concurrent processes that run independently and possess their own memory, inspired by process calculi [SW01]. Example 1 is written in the syntax of CC described in § 2.

In this paper, we are interested in studying asynchronous communications in choreographies. As a motivation, consider the two communications in Lines 2 and 3 of Example 1: in a realistic system, we would expect the seller to send the price to the buyer and then immediately proceed to sending it also to the bank, without waiting for the buyer to receive its message. Typically, asynchronous communications are formalised in choreography models by defining ad-hoc extensions to their syntax and semantics [CM13, DY13, HYC16, LGMZ08, MY13, MYH09], causing a substantial amount of duplication in their technical developments (many of which are even incompatible with each other).

Unfortunately, there are still no foundational studies that provide an elegant and general understanding of asynchrony in choreographies. Here, we pursue such a study in the context of CC. We depict our overall development in Figure 1, and describe it in the following.

Figure 1: Choreography calculi and encodings.

We first present our development for the computational fragment of CC, called Minimal Choreographies (MC) [ourstuff]. We take inspiration from how asynchrony is modelled in foundational process models, specifically the π-calculus [MPW92]. The key idea there is to use processes to represent messages in transit, allowing the sender to proceed immediately after having sent a message, without having to synchronise with the receiver [SW01]. In an asynchronous system, there is no bound to the number of messages that could be transiting in the network; this means that MC is not powerful enough for our purposes, because it can only capture a finite number of processes (the same holds for CC). For this reason, we extend MC with two standard notions, borrowed from process calculi and previous choreography models: process spawning – the ability to create new processes at runtime – and name mobility – the ability to send process references, or names. We call this new language Dynamic Minimal Choreographies (DMC). MC is a strict sub-language of DMC, denoted by the arrow on the left-hand side of Figure 1. In general, all such arrows in that figure denote (strict) language inclusion.

The dotted arrow () in Figure 1 is the cornerstone of our development: every choreography in MC can be encoded in an asynchronous implementation in DMC, by using auxiliary processes to represent messages in transit. Since DMC extends MC with new primitives, it makes sense to extend this encoding to the whole language of DMC (). This syntactic interpretation of asynchrony in choreographies is our main contribution. Specifically, our results show that asynchronous communications can be modelled in choreographies using well-known notions, i.e., process spawning and name mobility (studied, e.g., in [CM13, ourPCstuff]), without the need for ad-hoc constructions. Coming back to the title: we already have enough.
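The carrier-process idea can be simulated concretely. The following sketch (our simplification, not the paper's formal encoding in DMC; all identifiers are ours) models process spawning as allocating a fresh one-cell process, and name mobility as handing over that process's name, so the sender never synchronises with the receiver:

```python
# A message in transit is itself a process: the sender spawns a fresh
# carrier holding the value and passes on the carrier's name.

processes = {}                  # process name -> its single memory cell
_fresh = iter(range(1_000_000)) # supply of fresh process names

def spawn(value):
    """Process spawning: create a fresh process storing `value`."""
    name = f"carrier_{next(_fresh)}"
    processes[name] = value
    return name

def async_send(value):
    """Sender side: spawn the carrier, hand over its name (name mobility),
    and continue immediately."""
    return spawn(value)

def async_receive(carrier):
    """Receiver side: read the value out of the carrier process."""
    return processes.pop(carrier)
```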

The fact that our encoding can be extended from MC to DMC is evidence that our approach is robust, and the simplicity of DMC makes it a convenient foundational calculus to use in future developments of choreographies. Moreover, one of the expected advantages of using a foundational theory such as DMC for capturing asynchrony is that we can reuse existing formal techniques based on standard primitives for choreographies. (This is a common scenario in the π-calculus, where many techniques apply to its sub-languages [SW01].) We show an example of such reuse. Core Choreographies (CC) [ourstuff] is MC with the addition of a primitive for communicating choices explicitly as messages, called selection [CHY12, HVK98, HYC16, YHNN13] (the terms in Lines 5 and 7 in Example 1 are selections). An important property of CC is that selections can be encoded in the simpler language MC – the dashed arrow () in Figure 1. What happens if we add selections to DMC? Ideally, the resulting calculus (called Dynamic Core Choreographies, or DCC) should both have an asynchronous interpretation through the techniques introduced in this paper and still possess the property that selections are encodable using the simpler language DMC. This is indeed the case. We extend our encoding to yield an interpretation of asynchronous selections, yielding () and (). The second property (encodability of selections in DCC) follows immediately from language inclusion, giving us () for free.

2 Background

We briefly introduce CC and MC, from [ourstuff], and summarise their key properties.

The syntax of CC is given in Figure 2, where ranges over choreographies.

Figure 2: Core Choreographies, Syntax.

Processes run in parallel, and each process stores a value in a local memory cell. (In the original presentation, values were restricted to natural numbers; we drop this restriction here, since it is orthogonal to our development.) Each process can access its own value using a dedicated syntactic construct, but it cannot read the contents of another process. An interaction term between two processes is read "the system may execute this interaction and proceed with the continuation". In a value communication, the sender transmits its local evaluation of an expression to the receiver, which stores the received value. In a label selection, the sender communicates a label to the receiver; the set of labels is immaterial, as long as it contains at least two elements. In a conditional, one process sends its value to another, which checks whether the received value is equal to its own; the choreography proceeds with the first branch, if that is the case, or with the second, otherwise. In all these actions, the two interacting processes must be different. Definitions and invocations of recursive procedures are standard. A dedicated term denotes the terminated choreography.
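To make the grammar concrete, here is one possible rendering of CC's terms as Python datatypes (our illustration; the class and field names are ours, not the paper's notation):

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class End:
    """The terminated choreography."""

@dataclass
class ValueCom:
    """The sender communicates its local evaluation of an expression."""
    sender: str
    expr: str
    receiver: str
    cont: "Chor"

@dataclass
class Selection:
    """The sender communicates a label to the receiver."""
    sender: str
    receiver: str
    label: str
    cont: "Chor"

@dataclass
class Cond:
    """One process sends its value to another, which compares it to its own."""
    sender: str
    receiver: str
    then_branch: "Chor"
    else_branch: "Chor"

@dataclass
class Call:
    """Invocation of a recursive procedure."""
    name: str

Chor = Union[End, ValueCom, Selection, Cond, Call]
```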

The semantics of CC uses reductions over a choreography paired with a total state function, which maps each process name to its value; metavariables range over values. The reduction relation is defined by the rules given in Figure 3.

Figure 3: Core Choreographies, Semantics.

These rules formalise the intuition presented earlier. In the premise of the rule for value communications, the sender's expression is evaluated under the sender's current value; in the reductum, the state function is updated so that the receiver now maps to the received value.

The rule for out-of-order execution uses the structural precongruence relation, which gives a concurrent interpretation to choreographies by allowing non-interfering actions to be executed in any order. The key rule defining this relation permits two consecutive interactions to be swapped whenever the sets of process names occurring in them are disjoint. The other rules are standard, and support recursion unfolding and garbage collection of unused definitions.
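The disjointness condition behind the swap rule can be sketched as follows (a simplification in which an interaction is represented just by its sender and receiver):

```python
def pn(interaction):
    """The set of process names occurring in an interaction,
    here just its sender and receiver."""
    sender, receiver = interaction
    return {sender, receiver}

def can_swap(first, second):
    """Two consecutive interactions commute iff they share no process."""
    return pn(first).isdisjoint(pn(second))
```

For instance, Lines 2 and 3 of Example 1 share the seller, so structural precongruence alone cannot reorder them; this is precisely the situation that motivates the asynchronous interpretation developed later.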

CC was designed as a core choreography language; in particular, it is possible to implement any computable function in it. Furthermore, CC choreographies can always progress until they terminate.

Theorem 1.

If is a choreography, then either ( has terminated) or, for all , for some and ( can reduce).

Label selections are not required for Turing completeness, and thus the simpler fragment MC, obtained from CC by omitting them, is interesting both as an intermediate language for compilers and for theoretical analysis. One of the reasons for having label selection is to make choice propagation explicit in choreographies; in a system implementation, this allows one, e.g., to monitor distributed choices without having to inspect the message payload. Another reason is projectability: the possibility of automatically generating process implementations that satisfy the choreographic specification. In Example 1, the label selections in Lines 5 and 7 are important in order for the bank to let the buyer and the seller know whether or not they should communicate.

Choices communicated by label selections can also be encoded as data in value communications, by sending a boolean value to determine which one of two branches was selected. This is the key idea behind the encoding presented in [ourstuff] – arrow () in Figure 1 – which transforms a choreography in CC to one in MC by encoding selections as value communications and nested conditionals.
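A minimal sketch of this idea (ours; the label names are hypothetical): the sender turns its chosen label into a boolean payload, and the receiver recovers the branch with a conditional on the received value.

```python
def select_as_data(label, labels=("ok", "ko")):
    """Sender side: the chosen label becomes a boolean payload.
    (`ok`/`ko` are hypothetical label names.)"""
    assert label in labels
    return label == labels[0]

def branch_on_data(payload, then_branch, else_branch):
    """Receiver side: a conditional on the received value replaces
    the original branching on labels."""
    return then_branch if payload else else_branch
```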

We do not need to concern ourselves with projectability in this work, and we will thus omit its details. This is because CC and MC enjoy a projectability property that is not altered by our development. Formally, there exists a procedure that, given any choreography, returns a choreography in CC that is projectable. Then, given a projectable CC choreography, the encoding transforms it into a choreography in MC, by encoding selections as value communications and conditionals. These transformations preserve the computational meaning of choreographies, as formally stated in the following theorem ( extends a state function to the auxiliary processes introduced by the transformations in a systematic way).

Theorem 2.
Let be MC choreographies and be states. If , then .

The main limitation of CC is that its semantics is synchronous. Indeed, in a real-world implementation of Example 1, we would expect the seller to proceed immediately to sending its message in Line 3 after having sent the one in Line 2, without waiting for the buyer to receive the latter. Capturing this kind of asynchronous behaviour is the main objective of our development in the remainder of this paper.

3 Asynchrony in MC

In this section, we extend CC with primitives to implement asynchronous communication, obtaining a calculus of Dynamic Core Choreographies (DCC). We focus on MC and first show that any MC choreography can be encoded in DMC – the fragment of DCC that does not use label selection – in such a way that communication becomes asynchronous.

More precisely, we provide a mapping such that every communication action is split into a send/receive pair in the translated choreography, with the properties that: the sender can continue executing without waiting for the receiver to get its message (and can even send further messages to it); and messages between any two given processes are delivered in the same order as they were originally sent.
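These two properties (non-blocking sends and per-pair FIFO delivery) can be sketched with one queue per ordered pair of processes (our simulation, not the DMC encoding itself):

```python
from collections import defaultdict, deque

# (sender, receiver) -> messages currently in transit, oldest first
transit = defaultdict(deque)

def send(sender, receiver, value):
    """The sender enqueues and proceeds immediately: no rendezvous."""
    transit[(sender, receiver)].append(value)

def receive(sender, receiver):
    """The receiver consumes the oldest undelivered message from `sender`,
    so per-pair delivery order matches sending order."""
    return transit[(sender, receiver)].popleft()
```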

The system DCC. We briefly motivate DCC. In CC, there is a bound on the number of values that can be stored at any given time by the system: since each process can hold a single value, the maximum number of values the system can know is equal to the number of processes in the choreography, which is fixed. However, in an asynchronous setting, the number of values that need to be stored is unbounded: a process may loop forever sending values to another, which may wait an arbitrarily long time before receiving any of them. Therefore, we need to extend CC with the capability to generate new processes. As discussed in [ourPCstuff], this requires enriching the language with two additional abilities: parameters to recursive procedures (in order to be able to use a potentially unbounded number of processes at the same time) and an action to communicate process names.

Formally, the differences between the syntax of CC and that of DCC are highlighted in Figure 4: procedure definitions and calls now have parameters; there is a new term for generating processes; and, the expressions sent by processes can also be process names.

Figure 4: Dynamic Core Choreographies, Syntax.

The possibility of communicating a process name () ensures name mobility. We will use the abbreviation as shorthand for .

The semantics for DCC includes an additional ingredient, borrowed from [ourPCstuff]: a graph of connections, keeping track of which pairs of processes are allowed to communicate. This graph is directed, and an edge from one process to another means that the former knows the name of the latter. In order for an actual message to flow between two processes, both need to know each other. (In some process calculi, the weaker condition that only the sender knows the receiver is typically sufficient for sending a message; our condition is equivalent to that found in the standard model of Multiparty Session Types [HYC16], and this choice is orthogonal to our development.) The reduction relation is now annotated with the connection graphs before and after executing the action. The complete rules are given in Figure 5, with structural precongruence defined similarly to CC. In the rule for process spawning, the fresh process is assigned a default value.
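The connection-graph check can be sketched as follows (our illustration; `learn` stands in for the effect of name mobility on the graph):

```python
def knows(G, p, q):
    """Directed edge (p, q) in G: process p knows the name of q."""
    return (p, q) in G

def can_communicate(G, p, q):
    """A message from p to q requires that p and q know each other."""
    return knows(G, p, q) and knows(G, q, p)

def learn(G, p, q):
    """p learns the name of q, e.g. after q's name was communicated to p;
    returns the extended graph."""
    return G | {(p, q)}
```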
