On Distributed Runtime Verification by Aggregate Computing

Runtime verification is a computing analysis paradigm based on observing a system at runtime (to check its expected behaviour) by means of monitors generated from formal specifications. Distributed runtime verification is runtime verification in connection with distributed systems: it comprises both monitoring of distributed systems and using distributed systems for monitoring. Aggregate computing is a programming paradigm based on a reference computing machine that is the aggregate collection of devices that cooperatively carry out a computational process: the details of behaviour, position and number of devices are largely abstracted away, to be replaced with a space-filling computational environment. In this position paper we argue, by means of simple examples, that aggregate computing is particularly well suited for implementing distributed monitors. Our aim is to foster further research on how to generate aggregate computing monitors from suitable formal specifications.








1 Introduction

Runtime verification is a computing analysis paradigm based on observing a system at runtime (to check its expected behaviour) by means of monitors generated from formal specifications. Distributed runtime verification is runtime verification in connection with distributed systems: it comprises both monitoring of distributed systems and using distributed systems for monitoring. Being a verification technique, additionally, runtime verification promotes the generation of monitors from formal specifications, so as to precisely state the properties to check as well as providing formal guarantees about the results of monitoring. Distribution is hence a particularly challenging context in verification, for it requires correctly dealing with aspects such as synchronisation, faults in communications, possible lack of unique global time, and so on. Additionally, the distributed system whose behaviour is to be verified at runtime could emerge from modern application scenarios like the Internet-of-Things (IoT), Cyber-Physical Systems (CPS), or large-scale Wireless Sensor Networks (WSN). In this case additional features are to be considered, like openness (the set of nodes is dynamic), large scale (a monitoring strategy may need to scale from a few units up to thousands of devices), and interaction locality (nodes may be able to communicate only with a small neighbourhood, though the property to verify is global). So, in the most general case, distributed runtime verification challenges the way in which one can express properties on such dynamic distributed systems, can express flexible computational tasks, and can reason about compliance of properties and corresponding monitoring behaviour.

In this paper, we argue that a promising approach to address these challenges can be rooted in the computational paradigm of aggregate computing [BPV-COMPUTER2015], along with the field calculus language [DVB-SCP2016]. Aggregate computing promotes a view of distributed systems as a conceptually single computing device, spread throughout the (physical or virtual) space in which nodes are deployed. At the paradigm level, hence, this view promotes the specification (construction, reasoning, programming) of global-level computational behaviour, where the interactions of individuals are essentially abstracted away. At the modelling level, the field calculus can be leveraged, which expresses computations as transformations of computational fields (or fields for short), namely, space-time distributed data structures mapping computational events (occurring at a given position of space and time) to computational values. As an example, a set of temperature sensors spread over a building forms a field of temperature values (a field of reals), and a monitor alerting areas where the temperature was above a threshold for the last 10 minutes is a function from the temperature field to a field of Booleans. Field calculus has a working implementation called ScaFi in the Scala programming language [CV-PLMDC2016], where field computations can be expressed by Scala functions (relying on a suitable API) and actors are generated to realise the distributed system.
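The temperature example can be made concrete with a small sketch. It is written in plain Python rather than ScaFi, it models a field simply as a dictionary from device identifiers to values, and the names `HotAreaMonitor`, `monitor_field` and the value of `THRESHOLD` are illustrative assumptions that do not come from the paper.

```python
from collections import deque

WINDOW_SECONDS = 600   # "the last 10 minutes" from the text
THRESHOLD = 30.0       # hypothetical threshold; the paper gives no value


class HotAreaMonitor:
    """Per-device monitor: True once the whole window has been above THRESHOLD."""

    def __init__(self):
        self.readings = deque()   # (timestamp, temperature) pairs, oldest first

    def step(self, now, temperature):
        self.readings.append((now, temperature))
        # Drop readings older than the window, but keep the newest reading
        # taken at or before the window start, so the window stays covered.
        while len(self.readings) > 1 and self.readings[1][0] <= now - WINDOW_SECONDS:
            self.readings.popleft()
        covered = self.readings[0][0] <= now - WINDOW_SECONDS
        return covered and all(t > THRESHOLD for _, t in self.readings)


def monitor_field(monitors, now, temperature_field):
    """Map a field of reals (device -> temperature) to a field of Booleans."""
    return {dev: monitors[dev].step(now, temp)
            for dev, temp in temperature_field.items()}
```

Each device only needs its own readings here, so no neighbour communication is involved; the monitor is a pointwise function between fields.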

The remainder of this paper is organized as follows: Section 2 provides the necessary background; Section 3 presents the field calculus; Section 4 discusses monitoring general distributed programs through field calculus; Section 5 illustrates how field calculus programs can be instrumented with monitors; and Section 6 concludes.

2 Background

In this section we provide the necessary background on distributed runtime verification and aggregate computing by briefly discussing the literature outlining their role.

2.1 Motivating examples

We motivate the usage of aggregate computing together with runtime verification techniques through two examples that have been thoroughly studied in earlier literature: a crowd-evacuation scenario and a general communication channel.

In the first scenario, a program (not necessarily written through aggregate computing techniques) is used to manage evacuation of agents from a given area in case of an emergency. Ideally, in such critical situations correctness guarantees for a particular solution and its implementation would be needed. Since the guarantees that can be proved are usually not fully satisfactory, they can be fruitfully complemented with runtime monitors. As an example, we focus here on a simple “per-agent” property that we could monitor: two neighbouring agents (closer than 5m) should not have “evacuation vectors” that lead to a direct collision with each other (for both agents, the evacuation vector is within 60° from the direction of the other agent). We can then instantiate an aggregate computing monitor on each agent, observing the local and neighbours’ evacuation vectors, and flagging violations of this property as they occur.

The second scenario describes an aggregate computing solution to the well-known problem of establishing a shortest communication path between two nodes, while ensuring reliability through an imposed width (cross-section size of the channel), which provides the desired redundancy and alternative routes. In this situation, we define a more interesting property that is not per-agent, but rather per-network, and that also shows how the monitor can feed back into the original program: for each cross-section, we require that it has at least min width of alternative connections. If not, we require the channel program to increase the width. Conversely, if all slices contain more than max nodes, we shrink the channel to save computational power.
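The feedback loop of the second scenario can be sketched as a small decision function. This is only an illustration in Python: the name `adjust_width`, the representation of cross-sections as a list of node counts, and the step size of one are assumptions, since the text does not say by how much the width changes.

```python
def adjust_width(slice_sizes, width, min_width, max_width):
    """Return the channel width requested by the monitor for the next round."""
    if any(size < min_width for size in slice_sizes):
        return width + 1      # some cross-section is too thin: widen the channel
    if all(size > max_width for size in slice_sizes):
        return width - 1      # every cross-section is oversized: shrink it
    return width              # property satisfied, leave the channel unchanged
```

Note the asymmetry taken from the text: one thin slice is enough to widen, whereas shrinking requires every slice to exceed the maximum.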

2.2 Distributed runtime verification

Runtime verification is a lightweight verification technique concerned with observing the execution of a system with respect to a specification [DBLP:journals/jlp/LeuckerS09]. Specifications are generally trace- or stream-based, with events that are mapped to atomic propositions in the underlying logic of the specification language. Popular specification languages include variations on the Linear Temporal Logic (LTL), and regular expressions, which can be effectively checked through finite automata constructions. Events may be generated through state changes or execution flow, such as method calls.
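As a minimal sketch of automaton-based checking, the following Python fragment monitors a trace of events against a finite automaton. The property (no "send" after "close"), the state names and the `monitor` function are hypothetical and chosen only for illustration.

```python
# Hypothetical safety property: once "close" has been observed, "send" must not occur.
TRANSITIONS = {
    ("open", "send"): "open",
    ("open", "close"): "closed",
    ("closed", "close"): "closed",
    ("closed", "send"): "violated",
}


def monitor(trace, state="open"):
    """Return the verdict after each event: True while no violation has been seen."""
    verdicts = []
    for event in trace:
        if state != "violated":
            # Events the automaton does not mention leave the state unchanged.
            state = TRANSITIONS.get((state, event), state)
        verdicts.append(state != "violated")
    return verdicts
```

For instance, `monitor(["send", "close", "send"])` stays positive for the first two events and reports a violation on the third; once violated, a safety verdict never recovers.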

In distributed runtime verification, we lift this concept to distributed systems, where we find applications in the following areas [Francalanza2018]: (i) observing distributed computations and expressiveness (specifications over the distributed systems), (ii) analysis decomposition (coupled composition of system- and monitoring components), (iii) exploiting parallelism (in the evaluation of monitors), (iv) fault tolerance and (v) efficiency gains (by optimising communication). In Sections 4 and 5, we show how runtime verification can be applied to, or contribute to some of those areas.

Naturally, such lifting also affects the specification language. Bauer and Falcone [DBLP:journals/fmsd/0002F16] show a decentralised monitoring approach where disjoint atomic propositions in a global LTL property are monitored without a central observer in their respective components. Communication overhead is shown to be lower than the number of messages that would need to be sent to a central observer.

Sen et al. introduce PT-DTL [SenPTDLTL2004] to specify distributed properties in a past time temporal logic. Sub-formulas in a specification are explicitly annotated with the node (or process) where the sub-formula should be evaluated. Communication of results of sub-computation is handled by message passing.

Both approaches assume a total communication topology, i.e., each node can send messages to everyone in the system, although causally unrelated messages may arrive in arbitrary order.

Going beyond linear-time properties, hyperproperties over a set of traces allow a richer expressivity [DBLP:conf/rv/FinkbeinerHST17]. In our setting, as each node is running the same program, we can understand such a set as consisting of traces from the individual nodes. Further issues on (efficient) monitorability have been addressed by Aceto et al. in [Aceto2018Monitorability].

2.3 Aggregate computing

The problem of finding suitable programming models for ensembles of devices has been the subject of intensive research (see e.g. the surveys [SpatialIGI2013, Viroli-Et-al:COORDINATION-2018]): works such as TOTA [tota] and Hood [hood] provide abstractions over the single device to facilitate construction of macro-level systems; GPL [coorephd] and others are used to express spatial and geometric patterns; Regiment [regiment] and TinyLime [Curino05mobiledata] are information systems used to stream and summarise information over space-time regions; while MGS [GiavittoMGS05] and the fixpoint approach in [DBLP:journals/corr/Lluch-LafuenteL16] provide general purpose space-time computing models. Aggregate computing and the field calculus have then been developed as a generalisation of the above approaches, with the goal of defining a programming model with sufficient expressiveness to describe complex distributed processes by a functional-oriented compositional model, whose semantics is defined in terms of gossip-like computational processes.

Hence, aggregate computing [BPV-COMPUTER2015] aims at supporting reusability and composability of collective adaptive behaviour as inherent properties. Following the inspiration of “fields” of physics (e.g., gravitational fields), this is achieved by the notion of computational field (simply called field) [tota], defined as a global data structure mapping devices of the distributed system to computational values. Computing with fields means deriving in a computable way an output field from a set of input fields. This can be done at a low level, by defining programming language constructs or general-purpose building blocks of reusable behaviour, or at a high level by designing collective adaptive services or whole distributed applications, which ultimately work by getting input fields from sensors and processing them to produce output fields for actuators.

The field calculus [DVB-SCP2016, viroli:selfstabilisation] is a minimal functional language that identifies basic constructs to manipulate fields, and whose operational semantics can act as blueprint for developing toolchains to design and deploy systems of possibly myriad devices interacting via proximity-based broadcasts. Recent works have also adopted this field calculus as a lingua franca to investigate formal properties of resiliency to environment changes [DiGamma, viroli:selfstabilisation], and to device distribution [BVPD-TAAS2017].

2.4 Deployment

A number of techniques exist to deploy runtime verification as part of, or in parallel to, the application to be verified. A high-level technique to monitor a JVM-based application is the use of aspect-oriented programming [StolzB06], which allows for an easy integration in terms of events: this method makes it easy to intercept actions of the main application and use them as input events for the step-wise evaluation of properties. In addition, this approach can be used to inspect or sample the current state of the system. The runtime verification algorithm does not necessarily have to be executed in the context of the application: the same event generation can also be used to generate stimuli for external runtime verification engines that are implemented, for example, with the help of rewriting logic.

In the setting of field calculus programs, such an integration is more straightforward: here, we do not need to establish a coupling between a target application and a runtime verification framework, but rather have field calculus programs that implement runtime verification monitors alongside applications written in that formalism. As such, they use the same communication constructs to aggregate information from neighbours and trigger local actions.

As in the more traditional RV approaches for mainstream languages and systems, here too one can separate the implementation language from the specification language. We take a first step and show how common safety properties can be expressed as field calculus programs. Ideally, one would next strive for a specification language that more closely resembles a temporal logic with future or past operators, which is then translated into a field calculus program to monitor the property.

Taken as an approach to distributed runtime verification, we note that the field calculus also brings infrastructure that tackles a challenge in truly distributed systems: the dynamic nature of these systems with their varying number of participants and communication topology poses the challenge of reliability. So far, the RV community has mostly considered systems with a fixed number of agents and a fixed topology where communication is either point-to-point, allowing for interesting schemes to convey partial information, or broadcast, where message loss is not taken into account. See Basin et al.’s work [BasinKZ15] for a rare take on distributed runtime verification in the presence of communication delays and errors.

In field calculus, on the one hand one faces the same challenges, e.g. of establishing a global property across all agents. On the other hand, the constructs and mechanisms of the field calculus provide a solution in themselves and do not require another level of middleware: a developer who is already familiar with the field calculus will naturally encode e.g. properties of resilience and awareness of network partitions into their specifications.

3 The field calculus

The field calculus [DVB-SCP2016] is a minimal language to express aggregate computations over distributed networks of (mobile) devices, each asynchronously capable of performing simple local computations and interacting with a neighbourhood by local exchanges of messages. Field calculus provides the necessary mechanisms to express and compose such distributed computations, at a level of abstraction that intentionally neglects explicit management of synchronisation, message exchanges between devices, position and quantity of devices, and so on, while retaining Turing-universality for distributed computations [a:fcuniversality].

3.1 The model of computation

In field calculus, a program is periodically and asynchronously executed on every device, according to the following cyclic schedule. At every period, the device involved:

  1. perceives contextual information, which is formed by: data provided by sensors, local information stored in the previous round, and messages collected from neighbours while sleeping (older messages may be retained until a certain timeout expires, or newer messages are received), the latter in the form of a neighbouring value, essentially a map from neighbours to values;

  2. evaluates the program, considering as input the contextual information gathered as described above;

  3. stores the result of this computation locally as a data structure, broadcasts it to neighbours, and possibly feeds it to actuators;

  4. sleeps until it is awakened at the next activation.

By repetitive execution of such computation rounds, across space (where devices are located) and time (when devices fire), a global behaviour emerges [viroli:selfstabilisation], which can be fruitfully considered as occurring on the overall network of interconnected devices, modelled as a single aggregate machine equipped with a neighbouring-based topology relation. This process can be mathematically modelled through the notion of events, which correspond to the instants when devices are activated and start this sequence (see [a:fcuniversality, a:rtssgradient] for further details on events and their role in modelling distributed computations).
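The cyclic schedule can be sketched as follows. This is a simplified Python model and not the semantics of the calculus: the `Device` class, its attribute names and the `max_gossip` program are all assumptions introduced for illustration, and message timeouts are ignored.

```python
class Device:
    def __init__(self, ident, program, sensors):
        self.ident = ident
        self.program = program    # function (sensors, previous, inbox) -> result
        self.sensors = sensors    # dict of sensor readings
        self.previous = None      # local information stored in the previous round
        self.inbox = {}           # neighbour id -> last message received from it

    def round(self, neighbours):
        # Steps 1 and 2: gather the context and evaluate the program on it.
        result = self.program(self.sensors, self.previous, dict(self.inbox))
        # Step 3: store the result locally and broadcast it to neighbours.
        self.previous = result
        for other in neighbours:
            other.inbox[self.ident] = result
        # Step 4: the device sleeps until the caller activates it again.
        return result


def max_gossip(sensors, previous, inbox):
    """Example program: the largest sensor value heard of so far."""
    return max([sensors["value"]] + list(inbox.values()))
```

Calling `round` on devices in any interleaving, with no global clock, is enough for the largest value to spread through a connected network, which illustrates how a global behaviour emerges from local rounds.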

Definition 1 (event [a:rtssgradient]).

An event $\epsilon$ is modelled by the pair $\langle \delta, t \rangle$ such that $\delta$ is the identifier of the device where the event takes place, and $t$ is the time when the device is activated. The time stamp $t$ refers to the local clock of $\delta$.

Events are partially ordered by the following relationship.

Definition 2 (direct predecessor [a:rtssgradient]).

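An event $\epsilon'$ is a direct predecessor (or neighbour for short) of an event $\epsilon$, denoted by $\epsilon' \rightsquigarrow \epsilon$, if the message broadcast by $\epsilon'$ was the last from its device $\delta'$ able to reach the device $\delta$ of $\epsilon$ before $\epsilon$ occurred (and was not discarded by $\delta$ as an obsolete message).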

It follows that if $\epsilon'$ is a neighbour of $\epsilon$, then $\epsilon'$ has to happen before $\epsilon$, but neither too long before (otherwise the message would have been discarded) nor too far away (otherwise the message would not have been received): thus, the neighbouring relation typically reflects spatial proximity. However, it could also be a logical relationship (e.g., connecting master devices to slave devices independently of their position), in which case the “far away” requirement would be measured through the logical network topology.

Furthermore, notice that the neighbouring relation $\rightsquigarrow$ forms a directed acyclic graph (DAG) among events, since cycles would correspond to a closed timelike curve. Hence, the relation is time-driven and anti-symmetric, unlike spatial-only neighbouring (which is usually symmetrical).

Figure 1: Representation of a field evolution of integers together with its underlying event structure (neighbouring). Past events of the circled blue event are depicted in red, future events in green, concurrent events in black. This field evolution models the computation in each event of the longest preceding chain of events, obtainable locally by taking the maximum of the neighbour counters increased by 1.

Definition 3 (causality [a:rtssgradient]).

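The causality partial order $<$ on events is the transitive closure of the direct predecessor relation $\rightsquigarrow$. Thus, $\epsilon < \epsilon'$ iff there exists a (possibly empty) sequence of events $\epsilon_1, \ldots, \epsilon_n$ such that $\epsilon \rightsquigarrow \epsilon_1 \rightsquigarrow \ldots \rightsquigarrow \epsilon_n \rightsquigarrow \epsilon'$.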

The causality relation defines which events constitute the past or the future of any given event, and which are concurrent to it. A set of events together with a neighbouring relation determines an event structure, represented in Figure 1. Notice that we do not assume that a global clock is available, nor that the scheduling of events follows a particularly regular pattern. This choice is dictated by the need to apply the field calculus to the broadest possible class of problems, as further restrictions can always be added without hassle.
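A small Python sketch may help to see how past, future and concurrent events follow from the direct predecessor relation alone. The encoding of an event structure as a dictionary from each event to the list of its direct predecessors is an assumption, as are the function names; the chain length of an event without predecessors is taken to be 0, which the text does not specify.

```python
def causal_past(event, predecessors):
    """All events reachable by following direct-predecessor links backwards."""
    seen, stack = set(), [event]
    while stack:
        for pred in predecessors.get(stack.pop(), ()):
            if pred not in seen:
                seen.add(pred)
                stack.append(pred)
    return seen


def classify(event, predecessors):
    """Split the other events into the past, future and concurrent sets of `event`."""
    events = set(predecessors) | {p for ps in predecessors.values() for p in ps}
    past = causal_past(event, predecessors)
    future = {e for e in events if event in causal_past(e, predecessors)}
    concurrent = events - past - future - {event}
    return past, future, concurrent


def longest_chain(event, predecessors):
    """Maximum of the predecessors' counters increased by one (0 if there are none)."""
    preds = predecessors.get(event, ())
    return max((1 + longest_chain(p, predecessors) for p in preds), default=0)
```

The recursion in `longest_chain` terminates because the predecessor relation is acyclic; each event needs only the counters of its direct predecessors, which is what makes the quantity computable locally.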

Using the formalism of event structures, we can abstract the data manipulated by a field calculus program as a whole distributed space-time field evolution, mapping individual events in an event structure to data values (see Figure 1). Similarly, we can understand an “aggregate computing machine” as a device manipulating these field evolutions, and abstract it as a function mapping input field evolutions to output field evolutions.

3.2 The programming language

The syntax of field calculus is presented in Figure 2. The overbar notation is a shorthand for sequences of elements, and multiple overbars are intended to be expanded together, e.g., $\overline{e}$ stands for $e_1, \ldots, e_n$ and $\overline{\delta} \mapsto \overline{\ell}$ for $\delta_1 \mapsto \ell_1, \ldots, \delta_n \mapsto \ell_n$. The keywords nbr and rep correspond to the two peculiar constructs of field calculus, responsible for interaction and field dynamics, respectively, while def and if correspond to the standard function definition and the branching expression constructs.

Figure 2: Syntax of the field calculus language.

A program is the declaration of a set of functions of the kind “def d(x_1,...,x_n) {e}”, and a main expression that is the one executed at each computation round, as well as the one considered (in the global viewpoint) as the overall field computation. An expression $e$ can be:

  • A variable $x$, used e.g. as formal parameter of functions.

  • A value $v$, which can be of the following two kinds:

    • A local value $\ell$, with structure $c(\overline{\ell})$ or simply $c$ when $\overline{\ell}$ is empty (defined via data constructor $c$ and arguments $\overline{\ell}$), can be, e.g., a Boolean (True or False), a number, a string, or a structured value (e.g., a pair Pair(True,5)).

    • A neighbouring (field) value $\phi$ that associates neighbour devices $\delta$ to local values $\ell$, e.g., it could be the neighbouring value of distances of neighbours. Note that neighbouring field values are not part of the surface syntax: they are produced at runtime by evaluating expressions, as described below.

  • A function call $f(\overline{e})$, where $f$ can be of two kinds: a user-declared function $d$ (declared by the keyword def, as illustrated above) or a built-in function $b$, such as a mathematical or logical operator, a data structure operation, or a function returning the value of a sensor.

  • A branching expression if($e_1$){$e_2$}{$e_3$}, used to split field computation in two isolated sub-networks, where/when $e_1$ evaluates to True or False: the result is computation of $e_2$ in the former area, and $e_3$ in the latter.

  • An nbr-expression nbr{$e$}, used to create a neighbouring field value mapping neighbours to their latest available result of evaluating $e$. In particular, each device $\delta$:

    1. broadcasts (together with its state information) its value of $e$ to its neighbours,

    2. evaluates the expression into a neighbouring field value $\phi$ associating to each neighbour $\delta'$ of $\delta$ the latest evaluation of $e$ at $\delta'$.

    Note that the evaluation by a device $\delta$ of an nbr-expression within a branch of some if($e_1$){…}{…} expression is affected only by the neighbours of $\delta$ that, during their last computation cycle, evaluated the same value for $e_1$.

  • A rep-expression rep($e_1$){($x$) => $e_2$} models evolution through time, by returning the value of the expression $e_2$ where each occurrence of $x$ is replaced by the value of the rep-expression at the previous computation cycle, or by $e_1$ if the rep-expression has not been evaluated in the previous computation cycle.
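The kinds of expression listed above can be summarised as a grammar. The block below is a reconstruction from the prose description and from the concrete syntax used in the listings of this paper; the metavariable names ($P$, $F$, $d$, $b$, $c$, $\ell$, $\phi$, $\delta$) are assumptions, and the exact layout of Figure 2 may differ.

```latex
\begin{array}{lcl}
P    & ::= & \overline{F}\; e \\
F    & ::= & \texttt{def}\; d(\overline{x})\; \{e\} \\
e    & ::= & x \mid v \mid f(\overline{e}) \mid \texttt{if}(e)\{e\}\{e\}
             \mid \texttt{nbr}\{e\} \mid \texttt{rep}(e)\{(x) \Rightarrow e\} \\
f    & ::= & d \mid b \\
v    & ::= & \ell \mid \phi \\
\ell & ::= & c(\overline{\ell}) \\
\phi & ::= & \overline{\delta} \mapsto \overline{\ell}
\end{array}
```

Here $P$ is a program, $F$ a function declaration, $e$ an expression, $f$ a function name (user-declared $d$ or built-in $b$), and $v$ a value that is either local or neighbouring.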

The meaning of a field calculus program can be defined through a denotational and an operational semantics, both thoroughly studied in [Viroli:HFC-TOCL]. The denotational semantics maps an expression to a field evolution on a given event structure (see Section 3.1), and is compositional, meaning that the denotation of an expression is determined by the denotations of its sub-expressions.

Alternatively, an operational semantics of rounds in a network can be given in terms of a transition system between network configurations $N \xrightarrow{act} N'$, where act is either env to model any environment change, or a device identifier to represent a device computation. These computations are in turn modelled by a local judgement $\delta; \Theta; \sigma \vdash e \Downarrow \theta$, to be read as “expression $e$ evaluates to $\theta$ on device $\delta$ with respect to sensor values $\sigma$ and neighbours’ data $\Theta$”, where $\theta$ is the structure of values obtained from the evaluation of every sub-expression of $e$, and $\Theta$ is a map from neighbour device identifiers to the last $\theta$ which was received from them by the current device.

Example 1 (hop-count distance).

In order to give an intuition of the behaviour of a field calculus program, consider the following function, where minHood selects the minimum element in the range of a numeric field $\phi$, and mux is a classic “multiplexer” operator selecting its second or third argument depending on the truth value of the first (overloaded to apply pointwise on fields).

def hopcount(source) {
  rep (infinity) { (c) => mux(source, 0, minHood(nbr{c+1})) }
}

The hopcount function computes the number of hops required to reach a node where source is true: it is zero in sources, and equal to the minimum count of a neighbour incremented by one in non-source nodes.
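To see the hop-count computation at work, the following Python sketch simulates it on an explicit network. It is a simplified model: devices fire one at a time in a fixed order, each device reads the latest counters of its neighbours directly, a device is not counted among its own neighbours, and the minimum over an empty neighbourhood is taken to be infinity. These choices and the names `hopcount_round` and `run` are assumptions.

```python
import math


def hopcount_round(device, source, neighbours, counts):
    """One round on `device`: 0 at sources, else 1 + the minimum neighbour counter."""
    if source[device]:
        return 0
    # Counterpart of minHood(nbr{c+1}): neighbour counters increased by one.
    return min((counts[n] + 1 for n in neighbours[device]), default=math.inf)


def run(source, neighbours, rounds):
    counts = {device: math.inf for device in source}   # initial value of rep
    for _ in range(rounds):
        for device in source:                            # devices fire in turn
            counts[device] = hopcount_round(device, source, neighbours, counts)
    return counts
```

On a line of four devices with the source at one end, the counters stabilise to 0, 1, 2 and 3, and later rounds leave them unchanged.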

Remark 2 (sample code).

In practical implementations of the field calculus, the language is often extended to include additional features improving code readability. In Section 5 we shall use some of them, in particular:

  • The traditional let x = e_1 in e_2 construct, which can be thought of as a shorthand for the expression f(e_1, y_1,...,y_n) given the definition def f(x, y_1,...,y_n) {e_2}, where y_1, …, y_n are the variables occurring free in e_2.

  • The notation [e_1,...,e_n], representing tuple creation Tuple(e_1,...,e_n).

  • The multi-valued construct rep (v_1,...,v_n) {(x_1,...,x_n) => e_1,...,e_n}, as a shorthand for the following.

    rep ([v_1,...,v_n]) { (t) =>
      let x_1 = 1st(t) in ... let x_n = nth(t) in [e_1,...,e_n]
    }

4 Implementing monitors in field calculus

Inspired by Francalanza et al. [Francalanza2018], we frame our discussion by considering a distributed monitoring setting where:

  1. The system under analysis comprises a number of subsystems, called processes, that execute independently and might interact (i.e., synchronize or communicate) via the underlying communication platform.

  2. The set of processes is partitioned across locations, i.e., each process is located at exactly one location. Two processes are local to one another if and only if they are at the same location, and remote otherwise. Processes may interact with both local and remote processes (usually remote communication is more expensive than local communication). Notable cases are when one of the following two conditions holds:

    1. There is just one location (i.e., all the processes are local);

    2. At each location there is exactly one process (i.e., all processes are remote).

  3. Each location hosts a number of local traces: each trace consists of a totally ordered set of events, and each event describes a discrete computational step of a process at the location that hosts the trace. A trace may contain events of different processes. Notable cases are when one or both of the following conditions hold:

    1. each trace contains events of a single process (i.e., each trace belongs to a single process), or

    2. for each process there is exactly one trace (containing the events of the process).

  4. Monitoring is performed by computation entities, called monitors, that check properties of the system under analysis by analysing the traces. Similarly to processes, each monitor is hosted at a given location and may communicate with other (local or remote) monitors. Notable cases are when there is exactly one monitor for each:

    1. location,

    2. process, or

    3. trace.

In runtime verification monitors are generated from formal specifications. In the following we illustrate, by means of simple examples, how the field calculus can be used to implement distributed monitors. Our aim is to pave the way towards generating field calculus distributed monitors from suitable formal specifications. We consider the following setting:

  • Each monitor is implemented by a field calculus program running on a dedicated (virtual or physical) device.

  • Each local trace is mapped to a sensor.

  • The nbr construct comes in two forms:

    • nbrLocal, for communication with local devices (i.e., neighbours reached through nbrLocal are at the same location), and

    • nbrRemote, for communication with remote devices (i.e., neighbours reached through nbrRemote are at different locations).

  • Each device is awakened whenever:

    • a new event arrives on one of the sensors of the device, or

    • a new, different message arrives from a (local or remote) neighbour (if the new message is equal to the last message received from that neighbour, the device is not awakened);

    provided that a minimum time span has elapsed from the previous evaluation cycle.

Moreover, for simplicity, we also assume that both conditions 3.a and 3.b hold. In the next two subsections we present examples in the context of the “local monitor only” and of the “remote monitors only” assumptions, respectively.

4.1 Local monitors only

In this subsection we assume that condition 2.a (given at the beginning of Section 4) holds, that is, every process is local. We consider two smart home scenarios, in which processes are assumed to be local through either: (i) physically wired connections; or (ii) short-range efficient wireless communication, such as the one expected by upcoming 5G standards. In this setting, the network topology can be full, that is, every node communicates with every other node.

In the first scenario, we want to monitor the following property: air conditioning and lights are on when the room is not empty. In order to express this property, we assume that the following 0-ary built-in operators (with corresponding traces) are given:

  • lights: an optional Boolean value, which is true if the lights are on, false if they are off and null in nodes not controlling the lights.

  • people: an optional Boolean value, depending on whether the node is sensing the presence of nearby people (if sensing is available).

This first property can then be expressed through the following program:

lights() == null || lights() == anyHood(nbrLocal{people() == true})

where anyHood is a built-in function that, given a Boolean field $\phi$, returns true if and only if at least one element in the range of $\phi$ is true. The monitored property holds in nodes not controlling lights (i.e., when lights is null), or when the lights are on if and only if people is true in some communicating node, capturing the required idea.
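The first property can be traced on concrete data with a short Python sketch. Sensor readings are modelled as dictionaries from node names to optional Booleans (with `None` playing the role of null), and the neighbourhood of a node is assumed to include the node itself; `any_hood` and `lights_monitor` are illustrative names.

```python
def any_hood(values):
    """True if and only if at least one neighbour value is true."""
    return any(values)


def lights_monitor(node, lights, people, neighbours):
    """Counterpart of: lights() == null || lights() == anyHood(nbrLocal{people() == true})."""
    if lights[node] is None:
        return True   # the node does not control the lights
    presence = any_hood(people[n] is True for n in neighbours[node])
    return lights[node] == presence
```

With a full topology every node sees every presence sensor, so a node controlling the lights reports a violation both when the lights are off in an occupied room and when they are on in an empty one.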

In the second scenario, we want to monitor the following property: if the volume of the stereo is above a certain threshold, every node should rapidly agree on alerting the stereo to lower its volume. In order to express this property, we assume that the following 0-ary built-in operators (with corresponding traces) are given:

  • level: the volume level of the stereo, or in nodes not controlling the stereo.

  • alert: an optional Boolean value, depending on whether the node is sensing excessive noise, which is null if no sensing is available.

This second property can then be expressed through the following program:

def roundsince(condition) {
  rep (0) { (x) => if (condition) {0} {x+1} }
}
roundsince(allHood(nbrLocal{alert() != false}) || level() <= THRESHOLD) < DELAY

where THRESHOLD and DELAY are given constants and allHood is a built-in function that, given a Boolean field φ, returns true if and only if every element in the range of φ is true. Function roundsince counts the number of rounds elapsed since the last time condition was true. The monitored property holds provided that fewer than DELAY rounds have elapsed since the volume was last at or below THRESHOLD or all nodes agreed on alerting.
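The counting behaviour of roundsince can be traced with a small Python sketch. This is a simulation under assumptions, not the field calculus semantics: `roundsince_trace` is a hypothetical helper that takes the per-round truth values of the condition at one node, and the value of `DELAY` is chosen arbitrarily.

```python
def roundsince_trace(conditions):
    """Return, for each round, the rounds elapsed since the condition last held."""
    x = 0                            # initial value, as in rep (0)
    trace = []
    for holds in conditions:
        x = 0 if holds else x + 1    # if (condition) {0} {x+1}
        trace.append(x)
    return trace

DELAY = 3                            # assumed constant, for illustration only
trace = roundsince_trace([True, False, False, False, True])
verdicts = [x < DELAY for x in trace]
```

With these inputs `trace` is `[0, 1, 2, 3, 0]`, so the monitor reports a violation only in the fourth round, when the condition has been false for three consecutive rounds.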

4.2 Remote monitors only

In this subsection we assume that condition 2.b (given at the beginning of Section 4) holds, that is, every process is remote. In this case, it is no longer realistic to assume a full communication topology; instead, every node should have few neighbours, to reduce the number of communications needed. This may not make a difference when the property to monitor is fully local, as in the first example discussed in Section 2.1, which may be written through the following specification, where nbrVector returns the field of vectors to neighbours, direction is the quantity to be monitored and angle computes the relative angle between two vectors.

allHood(-60 < angle(nbrVector(), direction()) < 60 &&
        -60 < angle(-nbrVector(), nbr{direction()}) < 60)

However, when properties to monitor are not fully local, we may require a “data collection” routine to ensure effective spatial quantification (e.g., checking whether a property is true for all devices). This can be accomplished in field calculus through the collection building block, which is here instantiated for spatial quantification with the help of the result count of the simple distance estimation routine hopcount described in Example 1.

def everywhere(property, count) {
  rep (false) { (p) =>
    allHood(mux(nbrRemote{count} > count, nbrRemote{p}, property))
} }
def somewhere(property, count) {
  rep (false) { (p) =>
    anyHood(mux(nbrRemote{count} > count, nbrRemote{p}, property))
} }

The everywhere and somewhere functions check the validity of a property in nodes with a higher count, so that their value in the source should correspond to the intended result. More efficient collection [a:collection, viroli:selfstabilisation] and distance computation algorithms [a:ultgradient, a:scpgradient] may be used in practical systems to implement those same functions: in this paper, we opted for the simplest implementations instead, for the sake of readability.
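The way values flow towards the source can be sketched with a synchronous Python simulation. Several simplifications are assumed here: rounds are executed in lockstep, hop counts are precomputed by a centralised breadth-first search instead of the distributed hopcount, and the neighbourhood of a node includes the node itself. The helper names `hopcount` and `collect` are hypothetical.

```python
from collections import deque

def hopcount(adj, source):
    """Hop-count distance of every node from `source` (centralised BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        i = queue.popleft()
        for j in adj[i]:
            if j not in dist:
                dist[j] = dist[i] + 1
                queue.append(j)
    return dist

def collect(adj, count, prop, combine, rounds):
    """Simulate everywhere (combine=all) or somewhere (combine=any)."""
    p = {i: False for i in adj}            # rep (false)
    for _ in range(rounds):
        old = p                            # values shared in the previous round
        p = {
            i: combine(
                # mux(nbr{count} > count, nbr{p}, property)
                old[j] if count[j] > count[i] else prop[i]
                for j in [i, *adj[i]]
            )
            for i in adj
        }
    return p

# A line of four devices, with device 0 as the source.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
count = hopcount(adj, 0)
all_true = {i: True for i in adj}
one_true = {0: False, 1: False, 2: False, 3: True}
everywhere_at_source = collect(adj, count, all_true, all, rounds=4)[0]
somewhere_at_source = collect(adj, count, one_true, any, rounds=4)[0]
```

In this run both results are true after four rounds, one round per device along the line, which reflects that the value at the source stabilises only after information from the farthest device has arrived.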

With the help of those functions, we can translate both scenarios in Section 4.1 to a remote-only setting. For the first scenario, we may want to check that an electronic system is on when some people are present in a large building, which can be accomplished by the following code.

lights() == null || lights() == somewhere(people() == true, hopcount(lights()!= null))

For the second scenario, we may want to check that every area of such a building is alerted for evacuation after some dangerous event has been detected, which can be accomplished by the following code.

roundsince(everywhere(alert() != false, hopcount(level() != 0)) ||
           level() <= THRESHOLD) < DELAY

In both scenarios, we compute hop-count distances from controller nodes (which are reasonably unique), and use these distances to guide aggregation.

5 Monitoring field calculus programs

When the distributed program to be monitored is a field calculus program, further opportunities arise from the ability to instrument the monitor code within the original algorithm, and possibly to implement feedback loops between them. Inspired by the second motivating example presented in Section 2.1, we consider the following channel routine building on the hopcount function presented in Example 1.

def broadcast(value, count) {
  rep (value) { (oldval) =>
    mux( count == 0, value, 2nd(minHood(nbr{[count, oldval]})) )
} }
def elliptic-channel(sourcecount, destcount, width) {
  let sourcedest = broadcast(sourcecount, destcount) in
  sourcecount + destcount <= sourcedest + width
}
def channel(value, source, dest, width) {
  let sourcecount = hopcount(source) in
  let destcount   = hopcount(dest) in
  let inarea = elliptic-channel(sourcecount, destcount, width) in
  if (inarea) { broadcast(value, sourcecount) } { value }
}

The broadcast function spreads a value outwards from a source generating a certain hop-count distance (count): every device selects the provided value only if it is the source (count == 0), otherwise it selects the value of the neighbour with minimal count. Function elliptic-channel defines a roughly elliptic area of given width with foci in a source and a destination, by comparing the sum of the distances from the current location to the source and to the destination with the distance between the source and destination themselves (obtained by broadcasting from the destination the value of its distance to the source). Finally, function channel uses the above functions to broadcast a value in the area selected by elliptic-channel.
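The membership test of elliptic-channel can be illustrated on a small grid. The Python sketch below assumes a 4-connected grid, where the hop-count distance between two devices equals their Manhattan distance; it computes the distances in closed form instead of running hopcount and broadcast. The function names are hypothetical.

```python
def manhattan(a, b):
    """Hop-count distance between two cells of a 4-connected grid."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def elliptic_channel(nodes, source, dest, width):
    """Devices for which sourcecount + destcount <= sourcedest + width."""
    sourcedest = manhattan(source, dest)
    return {
        n for n in nodes
        if manhattan(n, source) + manhattan(n, dest) <= sourcedest + width
    }

# A 5x3 grid with source and destination at the ends of the middle row.
nodes = [(x, y) for x in range(5) for y in range(3)]
source, dest = (0, 1), (4, 1)
narrow = elliptic_channel(nodes, source, dest, width=0)
wide = elliptic_channel(nodes, source, dest, width=2)
```

With width 0 only the five devices of the middle row are selected, while width 2 already covers all fifteen devices, which shows how sensitive the selected area is to the choice of width.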

In order for the communication to be performed reliably, the width parameter has to be carefully tuned, depending also on the network characteristics. Thus, it is crucial to monitor the effectiveness of the choice, as performed by the following functions, where sumHood computes the sum of a numeric field φ, min computes the minimum of two numbers, and myID returns the identifier of the current device.

def samevalue(value, count) {
  let num,id = rep (1,myID()) { (num,id) =>
    // sum the partial estimates of the neighbours that selected this device
    sumHood(mux(nbr{id} == myID(), nbr{num}, 0))+1,
    2nd(minHood( mux(nbr{value} == value, nbr{[count,myID()]}, [infinity, myID()]) ))
  } in
  broadcast(num, if (id == myID()) {0} {count})
}
def monitor(sourcecount, destcount, minw, maxw) {
  let w = min(samevalue(sourcecount,destcount), samevalue(destcount,sourcecount)) in
  if (w > maxw) {HIGH} {if (w < minw) {LOW} {OK}}
}

Function samevalue computes the number of devices holding the same value for value in devices with the lowest possible count: every device collects partial estimates num from neighbours who selected it in id, and selects in id the neighbour with the same value and lowest possible count. The num computed by the device with the lowest possible count is then broadcast to the others (since devices with the lowest possible count select themselves as id). The monitor then uses function samevalue to estimate the cross-section from the points of view of both the source and the destination, considering the minimum of the two: a status is finally returned depending on whether this estimate falls above, below or within the required interval.

This monitor, if run within the area selected by elliptic-channel, can estimate whether the channel is properly established. Furthermore, it can be instrumented within the channel function to obtain an auto-adjusting channel as in the following.

def adjusting-channel(value, source, dest, minw, maxw) {
  let sourcecount = hopcount(source) in
  let destcount   = hopcount(dest) in
  let inarea = 1st(rep (false, maxw) { (oarea, owidth) =>
    let narea = elliptic-channel(sourcecount, destcount, owidth) in
    let status = if (narea) {monitor(sourcecount, destcount, minw, maxw)} {OK} in
    narea, if (status == OK) {owidth} {if (status == LOW) {owidth+1} {owidth-1}}
  }) in
  if (inarea) { broadcast(value, sourcecount) } { value }
}

This function increases or decreases the width by 1 according to the status returned by the monitor. Furthermore, it does so independently in every device of the network, allowing the shape of the channel to adjust to the local peculiarities of the network (instead of the fixed elliptical shape of the traditional channel).
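The feedback loop on a single device can be sketched in Python as follows. The status classification and the width update mirror the code above, whereas `toy_cross_section` is a purely hypothetical model (cross-section growing linearly with the width) standing in for the estimate that samevalue would compute on a real network.

```python
def monitor_status(w, minw, maxw):
    """Classify a cross-section estimate w against the interval [minw, maxw]."""
    if w > maxw:
        return "HIGH"
    if w < minw:
        return "LOW"
    return "OK"

def adjust(owidth, status):
    """Width update performed in each round of the rep construct."""
    if status == "OK":
        return owidth
    return owidth + 1 if status == "LOW" else owidth - 1

def toy_cross_section(width):
    """Hypothetical network model: the cross-section grows with the width."""
    return 2 * width + 1

def run(width, minw, maxw, rounds):
    """Iterate the monitor/adjust loop and record the width after each round."""
    history = []
    for _ in range(rounds):
        status = monitor_status(toy_cross_section(width), minw, maxw)
        width = adjust(width, status)
        history.append(width)
    return history

# The width starts at maxw, as in rep (false, maxw).
history = run(width=5, minw=3, maxw=5, rounds=5)
```

Under this toy model the width decreases from 5 to 2 and then stays there, since a width of 2 yields a cross-section of 5, which lies within the required interval.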

6 Conclusion

In this position paper we have illustrated, by means of simple examples, how the field calculus can be used to implement distributed monitors in different settings. In particular, we have provided examples of local and remote monitors, and an example of a field calculus program within which the monitor can be instrumented providing the algorithm with an additional auto-correcting power.

In future work we would like to investigate how field calculus expressions, e.g. using the -construct, could be used in conjunction with a specification language like LTL; and possibly be automatically generated by a logical language. This would allow us to write properties along the lines of “Eventually, all my neighbours…” or “Some neighbour will always …”.


This work has been partially supported by the European Union’s Horizon 2020 research and innovation programme under project COEMS (www.coems.eu, grant agreement no. 732016), project HyVar (www.hyvar-project.eu, grant agreement no. 644298) and ICT COST Action IC1402 ARVI (www.cost-arvi.eu). We thank the anonymous VORTEX 2018 reviewers for insightful comments and suggestions for improving the presentation.

