Static Analysis of Communicating Processes using Symbolic Transducers

We present a general model allowing static analysis based on abstract interpretation for systems of communicating processes. Our technique, inspired by Regular Model Checking, represents sets of program states as lattice automata and program semantics as symbolic transducers. This model can express dynamic creation/destruction of processes as well as communications. Using the abstract interpretation framework, we provide a sound over-approximation of the reachability set of the system, which allows us to prove safety properties. We implemented this method in a prototype that targets the MPI library for C programs.


1 Introduction

The static analysis of concurrent programs faces several well-known issues, among them the handling of dynamic process creation. This issue is particularly challenging because the state space of the concurrent system, which depends on the number and the types of the program's variables, may be neither known nor bounded statically.

In order to overcome this issue, we combine a symbolic representation based on regular languages (like the one used in Regular Model Checking [1]) with a fixed-point analysis based on abstract interpretation [5]. We define the abstract semantics of a concurrent program using a symbolic finite-state transducer [15]. A (classical) finite-state transducer T encodes a set of rules to rewrite words over a finite alphabet. If each process of a concurrent program only has a finite number of states, we can represent a set of states of the program by a language and its transition function by a transducer. However, this assumption does not hold here, since we consider processes with an infinite state space; we therefore represent a set of states of the concurrent program by a lattice automaton [9] and its transition function by a lattice transducer, a new kind of symbolic transducer that we define in this paper. Lattice automata recognize languages over an infinite alphabet. This infinite alphabet is an abstract domain (intervals, convex polyhedra, etc.) that abstracts process states.

We show, on Fig. 1 (detailed in Sec. 2), the kind of programs our method is able to analyse. This program generates an unbounded sequence of processes. We want to prove safety properties such as: a given predicate holds for every process when it reaches its final location. The negation of such a property is encoded as a lattice automaton (Fig. 2) that recognizes the language of all bad configurations. Our verification algorithm computes an over-approximation of the reachability set, also represented by a lattice automaton; by testing the emptiness of the intersection of the two languages, we are then able to prove the property.

1    if (id==0)
2      x := 1
3    else
4      receive(any_id,x);
5
6    create(next);
7    x := x+4;
8    send(next,x)

Figure 1: Program example

Figure 2: Bad configurations
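To make the example concrete, the following Python sketch (our own illustration; `run_chain` and its encoding are hypothetical, not part of the paper) simulates the chain of Fig. 1 for a bounded number of processes: process 0 sets x := 1, every process adds 4 (line 7) and passes the result to the process it created.

```python
def run_chain(n_processes):
    """Return the final value of x for each of the first n processes."""
    finals = []
    x = None  # value carried along the chain of send/receive pairs
    for pid in range(n_processes):
        if pid == 0:
            x = 1          # line 2: x := 1
        # else: x was received from the left neighbour (line 4)
        x = x + 4          # line 7: x := x + 4
        finals.append(x)   # value sent at line 8 / held at the final location
    return finals

print(run_chain(4))  # [5, 9, 13, 17]
```

Process k thus reaches its final location with x = 4k + 5, which is the kind of numerical invariant the analysis must infer for an unbounded number of processes.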

Related works. Many works aim at the static analysis of concurrent programs. Some of them use abstract interpretation, but they either do not allow dynamic process creation [13], use a different memory model [8], or do not consider numerical properties [7]. In [15], the authors defined symbolic transducers but did not apply them to the verification of concurrent programs. In [2], a similar kind of representation handles infinite-state systems, but it can only model finite-state processes. The authors of [4] present a modular static analysis framework targeting POSIX threads; their model allows dynamic thread creation but lacks communication between threads. More practically, [16] is a formal verification tool using a dynamic analysis based on model checking, aiming at the detection of deadlocks in Message Passing Interface [14] (MPI) programs; however, this analysis is not sound and does not compute the values of the variables.

Contributions. In this article, we define an expressive concurrency language with communication primitives and dynamic process creation. We introduce its concrete semantics in terms of symbolic rewriting rules. Then, we show how to abstract multi-process program states as a lattice automaton, and how to abstract our semantics into a new kind of symbolic transducer together with specific rules. We also give application algorithms defining a global transition function and prove their soundness. A fixpoint computation yields the reachability set. Finally, in order to validate the approach, we implemented a prototype as a Frama-C [11] plug-in that targets a subset of MPI, using the Apron abstract domain library [10].

Outline. In Sec. 2, we present the concurrent language and the definition of its semantics, encoded by rewriting rules and a symbolic transducer. Sec. 3 then presents the abstract semantics and the algorithms used to compute the over-approximation of the reachability set of a program. In Sec. 4, we detail the implementation of our prototype targeting a subset of MPI, which is mapped to the given semantics, and we run it on some examples (Sec. 5). We discuss the potential of our method and future work in Sec. 6.

2 Programming language and its Concrete Semantics

We present a small imperative language augmented with communication primitives, such as unicast and multicast communications, and dynamic process creation. These primitives are the core of many parallel programming languages and libraries, such as MPI.

2.1 Language definition

In our model, memory is distributed: each process executes the same code, with its own set of variables. For the sake of clarity, all variables and expressions have the same type (integer), and we omit the declaration of the variables. Process identifiers are also integers.

<program> ::= <instrs>

<instrs> ::= <instr> ';' <instrs> | <instr>

<id> ::= <expr> | any_id

<instr> ::= '{' <instrs> '}'
          | <ident> ':=' <expr>
          | if '('<expr>')' <instr> [else <instr>]
          | while '('<expr>')' <instr>
          | create '(' <ident> ')'
          | send '(' <id> ',' <ident> ')'
          | receive '(' <id> ',' <ident> ')'
          | broadcast '(' <expr> ',' <ident> ')'

<ident> and <expr> stand for classical identifiers and arithmetic expressions on integers (as defined in the C language).

Communications are synchronous: a process with id=orig cannot execute the instruction send(dest, var) unless a process with id=dest is ready to execute the instruction receive(orig, var'); both processes then execute their instruction and the value of var (of process orig) is copied to variable var' (of process dest). We also allow unconditional receptions: receive(any_id, var) means that a process can receive a value whenever another process is ready to execute a matching send. A broadcast(orig, var) instruction cannot be executed until all processes reach the same instruction. create(var) dynamically creates a new process that starts its execution at the program entry point. The id of the new process, which is fresh, is stored in var, so the current process can communicate with the newly created process. All other instructions are asynchronous; assignments, conditionals and loops keep their usual meaning.
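The rendezvous condition above can be sketched in Python (our own encoding: a process is a dict with 'id', 'loc' and 'vars' fields, and ANY_ID plays the role of the language's any_id wildcard; all names are hypothetical).

```python
ANY_ID = object()  # stands for the any_id wildcard of the language

def try_rendezvous(sender, receiver, var_out, var_in, dest, orig):
    """Fire send(dest, var_out) / receive(orig, var_in) if the ids match."""
    dest_ok = dest == receiver['id']
    orig_ok = orig is ANY_ID or orig == sender['id']
    if dest_ok and orig_ok:
        receiver['vars'][var_in] = sender['vars'][var_out]  # copy the value
        return True
    return False

p0 = {'id': 0, 'loc': 8, 'vars': {'x': 5}}
p1 = {'id': 1, 'loc': 4, 'vars': {'x': 0}}
# p0 executes send(1, x) while p1 executes receive(any_id, x):
assert try_rendezvous(p0, p1, 'x', 'x', dest=1, orig=ANY_ID)
print(p1['vars']['x'])  # 5
```

Both instructions fire in one atomic step, which is exactly what the synchronous semantics requires.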

2.2 Formal Semantics

We model our program using an unbounded set of processes, ordered by their identifiers. As usual, the control flow graph (CFG) of the program is a graph whose vertices belong to a set of program points and whose edges are labelled by the instructions defined in our language. Finally, we have a set of variables whose domain of values is the integers. For any expression of our language and any valuation of the variables, we note its value.

Our processes share the same code and have distributed memory: each variable is local to each process. Thus, a local state records the identifier of the process, its current location and the value of each local variable.

A global state is defined as a word of process local states (an element of the free monoid on local states), whose length is the number of running processes.

The semantics is given as a transition system whose initial states are all the possible initial program states. As the code is shared, every process starts at the same location and every variable is initialised to 0; an initial state is thus the word of the initial local states, one letter per initially running process. The transition relation is defined as:

  • for each local instruction (e.g. assignments, conditionals, and loops) , we have:

    is the classical small-step semantics of action

  • for every pair of send/receive instructions of two processes :
    and (or ), we have:

    when ( or ) and ( or )

  • for each broadcast instruction

  • finally, for each create instruction

In the following, we directly consider sets and defined as:

is the reflexive and transitive closure of the transition relation. Given an initial set of states, the reachability set contains all states that can occur during an execution of the program. Assuming we want to check whether the program satisfies a safety property, expressed as a set of bad configurations that must be avoided, the verification algorithm is simply to test whether the reachability set intersects this set; if the intersection is empty, the program is safe.

Therefore, we would like to define the transition relation in a more operational way, as a set of rewriting rules that can be applied to a set of states, so that we can apply those rules iteratively until we reach the fixpoint.

2.3 Symbolic Rewriting Rules

Let us consider a local instruction ; for any set of states :

The effect of a local instruction on a set of states is to rewrite every word of this set. Thus, we would like to express it as a rewriting rule, consisting of a symbolic guard matching a set of words and a symbolic rewriting function. Since our method uses the framework of abstract interpretation (see Sec. 3), symbolic means that we consider elements of some lattice to define the rules. We give the rewriting rule that encodes the execution of a local instruction:

The guard matches words composed of any number of processes, then one process at the instruction's source location, then again any number of processes. The identity parts of the rewriting function mean that the processes matched by the contexts are rewritten as themselves and therefore not modified. The middle function, operating on the lattice of sets of local states, rewrites a set of local states according to the semantics of the instruction. So every word that matches the guard is rewritten, and we obtain the image of the set.

We now give the general definition of those rewriting rules and how to apply them. Recall that the partial order of the lattice can be extended letter-wise to words: two words are comparable only if they have the same length. Note that we do not allow ⊥ in words: any word that would contain one or more ⊥ letters is identified with the smallest element. Therefore, any word of lattice elements represents a set of words of atoms.
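As a sketch of this letter-wise order, here is a minimal Python encoding of our own (intervals as (lo, hi) pairs, BOT for ⊥; names are hypothetical) where a word containing ⊥ collapses to the smallest element:

```python
BOT = None  # bottom element of the lattice

def leq(a, b):
    """Interval order a ⊑ b."""
    if a is BOT:
        return True
    if b is BOT:
        return False
    return b[0] <= a[0] and a[1] <= b[1]

def word_leq(u, v):
    """Letter-wise order on words; a word containing ⊥ is ⊥ itself."""
    if any(a is BOT for a in u):
        return True           # u collapses to the smallest element
    if any(b is BOT for b in v):
        return False          # v collapses to ⊥, and u is not ⊥
    return len(u) == len(v) and all(leq(a, b) for a, b in zip(u, v))

assert word_leq([(1, 2), (0, 0)], [(0, 5), (0, 3)])
assert not word_leq([(1, 2)], [(0, 5), (0, 3)])   # different lengths
assert word_leq([(1, 2), BOT], [(9, 9)])          # a ⊥-word is below anything
```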

Definition 1

Let be a lattice. A rewriting rule over is given by two sequences and such that:

  • and ;

  • ;
    We note

  • , ;

  • , .

With this rule, a finite word is rewritten to if:

  • can be written as a concatenation with:

    • and ,

    • ;

  • with:

    • and ,

    • .

For any , is defined as . Moreover, we denote by the element of defined as , (the symbol ’’ matches anything). With these notations, we can express the transition relation by a set of rewriting rules:

  • For every pair of send/receive instructions
    and , we have the rule:

    and symmetrically when is located before in the word of local states. When e.g. , the condition is satisfied for any .

  • for each broadcast instruction , we have the rule:

    The guard stands for the set

  • finally, for each create instruction , we have the rule:

where fresh_id returns a new identifier that is not already used by any process in the word of local states.

Example 1

Let us consider our running example depicted on Fig. 1. Let us assume we have a set of program states, i.e. there is either one process, or three processes, or two processes, at the indicated locations. We consider the symbolic rewriting rule that results from the communication instructions, given by its guard and its rewriting functions.

then , which is the image of the state with three active processes. There is no possible communication when . Even if the locations match the guard, the first process can only send messages to a process with .
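A word-level sketch of rule application may help here (our own simplification, ignoring the automaton representation: letters are concrete (location, x) pairs, and `apply_rule`, `guard` and `step` are hypothetical names). The rule below fires the local instruction x := x + 4 at location 7 of Fig. 1:

```python
def apply_rule(word, guard, rewrite):
    """Return all rewritings of `word` where one letter satisfies `guard`;
    the context letters are kept unchanged (identity)."""
    results = []
    for i, letter in enumerate(word):
        if guard(letter):
            results.append(word[:i] + [rewrite(letter)] + word[i+1:])
    return results

# Letters are (location, x) pairs; the rule fires at location 7 and
# performs x := x + 4 (line 7 of Fig. 1), moving to location 8.
guard = lambda s: s[0] == 7
step  = lambda s: (8, s[1] + 4)

print(apply_rule([(8, 5), (7, 5), (7, 9)], guard, step))
# [[(8, 5), (8, 9), (7, 9)], [(8, 5), (7, 5), (8, 13)]]
```

Since the match is non-deterministic, one input word can yield several successor words, one per matched position.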

2.3.1 Transducers

Alternatively, the semantics of local instructions can also be described by a lattice transducer. A finite-state transducer is a finite-state automaton that not only accepts a language but also rewrites it. A lattice transducer is similar to a finite-state transducer; however, it is symbolic, i.e. it accepts inputs (and produces outputs) belonging to the lattice, which may be an infinite set.

Definition 2

A Lattice Transducer is a tuple where:

  • is a lattice

  • is a finite set of states

  • is the set of initial states

  • is the set of final states

  • is a finite set of transitions with guards and rewriting functions

Let and with and . We write when:

For any word , is the set of words such that there exists a sequence with , , and . For any language , .

We can express the semantics of the local instructions by a transducer as shown in Fig. 3.

Figure 3: Local transitions Transducer

Figure 4: “Neighbour” communication

For the language we presented, the transducer representation is not fully exploited: only single self-looping transitions are present. Yet, in our example program, we notice that communications and dynamic creation happen in a “neighbourhood”: processes send values to their right neighbour, receive from the left one, and create processes on their right side. This semantics can be expressed with our transducer representation. We give on Fig. 4 a transducer encoding a “neighbour” version of synchronous communications as send_right and receive_left primitives. In our illustration, we use distinguished locations to represent the pre and post locations of the send_right and receive_left instructions. However, this restriction is not satisfying: we wish to handle point-to-point communications regardless of the positions of processes in words of states. Thus we limit the transducer to encoding only local transitions.

Therefore, communications are encoded by semantic rules and local instructions by a transducer. We note the transducer extended with the semantic rules, which can be applied to any language of states. For any initial set of states, the reachability set is obtained by iterating this extended transducer. However, this set cannot be computed in general, so we need abstractions.
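A minimal word-level transducer sketch, in our own hypothetical encoding (transitions are (source, guard, rewrite, target) tuples over concrete (location, x) letters), illustrates how a run rewrites exactly one process: an identity self-loop before the rewritten letter, one firing transition, and an identity self-loop after it.

```python
def transduce(word, transitions, init, final):
    """Return the set of output words over all accepting runs."""
    runs = {(init, ())}                      # (current state, output so far)
    for letter in word:
        next_runs = set()
        for state, out in runs:
            for (src, guard, rw, dst) in transitions:
                if src == state and guard(letter):
                    next_runs.add((dst, out + (rw(letter),)))
        runs = next_runs
    return {out for state, out in runs if state in final}

# Fire x := x + 4 (location 7 -> 8) on exactly one process of the word:
T = [('q0', lambda s: True,      lambda s: s,             'q0'),  # identity
     ('q0', lambda s: s[0] == 7, lambda s: (8, s[1] + 4), 'q1'),  # fire once
     ('q1', lambda s: True,      lambda s: s,             'q1')]  # identity

print(sorted(transduce([(7, 1), (7, 0)], T, 'q0', {'q1'})))
# [((7, 1), (8, 4)), ((8, 5), (7, 0))]
```

Each accepted run yields one interleaved successor, matching the small-step semantics of a local instruction.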

3 Abstract Semantics

3.1 Lattice Automata

We give here an overview of lattice automata; the reader may refer to [9] for further details. As said before, the definition of lattice automata requires the lattice to be atomistic, i.e.:

  • is the set of atoms; is an atom if

  • is atomic, i.e. :

  • any element is equal to the least upper bound of the atoms smaller than itself:

The languages recognized by lattice automata are defined over the set of atoms rather than over the lattice itself. The reason for this is that there may be different edges between the same two nodes. For example, let us consider the lattice of intervals and the three automata depicted on Fig. 5: intuitively, they represent the same set, but a naive definition of the language as a set of label sequences distinguishes them. If we define the language on atoms, the first two automata recognize the same language (assuming we only consider integer bounds), and we can also merge transitions to obtain the third automaton, which recognizes the same language.

(a)

(b)

(c)
Figure 5: Three equivalent lattice automata
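The equivalence of Fig. 5 can be checked concretely over the interval lattice with integer bounds, where the atoms are the singletons (a small illustration of our own; `atoms` is a hypothetical name):

```python
def atoms(iv):
    """Atoms (integer singletons) below an interval (lo, hi)."""
    return {(k, k) for k in range(iv[0], iv[1] + 1)}

# Splitting a transition labelled [0,5] into [0,2] and [3,5] leaves the
# set of recognized atoms, hence the language, unchanged:
assert atoms((0, 5)) == atoms((0, 2)) | atoms((3, 5))
print(len(atoms((0, 5))))  # 6
```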

Thus the definition of the language allows us to split or merge transitions as long as the language remains the same. But if an interval may be split into an infinite number of smaller intervals, how can we ensure that there is only a finite number of transitions? We introduce an arbitrary, finite partition of the atoms, which may be defined as a function from the atoms to an arbitrary finite set of classes.

We define Partitioned Lattice Automata (PLAs) as automata such that, for any transition, all the atoms smaller than its label belong to the same partition class. A PLA is merged if there is at most one transition per element of the partition, so merged PLAs have a finite number of transitions. Moreover, we can use this partition to design algorithms similar to the ones for finite-state automata (such as union, intersection, determinisation and minimisation), with the partition classes playing the role of a finite alphabet. Indeed, given a merged PLA, we can apply the partition function to every transition label and obtain a finite-state automaton. Normalised PLAs are merged PLAs that are also deterministic and minimised.
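The merging step can be sketched as follows (our own toy encoding with a hypothetical three-class partition of the integer atoms: negative, zero, positive; in a PLA every label lies inside a single class):

```python
def pi(iv):
    """Partition class of an interval whose atoms lie in one class."""
    if iv[1] < 0:
        return 'neg'
    if iv[0] > 0:
        return 'pos'
    return 'zero'   # in a PLA, only the singleton [0,0] falls here

def join(a, b):
    """Least upper bound of two intervals (convex hull)."""
    return (min(a[0], b[0]), max(a[1], b[1]))

def merge(transitions):
    """Join transitions sharing (source, partition class, target)."""
    merged = {}
    for (src, label, dst) in transitions:
        key = (src, pi(label), dst)
        merged[key] = join(merged[key], label) if key in merged else label
    return [(s, lab, d) for (s, _, d), lab in merged.items()]

trans = [('q0', (1, 2), 'q1'), ('q0', (4, 5), 'q1'), ('q0', (-3, -1), 'q1')]
print(merge(trans))  # [('q0', (1, 5), 'q1'), ('q0', (-3, -1), 'q1')]
```

Joining the labels may over-approximate (here (1,2) and (4,5) become (1,5)), which is sound in the abstract interpretation setting.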

If we have a widening operator on finite-state automata and a widening operator on the lattice, then we have a widening operator on lattice automata:

  • if and are isomorphic, then we apply on pairs of isomorphic transitions

  • otherwise we compute and then merge transitions accordingly.

Therefore, lattice automata are a convenient way to “lift” a numerical domain to an abstract domain for languages over it, and to extend static analyses of sequential programs to concurrent programs. They can also easily handle disjunctive local invariants, simply represented by two parallel transitions. Moreover, the whole reachability set is represented by a single automaton, which is both a blessing and a curse: it provides a concise, graphical way to represent the reachability set, but it also means that, when computing a fixpoint by iteration, we compute an increasing sequence of (increasingly large) automata. When applying the transition function to the current automaton, we should avoid recomputing the image of the previous one (either by using a cache or by having a way to apply it only to the ‘increment’).
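On labels, the lattice widening is the standard one for intervals; here is a sketch (our own, not the paper's operator) showing how iterating the abstract effect of x := x + 4 stabilises:

```python
import math

def widen(a, b):
    """a ∇ b on intervals: keep stable bounds, push unstable ones to ±∞."""
    lo = a[0] if a[0] <= b[0] else -math.inf
    hi = a[1] if a[1] >= b[1] else math.inf
    return (lo, hi)

x = (5, 5)
for _ in range(3):                     # abstract iterations of x := x + 4
    step = (x[0] + 4, x[1] + 4)
    x = widen(x, (min(x[0], step[0]), max(x[1], step[1])))
print(x)  # (5, inf)
```

After one widening step the upper bound jumps to infinity and the sequence is stationary, which is what guarantees termination of the fixpoint computation.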

3.2 Lattice Automata as an abstract domain

Since the alphabet may be an infinite set, we must have a way to abstract languages over it. Lattice automata [9] provide this kind of abstraction. They are similar to finite-state automata, but their transitions are labeled by elements of a lattice. In our case, lattice automata are appropriate because:

  • they provide a finite representation of languages over an infinite alphabet;

  • we can apply symbolic rewriting rules or a transducer to a lattice automaton (see Sec. 3.3);

  • there is a widening operator that ensures the termination of the analysis (see Sec. 3.4).

Definition 3

A lattice automaton is defined by a tuple where:

  • is an atomistic lattice (see [9] or Sec. 3.1), whose order is denoted as before;

  • is a finite set of states;

  • and are the sets of initial and final states;

  • is a finite transition relation (no transition is labeled by ⊥).

This definition requires the lattice to have a set of atoms. Abstract lattices like intervals [5], octagons [12] and convex polyhedra [6] are atomistic, so we can easily find such lattices for our static analysis. Note that atomisticity is preserved by the lattice constructions we use, with the expected atoms. Moreover, for any set, the powerset lattice is atomistic and its atoms are the singletons. In the remainder of this paper, we assume that any lattice we consider is atomistic. Finally, in addition to a widening operator, lattice automata support the classic FSA operations (union, intersection, determinisation, etc.).

The language recognized by a lattice automaton is defined as a set of finite words over the alphabet of atoms: a word is accepted if there is a sequence of states and transitions from an initial state to a final state whose labels are greater than the corresponding letters.

The reasons why we define the language recognized by a lattice automaton as a set of sequences of atoms are discussed in [9]; in a nutshell, this definition implies that two lattice automata that have the same concretisation recognize the same language. Moreover, by introducing a finite partition of the atoms, we can define determinisation and minimisation algorithms similar to the ones for finite-state automata, as well as a canonical form (normalised lattice automata).

Abstractions and Concretisations

Assuming there is a Galois connection between the concrete domain and the lattice, we can extend the concretisation function to words and then to languages. Thus, the concretisation of a lattice automaton can be computed by applying the concretisation to all of its labels. Lattice automata do not form a complete lattice; the abstraction function is defined as follows: if the language is regular (i.e. it can be represented by a lattice automaton with abstract labels), we apply the abstraction to each edge; otherwise, the abstraction is the top element. The latter case does not happen in practice, since the initial set of states is regular and we only check regular properties. We now present algorithms to apply a symbolic rewriting rule or a lattice transducer to a lattice automaton.

3.3 Algorithms

3.3.1 Application of a Rule

To apply a symbolic rewriting rule to the language recognized by a lattice automaton, we must first identify the subset of words that match the guard. It is easier to first look for sequences of transitions in the automaton that match the middle part of the guard: such a sequence begins in some state and ends in some other state. Then, we identify the sub-automata that could match the contexts, i.e. all the states that are reachable from an initial state and co-reachable from the matched sequence by considering only transitions whose labels are compatible with the guard. Once each part is identified, we apply the rewriting function to each part and obtain a new automaton. Since this pattern matching is non-deterministic, we have to consider all possible matching sequences; the result of the algorithm is the union of the automata constructed in this way.

We introduce some notations before writing the algorithm. Let and let be a lattice automaton. We denote by the set of matching sequences:

Let be a pair of states of a lattice automaton and let . We denote by the sub-automaton . For a lattice automaton and a function , we denote by the automaton where .

With those notations, we give an algorithm to apply a rewriting rule on a lattice automaton:

ApplyRule( and ):
Result 
For all matching sequences
, , ,
for each initial state  and for each final state 
 Let ,