CARET analysis of multithreaded programs

09/20/2017 ∙ by Huu-Vu Nguyen, et al.

Dynamic Pushdown Networks (DPNs) are a natural model for multithreaded programs with (recursive) procedure calls and thread creation. CARET, in turn, is a temporal logic that allows one to write linear temporal formulas while taking into account the matching between calls and returns. In this paper, we consider the model-checking problem of DPNs against CARET formulas. We show that this problem can be effectively solved by a reduction to the emptiness problem of Büchi Dynamic Pushdown Systems. We then show that CARET model checking is also decidable for DPNs communicating with locks. Our results can, in particular, be used for the detection of concurrent malware.


1 Introduction

Pushdown Systems (PDSs) are known to be a natural model for sequential programs [18]. Therefore, networks of pushdown systems are a natural model for concurrent programs where each PDS represents a sequential component of the system. In this context, Dynamic Pushdown Networks (DPNs) [6] were introduced by Bouajjani et al. as a natural model of multithreaded programs with procedure calls and thread creation. Intuitively, a DPN is a network of pushdown processes where each process, represented by a Pushdown System (PDS), can perform basic pushdown actions, call procedures, and spawn new instances of pushdown processes. Much previous research has investigated automated methods to verify DPNs. The reachability analysis of DPNs is considered in [6, 15, 14, 9]. The model-checking problem for DPNs against double-indexed properties, i.e., properties where the satisfiability of an atomic proposition depends on the control states of two or more threads, is undecidable [10]. In contrast, it is decidable to model-check DPNs against the linear temporal logic (LTL) and the computation tree logic (CTL) with single-indexed properties [19], i.e., properties where the satisfiability of an atomic proposition depends on the control states of only one thread.

CARET is a temporal logic of calls and returns [1]. This logic allows us to write linear temporal formulas while taking into account the matching between calls and returns. CARET is needed to describe several important properties such as malicious behaviors or API usage rules. Thus, to be able to analyse such properties for multithreaded programs, we need to be able to check CARET formulas for DPNs. We tackle this problem in this paper. As LTL is a subclass of CARET, CARET model-checking for DPNs with double-indexed properties is also undecidable. Thus, in this paper, we consider the model-checking problem for DPNs against single-indexed CARET formulas and show that it is decidable. A single-indexed CARET formula is a formula of the form $f = \bigwedge_{i=1}^{n} f_i$ where $f_i$ is a CARET formula over a certain PDS $\mathcal{P}_i$. A DPN satisfies $f$ iff all instances of the PDS $\mathcal{P}_i$ created in the network satisfy the subformula $f_i$.

The model-checking problem of DPNs against single-indexed CARET formulas is non-trivial because the number of instances of pushdown processes in DPNs can be unbounded. It is not sufficient to check whether every PDS $\mathcal{P}_i$ satisfies the corresponding formula $f_i$. Indeed, we need to ensure that all instances of $\mathcal{P}_i$ created during a run of the DPN satisfy the formula $f_i$. Nor is it correct to check whether all possible instances of $\mathcal{P}_i$ satisfy the formula $f_i$: an instance of $\mathcal{P}_i$ should not be checked if it is not created during the run of the DPN. In this paper, we solve these problems. We show that single-indexed CARET model checking is decidable for DPNs. To this end, we reduce the problem of checking whether Dynamic Pushdown Networks satisfy single-indexed CARET formulas to the membership problem for Büchi Dynamic Pushdown Networks (BDPNs). Finally, we show that single-indexed CARET model checking is decidable for Dynamic Pushdown Networks communicating via nested locks.

Related work.

[5, 7, 2, 3] considered Pushdown networks with communications between processes. However, these works consider only networks with a fixed number of threads. The model-checking problem for pushdown networks where synchronization between threads is ensured by a set of nested locks is considered in [12, 10, 11] for single-indexed LTL/CTL and double-indexed LTL. These works do not handle dynamic thread creation.

Multi-pushdown systems were considered in [13, 4] to represent multithreaded programs. These systems have only a finite number of stacks, and thus, they cannot handle dynamic thread creation.

Pushdown Networks with dynamic thread creation (DPNs) were introduced in [6]. The reachability problems of DPNs and their extensions are considered in [6, 9, 14, 15, 21]. [19] considers the model-checking problem of DPNs against single-indexed LTL and CTL, while [20] investigates the single-indexed LTL model-checking problem for DPNs with locks.

[17, 16] consider CARET model checking for pushdown systems and its application to malware detection. These works can only handle sequential programs. In this paper, we go one step further and extend these works [17, 16] to DPNs and concurrent programs.

2 Linear Temporal Logic of Calls and Returns - CARET

In this section, we recall the definition of CARET [1]. A CARET formula is interpreted on an infinite path where each state on the path is associated with a tag in the set $\{call, ret, int\}$. A call-state denotes an invocation of a procedure of a program, while the corresponding ret-state denotes the ret statement of that procedure. A simple statement (neither a call nor a ret statement) is called an internal statement and its associated state is called an int-state.

Let $\pi = s_0 s_1 s_2 \ldots$ be an infinite path where each state on the path is associated with a tag in the set $\{call, ret, int\}$. Over $\pi$, three kinds of successors are defined for every position $s_i$:

  • global-successor: The global-successor of $s_i$ is $s_{i+1}$.

  • abstract-successor: The abstract-successor of $s_i$ is determined by its associated tag.

    • If $s_i$ is a call, the abstract successor of $s_i$ is the matching return point.

    • If $s_i$ is an int, the abstract successor of $s_i$ is $s_{i+1}$.

    • If $s_i$ is a ret, the abstract successor of $s_i$ is defined as $\bot$.

  • caller-successor: The caller-successor of $s_i$ is the innermost unmatched call if there is such a call. Otherwise, it is defined as $\bot$.

A global-path is obtained by repeatedly applying the global-successor operator. Similarly, an abstract-path or a caller-path is obtained by repeatedly applying the abstract-successor or the caller-successor, respectively.
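The three successor relations can be computed mechanically on a finite prefix of a tagged path. The following Python sketch is illustrative only: the function name `successors` and the use of `None` for the undefined successor are hypothetical choices, not notation from the paper. It also assumes that a state carries the tag of the statement executed from it, so the return point of a call is taken to be the position right after its matching ret-state.

```python
from typing import List, Optional, Tuple

Succ = List[Optional[int]]


def successors(tags: List[str]) -> Tuple[Succ, Succ, Succ]:
    """Compute global, abstract and caller successors on a finite prefix.

    tags[i] is one of "call", "ret", "int". None means the successor is
    undefined, or cannot be determined inside this finite prefix.
    """
    n = len(tags)
    glob: Succ = [i + 1 if i + 1 < n else None for i in range(n)]
    abst: Succ = [None] * n
    caller: Succ = [None] * n
    pending: List[int] = []  # positions of calls not yet matched by a ret

    for i, tag in enumerate(tags):
        # The caller of position i is the innermost call still pending.
        caller[i] = pending[-1] if pending else None
        if tag == "call":
            pending.append(i)
        elif tag == "int":
            abst[i] = glob[i]
        elif tag == "ret" and pending:
            matched_call = pending.pop()
            # Assumption: the return point is the position after the ret.
            abst[matched_call] = glob[i]
        # A ret-state itself has no abstract successor, so abst[i] stays None.
    return glob, abst, caller


# Example: an outer call (position 0) containing a nested call (position 2).
glob, abst, caller = successors(["call", "int", "call", "int", "ret", "ret", "int"])
# abst   == [6, 2, 5, 4, None, None, None]
# caller == [None, 0, 0, 2, 2, 0, None]
```

On an infinite path every position has a global successor; the trailing `None` values here only reflect that the prefix ends.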

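Formal Definition. Let $AP$ be a finite set of atomic propositions and let $AP' = AP \cup \{call, ret, int\}$. A CARET formula over $AP$ is defined as follows (where $e \in AP'$):

$$\psi := e \mid \psi \vee \psi \mid \neg\psi \mid X^g\psi \mid X^a\psi \mid X^c\psi \mid \psi\, U^a\, \psi \mid \psi\, U^g\, \psi \mid \psi\, U^c\, \psi$$

Let $\Sigma = 2^{AP} \times \{call, ret, int\}$. Let $\pi = \pi(0)\pi(1)\pi(2)\ldots$ be an $\omega$-word over $\Sigma$. Let $(\pi, i)$ be the suffix of $\pi$ starting from $\pi(i)$. Let $next_i^g$, $next_i^a$, $next_i^c$ be the global-successor, abstract-successor and caller-successor of $\pi(i)$ respectively. The satisfiability relation is defined inductively as follows:

  • $(\pi, i) \models e$, where $e \in AP'$, iff $\pi(i) = (Y, d)$ and $e \in Y$ or $e = d$

  • $(\pi, i) \models \psi_1 \vee \psi_2$ iff $(\pi, i) \models \psi_1$ or $(\pi, i) \models \psi_2$

  • $(\pi, i) \models \neg\psi$ iff $(\pi, i) \not\models \psi$

  • $(\pi, i) \models X^g\psi$ iff $(\pi, next_i^g) \models \psi$

  • $(\pi, i) \models X^a\psi$ iff $next_i^a \neq \bot$ and $(\pi, next_i^a) \models \psi$

  • $(\pi, i) \models X^c\psi$ iff $next_i^c \neq \bot$ and $(\pi, next_i^c) \models \psi$

  • $(\pi, i) \models \psi_1\, U^b\, \psi_2$ (with $b \in \{g, a, c\}$) iff there exists a sequence of positions $h_0, h_1, \ldots, h_k$ where $h_0 = i$, for every $0 \le j \le k-1$: $h_{j+1} = next_{h_j}^b$ and $(\pi, h_j) \models \psi_1$, and $(\pi, h_k) \models \psi_2$

Then, $\pi \models \psi$ iff $(\pi, 0) \models \psi$. Other CARET operators can be expressed by the above operators: $F^g\psi = true\; U^g\, \psi$, $G^g\psi = \neg F^g \neg\psi$, $F^a\psi = true\; U^a\, \psi$,…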

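Closure. Let $\psi$ be a CARET formula over $AP$. The closure of $\psi$, denoted $Cl(\psi)$, is the smallest set that contains $\psi$, $call$, $ret$ and $int$ and satisfies the following properties:

  • if $\neg\psi' \in Cl(\psi)$, then $\psi' \in Cl(\psi)$

  • if $X^b\psi' \in Cl(\psi)$ (with $b \in \{g, a, c\}$), then $\psi' \in Cl(\psi)$

  • if $\psi_1 \vee \psi_2 \in Cl(\psi)$, then $\psi_1 \in Cl(\psi)$ and $\psi_2 \in Cl(\psi)$

  • if $\psi_1\, U^b\, \psi_2 \in Cl(\psi)$ (with $b \in \{g, a, c\}$), then $\psi_1 \in Cl(\psi)$, $\psi_2 \in Cl(\psi)$ and $X^b(\psi_1\, U^b\, \psi_2) \in Cl(\psi)$

  • if $\psi' \in Cl(\psi)$, and $\psi'$ is not in the form $\neg\psi''$, then $\neg\psi' \in Cl(\psi)$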
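The closure can be computed by a simple worklist saturation. The sketch below is a minimal illustration under assumed conventions that are not part of the paper: atomic propositions and tags are Python strings, and compound formulas are tuples `("not", f)`, `("or", f, g)`, `("X", b, f)` and `("U", b, f, g)` with `b` in `{"g", "a", "c"}`.

```python
def closure(psi):
    """Return the closure of a CARET formula as a set.

    Hypothetical encoding: atoms are strings; compound formulas are tuples
    ("not", f), ("or", f, g), ("X", b, f), ("U", b, f, g).
    """
    todo = [psi, "call", "ret", "int"]
    cl = set()
    while todo:
        f = todo.pop()
        if f in cl:
            continue
        cl.add(f)
        is_negation = isinstance(f, tuple) and f[0] == "not"
        if isinstance(f, tuple):
            op = f[0]
            if op == "not":
                todo.append(f[1])
            elif op == "or":
                todo.extend([f[1], f[2]])
            elif op == "X":
                todo.append(f[2])
            elif op == "U":
                _, b, left, right = f
                # The until rule also adds the one-step unfolding X^b(f).
                todo.extend([left, right, ("X", b, f)])
        if not is_negation:
            # Negations are added only for non-negated formulas, so the
            # saturation never builds double negations and terminates.
            todo.append(("not", f))
    return cl


# Example: psi = p U^a q.
psi = ("U", "a", "p", "q")
cl = closure(psi)
# cl contains psi, p, q, X^a psi, call, ret, int and their negations.
```

The size of the result is linear in the size of the formula, which is what keeps the set of atoms finite in the constructions that follow.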

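Atoms. A set $A \subseteq Cl(\psi)$ is an atom of $\psi$ if it satisfies the following properties:

  • $\forall \psi' \in Cl(\psi)$: $\psi' \in A$ iff $\neg\psi' \notin A$

  • $\forall \psi' \vee \psi'' \in Cl(\psi)$: $\psi' \vee \psi'' \in A$ iff $\psi' \in A$ or $\psi'' \in A$

  • $\forall \psi'\, U^b\, \psi'' \in Cl(\psi)$, where $b \in \{g, a, c\}$: $\psi'\, U^b\, \psi'' \in A$ iff $\psi'' \in A$ or ($\psi' \in A$ and $X^b(\psi'\, U^b\, \psi'') \in A$)

  • $A$ includes exactly one element of the set $\{call, ret, int\}$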

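Let $Atoms(\psi)$ be the set of atoms of $\psi$. Let $A$ and $A'$ be two atoms; we define the following predicates:

  • $AbsNext(A, A')$ iff for every $X^a\psi' \in Cl(\psi)$: $X^a\psi' \in A$ iff $\psi' \in A'$.

  • $GlNext(A, A')$ iff for every $X^g\psi' \in Cl(\psi)$: $X^g\psi' \in A$ iff $\psi' \in A'$.

  • $CallerNext(A, A')$ iff for every $X^c\psi' \in Cl(\psi)$: $X^c\psi' \in A$ iff $\psi' \in A'$.

We define $CallerFormulas(A)$ (resp. $AbsFormulas(A)$) to be a function which returns the caller-formulas (resp. abstract-formulas) in $A$. Formally:

  • $CallerFormulas(A) = \{X^c\psi' \mid X^c\psi' \in A\}$

  • $AbsFormulas(A) = \{X^a\psi' \mid X^a\psi' \in A\}$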

3 Dynamic Pushdown Networks (DPNs)

3.1 Definitions

Dynamic Pushdown Networks (DPNs) are a natural model for multithreaded programs [6]. To define CARET formulas over DPNs, we must extend this model to record whether a transition rule corresponds to a call, a ret or a simple statement (neither call nor ret).

Definition 1.

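A Dynamic Pushdown Network (DPN) $\mathcal{M}$ is a set $\{\mathcal{P}_1, \ldots, \mathcal{P}_n\}$ s.t. for every $1 \le i \le n$, $\mathcal{P}_i = (P_i, \Gamma_i, \Delta_i)$ is a Labelled Dynamic Pushdown System (DPDS), where $P_i$ is a finite set of control locations, $P_i \cap P_j = \emptyset$ for all $j \neq i$, $\Gamma_i$ is a finite stack alphabet, and $\Delta_i$ is a finite set of transition rules. Rules of $\Delta_i$ are of the following form, where $p, p_1 \in P_i$, $\gamma, \gamma_1, \gamma_2 \in \Gamma_i$, $\omega_1 \in \Gamma_i^*$, and $d$ is either $\square$ or some $p_s\omega_s \in \bigcup_{1 \le j \le n} P_j \times \Gamma_j^*$:

  • $(r_{call})$ $p\gamma \xrightarrow{call}_i p_1\gamma_1\gamma_2 \rhd d$

  • $(r_{ret})$ $p\gamma \xrightarrow{ret}_i p_1\varepsilon \rhd d$

  • $(r_{int})$ $p\gamma \xrightarrow{int}_i p_1\omega_1 \rhd d$

Intuitively, there are two kinds of transition rules depending on the nature of $d$. A rule with a suffix of the form $\rhd\, \square$ is a nonspawn rule (it does not spawn a new process), while a rule with a suffix $\rhd\, p_s\omega_s$ is a spawn rule (a new process is spawned). A nonspawn step describes pushdown operations of one single process in the network. Roughly speaking, a call statement is described by a rule of the form $p\gamma \xrightarrow{call}_i p_1\gamma_1\gamma_2 \rhd \square$. This rule usually models a procedure call where $\gamma$ is the control point of the program where the function call is made, $\gamma_1$ is the entry point of the called procedure $proc$, and $\gamma_2$ is the return point of the call; $p$ and $p_1$ can be used to encode various information, such as the return values of functions, shared data between procedures, etc. A return statement is modeled by a rule $p\gamma \xrightarrow{ret}_i p_1\varepsilon \rhd \square$, while a rule $p\gamma \xrightarrow{int}_i p_1\omega_1 \rhd \square$ is used to model a simple statement (neither a call nor a return). A spawn step additionally allows the creation of a new process. For instance, a rule of the form $p\gamma \xrightarrow{t}_i p_1\omega_1 \rhd p_s\omega_s$ where $t \in \{call, ret, int\}$ describes that a process at control location $p$ and having $\gamma$ on top of the stack can (1) change the control location to $p_1$ and modify the stack by replacing $\gamma$ with $\omega_1$ and also (2) create a new instance of a process ($\mathcal{P}_j$) starting at $p_s\omega_s$. Note that in this case, if $t$ is call, then $\omega_1$ is $\gamma_1\gamma_2$, and if $t$ is ret, then $\omega_1$ is $\varepsilon$.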

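A DPDS $\mathcal{P}_i$ can be seen as a Pushdown System (PDS) if there are no spawn rules in $\Delta_i$. Generally speaking, a DPN consists of a set of PDSs $\{\mathcal{P}_1, \ldots, \mathcal{P}_n\}$ running in parallel where each PDS can dynamically spawn new instances of PDSs in the set during the run. An initial local configuration of a newly created instance is called a Dynamically Created Local Initial Configuration (DCLIC). For every $1 \le i \le n$, let $\mathcal{D}_i$ be the set of DCLICs that can be created by the DPDS $\mathcal{P}_i$.

A local configuration of an instance of a DPDS $\mathcal{P}_i$ is a tuple $(p, \omega)$, written $p\omega$ for short, where $p \in P_i$ is the control location and $\omega \in \Gamma_i^*$ is the stack content. A global configuration of $\mathcal{M}$ is a multiset over $\bigcup_{1 \le i \le n} P_i \times \Gamma_i^*$, in which $p\omega \in P_i \times \Gamma_i^*$ is a local configuration of an instance of $\mathcal{P}_i$ which is running in parallel in the network $\mathcal{M}$.

A DPDS $\mathcal{P}_i$ defines a transition relation $\Rightarrow_i$ as follows: if $p\gamma \xrightarrow{t}_i p_1\omega_1 \rhd d$ then $p\gamma\omega \Rightarrow_i p_1\omega_1\omega \rhd D$ for every $\omega \in \Gamma_i^*$, where $D = \emptyset$ if $d = \square$, and $D = \{p_s\omega_s\}$ if $d = p_s\omega_s$. Let $\Rightarrow_i^*$ be the transitive and reflexive closure of $\Rightarrow_i$; then, for every $p\omega \in P_i \times \Gamma_i^*$:

  • $p\omega \Rightarrow_i^* p\omega \rhd \emptyset$

  • if $p\omega \Rightarrow_i^* p_1\omega_1 \rhd D_1$ and $p_1\omega_1 \Rightarrow_i p_2\omega_2 \rhd D_2$, then $p\omega \Rightarrow_i^* p_2\omega_2 \rhd D_1 \cup D_2$

A local run of an instance of a DPDS $\mathcal{P}_i$ starting at a local configuration $c_0$ is a sequence $c_0 c_1 \ldots$ s.t. for every $x \ge 0$, $c_x$ is a local configuration of $\mathcal{P}_i$ and $c_x \Rightarrow_i c_{x+1} \rhd D$ for some $D$. A global run $\rho$ of $\mathcal{M}$ from a global configuration $\mathcal{G}_0$ is a (possibly infinite) set of local runs where each local run describes the execution of one instance of a certain DPDS $\mathcal{P}_i$. Initially, $\rho$ consists of the local runs of the instances starting from $\mathcal{G}_0$; when a new instance is created, a new local run of this instance is added to $\rho$. For example, when a DCLIC $c$ is created by a certain local run of $\rho$, a new local run that starts at $c$ is added to $\rho$. Note that from a global configuration we can obtain a set of global runs, because from a local configuration we can have different local runs.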
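The step semantics described above can be sketched in a few lines of Python. This is only an illustration under assumptions of our own: the names `Rule`, `step_local` and `step_global` are hypothetical, a stack is a tuple whose first element is the top, and a global configuration is modelled as a `collections.Counter` (a multiset) of local configurations.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple

Config = Tuple[str, Tuple[str, ...]]  # (control location, stack with top first)


@dataclass(frozen=True)
class Rule:
    p: str                          # source control location
    gamma: str                      # symbol expected on top of the stack
    tag: str                        # "call", "ret" or "int"
    p1: str                         # target control location
    w1: Tuple[str, ...]             # word that replaces gamma
    spawn: Optional[Config] = None  # DCLIC of the spawned instance, if any


def step_local(cfg: Config, rule: Rule) -> Optional[Tuple[Config, List[Config]]]:
    """Apply one rule to a local configuration.

    Returns the successor configuration and the list of spawned DCLICs,
    or None when the rule is not enabled.
    """
    p, stack = cfg
    if p != rule.p or not stack or stack[0] != rule.gamma:
        return None
    successor = (rule.p1, rule.w1 + stack[1:])
    spawned = [rule.spawn] if rule.spawn is not None else []
    return successor, spawned


def step_global(g: Counter, cfg: Config, rule: Rule) -> Optional[Counter]:
    """Let one instance in the multiset g take a step with the given rule."""
    if g[cfg] == 0:
        return None
    result = step_local(cfg, rule)
    if result is None:
        return None
    successor, spawned = result
    h = Counter(g)
    h[cfg] -= 1
    if h[cfg] == 0:
        del h[cfg]
    h[successor] += 1
    for dclic in spawned:
        h[dclic] += 1  # the new instance starts running in parallel
    return h


# A call rule that also spawns a new instance starting at ("s", ("z",)).
rule = Rule("p", "a", "call", "q", ("b", "c"), spawn=("s", ("z",)))
g1 = step_global(Counter({("p", ("a", "d")): 2}), ("p", ("a", "d")), rule)
```

Using a multiset rather than a set matters here: two instances of the same DPDS may be in the same local configuration at the same time.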

3.2 Single-indexed CARET for DPNs

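Given a DPN $\mathcal{M} = \{\mathcal{P}_1, \ldots, \mathcal{P}_n\}$, a single-indexed CARET formula $f$ is a formula of the form $\bigwedge_{i=1}^{n} f_i$ s.t. for every $1 \le i \le n$, $f_i$ is a CARET formula in which the satisfiability of its atomic propositions depends only on the DPDS $\mathcal{P}_i$.

Given a set of atomic propositions $AP$, let $\lambda: \bigcup_{1 \le i \le n} P_i \to 2^{AP}$ be a labeling function that associates each control location with a set of atomic propositions.

Let $\pi = c_0 c_1 \ldots$ be a local run of the DPDS $\mathcal{P}_i$, where $c_x = p_x\omega_x$. We associate to each local configuration $c_x$ of $\pi$ a tag $t(c_x)$ in $\{call, ret, int\}$ as follows, where $d = \square$ or $d = p_s\omega_s$:

  • If $c_x \Rightarrow_i c_{x+1}$ corresponds to a transition rule $p\gamma \xrightarrow{t}_i p_1\omega_1 \rhd d$, then $t(c_x) = t$.

Then, we say that $\pi$ satisfies $f_i$ iff the $\omega$-word $(\lambda(p_0), t(c_0))(\lambda(p_1), t(c_1))\ldots$ satisfies $f_i$. A local configuration $c$ of $\mathcal{P}_i$ satisfies $f_i$ (denoted $c \models f_i$) iff there exists a local run $\pi$ starting from $c$ such that $\pi$ satisfies $f_i$. If $D$ is the set of DCLICs created during the run $\pi$, then we write $c \models_D f_i$. A DPN $\mathcal{M}$ satisfies a single-indexed CARET formula $f$ iff there exists a global run $\rho$ s.t. for every $1 \le i \le n$, each local run of $\mathcal{P}_i$ in $\rho$ satisfies the formula $f_i$.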

4 Applications

We show in this section how model-checking single-indexed CARET for DPNs is necessary for concurrent malware detection.

Malware detection is a major challenge today. Many malware programs are multithreaded and involve recursive procedures and dynamic thread creation. Therefore, DPNs can be used to model such programs. We show in what follows how single-indexed CARET for DPNs can describe malicious behaviors of concurrent malware.

More precisely, we show how this logic can specify email worms. To this aim, let us consider a typical email worm: the worm Bagle. Bagle is a multithreaded email worm. In the main thread, one of the first things the worm does is to register itself in the registry listing so that it is started at boot time. Then, it performs several actions to hide itself from users. After this, the malware creates one thread (named Thread2) that listens on port 6777 to receive different commands and also allows the attacker to upload a new file and execute it. This gives the attacker the ability to update the malware to new versions. In addition, the attacker can send a crafted byte sequence to this port to force the malware to kill itself and delete itself from the system. Thus, the attacker can remove the malware remotely. In the next step, the malware creates one more thread (named Thread3) which contacts a list of websites every 10 minutes to announce the infection of the current machine. The malware sends the port it is listening on as well as the IP address of the infected machine to these sites. At some point in the program, the malware spawns a further thread, named Thread4, which searches the local drives for valid email addresses. In this thread, for each email address found, the malware attaches itself to an email and sends it to this address.

Thus, Bagle is a multithreaded malware with dynamic thread creation, i.e., the main process can create threads to fulfill various tasks. DPNs are a good candidate to model Bagle since they allow dynamic thread creation. Let $\mathcal{M} = \{\mathcal{P}_1, \mathcal{P}_2, \mathcal{P}_3, \mathcal{P}_4\}$ be a model of Bagle where $\mathcal{P}_1$ is a PDS that represents the main process of the malware; $\mathcal{P}_2, \mathcal{P}_3, \mathcal{P}_4$ are PDSs that model the code segments corresponding to Thread2, Thread3, Thread4 respectively. Note that $\mathcal{P}_2, \mathcal{P}_3, \mathcal{P}_4$ are designed to execute specific tasks, while $\mathcal{P}_1$ is a main process able to dynamically create an arbitrary number of instances of $\mathcal{P}_2, \mathcal{P}_3, \mathcal{P}_4$ to fulfill tasks as needed.

We now show how the malicious behavior of the different threads can be described by a CARET formula. Let us start with the main process. The typical behaviour of this process is to add its own executable name to the registry listing so that it can be started at boot time. To do this, the malware needs to invoke the API function $GetModuleFileNameA$ with $0$ and $x$ as parameters. $GetModuleFileNameA$ will put the file name of its current executable at the memory address pointed to by $x$. After that, the malware calls the API function $RegSetValueExA$ with the same $x$ as parameter. $RegSetValueExA$ will use the file name stored at $x$ to add itself into the registry key listing. This malicious behaviour can be specified by CARET as follows:

$$f_1 = \bigvee_{x \in D} F^g\Big(call(GetModuleFileNameA) \wedge 0\,x\,\Gamma^* \wedge X^a F^g\big(call(RegSetValueExA) \wedge x\,\Gamma^*\big)\Big)$$

where the $\bigvee$ is taken over all possible memory addresses $x$ over the domain $D$.

Note that parameters are passed via the stack in binary programs. For succinctness, we use the regular variable expression $0\,x\,\Gamma^*$ (resp. $x\,\Gamma^*$) to describe the requirement that $0$ and $x$ (resp. $x$) are on top of the stack. Then, this formula states that there is a call to the API $GetModuleFileNameA$ with $0$ and $x$ on the top of the stack (i.e., with $0$ and $x$ as parameters), followed by a call to the API $RegSetValueExA$ with $x$ on the top of the stack. Using the operator $X^a$ guarantees that $RegSetValueExA$ is called after $GetModuleFileNameA$ terminates.

Similarly, the malicious behaviors of Threads 2, 3 and 4 can be described by CARET formulas $f_2$, $f_3$ and $f_4$ respectively.

Thus, the malicious behavior of the concurrent worm Bagle can be described by the single-indexed CARET formula $f = f_1 \wedge f_2 \wedge f_3 \wedge f_4$.
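To make the registry pattern of the main process concrete, the following Python sketch checks it on a finite execution trace. It is a rough illustration and not the model-checking procedure of this paper: CARET is interpreted over infinite runs of a pushdown model, whereas this code scans one recorded trace. The event format (dictionaries with `tag`, `api` and `stack` keys) and the function name `matches_registry_pattern` are hypothetical, and the return point of a call is assumed to be the position right after its matching ret.

```python
def matches_registry_pattern(trace):
    """Finite-trace approximation of the registry-registration pattern.

    Each event is a dict with keys:
      "tag":   "call", "ret" or "int"
      "api":   name of the called API for call events, else None
      "stack": tuple of values on top of the stack (top first)
    """
    # Match calls with rets to find the return point of every call.
    return_point = {}
    pending = []
    for i, event in enumerate(trace):
        if event["tag"] == "call":
            pending.append(i)
        elif event["tag"] == "ret" and pending:
            call_pos = pending.pop()
            if i + 1 < len(trace):
                return_point[call_pos] = i + 1

    for i, event in enumerate(trace):
        is_first_call = (
            event["tag"] == "call"
            and event["api"] == "GetModuleFileNameA"
            and len(event["stack"]) >= 2
            and event["stack"][0] == 0
        )
        if not is_first_call or i not in return_point:
            continue
        x = event["stack"][1]  # address that receives the file name
        # Only look at positions after GetModuleFileNameA has returned.
        for later in trace[return_point[i]:]:
            if (
                later["tag"] == "call"
                and later["api"] == "RegSetValueExA"
                and later["stack"][:1] == (x,)
            ):
                return True
    return False


suspicious = [
    {"tag": "call", "api": "GetModuleFileNameA", "stack": (0, "m1")},
    {"tag": "ret", "api": None, "stack": ()},
    {"tag": "int", "api": None, "stack": ()},
    {"tag": "call", "api": "RegSetValueExA", "stack": ("m1",)},
    {"tag": "ret", "api": None, "stack": ()},
]
print(matches_registry_pattern(suspicious))
```

Requiring the second call to occur at or after the return point of the first is what distinguishes this pattern from a plain "eventually" check over the global path.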

5 Single-indexed CARET model-checking for DPNs

In this section, we consider the CARET model-checking problem of DPNs. Let $\lambda$ be a labeling function that associates each control location with a set of atomic propositions. Let $\mathcal{M} = \{\mathcal{P}_1, \ldots, \mathcal{P}_n\}$ be a DPN and $f = \bigwedge_{i=1}^{n} f_i$ be a single-indexed CARET formula.

5.1 Büchi DPNs (BDPNs)

Definition 2.

A Büchi DPDS (BDPDS) is a tuple $\mathcal{BP} = (P, \Gamma, \Delta, F)$ s.t. $(P, \Gamma, \Delta)$ is a DPDS and $F \subseteq P$ is the set of accepting control locations. A run of a BDPDS is accepted iff it visits infinitely often some control locations in $F$.

Definition 3.

A Generalized Büchi DPDS (GBDPDS) is a tuple $\mathcal{BP} = (P, \Gamma, \Delta, \mathcal{F})$, where $(P, \Gamma, \Delta)$ is a DPDS and $\mathcal{F} = \{F_1, \ldots, F_k\}$ is a set of sets of accepting control locations. A run of a GBDPDS is accepted iff it visits infinitely often some control locations in $F_j$ for every $1 \le j \le k$.

Given a BDPDS or a GBDPDS $\mathcal{BP}$, let $c$ be a local configuration of $\mathcal{BP}$. Then, let $\mathcal{L}(\mathcal{BP})$ be the set of all pairs $(c, D)$ s.t. $\mathcal{BP}$ has an accepting run from $c$ and $D$ is the set of DCLICs generated during that run. We get the following property:

Proposition 1.

Given a GBDPDS $\mathcal{BP}$, we can effectively compute a BDPDS $\mathcal{BP}'$ s.t. $\mathcal{L}(\mathcal{BP}) = \mathcal{L}(\mathcal{BP}')$.

This result follows from the fact that a GBDPDS can be translated to a corresponding BDPDS by an approach similar to the translation from a generalized Büchi automaton to a corresponding Büchi automaton [8].
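For reference, the classical translation from a generalized Büchi automaton to a Büchi automaton can be sketched as follows. This is the textbook counter construction on ordinary finite automata, shown here only as an illustration; it is not the paper's construction on DPDSs, and the function name `degeneralize` and the dictionary-based automaton encoding are assumptions made for this example.

```python
def degeneralize(states, delta, init, acc_sets):
    """Counter construction: generalized Büchi to Büchi automaton.

    states:   iterable of states
    delta:    dict mapping a state to a set of (letter, successor) pairs
    init:     initial state
    acc_sets: list of accepting sets F_0, ..., F_{k-1}
    Returns (new_states, new_delta, new_init, new_accepting).
    """
    states = set(states)
    if not acc_sets:
        # With no accepting set, every infinite run is accepting.
        acc_sets = [states]
    k = len(acc_sets)

    new_states = {(q, i) for q in states for i in range(k)}
    new_delta = {}
    for q in states:
        for i in range(k):
            # The counter i waits for a visit to F_i; it advances when the
            # current state belongs to F_i and stays unchanged otherwise.
            j = (i + 1) % k if q in acc_sets[i] else i
            new_delta[(q, i)] = {(a, (q2, j)) for (a, q2) in delta.get(q, set())}

    # The counter returns to 0 infinitely often exactly when every F_i is
    # visited infinitely often, so F_0 with counter 0 is accepting.
    new_accepting = {(q, 0) for q in acc_sets[0]}
    return new_states, new_delta, (init, 0), new_accepting


states = {"s", "t"}
delta = {"s": {("a", "t")}, "t": {("b", "s")}}
result = degeneralize(states, delta, "s", [{"s"}, {"t"}])
```

The construction multiplies the number of states by the number of accepting sets, which is the only blow-up incurred by the translation.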

Definition 4.

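A Büchi Dynamic Pushdown Network (BDPN) is a set $\mathcal{BM} = \{\mathcal{BP}_1, \ldots, \mathcal{BP}_n\}$ s.t. for every $1 \le i \le n$, $\mathcal{BP}_i$ is a BDPDS. A (global) run $\rho$ of a BDPN is accepted iff all local runs in $\rho$ are accepting (local) runs.

Definition 5.

A Generalized Büchi Dynamic Pushdown Network (GBDPN) is a set $\mathcal{BM} = \{\mathcal{BP}_1, \ldots, \mathcal{BP}_n\}$ s.t. for every $1 \le i \le n$, $\mathcal{BP}_i$ is a GBDPDS. A (global) run $\rho$ of a GBDPN is accepted iff all local runs in $\rho$ are accepting (local) runs.

Given a BDPN or a GBDPN $\mathcal{BM}$, let $\mathcal{L}(\mathcal{BM})$ be the set of all global configurations $\mathcal{G}$ s.t. $\mathcal{BM}$ has an accepting run from $\mathcal{G}$. We get the following property:

Proposition 2.

Given a GBDPN $\mathcal{BM}$, we can effectively compute a BDPN $\mathcal{BM}'$ s.t. $\mathcal{L}(\mathcal{BM}) = \mathcal{L}(\mathcal{BM}')$.

This result follows from the fact that each GBDPDS in $\mathcal{BM}$ can be translated to a corresponding BDPDS in $\mathcal{BM}'$.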

Given a BDPN where . Let be the index of the local configuration . Let . Then, we get the following theorem:

Theorem 5.1.

[19, 20] The membership problem of a BDPN is decidable in time .

Thus, from Proposition 2 and Theorem 5.1, we get that the membership problem of a GBDPN is decidable.

Theorem 5.2.

The membership problem of GBDPNs is decidable.

5.2 From CARET model checking of DPNs to the membership problem in BDPNs

Given a local run , let be the index of the DPDS corresponding to . Let be an initial global configuration of the DPN , then we say that satisfies iff has a global run starting from s.t. every local run in satisfies . Determining whether satisfies is a non-trivial problem since the number of global runs can be unbounded and the number of local runs of each global run can also be unbounded. Note that it is not sufficient to check whether every pushdown process satisfies the corresponding CARET formula . Indeed, we need to ensure that all instances of created during a global run satisfy the formula . Also, it is not correct to check whether all possible instances of satisfy the formula . Indeed, an instance of should not be checked if it is not created during a global run. To solve these problems, we reduce the CARET model-checking problem for DPNs to the membership problem for GBDPNs. To do this, we compute a GBDPN where () is a GBDPDS s.t. (1) the problem of checking whether each instance of satisfies a CARET formula can be reduced to the membership problem of ; (2) if creates a new instance of starting from , which requires that ; must also create an instance of starting from a certain configuration (computed from ) from which has an accepting run. In what follows, we present how to compute such GBDPDSs.

Let (we explain later the need to these labels). Given a DPDS (), a corresponding CARET formula , we define as the set of atoms A () such that and . Our goal is that for every (), we compute a GBDPDS s.t. for every , satisfies iff there exists an atom where s.t. has an accepting run from .

GBDPDSs Computation.

Let us fix a DPDS in the DPN , a CARET formula in corresponding to the DPDS . In this section, we show how to compute such a GBDPDS corresponding to . Given a local configuration , let be the index of the DPDS corresponding to . We define as follows:

  • and } is the finite set of control locations of

  • is the finite set of stack symbols of .

The transition relation of is the smallest set of transition rules satisfying the following:

  • for every : for every ; such that:

    • implies ( and )

    • if ; where if

  • for every :

    • for every such that:

      • if ; where if

    • for every ; such that:

  • for every : for every , such that:

    • if ; where if

Let and be the set of -formulas and -formulas of respectively. The generalized Büchi accepting condition of is defined as: where

  • where where if then for every .

  • where where if then for every .

Given a configuration , let be the procedure to which belongs. For example, in Figure 1, , …, . Intuitively, we compute as a kind of product of and which ensures that: for every , satisfies iff there exists an atom s.t. has an accepting run from . To do this, we encode atoms of into control locations of . The form of control locations of is where contains all sub formulas of which are satisfied at the configuration , is a label to determine whether the execution of the procedure of , , terminates in the path . A configuration labeled with means that the execution of is finished in , i.e., the run will run through the procedure , reaches its ret statement and exits after that. On the contrary, labeled with means that in , the execution of the procedure never terminates, i.e., the run will be stuck in and never exits the procedure . Let