# Property Directed Self Composition

We address the problem of verifying k-safety properties: properties that refer to k-interacting executions of a program. A prominent way to verify k-safety properties is by self composition. In this approach, the problem of checking k-safety over the original program is reduced to checking an "ordinary" safety property over a program that executes k copies of the original program in some order. The way in which the copies are composed determines how complicated it is to verify the composed program. We view this composition as provided by a semantic self composition function that maps each state of the composed program to the copies that make a move. Since the "quality" of a self composition function is measured by the ability to verify the safety of the composed program, we formulate the problem of inferring a self composition function together with the inductive invariant needed to verify safety of the composed program, where both are restricted to a given language. We develop a property-directed inference algorithm that, given a set of predicates, infers composition-invariant pairs expressed by Boolean combinations of the given predicates, or determines that no such pair exists. We implemented our algorithm and demonstrate that it is able to find self compositions that are beyond reach of existing tools.

## 1 Introduction

Many relational properties, such as noninterference [12], determinism [21], service level agreements [9], and more, can be reduced to the problem of k-safety, namely reasoning about k different traces of a program simultaneously. A common approach to verifying k-safety properties is by means of self composition, where the program is composed with k copies of itself [4, 31]. A state of the composed program consists of the states of each copy, and a trace naturally corresponds to k traces of the original program. Therefore, k-safety properties of the original program become ordinary safety properties of the composition, hence reducing k-safety verification to ordinary safety. This enables reasoning about k-safety properties using any of the existing techniques for safety verification such as Hoare logic [20] or model checking [7].

While self composition is sound and complete for k-safety, its applicability is questionable for two main reasons: (i) considering several copies of the program greatly increases the state space; and (ii) the way in which the different copies are composed when reducing the problem to safety verification affects the complexity of the resulting self composed program, and as such affects the complexity of verifying it. Improving the applicability of self composition has been the topic of many works [2, 29, 14, 18, 26, 32]. However, most efforts are focused on compositions that are pre-defined, or only depend on syntactic similarities.

In this paper, we take a different approach; we build upon the observation that by choosing the “right” composition, the verification can be greatly simplified by leveraging “simple” correlations between the executions. To that end, we propose an algorithm, called Pdsc, for inferring a property directed self composition. Our approach uses a dynamic composition, where the composition of the different copies can change during verification, directed at simplifying the verification of the composed program.

Compositions considered in previous work differ in the order in which the copies of the program execute: either synchronously, asynchronously, or in some mix of the two [33, 3, 14]. To allow general compositions, we define a composition function that maps every state of the composed program to the set of copies that are scheduled in the next step. This determines the order of execution for the different copies, and thus induces the self composed program. Unlike most previous works where the composition is pre-defined based on syntactic rules only, our composition is semantic as it is defined over the state of the composed program.

To capture the difficulty of verifying the composed program, we consider verification by means of inferring an inductive invariant, parameterized by a language for expressing the inductive invariant. Intuitively, the more expressive the language needs to be, the more difficult the verification task is. We then define the problem of inferring a composition function together with an inductive invariant for verifying the safety of the composed program, where both are restricted to a given language. Note that for a fixed language $\mathcal{L}$, an inductive invariant may exist for some composition function but not for another (see Appendix B for an example that requires a non-linear inductive invariant with a composition that is based on the control structure but has a linear invariant with another). Thus, the restriction to $\mathcal{L}$ defines a target for the inference algorithm, which is now directed at finding a composition that admits an inductive invariant in $\mathcal{L}$.

###### Example 1

To demonstrate our approach, consider the program in Figure 1. The program inserts a new value into an array. We assume that the array and its length are “low”-security variables, while the inserted value is “high”-security. The first loop finds the location at which the new value will be inserted. Note that the number of iterations depends on the inserted value. For that reason, the second loop executes to ensure that the output (which corresponds to the number of iterations) does not leak sensitive data. Without the second loop, the output could leak the location of the inserted value in the array. To express the property that the output does not leak sensitive data, we use the 2-safety property that in any two executions, if the input array and its length are the same, so is the output.

To verify the 2-safety property, consider two copies of the program. Let the language $\mathcal{L}$ for verifying the self composition be defined by the predicates depicted in Figure 1. The most natural self composition to consider is a lock-step composition, where the copies execute synchronously. However, for such a composition the composed program may reach a state in which the two copies are out of step. This occurs when the first copy exits the first loop while the second copy is still executing it. Since the language cannot express this correlation between the two copies, no inductive invariant in $\mathcal{L}$ suffices to verify that the outputs agree when the program terminates.

In contrast, when verifying the 2-safety property, Pdsc directs its search towards a composition function for which an inductive invariant in does exist. As such, it infers the composition function depicted in Figure 1, as well as an inductive invariant in . The invariant for this composition implies that at every state.

As demonstrated by the example, Pdsc focuses on logical languages based on predicate abstraction [17], where inductive invariants can be inferred by model checking. In order to infer a composition function that admits an inductive invariant in $\mathcal{L}$, Pdsc starts from a default composition function, and modifies its definition based on the reasoning performed by the model checker during verification. As the composition function is part of the verified model (recall that it is defined over the program state), different compositions are part of the state space explored by the model checker. As a result, a key ingredient of Pdsc is identifying “bad” compositions that prevent it from finding an inductive invariant in $\mathcal{L}$. It is important to note that a naive algorithm that tries all possible composition functions has a time complexity that is doubly exponential in $|\mathcal{P}|$, where $\mathcal{P}$ is the set of predicates considered. However, integrating the search for a composition function into the model checking algorithm allows us to reduce the time complexity of the algorithm to singly exponential in $|\mathcal{P}|$, where we show that the problem is in fact PSPACE-hard.

We implemented Pdsc using SeaHorn [19], Z3 [25] and Spacer [22] and evaluated it on examples that demonstrate the need for nontrivial semantic compositions. Our results clearly show that Pdsc can solve complex examples by inferring the required composition, while other tools cannot verify these examples. We emphasize that for these particular examples, lock-step composition is not sufficient. We also evaluated Pdsc on the examples from [29, 26] that are proven with the trivial lock-step composition. On these examples, Pdsc is comparable to state of the art tools.

#### 1.0.1 Related work.

This paper addresses the problem of verifying k-safety properties (also called hyperproperties [8]) by means of self composition. Other approaches tackle the problem without self-composition, and often focus on more specific properties, most notably the 2-safety noninterference property (e.g. [1, 32]). Below we focus on works that use self-composition.

Previous work such as [4, 2, 3, 15, 31, 14] considered self composition (also called product programs) where the composition function is constant and set a priori, using syntax-based hints. While useful in general, such self compositions may sometimes result in programs that are too complex to verify. This is in contrast to our approach, where the composition function evolves during verification, and is adapted to the capabilities of the model checker.

The work most closely related to ours is [29] which introduces Cartesian Hoare Logic (CHL) for verification of k-safety properties, and designs a verification framework for this logic. This work is further improved in [26]. These works search for a proof in CHL, and in doing so, implicitly modify the composition. Our work infers the composition explicitly and can use off-the-shelf model checking tools. More importantly, when loops are involved both [29] and [26] use lock-step composition and align loops syntactically. Our algorithm, in contrast, does not rely on syntactic similarities, and can handle loops that cannot be aligned trivially.

There have been several results in the context of harnessing Constrained Horn Clauses (CHC) solvers for verification of relational properties [11, 24]. Given several copies of a CHC system, a product CHC system that synchronizes the different copies is created by a syntactic analysis of the rules in the CHC system. These works restrict the synchronization points to CHC predicates (i.e., program locations), and consider only one synchronization (obtained via transformations of the system of CHCs). On the other hand, our algorithm iteratively searches for a good synchronization (composition), and considers synchronizations that depend on program state.

##### Equivalence checking and regression verification.

Equivalence checking is another closely related research field, where a composition of several programs is considered. As an example, equivalence checking is applied to verify the correctness of compiler optimizations [33, 28, 10, 18]. In [28] the composition is determined by a brute-force search for possible synchronization points. While this brute-force search resembles our approach for finding the correct composition, it is not guided by the verification process. The works in [10, 18] identify possible synchronization points syntactically, and try to match them during the construction of a simulation relation between programs.

Regression verification also requires the ability to show equivalence between different versions of a program [15, 16, 30]. The problem of synchronizing unbalanced loops appears in [30] in the form of unbalanced recursive function calls. To allow synchronization in such cases, the user can specify different unrolling parameters for the different copies. In contrast, our approach relies only on user supplied predicates that are needed to establish correctness, while synchronization is handled automatically.

## 2 Preliminaries

In this paper we reason about programs by means of the transition systems defining their semantics. A transition system is a tuple $T = (S, R, F)$, where $S$ is a set of states, $R \subseteq S \times S$ is a transition relation that specifies the steps in an execution of the program, and $F \subseteq S$ is a set of terminal states such that every terminal state has an outgoing transition to itself and no additional transitions (terminal states allow us to reason about pre/post specifications of programs). An execution or trace is a (finite or infinite) sequence of states $s_0, s_1, \ldots$ such that for every $i \geq 0$, $(s_i, s_{i+1}) \in R$. The execution is terminating if there exists $i \geq 0$ such that $s_i \in F$. In this case, the suffix of the execution is of the form $s_i, s_i, \ldots$ and we say that the execution ends at $s_i$.

As usual, we represent transition systems using logical formulas over a set of variables, corresponding to the program variables. We denote the set of variables by $V$. The set of terminal states is represented by a formula over $V$ and the transition relation is represented by a formula over $V \uplus V'$, where $V$ represents the pre-state of a transition and $V' = \{v' \mid v \in V\}$ represents its post-state. In the sequel, we use sets of states and their symbolic representation via formulas interchangeably.

##### Safety and inductive invariants.

We consider safety properties defined via pre/post conditions. (Our results can be extended to arbitrary safety and k-safety properties by introducing “observable” states to which the property may refer.) A safety property is a pair $(\mathit{pre}, \mathit{post})$ where $\mathit{pre}, \mathit{post}$ are formulas over $V$, representing subsets of $S$, denoting the pre- and post-condition, respectively. $T$ satisfies $(\mathit{pre}, \mathit{post})$, denoted $T \models (\mathit{pre}, \mathit{post})$, if every terminating execution of $T$ that starts in a state $s$ such that $s \models \mathit{pre}$ ends in a state $t$ such that $t \models \mathit{post}$. In other words, for every state $t$ that is reachable in $T$ from a state in $\mathit{pre}$ we have that $t \models F \rightarrow \mathit{post}$.

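A prominent way to verify safety properties is by finding an inductive invariant. An inductive invariant for a transition system $T$ and a safety property $(\mathit{pre}, \mathit{post})$ is a formula $\mathit{Inv}$ such that (1) $\mathit{pre} \Rightarrow \mathit{Inv}$ (initiation), (2) $\mathit{Inv} \wedge R \Rightarrow \mathit{Inv}'$ (consecution), and (3) $\mathit{Inv} \Rightarrow (F \rightarrow \mathit{post})$ (safety), where $\varphi \Rightarrow \psi$ denotes the validity of $\varphi \rightarrow \psi$, and $\varphi'$ denotes $\varphi(V')$, i.e., the formula obtained after substituting every $v \in V$ by the corresponding $v' \in V'$. If there exists such an inductive invariant, then $T \models (\mathit{pre}, \mathit{post})$.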
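As a concrete illustration of initiation, consecution, and safety, the following minimal sketch checks the three conditions by brute-force enumeration over a tiny finite transition system. The toy counter system and every name in it (`is_inductive_invariant`, `STATES`, and so on) are assumptions made for this illustration only; in practice the conditions are discharged symbolically rather than by enumeration.

```python
def is_inductive_invariant(states, R, F, pre, post, inv):
    """Check initiation, consecution and safety by explicit enumeration."""
    initiation = all(inv(s) for s in states if pre(s))
    consecution = all(inv(t) for (s, t) in R if inv(s))
    safety = all(post(s) for s in states if inv(s) and F(s))
    return initiation and consecution and safety

# Hypothetical toy system: a counter 0 -> 1 -> 2 -> 3 that stops at 3.
# State 4 is a terminal state that violates post but is unreachable from pre.
STATES = range(5)
R = {(0, 1), (1, 2), (2, 3), (3, 3), (4, 4)}
F = lambda s: s in (3, 4)      # terminal states (self-loops only)
pre = lambda s: s == 0
post = lambda s: s == 3

# "s <= 3" over-approximates the reachable states and excludes state 4.
good = is_inductive_invariant(STATES, R, F, pre, post, lambda s: s <= 3)
# "true" is inductive but too weak: it admits the bad terminal state 4.
too_weak = is_inductive_invariant(STATES, R, F, pre, post, lambda s: True)
# "s == 0" is not closed under the transition 0 -> 1.
not_closed = is_inductive_invariant(STATES, R, F, pre, post, lambda s: s == 0)
```

The three candidates show the usual trade-off: an invariant must be strong enough to imply safety yet weak enough to be closed under transitions.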

##### k-safety.

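A k-safety property refers to $k$ interacting executions of $T$. Similarly to an ordinary property, it is defined by $(\mathit{pre}, \mathit{post})$, except that $\mathit{pre}$ and $\mathit{post}$ are defined over $V^1 \uplus \cdots \uplus V^k$ where $V^i = \{v^i \mid v \in V\}$ denotes the $i$th copy of the program variables. As such, $\mathit{pre}$ and $\mathit{post}$ represent sets of $k$-tuples of program states ($k$-states for short): for a $k$-tuple $(s_1, \ldots, s_k)$ of states and a formula $\varphi$ over $V^1 \uplus \cdots \uplus V^k$, we say that $(s_1, \ldots, s_k) \models \varphi$ if $\varphi$ is satisfied when for each $i$, the assignment of $V^i$ is determined by $s_i$. We say that $T$ satisfies $(\mathit{pre}, \mathit{post})$, denoted $T \models^k (\mathit{pre}, \mathit{post})$, if for every $k$ terminating executions of $T$ that start in states $s_1, \ldots, s_k$, respectively, such that $(s_1, \ldots, s_k) \models \mathit{pre}$, it holds that they end in states $t_1, \ldots, t_k$, respectively, such that $(t_1, \ldots, t_k) \models \mathit{post}$.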

For example, the noninterference property may be specified by the following 2-safety property:

$$\mathit{pre} = \bigwedge_{v \in \mathit{LowIn}} v^1 = v^2 \qquad \mathit{post} = \bigwedge_{v \in \mathit{LowOut}} v^1 = v^2$$

where $\mathit{LowIn}$ and $\mathit{LowOut}$ denote subsets of the program inputs, resp. outputs, that are considered “low security” and the rest are classified as “high security”. This property asserts that every two terminating executions that start in states that agree on the “low security” inputs end in states that agree on the low security outputs, i.e., the outcome does not depend on any “high security” input and, hence, does not leak secure information.
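To make the noninterference property concrete, the sketch below checks it by exhaustively comparing pairs of runs of a small program. This is only an enumeration-based illustration under the assumption of finite input domains; the functions `leaky` and `safe` and the checker `satisfies_noninterference` are hypothetical names introduced here.

```python
from itertools import product

def satisfies_noninterference(program, low_inputs, high_inputs):
    """2-safety by enumeration: runs that agree on the low input
    must agree on the (low) output, whatever the high inputs are."""
    for low in low_inputs:
        for h1, h2 in product(high_inputs, repeat=2):
            if program(low, h1) != program(low, h2):
                return False
    return True

def leaky(low, high):
    # Hypothetical program whose output depends on the secret.
    return low + (1 if high > 0 else 0)

def safe(low, high):
    # Hypothetical program that reads the secret but never reveals it.
    _scratch = high * 2
    return low + 1

leaky_ok = satisfies_noninterference(leaky, range(3), range(-1, 2))
safe_ok = satisfies_noninterference(safe, range(3), range(-1, 2))
```

Note that the check quantifies over pairs of executions, which is exactly what makes the property a 2-safety property rather than an ordinary safety property.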

Checking k-safety properties reduces to checking ordinary safety properties by creating a self composed program that consists of $k$ copies of the transition system, each with its own copy of the variables, that run in parallel in some way. Thus, the self composed program is defined over variables $V^1 \uplus \cdots \uplus V^k$, where $V^i$ denotes the variables associated with the $i$th copy. For example, a common composition is a lock-step composition in which the copies execute simultaneously. The resulting composed transition system has the $k$-states as its states, the transition relation $\bigwedge_{i=1}^{k} R(V^i, {V^i}')$, and the terminal states $\bigwedge_{i=1}^{k} F(V^i)$. Note that its transition relation is defined over the composed variables and their primed copies (as usual). Then, the k-safety property $(\mathit{pre}, \mathit{post})$ is satisfied by $T$ if and only if the ordinary safety property $(\mathit{pre}, \mathit{post})$ is satisfied by the composed system. More general notions of self composition are investigated in Section 3.

## 3 Inferring Self Compositions for Restricted Languages of Inductive Invariants

Any self-composition is sufficient for reducing k-safety to safety, e.g., lock-step, sequential, synchronous, asynchronous, etc. However, the choice of the self-composition used determines the difficulty of the resulting safety problem. Different self composed programs would require different inductive invariants, some of which cannot be expressed in a given logical language.

In this section, we formulate the problem of inferring a self composition function such that the obtained self composed program may be verified with a given language of inductive invariants. We are, therefore, interested in inferring both the self composition function and the inductive invariant for verifying the resulting self composed program. We start by formulating the kind of self compositions that we consider.

In the sequel, we fix a transition system $T = (S, R, F)$ with a set of variables $V$.

### 3.1 Semantic Self Composition

Roughly speaking, a $k$ self composition of $T$ consists of $k$ copies of $T$ that execute together in some order, where steps may interleave or be performed simultaneously. The order is determined by a self composition function, which may also be viewed as a scheduler that is responsible for scheduling a subset of the copies in each step. We consider semantic compositions in which the order may depend on the states of the different copies, as well as the correlations between them (as opposed to syntactic compositions that only depend on the control locations of the copies, but may not depend on the values of other variables):

###### Definition 1 (Semantic Self Composition Function)

A semantic $k$ self composition function ($k$-composition function for short) is a function $f$ mapping each $k$-state to a nonempty subset of $\{1..k\}$: the set of copies that are to participate in the next step of the self composed program. (We consider memoryless composition functions. Compositions that depend on the history of the joint execution are supported via ghost state added to the program to track the history.)

We represent a $k$-composition function $f$ by a set of logical conditions, with a condition $C_M$ for every nonempty subset $M \subseteq \{1..k\}$ of the copies. For each such $M$, the condition $C_M$ is defined over $V^1 \uplus \cdots \uplus V^k$, and hence it represents a set of $k$-states, with the meaning that all the $k$-states that satisfy $C_M$ are mapped to $M$ by $f$:

$$f(s_1, \ldots, s_k) = M \quad \text{if and only if} \quad (s_1, \ldots, s_k) \models C_M.$$

To ensure that the function is well defined, we require that $\bigvee_M C_M \equiv \mathit{true}$, which ensures that every $k$-state satisfies at least one of the conditions. We also require that for every $M_1 \neq M_2$, $C_{M_1} \wedge C_{M_2} \equiv \mathit{false}$, hence every $k$-state satisfies at most one condition. Together these requirements ensure that the conditions induce a partition of the set of all $k$-states. In the sequel, we identify a $k$-composition function with its symbolic representation via conditions and use them interchangeably.

###### Definition 2 (Composed Program)

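Given a $k$-composition function $f$, represented via conditions $C_M$ for every nonempty set $M \subseteq \{1..k\}$, we define the $k$ self composition of $T$ to be the transition system $T^f$ over variables $V^1 \uplus \cdots \uplus V^k$ defined as follows: its states are the $k$-states, its terminal states are given by $\bigwedge_{j=1}^{k} F(V^j)$, and its transition relation is

$$R^f = \bigvee_{\emptyset \neq M \subseteq \{1..k\}} (C_M \wedge \varphi_M) \quad \text{where} \quad \varphi_M = \bigwedge_{j \in M} R(V^j, {V^j}') \wedge \bigwedge_{j \notin M} V^j = {V^j}'$$

Thus, in $T^f$, the set of states consists of $k$-states, the terminal states are $k$-states in which all the individual states are terminal, and the transition relation includes a transition from $(s_1, \ldots, s_k)$ to $(s'_1, \ldots, s'_k)$ if and only if $f(s_1, \ldots, s_k) = M$ and

$$(\forall i \in M.\ (s_i, s'_i) \in R) \wedge (\forall i \notin M.\ s_i = s'_i)$$

That is, every transition of $T^f$ corresponds to a simultaneous transition of a subset of the $k$ copies of $T$, where the subset is determined by the self composition function $f$. If $f(s_1, \ldots, s_k) = M$, then for every $i \in M$ we say that $i$ is scheduled in $(s_1, \ldots, s_k)$.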

###### Example 2

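A $k$ self composition that runs the $k$ copies of $T$ sequentially, one after the other, corresponds to a $k$-composition function $f$ defined by $f(s_1, \ldots, s_k) = \{i\}$ where $i$ is the minimal index of a non-terminal state in $(s_1, \ldots, s_k)$. If all states in $(s_1, \ldots, s_k)$ are terminal then $f(s_1, \ldots, s_k) = \{k\}$ (or any other index). This is encoded as follows: for every $1 \leq i < k$, $C_{\{i\}} = \neg F(V^i) \wedge \bigwedge_{j < i} F(V^j)$; $C_{\{k\}} = \bigwedge_{j < k} F(V^j)$; and $C_M = \mathit{false}$ for every other $M$.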

###### Example 3

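The lock-step composition that runs the $k$ copies of $T$ synchronously corresponds to a $k$-self composition function $f$ defined by $f(s_1, \ldots, s_k) = \{1, \ldots, k\}$, and encoded by $C_{\{1..k\}} = \mathit{true}$ and $C_M = \mathit{false}$ for every other $M$.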
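The composed program and the sequential and lock-step compositions can be sketched over an explicit finite-state system as follows. This is a minimal illustration, not the symbolic construction used in the paper: the toy counter system, the helper names (`compose`, `lock_step`, `sequential`), and the use of 0-based copy indices are all assumptions made here.

```python
from itertools import product

def compose(states, R, f, k):
    """Transitions of the self composition over k-tuples of states."""
    Rf = set()
    for ks in product(states, repeat=k):
        M = f(ks)  # nonempty set of (0-based) copies scheduled at ks
        # Each scheduled copy takes a step of R; every other copy stutters.
        options = [
            [t for (s, t) in R if s == ks[i]] if i in M else [ks[i]]
            for i in range(k)
        ]
        for succ in product(*options):
            Rf.add((ks, succ))
    return Rf

# Hypothetical toy system: a counter 0 -> 1 -> 2 with terminal state 2.
STATES = [0, 1, 2]
R = {(0, 1), (1, 2), (2, 2)}
TERMINAL = {2}

def lock_step(ks):
    # Schedule every copy in every step.
    return set(range(len(ks)))

def sequential(ks):
    # Schedule the first copy that has not terminated, else the last copy.
    running = [i for i, s in enumerate(ks) if s not in TERMINAL]
    return {running[0]} if running else {len(ks) - 1}

lock = compose(STATES, R, lock_step, 2)
seq = compose(STATES, R, sequential, 2)
```

From the 2-state `(0, 0)`, the lock-step composition moves to `(1, 1)` while the sequential composition moves to `(1, 0)`. Both are built from the same `R`, and only the scheduling function differs.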

In order to ensure soundness of a reduction of k-safety to safety via self composition, one has to require that the self composition function does not “starve” any copy of the transition system that is about to terminate if it continues to execute. We refer to this requirement as fairness.

###### Definition 3 (Fairness)

A $k$-self composition function $f$ is fair if for every $k$ terminating executions $\pi_1, \ldots, \pi_k$ of $T$ there exists an execution $\pi^{\parallel}$ of $T^f$ such that for every copy $i$, the projection of $\pi^{\parallel}$ to $i$ is $\pi_i$.

Note that by the definition of the terminal states of $T^f$, $\pi^{\parallel}$ as above is guaranteed to be terminating. We say that the $i$th copy terminates in $\pi^{\parallel}$ if $\pi^{\parallel}$ contains a $k$-state $(s_1, \ldots, s_k)$ such that $s_i \in F$. Fairness may be enforced in a straightforward way by requiring that whenever $f(s_1, \ldots, s_k) = M$, the set $M$ includes no index $i$ for which $s_i \in F$, unless all copies have terminated. Since we assume that terminal states may only transition to themselves, a weaker requirement that suffices to ensure fairness is that $M$ includes at least one index $i$ for which $s_i \notin F$, unless there is no such index.

The following claim is now straightforward:

###### Lemma 1

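Let $T$ be a transition system, $(\mathit{pre}, \mathit{post})$ a k-safety property, and $f$ a fair $k$-composition function for $T$ and $(\mathit{pre}, \mathit{post})$. Then

$$T \models^k (\mathit{pre}, \mathit{post}) \quad \text{iff} \quad T^f \models (\mathit{pre}, \mathit{post}).$$

###### Proof (sketch)

Every terminating execution of $T^f$ corresponds to $k$ terminating executions of $T$. Fairness of $f$ ensures that the converse also holds.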

To demonstrate the necessity of the fairness requirement, consider a (non-fair) self composition function $f$ that maps every state to a fixed proper subset of the copies, say $\{1\}$. Then, as long as some unscheduled copy does not start in a terminal state, the resulting self composition satisfies every pre-post specification vacuously regardless of what the actual transition system does, as it never reaches a terminal state.
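The vacuity caused by an unfair composition can be observed on a small deterministic example. The sketch below is an illustration under assumed names (`step`, `terminal`, `terminal_reachable`) and uses 0-based copy indices; it follows the single execution of the composed program from a start 2-state and collects the terminal 2-states it visits.

```python
def step(x):
    # Hypothetical program: a counter 0 -> 1 -> 2 that then loops at 2.
    return min(x + 1, 2)

def terminal(x):
    return x == 2

def terminal_reachable(start, f):
    """Terminal 2-states visited by the composed run from `start`."""
    seen, s = set(), start
    while s not in seen:
        seen.add(s)
        M = f(s)  # copies scheduled in this step
        s = tuple(step(x) if i in M else x for i, x in enumerate(s))
    return {s for s in seen if all(terminal(x) for x in s)}

unfair = lambda s: {0}       # always schedules the first copy only
fair = lambda s: {0, 1}      # lock-step

unfair_terminals = terminal_reachable((0, 0), unfair)
fair_terminals = terminal_reachable((0, 0), fair)
```

Under `unfair` the second copy never moves, so no terminal 2-state is reached and any postcondition, even `false`, holds vacuously. Under `fair` the run reaches `(2, 2)`, where the postcondition is actually checked.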

###### Remark 1

While we require the conditions defining a self composition function $f$ to induce a partition of the set of $k$-states in order to ensure that $f$ is well defined as a (total) function, the requirement may be relaxed in two ways. First, we may allow $C_{M_1}$ and $C_{M_2}$ to overlap for $M_1 \neq M_2$. This will add more transitions and may make the task of verifying the composed program more difficult, but it maintains the soundness of the reduction. Second, it suffices that the conditions cover the set of reachable states of the composed program rather than the entire state space. These relaxations do not damage soundness. Technically, this means that $f$ represented by the conditions is a relation rather than a function. We still refer to it as a function and write $f(s_1, \ldots, s_k) = M$ to indicate that $(s_1, \ldots, s_k) \models C_M$, not excluding the possibility that $(s_1, \ldots, s_k) \models C_{M'}$ for $M' \neq M$ as well. We note that as long as the language used to describe compositions is closed under Boolean operations, we can always extract from the conditions a function $f'$. This is done as follows:

• To prevent the overlap between conditions, determine an arbitrary total order $<$ on the sets $M$ and set $C'_M = C_M \wedge \bigwedge_{N < M} \neg C_N$.

• To ensure that the conditions cover the entire state space, set .

It is easy to verify that $f'$ defined by the conditions $C'_M$ is a total self composition function and that if $f$ is fair, then so is $f'$.
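The extraction of a total function from overlapping, non-covering conditions can be sketched as a first-match rule with a fallback. The code below is an illustration only: conditions are modelled as Python predicates listed in priority order, and the fallback set passed as `default` is an assumption of this sketch rather than the choice made in the remark.

```python
def normalize(conditions, default):
    """Turn prioritized (M, condition) pairs into a total function.

    conditions: list of (M, predicate) pairs; list order is the total order.
    default:    set of copies used for k-states that satisfy no condition
                (an assumption of this sketch).
    """
    def f(kstate):
        for M, cond in conditions:
            if cond(kstate):
                return M       # first match wins, which removes overlaps
        return default         # uncovered k-states fall back to `default`
    return f

# Hypothetical conditions over 2-states; they overlap when s[0] < s[1]
# and cover nothing when s[0] > s[1].
conditions = [
    ({0}, lambda s: s[0] < s[1]),
    ({0, 1}, lambda s: s[0] <= s[1]),
]
f_total = normalize(conditions, default={0, 1})
```

With this ordering, `f_total((0, 1))` is `{0}` because the first condition takes priority, `f_total((1, 1))` is `{0, 1}`, and the uncovered 2-state `(2, 1)` is mapped to the default.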

### 3.2 The Problem of Inferring Self Composition with Inductive Invariant

Lemma 1 states the soundness of the reduction of k-safety to ordinary safety. Together with the ability to verify safety by means of an inductive invariant, this leads to a verification procedure. However, while soundness of the reduction holds for any self composition, an inductive invariant in a given language may exist for the composed program resulting from some compositions but not from others. We therefore consider the self composition function and the inductive invariant together, as a pair, leading to the following definition.

###### Definition 4

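Let $T$ be a transition system and $(\mathit{pre}, \mathit{post})$ a k-safety property. For a formula $\mathit{Inv}$ over $V^1 \uplus \cdots \uplus V^k$ and a self composition function $f$ represented by conditions $\{C_M\}_M$, we say that $(f, \mathit{Inv})$ is a composition-invariant pair for $T$ and $(\mathit{pre}, \mathit{post})$ if the following conditions hold:

• $\mathit{pre} \Rightarrow \mathit{Inv}$ (initiation of $\mathit{Inv}$),

• for every $\emptyset \neq M \subseteq \{1..k\}$, $\mathit{Inv} \wedge C_M \wedge \varphi_M \Rightarrow \mathit{Inv}'$ (consecution of $\mathit{Inv}$ for $R^f$),

• $\mathit{Inv} \wedge \bigwedge_{j} F(V^j) \Rightarrow \mathit{post}$ (safety of $\mathit{Inv}$),

• $\mathit{Inv} \Rightarrow \bigvee_M C_M$ ($f$ covers the reachable states),

• for every $\emptyset \neq M \subseteq \{1..k\}$, $\mathit{Inv} \wedge C_M \wedge \bigvee_{j} \neg F(V^j) \Rightarrow \bigvee_{j \in M} \neg F(V^j)$ ($f$ is fair).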

As commented in Remark 1, we relax the requirement that $\bigvee_M C_M \equiv \mathit{true}$ to $\mathit{Inv} \Rightarrow \bigvee_M C_M$, thus ensuring that the conditions cover all the reachable states. Since the reachable states of $T^f$ are determined by $\{C_M\}_M$ (which define $f$), this reveals the interplay between the self composition function and the inductive invariant. Furthermore, we do not require that $C_{M_1} \wedge C_{M_2} \equiv \mathit{false}$ for $M_1 \neq M_2$, hence a $k$-state may satisfy multiple conditions. As explained earlier, these relaxations do not damage soundness. Furthermore, if we construct from $f$ a self composition function $f'$ as described in Remark 1, $\mathit{Inv}$ would be an inductive invariant for $T^{f'}$ as well.

###### Lemma 2

If there exists a composition-invariant pair for $T$ and $(\mathit{pre}, \mathit{post})$, then $T \models^k (\mathit{pre}, \mathit{post})$.

###### Proof (sketch)

If $(f, \mathit{Inv})$ is a composition-invariant pair, then $\mathit{Inv}$ is an inductive invariant for $T^{f'}$, where $f'$ is a fair composition function defined as in Remark 1. From Lemma 1 we conclude that $T \models^k (\mathit{pre}, \mathit{post})$.

If we do not restrict the language in which $f$ and $\mathit{Inv}$ are specified, then the converse also holds. However, in the sequel we are interested in the ability to verify k-safety with a given language, e.g., one for which the conditions of Definition 4 belong to a decidable fragment of logic and hence can be discharged automatically.

###### Definition 5 (Inference in L)

Let $\mathcal{L}$ be a logical language. The problem of inferring a composition-invariant pair in $\mathcal{L}$ is defined as follows. The input is a transition system $T$ and a k-safety property $(\mathit{pre}, \mathit{post})$. The output is a composition-invariant pair $(f, \mathit{Inv})$ for $T$ and $(\mathit{pre}, \mathit{post})$ (as defined in Definition 4), where $\mathit{Inv} \in \mathcal{L}$ and $f$ is represented by conditions $\{C_M\}_M$ such that $C_M \in \mathcal{L}$ for every $M$. If no such pair exists, the output is “no solution”.

When no solution exists, it does not necessarily mean that $T \not\models^k (\mathit{pre}, \mathit{post})$. Instead, it may be that the language $\mathcal{L}$ is simply not expressive enough. Unfortunately, for expressive languages (e.g., quantified formulas or even quantifier free linear integer arithmetic), the problem of inferring an inductive invariant alone is already undecidable, making the problem of inferring a composition-invariant pair undecidable as well:

###### Lemma 3

Let $\mathcal{L}$ be closed under Boolean operations and under substitution of a variable with a value, and include equalities of the form $v = a$, where $v$ is a variable and $a$ is a value (of the same sort). If the problem of inferring an inductive invariant in $\mathcal{L}$ is undecidable, then so is the problem of inferring a composition-invariant pair in $\mathcal{L}$.

###### Proof

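We show a reduction from the ordinary invariant inference problem in $\mathcal{L}$ to the problem of inferring a composition-invariant pair in $\mathcal{L}$. Given a transition system $T = (S, R, F)$ over $V$ and an ordinary safety property $(\mathit{pre}, \mathit{post})$, the reduction constructs a transition system $T^*$ over $V \uplus \{b\}$, where $b$ is a new Boolean variable such that when $b$ holds the original transitions are taken and when $b$ does not hold the system remains in the same state, which is also added to the set of terminal states. Formally, for every $v \in V$, let $a_v$ be an arbitrary fixed value in the domain of $v$. For example, if $v$ is Boolean, $a_v$ is a fixed truth value. The reduction constructs

$$R^* = (b \wedge R \wedge b') \vee \Big(\neg b \wedge \big(\bigwedge_{v \in V} v' = a_v\big) \wedge \neg b'\Big) \qquad F^* = F \vee \Big(\neg b \wedge \bigwedge_{v \in V} v = a_v\Big),$$

and the following 2-safety property:

$$\mathit{pre}^* = \Big(b^1 \wedge \mathit{pre}(V^1) \wedge \neg b^2 \wedge \bigwedge_{v \in V} v^2 = a_v\Big) \qquad \mathit{post}^* = \Big(b^1 \wedge \mathit{post}(V^1) \wedge \neg b^2 \wedge \bigwedge_{v \in V} v^2 = a_v\Big).$$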

That is, the first copy is “initialized” with and with the original pre-condition and is required to terminate in a state that satisfies the original post-condition, while the second copy is initialized with , and with the value for each original variable, and is required to terminate in the same state. Clearly, if has an inductive invariant for , then is a composition-invariant pair for and , where is defined by and for any other , which is clearly in . For the converse direction, if has a composition-invariant pair for then obtained by substituting each positive occurrence of in by false, each negative occurrence of by true and each occurrence of by is an inductive invariant for and . ∎

For example, linear integer arithmetic satisfies the conditions of the lemma. This motivates us to restrict the languages of inductive invariants. Specifically, we consider languages defined by a finite set of predicates. We consider relational predicates, defined over $V^1 \uplus \cdots \uplus V^k$. For a finite set of predicates $\mathcal{P}$, we define $\mathcal{L}_{\mathcal{P}}$ to be the set of all formulas obtained by Boolean combinations of the predicates in $\mathcal{P}$.

###### Definition 6 (Inference using predicate abstraction)

The problem of inferring a predicate-based composition-invariant pair is defined as follows. The input is a transition system $T$, a k-safety property $(\mathit{pre}, \mathit{post})$, and a finite set of predicates $\mathcal{P}$. The output is the solution to the problem of inferring a composition-invariant pair for $T$ and $(\mathit{pre}, \mathit{post})$ in $\mathcal{L}_{\mathcal{P}}$.

###### Remark 2

It is possible to decouple the language used for expressing the self composition function from the language used to express the inductive invariant. Clearly, different sets of predicates (and hence languages) can be assigned to the self composition function and to the inductive invariant. However, since inductiveness is defined with respect to the transitions of the composed system, which are in turn defined by the self composition function, if the language defining $f$ is not included in the language defining $\mathit{Inv}$, the conditions $C_M$ themselves would be over-approximated when checking the requirements of Definition 4 and therefore would incur a precision loss. For this reason, we use the same language for both.

Since the problem of invariant inference in $\mathcal{L}_{\mathcal{P}}$ is PSPACE-hard [23], a reduction from the problem of inferring inductive invariants to the problem of inferring composition-invariant pairs (similar to the one used in the proof of Lemma 3) shows that composition-invariant inference in $\mathcal{L}_{\mathcal{P}}$ is also PSPACE-hard:

###### Theorem 3.1

Inferring a predicate-based composition-invariant pair is PSPACE-hard.

## 4 Algorithm for Inferring Composition-Invariant Pairs

In this section, we present Property Directed Self-Composition (Pdsc for short), our algorithm for tackling the composition-invariant inference problem for languages of predicates (Definition 6). Namely, given a transition system $T$, a k-safety property $(\mathit{pre}, \mathit{post})$ and a finite set of predicates $\mathcal{P}$, we address the problem of finding a pair $(f, \mathit{Inv})$, where $f$ is a self composition function and $\mathit{Inv}$ is an inductive invariant for the composed transition system $T^f$ obtained from $f$, and both of them are in $\mathcal{L}_{\mathcal{P}}$, i.e., defined by Boolean combinations of the predicates in $\mathcal{P}$.

We rely on the property that a transition system (in our case $T^f$) has an inductive invariant in $\mathcal{L}_{\mathcal{P}}$ if and only if its abstraction obtained using $\mathcal{P}$ is safe. This is because the set of reachable abstract states is the strongest set expressible in $\mathcal{L}_{\mathcal{P}}$ that satisfies initiation and consecution. Given $f$, this allows us to use predicate abstraction to either obtain an inductive invariant in $\mathcal{L}_{\mathcal{P}}$ for $T^f$ (if the abstraction of $T^f$ is safe) or determine that no such inductive invariant exists (if an abstract counterexample trace is obtained). The latter indicates that a different self composition function needs to be considered. A naive realization of this idea gives rise to an iterative algorithm that starts from an arbitrary initial composition function and in each iteration computes a new composition function. In the worst case such an algorithm enumerates all self composition functions defined in $\mathcal{L}_{\mathcal{P}}$, i.e., its time complexity is doubly exponential in $|\mathcal{P}|$. Importantly, we observe that, when no inductive invariant exists for some composition function, we can use the abstract counterexample trace returned in this case to (i) generalize and eliminate multiple composition functions, and (ii) identify that some abstract states must be unreachable if there is to be a composition-invariant pair, i.e., we “block” states in the spirit of property directed reachability [5, 13]. This leads to the algorithm depicted in Algorithm 1, whose worst case time complexity is singly exponential in $|\mathcal{P}|$. Next, we explain the algorithm in detail.

#### Finding an inductive invariant for a given composition function using predicate abstraction.

We use predicate abstraction [17, 27] to check whether a given candidate composition function $f$ has a corresponding inductive invariant. This is done as follows. The abstraction of the composed program using the predicates in $P$ is a transition system defined over Boolean variables, with one variable $b_p$ for each predicate $p \in P$ (we omit the terminal states). Each abstract state $\hat{s}$ corresponds to a valuation of the Boolean variables representing $P$. An abstract state $\hat{s}$ represents the following set of states of the composed program:

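$$\gamma(\hat{s}) = \{\, s^{\parallel} \in S^{\parallel k} \mid \forall p \in P.\ s^{\parallel} \models p \Leftrightarrow \hat{s}(b_p) = 1 \,\}$$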

We extend $\gamma$ to sets of states and to formulas representing sets of states in the usual way. The abstract transition relation is defined as usual:

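$$\hat{R} = \{\, (\hat{s}_1, \hat{s}_2) \mid \exists s^{\parallel}_1 \in \gamma(\hat{s}_1)\ \exists s^{\parallel}_2 \in \gamma(\hat{s}_2).\ (s^{\parallel}_1, s^{\parallel}_2) \in R^f \,\}$$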

Note that the set of abstract states does not depend on the composition function $f$; only the abstract transition relation does.
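The definitions of concretization and of the abstract transition relation can be illustrated on a tiny finite system. The sketch below is not the construction used by the algorithm in practice; the toy state space, the two predicates, and the helper names `alpha`, `gamma`, `step`, and `abstract_transitions` are all assumptions made for illustration. A composition function is modeled as a map from abstract states to the set of copies that move.

```python
# Hypothetical toy composed system: a state is a pair (x1, x2) of counters,
# one per copy of the program. Everything here is illustrative.
STATES = [(x1, x2) for x1 in range(3) for x2 in range(3)]

# The predicate set, each predicate a Boolean function over composed states.
PREDICATES = {
    "x1_eq_x2": lambda s: s[0] == s[1],
    "x1_done": lambda s: s[0] == 2,
}
NAMES = sorted(PREDICATES)  # fixed order: ["x1_done", "x1_eq_x2"]


def alpha(state):
    """Abstract a concrete state to its valuation of the Boolean variables."""
    return tuple(PREDICATES[n](state) for n in NAMES)


def gamma(abstract_state):
    """Concretize: all states that agree with the valuation on every predicate."""
    return [s for s in STATES if alpha(s) == abstract_state]


def step(state, movers):
    """Advance the copies listed in `movers` by one step (saturating at 2)."""
    return tuple(min(x + 1, 2) if i in movers else x for i, x in enumerate(state))


def abstract_transitions(f):
    """Existential abstraction of the transitions induced by f.

    A pair of abstract states is related if some concrete state in the first
    steps to some concrete state in the second.
    """
    return {(alpha(s), alpha(step(s, f(alpha(s))))) for s in STATES}
```

For example, with the lock-step choice `lambda a: {0, 1}`, the concrete step from `(1, 1)` to `(2, 2)` induces an abstract transition from `(False, True)` to `(True, True)`. Changing `f` changes the set returned by `abstract_transitions`, while the abstract states themselves stay the same.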

###### Notation

We sometimes refer to an abstract state as the formula . For a formula , we denote by the result of substituting each in by the corresponding Boolean variable . For the opposite direction, given a formula over , we denote by the formula in resulting from substituting each in by . Therefore, is a symbolic representation of .

Every set defined by a formula is precisely represented by in the sense that is equal to the set of states defined by , i.e., is a precise abstraction of . For simplicity, we assume that the termination conditions as well as the pre/post specification can be expressed precisely using the abstraction, in the following sense:

###### Definition 7

is adequate for and if there exist such that , and (for every copy ).

The following lemma provides the foundation for our algorithm:

###### Lemma 4

Let be a transition system, a safety property, and a finite set of predicates adequate for and . For a self composition function defined via conditions in , there exists an inductive invariant in such that is a composition-invariant pair for and if and only if the following three conditions hold:

• (S1) All reachable states of from satisfy ,

• (S2) All reachable states of from satisfy , and

• (S3) For every , .

Furthermore, if the conditions hold, then the symbolic representation of the set of abstract states of reachable from is a formula over such that is a composition-invariant pair for and .

###### Proof

The proof relies on the following statement, denoted by : for a formula in and an abstract state , for every it holds that (which follows by induction on the structure of a formula in , relying on the definition of ). In particular, this implies that for a formula over , it holds that whenever .

(⇒) Let , and be as described, and let () be a composition-invariant pair for and in . We first show that every (abstract) state that is reachable from in satisfies . Let be such a reachable state. Then there exists an abstract trace such that , and for every . Consider a concrete state of such that , then and from we get . From the definition of a composition-invariant pair (Definition 4) we get that (initiation). Since is in we get from that also . For , the next state in the abstract trace, it also holds that : since , we know that there exist some and such that , using we get that , the consecution of implies and from we get . By induction over the length of the abstract trace we get that . We now turn to show that conditions S1–S3 hold. First, the safety of for together with adequacy of and imply that , and since all the reachable states of satisfy , S1 follows. Similarly, the covering requirement of together with the property that is in for every and together with imply S2. Finally, S3 is implied directly from the fairness of (Definition 4).

(⇐) Assume that for , , and some composition function as described, conditions S1–S3 hold. Condition S1 ensures that satisfies the safety property , when we augment with a set of terminal states given by the formula . Hence, there exists an inductive invariant over for and . Furthermore, condition S2 ensures that there exists such for which (for example, such may be obtained by conjoining the inductive invariant ensured by S1 with another inductive invariant that establishes S2). To conclude the proof we show that () is a composition-invariant pair for and , as defined in Definition 4. First, initiation and safety of with respect to and , imply initiation and safety (respectively) of with respect to and due to and adequacy of . As for consecution of : for a pair of states in such that , if and , then . Therefore, if then (according to ), and from consecution of in also , and from we get and conclude the consecution of in . Similarly, for covering of : recall that , hence by , , i.e., covers the states satisfying . Finally, the fairness of follows from S3. ∎

Algorithm 1 starts from the lock-step self composition function (Algorithm 1), which is fair, and constructs the next candidate such that condition S3 in Lemma 4 always holds (see discussion of Modify_SC). Thus, condition S3 need not be checked explicitly. (Any fair self composition can be chosen as the initial one; we chose lock-step since it is a good starting point in many applications.)

Algorithm 1 checks whether conditions S1 and S2 hold for a given candidate composition function by calling Abs_Reach (Algorithm 1) – both checks are performed via a (non-)reachability check in , checking whether a state violating or is reachable from . Algorithm 1 maintains the abstract states that are not in by the formula Unreach defined over , which is initialized to false (as the lock-step composition function is defined for every state) and is updated in each iteration of Algorithm 1 to include the abstract states violating . If no abstract state violating S1 or S2 is reachable, i.e., the conditions hold, then Abs_Reach returns the (potentially overapproximated) set of reachable abstract states, represented by a formula over . In this case, by Lemma 4, is a composition-invariant pair (Algorithm 1). Otherwise, an abstract counterexample trace is obtained. (We can of course apply bounded model checking to check if the counterexample is real; we omit this check as our focus is on the case where the system is safe.)
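The reachability check can be pictured as a breadth-first search over abstract states. The sketch below is a simplified, explicit-state illustration rather than the procedure behind Abs_Reach: the function name `abs_reach`, its parameters, and the single `is_bad` predicate (standing for a violation of either S1 or S2) are assumptions. It also computes the exact reachable set, whereas the text allows an overapproximation.

```python
from collections import deque


def abs_reach(init, transitions, is_bad):
    """Explore abstract states breadth-first from the initial ones.

    Returns ("safe", reachable_set) if no bad abstract state is reachable,
    and ("cex", trace) with a shortest abstract trace to a bad state otherwise.
    """
    # Index the abstract transition relation by source state.
    successors = {}
    for src, dst in transitions:
        successors.setdefault(src, []).append(dst)

    parent = {s: None for s in init}  # doubles as the visited set
    queue = deque(init)
    while queue:
        current = queue.popleft()
        if is_bad(current):
            # Walk parent pointers back to an initial state.
            trace = []
            while current is not None:
                trace.append(current)
                current = parent[current]
            return "cex", trace[::-1]
        for nxt in successors.get(current, []):
            if nxt not in parent:
                parent[nxt] = current
                queue.append(nxt)
    return "safe", set(parent)
```

In the "safe" case the returned set plays the role of the reachable abstract states from which the invariant is read off; in the "cex" case the trace is the abstract counterexample used to modify the composition function.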

###### Remark 3

In practice, we do not construct the abstract transition system explicitly. Instead, we use the implicit predicate abstraction approach [6].

#### Eliminating self composition candidates based on abstract counterexamples.

An abstract counterexample to conditions S1 or S2 indicates that the candidate composition function has no corresponding inductive invariant. A violation of S1 can only be resolved by changing the composition function so that the abstract trace is no longer feasible. A violation of S2 may, in principle, also be resolved by extending the definition of the composition function so that it is defined for all the abstract states in the counterexample trace.

However, to prevent the need to explore both options, our algorithm maintains the following invariant for every candidate self composition function that it constructs:

###### Claim

Every abstract state that is not in is not reachable w.r.t. the abstract composed program of any composition function that is part of a composition-invariant pair for and .

This property clearly holds for the lock-step composition function, which the algorithm starts with, since for this composition, . As we explain in Corollary 2, it continues to hold throughout the algorithm.

As a result of this property, whenever a candidate composition function does not satisfy condition S1 or S2, it is never the case that needs to be extended to allow the abstract states in to be reachable. Instead, the abstract counterexample obtained in violation of the conditions needs to be eliminated by modifying .

Let be an abstract counterexample of such that and (violating S1) or (violating S2). Any self composition that agrees with on the states in for every