On the Complexity of Pointer Arithmetic in Separation Logic (an extended version)

03/08/2018 ∙ by James Brotherston, et al.

We investigate the complexity consequences of adding pointer arithmetic to separation logic. Specifically, we study extensions of the points-to fragment of symbolic-heap separation logic with various forms of Presburger arithmetic constraints. Most significantly, we find that polynomial-time decidability is already impossible (unless P = NP), even in the minimal case where we allow only conjunctions of simple "difference constraints" (x' ≤ x + k), with k an integer. In that case satisfiability becomes NP-complete, quantifier-free entailment becomes coNP-complete, and quantified entailment becomes Π^p_2-complete (Π^p_2 is the second class in the polynomial-time hierarchy). In fact, we prove that the upper bound stays the same, Π^p_2, even for full pointer arithmetic with a fixed pointer offset. That fragment allows arbitrary Boolean combinations of the elementary formulas (x' = x + k0), (x' ≤ x + k0) and (x' < x + k0), where k0 is a fixed integer. In addition to points-to formulas, it also allows spatial formulas for arrays and lists whose length is bounded by k0, etc. Suppose instead that we allow a significantly more expressive form of pointer arithmetic, namely arbitrary Boolean combinations of elementary formulas over arbitrary pointer sums. Then the complexity increase is relatively modest for satisfiability and quantifier-free entailment, which remain NP-complete and coNP-complete, respectively. By contrast, the complexity appears to increase drastically for quantified entailments.


1 Introduction

Separation logic [23] is a well-known and popular Hoare-style framework for verifying the memory safety of heap-manipulating programs. Its power stems from the use of separating conjunction $*$ in its assertion language, where $A * B$ denotes a portion of memory that can be split into two disjoint fragments satisfying $A$ and $B$ respectively. Using separating conjunction, the frame rule becomes sound [27]. The rule captures the fact that any valid Hoare triple remains valid when its pre- and postconditions are both extended with the same separate piece of memory, and it empowers the framework to scale to large programs (see e.g. [26]). Indeed, separation logic now forms the basis for verification tools used in industrial practice, notably Facebook’s Infer [8] and Microsoft’s SLAyer [3].

Most separation logic analyses and tools restrict the form of assertions to a simple propositional structure known as symbolic heaps [2]. Symbolic heaps are (possibly existentially quantified) pairs of so-called “pure” and “spatial” assertions. Pure assertions mention only equalities and disequalities between variables. Spatial formulas are $*$-conjoined lists of pointer formulas and data structure formulas, typically describing segments of linked lists ($\mathsf{ls}$) or sometimes binary trees. This fragment of the logic enjoys decidability in polynomial time [11] and is therefore highly suitable for use in large-scale analysers. However, in recent years, various authors have investigated the computational complexity of (and/or developed prototype analysers for) many other fragments employing various different assertion constructs. These include user-defined inductive predicates [18, 5, 7, 1, 10], pointers with fractional permissions [22, 13], arrays [6, 19], separating implication [9, 4], reachability predicates [14] and arithmetic [20, 21].

It is this last feature, arithmetic, that concerns us in this paper. In general, assertions involving arithmetic arise naturally, for obvious reasons, when analysing arithmetical programs. Moreover, pointer arithmetic is a standard feature of, e.g., C code: pointers are treated explicitly as numerical addresses that can be manipulated arithmetically. We therefore start by asking the following question: how much pointer arithmetic can one add to separation logic and remain within polynomial time?

Unfortunately, and perhaps surprisingly, the answer turns out to be: essentially none at all.

We study the complexity of symbolic-heap separation logic with pointers, but no other data structures, when pure formulas are extended by arithmetical constraints. We consider two variants. The first encapsulates a minimal language for pointer arithmetic, allowing only conjunctions of “difference constraints” $x' \le x + k$ (where $k$ is an integer). The second is more expressive, allowing arbitrary Boolean combinations of elementary formulas over arbitrary pointer-and-offset sums.

We certainly do not claim that either fragment is appropriate for practical program verification; clearly, lacking constructs for lists or other data structures, they will be insufficiently expressive for most purposes (although they might be practical e.g. for some concurrent programs that deal only with shared memory buffers of a small fixed size). The point is that any practical fragment of separation logic employing arithmetic will almost inevitably include our minimal language and thus inherit its computational lower bounds.

Our complexity results for SL pointer arithmetic are summarised in Table 1. Perhaps our most striking result concerns our minimal SL pointer arithmetic, where only constant pointer offsets and conjunctions are permitted. Even there, the satisfiability problem is already NP-complete. On the other hand, the problem is still in NP when we extend to full pointer arithmetic. However, there is at least one material difference between the two fragments. Minimal pointer arithmetic enjoys the small model property, meaning that any satisfiable symbolic heap $A$ has a model of size polynomial in the size of $A$. This property fails for full pointer arithmetic.

In the case of the entailment problem, the story is somewhat similar. For quantifier-free entailments the problem becomes coNP-complete, irrespective of whether we consider minimal or full pointer arithmetic. However, the complexity appears to increase drastically for quantified entailments. There the problem is $\Pi^p_2$-complete for minimal pointer arithmetic but $\Pi^{\mathrm{EXP}}_1$-complete for full pointer arithmetic. ($\Pi^p_2$ is the second class in the polynomial-time hierarchy [25], and $\Pi^{\mathrm{EXP}}_1$ is the first class in the exponential-time hierarchy, which corresponds to Presburger arithmetic [17].)

Problem | minimal pointer arithmetic | full pointer arithmetic
Satisfiability | NP-complete | NP-complete
Small model property | yes | no
Entailment, quantifier-free | coNP-complete | coNP-complete
Entailment, quantified | $\Pi^p_2$-complete | $\Pi^{\mathrm{EXP}}_1$-complete
Table 1: Summary of complexity results.

The remainder of this paper is structured as follows. In Section 2 we define symbolic-heap separation logic with pointer arithmetic, in both “minimal” and “full” flavours. Sections 3 and 4 study the satisfiability and entailment problems, respectively, for our minimal and full versions of SL pointer arithmetic, establishing upper and lower complexity bounds for all cases. In Section 5 we establish the small model property and thereby the upper bound for the quantified entailments within minimal pointer arithmetic. Section 6 concludes.

2 Separation logic with pointer arithmetic

Here, we introduce our language of separation logic with pointer arithmetic, building on the well-known “symbolic heap” fragment over pointers [2].

Because we have to balance the arithmetical part and the spatial part of the language, we consider two varieties of pointer arithmetic: a “minimal” fragment containing only the bare essentials, and a “full” fragment allowing greater expressivity.

To show lower complexity bounds, we have to contend with the fact that Presburger arithmetic is already NP-hard by itself. Thus, to reveal the genuinely memory-related nature of the problem, we restrict the pure part of our language to something so simple that it can be processed in polynomial time. This leads us to consider minimal pointer arithmetic. Here we allow only conjunctions of ‘difference constraints’ of the form $x' \le x + k$, where $x$ and $x'$ are variables and $k$ is an integer (even negation is not permitted).

On the other hand, for upper complexity bounds, it stands to reason that we should aim for as much expressivity as possible while remaining within a particular complexity class. Thus we also consider full pointer arithmetic, in which arbitrary Boolean combinations of elementary formulas over arbitrary pointer sums are permitted.

Definition 1 (SL pointer arithmetic).

A symbolic heap is given by

$\exists \mathbf{z}.\ \Pi : F$    (1)

where $\mathbf{z}$ is a tuple of variables from an infinite set $\mathrm{Var}$, and $\Pi$ and $F$ are respectively pure and spatial formulas, defined below.

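For full pointer arithmetic, we define terms $t$, pure formulas $\Pi$ and spatial formulas $F$ by the following grammar:

$t ::= x \mid n \mid t + t$
$\Pi ::= t = t \mid t \neq t \mid t \le t \mid t < t \mid \neg\Pi \mid \Pi \wedge \Pi \mid \Pi \vee \Pi$
$F ::= \mathsf{emp} \mid t \mapsto t \mid F * F$

where $x$ ranges over $\mathrm{Var}$ and $n$ over $\mathbb{N}$.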

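For minimal pointer arithmetic, we instead define terms $t$, pure formulas $\Pi$ and spatial formulas $F$ by the following simpler grammar:

$t ::= x \mid x + k$
$\Pi ::= t = t \mid t \le t \mid t < t \mid \Pi \wedge \Pi$
$F ::= \mathsf{emp} \mid t \mapsto t \mid F * F$

where $k$ is an integer offset. Whenever one of $\Pi$, $F$ is empty in a symbolic heap $\exists \mathbf{z}.\ \Pi : F$, we omit the colon.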

In the case of minimal pointer arithmetic, the pure part of a symbolic heap is therefore a conjunction of ‘difference constraints’, such as $x' \le x + k$, relating two variables $x$ and $x'$ via a fixed offset $k \in \mathbb{Z}$. The satisfiability of such conjunctions can be decided in polynomial time; see [12]. The crucial observation is:

Proposition 1.

A ‘circular’ system of difference constraints $x_2 \le x_1 + k_1$, $x_3 \le x_2 + k_2$, …, $x_1 \le x_n + k_n$ allows one to conclude that $0 \le k_1 + k_2 + \dots + k_n$, which is a contradiction iff the latter sum is negative.
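Proposition 1 follows by adding the constraints up. Write the circular system as $x_{i+1} \le x_i + k_i$ for $1 \le i \le n$, reading indices cyclically ($x_{n+1} = x_1$; this indexing is our own). Every variable then occurs exactly once on each side of the sum, so the variables cancel:

```latex
\begin{aligned}
\sum_{i=1}^{n} x_{i+1} &\le \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} k_i
  && \text{(sum of the $n$ constraints)}\\
0 &\le k_1 + k_2 + \dots + k_n
  && \text{(since $\textstyle\sum_i x_{i+1} = \sum_i x_i$)}
\end{aligned}
```

Conversely, a standard fact about difference constraints is that a conjunction of them has an integer solution exactly when no cycle of constraints has a negative sum of offsets. This is the basis of the usual polynomial-time check.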

Thus, by considering symbolic heaps in minimal pointer arithmetic, we shift the challenge of establishing relevant lower bounds to the spatial part of the language.

Semantics. As usual, we interpret symbolic heaps in a stack-and-heap model. For convenience we take locations to be natural numbers, and values to be either natural numbers or the non-addressable null value $\mathsf{nil}$. Thus a stack is a function $s : \mathrm{Var} \to \mathbb{N} \cup \{\mathsf{nil}\}$. We extend stacks over terms as usual: $s(n) = n$ and $s(t_1 + t_2) = s(t_1) + s(t_2)$. If $s$ is a stack, $z$ a variable and $v$ a value, we write $s[z \mapsto v]$ for the stack defined as $s$ except that $s[z \mapsto v](z) = v$. We extend stacks pointwise over term tuples.

A heap is a partial function $h$ mapping finitely many locations to values. We write $\mathrm{dom}(h)$ for the domain of $h$, and $e$ for the empty heap that is undefined on all locations. We write $\circ$ for composition of domain-disjoint heaps. If $h_1$ and $h_2$ are heaps, then $h_1 \circ h_2$ is the union of $h_1$ and $h_2$ when $\mathrm{dom}(h_1)$ and $\mathrm{dom}(h_2)$ are disjoint, and undefined otherwise.
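Heap composition is easy to make concrete. The following minimal Python sketch assumes, for illustration only, that a heap is represented as a dictionary from integer locations to values, with `None` standing for $\mathsf{nil}$. The function `compose` is a hypothetical helper; it returns `None` exactly when $h_1 \circ h_2$ is undefined.

```python
from typing import Dict, Optional

# Hypothetical encoding: a heap is a finite map location -> value,
# with None standing for the null value nil.
Heap = Dict[int, Optional[int]]


def compose(h1: Heap, h2: Heap) -> Optional[Heap]:
    """Return h1 o h2, or None when the composition is undefined."""
    if h1.keys() & h2.keys():
        return None  # the domains overlap, so h1 o h2 is undefined
    return {**h1, **h2}  # union of two domain-disjoint heaps
```

For example, `compose({1: 5}, {2: None})` yields `{1: 5, 2: None}`, while `compose({1: 5}, {1: 7})` is undefined and returns `None`.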

Definition 2.

The satisfaction relation $s, h \models A$, where $s$ is a stack, $h$ a heap and $A$ a symbolic heap, is defined by structural induction on $A$.
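For orientation, here is a sketch of the standard symbolic-heap clauses that such a definition follows. The clauses are assumed here rather than quoted from the paper. Boolean connectives in pure formulas are interpreted as usual.

```latex
\begin{array}{lcl}
s \models t_1 \bowtie t_2 & \iff & s(t_1) \bowtie s(t_2), \quad {\bowtie} \in \{=, \neq, \le, <\}\\
s, h \models \mathsf{emp} & \iff & h = e\\
s, h \models t \mapsto u & \iff & \mathrm{dom}(h) = \{s(t)\} \text{ and } h(s(t)) = s(u)\\
s, h \models F_1 * F_2 & \iff & h = h_1 \circ h_2 \text{ for some } h_1, h_2 \text{ with } s, h_1 \models F_1 \text{ and } s, h_2 \models F_2\\
s, h \models \Pi : F & \iff & s \models \Pi \text{ and } s, h \models F\\
s, h \models \exists z.\, A & \iff & s[z \mapsto v], h \models A \text{ for some value } v
\end{array}
```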

3 Satisfiability

Here we establish upper and lower complexity bounds for the satisfiability problem in both the minimal and full variants of our pointer arithmetic.

Definition 3.

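Let $A$ be a symbolic heap of the form

$\exists \mathbf{z}.\ \Pi : t_1 \mapsto u_1 * \dots * t_n \mapsto u_n$.

We describe the heap models of $A$ by means of the following Presburger formula. It is obtained by enriching the pure part $\Pi$ with the constraints that the allocated addresses $t_1, \dots, t_n$ must be distinct (here $z_1, \dots, z_m$ is the list of all variables):

$\exists z_1 \dots \exists z_m.\ \big(\Pi \wedge \bigwedge_{1 \le i < j \le n} t_i \neq t_j\big)$    (2)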

The above formula can easily be rewritten as a Boolean combination of elementary formulas of the form $x' \bowtie x + k$, with ${\bowtie} \in \{=, \le, <\}$, where the ‘offset’ $k$ is a variable or an integer.

Lemma 1.

Any model $s, h$ of $A$ can be transformed into a model $s$ of formula (2), and vice versa.

Proof.

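By definition, let $s, h$ be a model of $A$ (with $s$ also covering the quantified variables $\mathbf{z}$). Then $s \models \Pi$, and $h$ is the disjoint collection of the corresponding cells:

$h = \{s(t_1) \mapsto s(u_1)\} \circ \dots \circ \{s(t_n) \mapsto s(u_n)\}$    (3)

This implies that the addresses $s(t_1), \dots, s(t_n)$ are pairwise distinct, and hence that $s$ satisfies formula (2).

Conversely, assume that a stack $s$ makes formula (2) true. Then $s \models \Pi$. In addition, we can take $h$ to be the disjoint collection of the cells in accordance with (3), which gives $s, h \models A$. ∎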
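The converse direction of Lemma 1 is constructive: from a stack satisfying (2), we build the heap of (3). The following minimal Python sketch makes several simplifying assumptions of our own. Terms are restricted to the form $x + k$, encoded as pairs `(x, k)`. The value $\mathsf{nil}$ is ignored. The pure part $\Pi$ is assumed to have been checked separately. The names `ev` and `heap_from_stack` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Stack = Dict[str, int]
Heap = Dict[int, int]
Term = Tuple[str, int]  # hypothetical encoding: (x, k) stands for x + k


def ev(s: Stack, t: Term) -> int:
    """Evaluate the term x + k under the stack s."""
    x, k = t
    return s[x] + k


def heap_from_stack(s: Stack, cells: List[Tuple[Term, Term]]) -> Optional[Heap]:
    """Build h = {s(t1) -> s(u1)} o ... o {s(tn) -> s(un)} as in (3).

    Returns None when two addresses coincide. In that case the distinctness
    conjuncts of (2) fail and the composition is undefined.
    """
    h: Heap = {}
    for t, u in cells:
        addr = ev(s, t)
        if addr in h:
            return None  # s(t_i) = s(t_j) for some i != j
        h[addr] = ev(s, u)
    return h
```

With `s = {"x": 10, "y": 12}`, the cells $x \mapsto y$ and $y \mapsto x + 1$ give the heap `{10: 12, 12: 11}`. The cells $x + 2 \mapsto x$ and $y \mapsto x$ collide at address 12, so no heap exists.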

Corollary 1.

Satisfiability is in NP.

Proof.

Follows from Lemma 1 and the fact that satisfiability for quantifier-free Presburger arithmetic belongs to NP [24]. ∎

Satisfiability is shown NP-hard by reduction from the 3-colourability problem [15].

Problem 1 (3-colourability).

Let $G$ be an undirected graph with $n$ vertices $v_1, \dots, v_n$. The 3-colourability problem is to decide whether there is a 3-colouring of its vertices such that no two adjacent vertices share the same colour.
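As a baseline for the reduction, here is a brute-force Python check of Problem 1. It is a sketch for toy graphs only. The function name `three_colouring` and the encoding are our own illustrative choices: vertices are numbered 0 to n−1 and edges are index pairs. The search tries all $3^n$ colourings, consistent with the problem being NP-complete.

```python
from itertools import product
from typing import List, Optional, Tuple


def three_colouring(n: int, edges: List[Tuple[int, int]]) -> Optional[Tuple[int, ...]]:
    """Return a colouring in {1,2,3}^n with no monochromatic edge, or None.

    Exhaustive search over all 3**n candidate colourings.
    """
    for colours in product((1, 2, 3), repeat=n):
        if all(colours[i] != colours[j] for i, j in edges):
            return colours  # a perfect 3-colouring
    return None  # G is not 3-colourable
```

A triangle is 3-colourable, whereas the complete graph on four vertices is not.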

Definition 4.

Let $G$ be an instance graph with $n$ vertices. We encode the perfect 3-colourings of $G$ with the following symbolic heap $A_G$.

For each vertex $v_i$, we use a variable to denote the colour (1, 2 or 3) with which $v_i$ is marked.

To encode the fact that no two adjacent vertices $v_i$ and $v_j$ share the same colour, we use two addresses for two disjoint cells. These addresses are determined by the colours of $v_i$ and $v_j$ and taken relative to a base-offset. To ensure that all allocated cells are disjoint, for $i < j$ we introduce the following numbers:

(4)

Our choice is motivated, in particular, by the needs of Definition 8. There, the quantified symbolic heap is guaranteed to be satisfiable whenever we allow memory chunks long enough to accommodate any of $n$ distinct colours, as used in the trivially realizable $n$-colouring problem.

Proposition 2.

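Let the pairs $(i, j)$ and $(i', j')$ be distinct. Then the cells associated with $(i, j)$ are disjoint from those associated with $(i', j')$.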

Formally, we define $A_G$ to be the following quantifier-free symbolic heap:

(5)

Notice that $A_G$ is in minimal pointer arithmetic.

Lemma 2.

Let $G$ be an instance of the 3-colouring problem. Then $A_G$ from Definition 4 is satisfiable iff there is a perfect 3-colouring of $G$.

Proof.

Any perfect 3-colouring of $G$ yields a model of $A_G$: the stack assigns to the colour variable of each vertex $v_i$ the colour of $v_i$. The corresponding cells are all disjoint because of Proposition 2.

Conversely, given a model of $A_G$, we label each vertex $v_i$ by the colour that the model assigns to its colour variable, providing a perfect 3-colouring of $G$. ∎

Theorem 3.1.

Satisfiability is NP-hard, even for quantifier-free symbolic heaps in minimal pointer arithmetic.

Proof.

From Lemma 2. ∎

Corollary 2.

Satisfiability is NP-complete, even for quantifier-free symbolic heaps in minimal pointer arithmetic.

3.1 About the small model property

As for the size of models of the symbolic heaps in Corollary 1, we establish the small model property for minimal pointer arithmetic (that is [1], any satisfiable formula $A$ has a model of size polynomial in the size of $A$). The property does not hold for full pointer arithmetic; cf. Remark 1.

Remark 1.

By contrast, the small model property fails whenever we allow terms of the form $x + y$, with $y$ a variable.

Let be a symbolic heap of the form (here )

Then we have that  for any model of $A$, which implies  . Thus, all models of $A$ necessarily require (the distances between) at least half of the addresses in $A$ to be of exponential size. ∎
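The specific symbolic heap of Remark 1 is not reproduced in this text. As a hypothetical illustration of the same phenomenon, consider the pure constraints $x_0 \ge 1$ and $x_{i+1} = x_i + x_i$ for $0 \le i < n$. These are expressible in full pointer arithmetic, where terms may be sums of variables, but not in the minimal fragment. The formula has only $n + 1$ constraints, yet every solution has $x_n = 2^n x_0 \ge 2^n$. The Python sketch below (function name ours) computes the least solution.

```python
from typing import List


def least_doubling_chain(n: int) -> List[int]:
    """Least solution of x0 >= 1 and x_{i+1} = x_i + x_i for 0 <= i < n.

    Every solution satisfies x_n = 2**n * x0 >= 2**n, although the
    formula has only n + 1 constraints over n + 1 variables.
    """
    xs = [1]  # x0 = 1 is the least admissible value
    for _ in range(n):
        # The term x_i + x_i adds a variable (not a constant) offset.
        xs.append(xs[-1] + xs[-1])
    return xs
```

A chain of length 10 already forces the value 1024. In the minimal fragment, offsets are fixed integers, so no such doubling is possible.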

In order to prove the small model property, we need a more workable form of the formula from Definition 3:

Definition 5.

Let $A$ be a symbolic heap satisfying the constraints of Theorem 3.2 (that is, in minimal pointer arithmetic). Then we rewrite its formula (2) from Definition 3 as

(6)

where the outer function is a Boolean function. Within (6), each of its Boolean variables is substituted with an elementary formula of the form “$x' \le x + k$”, where $k$ is a fixed integer.

Proposition 3.

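Any model of a symbolic heap $A$ in the form (6) can be determined by a Boolean vector with two properties. First, the Boolean function from (6) evaluates to true on this vector. Second, the following system (7) has an integer solution: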

(7)
Proof.

Given a model of $A$, we evaluate each of the elementary formulas and then obtain the appropriate Boolean vector by means of the equations in (7). ∎

Definition 6.

In its turn, the system (7) will be encoded by a constraint graph, constructed as follows.

With each variable $x$, we associate a node labelled by $x$.

In the case of a constraint $x' \le x + k$, we draw an arrow from the node $x$ to the node $x'$ and label it with $k$.

In the case of its negation $\neg(x' \le x + k)$, which means that “$x \le x' - k - 1$”, we draw the opposite arrow, from the node $x'$ to the node $x$, and label it with the number $-k-1$.

To provide the connectivity we need, we add, if necessary, a “maximum node” $x_{\max}$, with the constraint “$x \le x_{\max}$” for all variables $x$. Cf. Figure 1.

Example 1.

Let be a symbolic heap of the form:

with its being of the form: .

Clearly, , where

In Figure 1 we show the constraint graphs for and , resp. Notice that, because of , the node is a “maximum node” in both cases.

In the case of (a), we have no solution. Namely, there is a negative cycle of the form , which provides a contradictory .

In the case of (b), the minimal weighted path from to is of the weight , which guarantees that is a model for and thereby for .

Figure 1: The small model property: the constraint graphs for the symbolic heap of Example 1 and its corresponding pure formula.
Theorem 3.2 (“the small model property”).

Let $A$ be a satisfiable symbolic heap in minimal pointer arithmetic. Then we can find a model of $A$ in which all values are bounded by $M$, where it suffices to take $M$ as: , where $k$ ranges over all occurrences of numbers in $A$.

Proof.

According to Proposition 3, there is a Boolean vector such that the corresponding system (7) has a solution. Hence, the associated constraint graph has no negative cycles; see Definition 6 and Proposition 1.

We define our small model by the following mapping $s$, which provides an evaluation making the system (7) true. First we set $s(x_{\max}) = M$ for the “maximum node” $x_{\max}$, so that $s(x) \le M$ for all $x$. Then we define $s(x) = M + \delta(x)$, where $\delta(x)$ is the weight of the minimal weighted path leading from $x_{\max}$ to $x$.

E.g., in Example 1 the small model is given by , and . ∎
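The model construction in this proof can be sketched in Python with the Bellman-Ford algorithm, which also appears in Remark 2. Several assumptions here are our own. Each constraint of the system is given as a triple `(xp, x, k)` meaning $x' \le x + k$; negated constraints are assumed to have been rewritten already as in Definition 6. The maximum node is represented by a hypothetical reserved name `TOP`. The code takes $M = -\min_x \delta(x)$, so all values lie in $[0, M]$. Because shortest paths in a graph without negative cycles are simple, $M$ never exceeds the sum of the absolute values of the offsets.

```python
from typing import Dict, List, Optional, Tuple

# (xp, x, k) encodes the difference constraint xp <= x + k.
Constraint = Tuple[str, str, int]

TOP = "__top__"  # hypothetical name for the "maximum node"; assumed unused


def small_model(constraints: List[Constraint]) -> Optional[Dict[str, int]]:
    """Return a model with values in [0, M], or None on a negative cycle."""
    variables = sorted({v for xp, x, _ in constraints for v in (xp, x)})
    # xp <= x + k is the edge x -> xp with weight k; the maximum node
    # contributes x <= TOP, i.e. the edge TOP -> x with weight 0.
    edges = list(constraints) + [(v, TOP, 0) for v in variables]
    delta: Dict[str, float] = {v: float("inf") for v in variables}
    delta[TOP] = 0
    # Bellman-Ford: |V| - 1 relaxation rounds, where |V| = len(variables) + 1.
    for _ in range(len(variables)):
        for xp, x, k in edges:
            if delta[x] + k < delta[xp]:
                delta[xp] = delta[x] + k
    if any(delta[x] + k < delta[xp] for xp, x, k in edges):
        return None  # negative cycle (Proposition 1): no solution
    m = -min(delta.values())  # every delta is <= 0 because of TOP -> x
    return {v: int(m + delta[v]) for v in variables}
```

For instance, $y \le x + 3$ and $x \le y - 1$ yield the model $x = 0$, $y = 1$. The circular system $y \le x + 1$, $x \le y - 2$ has offset sum $-1$ and is rejected.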

Remark 2.

In contrast to Remark 1, Theorem 3.2 remains valid even for full pointer arithmetic, provided we confine ourselves to pointer terms of the form $x + k_0$, with $k_0$ a fixed base-offset. Arbitrary Boolean combinations of the elementary formulas $x' = x + k_0$, $x' \le x + k_0$ and $x' < x + k_0$ are then allowed.

In addition, the corresponding polynomial-time sub-procedures run as shortest-path procedures that allow negative weights (e.g., the Bellman-Ford algorithm), giving polynomials of low degree.

4 Entailment

We now focus on the entailment problem: $A \models B$ holds iff every model of $A$ is also a model of $B$.

Definition 7.

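Let $A$ be a symbolic heap of the form

$\exists \mathbf{x}.\ \Pi : t_1 \mapsto u_1 * \dots * t_n \mapsto u_n$

and let $B$ be a symbolic heap of the form

$\exists \mathbf{y}.\ \Pi' : t'_1 \mapsto u'_1 * \dots * t'_m \mapsto u'_m$,

where both $A$ and $B$ are in minimal pointer arithmetic.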

We express the validity of $A \models B$ (every model of $A$ is also a model of $B$) by means of the following formula:

(8)

where the following formula (9) establishes an isomorphism between the disjoint collection of the cells of $A$ and the disjoint collection of the cells of $B$:

(9)

Each of the above formulas can easily be rewritten as a Boolean combination of elementary formulas of the form $x' \bowtie x + k$, with ${\bowtie} \in \{=, \le, <\}$. Here the ‘offset’ $k$ is a variable or an integer (in the case of minimal pointer arithmetic, $k$ is a fixed integer).

Thus formula (8) can be rewritten as:

(10)

where the quantifier-free part is a Boolean combination of elementary formulas of this form.

Lemma 3.

Any model , which is a counter-model for , can be transformed into a model for , and vice versa.

Proof.

Similar to Lemma 1. ∎

4.1 Upper and Lower Bounds

Here we establish the following upper and lower bounds for the general quantified entailment problem:

  • For full pointer arithmetic, the entailment problem belongs to the class Presburger $\Pi_2$. By this we denote the class of Presburger arithmetic formulas of the form

    $\forall \mathbf{x}\, \exists \mathbf{y}.\ \Phi(\mathbf{x}, \mathbf{y})$    (11)

    with $\Phi$ quantifier-free.
  • For minimal pointer arithmetic, the entailment problem is proved to be $\Pi^p_2$-hard, where $\Pi^p_2$ is the second class in the polynomial-time hierarchy [25].

The crucial difference between Presburger $\Pi_2$ and polynomial $\Pi^p_2$ is that, in the latter, all variables must be polynomially bounded.

Proposition 4.

The entailment problem $A \models B$ with quantified $A$ and $B$ is in Presburger $\Pi_2$.

Proof.

According to Lemma 3, is valid iff the following holds:

(12)

The latter formula belongs to Presburger $\Pi_2$. ∎

The lower bound is the same:

Proposition 5.

Since we allow arbitrary Boolean combinations of the elementary formulas $t = t'$, $t \le t'$ and $t < t'$, we can simulate the class Presburger $\Pi_2$ even within the pure part of our language. This yields Presburger $\Pi_2$-hardness.

Remark 3.

The crucial difference between Presburger $\Pi_2$ and polynomial $\Pi^p_2$ is that, in the latter, all variables must be polynomially bounded.¹

¹ According to Theorem 5.1, let $A$ and $B$ be symbolic heaps in minimal pointer arithmetic. Then $A \models B$ is valid if and only if, within the corresponding form (10) representing (12), all   are bounded by   and all   by  . Here   is defined as:  , with $k$ ranging over all occurrences of these ‘offset’ numbers in $A$ and $B$. Moreover,   is a Boolean combination of the elementary formulas $x' = x + k$, $x' \le x + k$ and $x' < x + k$, where the ‘offset’ $k$ is a fixed integer.

4.2 Quantified minimal arithmetic: A lower bound

To prove $\Pi^p_2$-hardness in the quantified case for minimal pointer arithmetic, we use the following constructions.

2-round 3-colourability problem.

Let $G$ be an undirected graph with $n$ vertices $v_1, \dots, v_n$, and let $v_1, \dots, v_k$ be its leaves. The problem is to decide whether every 3-colouring of the leaves can be extended to a 3-colouring of the whole graph such that no two adjacent vertices share the same colour.
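The alternation in this problem is what places it in $\Pi^p_2$: for every colouring of the leaves, there exists an extension. The following exponential-time Python check makes that $\forall\exists$ shape explicit. It is a sketch for toy instances only. The function name and encoding are our own: vertices are numbered 0 to n−1, and the leaves are given as a list of vertex indices.

```python
from itertools import product
from typing import List, Tuple


def two_round_3col(n: int, leaves: List[int], edges: List[Tuple[int, int]]) -> bool:
    """Does every 3-colouring of the leaves extend to a proper 3-colouring?"""
    leaf_set = set(leaves)
    inner = [v for v in range(n) if v not in leaf_set]
    for leaf_colours in product((1, 2, 3), repeat=len(leaves)):      # for all
        colour = dict(zip(leaves, leaf_colours))
        extendable = False
        for inner_colours in product((1, 2, 3), repeat=len(inner)):  # exists
            colour.update(zip(inner, inner_colours))
            if all(colour[i] != colour[j] for i, j in edges):
                extendable = True
                break
        if not extendable:
            return False  # this leaf colouring defeats every extension
    return True
```

On the path $v_0 - v_1 - v_2$ with leaves $v_0, v_2$, the middle vertex can always avoid the two leaf colours, so the answer is yes. For a star whose three leaves receive the colours 1, 2 and 3, no colour remains for the centre, so the answer is no.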

Definition 8.

Let $G$ be an instance graph with $n$ vertices and $k$ leaves. In addition to the variables of Definition 4, with each edge $(v_i, v_j)$ we associate a variable representing the colour “complementary” to those of $v_i$ and $v_j$.

To encode the fact that no two adjacent vertices $v_i$ and $v_j$ share the same colour, we intend to use three addresses for three consecutive cells within a memory chunk of length 3. These addresses correspond to the colours of $v_i$ and $v_j$ and to the complementary colour, and are taken relative to a base-offset. This forces the three corresponding colours to form a permutation of $\{1, 2, 3\}$. In order to provide sufficient memory to accommodate the disjoint cells in question, we take the numbers as in Definition 4, so that Proposition 2 holds.

Formally, we define $A$ to be the following quantifier-free symbolic heap:

(13)

and $B$ to be the following quantified symbolic heap:

(14)

where the existentially quantified variables are all variables occurring in $B$ that are not mentioned explicitly in $A$.

Notice that both $A$ and $B$ are satisfiable and in minimal pointer arithmetic.

$B$ is satisfiable because it does not impose any bounds on the colours. We can therefore use, for instance, $n$ distinct colours, which suffices to produce a perfect $n$-colouring of any $G$ with $n$ vertices.

Proposition 2 takes care of making the corresponding cells disjoint.

Lemma 4.

Let $G$ be a 2-round 3-colouring instance, and let $A$ and $B$ be the symbolic heaps given by Definition 8. The entailment $A \models B$ is valid iff there is a winning strategy for the perfect 3-colouring of $G$.

Proof.

Suppose that there is a winning strategy, so that every 3-colouring of the leaves can be extended to a perfect 3-colouring of the whole graph $G$. We will prove that $A \models B$.

Let $s, h$ be a stack-heap pair satisfying $A$.

The spatial part of yields a decomposition of  as the disjoint collection of the cells (we recall that and ):

(15)

and   .

Take the 3-colouring of the leaves obtained by assigning the colours encoded in $s$ to the leaves $v_1, v_2, \dots, v_k$, respectively. According to the winning strategy, we can assign colours to the remaining vertices $v_{k+1}, \dots, v_n$. This gives a 3-colouring of the whole $G$ such that no adjacent vertices share the same colour. In addition, we mark each edge by the colour complementary to the colours of its endpoints.

We extend the stack for quantified variables in  so that for all ,

and, for each , we have . The fact that no adjacent vertices and share the same colour means that

is a permutation of

and, as a result, is also a model for :

(16)

For the opposite direction, suppose that $A \models B$. Since $A$ is satisfiable, there is a model $s, h$ of $A$, so that, in particular, $h$ satisfies (15).

We construct the required winning strategy as follows. Assume that a 3-colouring of the leaves is given, assigning colours to the leaves $v_1, v_2, \dots, v_k$, respectively. We modify our original stack $s$ to a stack $s'$ by defining, for each $i \le k$,

which does not change the heap , but provides

It is clear that the modified stack $s'$ is still a model of $A$ and, hence, a model of $B$. Then there is some stack $s''$, extending $s'$ to the existentially quantified variables in $B$, such that $s''$ and $h$ satisfy the quantifier-free part of $B$.

For each ,  , which means that, for , these represent correctly the original -colouring of the leaves.

By assigning the corresponding colours to the remaining vertices $v_{k+1}, \dots, v_n$, respectively, we obtain a 3-colouring of the whole $G$.

The spatial part of the form (16) ensures that, for every edge $(v_i, v_j)$, the colours of $v_i$ and $v_j$ are distinct. Hence no adjacent vertices share the same colour, providing a perfect 3-colouring of $G$. ∎

Theorem 4.1.

The entailment problem $A \models B$ is $\Pi^p_2$-hard, even for quantifier-free satisfiable formulas $A$ and quantified satisfiable formulas $B$, both in minimal pointer arithmetic.

Proof.

Via the 2-round 3-colourability problem, using Lemma 4. ∎

4.3 Quantifier-free Entailment

Theorem 4.2.

The quantifier-free entailment problem $A \models B$ is in coNP.

Proof.

$A \models B$ is not valid iff the following holds:

(17)

At this point, we can follow our proof of Corollary 1 to show that satisfiability of (17) belongs to NP. ∎

Remark 4.

(Cf. Remark 1.) The small model property fails whenever we allow terms of the form $x + y$, with $y$ a variable.

Let and be symbolic heaps of the form (here ), both satisfiable:

and

$A \models B$ is not valid. However, for any polynomial $p$ there is a number $N$ such that, for all $n \ge N$, there is no counter-model of size at most $p(n)$. ∎

Theorem 4.3 (“the small model property”).

Let $A$ and $B$ be quantifier-free symbolic heaps in minimal pointer arithmetic, and suppose that $A \models B$ is not valid. Then we can find a counter-model $s, h$ such that $s, h \models A$ but $s, h \not\models B$, in which all values are bounded by $M$. It suffices to take $M$ as: , where $k$ ranges over all occurrences of numbers in $A$ and $B$.

Proof.

Follow the proof of Theorem 3.2. ∎

As for coNP-hardness, even for minimal pointer arithmetic, we use a construction similar to Definition 4.

Definition 9.

Taking notation from Definition 4, we introduce a satisfiable symbolic heap $A$ of the form:

(18)

and a satisfiable symbolic heap $B$ of the form:

(19)
Lemma 5.

Let $G$ be an instance of the 3-colouring problem. Then $A \models B$ is not valid iff there is a perfect 3-colouring of $G$.

Proof.

Any perfect -colouring of  yields a model for with , which implies that because of required there.

Conversely, the implication of the fact that, for some model , we have and is that is false. With the additional provides a perfect -colouring of . ∎

Theorem 4.4.

The entailment problem $A \models B$ is coNP-hard, even for quantifier-free satisfiable formulas $A$ and $B$, both in minimal pointer arithmetic.

Corollary 3.

The entailment problem $A \models B$ is coNP-complete, even for quantifier-free satisfiable formulas $A$ and $B$, both in minimal pointer arithmetic.

5 Quantified entailments: The upper bound

The lower bound is given in Theorem 4.1. For quantified entailments in minimal pointer arithmetic, we establish here, in Theorem 5.1, a matching upper bound of $\Pi^p_2$, as well as the small model property.

In fact, we prove that the upper bound stays the same, so that entailment in minimal pointer arithmetic is $\Pi^p_2$-complete. The $\Pi^p_2$ upper bound holds even for full pointer arithmetic with a fixed pointer offset $k_0$, where $k_0$ is a fixed integer. That fragment allows arbitrary Boolean combinations of the elementary formulas $x' = x + k_0$, $x' \le x + k_0$ and $x' < x + k_0$. In addition to points-to formulas, it also allows spatial formulas for arrays and lists whose length is bounded by $k_0$.

5.1 Entailment: A running example

Example 2.

With this example, we illustrate the crucial steps on the road to a smaller model.

Assuming, for simplicity, , let be of the form

(20)

and be of the form

(21)

Then in fact is a conjunction

(22)

and, by Definition 6, we can also construct the corresponding constraint graph, whose labelled edges are as follows: