A Combined Approach for Constraints over Finite Domains and Arrays

12/01/2013, by Sébastien Bardin et al. (CEA; Simula Research Lab)

Arrays are ubiquitous in the context of software verification. However, effective reasoning over arrays is still rare in CP, as local reasoning is dramatically ill-conditioned for constraints over arrays. In this paper, we propose an approach combining both global symbolic reasoning and local consistency filtering in order to solve constraint systems involving arrays (with accesses, updates and size constraints) and finite-domain constraints over their elements and indexes. Our approach, named FDCC, is based on a combination of a congruence closure algorithm for the standard theory of arrays and a CP solver over finite domains. The tricky part of the work lies in the bi-directional communication mechanism between both solvers. We identify the significant information to share, and design ways to master the communication overhead. Experiments on random instances show that FDCC solves more formulas than any portfolio combination of the two solvers taken in isolation, while overhead is kept reasonable.


1 Introduction

Context. Constraint resolution is an emerging trend in software verification [32], either to automatically generate test inputs or to formally prove some properties of a program. Program analysis involves solving so-called Verification Conditions, i.e. checking the satisfiability of a formula, either by providing a solution (sat) or by showing there is none (unsat). While most techniques are based on SMT (Satisfiability Modulo Theory), a few verification tools [7, 8, 16, 18, 21, 26] rely on Constraint Programming over Finite Domains, denoted CP(FD). CP(FD) is appealing here because it makes it possible to reason about some fundamental aspects of programs that are notoriously difficult to handle, like floating-point numbers [5], bounded non-linear integer arithmetic, modular arithmetic [19, 22] or bitvectors [9]. Some experimental evaluations [9, 16] suggest that CP(FD) can be an interesting alternative to SMT for certain classes of Verification Conditions.

The problem. Yet the effective use of CP(FD) in program verification is limited by the absence of effective methods to handle complex constraints over arrays. Arrays are non-recursive data structures that can be found in most programming languages and thus, checking the satisfiability of formulas involving arrays is of primary importance in program verification. Moreover, resolution techniques for constraints involving arrays can often be leveraged to handle constraints over data types like maps [10] and memory heaps [12].

While array accesses have long been handled through the Element constraint [23], array updates have been dealt with only recently [14], and in both cases the reasoning relies only on local consistency filtering. This is insufficient for constraints involving the long chains of accesses and updates arising in program verification.

On the other hand, the theory of arrays is well known in theorem proving [10, 24]. Yet, this theory can express neither size constraints over arrays nor domain constraints over elements and indexes. A standard solution is to consider a combination of two decision procedures, one for the array part and one for the index and element part, through a standard cooperation framework like the Nelson-Oppen (NO) scheme [29]. Indeed, under some theoretical conditions, NO provides a means to build a decision procedure for a combined theory T1 ∪ T2 from existing decision procedures for T1 and T2. Unfortunately, finite-domain constraints cannot be integrated into NO, since eligible theories must have an infinite model [29].

Contributions. This paper addresses the problem of designing an efficient CP(FD) approach for solving conjunctive quantifier-free formulas combining fixed-size arrays and finite-domain constraints over indexes and elements. Our main guidelines are (1) to combine global symbolic deduction mechanisms with local consistency filtering in order to achieve better deductive power than both techniques taken in isolation, (2) to keep the communication overhead as low as possible, while going beyond a purely portfolio combination of the two approaches, (3) to design a combination scheme allowing any existing FD solver to be reused in a black-box manner, with a minimal and easy-to-implement API. Our main contributions are the following:

  • We design fdcc, an original decision procedure built upon a lightweight congruence closure algorithm for the theory of arrays, called cc in the paper, interacting with a CP(FD) solver based on local consistency filtering, called fd. To the best of our knowledge, this is the first collaboration scheme combining a finite-domain CP solver and a congruence closure solver for array constraint systems. Moreover, the combination scheme, while more intrusive than NO, remains high-level: fd can be used in a black-box manner through a minimal API, and large parts of cc are standard.

  • We bring new ideas to make both solvers cooperate through bi-directional constraint exchanges and synchronisations. We identify important classes of information to be exchanged, and propose ways of doing so efficiently: on one side, the congruence closure algorithm can send equalities, disequalities and Alldifferent constraints to fd, while on the other side, fd can deduce new equalities and disequalities from local consistency filtering and send them to cc. In order to master the communication overhead, a supervisor explicitly queries the most expensive computations, while cheaper deductions are propagated asynchronously.

  • We propose an implementation of our approach written on top of SICStus clpfd. Through experimental results on random instances, we show that fdcc systematically solves more formulas than cc and fd taken in isolation. fdcc even performs better than the best possible portfolio combination of the two solvers. Moreover, fdcc shows only a reasonable overhead over cc and fd. This is particularly interesting in a verification setting, since it means that fdcc can be clearly preferred to the standard fd-handling of arrays in any context, i.e. whether we want to solve a few complex formulas or to solve as many formulas as possible in a short amount of time.

  • We discuss how the fdcc framework can handle other array-like structures of interest in software verification, namely uniform arrays, arrays with non-fixed (but bounded) size and maps. Notably, this can be achieved without any change to the framework, by considering only extensions of the cc and fd solvers.

Outline. The rest of the paper is organised as follows. Section 2 introduces running examples used throughout the paper. Section 3 presents a few preliminaries while Section 4 describes the theory of arrays and its standard decision procedures. Section 5 describes our technique to combine congruence closure with a finite domain constraint solver. Section 6 presents our implementation fdcc and experimental results. Section 7 describes extensions to richer array-like structures. Section 8 discusses related work. Finally, Section 9 concludes the paper.

2 Motivating examples

Prog1:
    int A[100]; ...
    int e = A[i];
    int f = A[j];
    if (e != f && i == j) { ...

Prog2:
    int A[2]; ...
    int e = A[i];
    int f = A[j];
    int g = A[k];
    if (e != f && e != g && f != g) { ...
Figure 1: Programs with arrays

We use the two programs of Fig. 1 as running examples. First, consider the problem of generating a test input satisfying the decision in program Prog1 of Fig. 1. This involves solving a constraint system with array accesses, namely

e = A[i] ∧ f = A[j] ∧ e ≠ f ∧ i = j    (1)

where A is an array of variables of size 100, and x ≠ y means ¬(x = y). A model of this constraint system written in OPL for CP Optimizer [35] did not provide us with an answer within minutes of CPU time on a standard machine. In fact, as only local consistencies are used in the underlying solver, the system cannot infer that e = f is implied by the first three constraints. On the contrary, an SMT solver such as Z3 [28] immediately reports unsat, using a global symbolic decision procedure for the standard theory of arrays.
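For illustration, system (1) can be checked with the Python bindings of Z3 (this encoding is ours, not part of the paper); the symbolic array reasoning reports unsat without any domain information:

    # Hypothetical encoding of system (1) with Z3's Python API.
    from z3 import Array, IntSort, Ints, Select, Solver

    A = Array('A', IntSort(), IntSort())
    i, j, e, f = Ints('i j e f')

    s = Solver()
    s.add(e == Select(A, i), f == Select(A, j), e != f, i == j)
    # i == j forces Select(A, i) == Select(A, j), hence e == f: contradiction.
    print(s.check())  # unsat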

Second, consider the problem of producing a test input satisfying the decision in program Prog2 of Fig. 1. It requires solving the following constraint system:

e = A[i] ∧ f = A[j] ∧ g = A[k] ∧ e ≠ f ∧ e ≠ g ∧ f ≠ g    (2)

where A is an array of size 2. A symbolic decision procedure considering only the standard theory of arrays (wrongly) returns a sat answer here, while the formula is unsatisfiable: since A has only two cells, e, f and g cannot take three distinct values. To address the problem, a symbolic approach for arrays must be combined with an explicit encoding of all possible values of indexes. However, this encoding is expensive, requiring the addition of many disjunctions (through enumeration). On this example, a CP solver over finite domains can also fail to return unsat in a reasonable amount of time if it starts labelling on elements instead of indexes, as nothing prevents it from considering constraint stores where i = j, i = k or j = k: there is no global reasoning over arrays able to deduce from i = j that e = f.
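A sketch of system (2) in the same (hypothetical) Z3 encoding shows both behaviours discussed above: the pure array theory alone answers sat, while adding explicit index bounds makes it answer unsat:

    # System (2) without and with the index bounds, again in Z3 (our encoding).
    from z3 import And, Array, IntSort, Ints, Select, Solver

    A = Array('A', IntSort(), IntSort())
    i, j, k, e, f, g = Ints('i j k e f g')
    core = [e == Select(A, i), f == Select(A, j), g == Select(A, k),
            e != f, e != g, f != g]

    s = Solver(); s.add(core)
    print(s.check())  # sat: no size information about A

    s = Solver(); s.add(core)
    s.add(And(0 <= i, i < 2, 0 <= j, j < 2, 0 <= k, k < 2))  # A has size 2
    print(s.check())  # unsat: two indexes must coincide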

3 Background

We describe hereafter a few theories closely related to the theory of arrays, the standard congruence closure algorithm and the basics of constraint programming. We also recall a few facts about decision procedure combination. Unless otherwise stated, we consider only conjunctive fragments of quantifier-free theories.

3.1 Theory of equality and theory of uninterpreted functions

A logical theory is a first-order language with a restricted set of permitted functions and predicates, together with their axiomatizations. We present here two standard theories closely related to the theory of arrays (presented in Section 4.1): the theory of equality E and the theory of equality with uninterpreted functions EUF.

  • E has signature Σ = {=}, i.e., the only available predicate is (dis-)equality and no function symbol is allowed. Formulas in E are conjunctions of equalities x = y and disequalities x ≠ y, where variables are uninterpreted in the sense that they do not range over any implicit domain.

  • EUF extends E with signature Σ = {=, f, g, …}, where f, g, … are function symbols. Formulas in EUF are conjunctions of (dis-)equalities between terms built from variables and function applications. Variables and functions are uninterpreted, i.e., the only assumption about any function f is its functional consistency (FC): x = y ⟹ f(x) = f(y).¹ (¹ EUF does not assume a free algebra of terms (as Prolog does), allowing for example solutions to the constraint f(x) = x. EUF can be extended with a free-algebra assumption.)

While not very expressive, E and EUF enjoy polynomial-time satisfiability checking. Standard decision procedures are based on congruence closure (Section 3.2). Note that allowing disjunctions makes the satisfiability problem NP-complete.

Interpreting variables. While variables are uninterpreted, it is straightforward to encode a finite set of constant values by introducing one fresh variable per constant, together with the corresponding pairwise disequalities between these variables. Adding domains to variables is more involved. Finite-domain constraints can be explicitly encoded with disjunctions (x ∈ {v1, …, vk} translates into x = v1 ∨ … ∨ x = vk), but the underlying satisfaction problem becomes NP-complete. For variables defined over an arbitrary theory T, one has to consider the combined theory EUF ∪ T. The DPLL(T) framework and the Nelson-Oppen combination scheme can be used to recover decision procedures for the combination from available decision procedures over T, E and EUF (see Section 3.3).

3.2 The congruence closure algorithm

The congruence closure algorithm aims at computing all equivalence classes of a relation over a set of terms [30]. It also provides a decision procedure for the theory E. The algorithm relies on a union-find structure to represent the set of all equivalence classes. Basically, each equivalence class has a unique witness and each term is (indirectly) linked to its witness. Adding an equality between two terms amounts to choosing the witness of one term to be the witness of the other. A disequality inside the same equivalence class leads to unsat. Smart handling of “witness chains” through ranking and path compression ensures very efficient (quasi-linear time) implementations. We sketch such an algorithm in Fig. 2. Each initial variable x is associated with two fields: witness(x) and rank(x). Initially, witness(x) = x and rank(x) = 0. Path compression is visible at line 3 of the find procedure. The ranking optimisation amounts to computing the rank of each variable, and choosing the variable with the larger rank as the new witness in merge.

function merge(x, y):
    wx := find(x) ;
    wy := find(y) ;
    if wx == wy then skip ;
    else if rank(wx) > rank(wy) then
        witness(wy) := wx ;
    else if rank(wx) < rank(wy) then
        witness(wx) := wy ;
    else
        witness(wy) := wx ;
        rank(wx) := rank(wx) + 1 ;
    return ;

function find(x):
    if witness(x) != x then
        witness(x) := find(witness(x)) ;   // path compression
    return witness(x) ;

function init(x):
    witness(x) := x ;
    rank(x) := 0 ;

Figure 2: Congruence closure algorithm
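For concreteness, a minimal runnable sketch of this union-find core (in Python, with our own naming; not the paper's code):

    # Union-find with path compression and union by rank, as in Fig. 2.
    class UnionFind:
        def __init__(self):
            self.witness = {}   # term -> its current witness (parent)
            self.rank = {}

        def init(self, x):
            self.witness.setdefault(x, x)
            self.rank.setdefault(x, 0)

        def find(self, x):
            self.init(x)
            if self.witness[x] != x:
                self.witness[x] = self.find(self.witness[x])  # path compression
            return self.witness[x]

        def merge(self, x, y):
            wx, wy = self.find(x), self.find(y)
            if wx == wy:
                return
            if self.rank[wx] < self.rank[wy]:
                wx, wy = wy, wx             # the larger rank becomes the witness
            self.witness[wy] = wx
            if self.rank[wx] == self.rank[wy]:
                self.rank[wx] += 1

    # Example: x = y and y = z imply that x and z share a witness.
    uf = UnionFind()
    uf.merge('x', 'y'); uf.merge('y', 'z')
    assert uf.find('x') == uf.find('z')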

The algorithm presented so far works for E and can be extended to EUF with only slight modifications taking sub-terms into account [30]. The procedure remains polynomial-time.

3.3 Combining solvers

The Nelson-Oppen cooperation scheme (NO) makes it possible to combine two solvers S1 and S2 for theories T1 and T2 into a solver for the combined theory T1 ∪ T2. Theories T1 and T2 are essentially required [29] to be disjoint (they may share only the = and ≠ predicates) and stably infinite (whenever a model of a formula exists, an infinite model must exist as well). Suitable theories include E, EUF, the theory of arrays and the theory of linear (integer) arithmetic. However, finite-domain constraints do not satisfy these assumptions. Moreover, in the case of non-convex theories (including arrays and linear integer arithmetic), theory solvers must be able to propagate all implied disjunctions of equalities, which is harder than satisfiability checking [4].

The DPLL(T) framework [34] takes advantage of a DPLL SAT-solver in order to leverage a solver for the conjunctive fragment of a theory T into a solver for arbitrary boolean combinations over T. Propagation of implied disjunctions of equalities in NO is reduced to the propagation of implied equalities, at the price of letting DPLL decide (and potentially backtrack) over all possible equalities between variables.

3.4 Constraint Programming over Finite Domains

Constraint Programming over Finite Domains, denoted CP(FD), deals with solving satisfiability or optimisation problems for constraints defined over finite-domain variables. Standard CP(FD) solvers interleave two processes, namely local consistency filtering and labelling search. Filtering narrows the domains of possible values of variables, removing some of the values that do not participate in any solution. When no more filtering is possible, search and backtracking take place. These procedures can be seen as generalisations of the DPLL procedure.

Let V be a finite set of values. A constraint satisfaction problem (CSP) over V is a triplet ⟨X, D, C⟩ where the domain D is a finite Cartesian product D = D1 × … × Dn with each Di ⊆ V, X is a finite set of variables {x1, …, xn} such that each variable xi ranges over Di, and C is a finite set of constraints such that each constraint c is associated with a set of solutions sol(c) ⊆ D. The set of solutions of ⟨X, D, C⟩ is equal to Sol = ⋂_{c ∈ C} sol(c). A value v of Di participating in a solution of ⟨X, D, C⟩ is called a legal value, otherwise it is said to be spurious. In other words, the set of legal values of xi is defined as the i-th projection of Sol. A propagator refines a CSP ⟨X, D, C⟩ into another CSP ⟨X, D′, C⟩ with D′ ⊆ D. A propagator is correct (or ensures correct propagation) if every value it removes is spurious. The use of correct propagators ensures that no legal value is lost during propagation, which in turn ensures that no solution is lost, i.e. Sol′ = Sol.

Local consistency filtering considers each constraint individually to filter the domain of each of its variables. Several local consistency properties can be used; the most common are domain- and bound-consistency [15]. Such propagators are considered an interesting trade-off between strong pruning and fast propagation.

4 Array constraints

We now present the (pure) theory of arrays A (no domain nor size constraints), two standard symbolic procedures for deciding the satisfiability of A-formulas, and how CP(FD) can be used to handle a variation of A that adds finite domains to indexes and elements while fixing array sizes.

4.1 The theory of arrays

The theory of arrays A has signature Σ = {select, store, =, ≠}, where select(A, i) returns the value of array A at index i and store(A, i, e) returns the array obtained from A by putting element e at index i, all other elements remaining unchanged. A is typically described using the read-over-write semantics [10, 24]. Besides the standard axioms of equality, three axioms dedicated to select and store are considered (cf. Figure 3). Axiom FC is an instance of the classical functional consistency axiom, while RoW-1 and RoW-2 are two variations of the read-over-write principle (RoW).

(FC)     i = j ⟹ select(A, i) = select(A, j)
(RoW-1)  i = j ⟹ select(store(A, i, e), j) = e
(RoW-2)  i ≠ j ⟹ select(store(A, i, e), j) = select(A, j)

Figure 3: Axioms for the theory of arrays

Note that A by itself does not express anything about the size of arrays, and both indexes and elements are uninterpreted (no implicit domain). Moreover, the theory is non-extensional, meaning that it cannot reason about arrays themselves. For example, select(A, i) = select(B, j) is permitted, while A = B and A ≠ B are not. Yet, array formulas are difficult to solve: the satisfiability problem for the conjunctive fragment is already NP-complete [17].

Modelling program semantics. We give here a small taste of how A can be used to model the behaviour of programs with arrays. More details can be found in the literature [12]. There are two main differences between arrays found in imperative programming languages such as C and the “logical” arrays defined in A. First, logical arrays have no size constraints while real-life arrays have a fixed size. The standard solution here is to combine A with arithmetic constraints expressing that each select or store index must be smaller than the size of the array, each array being coupled with a variable representing its size. Second, real-life arrays can be accessed beyond their bounds, leading to typical bugs. Such buggy accesses are usually not directly taken into account in the formal modelling, in order to avoid the subtleties of reasoning over undefined values. The preferred approach is to add extra verification conditions asserting that all array accesses are valid, and to verify the program specifications separately (assuming all array accesses are within bounds).

4.2 Symbolic algorithms for the theory of arrays

Symbolic decision procedures for A rely on the congruence closure algorithm shown above. There are two main classes of procedures [10, 24]:

  • Create a dedicated A-solver by extending the congruence closure algorithm with rewriting rules inspired by the array axioms. Case splits are required for dealing with the RoW axioms, leading to an exponential-time algorithm.

  • Rely on an EUF-solver by encoding all store operations with select and if-then-else expressions (ite). For example, select(store(A, i, e), j) is rewritten into ite(i = j, e, select(A, j)). The transformation introduces disjunctions, leading to an exponential-time algorithm (a small sketch of this rewriting is given below).
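A minimal sketch of this rewriting (Python, with our own tuple-based term representation; not the paper's code):

    # store-elimination: select(store(A, i, e), j) --> ite(i = j, e, select(A, j)).
    def elim_store(t):
        if not isinstance(t, tuple):
            return t
        if t[0] == 'select' and isinstance(t[1], tuple) and t[1][0] == 'store':
            _, (_, a, i, e), j = t
            return ('ite', ('=', i, j), elim_store(e),
                    elim_store(('select', a, j)))
        return (t[0],) + tuple(elim_store(s) for s in t[1:])

    print(elim_store(('select', ('store', 'A', 'i', 'e'), 'j')))
    # -> ('ite', ('=', 'i', 'j'), 'e', ('select', 'A', 'j'))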

4.3 Fixed-size arrays and Constraint Programming

A variant of A can be dealt with in CP(FD): arrays are restricted to a fixed and known size, while finite-domain constraints over indexes and elements are natively supported.

A logical fixed-size array is encoded explicitly in CP(FD) solvers by a fixed-size array (or sequence) of finite-domain variables. The select constraint is typically handled by the constraint element [23]. The constraint element(I, A, E) holds iff A[I] = E, where I and E are finite-domain variables and A is a fixed-size array. Local consistency filtering algorithms are available for element at quadratic cost [13]. Filtering algorithms for store constraints have been proposed in [14], with applications to software testing. The store constraint is handled in CP(FD) by creating a new array of finite-domain variables and using filtering algorithms based on the contents of both arrays. Two such filtering algorithms, for select and store, are described in Section 5, Figure 7.

Aside from dedicated propagators, store can also be removed altogether through the introduction of reified case splits (conditional constraints), following the method of Section 4.2. Yet, this is notoriously inefficient here because of the absence of adequate global filtering.

Terminology. In this article, we consider filtering over element as implementing local reasoning, while global reasoning refers to deduction mechanisms working on a global view of the constraint system, e.g. taking into account all select/store constraints. We will also use the generic terms Access and Update to refer to any correct filtering algorithm for select and store over finite domains.

5 Combining cc and fd

We present here our combination procedure for handling formulas over arrays and finite-domain indexes and elements. The resulting decision procedure natively supports finite-domain constraints and combines global symbolic reasoning with local domain filtering. Moreover, we can reuse existing FD solvers in a black-box manner through a minimal API.

5.1 Overview

Our approach is based on combining symbolic global reasoning for arrays and local filtering. The framework, sketched in Fig. 4, is built over three main ingredients:

  1. local filtering for arrays plus constraints on elements and indexes, named fd,

  2. a lightweight global symbolic reasoning procedure over arrays, named cc,

  3. a new bi-directional communication mechanism between fd and cc.

Let φ be a conjunction of equalities, disequalities, array accesses (select) and updates (store), constraints on the size of arrays and other (arbitrary) constraints over elements and indexes. Our procedure takes φ as input, and returns a verdict that can be either sat or unsat. First, the formula is preprocessed and dispatched between cc and fd. More precisely, equalities and disequalities as well as array accesses and updates go to both solvers, while constraints over elements and indexes go only to fd. The two solvers exchange the following information (Fig. 4): cc can communicate new equalities and disequalities among variables to fd, as well as sets of variables being all different (i.e., cliques of disequalities); fd can also communicate new equalities and disequalities to cc, based on domain analysis of variables. The communication mechanism and the decision procedures are described more precisely in the rest of this section.

Figure 4: A bi-directional process for combining cc and fd

5.2 The cc decision procedure

We adapt the standard congruence closure algorithm into a semi-decision procedure cc for arrays. By semi-decision procedure, we mean that the deductions made by the procedure are correct w.r.t. the array axioms but may not be sufficient to conclude sat or unsat. cc is correct (its verdict can be trusted) but not complete (it may output “maybe”). For the sake of clarity, we refine the set of array axioms given in Section 4.1 into an equivalent set of six operational rules (cf. Figure 5), taking the axioms and their contrapositives into account.

(FC-1)     i = j ⟹ select(A, i) = select(A, j)
(FC-2)     select(A, i) ≠ select(A, j) ⟹ i ≠ j
(RoW-1-1)  i = j ⟹ select(store(A, i, e), j) = e
(RoW-1-2)  select(store(A, i, e), j) ≠ e ⟹ i ≠ j
(RoW-2-1)  i ≠ j ⟹ select(store(A, i, e), j) = select(A, j)
(RoW-2-2)  select(store(A, i, e), j) ≠ select(A, j) ⟹ i = j

Figure 5: Rules for array axioms

We adapt the congruence closure algorithm in order to handle these six rules.

  • Rules FC-1 and FC-2 are commonly handled with a slight extension of congruence closure [30], taking sub-terms into account. Each term is now equipped with two sets denoting its direct super-terms and its direct sub-terms.

  • To cope with rules RoW-1-1 to RoW-2-1, we add a mechanism of delayed evaluation: for each term select(store(A, i, e), j), we put the pairs ⟨i = j, select(store(A, i, e), j) = e⟩, ⟨select(store(A, i, e), j) ≠ e, i ≠ j⟩ and ⟨i ≠ j, select(store(A, i, e), j) = select(A, j)⟩ in a watch list. Whenever the left-hand side of a pair in the watch list can be proved, we deduce that the corresponding right-hand side constraint holds.

  • For RoW-2-2, we also rely on delayed evaluation, with the pair ⟨select(store(A, i, e), j) ≠ select(A, j), i = j⟩, but only if the term select(A, j) is syntactically present in the formula.

While implied disequalities are left implicit in standard congruence closure, we close the set of disequalities (through FC-2 and RoW-1-2) in order to benefit as much as possible from rules RoW-2-1 and RoW-2-2. The whole procedure is described in Figure 6. For the sake of conciseness, a few simplifications have been made: we did not include the ranking optimisation of congruence closure (cf. Section 3.2); the unsatisfiability check check_unsat() is performed at the end of the main function cc, while it could be performed on the fly when merging equivalence classes or adding a disequality; the watch list should be split into one list of watched pairs per equivalence class, allowing function check_wl() to iterate only over watched pairs corresponding to modified equivalence classes.

This polynomial-time procedure is clearly not complete (recall that the satisfiability problem over arrays is NP-complete), but it implements a nice trade-off between standard congruence closure (no array axiom taken into account) and full closure at exponential cost (introduction of case splits for the RoW-* rules).


global variable wl := ∅ ;    // watch list, elements of the form ⟨cond, concl⟩
global variable todo := ∅ ;  // work list, elements of the form x = y or x ≠ y

function cc(c):                                 // c is an atomic constraint
    todo := {c} ;
    while todo ≠ ∅ do
        choose t ∈ todo ; todo := todo − {t} ;
        update_wl(t) ;
        switch t do
            case x = y:
                merge(x, y) ; close_eq(x, y) ;  // updates todo (rule FC-1)
            case x ≠ y:
                wx := find(x) ; wy := find(y) ;
                diffs(wx) := diffs(wx) ∪ {wy} ; diffs(wy) := diffs(wy) ∪ {wx} ;
                close_diff(wx, wy) ;            // updates todo (rule FC-2)
        check_wl() ;                            // updates wl and todo (rules RoW-*)
    if check_unsat() then return UNSAT else return OK ;

function merge(x, y):
    wx := find(x) ; wy := find(y) ; witness(wy) := wx ;
    diffs(wx) := diffs(wx) ∪ diffs(wy) ;
    supt(wx) := supt(wx) ∪ supt(wy) ;
    subt(wx) := subt(wx) ∪ subt(wy) ;

function find(x):
    if witness(x) ≠ x then witness(x) := find(witness(x)) ;
    return witness(x) ;

function equal(t, t′): return find(t) == find(t′) ;
function diff(t, t′):  return find(t′) ∈ diffs(find(t)) ;

function close_eq(x, y):   // rule FC-1: from i = j, deduce select(A, i) = select(A, j)
    forall select(A, i) in the class of x and select(A, j) in the class of y, over the same array A, do
        todo := todo + {select(A, i) = select(A, j)} ;

function close_diff(wx, wy):   // rule FC-2: from select(A, i) ≠ select(A, j), deduce i ≠ j
    forall select(A, i) in class wx and select(A, j) in class wy, over the same array A, do
        todo := todo + {i ≠ j} ;

function update_wl(t):                          // T is the set of all terms seen so far
    forall terms select(store(A, i, e), j) ∈ T introduced by t do
        wl := wl ∪ {⟨i = j, select(store(A, i, e), j) = e⟩} ;
        wl := wl ∪ {⟨select(store(A, i, e), j) ≠ e, i ≠ j⟩} ;
        wl := wl ∪ {⟨i ≠ j, select(store(A, i, e), j) = select(A, j)⟩} ;
        if select(A, j) ∈ T then
            wl := wl ∪ {⟨select(store(A, i, e), j) ≠ select(A, j), i = j⟩} ;
    forall terms select(A, j) ∈ T introduced by t do
        if select(store(A, i, e), j) ∈ T for some i, e then
            wl := wl ∪ {⟨select(store(A, i, e), j) ≠ select(A, j), i = j⟩} ;

function check_wl():                            // rules RoW-*
    forall ⟨cond, concl⟩ ∈ wl do
        b := false ;
        switch partial_eval(cond) do
            case true:    todo := todo + {concl} ; b := true ;
            case false:   b := true ;
            case unknown: skip ;
        if b then wl := wl − {⟨cond, concl⟩} ;

function partial_eval(c):
    switch c do
        case x = y:
            if equal(x, y) then return true ;
            else if diff(x, y) then return false ;
            else return unknown ;
        case x ≠ y:
            if equal(x, y) then return false ;
            else if diff(x, y) then return true ;
            else return unknown ;

function check_unsat():    // looks for a contradiction among all terms seen so far
    forall pairs of terms t, t′ with equal(t, t′) do
        if diff(t, t′) then return true ;
    return false ;


Figure 6: The cc procedure

5.3 The fd decision procedure

We use existing propagators and domains for constraints over finite domains. Our approach requires at least array constraints for the select and store operations, and support of the Alldifferent constraint [31] is a plus. An overview of propagators for Access and Update is provided in Figure 7, where the propagators are written in a simple pseudo-language. I and E are variables, while A and A′ are (finite) arrays of variables. Each variable X comes with a finite domain D(X) (here a finite set). Set operations have their usual meaning, X == Y (resp. X =!= Y) makes variables X and Y equal (resp. different), integer(X)? is true iff X is instantiated to an integer value, and success indicates that the constraint is satisfied.

Access(A, I, E) : fixpoint(
    integer(I)?  A[I] == E, success ;
    D(E) := D(E) ∩ ⋃_{k ∈ D(I)} D(A[k]) ;
    D(I) := D(I) ∩ { k | D(A[k]) ∩ D(E) ≠ ∅ }
)

Update(A, I, E, A′) : fixpoint(
    integer(I)?  A′[I] == E, forall k ≠ I do A′[k] == A[k], success ;
    D(E) := D(E) ∩ ⋃_{k ∈ D(I)} D(A′[k]) ;
    D(I) := D(I) ∩ { k | D(A′[k]) ∩ D(E) ≠ ∅ } ;
    forall k ∉ D(I) do A′[k] == A[k] ;
    forall k ∈ D(I) do D(A′[k]) := D(A′[k]) ∩ (D(A[k]) ∪ D(E)) ;
    forall k ∈ D(I) do if (D(A[k]) ∩ D(A′[k]) = ∅) then I == k
)
Figure 7: Standard implementations of constraints Access and Update
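For concreteness, here is a set-based Python sketch of the Access filtering above (names are ours; the paper's propagators are implemented in clpfd):

    # Iterate the two Access filtering rules of Fig. 7 to a fixpoint.
    def access_propagate(dom_a, d_i, d_e):
        """dom_a: list of element domains; d_i, d_e: index/element domains."""
        changed = True
        while changed:
            changed = False
            new_e = (d_e & set().union(*(dom_a[k] for k in d_i))) if d_i else set()
            new_i = {k for k in d_i if dom_a[k] & new_e}
            if new_e != d_e or new_i != d_i:
                d_e, d_i, changed = new_e, new_i, True
        return d_i, d_e  # an empty domain means failure

    # E = A[I] with A[0] in {1,2}, A[1] in {3,4}, I in {0,1}, E in {3}:
    # filtering prunes I to {1}.
    print(access_propagate([{1, 2}, {3, 4}], {0, 1}, {3}))  # ({1}, {3})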

5.4 Cooperation between cc and fd 

The cooperation mechanism involves both to know which kind of information can be exchanged, and how the two solvers synchronise together. Our main contribution here is twofold: we identify interesting information to share, and we design a method to tame the communication cost.

Communication from cc to fd. Our implementation of cc maintains the set of disequalities, and therefore both equalities and disequalities can be transmitted to fd. Interestingly, disequalities can be communicated through Alldifferent constraints in order to increase the deduction capabilities of fd. More precisely, any set of disequalities is captured by an undirected graph where each node is a term, and there is an edge between two terms t and t′ if and only if t ≠ t′ holds. Finding cliques² in the graph allows us to transmit Alldifferent constraints to fd; e.g., the clique x ≠ y ∧ x ≠ z ∧ y ≠ z is communicated to fd as Alldifferent(x, y, z). These cliques can be sought dynamically during the execution of cc. Since finding a largest clique of a graph is NP-complete, restrictions have to be considered; practical choices are described in Sec. 6.1, and a sketch of the clique detection appears below. (² A clique is a subset S of the vertices such that every two vertices in S are connected by an edge.)
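A small Python sketch of such a clique detection, restricted to 3-cliques as in our implementation (Sec. 6.1); all names are ours:

    # When a disequality edge (u, v) is added, every common neighbour w
    # yields a 3-clique {u, v, w}, i.e. a candidate Alldifferent(u, v, w).
    def add_diseq(graph, u, v):
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
        return [{u, v, w} for w in graph[u] & graph[v]]  # new 3-cliques

    g = {}
    add_diseq(g, 'e', 'f')
    add_diseq(g, 'e', 'g')
    print(add_diseq(g, 'f', 'g'))  # [{'e', 'f', 'g'}] -> Alldifferent(e, f, g)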

Communication from fd to cc. fd may discover new disequalities and equalities through filtering. For example, consider a constraint over variables x, y and z whose filtered domains are pairwise disjoint: while no more filtering can be performed³, we can still deduce that x ≠ y, x ≠ z and y ≠ z, and transmit these facts to cc. Note that, as cc has no special support for Alldifferent, there is no need to transmit these disequalities in the form of this global constraint. Yet, this information is left implicit in the constraint store of fd and needs to be checked explicitly. But there is a quadratic number of pairs of variables, and (dis-)equalities could appear at each filtering step. Hence, the eager generation of all domain-based (dis-)equalities must be tempered in order to avoid a combinatorial explosion. We propose efficient ways of doing so hereafter. (³ Technically speaking, the constraint system is said to be bound-consistent.)

Synchronisation mechanisms: how to tame communication costs. A purely asynchronous cooperation mechanism with systematic exchange of information between fd and cc (through suspended constraints and awakening over domain modification), as exemplified in Fig. 4, appeared to be too expensive in practice. We manage this problem through a reduction of the number of pairs of variables to consider (critical pairs, see below) and a communication policy allowing tight control over expensive communications.

1. The communication policy obeys the following principles:

  • cheap communications are made in an asynchronous manner;

  • expensive communications are made only on request, initiated by a supervisor;

  • the two solvers run asynchronously, taking messages from the supervisor;

  • the supervisor is responsible for dispatching formulas to the solvers, for ensuring a consistent view of the problem between fd and cc, for forwarding the answers of one solver to the other, and for sending queries for expensive computations.

It turns out that all communications from cc to fd are cheap, while communications from fd to cc are expensive. Hence, the latter are made only upon request. Typically, it is up to the supervisor to explicitly ask whether a given pair of variables is equal or different in fd. Hence we have total control over this mechanism.

2. We also reduce the number of pairs of variables to be checked for (dis-)equality in fd, by focusing only on pairs whose (dis-)equality will directly lead to new deductions in cc. For this purpose, we consider the pairs involved in the left-hand sides of rules FC-*, RoW-1-* and RoW-2-*. Such pairs are called critical. Considering the six deduction rules of Section 5.2, the set of critical pairs of a formula φ is defined as follows:

  • P_FC contains exactly all pairs ⟨i, j⟩ where select(A, i) and select(A, j) appear syntactically in the formula (denoted select(A, i), select(A, j) ∈ φ);

  • P_RoW contains exactly all pairs ⟨i, j⟩ and ⟨select(store(A, i, e), j), e⟩ for each term select(store(A, i, e), j) ∈ φ, plus the pairs ⟨select(store(A, i, e), j), select(A, j)⟩ if select(A, j) ∈ φ.

  • The set of critical pairs is defined by P = P_FC ∪ P_RoW.

The number of critical pairs is still quadratic, not in the number of variables but in the number of select terms. We choose to focus our attention only on the second class of critical pairs, namely P_RoW: these pairs capture the specific essence of the array axioms (besides FC), and their number is only linear in the number of read-over-write terms. This restriction of critical pairs corresponds exactly to the pairs checked for equality or disequality in the watch list of the cc procedure (Section 5.2). In practice, this reduction turns out to be manageable while still bringing interesting deductive power. A summary of the sets of pairs to be considered and their sizes is given in Table 1.



rules            set of pairs     # of pairs
no restriction   all pairs        quadratic in the number of terms
FC-*, RoW-*      P_FC ∪ P_RoW     quadratic in the number of select terms
FC-*             P_FC             quadratic in the number of select terms
RoW-*            P_RoW            linear in the number of read-over-write terms
Table 1: Number of pairs to consider for checking (dis-)equality in fd
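A small Python sketch (with our own term representation) of the enumeration of the P_RoW pairs attached to each read-over-write term:

    # Collect the watch-list pairs for each select(store(A, i, e), j) in the
    # term set; pairs mirror the guards of rules RoW-1-* and RoW-2-*.
    def row_critical_pairs(terms):
        pairs = []
        for t in terms:
            if t[0] == 'select' and t[1][0] == 'store':
                _, (_, a, i, e), j = t
                pairs.append((i, j))                     # RoW-1-1 / RoW-2-1
                pairs.append((t, e))                     # RoW-1-2
                if ('select', a, j) in terms:
                    pairs.append((t, ('select', a, j)))  # RoW-2-2
        return pairs

    ts = {('select', ('store', 'A', 'i', 'e'), 'j'), ('select', 'A', 'j')}
    for p in row_critical_pairs(ts):
        print(p)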

The labelling procedure. So far we have only considered propagation. However, while the propagated information is correct, it is not complete. Completeness is recovered through a standard labelling approach over the finite-domain variables. The labelling procedure constrains only fd: it appears that flooding cc with all the new (dis-)equalities at each choice point was expensive and mostly worthless. In a sense, most labelling choices do not impact cc, and those which really matter are in fine transmitted through queries about critical pairs.

Complete architecture of the approach. A detailed architecture of our approach can be found in Fig. 8. Interestingly, cc and fd do not behave in a symmetric way: cc systematically transmits all its new deductions to the supervisor and cannot be queried, while fd transmits equalities and disequalities only upon request from the supervisor. Note also that cc can only provide a definitive unsat answer (it has no view of the non-array constraints), while fd can provide both definitive sat and unsat answers. The list of critical pairs is dynamically modified by the supervisor: pairs are added when new terms are deduced by cc, and pairs already proved (dis-)equal are removed. In our current implementation, the supervisor queries fd on all active critical pairs at once. Querying takes place after each propagation step.

Figure 8: Detailed view of the communication mechanism

API for the CP(FD) solver. While the approach requires a dedicated implementation of the supervisor and of cc (yet, most of cc is standard and easy to implement), any CP(FD) solver can be used as a black box, as long as it provides support for:

  • the atomic constraints considered in the formula (Access, Update and whatever constraints required over indexes and elements),

  • the two functions is_fd_eq(x,y) and is_fd_diff(x,y), stating whether two variables can be proved equal or different. These two functions are either available or easy to implement in most CP(FD) systems. They are typically based on the available domain information; for example, is_fd_diff(x,y) may return true iff D(x) ∩ D(y) = ∅ (see the sketch after this list). More precise (but more demanding) implementations can be used. For example, we can force an equality between x and y and observe propagation: upon failure, we deduce that x and y must be different.
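A possible domain-based implementation of these two functions (Python sets, our naming); as noted above, more precise variants may instead force the equality and observe propagation:

    # Conservative, domain-based checks: a "False" answer only means
    # "not provable from the domains alone".
    def is_fd_diff(dx, dy):
        return dx.isdisjoint(dy)          # surely different

    def is_fd_eq(dx, dy):
        return len(dx) == 1 and dx == dy  # both fixed to the same value

    print(is_fd_diff({1, 2}, {3, 4}), is_fd_eq({5}, {5}))  # True True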

Alternative design choices. We discuss here a few alternative design solutions, and the reasons why we discarded them. We already pointed out that systematically transmitting all labelling choices to cc was inefficient (we observed a dramatic drop in performance and no gain in solving power), since most of these choices do not lead to relevant deductions in cc. For the same reason, transmitting to cc every instantiation obtained in fd through propagation does not help either. We also experimented with an asynchronous communication mechanism for critical pairs. Typically, a dedicated propagator critical-pair(X,Y) was launched each time cc found a new critical pair. The propagator awakes on modifications of D(X) or D(Y), and checks whether is_fd_eq(X,Y) or is_fd_diff(X,Y) is conclusive. If so, the propagator sends the corresponding relation to cc and successfully terminates. Again, this alternative design turned out to be inefficient, the critical-pair propagators being continuously awoken for no real benefit.

5.5 Properties of the framework

Comparing fdcc with standard approaches for arrays. Table 2 gives a brief comparison of fdcc, cc and fd. Compared to a standard CP(FD) approach, the main advantage of fdcc is to add a symbolic and global deduction mechanism. Yet the approach is still limited to fixed-size arrays. Compared to a standard symbolic approach for A, fdcc reasons natively about finite-domain variables and supports FD constraints over both array elements and indexes. However, fdcc cannot deal with unknown-size arrays and cannot easily be integrated into a Nelson-Oppen combination framework.


                            fd     cc     fdcc
add FD constraints          yes    no     yes
add SMT constraints         no     yes    yes
reasoning over domains      yes    no     yes
global symbolic deduction   no     yes    yes
unknown-size arrays         no     yes    no

Table 2: Comparison between fdcc, fd and cc

Theoretical properties of the framework. Let φ be a conjunctive formula over arrays and finite-domain variables and constraints. An fd propagator is correct if every value it filters out does not belong to any solution of φ. Moreover, a correct fd propagator is strongly correct if it correctly evaluates fully-instantiated instances of the problem (i.e. the propagator distinguishes between solutions and non-solutions). We denote by fd-propagation and fdcc-propagation the propagation steps of fd and fdcc. fd-propagation is limited to domain filtering, while fdcc-propagation considers (dis-)equality propagation as well. A decision procedure is said to be correct if both positive and negative results can be trusted, and complete if it terminates.

Theorem 5.1

Assuming that fd filtering is strongly correct, the following properties hold: (i) fdcc-propagation terminates, (ii) fdcc-propagation is correct, and (iii) fdcc is correct and complete.

Proof

Proof. (i) fd and cc can only send a bounded amount of information to each other: fd can send to cc only finitely many new (dis-)equalities (over the critical pairs), and cc can send to fd only finitely many new (dis-)equalities and Alldifferent constraints. Since each solver alone terminates, the whole fdcc-propagation step terminates. (ii) Correctness of fdcc-propagation comes directly from the correctness of the cc procedure (easily derived by comparing the deduction rules with the axioms of A) and the assumed correctness of fd-propagation. (iii) The labelling procedure ensures termination, since the number of variables does not change along the resolution process (cc can deduce new terms, but no new variables). Negative results (unsat) can be trusted because fdcc-propagation is correct, while positive results (sat) can be trusted because fd-propagation is strongly correct. Altogether, we deduce that fdcc is correct and complete.∎

5.6 Running examples

Consider the array formulas extracted from Fig. 1. fdcc solves each formula in less than a second. For Prog1, cc immediately determines that (1) is unsat, as i = j allows it to merge e and f, which are declared to be different. For Prog2, cc does not detect that the formula is unsat (the size constraint over A not being taken into account), but rule (FC-2) produces the new disequalities i ≠ j, i ≠ k and j ≠ k. Then, the two cliques {e, f, g} and {i, j, k} are identified. In fd, the domains of i, j and k are pruned to the two valid indexes of A, and local filtering alone cannot go further. However, when considering the cliques previously identified, two supplementary global constraints are added to the constraint store: Alldifferent(e, f, g) and Alldifferent(i, j, k). The latter and the pruned domains of i, j and k allow fdcc to conclude that (2) is unsat. This example shows that supporting Alldifferent is worthwhile.

6 Implementation and experimental results

In order to evaluate the potential interest of the proposed approach, we developed a prototype constraint solver that combines both the cc and fd procedures. The solver was then used to check the satisfiability of large sets of randomly generated formulas and structured formulas. This section describes our tool called fdcc, and details our experimental results.

6.1 Implementation of fdcc

We developed fdcc as a constraint solver over the theory of arrays augmented with finite-domain arithmetic. It takes as input formulas written in this theory and classifies them as being sat or unsat. In the former case, the tool also returns a solution (i.e., a model) in the form of a complete instantiation of the variables. Formulas may include array select and store, array size declarations, variable equalities and disequalities, finite-domain specifications and (both linear and non-linear) arithmetic constraints over finite-domain variables.

fdcc is implemented on top of SICStus Prolog and is about 1.7 KLOC. It exploits the clpfd library [15], which provides an optimised implementation of Alldifferent as well as efficient filtering algorithms for arithmetic constraints over finite domains. The FD solver is extended with our own implementations of the array select and store operations [14]. Communication is implemented through message passing and awakenings. Alldifferent constraints are added each time a 3-clique is detected. Restricting clique computation to 3-cliques keeps the combinatorial explosion of more general clique detection under control. Of course, more interesting deductions may be missed (e.g., larger cliques), but we hypothesise that these cases are rare in practice. The 3-clique detection is launched each time a new disequality constraint is considered in cc.

CPU runtime is measured on an Intel Pentium 2.16GHz machine running Windows XP with 2.0GB of RAM.

6.2 Experimental evaluation on random instances

Using randomly generated formulas is advantageous for evaluating the approach, as there is no bias in the choice of problems. However, there is also a threat to validity, as random formulas might not fairly represent reality. In SAT-solving, it is well known that solvers that perform well on randomly generated formulas are not necessarily good on real-world problems. To mitigate the risk, we built a dedicated random generator that produces realistic instances.

Formula generation. We distinguish four different classes of formulas, depending on whether linear arithmetic constraints are present or not (in addition to array constraints) and whether array constraints are (a priori) “easy” or “hard”. Easy array constraints are built upon three arrays, two without any constraint, and the third created by two successive stores. Hard array constraints are built upon several arrays involving long chains of successive stores. The four classes are:

  • AEUF-I (easy array constraints),

  • AEUF-II (hard array constraints),

  • AEUF+LIA-I (easy array constraints plus linear arithmetic),

  • AEUF+LIA-II (hard array constraints plus linear arithmetic).

We performed two distinct experiments: in the first one, we try to balance sat and unsat formulas and more or less complex-to-solve formulas by varying the formula length around and above the complexity threshold, while in the second experiment, we regularly increase the formula length in order to cross the complexity threshold. Typically, in both experiments, small random formulas are often easy to prove sat and large random formulas are often easy to prove unsat. The other generation parameters are fixed: the number of variables (besides arrays), the size of the arrays, and the common domain of variables and array elements, the latter being chosen large enough that enumeration alone is unlikely to be sufficient.

Properties to evaluate. We are interested in the following two aspects when comparing the solvers: (1) the ability to solve as many formulas as possible, and (2) the average computation time on easy formulas. These two properties are equally important in verification settings: solving a high ratio of formulas is of primary importance, but a solver able to solve many formulas with an important overhead may be less interesting than a faster solver missing only a few difficult-to-solve formulas.

Competitors. We submitted the formulas to three versions of fdcc. The first version is the standard fdcc described so far. The second version includes only the cc algorithm, while the third implements only the fd approach. In addition, we also use two witnesses, hybrid and best. hybrid represents a naive concurrent (black-box) combination of cc and fd: both solvers run in parallel, and the first one to get an answer stops the other. best simulates a portfolio procedure with a “perfect” selection heuristic: for each formula, we simply take the best result among cc and fd. best and hybrid are not implemented, but deduced from the results of cc and fd. best serves as a reference point, representing the best possible black-box combination, while hybrid serves as a witness, in order to understand whether fdcc goes further in practice than a naive black-box combination. All versions are correct and complete, allowing a fair comparison. The cc version requires that the labelling procedure communicates each (dis-)equality choice to cc in order to ensure correctness.

Results of the first experiment. For each formula, a time-out of 60 s was set. We report the number of sat, unsat and time-out answers for each solver in Tab. 3.





All categories (369 formulas)

            S     U     TO    T
cc          29    115   225   13545
fd          154   151   64    3995
fdcc        181   175   13    957
best        154   175   40    2492
hybrid      154   175   40    2609

AEUF-I (79 formulas)

            S     U     TO    T
cc          26    37    16    987
fd          39    26    14    875
fdcc        40    37    2     144
best        39    37    3     202
hybrid      39    37    3     242

AEUF-II (90 formulas)

            S     U     TO    T
cc          2     30    58    3485
fd          35    18    37    2299
fdcc        51    30    9     635
best        35    30    25    1529
hybrid      35    30    25    1561

AEUF+LIA-I (100 formulas)

            S     U     TO    T
cc          1     21    78    4689
fd          50    47    3     199
fdcc        52    48    0     24
best        50    48    2     139
hybrid      50    48    2     159

AEUF+LIA-II (100 formulas)

            S     U     TO    T
cc          0     27    73    4384
fd          30    60    10    622
fdcc        38    60    2     154
best        30    60    10    622
hybrid      30    60    10    647

S: # sat answers, U: # unsat answers, TO: # time-outs (60 sec), T: time in sec.

Table 3: Experimental results of the first experiment

As expected, for pure array formulas (AEUF-*), fd is better on the sat instances, and cc behaves in the opposite way. The performance of cc decreases quickly on hard-to-solve formulas. Surprisingly, the two procedures behave quite differently in the presence of arithmetic constraints: formulas often become easily provable with domain arguments, explaining why fd performs better and cc worse compared to the AEUF-* case. Note that the computation times reported in Tab. 3 are dominated by the number of time-outs (TO), since solvers often either quickly succeed or quickly fail. Hence best and hybrid do not show any significant difference in total computation time, while in case of success, best is systematically 2x faster than hybrid. The results show that:

  • fdcc solves strictly more formulas than fd or cc taken in isolation, and even more formulas than best. In particular, some formulas are solved only by fdcc, and fdcc shows 5x fewer TO than fd and 3x fewer TO than best.

  • fdcc incurs only a very affordable overhead over cc and fd when they succeed: it is at worst 4x slower than cc, fd and best, and on average 1.5x slower than cc and fd (resp. 1.1x slower than best).

  • These results hold for all four classes of programs, for both sat and unsat instances, and for a priori easy or hard instances. Hence, fdcc is much more robust than fd or cc.

Results of the second experiment. In this experiment, formulas of class AEUF-II are generated with a length that is regularly increased in order to cross the complexity threshold, and we record the number of time-outs (TO). In addition, we use two metrics to evaluate the capabilities of fdcc to solve formulas, Gain and Miracle, defined as follows:

  • Gain: each time fdcc classifies a formula that none of (resp. only one of) cc and fd can classify, Gain is rewarded by 2 (resp. 1); each time fdcc fails to classify a formula that one of (resp. both of) cc and fd can classify, Gain is penalised by 1 (resp. 2). Note that this last case never happened during our experiments.

  • Miracle is the number of times fdcc gives a result when both cc and fd fail.

Fig. 9 shows the number of solved formulas for each solver, the number of formulas which remain unsolved because of time-outs, and the values of Gain and Miracle.

Figure 9: Experimental results for the second experiment

We see that the number of solved formulas is always greater for fdcc (about 20% more than fd and about 70% more than cc). Moreover, fdcc presents its maximal benefit for formula lengths close to the complexity threshold, meaning that its relative performance is best on hard-to-solve formulas. For these lengths, the number of unsolved formulas is always lower with fdcc than with both cc and fd.

Conclusion. Experimental results show that fdcc performs better than fd and cc taken in isolation, especially on hard-to-solve formulas, and is very competitive with portfolio approaches mixing fd and cc. More precisely,

  • fdcc solves strictly more formulas than its competitors (3x fewer TO than best) and shows a low overhead over them (1.1x average ratio when best succeeds).

  • the relative performance is better on hard-to-solve formulas than on easy-to-solve ones, suggesting that combining global symbolic reasoning with local filtering becomes especially worthwhile when hard instances have to be solved.

  • fdcc is both reliable and robust on the class of considered formulas (sat or unsat, easy-to-solve or hard-to-solve).

This is particularly interesting in verification settings, since it means that fdcc is clearly preferable to the standard fd-handling of arrays in any context, i.e., whether we want to solve a few complex formulas or to solve as many formulas as possible in a small amount of time.

7 Extensions of the core technique

In this section, we discuss several extensions of fdcc. We focus on extensions of A relevant to software verification. Interestingly, the combination framework can be reused without any modification; only the cc and fd solvers must be extended.

7.1 Uniform arrays

Many programming languages allow the developer to initialise all cells of an array with the same constant value, typically 0, or the same general expression. Dealing efficiently with constant-value initialisation is necessary in any concrete implementation of a software verification framework. In order to capture this specific data structure, we add at the formula level an array term of the form uniform(e), where e represents a term. For these arrays, called uniform arrays, we introduce the following extra rule: select(uniform(e), i) = e.

Uniform arrays can be handled in fdcc as follows: (i) add a new rule in cc rewriting select(uniform(e), i) into e; (ii) in fd, either unfold each uniform array and fill it with variables equal to e, or (preferably) add a special kind of “folded” array such that Access always returns e and Update creates an unfolded version filled with e terms (a small sketch follows).
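A minimal Python sketch of the "folded" option (names are ours; element domains are plain sets):

    # Folded uniform array: Access returns the default element's domain;
    # Update at a fixed index unfolds into an explicit array of defaults.
    def access_uniform(default_dom, d_e):
        return d_e & default_dom            # select(uniform(e), i) = e

    def update_uniform(default_dom, size, idx, d_e):
        unfolded = [set(default_dom) for _ in range(size)]
        unfolded[idx] = set(d_e)            # store at the instantiated index
        return unfolded

    print(access_uniform({0}, {0, 1}))      # {0}
    print(update_uniform({0}, 3, 1, {7}))   # [{0}, {7}, {0}]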

7.2 Array extensionality

Software verification over array programs sometimes involves (dis-)equalities over whole arrays. For example, programs that perform string comparison often include string-level primitives. For this purpose, formulas can be extended with equality and disequality predicates over arrays, denoted A = B and A ≠ B, as in the extensional theory of arrays [1].

Array equality can be directly handled by congruence closure on array names in cc and by index-wise unification of arrays in fd. When checking satisfiability of quantifier-free formulas, any array disequality A ≠ A′ can be replaced by a standard disequality select(A, i) ≠ select(A′, i), where i is a fresh variable. This preprocessing-based solution is sufficient for both cc and fd. Yet, implementing a dedicated constraint for array disequality can lead to better propagation. Such a constraint is described in Figure 10.

Diff-array(A, I, A′) : fixpoint(
    integer(I)?  A[I] =!= A′[I], success ;
    D(I) := D(I) − { k | D(A[k]) and D(A′[k]) are the same singleton }
)
Figure 10: CP(FD) constraint for array disequality

We provide a small example illustrating the advantage of the Diff-array constraint over introducing a fresh variable i such that select(A, i) ≠ select(A′, i). Let us consider two arrays A and A′ of constant size n, and let us assume that for every index k, D(A[k]) and D(A′[k]) are reduced to the same singleton. The constraint Diff-array(A, I, A′) immediately returns unsat, since D(I) is reduced to ∅ by the second rule. On the other hand, with the fresh-variable encoding, the Access constraints only propagate over the element domains. From this point, no more propagation is feasible; in particular, D(I) is not reduced at all. In that case, unsat can be proved only after enumerating the whole domain of I (n values).
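A small Python sketch of the second filtering rule of Diff-array on this example (names are ours):

    # Indexes where A and A' are already known to be equal cannot witness
    # A != A'; removing them all empties D(I), i.e. proves unsat.
    def diff_array_filter(dom_a, dom_a2, d_i):
        return {k for k in d_i
                if not (len(dom_a[k]) == 1 and dom_a[k] == dom_a2[k])}

    # Both arrays fixed cell-wise to the same values: D(I) becomes empty.
    print(diff_array_filter([{1}, {2}], [{1}, {2}], {0, 1}))  # set()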

7.3 Arrays with non-fixed (but bounded) size

We have assumed so far that arrays have a known, fixed size. However, practical software verification also involves arrays of unknown size, for example in unit-level verification. We propose the following scheme for extending our approach to arrays with non-fixed (but bounded) size. Formulas are extended with a new function size(A), and every select or store index over an array A is constrained to be less than or equal to size(A). Moreover, we assume that each size(A) term has a known upper bound.

This extension does not modify the cc part of the framework, since A already considers unbounded arrays. On the other hand, the filtering algorithms associated with constraints over arrays must be significantly revised. We take inspiration from previous work of one of the authors [14], describing an Update constraint for memory heaps whose sizes are a priori unknown. In this work, memory heaps can be either closed or unclosed. We adapt this notion to arrays: closing an array comes down to fixing its size to a constant. As a result, the filtering algorithm is parametrised by the state of the array, and deductions may differ depending on whether the array is closed or unclosed. The closed case reduces to standard array filtering (Figure 7). The unclosed case is significantly different: unclosed arrays have a non-fixed size and only part of their elements are explicitly represented. They can be encoded internally by partial maps from integers to logical variables. Filtering is rather limited in that case, but as soon as the array gets closed, more deductions become available.

We present a simple implementation of constraints over unclosed arrays in Figure 11; finer propagation can be derived from the ideas developed in [14]. Propagators Access-unclosed and Update-unclosed mostly look like their counterparts over closed arrays. Note the use of operations ?D(A[k]) and merge(A, k, X) (where A is an array, k and X logical variables) instead of D(A[k]) and A[k] == X in the closed case. These two new operations account for the case where no pair (k, Y) is recorded so far in A: ?D(A[k]) returns V (the whole set of possible values for elements) and merge(A, k, X) adds the pair (k, X) to the set of explicitly described elements of A. We suppose we are given a function is-def(A, k) to test whether index k and its corresponding element are explicitly stored in A. Finally, the fill operation ensures that all cells of an array recognised as closed are explicitly represented.

Access-unclosed(A, I, E) : fixpoint(
    closed(A)?   Access(A, I, E), success ;
    integer(I)?  merge(A, I, E), success ;
    D(E) := D(E) ∩ ⋃_{k ∈ D(I)} ?D(A[k]) ;
    D(I) := D(I) ∩ { k | ?D(A[k]) ∩ D(E) ≠ ∅ }
)

——————

closed(A):       integer(size(A))?  fill(A), success

fill(A):         forall i ∈ [1 .. size(A)] do: merge(A, i, N), with N a fresh variable

?D(A[k]):        if is-def(A, k) then D(A[k]) else V

merge(A, k, E):  if is-def(A, k) then A[k] == E else A := A[k ↦ E]

——————

Update-unclosed(A, I, E, A′) : fixpoint(
    closed(A) and closed(A′)?  Update(A, I, E, A′), success ;
    closed(A) or closed(A′)?   size(A) == size(A′) ;
    integer(I)?  merge(A′, I, E) ;
    D(E) := D(E) ∩ ⋃_{k ∈ D(I)} ?D(A′[k]) ;
    D(I) := D(I) ∩ { k | ?D(A′[k]) ∩ D(E) ≠ ∅ } ;
    forall k ∈ [1 .. max(size(A))] − D(I) do:
        if is-def(A, k) then merge(A′, k, A[k]) ;
        if is-def(A′, k) then merge(A, k, A′[k]) ;
    forall k ∈ D(I) s.t. is-def(A′, k) do: D(A′[k]) := D(A′[k]) ∩ (?D(A[k]) ∪ D(E)) ;
    forall k ∈ D(I) do: if (?D(A[k]) ∩ ?D(A′[k]) = ∅) then I == k
)
Figure 11: Implementation of CP(FD) Constraints for arrays of unknown size

7.4 Maps

Maps extend arrays in two crucial ways: indexes (“keys”) do not have to be integers, and they can be both added and removed. General indexes open the door to constraints over hashmaps, which are useful in many application areas, while removable indexes are essential to model memory-heaps with dynamic (de-)allocation [12, 14].

Maps come with the access, update and size functions, plus a function remove(M,k) (remove a key and its associated entry from the map) and a function in-dom(M,k), true iff index k is mapped in M (we sometimes use in-dom as a predicate). The semantics is given by the set of axioms of Figure 12, inspired from [10, Chap. 11] (we add the KoW-2 and KoD-2 axioms, which are missing in the first edition of the book; the authors acknowledge the error on the book’s website).

(FC)      i = j ⟹ M[i] = M[j]
(RoW-1)   M[i ← e][i] = e
(RoW-2’)  i ≠ j ⟹ M[i ← e][j] = M[j]
(RoD-1)   i ≠ j ⟹ remove(M,i)[j] = M[j]
(KoW-1)   in-dom(M[i ← e], i)
(KoW-2)   i ≠ j ⟹ (in-dom(M[i ← e], j) ⟺ in-dom(M, j))
(KoD-1)   ¬in-dom(remove(M,i), i)
(KoD-2)   i ≠ j ⟹ (in-dom(remove(M,i), j) ⟺ in-dom(M, j))

Figure 12: Axioms for the theory of maps
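To make the axioms concrete, here is a minimal executable model of the map operations, with Python dictionaries standing in for maps; it is purely illustrative, and the asserts check each axiom of Figure 12 on sample values.

    # A minimal executable model of the map operations (dicts standing in
    # for maps); the asserts check the Figure 12 axioms on concrete values.
    def write(M, i, e):
        N = dict(M)
        N[i] = e
        return N

    def remove(M, i):
        N = dict(M)
        N.pop(i, None)
        return N

    def read(M, i):
        return M[i]

    def in_dom(M, i):
        return i in M

    M = {1: 'a', 2: 'b'}
    # FC holds trivially: read is a function of (M, i).
    assert read(write(M, 3, 'c'), 3) == 'c'               # RoW-1
    assert read(write(M, 3, 'c'), 1) == read(M, 1)        # RoW-2' (1 ≠ 3)
    assert read(remove(M, 2), 1) == read(M, 1)            # RoD-1  (1 ≠ 2)
    assert in_dom(write(M, 3, 'c'), 3)                    # KoW-1
    assert in_dom(write(M, 3, 'c'), 2) == in_dom(M, 2)    # KoW-2  (2 ≠ 3)
    assert not in_dom(remove(M, 2), 2)                    # KoD-1
    assert in_dom(remove(M, 2), 1) == in_dom(M, 1)        # KoD-2  (1 ≠ 2)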

Interestingly, maps without size constraints can be encoded into pure arrays [10] using two arrays per map M, which we write M_def and M_val. Array M_def models the fact that a key is mapped in M (value 1) or not (value 0), while array M_val represents the relationship between mapped keys and their associated values in M. The encoding works as follows (a small executable sketch follows the list):

  • M[k] becomes M_val[k],

  • M[k ← e] becomes the pair ⟨M_def[k ← 1], M_val[k ← e]⟩,

  • remove(M,k) becomes the pair ⟨M_def[k ← 0], M_val⟩,

  • in-dom(M,k) becomes M_def[k] = 1,

  • ¬in-dom(M,k) becomes M_def[k] = 0.
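The rewriting above is easy to prototype. The following sketch encodes the map operations over two dict-modelled arrays, following the M_def/M_val naming introduced above; the helper store and the enc_* function names are ours.

    # Sketch of the map-to-arrays encoding, with arrays modelled as dicts;
    # store plays the role of the array update A[k ← v].
    def store(A, k, v):
        N = dict(A)
        N[k] = v
        return N

    def enc_update(Mdef, Mval, k, e):     # M[k ← e]
        return store(Mdef, k, 1), store(Mval, k, e)

    def enc_remove(Mdef, Mval, k):        # remove(M,k): only the def-array changes
        return store(Mdef, k, 0), Mval

    def enc_access(Mval, k):              # M[k]
        return Mval[k]

    def enc_in_dom(Mdef, k):              # in-dom(M,k)
        return Mdef.get(k, 0) == 1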

For the fd part, Charreteur et al. [14] provide dedicated propagators in the flavour of those presented in Section 7.3. There is, however, a noticeable difference with the case of non-fixed-size arrays: there is no relationship between the size of a map (i.e., its number of mapped keys) and the values of its indexes. This implies, for example, that map closeness is not enforced through labelling on the size, but directly through labelling on the “closeness status”: either setting it to true (no more unknown elements in the map), or keeping it false and associating a fresh variable with a yet unmapped index value, as in the sketch below.
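As a minimal sketch of this labelling strategy (with a dict-based map representation and naming entirely of our own), the search step below returns the two alternatives: close the map, or materialise one fresh entry.

    # Illustrative labelling on the closeness status of a map M, where M is
    # represented as {'closed': bool, 'entries': {key: variable}}.
    import itertools

    _fresh = itertools.count()

    def branch_on_closeness(M):
        def close():                       # alternative 1: no unknown entries remain
            M['closed'] = True
        def extend():                      # alternative 2: add one more explicit entry
            k = next(i for i in itertools.count() if i not in M['entries'])
            M['entries'][k] = f"v{next(_fresh)}"   # fresh decision variable
        return [close, extend]             # the two branches explored by the search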

8 Related work

This paper is an extension of a preliminary version presented at CPAIOR 2012 [6]. It contains detailed descriptions and explanations of the core technology, formulated in complete revisions of Sections 3 to 5. It also presents new developments and extensions in a completely new Section 7. Moreover, as it discusses adaptations of the approach to several extensions of the theory of arrays relevant to software verification, it also contains a deeper and updated discussion of related work (Section 8).

Alternative approaches to FDCC. We sketch three alternative methods for handling array constraints over finite domains, and argue why we did not choose them. First, one could think of embedding a CP(FD) solver in an SMT solver, as one theory solver among others, the array constraints being handled by a dedicated solver. As already stated in the introduction, standard cooperation frameworks like Nelson-Oppen (NO) [29] require that the supported theories have an infinite model, which is not the case for finite domains.

Second, one could use a concurrent black-box combination (first solver to succeed wins). Our greybox combination scheme is more complex (yet still rather simple), but its performance is much higher, as demonstrated by our experiments. Moreover, we are still able to easily reuse existing CP(FD) engines thanks to a small, easy-to-provide API.

Third, one could encode all finite-domain constraints into boolean constraints and use an SMT solver equipped with a decision procedure for the standard theory of arrays. Doing so, we give away the possibility of taking advantage of the high-level structure of the initial formula. Recent works on finite but hard-to-reason-about constraints, such as floating-point arithmetic [5], modular arithmetic [19] or bitvectors [9], suggest that it can be much more efficient in some cases to keep the high-level view of the formula.

Deductive methods and SMT frameworks. It is well known in the SMT community that solving formulas over arrays and integer arithmetic efficiently through NO is difficult. Indeed, handling non-convex theories correctly requires propagating all implied disjunctions of equalities, which may be much more expensive than satisfiability checking [4]. Delayed theory combination [2, 4] requires only the propagation of implied equalities, at the price of adding new boolean variables for all potential equalities between variables. Model-based theory combination [27] aims at mitigating this potential overhead through lazy propagation of equalities.

Besides, the theory of arrays is hard to solve by itself. Standard symbolic approaches have already been sketched in Section 4.2. The most efficient approaches combine preprocessing, to remove as many RoW terms as possible, with “delayed” inlining of array axioms for the remaining RoW terms. New lemmas corresponding roughly to critical pairs can be added on demand at the DPLL top-level [11], or they can be incrementally discovered through an abstraction-refinement scheme [1]. Additional performance can be obtained through frugal (i.e., minimal) instantiation of the array axioms [20].

Filtering-based methods. Consistency-based filtering approaches for array constraints were already discussed in Section 4.3. A logical combination of Element constraints (with disjunctions) can express Update constraints. However, a dedicated Update constraint, implemented as a global constraint, performs more global reasoning and is definitely more efficient in the case of non-constant indexes. The work of Beldiceanu et al. [3] has shown that it is possible to capture the global state of several Element constraints with a finite-state automaton. This approach could also be followed to capture the Update constraint, but we do not foresee its usage for implementing global reasoning over a chain of Access and Update constraints: this would require the design of a complex automaton dedicated to each problem. Based on a cc algorithm, our approach captures the global state of a set of Access and Update constraints, but this reasoning is only symbolic and thus, on its own, less effective than using dedicated constraints; in our framework, the cc algorithm cannot prune the domains of index or indexed variables. In fact, our proposition has more in common with the DPLL(Alldifferent) proposal of Nieuwenhuis (http://www.lsi.upc.edu/~roberto/papers/CP2010slides.pdf), where the idea is to benefit from the efficiency of several global constraints in the DPLL algorithm for SAT-encoded problems. In fdcc, we derive Alldifferent global constraints from the congruence closure algorithm for similar reasons. Nevertheless, our combined approach is fully automated, which is a key point for addressing array constraint systems coming from various software verification problems.

Combination of propagators in CP. Several possibilities can be considered for implementing constraint propagation when multiple propagators are available [33]. First, an external solver can be embedded as a new global constraint in fd, as done for example with the Quad global constraint for continuous domains [25]. This approach offers global reasoning over the constraint store, but it requires fine control over the awakening mechanism of the new global constraint. A second approach consists in calling both solvers concurrently: each of them is launched on a distinct thread, and both threads prune a common constraint store that serves as a blackboard. This approach has been successfully implemented in Oz [36]. The difficulty is to identify which information must be shared, and to share it efficiently. A third approach consists in building a master-slave combination process where one of the solvers (here cc) drives the computation and calls the other (fd); the difficulty is then to decide when the master must call the slave. We mainly follow the second approach; however, a third agent (the supervisor) acts as a lightweight master over cc and fd to synchronise both solvers through queries, as sketched below.
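The following schematic sketch (our own illustration, not the prototype's code) conveys the supervised scheme: each solver repeatedly prunes a shared store of facts, and the supervisor keeps forwarding newly deduced facts from one solver to the other until neither deduces anything new.

    # Schematic supervisor loop over two propagation engines sharing a store
    # (a set of facts); cc_step and fd_step are assumed to perform one
    # propagation round each and return the facts they deduced.
    def supervise(cc_step, fd_step, store):
        changed = True
        while changed:
            changed = False
            for step in (cc_step, fd_step):
                new_facts = step(store)      # one propagation round
                if new_facts - store:        # anything not already known?
                    store |= new_facts       # publish on the shared store
                    changed = True           # wake up the other solver
        return store

In fdcc, the shared facts would be the equalities, disequalities and Alldifferent constraints discussed earlier; the loop stops exactly when the two solvers reach a common fixpoint.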

9 Conclusions and perspectives

This article describes an approach for solving conjunctive quantifier-free formulas combining arrays and finite-domain constraints over indexes and elements. We sketch an original decision procedure that combines ideas from symbolic reasoning and finite-domain constraint solving for array formulas. The communication mechanism proposed in the article relies on improving the deductive capabilities of the congruence closure algorithm with finite-domain information. We also propose ways of keeping the communication overhead tractable. To the best of our knowledge, this is the first time such a combination framework at the interface of CP and SMT has been proposed and implemented in a concrete prototype. Experiments show that our approach performs better than any portfolio combination of a symbolic solver and a filtering-based solver. In particular, our procedure greatly enhances the deductive power of standard CP(FD) approaches for arrays. Future work includes integrating fdcc into an existing software verification tool (e.g., [8, 21]) in order to improve its efficiency over programs with arrays.

References

  • [1] Robert Brummayer and Armin Biere. Lemmas on demand for the extensional theory of arrays. In Proc. of SMT ’08/BPR ’08 Workshop, pp. 6–11. ACM, 2008.
  • [2] Marco Bozzano, Roberto Bruttomesso, Alessandro Cimatti, Tommi A. Junttila, Silvio Ranise, Peter van Rossum, Roberto Sebastiani. Efficient Satisfiability Modulo Theories via Delayed Theory Combination. In Proc. of Computer Aided Verification (CAV’05), LNCS vol. 3576, Springer, 2005.
  • [3] Nicolas Beldiceanu, Mats Carlsson, Romuald Debruyne, and Thierry Petit. Reformulation of global constraints based on constraints checkers. In Constraints Journal, vol. 10, pp. 339–362, Oct. 2005.
  • [4] Roberto Bruttomesso, Alessandro Cimatti, Anders Franzén, Alberto Griggio, Roberto Sebastiani. Delayed theory combination vs. Nelson-Oppen for satisfiability modulo theories: a comparative analysis. In Annals of Math. Art. Int., vol. 55 (1-2), 2009.
  • [5] Roberto Bagnara, Matthieu Carlier, Roberta Gori, Arnaud Gotlieb. Symbolic Path-Oriented Test Data Generation for Floating-Point Programs. In Proc. of Int. Conf. on Software Testing, Verification and Validation (ICST’13). IEEE, 2013.
  • [6] Sébastien Bardin and Arnaud Gotlieb. FDCC: a Combined Approach for Solving Constraints over Finite Domains and Arrays. In Proc. Constraint Prog. Art. Int. Op. Res. (CPAIOR’12). Springer, 2012
  • [7] Sébastien Bardin and Philippe Herrmann. Structural testing of executables. In Proc. of Int. Conf. on Software Testing, Verification and Validation (ICST’08), pages 22–31, Lillehammer, Norway, Apr. 2008.
  • [8] Sébastien Bardin and Philippe Herrmann. OSMOSE: Automatic Structural Testing of Executables. In Journal of Software Testing, Verification and Reliability (STVR), vol. 21(1), 2011.
  • [9] Sébastien Bardin, Philippe Herrmann and Florian Perroud. An alternative to SAT-based approaches for bit-vectors. In Proc. of Tools and Algorithms for the Construction and Analysis of Systems (TACAS’10). Springer, 2010.
  • [10] Aaron R. Bradley, Zohar Manna. The Calculus of Computation. Springer, 2007.
  • [11] Clark Barrett, Robert Nieuwenhuis, Albert Oliveras, Cesare Tinelli. Splitting on demand in SAT Modulo Theories. In LPAR 2006. Springer, 2006
  • [12] Richard Bornat. Proving Pointer Programs in Hoare Logic. In Proc. of Mathematics of Program Construction (MPC’00), Springer, Ponte de Lima, Portugal, Jul., 2000.
  • [13] Sebastian Brand. Constraint propagation in presence of arrays. In Computing Research Repository, 6th Workshop of the ERCIM Working Group on Constraints, 2001.
  • [14] Florence Charreteur, Bernard Botella and Arnaud Gotlieb. Modelling dynamic memory management in constraint-based testing. In Journal of Systems and Software, 82(11):1755–1766, Nov., 2009.
  • [15] Mats Carlsson, Greger Ottosson, and Björn Carlson. An open-ended finite domain constraint solver. In Proc. of Programming Languages: Implementations, Logics, and Programs (PLILP’97), 1997.
  • [16] Helene Collavizza, Michel Rueher and Pascal Van Hentenryck. Cpbpv: A constraint-programming framework for bounded program verification. In CP 2008. Springer, 2008
  • [17] Peter J. Downey and Ravi Sethi. Assignment commands with array references. In Journal of the ACM, vol. 25, 1978.
  • [18] Arnaud Gotlieb, Bernard Botella, and Michel Rueher. A clp framework for computing structural test data. In Proc. of Computational Logic (CL’2000), LNAI 1891, pp. 399–413, London, UK, Jul., 2000.
  • [19] Arnaud Gotlieb, Michel Leconte, and Bruno Marre. Constraint solving on modular integers. In Proc. of Workshop on Constraint Modelling and Reformulation (ModRef’10), St Andrews, Scotland, Sep., 2010.
  • [20] Amit Goel, Sava Krstić, Alexander Fuchs. Deciding Array Formulas with Frugal Axiom Instantiation. In Proc. of SMT ’08/BPR ’08 Workshop, pp. 6–11, ACM, 2008.
  • [21] Arnaud Gotlieb. Euclide: A Constraint-Based Testing Framework for Critical C Programs. In Proc. of Int. Conf. on Software Testing, Verification and Validation (ICST’09), Denver, CO, USA, Apr., 2009
  • [22] G. Gange and H. Søndergaard and P. J. Stuckey and P. Schachte. Solving Difference Constraints over Modular Arithmetic. In CADE 2013. Springer, 2013
  • [23] Pascal Van Hentenryck and Jean-Philippe Carillon. Generality versus specificity: An experience with AI and OR techniques. In Proc. of National Conference on Artificial Intelligence (AAAI’88), MIT Press, Saint Paul, USA, pp. 660–664, Aug., 1988.
  • [24] Daniel Kroening and Ofer Strichman. Decision Procedures: An Algorithmic Point of View. Springer, 2008.
  • [25] Yahia Lebbah, Claude Michel, Michel Rueher, and David Daney. Efficient and safe global constraints for handling numerical constraint systems. In SIAM Journal of Numerical Analysis, vol. 42, 2005.
  • [26] Bruno Marre and Benjamin Blanc. Test selection strategies for lustre descriptions in gatel. In Electronic Notes in Theoretical Computer Science, vol. 111, pp. 93–111, 2005.
  • [27] Leonardo de Moura and Nikolaj Bjørner. Model-based theory combination. In Electronic Notes on Theor. Comput. Sci., vol. 198, num. 2, pp. 37–49, 2008.
  • [28] Leonardo de Moura and Nikolaj Bjørner. Z3: an efficient SMT solver. In Proc. of Tools and Alg. for the Construction and Analysis of Systems (TACAS’08), pp. 337–340, Springer, 2008.
  • [29] Greg Nelson and Derek C. Oppen. Simplification by cooperating decision procedures. In ACM Trans. Program. Lang. Syst., vol 1, pp. 245–257, Oct., 1979.
  • [30] Greg Nelson and Derek C. Oppen. Fast decision procedures based on congruence closure. In Journal of ACM, vol. 27, num. 2, pp. 356–364, 1980.
  • [31] Jean-Charles Régin. A filtering algorithm for constraints of difference in CSPs. In Proc. of National Conference on Artificial Intelligence (AAAI’94), pp. 362–367, Seattle, WA, USA, Aug., 1994.
  • [32] John Rushby. Verified software: Theories, tools, experiments. Automated Test Generation and Verified Software. pp. 161–172, Springer-Verlag, 2008.
  • [33] Christian Schulte and Peter J. Stuckey. Efficient constraint propagation engines. In Transactions on Programming Languages and Systems, vol. 31, num. 1, pp. 2–43, Dec., 2008.
  • [34] Cesare Tinelli. A DPLL-Based Calculus for Ground Satisfiability Modulo Theories. In Proc. of European Conference on Logics in Artificial Intelligence (JELIA’02), Cosenza, Italy, Sep., 2002.
  • [35] P. Van Hentenryck. The OPL optimization programming language. MIT Press, 1999.
  • [36] Peter Van Roy, Per Brand, Denys Duchier, Seif Haridi, Martin Henz, Christian Schulte. Logic programming in the context of multiparadigm programming: the Oz experience. In Theory and Practice of Logic Programming, vol. 3, num. 6, pp. 715–763, Nov., 2003.