Monadic Decomposition in Integer Linear Arithmetic (Technical Report)

04/26/2020 ∙ by Matthew Hague, et al. ∙ 0

Monadic decomposability is a notion of variable independence, which asks whether a given formula in a first-order theory is expressible as a Boolean combination of monadic predicates in the theory. Recently, Veanes et al. showed the usefulness of monadic decomposability in the context of SMT (i.e. the input formula is quantifier-free), and found various interesting applications including string analysis. However, checking monadic decomposability is undecidable in general. Decidability for certain theories is known (e.g. Presburger Arithmetic, Tarski's Real-Closed Field), but there are very few results regarding their computational complexity. In this paper, we study monadic decomposability of integer linear arithmetic in the setting of SMT. We show that this decision problem is coNP-complete and, when monadically decomposable, a formula admits a decomposition of exponential size in the worst case. We provide a new application of our results to string constraint solving with length constraints. We then extend our results to variadic decomposability, where predicates could admit multiple free variables (in contrast to monadic decomposability). Finally, we give an application to quantifier elimination in integer linear arithmetic where the variables in a block of quantifiers, if independent, could be eliminated with an exponential (instead of the standard doubly exponential) blow-up.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

page 1

page 2

page 3

page 4

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

0.1 Introduction

A formula in some theory is monadically decomposable if it is -equivalent to a Boolean combination of monadic predicates in , i.e., to a monadic decomposition of . Monadic decomposability measures how tightly the free variables in are coupled. For example, is not monadically decomposable in any (finitary) logic over an infinite domain, but can be decomposed, in Presburger arithmetic over natural numbers, since it can be written as .

Veanes et al. [27] initiated the study of monadic decomposability in the setting of Satisfiability Modulo Theories, wherein formulas are required to be quantifier-free. Monadic decomposability has many applications, including symbolic transducers [12] and string analysis [27]. Although the problem was shown to be in general undecidable, a generic semi-algorithm for outputting monadic decompositions (if decomposable) was provided. A termination check could in fact be added if the input formula belongs to a theory for which monadic decomposability is decidable, e.g., linear arithmetic, Tarski’s Real-Closed Field, and the theory of uninterpreted functions. Hitherto, not much is known about the computational complexity of monadic decomposability problems for many first-order theories (in particular, quantifier-free theories), and about practical algorithms. This was an open problem raised by Veanes et al. in [27].

Monadic decomposability is intimately connected to the variable partition problem, first studied by Libkin [21] nearly 20 years ago. In particular, a monadic decomposition gives rise to a partition of the free variables of a formula , wherein each part consists of a single variable. More precisely, take a partition of into sets of variables, with linearizations . The formula is -decomposable (in some theory ) if it is -equivalent to a boolean combination of formulas of the form . As suggested in [21], such variadic decompositions of have potential applications in optimization of database query processing and quantifier elimination. The author gave a general condition for the decidability of variable independence in first-order theories. This result is unfortunately not easily applicable in the SMT setting for at least two reasons: (i) the full first-order theory might be undecidable (e.g. theory of uninterpreted functions), and (ii) even for a first-order theory that admits decidable monadic decompositions, the complexity of the algorithm obtained from [21] could be too prohibitive for the quantifier-free fragment. One example that epitomizes (ii) is the problem of determining whether a given relation over strings represented by a regular transducer could be expressed as a boolean combination of monadic predicates. The result of [21] would give a double exponential-time algorithm for monadic decomposability, whereas it was recently shown in [5] to be solvable in polynomial-time (resp. polynomial-space) when the transducer is given as a deterministic (resp. nondeterministic) machine.

Contributions.

First, we determine the complexity of deciding monadic decomposability and outputting monadic decompositions (if they exist) for the theory of integer linear arithmetic in the setting of SMT. Our result is summarized in Theorem 0.1.1.

Theorem 0.1.1 (Monadic Decomposability).

Given a quantifer-free formula of Presburger Arithmetic, it is coNP-complete to decide if is monadically decomposable. This is efficiently reducible to unsatisfiability of quantifier-free Presburger formulas. Moreover, if a decomposition exists, it can be constructed in exponential time.

We show a new application of monadic decomposability in integer linear arithmetic for SMT over strings, which is currently a very active research area, e.g., see [1, 10, 25, 20, 26, 2, 6, 3, 17, 22, 13]. One problem that makes string constraint solving difficult is the presence of additional length constraints, which forces the lengths of the strings in the solutions to satisfy certain linear arithmetic constraints. Whereas satisfiability of string equations with regular constraints is PSPACE-complete (e.g. see [18, 14]), it is a long-standing open problem [15, 8] whether word equations with length constraints are decidable. Length constraints are omnipresent in Kaluza [25], arguably the first serious string constraint benchmarks obtained from real-world JavaScript applications. Using our monadic decomposability solver, we show that 90% of the Kaluza benchmarks are in fact in a decidable fragment of string constraints, since occurring length constraints can be completely removed by means of decomposition.

Next we extend our result to variadic decomposability (cf. [21]).

Theorem 0.1.2 (Variadic Decomposability).

It is coNP-complete to decide if is -decomposable, given a quantifer-free formula of Presburger Arithmetic and a partition of . This is efficiently reducible to unsatisfiability of quantifier-free Presburger formulas. Moreover, if a decomposition exists, it can be constructed in exponential time.

We show how this could be applied to quantifier elimination. In particular, we show that if a formula , where is quantifier-free, is -decomposable—where and are linearizations of the variables in and —then we can compute in exponential time a formula such that , i.e., avoiding the standard double-exponential blow-up (cf. [28]).

Organization.

Preliminaries are in Section 0.2. Results on monadic (resp. variadic) decomposition are in Section 0.3 (resp. Section 0.4) and applications appear in Section 0.5.

0.2 Preliminaries

0.2.1 Presburger Syntax

In this paper we study the problem of monadic decomposition for formulas in linear integer arithmetic. All of our results are presented for Presburger arithmetic over natural numbers, but they can be adapted easily to all integers.

Definition 0.2.1 (Fragments of Presburger Arithmetic).

A formula of Presburger arithmetic is a formula of the form where and is a quantifier-free Presburger formula:

where , with , variables range over , and . The operator denotes equality modulo , i.e., whenever is a multiple of . Formulas of the shape , , or are called atoms.

Existential Presburger formulas are formulas of the form for some quantifier-free Presburger formula . We let (resp. ) denote the set of all quantifier-free (resp., existential) Presburger formulas.

Let be a tuple of integer variables. We write for a linear sum over . Let . By slight abuse of notation, we may also write to denote a formula over the variables .

0.2.2 Monadic Decomposability

A quantifier-free formula  is called monadic if every atom in contains at most one variable, and it is called monadically decomposable if is equivalent to a monadic formula . In this case, is also called a decomposition of . For our main results we use a slightly refined notion of a formula being decomposable:

Definition 0.2.2 (Monadically Decomposable on ).

Fix a logic (e.g. or ). We say a formula in is monadically decomposable on whenever

for some formulas and in .

It can be observed that a formula is monadically decomposable if and only if it is monadically decomposable on all variables occurring in the formula (cf. Lemma 0.2.5). We expand on this for the variadic case below.

We recall the following characterization of monadic decomposability for formulas with two free variables (cf. [9, 27, 5, 21]), which holds regardless of the theory under consideration. This can be extended easily to formulas with variables, but is not needed in this paper. Given a formula , define the formula as follows:

Proposition 0.2.3.

The relation is an equivalence relation. Furthermore, is monadically decomposable iff has a finite index (i.e. the number of -equivalence classes is finite).

Using this proposition, it is easy to show that over a structure with an infinite domain (e.g. integer linear arithmetic) the formula is not monadically decomposable. As was noted already in [21], to check monadic decomposability of a formula  in Presburger Arithmetic in general, we may simply check if there is an upper bound on the smallest representation of every -equivalence class, i.e.,

However, to derive tight complexity bounds for checking monadic decomposability, this approach is problematic, since the above characterisation has multiple quantifier alternations. Using known results (e.g. [16]), one would only obtain an upper bound in the weak exponential hierarchy [16], which only admits double-exponential time algorithms.

0.2.3 Variadic Decomposability

The notion of a variadic decomposition generalises monadic decomposition by considering partitions of the occurring variables.

Definition 0.2.4 (-Decomposable).

Fix a logic (e.g. or ). Take a formula in and a partition of . We say is -decomposable whenever

for some formulas in and linearizations of .

Observe that a formula is monadically decomposable on iff it is -decomposable with . Moreover, we say a formula over the set of variables is variadic decomposable on whenever it is -decomposable with .

General -decompositions can be computed by decomposing on binary partitions , which is why we focus on this binary case in the rest of the paper. We argue why this is the case below.

Let a formula  and be given. We can first decompose separately on each where . Using the algorithm in Section 0.4 we obtain for each a decomposition of a specific form:

Note, these decompositions can be performed independently using the algorithm in Section 0.4 and the second conjunct of each disjunct is with replaced by fixed constants . Additionally, each is polynomial in size and each can be represented with polynomially many bits. We note also that our algorithm ensures that each is satisfiable.

Given such decompositions, we can recursively decompose on . We first use the above decomposition for and obtain

Next, we use the decomposition for to decompose the copies of in the decomposition above. We obtain

This process repeats until all have been considered. If is -decomposable, we find a decomposition. If is not -decomposable, then it would not be possible to do the independent decompositions for each . Thus, for , we can use variadic decompositions on to compute -decompositions.

The above algorithm runs in exponential time due both to the exponential size of the decompositions and the branching caused by the disjuncts. If we are only interested in whether a formula is -decomposable, it is enough to ask whether it is decomposable on for each . In particular, a formula is monadically decomposable iff is decomposable for each variable . Since the complexity class coNP is closed under intersection, we obtain the following:

Lemma 0.2.5.

A coNP upper bound for monadic decomposability on a given variable implies a coNP upper bound for monadic decomposability. Likewise, a coNP upper bound for variadic decomposability on a given subset of variables implies a coNP upper bound for -decomposability.

0.2.4 Example

Consider the formula given by . This formula is monadically decomposable, which means, it is -decomposable for .

Our algorithm will first take a decomposition on and might obtain where and . Next, we use a decomposition on . For each we substitute , and as the final decomposition we get

0.3 Monadic Decomposability

0.3.1 Lower Bounds

We first show that unsatisfiability of Boolean formulas can be reduced to monadic decomposability of formulas with only two variables, directly implying coNP-hardness:

Lemma 0.3.1 (coNP-Hardness).

Deciding whether a formula  in is monadic decomposable is coNP-hard.

Proof.

We reduce from unsatisfiability of propositional formulas to monadic decomposability of . Take a propositional formula . Let be the first primes. Let be the formula obtained from by replacing each occurrence of by . Given an assignment , we let

Thanks to the Chinese Remainder Theorem, is non-empty and periodic with period , which implies that is infinite for every . We also have that iff, for each , is true.

Now define . If is unsatisfiable, then is unsatisfiable and so it is decomposable. Conversely, if can be satisfied by some assignment , then is true for all (infinitely many) . Since all solutions to imply that , by Proposition 0.2.3 we have that is not monadically decomposable. ∎∎

We next provide exponential lower bounds for decompositions in either disjunctive normal form (DNF) or conjunctive normal form (CNF). DNF has been frequently used to represent monadic decompositions by previous papers (e.g. [21, 5, 9]), and it is most suitable for applications in quantifier elimination.

Lemma 0.3.2 (Size of Decomposition).

There exists a family of formulas in such that grows linearly in , while the smallest decomposition on in DNF/CNF is exponential in .

Proof.

Consider the formulas . Using a binary encoding of constants, the size of the formulas is linear in . We show that decompositions in DNF/CNF must be exponential in size.

Disjunctive: Suppose is a monadic decomposition in DNF. Each disjunct , if it is satisfiable at all, has an upper right corner  such that holds, but . This immediately implies that exponentially many disjuncts are needed to cover the exponentially many points on the line .

Conjunctive: Suppose is a succinct monadic decomposition of in CNF. Since , it follows that . Therefore, is a succinct decomposition of in DNF, contradicting the lower bound for DNFs.∎∎

0.3.2 Upper Bound

We prove Theorem 0.1.1. Following Lemma 0.2.5, it suffices to show that testing decomposability on a variable is in coNP and that a decomposition can be computed in exponential time. Assume without loss of generality that we have where , and that we are decomposing on the first variable .

We claim that is monadically decomposable on iff

where is a bound exponential in the size of and is a formula asserting that and satisfy the same divisibility constraints. This bound is computable in polynomial time and is described in Section 0.3.4. To define , let be the set of all divisibility constraints or appearing (syntactically) in . Assume without loss of generality that always appears on the left-hand side of a divisibility constraint (i.e., in the position of ). We then define

We prove the claim in the following sections and simultaneously show how to construct the decomposition. Once we have established the above, we can test non-decomposability on by checking

which is decidable in NP. Thus we obtain a coNP decision procedure because the above formula is polynomial in the size of .

Example

We consider some examples. First consider the formula that cannot be decomposed on . Since there are no divisibility constraints, is simply . It is straightforward to see that, , for example by setting , , and .

Now consider the monadically decomposable formula

In this case . We can verify

holds, as it will be the case that and for all the formula will hold whenever holds and . The precondition ensures that the if and only if holds. We will construct the decomposition in the next section.

Expanded Divisibility Constraints

Observe that divisibility constraints are always decomposable. In particular, is equivalent to a finite disjunction of clauses where and are bounded by a multiple of and . The expansion is exponential in size, since the values up to have to be enumerated explicitly.

We define be the set of all constraints of the form where and appears directly in or in the expansion of the divisibility constraints of . This set will be used in the next sections.

0.3.3 Soundness

We show that if

then is decomposable on . We do this by constructing the decomposition.

Although there are doubly exponentially many subsets , there are only exponentially many maximal consistent subsets. We implicitly restrict to such subsets. This is because, for any , there is no value of such that and both hold with but . For any maximal consistent set , let be the smallest integer greater than or equal to satisfying all constraints in . Note, since is maximal, a value that satisfies all constraints in also does not satisfy an constraints not in . The number  can be represented using polynomially many bits.

We can now decompose into

This formula is exponential in the size of if only ranges over the maximal consistent subsets of . For values of less than , equivalence with the original formula is immediate. For larger values, we use the fact that, from our original assumption, for any values and that satisfy the same divisibility constraints, we have iff . Hence, we can substitute the values in these cases.

Example

We return to and compute the decomposition on . Assuming

is odd, the decomposition will be as follows. In our presentation we slightly simplify the formula. Strictly speaking

should be expanded to . We simplify these to and , respectively, when instantiated with concrete values of .

0.3.4 Completeness

We now show that every formula decomposable on satisfies

We first show that some must exist. Once the existence has been established, we can argue that it must be at most exponential in .

Existence of the Bound

If is decomposable on , then there is an equivalent formula . It is known that every formula is satisfied by a finite union of arithmetic progressions . Let be larger than the largest value of in the arithmetic progressions satisfying the .

We show when then iff for all values and . Assume towards a contradiction that we have values and a tuple of values such that and , but not .

Let be the product of all appearing in some divisibility constraint in . We know that there is some disjunct of the monadic decomposition such that holds. Moreover, let belong to the arithmetic progression . Since we know that also holds for any . That is, we can pump by adding a multiple of , while staying in the same arithmetic progression and satisfying the same divisibility constraints.

Similarly, let be the product of all appearing in the (finite number of) arithmetic progressions that define the monadic decomposition of , limited to disjuncts such that holds. Since does not hold, then also does not hold for any . This means that we can pump staying outside of the arithmetic progressions defining permissible values of for the given values , whilst additionally satisfying the same divisibility constraints.

Now, for each value of satisfying we can consider the disjunctive normal form of . By expanding the divisibility constraints, a disjunct becomes a conjunction of terms of the form, where represents some linear function on ,

  1. or , or

  2. or .

Since there are infinitely many , we can choose one disjunct satisfied by infinitely many . This means that for constraints of the form or with a non-zero , then must be negative or positive respectively (or zero). Otherwise, only a finite number of values of would be permitted.

We know that and do not satisfy the disjunct. We argue that this is a contradiction by considering each term in turn. Since there are infinitely many we can assume without loss of generality that .

  1. If (resp. ) appears and is satisfied by , then must be negative or zero (resp. positive or zero) and will also satisfy the atom.

  2. Atoms of the form do not distinguish values of and thus are satisfied for both and . We cannot have but not since and satisfy the same divisibility constraints.

Thus, it cannot be the case that satisfies the disjunct, while does not. This is our required contradiction. Hence, for all and such that it must be the case that iff . We have thus established the existence of a bound .

Size of the Bound

We now argue that this bound is exponential in the size of , and can thus be encoded in a polynomial number of bits.

Consider the formula that is essentially the negation of our property.

There is some computable bound exponential in the size of (and thus ) such that, if there exists and some such that holds, then there are infinitely many and such that for some we have that holds. An argument for the existence of this bound is given in Appendix .7. In short, we first convert the formula above into a disjunction of conjunctions of linear equalities, using a linear number of slack variables to encode inequalities and divisibility constraints. Then, using a result of Chistikov and Haase [11], we set where is the number of bits needed to encode the largest constant in the converted formula (polynomially related to the size of the formula above), is the maximum number of linear equalities in any disjunct, and is the number of variables (including slack variables).

Now, assume that the smallest is larger than . That is

holds, but it does not hold that

This implies there exists some and such that holds. Thus, there are infinitely many such and , contradicting the fact that all do not satisfy the property. Thus, we take as the value of . It is computable in polynomial time, exponential in size, and representable in a polynomial number of bits.

0.4 Variadic Decomposability

We consider decomposition along several variables instead of just one. In this section, we assume without loss of generality that is given in positive normal form and all (in)equalities rearranged into the form . We may use negation as a shorthand. We require this form because later we use the set of all linear equations in the DNF of a formula. Since negation alters the linear equations, it is more convenient to assume that negation has already been eliminated.

0.4.1 -Decomposability

As described in Section 0.2.3, we refine the notion of -decomposability to separate only a single set in . Without loss of generality, we assume we are given a formula and we separate the variables in from .

In particular, given a formula we aim to decompose the formula into for some formulas and .

0.4.2 Decomposition

We show that testing whether a given formula is variadic decomposable on is in coNP. This proves Theorem 0.1.2 as the coNP lower bound follows from the monadic case.

Lemma 0.4.1 (Decomposing on ).

Given a formula there is a coNP algorithm to decide if is variadic decomposable on . Moreover, if a decomposition exists, it can be constructed in exponential-time and is exponential in size.

Let be the set of all such that is a linear inequality appearing in . Our approach will divide the points of into regions where all points within a region can be paired with the same values of to satisfy the formula. These regions are given by a bound . If is within the bound, then two points and are in the same region if . If two points are outside the bound, then by a pumping argument we can show that we have iff .

Let be a partition of into unbounded and bounded functions (where refers to equality being asserted over bounded functions as shown below). Define for each

Note, this formula intentionally does not say anything about the unbounded functions. This is important when we need to derive a bound—such a derivation cannot use a pre-existing bound.

We also need to extend to account for and

being vectors. This is a straightforward extension asserting that each variable in

satisfies the same divisibility constraints as its counterpart in . Again, let be the set of all divisibility constraints appearing (syntactically) in . Let , and denote the th variable of , , and respectively. Assume without loss of generality that variables in always either appear on the left-hand side of a divisibility constraint (i.e. in the position) or on both sides. Define

Next, we introduce an operator for comparing a vector of variables with a bound. For let denote the absolute value of . Given a bound and some let

We claim there is an exponential bound such that is variadic decomposable iff for all we have

(DC-)

Note, unsatisfiability can be tested in NP. First guess , then guess .

We prove soundness of the claim in the next section. Completeness is an extension of the argument for the monadic case and is given in Appendix .8. In the monadic case, we were able to take some values of such that both satisfied the same divisibility constraints, but one value satisfied the formula while the other did not. Since these values were large, we derived an infinite number of such value pairs with increasing values. We then used these growing solutions to show that it was impossible for the value of to satisfy the formula, while the value of does not, as they were both beyond the distinguishing power of the linear inequalities. The argument for the variadic case is similar, with the values of and being replaced by the values of and .

0.4.3 Soundness

Assume there is an exponential bound such that for each , Equation DC- holds. We show how to produce a decomposition.

As in the monadic case (Section 0.3.3), let be the set of all constraints of the form in the expansion of the divisibility constraints of . Observe again that there are only exponentially many maximal consistent subsets . For each fix a vector of values that satisfies all constraints in and is encodable in a polynomial number of bits. Furthermore, we define

For each and we can define an equivalence relation over values of such that and .

Observe each equivalence relation has an exponential number of equivalence classes depending on the values of the bounded . Let be a set of minimal representatives from each equivalence class such that each representative is representable in a polynomial number of bits. These can be computed by solving an existential Presburger constraint for each set of values of the bounded . In particular, for each and assignments for each , we select a solution to the equation