Expressivity within second-order transitive-closure logic

04/16/2018 ∙ by Flavio Ferrarotti, et al. ∙ Software Competence Center Hagenberg UHasselt 0

Second-order transitive-closure logic, SO(TC), is an expressive declarative language that captures the complexity class PSPACE. Already its monadic fragment, MSO(TC), allows the expression of various NP-hard and even PSPACE-hard problems in a natural and elegant manner. As SO(TC) offers an attractive framework for expressing properties in terms of declaratively specified computations, it is interesting to understand the expressivity of different features of the language. This paper focuses on the fragment MSO(TC), as well on the purely existential fragment SO(2TC)(E); in 2TC, the TC operator binds only tuples of relation variables. We establish that, with respect to expressive power, SO(2TC)(E) collapses to existential first-order logic. In addition we study the relationship of MSO(TC) to an extension of MSO(TC) with counting features (CMSO(TC)) as well as to order-invariant MSO. We show that the expressive powers of CMSO(TC) and MSO(TC) coincide. Moreover we establish that, over unary vocabularies, MSO(TC) strictly subsumes order-invariant MSO.



There are no comments yet.


page 1

page 2

page 3

page 4

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Second-order transitive-closure logic, , is an expressive declarative language that captures the complexity class  [21]. It extends second-order logic with a transitive closure operator over relations of relations, i.e., over super relations among relational structures. The super relations are defined by means of second-order logic formulae with free relation variables. Already its monadic fragment, , allows the expression of -complete problems in a natural and elegant manner. Consider, for instance, the well known Hamiltonian cycle query over the standard vocabulary of graphs, which is not expressible in monadic second-order logic [13]. A graph has a Hamiltonian cycle if the following holds:

  1. There is a relation such that iff , , and .

  2. The tuple is in the transitive closure of , for some s.t. .

In the language of this can be written as follows:

where . Even some well-known -complete problems such as deciding whether a given quantified Boolean formula QBF is valid [27] can be expressed in MSO(TC) (see Section 3).

In general, SO(TC) offers an attractive framework for expressing properties in terms of declaratively specified computations at a high level of abstraction. There are many examples of graph computation problems that involve complex conditions such as graph colouring [4], topological subgraph discovery [19], recognition of hypercube graphs [18], and many others (see [9, 16, 17]). Such graph algorithms are difficult to specify, even by means of rigorous methods such as Abstract State Machines (ASMs) [10], B [2] or Event-B [3], because the algorithms require the definition of characterising conditions for particular subgraphs that lead to expressions beyond first-order logic. Therefore, for the sake of easily comprehensible and at the same time fully formal high-level specifications, it is reasonable to explore languages such as SO(TC). Let us see an example that further supports these observations.

Self-similarity of complex networks [36] (aka scale invariance) has practical applications in diverse areas such as the world-wide web [14], social networks [20], and biological networks [32]. Given a network represented as a finite graph , it is relevant to determine whether can be built starting from some graph pattern by recursively replacing nodes in the pattern by new, “smaller scale”, copies of . If this holds, then we say that is self-similar.

Formally, a graph is self-similar w.r.t. a graph pattern of size , if there is a sequence of graphs such that , and, for every pair of consecutive graphs in the sequence, there is a partition of the set of nodes of which satisfies the following:

  1. For every , the sub-graph induced by in is isomorphic to .

  2. There is a graph isomorphic to with set of nodes for some and set of edges

  3. For very , the closed neighborhoods and of and in , respectively, are isomorphic.

It is straightforward to write this definition of self-similarity in SO(TC), for we can clearly write a second-order logic formula which defines such a super relation on graphs and then simply check whether the pair of graphs is in the transitive closure of .

Highly expressive query languages are gaining relevance in areas such as knowledge representation (KR), rigorous methods and provers. There are several examples of highly expressive query languages related to applications in KR. See for instance the monadically defined queries in [35], the Monadic Disjunctive SNP queries in [5] or the guarded queries in [11]. All of them can be considered fragments of Datalog. Regarding rigorous methods, the TLA language [28] is able to deal with higher-order formulations, and tools such as the TLA Proof System111 and the TLA Model-Checker (TLC)222 can handle them (provided a finite universe of values for TLC). Provers such as Coq333 and Isabelle444 can already handle some high-order expression. Moreover, the success with solvers for the Boolean satisfiability problem (SAT) has encouraged researchers to target larger classes of problems, including -complete problems, such as satisfiability of Quantified Boolean formulas (QBF). Note the competitive evaluations of QBF solvers (QBFEVAL) held in 2016 and 2017 and recent publications on QBF solvers such as [8, 31, 22] among several others.

We thus think it is timely to study which features of highly expressive query languages affect their expressive power. In this sense, provides a good theoretical base since, apart from been a highly expressive query language (recall that it captures ), it enables natural and precise high-level definitions of complex practical problems, mainly due to its ability to express properties in terms of declaratively specified computations. While second-order logic extended with the standard partial fixed-point operator, as well as first-order logic closed under taking partial fixed-points and under an operator for non-deterministic choice, also capture the class of queries over arbitrary finite structure [33], relevant computation problems such as that in Example 1 are clearly more difficult to specify in these logics. The same applies to the extension of first-order logic with the partial fixed-point operator, which is furthermore subsumed by since it captures only on the class of ordered finite structures [1]. Note that coupled with hereditary finite sets and set terms, could be considered as a kind of declarative version of Blass, Gurevich, and Shelah (BGS) model of abstract state machine [7], which is a powerful language in which all computable queries to relational databases can be expressed [6].

Our results can be summarized as follows.

  1. We investigate to what extent universal quantification and negation are important to the expressive power of . Specifically, we consider the case where TC-operators are applied only to second-order variables. Of course, a second-order variable can simulate a first-order variable, since we can express already in first-order logic (FO) that a set is a singleton. This, however, requires universal quantification.

    We define a “purely existential” fragment of , , as the fragment without universal quantifiers and in which -operators occur only positively and bind only tuples of relation variables. We show that the expressive power of this fragment collapses to that of existential FO.

    For SO alone, this collapse is rather obvious and was already remarked by Rosen in the introduction of his paper [34]. Our result generalizes this collapse to include TC operators, where it is no longer obvious.

  2. We investigate the expressive power of the monadic fragment, . On strings, this logic is equivalent to the complexity class NLIN. Already on unordered structures, however, we show that can express counting terms and numeric predicates in . In particular, can express queries not expressible in the fixpoint logic FO(LFP). We also discuss the fascinating open question whether the converse holds as well.

  3. We compare the expressive power of to that of order-invariant . Specifically, we show that can express queries not expressible in order-invariant ; over monadic vocabularies, we show that order-invariant MSO is subsumed by . Again, what happens over higher-arity relations is an interesting open question.

This paper is organized as follows. In Section 2 definitions and basic notions related to are given. In Section 3 the complexity of model checking is studied. Section 4 is dedicated to establishing the collapse of to existential first-order logic. Sections 5 and 6 concentrate on the relationships between and the counting extension and order-invariant , respectively. We conclude with a discussion of open questions in Section 7.

2 Preliminaries

We assume that the reader is familiar with finite model theory, see e.g., [15] for a good reference. For a tuple of elements, we denote by the th element of the tuple. We recall from the literature, the syntax and semantics of first-order () and second-order () logic, as well as their extensions with the transitive closure operator (). We assume a sufficient supply of first-order and second-order variables. The natural number , is the arity of the second-order variable . By variable, we mean either a first-order or second-order variable. Variables and have the same sort if either both and are first-order variables, or both are second-order variables of the same arity. Tuples and of variables have the same sort, if the lengths of and are the same and, for each , the sort of is the same as the sort of .

The formulas of are defined by the following grammar:

where and are second-order variables, , are first-order variables, and are disjoint tuples of variables of the same sort, and and are also tuples of variables of that same sort (but not necessarily disjoint). The set of free variables of a formula , denoted by is defined as usual. For the operator, we define

Above in the right side, in order to avoid cumbersome notation, we use , , and to denote the sets of variables occurring in the tuples.

A vocabulary is a finite set of variables. A (finite) structure over a vocabulary is a pair , where is a finite nonempty set called the domain of , and is an interpretation of on . By this we mean that whenever is a first-order variable, then , and whenever is a second-order variable of arity , then . In this article, structures are always finite. We denote also by . For a variable and a suitable value for that variable, denotes the structure over equal to except that is mapped to . We extend the notation also to tuples of variables and values, , in the obvious manner. We say that a vocabulary is appropriate for a formula if .

Let be a structure over and an -formula such that is appropriate for . The satisfaction of by , denoted by , is defined as follows. We only give the cases for second-order quantifiers and transitive closure operator; the remaining cases are defined as usual.

  • For second-order variable : iff , for some .

  • For the case of the -operator, consider a formula of the form and let . Define to be the following set

    and consider the binary relation on defined as follows:

    We set to hold if belongs to the transitive closure of . Recall that, for a binary relation on any set , the transitive closure of is defined by

By we denote the variant of in which the quantification of above is restricted to natural numbers . That is, consists of pairs such that is reachable from by in at most steps. Moreover, by and we denote the syntactic restrictions of and of the form

where , , , are tuples of second-order variables (i.e. without first-order variables). The logic then denotes the extension of second-order logic with -operator. Analogously, by , we denote the extension of first-order logic with applications of such transitive-closure operators that bind only first-order variables.555In the literature is often denoted by .

3 Complexity of MSO(TC)

The descriptive complexity of different logics with the transitive closure operator has been thoroughly studied by Immerman. Let denote the fragment of in which second-order variables are all of arity . [[23, 24]]

  • On finite ordered structures, first-order transitive-closure logic captures nondeterministic logarithmic space .

  • On strings (word structures), captures the complexity class .

See also the discussion in the conclusion section.

By the above theorem, captures nondeterministic linear space over strings. Deciding whether a given quantified Boolean formula is valid (QBF) is a well-known -complete problem [27]. Observe that there are -complete problems already in ; in fact QBF is such a problem. Thus, we can conclude the following. The inclusion in is clear.

Proposition .

Data complexity of is -complete.

We next turn to combined complexity of model checking. By the above proposition, this is at least -hard. However, the straightforward algorithm for model checking clearly has polynomial-space combined complexity. We thus conclude:

Proposition .

Combined complexity of is -complete.

For combined complexity, we can actually sharpen the -hardness; already a very simple fragment of is -complete.

Specifically, we give a reduction from the corridor tiling problem, which is a well-known -complete problem. Instance of the corridor tiling problem is a tuple , where is a positive natural number, , for some , is a finite set of tiles, are horizontal and vertical constraints, and are -tuples of tiles from . A corridor tiling for is a function , for some , such that

  • and ,

  • , for and ,

  • , for and .

The corridor tiling problem is the following -complete decision problem [12]:
Input: An instance of the corridor tiling problem.
Output: Does there exist a corridor tiling for ?

Let monadic denote the fragment of of the form , where is a formula of universal first-order logic.

Combined complexity of model checking for monadic is -complete.


Inclusion to follows from the corresponding result for . In order to prove hardness, we give a reduction from corridor tiling. Let be an instance of the corridor tiling problem and set . Let be a vocabulary, where is a binary second-order variable and are monadic second-order variables. Let denote the structure over such that , is the canonical successor relation on , and, for each , and . Define

where and -tuples of distinct monadic second-order variables not in . We then define We claim that if and only if there exists a corridor tiling for , from which the claim follows. ∎

4 Existential positive SO(2TC) collapses to EFO

Let denote the syntactic fragment of in which existential quantifiers and the -operator occur only positively, that is, in scope of even number of negations. In this section, we show that the expressive power of collapses to that of existential first-order logic . In this section, -operators are applied only to tuples of second-order variables. As already discussed in the introduction, this restriction is vital: the formula expresses reachability in directed graphs, which is not definable even in the full first-order logic.

To facilitate our proofs we start by introducing some helpful terminology.

Let and be tuples of the same length and a set of natural numbers. The difference of the tuples and is defined as follows

The similarity of tuples and is defined as follows

We say that the tuples and are pairwise compatible if the sets and are disjoint. The tuples and are pairwise compatible outside if and are disjoint. The tuples and are pairwise -compatible if and are pairwise compatible and .

Let be vocabularies, a -structure, and a tuple of elements of . The (quantifier-free) -type of in is the set of those quantifier free -formulae such that .

The following lemma establishes that -operators that are applied to -formulas can be equivalently expressed by the finite -operator. Every formula of the form , where and , , , are tuples of second-order variables, is equivalent with the formula , for some .


Let , where is quantifier-free, and let denote the vocabulary of . We will show that for large enough and for all -structures

From here on we consider and fixed; especially, by a constant, we mean a number that is independent of the model ; that is, it may depend on and .

It suffices to show the left-to-right direction as the converse direction holds trivially for all . Assume that . By the semantics of there exists a natural number and tuples of relations on such that , , and


For each , let and let denote the vocabulary of . By the semantics of the existential quantifier, (1) is equivalent to saying that


for some -tuples from . We will prove the following claim.

Claim: There exists an index set and mutually pairwise -compatible sequences in that have a common -type provided that is a large enough constant.

Proof of the claim: Let denote the longest (not necessarily consecutive) subsequence of that have a common -type. Since there are only finitely many -types, can be made as large as needed by making a large enough constant.

We will next show that there exists mutually pairwise -compatible sequences in for some (provided that is large enough). Set . In the construction below we maintain the following properties for :

  • For each and for each tuple and in it holds that .

  • The length of is as long a constant as we want it to be.

For , let be a maximal collection (in length) mutually pairwise -compatible sequences from . If we are done. Otherwise note that, since each is an -tuple, the number of different points that may occur in is . By an inductive argument we may assume that the length of is as large a constant as we want, and thus we may conclude that there exists an index and an element such that there are as many as we want tuples in such that . Set and let be the sequence of exactly those such that . Notice that the length of is as large a constant as we want it to be.

Finally, the case . Note that and is a sequence of -tuples; in fact all tuples in are identical. Thus, if the length of is at least , the first sequences of constitute a mutually pairwise -compatible sequence of length . It is now straightforward but tedious to check how large has to be so that the length of is at least ; thus the claim holds. ∎

Now let , , be mutually pairwise -compatible sequences from with a common -type provided by the Claim. Let be an index such that and are pairwise compatible outside and . It is straightforward to check that such a always exists, for if and are not pairwise compatible outside or , there exists some indices such that , and for each such the value of the related has to be unique as are mutually pairwise -compatible. Now must exist since the length of is while the length of is only .

Consider the models and and recall that We claim that there exists a sequence of relations on such that


and thus that and . From this the claim of the theorem follows for .

It now suffices to show that such a exists. The idea is that looks exactly like with respect to points in and like with respect to points . Formally is defined as follows. For every relation and tuple

  • if is completely included in neither nor then we set ,

  • if is completely included in then we set iff ,

  • if is completely included in then we set iff .

Note that if is completely included in both and then there exists indices such that, for , . The former equality follows, with indices in , since and are pairwise compatible outside and . The latter equality follows since and are pairwise -compatible. Since and have the same -type iff , for all , and thus is well-defined. It is now immediate that (3) holds. ∎

For every formula of vocabulary of the form or , where and , , , are tuples of relation variables, there exists an equivalent formula of vocabulary .


Consider first the formula . Define and let be the number of occurrences of in . The idea behind our translation is that the quantification of can be equivalently replaced by a quantification of an -ary relation of size ; this can be then expressed in by quantifying many -tuples (content of the finite relation).

Let denote the formula obtained from by replacing every occurrence of the relation variable of the form in by the formula . Define where, for each i, is a shorthand for and is the formula obtained from by substituting each occurrence of the relation variable of the form in by It is straightforward to check that is an -formula of vocabulary equivalent with .

Consider then the formula . In order to simplify the presentation, we stipulate that and are of length one, that is, variables and , respectively; the generalisation of the proof for arbitrary tuples of second-order variables is straightforward. By Lemma 4, we obtain such that and are equivalent.

The following formulas are defined via substitution; by we denote the formula obtained from by substituting each occurrence of the symbol by the symbol .

  • and , for ,

  • and , for .

Let denote the following formula of existential second-order logic

It is immediate that and are equivalent. Note that is of the form , where is an -formula. By repetitively applying the first case of this lemma to subformulas of , we eventually obtain an equivalent -formula over as required. ∎

The expressive powers of and coincide.


Every -formula is also an -formula, and thus it suffices to establish the converse direction. The proof proceeds via induction on the combined nesting depth of second-order quantifiers and operators. For every formula of the form or , where , let denote the equivalent -formula obtained from Lemma 4.

Let be an arbitrary -formula with combined nesting depth of second-order quantifiers and operators. Let be the -formula obtained from by simultaneously substituting each of its subformula of the form or , where , by its translation . Clearly and are equivalent, and the combined nesting depth of second-order quantifiers and operators in is . Thus, by induction, the claim follows, as for the formula is already in . ∎

5 MSO(TC) and counting

We define a counting extension of and show that the extension does not add expressive power to the logic. In this way, we demonstrate that quite a bit of queries involving counting can be expressed already in .

5.1 Syntax and semantics of CMSO(TC)

We assume a sufficient supply of counter variables or simply counters, which are a new sort of variables. We use the Greek letters and (with subscripts) to denote counter variables. The notion of a vocabulary is extended so that it may also contain counters. A structure over a vocabulary is defined to be a pair as before, where now also maps the counters in to elements of , where is the cardinality of .

We also assume a sufficient supply of numeric predicates. Intuitively numeric predicates are relations over natural numbers such as the tables of multiplication and addition. Technically, we use an approach similar to generalised quantifiers; a -ary numeric predicate is a class of -tuples of natural numbers. For a numeric predicate , we use as a symbol referring to the predicate. For simplicity, we often call also numeric predicate. Note that when evaluating a -ary numeric predicate on a finite structure , we let the numeric predicate access also the cardinality of the structure in question, and thus consists of -tuples and not -tuples. This convention allows us, for example, to regard the modular sum , where refers to the cardinality of the structure, as a -ary numeric predicate.

We consider only those numeric predicates which can be decided in . Since, on finite ordered structures, first-order transitive closure logic captures , this boils down to being definable in first-order transitive closure logic when the counter variables are interpreted as points in an ordered structure representing an initial segment of natural numbers (see Definition 5.1 and Proposition 5.1 below for precise formulations).

The syntax of extends the syntax of as follows:

  • Let be a formula, a counter, and a first-order variable. Then is also a formula. The set of its free variables is defined to be .

  • If is a formula and a counter then also is a formula with set of free variables .

  • Let , …, be counters and let be a -ary numeric predicate. Then