In this article we study the computational complexity of inclusion logic. Inclusion logic was introduced by Galliani [galliani12] as a variant of dependence logic, developed by Väänänen in 2007 [vaananen07]. Dependence logic is a logical formalism that extends first-order logic with novel atomic formulae expressing that a variable depends on variables . One motivation behind dependence logic is to find a unifying logical framework for analyzing dependency notions from different contexts. Since its introduction, versions of dependence logic have been formulated and investigated in a variety of logical environments, including propositional logic [HannulaKVV18, Virtema17, YangV16], modal logic [ebbing13, vaananen08b], probabilistic logics [Durand2018], and two-variable logics [KontinenKLV14]. Recent research has also pursued connections and applications of dependence logic to fields such as database theory [HannulaK16, HannulaKV18]CoranderHKPV16], and social choice theory [Pacuit2016]. A common notion underlying all these endeavours is that of team semantics. Team semantics, introduced by Hodges in [hodges97]
, is a semantical framework where formulae are evaluated over multitudes instead of singletons of objects as in classical logics. Depending on the application domain these multitudes may then refer to assignment sets, probability distributions, or database tables, each having their characteristic versions of team semantics[vaananen07, Durand2018, HannulaKV18].
After the introduction of dependence logic Grädel and Väänänen observed that team semantics can be also used to create logics for independence [gradel12]. This was followed by [galliani12] in which Galliani investigated logical languages built upon multiple different dependency notions. Inspired by the inclusion dependencies of database theory, one of the logics introduced was inclusion logic that extends first-order logic with inclusion atoms. Given two sequences of variables and having same length, an inclusion atom expresses that the set of values of is included in the set of values of . Inclusion logic was shown to be equi-expressive to positive greatest-fixed point logic in [gallhella13]. In contrast to dependence logic which is equivalent to existential second-order logic [vaananen07], and thus to non-deterministic polynomial time (), this finding established inclusion logic as the first team-based based logic for polynomial time (). Our focus in this article is to pursue this connection further by investigating the complexity of quantifier-free inclusion logic in terms of two computational problems: maximal subteam membership and model checking problems. In particular, we identify complexity thresholds for these problems in terms of first-order definability, (non-deterministic) logarithmic space, and polynomial time.
The maximal subteam membership problem for a formula asks whether a given assignment is in the maximal subteam of a given team that satisfies . This problem is closely related to the notion of a repair of an inconsistent database [ArenasBC99]. A repair of a database instance w.r.t. some set of constraints is an instance obtained by deleting and/or adding tuples from/to such that satisfies , and the difference between and is minimal according to some measure. If only deletion of tuples is allowed, is called a subset repair. It was observed in [ChomickiM05] that if consists of inclusion dependencies, then for every there exists a unique subset repair of ; this was later generalized to arbitrary LAV tgds (local-as-view tuple generating dependencies) in [CateFK15].
The research on database repair has been mainly focused on two problems: consistent query answering and repair checking. In the former, given a query and a database instance the problem is to compute the set of tuples that belong to for every repair of . The latter is the decision problem: is a repair of for two given database instances and . The complexity of these problems for various classes of dependencies and different types of repairs has been extensively studied in the literature; see e.g. [AfratiK09, ChomickiM05, KoutrisW17, CateFK15]. In this setting, the maximal subteam membership problem can be seen as a variant of the repair checking problem: regarding a team as a (unirelational) database instance and a formula of inclusion logic as a constraint, an assignment is a positive instance of just in case it is in the unique subset repair of . Note however, that in , the task is essentially to compute the maximal subteam from a given database instance , instead of just checking that a given is the unique subset repair of . Note further, that using a single formula as a constraint is actually more general than using a (finite) set of inclusion dependencies. Indeed, as we can take the conjunction of all inclusions in . Furthermore, using disjunctions and quantifiers, we can form constraints not expressible in the usual formalism with a set of dependencies.
The complexity of model checking in team semantics has been studied in [DurandKRV15, kontinenj13] for dependence and independence logics. For these logics increase in complexity arises particularly from disjunctions. For example, model checking for a disjunction of three (two, resp.) dependence atoms is complete for (, resp.), while a single dependence atom is first-order definable [kontinenj13]. The results of this paper, in contrast, demonstrate that the complexity of inclusion logic formulae is particularly sensitive to conjunctions. We show that is complete for non-deterministic logarithmic space if is of the form or ; for any other conjunction of (non-trivial) unary inclusion atoms is complete for polynomial time. This result gives a complete characterization of the maximal subteam membership problem for conjunctions of unary inclusion atoms. Based on it we also prove complexity results for model checking of quantifier-free inclusion logic formulae. For instance, for any non-trivial quantifier-free in which do not occur, model checking of is -hard, while that of is -complete.
We also present a safety game for the maximal subteam membership problem. Using this game we examine instances of the maximal subteam membership problem in which the inclusion atoms refer to a key, that is, all inclusion atoms are of the form where is a variable which uniquely determines all the remaining variables. We give example formulae for which the thresholds between and drop down to and under these assumptions.
We conclude the paper by presenting a fragment of inclusion logic that captures . Analogous fragments have previously been established at least for dependence logic. By relating to the Horn fragment of existential second-order logic, Ebbing et al. define a fragment of dependence logic that corresponds to [ebbing14]. The fragment presented in this paper is constructed by restricting occurrences of inclusion atoms and universal quantifiers, and the correspondence with is shown by using the well-known characterization of in terms of transitive closure logic [Immerman83, Immerman88].
We generally use for variables and for elements of models. If and are two tuples, we write for the concatenation of and .
Throughout the paper we assume that the reader has a basic familiarity of computational complexity. We use the notation , , and for the classes consisting of all problems computable in logarithmic space, non-deterministic logarithmic space, polynomial time and non-deterministic polynomial time, respectively.
2.1 Team Semantics
As is customary for logics in the team semantics setting, we assume that all formulae are in negation normal form (NNF). Thus, we give the syntax of first-order logic () as follows:
where and are terms and is a relation symbol of the underlying vocabulary. For a first-order formula , we denote by the set of free variables of , defined in the usual way. The team semantics of is given in terms of the notion of a team. Let be a model with domain . An assignment of is a function from a finite set of variables into . We write for the assignment that maps all variables according to , except that it maps to . For an assignment , we may use a shorthand if the underlying ordering of the domain is clear from the context. A team of with domain is a set of assignments from into . For , the restriction of a team is defined as . If is a team, , and , then denotes the team . For a set , is the team . Also, if is an assignment, then by we refer to Tarski semantics.
For a model , a team and a formula in , the satisfaction relation is defined as follows:
if , when is a literal,
if for some such that ,
if for some ,
If , then we say that and satisfy . If does not contain any symbols from the underlying vocabulary, in which case satisfaction of a formula does not depend on the model , we say that satisfies , written , if for all models with a suitable domain (i.e., a domain that includes all the elements appearing in ). If is a sentence, that is, a formula without any free variables, then we say that satisfies , and write , if , where is the team that consists of the empty assignment .
We say that two sentences and are equivalent, written , if for all models . For two logics and , we write if every -sentence is equivalent to some -sentence; the relations “” and “” for and are defined in terms of “” in the standard way.
Satisfaction of a first-order formula reduces to Tarski semantics in the following way.
Proposition 2 (Flatness [vaananen07]).
For all models , teams , and formulae ,
A straightforward consequence is that first-order logic is downwards closed.
Corollary 3 (Downward Closure).
For all models , teams , and formulae ,
2.2 Inclusion Logic
Inclusion logic () is defined as the extension of by inclusion atoms.
Inclusion atom. Let and be two tuples of variables of the same length. Then is an inclusion atom with the satisfaction relation:
Inclusion logic is local, meaning that satisfaction of a formula depends only on its free variables. Furthermore, the expressive power of inclusion logic is restricted by its union closure property which states that satisfaction of a formula is preserved under taking arbitrary unions of teams.
Proposition 4 (Locality [galliani12]).
Let be a model, a team, a formula, and a set of variables such that . Then
Proposition 5 (Union Closure [galliani12]).
Let be a model, a set of teams, and a formula. Then
Note that union closure implies the empty team property, that is, for all inclusion logic formulae .
The starting point for our investigations is the result by Galliani and Hella [gallhella13] characterizing the expressivity of inclusion logic in terms of positive greatest fixed point logic. The latter logic is obtained from greatest fixed-point logic (the dual of least fixed point logic) by restricting to formulae in which fixed point operators occur only positively, that is, within a scope of an even number of negations. In finite models this positive fragment captures the full fixed point logic (with both least and greatest fixed points), and hence it follows from the famous result of Immerman [immerman86] and Vardi [vardi82] that inclusion logic captures polynomial time in finite ordered models.
Theorem 6 ([gallhella13]).
Every inclusion logic sentence is equivalent to some positive greatest fixed point logic sentence, and vice versa.
Theorem 7 ([gallhella13]).
A class of finite ordered models is in iff it can be defined in .
2.3 Transitive Closure Logic
In Section 6 we will explore connections between inclusion logic and transitive closure logic, and hence we next give a short introduction to the latter. A -ary relation is said to be transitive if and imply for -tuples . The transitive closure of a -ary relation , written , is defined as the intersection of all -ary relations that are transitive. The transitive closure of can be alternatively defined as for defined recursively as follows: and for ; here denotes the composition of two relations and . Note that iff there is an -path of length from to .
An assignment , a model , and a formula , where and are -ary, give rise to a -ary relation defined as follows:
We can now define transitive closure logic. Given a term , a model , and an assignment , we write for the interpretation of under , defined in the usual way.
Definition 8 (Transitive Closure Logic).
Transitive closure logic () is obtained by extending first-order logic with transitive closure formulae where and are -tuples of terms, and is a formula where and are -tuples of variables. The semantics of the transitive closure formula is defined as follows:
Thus, is true if and only if there is a -path from to . It is well known that transitive closure logic captures non-deterministic logarithmic space in finite ordered models. In particular, this can be achieved by using only one application of the operator. We use below the notation for the least element of the linear order, and for the tuple . Similarly, denotes the tuple , where is the greatest element.
Theorem 9 ([Immerman83, Immerman88]).
A class of finite ordered models is in iff it can be defined in . Furthermore, every -sentence is equivalent in finite ordered models to a sentence of the form
where is first-order.
3 Maximal Subteam Membership
In this section we define the maximal subteam membership problem and discuss some of its basic properties. We also define a safety game for quantifier-free inclusion logic formulae. This game will be used later to facilitate some proofs regarding the complexity of the maximal subteam membership problem.
For a model , a team , and an inclusion logic formula , we define as the unique subteam such that , and if . Due to the union closure property always exists and it can be alternatively defined as the union of all subteams such that . If does not contain any symbols from the underlying vocabulary, then we may write instead of . The maximal subteam membership problem is now given as follows.
Let . Then is the problem of determining whether for a given model , a team and an assignment .
Grädel proved that for any -formula , there is a formula of positive greatest fixed point logic such that for any model and assignment , if and only if is in the maximal team of satisfying (see Theorem 24 in [Gradel16]). An easy adaptation of the proof shows that is also definable in positive greatest fixed point logic. Thus, it follows that every maximal subteam membership problem is polynomial time computable.
For every formula , is in .
In this section we will restrict our attention to maximal subteam problems for quantifier free formulae. Before proceeding to our findings we need to present some auxiliary concepts and results. The following lemmata will be useful below.
Let , and let be a team of a model . Then .
For “”, note that by definition there are such that , and , and hence and . For “”, note that satisifes so it must be contained by . ∎
As an easy corollary we obtain the following lemma.
Let , and assume that and both belong to a complexity class . Then is in .
The maximal subteam problem for a single inclusion atom can be naturally represented using directed graphs. In this representation each assignment forms a vertex, and an assignment has an outgoing edge to another assignment if . Over finite teams an assignment then belongs to the maximal subteam for if and only if it is connected to a cycle.111We are grateful to Phokion Kolaitis, who pointed out this fact to the second author in a private discussion 2016.
Let be a model, a finite team, and two tuples of the same length from , an assignment of , and a first-order formula. Let be a directed graph where iff and . Then
Assume for the first statement that . Then there is a subteam such that and . Thus for each there exists such that . Moreover, , whence . In particular there is a non-ending path in starting from . Since is finite, this path necessarily ends in a cycle. Conversely, assume contains a path from to a cycle. Then where consists of all assignments in the path and cycle. Hence, .
For second statement note that, by the argument above, if and only if contains a path from to a cycle. But clearly an -path from to an -cycle is an -path from an -cycle to . ∎
3.2 Safety Game
In this section we present a version of a safety game for the maximal subteam problem of inclusion logic. Our presentation is also related to the safety games for inclusion logic examined in [Gradel16]. We present a safety game for a quadruple , written , where is an assignment of a team , and is a quantifier-free formula. The main result of the section shows that the maximal subteam problem over and can be characterized in terms of this game.
We assume that the reader is familiar with basic terminology on trees. We associate each quantifier-free with a labeled rooted tree such that the root of the tree is labeled by and each node labeled by or has two children labeled by and . Notice that two different nodes may have the same label. The safety game for can now be interpreted as a pebble game where assignments of a team form a stack of pebbles of which one at a time is placed on a node of . Legal moves of the game then consist of moving the pebble up and down through the tree, removing the pebble from a leaf, and placing a new pebble on a leaf. The starting position is to have placed on the root, and the winning condition for Player I is to arrive at a position where the game terminates. If no such position is ever reached, Player II wins.
Definition 15 (Safety Game).
Let be quantifier-free, and let be an assignment in a team of a model . The safety game has two players I and II, and the game moves consist of positions and where and is a node of . The game starts with the position , where is the root, and given a position , the game proceeds as follows:
If is labeled by a conjunction, then Player I selects a position where is a child of .
If is labeled by a disjunction, then Player II selects a position where is a child of .
If is labeled by a literal , then the game ends if . Otherwise, Player I selects a position such that is a descendant of .
If is labeled by , then the game ends if there is no such that . Otherwise, Player I either
selects a position such that is a descendant of , or
selects the position .
Given a position , the game proceeds as follows:
Player II selects a position such that .
Player I wins if the game ends after a finite number of moves by the players. Otherwise, Player II wins.
A strategy for a Player is a mapping on positions such that
, for a non-leaf ,
, for a leaf labeled by a literal,
, for a leaf labeled by .
, for a leaf labeled by .
Player has a winning strategy for if there is a strategy such that wins every game that she plays according to . That is, wins any game where she selects the position on her moves on .
Note that if does not contain any symbols from the underlying vocabulary, the outcome of is independent of , and thus we write instead.
Next we show that the safety game above gives rise to a characterization of the maximal subteam problem.
Let be quantifier-free, and let be an assignment in a team of a model . Then iff Player II has a winning strategy in .
For the “only-if” direction, we define top-down recursively for each node a team such that
for the root ,
, for a child of a node where is labeled by .
It follows that for with label ; for conjunction-labeled with children ; and for disjunction-labeled with children . The strategy of Player II is now the following. If is labeled by disjunction, then is mapped to some where is a child of such that , and if is labeled by , then is mapped to some such that and . We leave it to the reader to check that this strategy is well-defined and winning.
For the “if” direction, assume Player II has a winning strategy . For a node of , we let be the set of all assignments for which there exists a game where Player II plays according to her winning strategy and position is played at some point of the game. Consider any assignment from for a node labeled by . This means there is a game where position is played, and thus also a game where , and furthermore is played. Consequently, an assignment exists such that . By analogous reasoning we obtain that for all other types of nodes with label , too. Furthermore, for conjunction-labeled with children , and for disjunction-labeled with children . In particular, and , and hence . ∎
Given that is finite, it makes sense to consider bounded length restrictions of the safety game. We let denote the version of in which, starting position excluded, positions of the form , i.e., pairs whose left element is an assignment and right element a node, are played at most times. Player I wins if the game terminates before such assignment-node pairs have been played times. Otherwise, if exactly plays of such nodes appear, Player II wins. The next lemma will be useful later.
Let be quantifier-free and such that has nodes, and let be an assignment of a team that is of size . Then Player II has a winning strategy for iff she has a winning strategy for .
By the end of , positions of the form , the root position included, have occurred many times, i.e., some position has occurred twice. Every time such a repetition is encountered, we may assume that we continue the game from the first occurrence of . Since the strategy of Player II is safe for assignment-node moves, we conclude that never terminates. Hence, Player II wins. ∎
4 Complexity of Maximal Subteam Membership
Next we examine the computational complexity of maximal subteam membership. First in Section 4.1 we investigate this problem over arbitrary teams, and then in Section 4.1 we restrict attention to inputs in which the inclusion atoms refer to a key. In Section 4.3 we discuss the implications of our results to consistent query answering.
4.1 Arbitrary Teams
We give a complete characterization of the maximal subteam problem for arbitrary conjunctions of unary inclusion atoms. A unary inclusion atom is an atom of the form where and are single variables. The characterization is given in terms of inclusion graphs.
Let be a set of unary inclusion atoms over variables in . Then the inclusion graph of is defined as such that iff and appears in .
We will now prove the following theorem.
Let be a finite set of unary inclusion atoms, and let be the conjunction of all atoms in . Then is
trivially true if has no edges,
-complete if has an edge and no other edges except possibly for its inverse ,
The first statement above follows from the observation that is true for all inputs if is a conjunction of trivial inclusion atoms . The second statement is shown by relating to graph reachability. Given a directed graph and two vertices and , the problem REACH is to determine whether contains a path from to . This problem is a well-known complete problem for , and it will also be applied later in Section 4.2 where the complexity of over teams with keys and is examined.
and are -complete.
Hardness. We give a logarithmic space many-one reduction from REACH. Let be a graph, and let . W.l.o.g. we can assume has no cycles. Define as the extension of with an extra edge . Then is reachable from in if and only if belongs to a cycle in . We reduce from to a team where maps to (see Fig. 1). By Lemma 14, is reachable from if and only if , where is either or .
Membership. By Lemma 14 and reduce to reachability variants that are clearly in . ∎
Next we turn to the third statement of Theorem 19. Recall that membership in follows directly from Lemma 11. For -hardness we reduce from the monotone circuit value problem (see, e.g., [Vollmerbook]). The proof essentially follows from the following lemma.
, , and are -complete.
Let be either , , or . We give a logarithmic-space many-one reduction to from the monotone circuit value problem (MCVP). Given a Boolean word , and a Boolean circuit with inputs, one output, and gates with labels from , this problem is to determine whether outputs . If outputs on , we say that it accepts . W.l.o.g. we may assume that the in-degree of each AND and OR gate is . We annotate each input node by its corresponding input or , and each gate by some distinct number . Then each gate has two child nodes that are either natural numbers or input values from . Next we construct a team whose values consist of node annotations and distinct copies of AND gates . The team is constructed by the following rules (see Fig. 2):
add where is the output gate,
for AND gates , add , , and ,
for OR gates , add and .
We will show that accepts iff . For the only-if direction we actually show a slightly stronger claim. That is, we show that is implied even if is the conjunction of all unary inclusion atoms between .
Assume first that accepts . We show how to build a subteam such that it includes and satisfies all unary inclusion atoms between . First construct a subcircuit of recursively as follows: add the output gate to ; for each added AND gate , add both child nodes of ; for each added OR gate , add a child node of that is evaluated true under . In other words, is a set of paths from the output gate to the input gates that witnesses the assumption that accepts . The team will now list the auxiliary values and the gates of in each column . We construct by the following rules:
for AND gates in , add , , and ,
for OR gates in , add iff is in , for .
It follows from the construction that satisfies all unary inclusion atoms between .
Vice versa, consider the standard semantic game between Player I and Player II on the given circuit and input word . This game starts from the output gate , and at each AND (OR, resp.) gate Player I (Player II, resp.) selects the next node from its two child nodes and . Player II wins iff the game ends at an input node that is true. By the assumption that we find a team that contains and satisfies . Note that cannot contain any assignment that maps to . For showing that accepts it thus suffices to show that Player II has a strategy which imposes the following restriction: for each visited node annotated by , we have for some . At start this holds by the assumption that . Assume that is any gate with such that . If is , we have two cases. If is an OR gate then we find from with . Then the strategy of Player II is to select the gate as her next step. If is an AND gate, an application of gives from with . Then , which means that further application of yields from with and hence . Now , and thus the claim holds for either selection by Player I. The induction step is analogous for the cases where is or . This concludes the proof. ∎
The third statement of Theorem 19 now follows. Any not covered by the previous lemma has a subgraph of a form depicted in Fig. 3. Of these – were considered above, and the reduction for is essentially identical to that for ; take a new variable for the new target node and insert values identical to those of . Additionally, for each node in but not in take a copy of any column in the team. That this suffices follows from the arguments of the previous lemmata; in particular, from the fact that any true MCVP instance generates a subteam that satisfies all possible unary inclusion atoms between variables.
Considering disjunctions, observe that over a disjunction of unary inclusion atoms is either trivially true or -complete. For membership in , see Lemma 13. For -hardness one may use the reduction of Lemma 20 for . Provided that another non-trivial inclusion atom appears in the disjunction, then and the values for can be defined in such a way that the maximal subteam for is empty.
Let be a finite set of unary inclusion atoms, and let be the disjunction of all atoms in . Then is
trivially true if contains a trivial inclusion atom,
Note that the results of this section generalize to inclusion atoms of higher arity, obtained by replacing variables with tuples such that all pairs of distinct tuples have no common variables. More complex cases arise if the tuples are allowed to overlap.
4.2 Teams with a Key
In relational database management inclusion atoms usually appear in form of a foreign key that has the purpose of securing referential integrity upon databases. In this section we investigate the maximal subteam problems in a framework where inclusion atoms correspond to uni-relational foreign keys. That is, we consider inclusion atoms over teams on which the variable is a key. Given this additional restriction, we observe that the maximal subteam and model checking problems both collapse to lower computational levels. The complexity of the maximal subteam problem for collapses from to , and for it collapses from to .
Let be a team over a set of variables , and let be a subset of . Then is a key on if for all .
First we show that maximal subteam for single inclusion atom is -complete if the inclusion atom refers to a key.
over teams for which is a key is -complete.
Hardness. We show a first-order many-one reduction from a deterministic variant of the reachability problem. Given a directed graph and its two vertices and , this problem is to determine whether there is a deterministic path from to . A deterministic path is such that for every edge in the path there is no other edge in going out of . Similarly to above, and with respect to first-order reductions, w.l.o.g. we may assume that the out-degrees of and are and has no cycles except for a self-loop on . An instance of deterministic rechability is reduced to a team over by applying the formation rule:
For each edge that is the only edge going out from , add the assignment to .
For instance, the instance illustrated in Fig. 1 reduces to the team that consists of the three assignments . We show that the input instance admits positive answer iff , where is the assignment that corresponds to the only edge going out from .
By Lemma 14, iff is connected to a cycle in the graph , where iff . Since the only cycle in is the self-loop on