In the past decade, starting with Durand and Grandjean, the fields of logic in computer science and database theory have seen a large number of contributions that deal with the efficient enumeration of query results. In this scenario, the objective is as follows: given a finite relational structure (i.e., a database) and a logical formula (i.e., a query), after a short preprocessing phase, the query results shall be generated one by one, without repetition, with guarantees on the maximum delay between the output of two consecutive tuples. In this vein, the best that one can hope for is constant delay (i.e., the delay may depend on the size of the query but not on that of the input structure) and linear preprocessing time (i.e., time $f(\varphi) \cdot O(N)$, where $N$ is the size of a reasonable representation of the input structure, $\varphi$ is the query, and $f(\varphi)$ is a number depending only on the query but not on the input structure). Constant delay enumeration has also been adopted as a central concept in factorised databases, which have recently gained attention [39, 38].
Quite a number of query evaluation problems are known to admit constant delay algorithms preceded by linear or pseudo-linear time preprocessing. This is the case for all first-order queries, provided that they are evaluated over classes of structures of bounded degree [21, 29, 13, 32], low degree, bounded expansion, locally bounded expansion, or over classes that are nowhere dense. Different data models have also been investigated, including tree-like data and document spanners [7, 31, 5]. Recently, the dynamic setting, where a fixed query has to be evaluated repeatedly against a database that is constantly updated, has also received considerable attention [33, 13, 12, 27, 14, 4, 37, 36, 6].
This paper deals with the classical, static setting without database updates. We focus on evaluating conjunctive queries (CQs, i.e., primitive-positive formulas) on arbitrary relational structures. (In this paper, structures will always be finite and relational.) In the following, FPT-preprocessing (resp., FPL-preprocessing) means preprocessing that takes time $f(\varphi) \cdot N^{O(1)}$ (resp., $f(\varphi) \cdot O(N)$), and constant delay means delay $f(\varphi)$, where $f$ is a computable function, $\varphi$ is the query, and $N$ is the size of the input structure.
Bagan et al.  showed that every free-connex acyclic CQ allows constant delay enumeration after FPL-preprocessing. More refined results in this vein are due to Bagan  and Brault-Baron ; see  for a survey and  for a tutorial. Bagan et al.  complemented their result by a conditional lower bound: assuming that Boolean matrix multiplication cannot be accomplished in time , self-join-free acyclic CQs that are not free-connex cannot be enumerated with constant delay and FPL-preprocessing. This demonstrates that even if the evaluation of Boolean queries is easy (as known for all acyclic CQs ), the enumeration of the results of non-Boolean queries might be hard (here, for acyclic CQs that are not free-connex).
Bagan et al.  also introduced the notion of free-connex (fc) treewidth (tw) of a CQ and showed that for every class of CQs of bounded fc-tw, within FPT-preprocessing time, one can build a data structure that allows constant delay enumeration of the query results. This can be viewed as a generalisation, to the non-Boolean case, of the well-known result stating that the model-checking problem for classes of Boolean CQs of bounded treewidth is FPT. Note that for non-Boolean queries—even if they come from a class of bounded fc-tw—the size of the query result may be , i.e., far too large to be computed entirely within FPT-preprocessing time; and generalising the known tractability result for Boolean CQs to the non-Boolean case is far from trivial.
In a series of papers, the FPT-result for Boolean CQs has been strengthened to more and more general width-measures, namely to classes of queries of bounded generalised hypertree width (ghw) , bounded fractional hypertree width (fhw) , and bounded submodular width (subw) . The result on bounded fhw has been generalised to the non-Boolean case in the context of factorised databases , which implies constant delay enumeration after FPT-preprocessing for CQs of bounded free-connex fractional hypertree width (fc-fhw). Related data structures that allow constant delay enumeration after FPT-preprocessing for (quantifier-free) CQs of bounded (fc-)fhw have also been provided in [19, 28].
An analogous generalisation of the result on bounded submodular width, however, is still missing. The present paper’s main result closes this gap: we show that on classes of CQs of bounded fc-subw, within FPT-preprocessing time one can build a data structure that allows constant delay enumeration of the query results. Within the same FPT-preprocessing time, one can also construct a data structure that allows testing in constant time whether an input tuple belongs to the query result. Our proof uses Marx’s splitting routine to decompose the query result of $\varphi$ on $\mathcal{A}$ into the union of the results of several queries $\varphi_i$ on several structures $\mathcal{A}_i$, but we have to tackle the additional technical difficulty of ensuring that the results of all the $\varphi_i$ on $\mathcal{A}_i$ can be enumerated efficiently. Having achieved this, we can conclude by using an elegant trick provided by Durand and Strozecki for enumerating, without repetition, the union of query results.
As an immediate consequence of the lower bound provided by Marx in the context of Boolean CQs of unbounded submodular width, one obtains that our main result is tight for certain classes of CQs, namely, recursively enumerable classes $\Phi$ of quantifier-free and self-join-free CQs: assuming the exponential time hypothesis (ETH), such a class $\Phi$ allows constant delay enumeration after FPT-preprocessing if, and only if, $\Phi$ has bounded fc-subw.
Let us mention a related recent result which, however, is incomparable to ours. Abo Khamis et al.  designed an algorithm for evaluating a quantifier-free CQ of submodular width within time ; and an analogous result is also achieved for non-quantifier-free CQs of fc-subw . Here, is the size of the input structure, is the number of tuples in the query result, and is at least exponential in number of variables of . In particular, the algorithm does not distinguish between a preprocessing phase and an enumeration phase and does not provide a guarantee on the delay.
Outline. The rest of the paper is structured as follows. Section 2 provides basic notations concerning structures, queries, and constant delay enumeration. Section 3 recalls concepts of (free-connex) decompositions of queries, provides a precise statement of our main result, and collects the necessary tools for obtaining this result. Section 4 is devoted to the detailed proof of our main result. We conclude in Section 5.
2 Preliminaries
In this section we fix notation and summarise basic definitions.
Basic notation. We write and for the set of non-negative integers and reals, respectively, and we let and for all . By we denote the power set of a set . Whenever denotes a graph, we write and for the set of nodes and the set of edges, respectively, of . Whenever writing to denote a -tuple (for some arity ), we write to denote the tuple’s -th component; i.e., . For a -tuple and indices we let . For a set of -tuples we let .
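If $h$ and $h'$ are mappings with domains $X$ and $X'$, respectively, we say that $h$ and $h'$ are joinable if $h(x) = h'(x)$ holds for all $x \in X \cap X'$. In case that $h$ and $h'$ are joinable, we write $h \bowtie h'$ to denote the mapping with domain $X \cup X'$ where $(h \bowtie h')(x) = h(x)$ for all $x \in X$ and $(h \bowtie h')(x) = h'(x)$ for all $x \in X'$. If $A$ and $B$ are sets of mappings with domains $X$ and $X'$, respectively, then $A \bowtie B := \{\, h \bowtie h' : h \in A,\ h' \in B, \text{ and } h \text{ and } h' \text{ are joinable} \,\}$.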
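The join of mappings can be made concrete with a few lines of Python. This is a minimal sketch, assuming mappings are represented as dictionaries from variable names to values; the function names `joinable`, `join` and `join_sets` are hypothetical and chosen only for illustration.

```python
from itertools import product

def joinable(h1, h2):
    """Two mappings are joinable if they agree on every shared key."""
    return all(h2[x] == v for x, v in h1.items() if x in h2)

def join(h1, h2):
    """Return the common extension of two joinable mappings."""
    if not joinable(h1, h2):
        raise ValueError("mappings disagree on a shared variable")
    return {**h1, **h2}

def join_sets(A, B):
    """Join two collections of mappings: all joins of joinable pairs.

    This nested-loop version takes time proportional to |A| * |B|;
    it is meant to mirror the definition, not to be efficient.
    """
    return [join(a, b) for a, b in product(A, B) if joinable(a, b)]

A = [{"x": 1, "y": 2}, {"x": 1, "y": 3}]
B = [{"y": 2, "z": 5}, {"y": 4, "z": 6}]
result = join_sets(A, B)
```

Only the pair that agrees on the shared variable `y` contributes to `result`.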
We use the following further notation where is a set of mappings with domain and . For a set , the projection is the restriction of to ; and . For objects where , we write for the extension of to domain with and for all .
Signatures and structures. A signature is a finite set of relation symbols, where each is equipped with a fixed arity . A -structure consists of a finite set (called the universe or domain of ) and an -ary relation for each . The size of a signature is . We write to denote the cardinality of ’s universe, we write to denote the number of tuples in ’s largest relation, and we write or to denote the size of a reasonable encoding of . To be specific, let , where . Whenever is clear from the context, we will omit the superscript and write instead of . Consider signatures and with . The -reduct of a -structure is the -structure with and for all . A -expansion of a -structure is a -structure whose -reduct is .
Conjunctive Queries. We fix a countably infinite set var of variables. We allow queries to use arbitrary relation symbols of arbitrary arities. An atom is of the form with and . We write to denote the set of variables occurring in . A conjunctive query (CQ, for short) is of the form , where , , is an atom for every , and are pairwise distinct elements in . For such a CQ we let . We write and for the set of variables and the set of relation symbols occurring in , respectively. The set of quantified variables of is , and the set of free variables is . We sometimes write to indicate that are the free variables of . The arity of is the number . The query is called quantifier-free if , it is called Boolean if its arity is 0, and it is called self-join-free if no relation symbol occurs more than once in .
The semantics are defined as usual: A valuation for on a -structure is a mapping . A valuation is a homomorphism from to a if for every atom we have . The query result of a CQ on the -structure is defined as the set . Often, we will identify the mappings with tuples , where is a fixed listing of the free variables of .
The size of a query is the length of when viewed as a word over the alphabet .
Model of computation. For the complexity analysis we assume the RAM-model with a uniform cost measure. In particular, storing and accessing elements from a structure’s universe requires space and time. For an -ary relation we can construct in time an index that allows to enumerate with delay and to test for a given -tuple whether in time . Moreover, for every we can build a data structure where we can enumerate for every -tuple the selection with delay. Such a data structure can be constructed in time , for instance by a linear scan over where we add every tuple to a list . Using a constant access data structure of linear size, the list can be accessed in time when receiving an -tuple .
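The linear scan described above can be sketched as follows. This is an illustration under assumptions: positions are 0-based, the names `build_selection_index` and `select` are hypothetical, and a Python dictionary stands in for the constant access data structure (dictionary lookups take constant time only in expectation, which is weaker than the worst-case guarantee of the RAM model used in the text).

```python
from collections import defaultdict

def build_selection_index(relation, positions):
    """One linear scan: file every tuple under its projection to `positions`."""
    index = defaultdict(list)
    for tup in relation:
        key = tuple(tup[i] for i in positions)
        index[key].append(tup)
    return index

def select(index, key):
    """Enumerate all tuples whose projection equals `key` (possibly none)."""
    return iter(index.get(tuple(key), ()))

R = [(1, 2, 3), (1, 5, 3), (2, 2, 3)]
idx = build_selection_index(R, positions=(0, 2))
matches = list(select(idx, (1, 3)))
```

After the scan, each selection is served by walking a precomputed list, so the time between two consecutive outputs does not depend on the size of the relation.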
Constant delay enumeration and testing. An enumeration algorithm for query evaluation consists of two phases: the preprocessing phase and the enumeration phase. In the preprocessing phase the algorithm is allowed to do arbitrary preprocessing on the query and the input structure . We denote the time required for this phase by . In the subsequent enumeration phase the algorithm enumerates, without repetition, all tuples (or, mappings) in the query result , followed by the end-of-enumeration message EOE. The delay is the maximum time that passes between the start of the enumeration phase and the output of the first tuple, between the output of two consecutive tuples, and between the last tuple and EOE.
A testing algorithm for query evaluation also starts with a preprocessing phase of time in which a data structure is computed that allows to test for a given tuple (or, mapping) whether it is contained in the query result . The testing time of the algorithm is an upper bound on the time that passes between receiving and providing the answer.
One speaks of constant delay (testing time) if the delay (testing time) depends on the query , but not on the input structure .
We make use of the following result from Durand and Strozecki, which allows to efficiently enumerate the union of query results, provided that each query result in the union can be enumerated and tested efficiently. Note that this is not immediate, because the union might contain many duplicates that need to be avoided during enumeration.
Theorem 2.1 (Durand and Strozecki).
Suppose that there is an enumeration algorithm that receives a query and a structure and enumerates with delay after preprocessing time. Further suppose that there is a testing algorithm that receives a query and a structure and has preprocessing time and testing time. Then there is an algorithm that receives queries and structures and allows to enumerate with delay after preprocessing time.
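Proof. We proceed by induction on the number $n$ of query results in the union. The induction start $n = 1$ is trivial. For the induction step from $n$ to $n+1$, start an enumeration of the union of the first $n$ query results and test for every tuple whether it is contained in the $(n+1)$-th query result. If the answer is no, then output the tuple. Otherwise discard the tuple and instead output the next tuple in an enumeration of the $(n+1)$-th query result. Subsequently enumerate the remaining tuples from the $(n+1)$-th query result. ∎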
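The induction in this proof can be unrolled into a small generator. The following Python sketch is illustrative only: the names `enumerate_union` and `enumerate_union_many` are hypothetical, and plain iterables and membership predicates stand in for the enumeration and testing algorithms assumed in the theorem.

```python
def enumerate_union(enum_a, in_b, enum_b):
    """Enumerate A ∪ B without repetition.

    enum_a, enum_b: duplicate-free iterables over A and B.
    in_b: membership test for B.
    """
    it_b = iter(enum_b)
    for t in enum_a:
        if not in_b(t):
            yield t            # t is in A \ B and is output here only
        else:
            # t will be produced by B's own enumeration, so skip it and
            # output the next tuple of B instead. B has at least as many
            # tuples as A ∩ B, so the iterator cannot be exhausted here.
            yield next(it_b)
    yield from it_b            # remaining tuples of B

def enumerate_union_many(sources):
    """sources: list of (iterable, membership_test) pairs, one per result set."""
    union = iter(())
    for tuples, contains in sources:
        union = enumerate_union(union, contains, tuples)
    return union

A = [(1,), (2,), (3,)]
B = [(3,), (4,), (2,)]
C = [(5,), (1,)]
out = list(enumerate_union_many([(S, set(S).__contains__) for S in (A, B, C)]))
```

Each output costs one step of an enumeration plus one membership test per nesting level, which is why the delay of the combined algorithm grows with the number of result sets but not with their sizes.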
3 Main Result
At the end of this section, we provide a precise statement of our main result. Before we can do so, we have to recall the concept of free-connex decompositions of queries and the notion of submodular width. It will be convenient for us to use the following notation.
Let be a CQ and . We write for the CQ that is equivalent to the expression
Note that is obtained from by discarding existential quantification and projecting every atom to , hence . However, shall not be confused with the projection of to . In fact, it might be that is empty, but is not, as the following example illustrates:
3.1 Constant delay enumeration using tree decompositions
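We use the same notation as  for decompositions of queries: A tree decomposition (TD, for short) of a CQ $\varphi$ is a tuple $(T, \chi)$, for which the following two conditions are satisfied:
(1) $T = (V(T), E(T))$ is a finite undirected tree.
(2) $\chi$ is a mapping that associates with every node $t \in V(T)$ a set $\chi(t) \subseteq \mathrm{vars}(\varphi)$ such that
(a) for each atom $\psi \in \mathrm{atoms}(\varphi)$ there exists $t \in V(T)$ such that $\mathrm{vars}(\psi) \subseteq \chi(t)$, and
(b) for each variable $v \in \mathrm{vars}(\varphi)$ the set $\{ t \in V(T) : v \in \chi(t) \}$ induces a connected subtree of $T$ (this condition is called path condition).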
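The two conditions can be checked mechanically. The following Python sketch assumes a query is given simply as a list of variable sets (one per atom) and a decomposition as an edge list plus a dictionary of bags; the function name `is_tree_decomposition` and the example decomposition of a 4-cycle are illustrative assumptions, not taken from the text.

```python
def is_tree_decomposition(tree_edges, bags, atoms):
    """tree_edges: list of node pairs; bags: dict node -> set of variables;
    atoms: list of variable sets, one per atom of the query."""
    nodes = set(bags)
    adj = {t: set() for t in nodes}
    for s, t in tree_edges:
        adj[s].add(t)
        adj[t].add(s)

    def connected(subset):
        subset = set(subset)
        if not subset:
            return True
        start = next(iter(subset))
        seen, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v in subset and v not in seen:
                    seen.add(v)
                    stack.append(v)
        return seen == subset

    # T is a tree: connected and with exactly |V(T)| - 1 edges.
    if len(tree_edges) != len(nodes) - 1 or not connected(nodes):
        return False
    # Every atom is covered by some bag.
    if not all(any(set(a) <= bags[t] for t in nodes) for a in atoms):
        return False
    # Path condition: the bags containing a variable form a connected subtree.
    variables = set().union(*bags.values())
    return all(connected({t for t in nodes if v in bags[t]}) for v in variables)

cycle_atoms = [{"x1", "x2"}, {"x2", "x3"}, {"x3", "x4"}, {"x4", "x1"}]
ok = is_tree_decomposition(
    [("t1", "t2")],
    {"t1": {"x1", "x2", "x3"}, "t2": {"x1", "x3", "x4"}},
    cycle_atoms,
)
```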
To use a tree decomposition of for query evaluation one considers, for each the query for , evaluates this query on the input structure , and then combines these results for all along a bottom-up traversal of . If the query is Boolean, this yields the result of on ; if it is non-Boolean, can be computed by performing additional traversals of . This approach is efficient if the result sets are small and can be computed efficiently (later on, we will sometimes refer to the sets as projections on bags).
The simplest queries where this is the case are acyclic queries [10, 16]. A number of equivalent characterisations of the acyclic CQs have been provided in the literature (cf. [1, 25, 27, 18]); among them a characterisation by Gottlob et al.  stating that a CQ is acyclic if and only if it has a tree-decomposition where every bag is covered by an atom, i.e., for every bag there is some atom in with . The approach described above leads to a linear time algorithm for evaluating an acyclic CQ that is Boolean, and if is non-Boolean, is computed in time linear in . This method is known as Yannakakis’ algorithm. But this algorithm does not distinguish between a preprocessing phase and an enumeration phase and does not guarantee constant delay enumeration. In fact, Bagan et al. identified the following additional property that is needed to ensure constant delay enumeration.
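Definition 3.2 (Bagan et al.).
A tree decomposition $(T, \chi)$ of a CQ $\varphi$ is free-connex if there is a subset $U \subseteq V(T)$ that induces a connected subtree of $T$ and that satisfies the condition $\mathrm{free}(\varphi) = \bigcup_{t \in U} \chi(t)$.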
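For small decompositions, the free-connex property can be tested by brute force over all node subsets. The sketch below is an assumption-laden illustration: it takes for granted that the input already is a valid tree decomposition, it allows the empty subset (so that Boolean queries qualify), and the name `is_free_connex` is hypothetical. Its running time is exponential in the number of tree nodes.

```python
from itertools import combinations

def is_free_connex(tree_edges, bags, free_vars):
    """Search for a connected node subset U whose bags cover exactly free_vars."""
    nodes = list(bags)
    adj = {t: set() for t in nodes}
    for s, t in tree_edges:
        adj[s].add(t)
        adj[t].add(s)

    def connected(subset):
        subset = set(subset)
        if not subset:
            return True
        start = next(iter(subset))
        seen, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v in subset and v not in seen:
                    seen.add(v)
                    stack.append(v)
        return seen == subset

    free_vars = set(free_vars)
    for k in range(len(nodes) + 1):
        for U in combinations(nodes, k):
            covered = set().union(*(bags[t] for t in U))
            if covered == free_vars and connected(U):
                return True
    return False

two_bags = {"t1": {"x1", "x2", "x3"}, "t2": {"x1", "x3", "x4"}}
three_bags = dict(two_bags, t0={"x1", "x3"})
```

In the example data, adding a middle bag `{"x1", "x3"}` between the two original bags makes the decomposition free-connex for the free variables `x1` and `x3`, whereas the two-bag version is not.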
Bagan et al.  identified the free-connex acyclic CQs, i.e., the CQs that have a free-connex tree decomposition where every bag is covered by an atom, as the fragment of the acyclic CQs whose results can be enumerated with constant delay after FPL-preprocessing:
Theorem 3.3 (Bagan et al.).
There is a computable function and an algorithm which receives a free-connex acyclic CQ and a -structure and computes within preprocessing time and space a data structure that allows to
enumerate with delay and
test for a given tuple (or, mapping) if within testing time.
The approach of using free-connex tree decompositions for constant delay enumeration can be extended from acyclic CQs to arbitrary CQs. To do this, we have to compute for every bag in the tree decomposition the projection . This reduces the task to the acyclic case, where the free-connex acyclic query contains one atom with for every bag and the corresponding relation is defined by . Because the runtime in this approach is dominated by computing , it is only feasible if the projections are efficiently computable for every bag. If the decomposition has bounded treewidth or bounded fractional hypertree width, then it is possible to compute for every bag in time , which in turn implies that the result can be enumerated after FPT-preprocessing time for CQs of bounded fc-tw  and for CQs of bounded fc-fhw .
3.2 Submodular width and statement of the main result
Before providing the precise definition of the submodular width of a query, let us first consider an example. The central idea behind algorithms that rely on submodular width [35, 2, 40] is to split the input structure into several parts and to use a different tree decomposition of the query for every part. This can give a significant improvement over the fractional hypertree width, which uses only one tree decomposition of the query. A typical example to illustrate this idea is the following 4-cycle query (see also [2, 40]):
There are essentially two non-trivial tree decompositions , of , which are both defined over the two-vertex tree by , and , . Both tree decompositions lead to an optimal fractional hypertree decomposition of width . Indeed, for the worst-case instance with
we have  while the projections on the bags have size  in both decompositions (recall from Section 2 our convention to identify mappings in query results with tuples; the free variables are listed canonically here, by increasing indices):
However, we can split into and such that is the disjoint union of and and the bag-sizes in the respective decompositions are small:
Thus, we can efficiently evaluate on using and on using (in time in this example) and then combine both results to obtain . Using the strategy of Alon et al. , it is possible to split every database for this particular 4-cycle query into two instances and such that the bag sizes in on as well as in on are bounded by and can be computed in time (see [2, 40] for a detailed account on this strategy). As both decompositions are free-connex, this also leads to a constant delay enumeration algorithm for with time preprocessing, which improves the preprocessing time that follows from using one decomposition.
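The idea of a data-dependent split can be sketched in Python. Everything below is an assumption made for illustration: the 4-cycle query is taken to have the atoms E12(x1,x2), E23(x2,x3), E34(x3,x4), E41(x4,x1); the split is a standard heavy/light partition with a degree threshold of about the square root of the largest relation size, which is not necessarily the exact strategy of Alon et al. or of this paper; and the name `four_cycle_split` is hypothetical.

```python
from collections import defaultdict
from math import isqrt

def four_cycle_split(E12, E23, E34, E41):
    """Return two disjoint sets of result tuples (x1, x2, x3, x4)."""
    E12, E23, E34, E41 = map(set, (E12, E23, E34, E41))
    n = max(len(E12), len(E23), len(E34), len(E41), 1)
    tau = isqrt(n)                      # degree threshold, roughly sqrt(n)

    succ23 = defaultdict(set)           # x2 -> all x3 with E23(x2, x3)
    for b, c in E23:
        succ23[b].add(c)
    succ41 = defaultdict(set)           # x4 -> all x1 with E41(x4, x1)
    for d, a in E41:
        succ41[d].add(a)
    heavy2 = {b for b, cs in succ23.items() if len(cs) > tau}
    heavy4 = {d for d, xs in succ41.items() if len(xs) > tau}

    # Part 1: x2 and x4 both light; bags {x1,x2,x3} and {x1,x3,x4}.
    bag123 = {(a, b, c) for a, b in E12 if b not in heavy2
              for c in succ23.get(b, ())}
    bag134 = {(a, c, d) for c, d in E34 if d not in heavy4
              for a in succ41.get(d, ())}
    by13 = defaultdict(list)
    for a, c, d in bag134:
        by13[(a, c)].append(d)
    part1 = {(a, b, c, d) for a, b, c in bag123 for d in by13.get((a, c), ())}

    # Part 2: x2 heavy or x4 heavy; bags {x2,x3,x4} and {x1,x2,x4}.
    bag234 = ({(b, c, d) for c, d in E34 for b in heavy2 if c in succ23[b]}
              | {(b, c, d) for b, c in E23 for d in heavy4 if (c, d) in E34})
    bag124 = ({(a, b, d) for a, b in E12 for d in heavy4 if a in succ41[d]}
              | {(a, b, d) for d, a in E41 for b in heavy2 if (a, b) in E12})
    by24 = defaultdict(list)
    for a, b, d in bag124:
        by24[(b, d)].append(a)
    part2 = {(a, b, c, d) for b, c, d in bag234 for a in by24.get((b, d), ())}
    return part1, part2

E12 = {(i, j) for i in range(4) for j in range(4)}
E23 = {(0, j) for j in range(5)} | {(1, 0)}
E34 = {(i, j) for i in range(5) for j in range(5) if i != j}
E41 = {(j, (j + 1) % 5) for j in range(5)} | {(2, k) for k in range(5)}
part1, part2 = four_cycle_split(E12, E23, E34, E41)
```

The design point is that each bag is built either by extending a tuple along a light value (at most `tau` extensions per tuple) or by pairing a tuple with one of the few heavy values (at most about `n / tau` of them), so no bag grows quadratically in `n`.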
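In general, whether such a data-dependent decomposition is possible is determined by the submodular width $\mathrm{subw}(\varphi)$ of the query. The notion of submodular width was introduced by Marx. To present its definition, we need the following terminology. A function $g \colon 2^{\mathrm{vars}(\varphi)} \to \mathbb{R}_{\geq 0}$ is
monotone if $g(U) \leq g(V)$ for all $U \subseteq V \subseteq \mathrm{vars}(\varphi)$,
edge-dominated if $g(\mathrm{vars}(\psi)) \leq 1$ for every atom $\psi \in \mathrm{atoms}(\varphi)$, and
submodular if $g(U) + g(V) \geq g(U \cap V) + g(U \cup V)$ for every $U, V \subseteq \mathrm{vars}(\varphi)$.
We denote by $\mathcal{S}(\varphi)$ the set of all monotone, edge-dominated, submodular functions $g$ that satisfy $g(\emptyset) = 0$, and by $\mathcal{T}(\varphi)$ the set of all tree decompositions of $\varphi$. The submodular width of a conjunctive query $\varphi$ is
$$\mathrm{subw}(\varphi) := \sup_{g \in \mathcal{S}(\varphi)} \; \min_{(T,\chi) \in \mathcal{T}(\varphi)} \; \max_{t \in V(T)} g(\chi(t)).$$
In particular, if the submodular width of $\varphi$ is bounded by $w$, then for every submodular function $g \in \mathcal{S}(\varphi)$ there is a tree decomposition $(T, \chi)$ in which every bag satisfies $g(\chi(t)) \leq w$.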
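For a query with few variables, the three conditions on a function can be verified exhaustively, and the inner min-max of the definition can be evaluated for a fixed function and a fixed finite list of decompositions. The Python sketch below does exactly that and nothing more: it does not compute the submodular width itself, which takes a supremum over all admissible functions. The names `in_S` and `best_width`, the function mapping a set to half its size, and the two decompositions of the 4-cycle are assumptions chosen for illustration.

```python
from fractions import Fraction
from itertools import combinations

def subsets(variables):
    vs = sorted(variables)
    return [frozenset(c) for k in range(len(vs) + 1) for c in combinations(vs, k)]

def in_S(g, variables, atoms):
    """Check g(∅) = 0, monotonicity, edge-domination and submodularity."""
    P = subsets(variables)
    if g(frozenset()) != 0:
        return False
    monotone = all(g(U) <= g(V) for U in P for V in P if U <= V)
    edge_dominated = all(g(frozenset(a)) <= 1 for a in atoms)
    submodular = all(g(U) + g(V) >= g(U & V) + g(U | V) for U in P for V in P)
    return monotone and edge_dominated and submodular

def best_width(g, decompositions):
    """min over the given decompositions of the max g-value of a bag."""
    return min(max(g(frozenset(b)) for b in bags) for bags in decompositions)

variables = {"x1", "x2", "x3", "x4"}
atoms = [{"x1", "x2"}, {"x2", "x3"}, {"x3", "x4"}, {"x4", "x1"}]
half_size = lambda U: Fraction(len(U), 2)
decompositions = [
    [{"x1", "x2", "x3"}, {"x1", "x3", "x4"}],
    [{"x2", "x3", "x4"}, {"x1", "x2", "x4"}],
]
width = best_width(half_size, decompositions)
```

Exact rational arithmetic via `Fraction` avoids spurious failures of the inequality checks that floating-point rounding could cause.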
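It is known that $\mathrm{subw}(\varphi) \leq \mathrm{fhw}(\varphi)$ for all queries $\varphi$ [35, Proposition 3.7]. Moreover, there is a constant $c$ and a family of queries $(\varphi_i)_{i \geq 1}$ such that $\mathrm{subw}(\varphi_i) \leq c$ is bounded and $\mathrm{fhw}(\varphi_i)$ is unbounded [34, 35]. Marx's main result is that the submodular width characterises the tractability of Boolean CQs in the following sense.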
Theorem 3.4 (Marx).
There is a computable function and an algorithm that receives a Boolean CQ , , and a -structure and evaluates on in time .
Let be a recursively enumerable class of Boolean, self-join-free CQs of unbounded submodular width. Assuming the exponential time hypothesis (ETH) there is no algorithm which, upon input of a query and a structure , evaluates on in time .
The free-connex submodular width of a conjunctive query is defined in a similar way as submodular width, but this time ranges over the set of all free-connex tree decompositions of (it is easy to see that we can assume that is finite).
Note that if is either quantifier-free or Boolean, we have . In general, this is not always the case. Consider for example the following quantified version of the quantifier-free 4-cycle query . Here we have , but : one can verify by noting that every free-connex tree decomposition contains a bag and taking the submodular function . Now we are ready to state the main theorem of this paper.
Theorem 3.5.
For every  and  there is a computable function  and an algorithm which receives a CQ  with  and a -structure  and computes within  preprocessing time and space  a data structure that allows to
enumerate with delay and
test for a given tuple (or, mapping) if within testing time.
The following corollary is an immediate consequence of Theorem 3.5 and Theorem 3.4. A class of CQs is said to be of bounded free-connex submodular width if there exists a number such that for all . And by an algorithm for that enumerates with constant delay after FPT-preprocessing we mean an algorithm that receives a query and a -structure and spends preprocessing time and then enumerates with delay , for a computable function .
For every class of CQs of bounded free-connex submodular width, there is an algorithm for that enumerates with constant delay after FPT-preprocessing.
Let be a recursively enumerable class of quantifier-free self-join-free CQs and assume that the exponential time hypothesis (ETH) holds.
Then there is an algorithm for that enumerates with constant delay after FPT-preprocessing if, and only if, has bounded free-connex submodular width.
4 Proof of the Main Result
To prove Theorem 3.5, we make use of Marx’s splitting routine for queries of bounded submodular width. In the following, we will adapt the main definitions and concepts from  to our notions. While doing this, we provide the following additional technical contributions: First, we give a detailed time and space analysis of the algorithm and improve the runtime of the consistency algorithm [35, Lemma 4.5] from quadratic to linear (see Lemma 4.2). Second, we fix an oversight in [35, Lemma 4.12] by establishing strong -consistency (unfortunately, this fix incurs a blow-up in running time). Afterwards we prove our main theorem, where the non-Boolean setting requires us to relax Marx’s partition into refinements (Lemma 4.5) so that the subinstances are no longer disjoint.
Let be a quantifier-free CQ with , and let . For every where we set and let be a fresh -ary relation symbol. For every collection we let
A refinement of and a -structure is a pair , where is closed under taking subsets and is a -expansion of . Note that if is a refinement of and , then . In the following we will construct refinements that do not change the result relation, i. e., . Subsequently, we will split refinements in order to partition the query result.
The following definition collects useful properties of refinements. Recall from Section 2 that for a CQ and a structure , the query result actually is a set of mappings from to . For notational convenience we define and use the set of mappings instead of the relation . In particular, by addressing/inserting/deleting a mapping from we mean addressing/inserting/deleting the tuple from , where .
Let be a quantifier-free -CQ, a -structure, a refinement of and , and an integer.
The refinement is consistent if
The refinement is -consistent if it is consistent and
The refinement is strongly -consistent if it is -consistent and
There is an algorithm that receives a refinement of and and computes in time a consistent refinement with for all and .
We start by letting and then proceed by iteratively modifying . We first establish the first consistency requirement (9) by removing from every all mappings such that . To ensure the second consistency requirement (10), the algorithm iteratively deletes mappings in that do not extend to larger mappings in (for all ). Note that removing a mapping from might shrink the set for sets that have a nonempty intersection with . In this case, we also have to delete affected mappings from in order to ensure that . These steps will be iterated until the refinement is consistent. It is clear that the refinement does not exclude tuples from the query result, i. e., the final structure satisfies . To see that this can be achieved in time linear in , we formulate the problem as a set of Horn-clauses. The consistent refinement can then be computed by applying any linear-time unit propagation algorithm (cf., e.g., ). For every and every mapping we introduce a Boolean variable which expresses that, in order to achieve consistency, has to be deleted from . The Horn-formula contains for every with the clauses
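$$\Bigl(\bigwedge_{h' \in R_T,\ \pi_S(h') = h} X_{h'}\Bigr) \rightarrow X_h \qquad \text{for all } h \in R_S, \text{ and} \tag{13}$$
$$X_h \rightarrow X_{h'} \qquad \text{for all } h \in R_S,\ h' \in R_T,\ \pi_S(h') = h. \tag{14}$$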
The first type of clauses ensures that when a mapping with domain does not extend to a tuple with domain , then it will be excluded from . The second type of clauses ensures that for all we have . Note that the size of the resulting Horn-formula is bounded by . Now we apply a linear time unit propagation algorithm to find a solution of minimum weight. If the formula is unsatisfiable, we know that and can safely set for all . Otherwise, we obtain a minimal satisfying assignment that sets a variable to true if, and only if, has to be deleted from . Thus we set . By minimality we have . ∎
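Linear-time unit propagation for Horn formulas can be sketched with one counter per clause. The code below is a generic forward-chaining routine, not the paper's concrete clause set: clauses are assumed to be given as pairs of a body (variables that must all be true) and a head (a variable, or `None` for a clause that forbids the body), and the name `horn_minimal_model` is hypothetical.

```python
from collections import defaultdict, deque

def horn_minimal_model(clauses):
    """clauses: list of (body, head) pairs meaning 'body implies head'.

    Returns the set of variables that are true in the minimal model,
    or None if the formula is unsatisfiable.
    """
    remaining = []                 # per clause: number of body variables not yet true
    watch = defaultdict(list)      # variable -> clauses having it in their body
    ready = deque()                # clauses whose body is fully satisfied
    for idx, (body, _head) in enumerate(clauses):
        body = set(body)
        remaining.append(len(body))
        for v in body:
            watch[v].append(idx)
        if not body:
            ready.append(idx)

    true_vars = set()
    while ready:
        head = clauses[ready.popleft()][1]
        if head is None:
            return None            # a forbidden body became true
        if head in true_vars:
            continue
        true_vars.add(head)
        for j in watch[head]:
            remaining[j] -= 1
            if remaining[j] == 0:
                ready.append(j)
    return true_vars

model = horn_minimal_model([
    ((), "a"),
    (("a",), "b"),
    (("b", "c"), "d"),
    (("a", "b"), "c"),
])
```

Every clause enters the queue at most once and every occurrence of a variable in a body is visited at most once, so the running time is linear in the total size of the formula. In the setting of the proof, a variable being true would correspond to a mapping that has to be deleted.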
Let be a quantifier-free CQ, let be a -structure where the largest relation contains tuples, and let . There is an algorithm that computes in time and space a strongly -consistent refinement that satisfies .
The pseudocode of the algorithm is shown in Figure 1. For computing the strongly -consistent refinement we first compute all sets where for all we have ; as in , we say that such sets are -small. First note that the empty set is -small. For nonempty sets we know that is only -small if for every the set is -small and hence already included in . If this is the case, then can be computed in time by testing for every (for an arbitrary ) and every element in the structure’s universe, whether