1 Introduction
When evaluating a non-boolean Conjunctive Query (CQ) over a database, the number of results can be huge. Since this number may be larger than the size of the database itself, we need to use specific measures of enumeration complexity to describe the hardness of such a problem. From this perspective, the best we can hope for is to constantly output results, in such a way that the delay between them is unaffected by the size of the database instance. For this to be possible, we need to allow a precomputation phase before printing the first result, as linear time preprocessing is necessary to read the input instance.
A known dichotomy determines when the answers to self-join-free acyclic CQs can be enumerated with constant delay after linear time preprocessing [3]. This class of enumeration problems, denoted by $\mathsf{DelayC_{lin}}$, can be regarded as the most efficient class of non-trivial enumeration problems, and therefore current work on query enumeration has focused on this class [9, 14, 5]. Bagan et al. [3] show that a subclass of acyclic queries, called free-connex, are exactly those that are enumerable in $\mathsf{DelayC_{lin}}$, under the common assumption that boolean matrix multiplication cannot be solved in quadratic time. An acyclic query is called free-connex if the query remains acyclic when treating the head of the query as an additional atom. This and all other results in this paper hold under the RAM model [15].
The above-mentioned dichotomy only holds when applied to databases with no additional assumptions, but oftentimes this is not the case. In practice, there is usually a connection between different attributes, and Functional Dependencies (FDs) and Cardinality Dependencies (CDs) are widely used to model situations where some attributes imply others. As the following example shows, these constraints also have an immediate effect on the complexity of enumerating answers for queries over such a schema.
Example 1. For a list of actors and the production companies they work with, we have the query: $Q(\mathit{actor}, \mathit{production}) \leftarrow \mathit{Cast}(\mathit{movie}, \mathit{actor}), \mathit{Release}(\mathit{movie}, \mathit{production})$. At first glance, it appears as though this query is not in $\mathsf{DelayC_{lin}}$, as it is acyclic but not free-connex. Nevertheless, if we take into account the fact that a movie has only one production company, we have the FD $\mathit{Release}\colon \mathit{movie} \rightarrow \mathit{production}$, and the enumeration problem becomes easy: we only need to iterate over all tuples of Cast and replace the $\mathit{movie}$ value with the single $\mathit{production}$ value that the relation Release assigns to it. This can be done in linear time by first sorting (in linear time [10]) both relations according to $\mathit{movie}$. ∎
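The strategy of Example 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `preprocess` and `enumerate_answers` are hypothetical, relations are given as lists of pairs, and hashing (expected linear time) stands in for the linear-time sorting used in the RAM model. Duplicate answers are removed during preprocessing, so that each answer is emitted once.

```python
from typing import Dict, Iterator, List, Tuple


def preprocess(cast: List[Tuple[str, str]],
               release: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Linear-time phase: build the movie -> production lookup and the answer list."""
    production_of: Dict[str, str] = {}
    for movie, production in release:
        # The FD "movie -> production" means each movie has a single production value.
        if production_of.setdefault(movie, production) != production:
            raise ValueError(f"FD violated for movie {movie!r}")
    # A dict keeps insertion order and removes duplicate (actor, production) pairs.
    answers: Dict[Tuple[str, str], None] = {}
    for movie, actor in cast:
        if movie in production_of:  # Cast tuples without a Release partner yield no answer
            answers[(actor, production_of[movie])] = None
    return list(answers)


def enumerate_answers(cast: List[Tuple[str, str]],
                      release: List[Tuple[str, str]]) -> Iterator[Tuple[str, str]]:
    """Emit the answers one by one; after preprocessing, each step takes constant time."""
    yield from preprocess(cast, release)
```

For instance, an actor who appears in two movies of the same production company contributes a single answer, and a movie that is missing from Release contributes none.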
Example 1 shows that the dichotomy by Bagan et al. [3] does not hold in the presence of FDs. In fact, we believe that dependencies between attributes are so common in real life that ignoring them in such dichotomies can lead to missing a significant portion of the tractable cases. Therefore, to get a realistic picture of the enumeration complexity of CQs, we have to take dependencies into account. The goal of this work is to generalize the dichotomy to fully accommodate FDs.
Towards this goal, we introduce an extension of a query according to the FDs. The extension is called the FD-extended query, and denoted $Q^+$. In this extension, each atom, as well as the head of the query, contains all variables that can be implied by its variables according to some FD. This way, instead of classifying every combination of CQ and FDs directly, we encode the dependencies within the extended query, and use the classification of $Q^+$ to gain insight regarding $Q$. This approach draws inspiration from the proof of a dichotomy in the complexity of deletion propagation in the presence of FDs [11]. However, the problem and consequently the proof techniques are fundamentally different.
The FD-extension is defined in such a way that if $Q$ is satisfied by an assignment, then the same assignment also satisfies the extension $Q^+$, as the underlying instance is bound by the FDs. In fact, we can show that enumerating the solutions of $Q$ under FDs can be reduced to enumerating the solutions of $Q^+$. Therefore, tractability of $Q^+$ ensures that $Q$ can be efficiently solved as well. By using the positive result in the known dichotomy, $Q^+$ is tractable w.r.t. enumeration if it is free-connex. Moreover, it can be shown that the structural restrictions of acyclicity and free-connexity are closed under taking FD-extensions. Hence, the class of all queries $Q$ such that $Q^+$ is free-connex is an extension of the class of free-connex queries, and this extension is in fact proper. We denote the classes of queries $Q$ such that $Q^+$ is acyclic or free-connex as FD-acyclic and FD-free-connex, respectively.
To reach a dichotomy, we now need to answer the following question: Is it possible that $Q$ can be enumerated efficiently even if $Q^+$ is not free-connex? To show that an enumeration problem is not within a given class, enumeration complexity has few tools to offer. One such tool is a notion of completeness for enumeration problems [8]. However, this notion focuses on problems with a complexity corresponding to higher classes of the polynomial hierarchy. So in order to deal with this problem, Bagan et al. [3] reduced the matrix multiplication problem to enumerating the answers to any query that is acyclic but not free-connex. This reduction fails, however, when dependencies are imposed on the data, as the constructed database instance does not necessarily satisfy the underlying dependencies.
As it turns out, however, the structure of the FD-extended query allows us to extend this reduction to our setting. By carefully expanding the reduced instance such that on the one hand, the dependencies hold and on the other hand, the reduction can still be performed within linear time, we establish a dichotomy. That is, we show that the tractability of enumerating the answers of a self-join-free query $Q$ in the presence of FDs is exactly characterized by the structure of $Q^+$: Given an FD-acyclic query $Q$, we can enumerate the answers to $Q$ within the class $\mathsf{DelayC_{lin}}$ iff $Q$ is FD-free-connex.
The resulting extended dichotomy, as well as the original one, brings insight to the case of acyclic queries. Concerning unrestricted CQs, providing even a first solution of a query in linear time is impossible in general. This is due to the fact that the parameterized complexity of answering boolean CQs, taking the query size as the parameter, is W[1]-hard [13]. This does not imply, however, that there are no cyclic queries with the corresponding enumeration problems in $\mathsf{DelayC_{lin}}$. The fact that no such queries exist requires an additional proof, which was presented by Brault-Baron [6]. This result holds under a generalization of the triangle finding problem, which is considered not to be solvable within linear time [16]. As before, this proof no longer applies in the presence of FDs. Moreover, it is possible for $Q$ to be cyclic and $Q^+$ acyclic. In fact, $Q^+$ may even be free-connex, and therefore tractable in $\mathsf{DelayC_{lin}}$. We show that, under the same assumptions used by Brault-Baron [6], the evaluation problem for a self-join-free CQ $Q$ in the presence of unary FDs where $Q^+$ is cyclic cannot be solved in linear time. As linear time preprocessing is not enough to achieve the first result, a consequence is that enumeration within $\mathsf{DelayC_{lin}}$ is impossible in that case. This covers all types of CQs and shows a full dichotomy, at least for the case of unary FDs.
The results we present here are not limited to FDs. CDs (Cardinality Dependencies) [7, 2] are a generalization of FDs, denoted $(R_i\colon A \rightarrow B, c)$. Here, the right-hand side does not have to be unique for every assignment to the left-hand side, but there can be at most $c$ different values to the variables of $B$ for every value of the variables of $A$. FDs are in fact a special case of CDs where $c = 1$. Constraints of that form appear naturally in many applications. For example: a movie has only a handful of directors, there are at most 200 countries, and a person is typically limited to at most 5000 friends in (some) social networks. We show that all results described in this paper also apply to CDs. Moreover, we show how our results can be easily used to yield additional results, such as a dichotomy for CQs with disequalities, and a dichotomy to evaluate CQs with linear delay.
Contributions.
Our main contributions are as follows.

We extend the class of queries that can be evaluated in $\mathsf{DelayC_{lin}}$ by incorporating the FDs. This extension is the class of FD-free-connex CQs.

We establish a dichotomy for the enumeration complexity of self-join-free FD-acyclic CQs. Consequently, we get a dichotomy for self-join-free acyclic CQs under FDs.

We show a lower bound for FD-cyclic CQs. In particular, we get a dichotomy for all self-join-free CQs in the presence of unary FDs.

We extend our results to CDs.
This work is organized as follows: In Section 2 we provide definitions and state results that we will use. Section 3 introduces the notion of FD-extended queries and establishes the equivalence between a query and its FD-extension. The generalized version of the dichotomy is shown in Section 4. In Section 5, a lower bound for cyclic queries under unary FDs is shown, and Section 6 shows that all results from the previous sections extend to CDs. Concluding remarks are given in Section 7. Full proofs for all of our results are given in the appendix.
2 Preliminaries
In this section we provide preliminary definitions as well as state results that we will use throughout this paper.
Schemas and Functional Dependencies.
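A schema $\mathcal{S}$ is a pair $(\mathcal{R}, \Delta)$ where $\mathcal{R}$ is a finite set of relational symbols and $\Delta$ is a set of Functional Dependencies (FDs). We denote the arity of a relational symbol $R_i$ as $\mathrm{arity}(R_i)$. An FD $\delta \in \Delta$ has the form $R_i\colon A \rightarrow B$, where $R_i \in \mathcal{R}$ and $A, B$ are non-empty with $A, B \subseteq \{1, \ldots, \mathrm{arity}(R_i)\}$.
Let $\mathit{dom}$ be a finite set of constants. A database $I$ over schema $\mathcal{S}$ is called an instance of $\mathcal{S}$, and it consists of a finite relation $R_i^I \subseteq \mathit{dom}^{\mathrm{arity}(R_i)}$ for every relational symbol $R_i \in \mathcal{R}$, such that all FDs in $\Delta$ are satisfied. An FD $\delta = R_i\colon A \rightarrow B$ is said to be satisfied if, for all tuples $u, v \in R_i^I$ that are equal on the indices of $A$, $u$ and $v$ are equal on the indices of $B$. Here we assume that all FDs are of the form $R_i\colon A \rightarrow b$, where $b \in \{1, \ldots, \mathrm{arity}(R_i)\}$, as we can replace an FD of the form $R_i\colon A \rightarrow B$ where $|B| > 1$ by the set of FDs $\{R_i\colon A \rightarrow b \mid b \in B\}$. If $|A| = 1$, we say that $\delta$ is a unary FD.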
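The satisfaction condition for an FD can be checked with a single pass over a relation. The following is a minimal sketch under assumptions that are not part of the text: the helper name `satisfies_fd` is hypothetical, a relation is a list of tuples, indices are 0-based (the text uses positions starting at 1), and a hash table replaces sorting.

```python
from typing import Dict, Hashable, Iterable, Sequence, Tuple


def satisfies_fd(relation: Iterable[Tuple[Hashable, ...]],
                 lhs: Sequence[int],
                 rhs: Sequence[int]) -> bool:
    """Return True iff tuples that agree on the lhs indices also agree on the rhs indices."""
    seen: Dict[Tuple[Hashable, ...], Tuple[Hashable, ...]] = {}
    for t in relation:
        key = tuple(t[i] for i in lhs)    # values on the left-hand side
        value = tuple(t[j] for j in rhs)  # values on the right-hand side
        # The first tuple with this key fixes the right-hand side for all later ones.
        if seen.setdefault(key, value) != value:
            return False
    return True
```

Splitting an FD with several right-hand side indices into one FD per index, as done in the text, does not change the outcome of this check.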
Conjunctive Queries.
Let be a set of variables disjoint from . A Conjunctive Query (CQ) over a schema is an expression of the form , where are relational symbols of , the tuples hold variables, and every variable in appears in at least one of . We often denote this query as or even . Define the variables of as , and define the free variables of as . We call the head of , and the atomic formulas are called atoms. We further use to denote the set of atoms of Q. A CQ is said to contain selfjoins if some relation symbol appears in more than one atom.
For the evaluation of a CQ with free variables over a database , we define to be the set of all mappings such that is a homomorphism from into , where denotes the restriction (or projection) of to the variables . The problem is, given a database instance , determining whether such a mapping exists.
Given a query over a schema , we often identify an FD as a mapping between variables. That is, if has the form for , we sometimes denote it by , where is the th variable of . To distinguish between these two representations, we usually denote subsets of integers by , integers by , and variables by letters from the end of the alphabet.
Hypergraphs.
A hypergraph is a pair consisting of a set of vertices, and a set of nonempty subsets of called hyperedges (sometimes edges). A join tree of a hypergraph is a tree where the nodes are the hyperedges of , and the running intersection property holds, namely: for all the set forms a connected subtree in . A hypergraph is said to be acyclic if there exists a join tree for . Two vertices in a hypergraph are said to be neighbors if they appear in the same edge. A clique of a hypergraph is a set of vertices, which are pairwise neighbors in . A hypergraph is said to be conformal if every clique of is contained in some edge of . A chordless cycle of is a tuple such that the set of neighboring pairs of variables of is exactly . It is well known (see [4]) that a hypergraph is acyclic iff it is conformal and contains no chordless cycles.
A pseudominor of a hypergraph is a hypergraph obtained from by a finite series of the following operations: (1) vertex removal: removing a vertex from and from all edges in that contain it. (2) edge removal: removing an edge from provided that some other contains it. (3) edge contraction: replacing all occurrences of a vertex (within every edge) with a vertex , provided that and are neighbors.
Classes of CQs.
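To a CQ $Q$ we associate a hypergraph $\mathcal{H}(Q) = (V, E)$ where the vertices $V$ are the variables of $Q$ and every hyperedge in $E$ is a set of variables occurring in a single atom of $Q$, that is $E = \{\{v_1, \ldots, v_n\} \mid R_i(v_1, \ldots, v_n) \in \mathrm{atoms}(Q)\}$. With a slight abuse of notation, we also identify atoms of $Q$ with edges of $\mathcal{H}(Q)$. A CQ $Q$ is said to be acyclic if $\mathcal{H}(Q)$ is acyclic, and it is said to be free-connex if both $\mathcal{H}(Q)$ and $(V, E \cup \{\mathrm{free}(Q)\})$ are acyclic.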
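Both structural properties can be tested mechanically. The sketch below uses the standard GYO reduction (repeatedly remove vertices that occur in a single hyperedge and hyperedges contained in another one), which is a well-known acyclicity test but is not the formulation used in this paper; the function names and the encoding of hyperedges as Python sets are assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Set


def is_acyclic(hyperedges: Iterable[Iterable[str]]) -> bool:
    """GYO reduction: the hypergraph is acyclic iff all hyperedges can be eliminated."""
    edges: List[Set[str]] = [set(e) for e in hyperedges]
    changed = True
    while changed:
        changed = False
        # Count in how many hyperedges each vertex occurs.
        count: Dict[str, int] = {}
        for e in edges:
            for v in e:
                count[v] = count.get(v, 0) + 1
        # Step 1: remove vertices that occur in exactly one hyperedge.
        for e in edges:
            lonely = {v for v in e if count[v] == 1}
            if lonely:
                e -= lonely
                changed = True
        # Step 2: remove one hyperedge that is empty or contained in another hyperedge.
        for i, e in enumerate(edges):
            if not e or any(i != j and e <= f for j, f in enumerate(edges)):
                del edges[i]
                changed = True
                break
    return not edges


def is_free_connex(atoms: Iterable[Iterable[str]], free: Iterable[str]) -> bool:
    """Acyclic, and still acyclic when the head is added as an extra hyperedge."""
    atom_list = [set(a) for a in atoms]
    return is_acyclic(atom_list) and is_acyclic(atom_list + [set(free)])
```

Applied to the query of Example 1, with hyperedges {movie, actor} and {movie, production} and free variables {actor, production}, the first test succeeds and the second fails, which matches the observation that the query is acyclic but not free-connex.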
A headpath for a CQ is a sequence of variables with , such that: (1) (2) (3) It is a chordless path in , that is, two succeeding variables appear together in some atom, and no two nonsucceeding variables appear together in an atom. Bagan et al. [3] showed that an acyclic CQ has a headpath iff it is not freeconnex.
Enumeration Complexity.
Given a finite alphabet $\Sigma$ and binary relation $R \subseteq \Sigma^* \times \Sigma^*$, we denote by $\textsc{Enum}\langle R\rangle$ the enumeration problem of given an instance $x \in \Sigma^*$, to output all $y \in \Sigma^*$ such that $(x, y) \in R$. In this paper we adopt the Random Access Machine (RAM) model (see [15]). Previous results in the field assume different variations of the RAM model. Here we assume that the length of memory registers is linear in the size of value registers, that is, the accessible memory is polynomial. For a class $\mathcal{C}$ of enumeration problems, we say that $\textsc{Enum}\langle R\rangle \in \mathcal{C}$, if there is a RAM that, on input $x \in \Sigma^*$, outputs all $y \in \Sigma^*$ with $(x, y) \in R$ without repetition such that the first output is computed in time $p(|x|)$ and the delay between any two consecutive outputs after the first is $d(|x|)$, where:

For $\mathcal{C} = \mathsf{DelayC_{lin}}$, we have $p(|x|) \in O(|x|)$ and $d(|x|) \in O(1)$.

For , we have .
Let and be enumeration problems. We say that there is an exact reduction from to , written as , if there are mappings and such that for every the mapping is computable in , for every with , is computable in constant time and in multiset notation. Intuitively, is used to map instances of to instances of , and is used to map solutions to to solutions of . An enumeration class is said to be closed under exact reduction if for every and such that and , we have . Bagan et al. [3] proved that is closed under exact reduction. The same proof holds for any meaningful enumeration complexity class that guarantees generating all unique answers with at least linear preprocessing time and at least constant delay between answers.
Enumerating Answers to CQs.
For a CQ over a schema , we denote by the enumeration problem , where is the binary relation between instances over and sets of mappings . We consider the size of the query as well as the size of the schema to be fixed. Bagan et al. [3] showed that a selfjoinfree acyclic CQ is in iff it is freeconnex:
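Theorem 2 ([3]). Let $Q$ be an acyclic CQ without self-joins over a schema $\mathcal{S} = (\mathcal{R}, \emptyset)$.

If $Q$ is free-connex, then $\textsc{Enum}_{\emptyset}\langle Q\rangle \in \mathsf{DelayC_{lin}}$.

If $Q$ is not free-connex, then $\textsc{Enum}_{\emptyset}\langle Q\rangle \notin \mathsf{DelayC_{lin}}$, assuming the product of two $n \times n$ boolean matrices cannot be computed in time $O(n^2)$.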
3 FD-Extended CQs
In this section, we formally define the extended query $Q^+$. We then discuss the relationship between $Q$ and $Q^+$: their equivalence w.r.t. enumeration and the possible structural differences between them. As a result, we obtain that if $Q^+$ is in a class of queries that allows for tractable enumeration, then $Q$ is tractable as well.
We first define . The extension of an atom according to an FD where is possible if but . In that case, is added to the variables of . The FDextension
of a query is defined by iteratively extending all atoms as well as the head according to every possible dependency in the schema, until a fixpoint is reached. The schema extends accordingly: the arities of the relations increase as their corresponding atoms extend, and dummy variables are added to adjust to that change in case of selfjoins. The FDs apply in every relation that contains all relevant variables.
[(FDExtended Query)] Let be a CQ over a schema . We define two types of extension steps:

The extension of an atom according to an FD .
Prerequisites: and .
Effect: The arity of increases by one, and is replaced by . In addition, every such that = and is replaced with , where is a fresh variable. 
The extension of the head according to an FD .
Prerequisites: and .
Effect: The head is replaced by .
The FDextension of is the query , obtained by performing all possible extension steps on according to FDs of until a fixpoint is reached. The extension is defined over the schema , where is with the extended arities, and .
Given a query, its FDextension is unique up to a permutation of the added variables, and renaming of the new variables. As the order of the variables and the naming make no difference w.r.t. enumeration, we can treat the FDextension as unique.
Consider a schema with , and the query . As the FDs are and , the FDextension is . We first apply on the head, and then and consequently on . These two FDs now appear in the schema also for , and the FDs of the extended schema are . ∎
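The fixpoint construction of the FD-extension can be sketched as follows. The sketch makes several simplifying assumptions that go beyond the text: the query is self-join-free (so no fresh variables are needed), every FD is already given at the level of variables as a pair of a left-hand side set and a single implied variable, and the name `fd_extension` is hypothetical. It is run here on the actors and production companies query of Example 1, with variable names chosen for illustration.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

VariableFD = Tuple[FrozenSet[str], str]  # (left-hand side variables, implied variable)


def fd_extension(head: Sequence[str],
                 atoms: Dict[str, Sequence[str]],
                 fds: Sequence[VariableFD]) -> Tuple[List[str], Dict[str, List[str]]]:
    """Extend the head and every atom by implied variables until nothing changes."""
    new_head = list(head)
    new_atoms = {name: list(variables) for name, variables in atoms.items()}
    changed = True
    while changed:
        changed = False
        for lhs, implied in fds:
            # The head is extended by the same rule as the atoms.
            for variables in [new_head, *new_atoms.values()]:
                if lhs <= set(variables) and implied not in variables:
                    variables.append(implied)  # one extension step
                    changed = True
    return new_head, new_atoms


# Query of Example 1: the FD movie -> production extends Cast but not the head.
head, atoms = fd_extension(
    ["actor", "production"],
    {"Cast": ["movie", "actor"], "Release": ["movie", "production"]},
    [(frozenset({"movie"}), "production")],
)
```

The loop terminates because every step adds a variable to the head or to an atom, and the set of variables is finite.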
We later show that the enumeration complexity of a CQ over a schema with FDs only depends on the structure of , which is implicitly given by . Therefore, we introduce the notions of acyclic and freeconnex queries for FDextensions:
Let $Q$ be a CQ over a schema $\mathcal{S} = (\mathcal{R}, \Delta)$, and let $Q^+$ be its FD-extension.

We say that $Q$ is FD-acyclic, if $Q^+$ is acyclic.

We say that $Q$ is FD-free-connex, if $Q^+$ is free-connex.

We say that $Q$ is FD-cyclic, if $Q^+$ is cyclic.
The following proposition shows that the classes of acyclic queries and freeconnex queries are both closed under constructing FDextensions.
Let $Q$ be a CQ over a schema $\mathcal{S} = (\mathcal{R}, \Delta)$.

If the query $Q$ is acyclic, then it is FD-acyclic.

If the query $Q$ is free-connex, then it is FD-free-connex.
Proof.
We prove that if is acyclic, then is also acyclic (the case where is freeconnex follows along the same lines). Denote by a sequence of queries such that is the result of extending all possible relations of according to a single FD . By induction, it suffices to show that if is acyclic, then is acyclic as well. So consider an acyclic query extended to the query according to the FD . Further let be the join tree of . We claim that the same tree (but with the extended atoms), is a join tree for . More formally, define such that and . Next we show that the running intersection property holds in , and therefore it is a join tree of .
For the new variables introduced in the extension, every such variable appears only in one atom, so the subtree of containing such a variable contains one node and is trivially connected. For any other variable , the attribute appears in the same atoms in and . Therefore, the subgraph of containing is isomorphic to the subgraph of containing , and since is a join tree, it is connected. It is left to show that the subtree of containing is connected. let be the atom in containing . Note that corresponds to vertices in and containing and . Let be some vertex in containing . We will show that all vertices on the path between and contain . If appears in the vertex in , then it also appears in since is a join tree. Since the extension doesn’t remove occurrences of variables, appears in these vertices in as well. Otherwise, was added to via . Since is a join tree, the vertices all contain the variables . Thus by the definition of , is added to each of (if it was not already there) in . Thus also the subtree of containing is connected. Therefore is indeed a join tree. ∎
Example 1 shows that the converse of the proposition above does not hold. This means that, by Theorem 2, there are queries such that we can enumerate the answers to in , but we cannot enumerate the answers to with the same complexity, if we do not assume the FDs. The following lemma shows that enumerating the answers of (when relying on the FDs) is in fact equally hard as enumerating the answers of .
Let be a CQ over a schema , and let be its FDextended query. Then and .
We first sketch the reduction . Given an instance for the problem , we set as described next. We start by removing tuples that interfere with the extended dependencies. For every dependency and every atom that contains the variables , we only keep tuples of that agree with some tuple of over the values of . Next, we follow the extension of the schema, and in each step we extend some to according to some FD . For each tuple , if there is no tuple that agrees with over the values of , then we remove altogether. Otherwise, we copy to and assign with the same value that assigns it. Given an answer , we set to be the projection of to . To show that , we describe the construction of an instance by “reversing” the extension steps. If an atom was extended, we simply remove the added attribute. If the head was extended using some , then for each tuple in that assigns and with the values and respectively, we add the value to a lookup table with pointer . For every , is defined as extended by the values from the lookup table.
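A single extension step on the instance level, as described in the sketch above, may be illustrated as follows. This is a simplified sketch and not the construction of the full proof: the function name `extend_relation` is hypothetical, atoms are assumed to contain no repeated variables, and a hash table replaces the sorting step.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

Row = Tuple[Hashable, ...]


def extend_relation(ri: Sequence[Row], ri_vars: Sequence[str],
                    rj: Sequence[Row], rj_vars: Sequence[str],
                    lhs: Sequence[str], implied: str) -> List[Row]:
    """Extend the tuples of ri by the value that rj assigns to `implied`.

    Tuples of ri that agree with no tuple of rj on the variables in `lhs` are dropped.
    """
    rj_key = [rj_vars.index(x) for x in lhs]
    rj_value = rj_vars.index(implied)
    ri_key = [ri_vars.index(x) for x in lhs]
    # Under the FD lhs -> implied, each key of rj determines a single value.
    lookup: Dict[Row, Hashable] = {}
    for t in rj:
        lookup[tuple(t[i] for i in rj_key)] = t[rj_value]
    extended: List[Row] = []
    for t in ri:
        key = tuple(t[i] for i in ri_key)
        if key in lookup:
            extended.append(t + (lookup[key],))  # copy the tuple and append the implied value
    return extended
```

Projecting an answer over the extended instance back to the original free variables then corresponds to the constant-time mapping between solutions.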
Proof.
Let and . We first show that . Given an instance for the problem , we set as described next. We start by removing tuples that interfere with the extended dependencies. For every dependency and every atom that contains the corresponding variables (i.e., ), we correct according to : We only keep tuples of that agree with some tuple of over the values of . We say that a tuple agrees with a tuple on the value of a variable if for every pair of indices such that we have that . This check can be done in linear time by first sorting both and according to , and then performing one scan over both of them. Next, we follow the extension of the schema, and in each step we extend some to according to some FD as described in Definition 3. For each tuple , if there is no tuple that agrees with over the values of , then we remove altogether. Otherwise, we copy to and assign with the same value that assigns it. We say that a tuple assigns a variable with the value if for every index such that we have that . Given an answer , we set to be the projection of to . The projection is computable in constant time.
For the correctness, we need to show that in multiset notation. The easy direction is that if then . Since is a homomorphism from to , and since all tuples of appear (perhaps projected) in , then is also a homomorphism from to . We now show the opposite direction, that if then . Consider a sequence of queries such that each one is the result of extending an atom or the head of the previous query according to an FD . We claim that if is an answer for , then is an answer for . This claim is trivial in case the head was extended. Note also that there cannot be two answers and to such that , as the added variable is bound by the FD to have only one possible value. Now consider the case where an atom was extended since . Denote by and the tuples that are mapped by from and respectively. The construction guarantees that and agree on the value of , so can still map the extended to the extended . In case of selfjoins, other atoms with the relation are extended with a new and distinct variable, and the new variable can be mapped to any value appearing in the extension. Therefore if then .
To show that , we now define the mapping between instances. Let be an instance of . First, we “clean” from any tuples that disagree with original FDs. That is, for every FD and every atom such that , remove all tuples that agree with some tuple over but disagree with over . This can be done in linear time by first sorting both and according to . Next, we construct a lookup table . For every added to the head due to an FD , denote by
a vector containing the variables of
in lexicographic order, for each tuple in that assigns and with the values and respectively, we add the value to the lookup table with pointer . Note that due to the FD, a pointer cannot map to two different values. Lastly, we project the relations to . These steps result in the construction of an instance and a lookup table in linear time. Given , we now define . We define a mapping for the variables added to the head using the lookup table. For every added due to some FD , we add to . We define . Note that is computable in constant time since we can use the lookup table in constant time.We need to show that in multiset notation. First we claim that given , we have that . If maps to , then was added to the head due to some FD , and there is some tuple in that assigns and with the values and respectively. Due to the dependency, all tuples of which assign with , also assign with , and this is also true in . Therefore also maps to . This means that . We now show the first direction, that given we have that . We now claim that is (a subset of) a homomorphism from to . We know that is a homomorphism from to . For any , denote by the tuple . If an atom was extended due to an FD , then and the extension of must agree on , otherwise this would have been deleted in the cleaning phase. In case of selfjoins, additional atoms such that may have been extended with new variables. As each new variable has only one occurrence, the extension of these atoms does not interfere with , as the new variables can map to any value present in the tuple that was mapped by from . We conclude that . The second direction is that given , we have that and . It is only left to show that . Indeed, if maps an atom to a tuple , then it maps to the same (perhaps projected) tuple in . This tuple was not removed during the cleaning phase, as the only removed tuples do not have a tuple of agreeing with them on the value of , and therefore cannot map to them. ∎
The direction of Theorem 3 proves that FDextensions can be used to expand tractable enumeration classes, as the following corollary states.
Let be an enumeration class that is closed under exact reduction. Let be a CQ and let be its FDextended query. If , then .
Proof.
According to Theorem 3, . Since and is closed under exact reduction, we have that . ∎
Since freeconnex queries are in and is closed under exact reduction, if is an FDfreeconnex query, then the corresponding enumeration problem is in . This follows from Theorem 2 and the fact that .
Let be a CQ over a schema . If is FDfreeconnex, then .
Proof.
According to Theorem 2, we have that as is freeconnex. Given an instance over the schema , the same instance is also over , and any query has the same answers over both schemas. Therefore, we have the reduction by using the identity mapping. Overall, we conclude that , and using Corollary 3 we get that ∎
4 A Dichotomy for Acyclic CQs
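In this section, we characterize which self-join-free FD-acyclic queries are in $\mathsf{DelayC_{lin}}$. We use the notion of FD-extended queries defined in the previous section to establish a dichotomy stating that enumerating the answers to an FD-acyclic query is in $\mathsf{DelayC_{lin}}$ iff the query is FD-free-connex. We will prove the following theorem:
Let $Q$ be an FD-acyclic CQ without self-joins over a schema $\mathcal{S} = (\mathcal{R}, \Delta)$.

If $Q$ is FD-free-connex, then $\textsc{Enum}_{\Delta}\langle Q\rangle \in \mathsf{DelayC_{lin}}$.

If $Q$ is not FD-free-connex, then $\textsc{Enum}_{\Delta}\langle Q\rangle \notin \mathsf{DelayC_{lin}}$, assuming that the product of two $n \times n$ boolean matrices cannot be computed in time $O(n^2)$.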
Proof.
The positive case for the dichotomy was described in Corollary 3. Note that the restriction of considering only selfjoinsfree queries is required only for the negative side. This assumption is standard [3, 6, 11], as it allows to assign different atoms with different relations independently. The hardness result described here builds on that of Bagan et al. [3] for databases that are assumed not to have FDs, and it relies on the hardness of the boolean matrix multiplication problem. This problem is defined as the enumeration of the query over the schema where . It is strongly conjectured that this problem is not computable in time and currently, the best known algorithms require time for some [12, 1].
The original proof describes an exact reduction . Since is acyclic but not freeconnex, it contains a headpath . Given an instance of the matrix multiplication problem, an instance of is constructed, where the variables , and of the headpath respectively encode the variables , and of , while all other variables of are assigned constants. This way, is encoded by an atom containing and , and is encoded by an atom containing and . Atoms containing some and only propagate the value of . Since and are in , but are not, the answers to correspond to those of . As no atom of