Enumeration Complexity of Unions of Conjunctive Queries

12/10/2018
by Nofar Carmeli, et al.
Technion

We study the enumeration complexity of answering unions of conjunctive queries with respect to linear time preprocessing and constant delay.


1. Introduction

Evaluating a query over a database instance is a fundamental and well-studied problem in database management systems. Most of the complexity results dealing with query evaluation so far aim to solve a decision or counting variant of it. Starting with Bagan, Durand and Grandjean in 2007 (Bagan et al., 2007), there has been a renewed interest in examining the enumeration problem of all answers to a query, focusing on fine-grained complexity (Schweikardt et al., 2018; Niewerth and Segoufin, 2018; Florenzano et al., 2018; Carmeli and Kröll, 2018). When evaluating a non-boolean query over a database, the number of results may be larger than the size of the database itself. Enumeration complexity offers specific measures for the hardness of such problems. In terms of data complexity, the best time guarantee we can hope for is to output all answers with a constant delay between consecutive answers. In the case of query evaluation, this enumeration phase comes after a linear preprocessing phase required to read the database and decide the existence of a first answer. The enumeration class achieving these time bounds is denoted by $\mathsf{DelayC_{lin}}$. Hereafter, we refer to queries in $\mathsf{DelayC_{lin}}$ as tractable, and to queries outside of this class as intractable.

Results by Bagan et al. (Bagan et al., 2007) and Brault-Baron (Brault-Baron, 2013) form a dichotomy that fully classifies which self-join free Conjunctive Queries (CQs) are in the class $\mathsf{DelayC_{lin}}$ based on the structure of the query: the class of free-connex queries is exactly the class that admits tractable enumeration. In the years following this dichotomy, much work has been conducted to achieve similar results for other classes of queries (Segoufin, 2015). Unions of CQs (UCQs) are a natural extension of CQs, as they describe the union of the answers to several CQs. UCQs form an important class of queries, as they capture the positive fragment of relational algebra. Previous work which implies results on the enumeration complexity of UCQs imposes strong restrictions on the underlying database (Segoufin and Vigny, 2017). We aim to understand the enumeration complexity of UCQs without such restrictions and based solely on their structure.

Using known methods (Strozecki, 2010), it can be shown that a union of tractable problems is again tractable. However, what happens if some CQs of a union are tractable while others are not? Intuitively, one might be tempted to expect a union of enumeration problems to be harder than a single problem within the union, making such a UCQ intractable as well. As we will show, this is not necessarily the case. Consider the union of the following queries:

Even though $Q_1$ is hard while $Q_2$ is easy, a closer look shows that $Q_2$ contains $Q_1$. This means that $Q_1$ is redundant, and the entire union is equivalent to the easy $Q_2$. To avoid cases like these, where the UCQ can be translated to a simpler one, it makes sense to consider non-redundant unions. It was claimed that in all cases of a non-redundant union containing an intractable CQ, the UCQ is intractable too (Berkholz et al., 2018). The following is a counterexample which refutes this claim.

Example 1.1.

Let $Q = Q_1 \cup Q_2$ with

According to the dichotomy of Bagan et al. (Bagan et al., 2007), the enumeration problem for $Q_2$ is in $\mathsf{DelayC_{lin}}$, while $Q_1$ is intractable. Yet, it turns out that $Q$ is in fact in $\mathsf{DelayC_{lin}}$. The reason is that, since $Q_1$ and $Q_2$ are evaluated over the same database, we can use the answers of $Q_2$ to find those of $Q_1$. We can compute the answers of $Q_2$ efficiently, and try to extend every such solution to solutions of $Q_1$ with constant delay: for every new combination of an output , we find all values with and then output the solution . Intuitively, the source of intractability for $Q_1$ is the join of with as we need to avoid duplicates that originate in different values. The union is tractable since $Q_2$ returns exactly this join. ∎

As the example illustrates, to compute the answers to a UCQ in an efficient way, it is not enough to view it as a union of isolated instances of CQ enumeration. In fact, this task requires an understanding of the interaction between several queries. Example 1.1 shows that the presence of an easy query within the union may give us enough time to compute auxiliary data structures, which we can then add to the hard queries in order to enumerate their answers as well. In Example 1.1, we can assume we have a ternary relation holding the result of $Q_2$. Then, adding the corresponding auxiliary atom to $Q_1$ results in a tractable structure. We generalize this observation and introduce the concept of union-extended queries. We then use union extensions as a central tool for evaluating the enumeration complexity of UCQs, as the structure of such queries has implications on the tractability of the UCQ.

Interestingly, this approach can be taken a step further: We show that the concept of extending the union by auxiliary atoms can even be used to efficiently enumerate the answers of UCQs that only contain hard queries. By lifting the concept of free-connex queries from CQs to UCQs via union-extended queries, we show that free-connex UCQs are always tractable. This gives us a sufficient global condition for membership in $\mathsf{DelayC_{lin}}$ that goes beyond any classification of individual CQs.

The question of finding a full characterization of the tractability of UCQs with respect to $\mathsf{DelayC_{lin}}$ remains open. Nevertheless, we prove that for several classes of queries, free-connexity fully captures the tractable UCQs. A non-free-connex union of two CQs is intractable in the following two cases: both CQs are intractable, or they both represent the same CQ up to a different projection. The hardness results presented here use problems with well-established assumptions on the lower bounds, such as Boolean matrix multiplication (Le Gall, 2014) or finding a clique or a hyperclique in a graph (Lincoln et al., 2018).

Why is establishing lower bounds on UCQ evaluation, even when the UCQ contains only two CQs, a fundamentally more challenging problem than the one for CQs? In the case of CQs, hardness results are often shown by reducing a computationally hard problem to the task of answering a query. The reduction encodes the hard problem to the relations of a self-join free CQ, such that the answers of the CQ correspond to an answer of this problem (Bagan et al., 2007; Brault-Baron, 2013; Berkholz et al., 2017; Carmeli and Kröll, 2018). However, using such an encoding for CQs within a union does not always work. Similarly to the case of CQs with self-joins, relational symbols that appear multiple times within a query can interfere with the reduction. Indeed, when encoding a hard problem to an intractable CQ within a union, a different CQ in the union evaluates over the same relations, and may also produce answers. A large number of such supplementary answers, with constant delay per answer, accumulates to a long delay until we obtain the answers that correspond to the computationally hard problem. If this delay is larger than the lower bound we assume for the hard problem, we cannot conclude that the UCQ is intractable.

The lower bounds presented in this paper are obtained either by identifying classes of UCQs for which we can use similar reductions to the ones used for CQs, or by introducing alternative reductions. As some cases remain unclassified, we spend the last section of this paper inspecting such UCQs, and describing the challenges that will need to be resolved in order to achieve a full classification.

Our main contributions are as follows:

  • We show that some non-redundant UCQs containing intractable CQs are tractable, and even that unions containing only intractable CQs may be tractable.

  • We extend the notion of free-connexity to UCQs, and show that free-connex UCQs can be evaluated with linear time preprocessing and constant delay.

  • We prove lower bounds on UCQs with respect to $\mathsf{DelayC_{lin}}$, and establish that free-connexity captures exactly the tractable cases in some classes of UCQs.

  • We provide a discussion accompanied by examples describing the challenges that will need to be resolved in order to achieve a full characterization of the tractable UCQs.

This work is organized as follows: In Section 2 we provide definitions and state results that we use. Section 3 formalizes how CQs within a union can make each other easier, defines free-connex UCQs, and proves that free-connex UCQs are in $\mathsf{DelayC_{lin}}$. In Section 4 we prove conditional lower bounds and conclude a dichotomy for some classes of UCQs. Section 5 discusses the future steps required for a full classification, and demonstrates examples of queries of unknown complexity. Concluding remarks are given in Section 6. In the cases where we only provide proof sketches in the body of the paper, full proofs are in the appendix.

2. Preliminaries

In this section we provide preliminary definitions as well as state results that we will use throughout this paper.

Unions of Conjunctive Queries

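A schema $\mathcal{S}$ is a set of relational symbols $\{R_1, \ldots, R_n\}$. We denote the arity of a relational symbol $R$ as $\mathrm{ar}(R)$. Let $\mathrm{dom}$ be a finite set of constants. A database $I$ over schema $\mathcal{S}$ is called an instance of $\mathcal{S}$, and it consists of a finite relation $R^I \subseteq \mathrm{dom}^{\mathrm{ar}(R)}$ for every relational symbol $R \in \mathcal{S}$.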

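Let $\mathrm{var}$ be a set of variables disjoint from $\mathrm{dom}$. A Conjunctive Query (CQ) over schema $\mathcal{S}$ is an expression of the form $Q(\vec{p}) \leftarrow R_1(\vec{v}_1), \ldots, R_m(\vec{v}_m)$, where $R_1, \ldots, R_m$ are relational symbols of $\mathcal{S}$, the tuples $\vec{p}, \vec{v}_1, \ldots, \vec{v}_m$ hold variables, and every variable in $\vec{p}$ appears in at least one of $\vec{v}_1, \ldots, \vec{v}_m$. We often denote this query as $Q(\vec{p})$. Define the variables of $Q$ as $\mathrm{var}(Q) = \bigcup_{i=1}^{m} \vec{v}_i$, and define the free variables of $Q$ as $\mathrm{free}(Q) = \vec{p}$. We call $Q(\vec{p})$ the head of $Q$, and the atomic formulas $R_i(\vec{v}_i)$ are called atoms. We further use $\mathrm{atoms}(Q)$ to denote the set of atoms of $Q$. A CQ is said to be self-join free if no relational symbol appears in more than one atom.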

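The evaluation $Q(I)$ of a CQ $Q$ over a database $I$ is the set of all mappings $\mu|_{\mathrm{free}(Q)}$ such that $\mu: \mathrm{var}(Q) \to \mathrm{dom}$ is a homomorphism from $Q$ into $I$, and $\mu|_{\mathrm{free}(Q)}$ is the restriction (or projection) of $\mu$ to the variables $\mathrm{free}(Q)$. We say that a CQ $Q_1$ is contained in a CQ $Q_2$, denoted $Q_1 \subseteq Q_2$, if for every instance $I$, $Q_1(I) \subseteq Q_2(I)$. A homomorphism from $Q_2(\vec{p}_2)$ to $Q_1(\vec{p}_1)$ is a mapping $h: \mathrm{var}(Q_2) \to \mathrm{var}(Q_1)$ such that: (1) for every atom $R(\vec{v})$ of $Q_2$, $R(h(\vec{v}))$ is an atom in $Q_1$; (2) $h(\vec{p}_2) = \vec{p}_1$.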

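A Union of Conjunctive Queries (UCQ) $Q$ is a set of CQs, denoted $Q = \bigcup_{i=1}^{\ell} Q_i$, where $\mathrm{free}(Q_{i_1}) = \mathrm{free}(Q_{i_2})$ for all $1 \leq i_1, i_2 \leq \ell$. Semantically, $Q(I) = \bigcup_{i=1}^{\ell} Q_i(I)$. Given a UCQ $Q$ and a database instance $I$, we denote by $\mathrm{Decide}\langle Q \rangle$ the problem of deciding whether $Q(I) \neq \emptyset$.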

Hypergraphs

A hypergraph $H = (V, E)$ is a set $V$ of vertices and a set $E$ of non-empty subsets of $V$ called hyperedges (sometimes edges). A join tree of a hypergraph $H = (V, E)$ is a tree $T$ where the nodes are the hyperedges of $H$, and the running intersection property holds, namely: for all $u \in V$ the set $\{e \in E \mid u \in e\}$ forms a connected subtree in $T$. A hypergraph $H$ is said to be acyclic if there exists a join tree for $H$.

Two vertices in a hypergraph are said to be neighbors if they appear in the same edge. A clique of a hypergraph $H$ is a set of vertices which are pairwise neighbors in $H$. If every edge in $H$ has exactly $k$ vertices, we call $H$ $k$-uniform. An $\ell$-hyperclique in a $k$-uniform hypergraph $H$ is a set $V'$ of $\ell$ vertices, such that every subset of $V'$ of size $k$ forms a hyperedge.

A hypergraph $H'$ is an inclusive extension of $H$ if every edge of $H$ appears in $H'$, and every edge of $H'$ is a subset of some edge in $H$. A tree $T$ is an ext-$S$-connex tree for a hypergraph $H$ if: (1) $T$ is a join tree of an inclusive extension of $H$, and (2) there is a subtree $T'$ of $T$ that contains exactly the variables $S$ (Bagan et al., 2007) (see Figure 1).

Figure 1. is an ext--connex tree for hypergraph .

Classes of CQs

We associate a hypergraph $H(Q) = (V, E)$ to a CQ $Q$ where the vertices are the variables of $Q$, and every hyperedge is a set of variables occurring in a single atom of $Q$. That is, $E = \{\{v_1, \ldots, v_n\} \mid R_i(v_1, \ldots, v_n) \in \mathrm{atoms}(Q)\}$. With a slight abuse of notation, we identify atoms of $Q$ with edges of $H(Q)$. A CQ $Q$ is said to be acyclic if $H(Q)$ is acyclic.

A CQ $Q$ is $S$-connex if $H(Q)$ has an ext-$S$-connex tree, and it is free-connex if it has an ext-$\mathrm{free}(Q)$-connex tree (Bagan et al., 2007). Equivalently, $Q$ is free-connex if both $H(Q)$ and $H(Q) \cup \{\mathrm{free}(Q)\}$ are acyclic (Brault-Baron, 2013). A free-path in a CQ $Q$ is a sequence of variables $(x, z_1, \ldots, z_k, y)$ with $k \geq 1$, such that: (1) $x, y \in \mathrm{free}(Q)$; (2) $z_1, \ldots, z_k \in \mathrm{var}(Q) \setminus \mathrm{free}(Q)$; (3) it is a chordless path in $H(Q)$: that is, every two succeeding variables are neighbors in $H(Q)$, but no two non-succeeding variables are neighbors. An acyclic CQ has a free-path iff it is not free-connex (Bagan et al., 2007).
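The characterization through the acyclicity of $H(Q)$ and $H(Q) \cup \{\mathrm{free}(Q)\}$ can be checked mechanically. The Python sketch below is an illustration under our own assumptions, not code from the cited works: a CQ is given only by the variable sets of its atoms, acyclicity is tested with the standard GYO ear-removal procedure, and the names `is_acyclic` and `is_free_connex` are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def is_acyclic(edges: Iterable[Iterable[str]]) -> bool:
    """GYO reduction: the hypergraph is acyclic iff repeatedly removing
    (a) vertices that occur in a single edge and (b) edges contained in
    another edge eventually removes everything."""
    hs: List[Set[str]] = [set(e) for e in edges]
    changed = True
    while changed:
        changed = False
        # (a) drop vertices that occur in exactly one edge
        count: Dict[str, int] = {}
        for e in hs:
            for v in e:
                count[v] = count.get(v, 0) + 1
        for e in hs:
            lonely = {v for v in e if count[v] == 1}
            if lonely:
                e -= lonely
                changed = True
        # (b) drop empty edges, edges strictly inside another edge,
        #     and all but the first copy of duplicated edges
        kept: List[Set[str]] = []
        for i, e in enumerate(hs):
            redundant = not e or any(
                e < f or (e == f and j < i) for j, f in enumerate(hs) if j != i
            )
            if redundant:
                changed = True
            else:
                kept.append(e)
        hs = kept
    return not hs


def is_free_connex(atoms: List[Set[str]], free: Set[str]) -> bool:
    """Free-connex iff both H(Q) and H(Q) extended by the edge free(Q) are acyclic."""
    return is_acyclic(atoms) and is_acyclic(atoms + [set(free)])
```

For atoms over $\{x, z\}$ and $\{z, y\}$ with free variables $\{x, y\}$, `is_free_connex` returns `False`: the extra edge $\{x, y\}$ closes a triangle, matching the free-path $(x, z, y)$.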

Computational Model

In this paper we adopt the Random Access Machine (RAM) model with uniform cost measure. For an input of size $n$, every register is of length $O(\log n)$. Operations such as addition of the values of two registers or concatenation can be performed in constant time. In contrast to the Turing model of computation, the RAM model with uniform cost measure can retrieve the content of any register via its unique address in constant time. This enables the construction of large lookup tables that can be queried within constant time.

We use a variant of the RAM model named DRAM (Grandjean, 1996), where the values stored in registers are at most $n^c$ for some fixed integer $c$. As a consequence, the amount of available memory is polynomial in $n$. The size of the input to most of our problems is measured only by the size of the database instance $I$. We denote by $\|O\|$ the size of an object $O$ (i.e., the number of integers required to store it), whereas $|O|$ is its cardinality. Let $I$ be a database over a schema $\mathcal{S}$. Flum et al. describe a reasonable encoding of the database as a word over integers bound by (Flum et al., 2002). A computation takes linear time if the number of operations is $O(\|I\|)$.

Enumeration Complexity

Given a finite alphabet $\Sigma$ and binary relation $R \subseteq \Sigma^* \times \Sigma^*$, the enumeration problem $\mathrm{Enum}\langle R \rangle$ is: given an instance $x \in \Sigma^*$, output all $y \in \Sigma^*$ such that $(x, y) \in R$. Such values $y$ are often called solutions or answers to $x$. An enumeration algorithm $A$ for $\mathrm{Enum}\langle R \rangle$ is a RAM that solves $\mathrm{Enum}\langle R \rangle$ without repetitions. We say that $A$ enumerates $\mathrm{Enum}\langle R \rangle$ with delay $d(|x|)$ if the time before the first output, the time between any two consecutive outputs, and the time between the last output and termination are each bounded by $d(|x|)$. Sometimes we wish to relax the requirements of the delay before the first answer, and specify a preprocessing time $p(|x|)$. In this case, the time before the first output is only required to be bounded by $p(|x|)$. The enumeration class $\mathsf{DelayC_{lin}}$ is defined as the class of all enumeration problems which have an enumeration algorithm with $O(|x|)$ preprocessing and $O(1)$ delay. Note that we do not impose a restriction on the memory used. In particular, such an algorithm may use additional constant memory for writing between two consecutive answers.

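Let $\mathrm{Enum}\langle R_1 \rangle$ and $\mathrm{Enum}\langle R_2 \rangle$ be enumeration problems. There is an exact reduction from $\mathrm{Enum}\langle R_1 \rangle$ to $\mathrm{Enum}\langle R_2 \rangle$, denoted as $\mathrm{Enum}\langle R_1 \rangle \leq_e \mathrm{Enum}\langle R_2 \rangle$, if there exist mappings $\sigma$ and $\tau$ s.t. for every $x \in \Sigma^*$:

  • $\sigma(x)$ is computable in $O(|x|)$ for every $x$;

  • $\tau(x, y)$ is computable in $O(1)$ for every $y$ s.t. $(\sigma(x), y) \in R_2$;

  • $\{\tau(x, y) \mid (\sigma(x), y) \in R_2\} = \{w \mid (x, w) \in R_1\}$ in multiset notation.

Intuitively, $\sigma$ maps instances of $\mathrm{Enum}\langle R_1 \rangle$ to instances of $\mathrm{Enum}\langle R_2 \rangle$, and $\tau$ maps solutions of $\mathrm{Enum}\langle R_2 \rangle$ to solutions of $\mathrm{Enum}\langle R_1 \rangle$. If $\mathrm{Enum}\langle R_1 \rangle \leq_e \mathrm{Enum}\langle R_2 \rangle$ and also $\mathrm{Enum}\langle R_2 \rangle \in \mathsf{DelayC_{lin}}$, then $\mathrm{Enum}\langle R_1 \rangle \in \mathsf{DelayC_{lin}}$ as well (Bagan et al., 2007).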
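Operationally, an exact reduction lets us reuse an enumerator for $\mathrm{Enum}\langle R_2 \rangle$ unchanged: translate the instance once with $\sigma$, then translate every solution with $\tau$. The Python sketch below assumes the enumerator is any callable returning an iterable of solutions; all names are illustrative. Because $\sigma$ is linear and $\tau$ is constant time, preprocessing and delay are preserved up to constant factors, and the multiset condition guarantees that no solution is repeated or lost.

```python
from typing import Callable, Iterable, Iterator, TypeVar

X = TypeVar("X")    # instances of Enum<R1>
X2 = TypeVar("X2")  # instances of Enum<R2>
Y = TypeVar("Y")    # solutions of Enum<R2>
W = TypeVar("W")    # solutions of Enum<R1>


def enumerate_via_reduction(
    x: X,
    sigma: Callable[[X], X2],
    tau: Callable[[X, Y], W],
    enum_r2: Callable[[X2], Iterable[Y]],
) -> Iterator[W]:
    """Enumerate the solutions of x for Enum<R1> using an enumerator for Enum<R2>."""
    instance = sigma(x)          # computed once, O(|x|) by assumption
    for y in enum_r2(instance):  # inherits the preprocessing and delay of enum_r2
        yield tau(x, y)          # O(1) per solution by assumption
```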

Computational Hypotheses

We will use the following well-established hypotheses for lower bounds on certain computational problems:

  • mat-mul: two Boolean $n \times n$ matrices cannot be multiplied in time $O(n^2)$.
    This problem is equivalent to the evaluation of the query $\Pi(x, y) \leftarrow A(x, z), B(z, y)$ over the schema $\{A, B\}$ where $A, B \subseteq \{1, \ldots, n\}^2$. It is strongly conjectured that this problem cannot be solved in $O(n^2)$ time, and the best algorithms today require $O(n^{\omega})$ time for some $\omega > 2.37$ (Le Gall, 2014; Abboud and Williams, 2014).

  • hyperclique: finding a $(k+1)$-hyperclique in a $k$-uniform hypergraph with $n$ vertices is not possible in time $O(n^k)$, for all $k \geq 3$.
    This is a special case of the Hyperclique Hypothesis (Lincoln et al., 2018), which states that, in a $k$-uniform hypergraph of $n$ vertices, $n^{\ell - o(1)}$ time is required to find a set of $\ell$ vertices such that each of its subsets of size $k$ forms a hyperedge. The hyperclique hypothesis is sometimes called Tetra (Brault-Baron, 2013).

  • 4-clique: it is not possible to determine whether a $4$-clique exists in a graph with $n$ nodes in time $O(n^3)$.
    This is a special case of the $k$-Clique Hypothesis (Lincoln et al., 2018), which states that detecting a $k$-clique in a graph with $n$ nodes requires $n^{\omega k / 3 - o(1)}$ time, where $\omega$ is the matrix multiplication exponent.

Enumerating Answers to UCQs

Given a UCQ $Q$ over a schema $\mathcal{S}$, we denote by $\mathrm{Enum}\langle Q \rangle$ the enumeration problem $\mathrm{Enum}\langle R \rangle$, where $R$ is the binary relation between instances $I$ over $\mathcal{S}$ and the mappings in $Q(I)$. We consider the size of the query as well as the size of the schema to be fixed. In the case of CQs, Bagan et al. (Bagan et al., 2007) showed that a self-join free acyclic CQ is in $\mathsf{DelayC_{lin}}$ iff it is free-connex. In addition, Brault-Baron (Brault-Baron, 2013) showed that self-join free cyclic queries are not in $\mathsf{DelayC_{lin}}$. In fact, the existence of a single answer to a cyclic CQ cannot be determined in linear time.

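Theorem 2.1 (Bagan et al., 2007; Brault-Baron, 2013).

Let $Q$ be a self-join free CQ.

  1. If $Q$ is free-connex, then $\mathrm{Enum}\langle Q \rangle \in \mathsf{DelayC_{lin}}$.

  2. If $Q$ is acyclic non-free-connex, then $\mathrm{Enum}\langle Q \rangle \notin \mathsf{DelayC_{lin}}$, assuming mat-mul.

  3. If $Q$ is cyclic, then $\mathrm{Enum}\langle Q \rangle \notin \mathsf{DelayC_{lin}}$, as $\mathrm{Decide}\langle Q \rangle$ cannot be solved in linear time, assuming hyperclique.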

The positive case of this dichotomy can be shown using the Constant Delay Yannakakis (CDY) algorithm (Idris et al., 2017). It uses an ext-$\mathrm{free}(Q)$-connex tree $T$ for $Q$. First, it performs the classical Yannakakis preprocessing (Yannakakis, 1981) over $T$ to obtain a relation for each node in $T$ with no dangling tuples, i.e., where all tuples can be used for some answer in $Q(I)$. Then, it considers only the subtree of $T$ containing $\mathrm{free}(Q)$, and joins the relations corresponding to this subtree with constant delay.
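The two phases of CDY can be illustrated on a small scale. The Python sketch below assumes the ext-$\mathrm{free}(Q)$-connex tree is already given (node relations, a parent map, and the set of nodes forming the free subtree, which we assume contains the root); it is a simplified illustration with hypothetical names, not the implementation of Idris et al. The semijoin passes remove dangling tuples, after which every index lookup during enumeration succeeds, so for a fixed query each answer is produced after a constant number of steps.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple

Rel = Tuple[Tuple[str, ...], Set[tuple]]  # (variables, set of tuples)


def semijoin(left: Rel, right: Rel) -> Rel:
    """Keep the tuples of `left` that agree with some tuple of `right` on shared variables."""
    lvars, ltuples = left
    rvars, rtuples = right
    shared = [v for v in lvars if v in rvars]
    keys = {tuple(t[rvars.index(v)] for v in shared) for t in rtuples}
    return lvars, {t for t in ltuples if tuple(t[lvars.index(v)] for v in shared) in keys}


def cdy(rels: Dict[str, Rel], parent: Dict[str, Optional[str]],
        free_nodes: Set[str]) -> Iterator[Dict[str, object]]:
    rels = dict(rels)
    children: Dict[str, List[str]] = {n: [] for n in parent}
    root = next(n for n, p in parent.items() if p is None)
    for n, p in parent.items():
        if p is not None:
            children[p].append(n)
    order, stack = [], [root]  # pre-order: parents before children
    while stack:
        n = stack.pop()
        order.append(n)
        stack.extend(children[n])
    # Yannakakis preprocessing: bottom-up, then top-down semijoins.
    for n in reversed(order):
        if parent[n] is not None:
            rels[parent[n]] = semijoin(rels[parent[n]], rels[n])
    for n in order:
        if parent[n] is not None:
            rels[n] = semijoin(rels[n], rels[parent[n]])
    # Index every node of the free subtree by the variables shared with its parent.
    todo = [n for n in order if n in free_nodes]
    shared: Dict[str, List[str]] = {}
    index: Dict[str, Dict[tuple, List[tuple]]] = {}
    for n in todo:
        nvars, tuples = rels[n]
        pvars = rels[parent[n]][0] if parent[n] is not None else ()
        shared[n] = [v for v in nvars if v in pvars]
        index[n] = {}
        for t in tuples:
            key = tuple(t[nvars.index(v)] for v in shared[n])
            index[n].setdefault(key, []).append(t)

    def extend(i: int, mu: Dict[str, object]) -> Iterator[Dict[str, object]]:
        if i == len(todo):
            yield dict(mu)
            return
        n = todo[i]
        key = tuple(mu[v] for v in shared[n])
        for t in index[n].get(key, []):  # never empty after the semijoin passes
            yield from extend(i + 1, {**mu, **dict(zip(rels[n][0], t))})

    yield from extend(0, {})
```

As a usage example, take the hypothetical free-connex query $Q(x, y, w) \leftarrow R_1(x, y), R_2(y, w), R_3(w, z)$ with the path-shaped join tree $R_1, R_2, R_3$ and free subtree $\{R_1, R_2\}$. Since $R_3$ lies outside the free subtree, several $R_3$ tuples with the same $w$ do not cause duplicate answers.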

3. Upper Bounds via Union Extensions

In this section we generalize the notion of free-connexity to UCQs and show that such queries are in $\mathsf{DelayC_{lin}}$. We do so by introducing the concepts of union extensions and variable sets that a single CQ can provide to another CQ in the union in order to help evaluation. First, note that using known techniques (Strozecki, 2010, Proposition 2.38) a union of tractable CQs is also tractable.

Theorem 3.1.

Let $Q = Q_1 \cup \cdots \cup Q_n$ be a UCQ. If all CQs in $Q$ are free-connex, then $\mathrm{Enum}\langle Q \rangle \in \mathsf{DelayC_{lin}}$.

Proof.

The following is an algorithm to evaluate a union of two CQs $Q_1 \cup Q_2$. In case of a union of more CQs, we can use this recursively by treating the second query as $Q_2 \cup \cdots \cup Q_n$.

1: while $a \leftarrow Q_1.\mathrm{next}()$ do
2:     if $a \notin Q_2(I)$ then
3:         print $a$
4:     else
5:         print $Q_2.\mathrm{next}()$
6: while $b \leftarrow Q_2.\mathrm{next}()$ do
7:     print $b$

By the end of the run, the algorithm prints $Q_1(I) \setminus Q_2(I)$ over all iterations of line 3, and it prints $Q_2(I)$ in lines 5 and 7. Line 5 is called $|Q_1(I) \cap Q_2(I)| \leq |Q_2(I)|$ times, so the command $Q_2.\mathrm{next}()$ always succeeds there. Since free-connex CQs can be enumerated with constant delay and tested in constant time after a linear time preprocessing phase, this algorithm runs within the required time bounds. ∎
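The algorithm in this proof translates almost line by line into a Python generator. In this sketch the two enumerators and the constant-time membership test for $Q_2(I)$ are assumed to be supplied by the preprocessing of each CQ; here they are plain iterables and a callable, and the function name is hypothetical.

```python
from typing import Callable, Hashable, Iterable, Iterator


def enumerate_union(
    enum_q1: Iterable[Hashable],
    enum_q2: Iterable[Hashable],
    in_q2: Callable[[Hashable], bool],
) -> Iterator[Hashable]:
    """Enumerate Q1(I) ∪ Q2(I) without duplicates (lines 1-7 of the proof)."""
    it2 = iter(enum_q2)
    for a in enum_q1:        # line 1
        if not in_q2(a):     # line 2: assumed constant-time membership test
            yield a          # line 3: a belongs to Q1(I) \ Q2(I)
        else:
            # line 5: by the counting argument Q2 is not exhausted here; if it
            # were, Python would surface the StopIteration as a RuntimeError
            yield next(it2)
    yield from it2           # lines 6-7: the remaining answers of Q2
```

With $Q_1(I)$ enumerated as 1, 2, 3 and $Q_2(I)$ enumerated as 5, 3, 4, the shared answer 3 triggers line 5, which prints 5 in its place; the full output is 1, 2, 5, 3, 4, each answer exactly once.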

The technique presented in the proof of Theorem 3.1 has the advantage that it does not require more than constant memory available for writing in the enumeration phase. Alternatively, this theorem is a consequence of the following lemma, which gives us a general approach to compile several enumeration algorithms into one. This lemma is useful to show upper bounds for UCQs even in cases not covered by Theorem 3.1.

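Lemma 3.2.

Let $\mathrm{Enum}\langle R \rangle$ be an enumeration problem, let $k, m \in \mathbb{N}$ be constants, and let $A$ be an algorithm that outputs the solutions to $\mathrm{Enum}\langle R \rangle$ such that:

  • the delay of $A$ is bounded by $p$ at most $k$ times and bounded by $d$ otherwise;

  • every result is produced at most $m$ times.

Then, there exists an enumeration algorithm $A'$ for $\mathrm{Enum}\langle R \rangle$, with preprocessing time $O(p)$ and $O(d)$ delay.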

Proof.

$A'$ simulates $A$ and maintains a lookup table and a queue that are initialized as empty. When $A$ returns a result, $A'$ checks the lookup table to determine whether it was found before. If it was not, the result is added to both the lookup table and the queue. $A'$ first performs $kp$ computation steps, and then after every $md$ computation steps, it outputs a result from the queue. $A$ returns its $(im)$th result after at most $kp + imd$ computation steps. At this time, $A$ has produced at least $im$ results, which form at least $i$ unique results, so the queue is never empty when accessed. When it is done simulating $A$, $A'$ outputs all remaining results in the queue. $A'$ outputs all results of $A$ with no duplicates since every result enters the queue exactly once. ∎
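The simulation in this proof can be sketched in Python. We model a run of $A$ as a sequence of computation steps, where each item is either `None` (no output in that step) or a result; this encoding and the name `regularize` are assumptions made for illustration. The parameters `warmup` and `period` stand for $kp$ and $md$.

```python
from collections import deque
from typing import Hashable, Iterable, Iterator, Optional


def regularize(run: Iterable[Optional[Hashable]], warmup: int, period: int) -> Iterator[Hashable]:
    """Sketch of A': deduplicate the results of A and release them at a fixed pace."""
    seen = set()     # lookup table of results found so far
    queue = deque()  # results found but not yet printed
    for step, item in enumerate(run, start=1):
        if item is not None and item not in seen:
            seen.add(item)
            queue.append(item)
        # after `warmup` steps, print one queued result every `period` steps
        if step > warmup and (step - warmup) % period == 0:
            # should not fire when the assumptions of Lemma 3.2 hold
            assert queue, "assumptions of Lemma 3.2 violated"
            yield queue.popleft()
    while queue:     # A has terminated: flush the remaining results
        yield queue.popleft()
```

For example, a run with four silent steps followed by the results a, a, b, b, c, c fits $k = 1$, $p = 5$, $d = 1$, $m = 2$; with `warmup=5` and `period=2`, the sketch prints a at step 7, b at step 9, and c when flushing.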

A direct consequence of Lemma 3.2 is that to show that a problem is in $\mathsf{DelayC_{lin}}$, it suffices to find an algorithm for this problem where the delay is usually constant, but it may be linear a constant number of times, and the number of times every result is produced is bounded by a constant.

As Example 1.1 shows, Theorem 3.1 does not cover all tractable UCQs. We now address the other cases. We start with some definitions. We define body-homomorphisms between CQs to have the standard meaning of homomorphism, but without the restriction on the heads of the queries.

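Definition 3.3.

Let $Q_1, Q_2$ be CQs.

  • A body-homomorphism from $Q_2$ to $Q_1$ is a mapping $h: \mathrm{var}(Q_2) \to \mathrm{var}(Q_1)$ such that for every atom $R(\vec{v})$ of $Q_2$, $R(h(\vec{v})) \in \mathrm{atoms}(Q_1)$.

  • If $Q_1, Q_2$ are self-join free and there is a body-homomorphism $h$ from $Q_2$ to $Q_1$ and vice versa, we say that $Q_1$ and $Q_2$ are body-isomorphic, and $h$ is called a body-isomorphism.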
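Since the query size is fixed, body-homomorphisms can be found by brute force. The Python sketch below represents a CQ body as a list of (relation symbol, variable tuple) pairs, ignores the heads as in Definition 3.3, and tries every mapping of variables; the representation and function names are ours.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

Atom = Tuple[str, Tuple[str, ...]]  # (relation symbol, variables)


def body_homomorphism(src: List[Atom], dst: List[Atom]) -> Optional[Dict[str, str]]:
    """Return h: var(src) -> var(dst) with R(h(v)) in atoms(dst) for every atom R(v) of src, or None."""
    src_vars = sorted({v for _, vs in src for v in vs})
    dst_vars = sorted({v for _, vs in dst for v in vs})
    dst_atoms = set(dst)
    # exponential only in the (fixed) query size
    for image in product(dst_vars, repeat=len(src_vars)):
        h = dict(zip(src_vars, image))
        if all((rel, tuple(h[v] for v in vs)) in dst_atoms for rel, vs in src):
            return h
    return None


def body_isomorphic(q1: List[Atom], q2: List[Atom]) -> bool:
    """For self-join free CQs: body-homomorphisms exist in both directions."""
    return body_homomorphism(q1, q2) is not None and body_homomorphism(q2, q1) is not None
```

For instance, the body $R_1(a, b), R_2(c, d)$ maps into $R_1(x, z), R_2(z, y)$ by sending both $b$ and $c$ to $z$, but there is no body-homomorphism in the opposite direction.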

We now formalize the way that one CQ can help another CQ in the union during evaluation by providing some variables.

Definition 3.4.

Let $Q_1, Q_2$ be CQs. We say that $Q_2$ provides a set of variables $V_1 \subseteq \mathrm{var}(Q_1)$ to $Q_1$ if:

  1. There is a body-homomorphism $h$ from $Q_2$ to $Q_1$.

  2. There is such that .

  3. There is such that is -connex.

The following lemma shows why provided variables play an important role for UCQ enumeration: If $Q_2$ provides a set of variables $V_1$ to $Q_1$, then we can produce an auxiliary relation for $Q_1$ containing all possible value combinations of these variables. This can be done efficiently while producing some answers to $Q_2$.

Lemma 3.5.

Let be CQs such that provides to . Given an instance , one can compute with linear time preprocessing and constant delay a set of mappings , which can be translated to in time .

Proof.

Let be a body-homomorphism, and let and be sets of variables meeting the conditions of Definition 3.4. Take an ext--connex tree for , and perform the CDY algorithm on while treating as the free-variables. This results in a set of mappings from the variables of to the domain such that .

For every mapping , extend it once to obtain a mapping from all variables of as follows. Go over all vertices of starting from the connected part containing and treating a neighbor of an already treated vertex at every step. Consider a step where in its beginning is a homomorphism from a set , and we are treating an atom where and . We take some tuple in of the form and extend to also map . Such a tuple exists since the CDY algorithm has a preprocessing step that removes dangling tuples. This extension takes constant time, and in its end we have that . These extensions form . When computing , the delay for the first element may be linear due to the preprocessing phase of the CDY algorithm, but the delay after that is constant.

We now describe how can be translated to . As , we simply need to use the body-homomorphism in the opposite direction. For every variable , define to be . Given a mapping , if is the same for all , denote it by . Otherwise, is undefined, and in the following is skipped. Since is a body-homomorphism, we have that . Given , we can compute (or determine that it is undefined) in constant time. Doing this for every , we can compute in time . ∎

During evaluation, a set of variables provided to a CQ can be used as an auxiliary relation, accessible by an auxiliary atom. The CQ along with its auxiliary atoms is called a union extension.

Definition 3.6.

Let $Q$ be a UCQ.

  • A union extension of is

    where , each with is provided by some , and are fresh relational symbols. By way of recursion, the variables may alternatively be provided by a union extension of some .

  • Atoms appearing in but not in are called virtual atoms.

Union extension can transform an intractable query to a free-connex one.

Definition 3.7.

Let $Q$ be a UCQ.

  • $Q_i \in Q$ is said to be union-free-connex with respect to $Q$ if it has a free-connex union extension.

  • $Q$ is free-connex if all CQs in $Q$ are union-free-connex.

Note that the term free-connex for UCQs is a generalization of that for CQs: If a UCQ $Q$ contains only one CQ, then $Q$ is free-connex iff the CQ it contains is free-connex. We next show that tractability of free-connex queries also carries over to UCQs.

Theorem 3.8.

Let $Q$ be a UCQ. If $Q$ is free-connex, then $\mathrm{Enum}\langle Q \rangle \in \mathsf{DelayC_{lin}}$.

Proof.

For each query in the union (in an order imposed by the recursive definition of union extensions), we first instantiate its free-connex union extension, and then evaluate the resulting free-connex CQ using the CDY algorithm: For every virtual atom containing some variables $V$, use Lemma 3.5 to generate a subset of the answers of the CQ providing $V$ while obtaining a relation assigned to this atom. After instantiating all virtual relations, we have an instance for the union extension, and we can evaluate it as usual using the CDY algorithm. The union extension has the same answers as the original CQ, since all virtual atoms are assigned relations that contain merely a projection of the results.

Overall, there is a constant number of times where the delay is linear: once per query and once per virtual atom. Similarly, every result is produced at most a constant number of times: once per query and once per virtual atom. According to Lemma 3.2 this means that $\mathrm{Enum}\langle Q \rangle \in \mathsf{DelayC_{lin}}$. ∎

We can now revisit Example 1.1 and explain its tractability using the terminology and results from this section. There is a body-homomorphism $h$ from $Q_2$ to $Q_1$ with . The query $Q_2$ provides to $Q_1$, as , and $Q_2$ is -connex. Thus, we can add to $Q_1$, and the union extension is free-connex (see Figure 2). Since every query in $Q$ is union-free-connex, we have that $\mathrm{Enum}\langle Q \rangle \in \mathsf{DelayC_{lin}}$ by Theorem 3.8.

Figure 2. -connex tree for and -connex tree for .
Remark 1.

Example 1.1 is a counterexample to a previously made claim (Berkholz et al., 2018, Theorem 4.2b). The claim is that if a UCQ contains an intractable CQ and does not contain redundant CQs (a CQ contained in another CQ in the union), then the union is intractable. In contrast, none of the CQs in Example 1.1 is redundant, $Q_1$ is intractable, and yet the UCQ is tractable.

The intuition behind the proof of the past claim is reducing the hard CQ to the UCQ. This can be done by assigning each variable of the hard CQ with a different and disjoint domain (e.g., by concatenating the variable names to the values in the relations corresponding to the atoms), and leaving the relations that do not appear in the atoms of the hard CQ empty. It is well known that iff there exists a homomorphism from to . The claim is that since there is no homomorphism from another CQ in the union to the hard CQ, there are no answers to the other CQs with this reduction. However, it is possible that there is a body-homomorphism from another CQ to the hard CQ even if it is not a full homomorphism (the free variables do not map to each other). Therefore, in cases of a body-homomorphism, the reduction from the hard CQ to the UCQ does not work. In such cases, the union may be tractable, as we show in Theorem 3.8. In Lemma 4, we use the same proof described here, but restrict it to UCQs where there is no body-homomorphism from other CQs to the hard CQ. ∎

The tractability result in Theorem 3.8 is based on the structure of the union-extended queries. This means that the intractability of any query within a UCQ can be resolved as long as another query can provide the right variables. The following example shows that this can even be the case for a UCQ only consisting of non-free-connex CQs. Moreover, the example illustrates why we need the definition of union extensions to be recursive.

Example 3.9.

Let $Q = Q_1 \cup Q_2 \cup Q_3$ with

Each of the three CQs is intractable on its own: has the free-path , while has the free-path , and has the free-path . The CQ provides the variables to , as it is -connex, , and there is a body-homomorphism from to with . Extending the body of by the virtual atom yields the free-connex union extension . Similarly, we have that provides to , and extending by yields the free-connex union extension . Since and provide and respectively to , we obtain a free-connex union extension by adding virtual atoms with the variables and to . Thus, $Q$ is free-connex and can be enumerated efficiently by Theorem 3.8. ∎

Remark 2.

The approach presented here can also be used when there are functional dependencies in the schema. By taking functional dependencies into account, we can find even more tractable cases. If there are functional dependencies, some intractable CQs have a tractable FD-extension that can be computed efficiently (Carmeli and Kröll, 2018). Given a UCQ over a schema with functional dependencies, we can first take the FD-extensions of all CQs in the union, and then take the union extensions of those and evaluate the union. ∎

4. Lower Bounds

In this section, we prove lower bounds for evaluating UCQs within the time bounds of $\mathsf{DelayC_{lin}}$. We begin with some general observations regarding cases where a single CQ is not harder than a union containing it, and then continue to handle other cases. In Section 4.1 we discuss unions containing only intractable CQs, and in Section 4.2 we discuss unions containing two body-isomorphic CQs. In both cases such UCQs may be tractable, and in case of such a union of size two, we show that our results from Section 3 capture all tractable unions.

In order to provide some intuition for the choices we make throughout this section, we first explain where the approach used for proving the hardness of single CQs fails. Consider Example 1.1. The original proof that shows that $Q_1$ is hard describes a reduction from Boolean matrix multiplication (Bagan et al., 2007, Lemma 26). Let $A$ and $B$ be binary representations of Boolean $n \times n$ matrices, i.e., $(a, b) \in A$ corresponds to a 1 in the first matrix at index $(a, b)$. Define a database instance as , , and . One can show that corresponds to the answers of $Q_1$. If $\mathrm{Enum}\langle Q_1 \rangle \in \mathsf{DelayC_{lin}}$, we can solve matrix multiplication in time $O(n^2)$, in contradiction to mat-mul. Since $Q_2$ evaluates over the same relations, $Q_2$ also produces answers over this construction. Since the number of results for $Q_2$ might reach up to , evaluating $Q$ in constant delay does not necessarily compute the answers to $Q_1$ in $O(n^2)$ time, and does not contradict the complexity assumption. In general, whenever we show a lower bound to a UCQ by computing a hard problem through answering one CQ in the union, we need to ensure that the other CQs cannot have too many answers over this construction.
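The counting issue can be made concrete with the query $\Pi(x, y) \leftarrow A(x, z), B(z, y)$ from the mat-mul hypothesis. This is an illustration of the general phenomenon, not the specific construction for $Q_1$ from (Bagan et al., 2007), and the names and data below are ours. The answers of $\Pi$ are exactly the 1-entries of the Boolean product, at most $n^2$ of them, while the underlying join of $A(x, z)$ and $B(z, y)$ can contain $n^3$ tuples, one per witness $z$; a procedure that touches every witness therefore cannot stay within $O(n^2)$.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[int, int]


def to_relation(matrix: List[List[int]]) -> Set[Pair]:
    """Binary representation: (a, b) is in the relation iff matrix[a][b] == 1."""
    return {(a, b) for a, row in enumerate(matrix) for b, bit in enumerate(row) if bit}


def pi_answers(A: Set[Pair], B: Set[Pair]) -> Set[Pair]:
    """Answers of Pi(x, y) <- A(x, z), B(z, y): the 1-entries of the Boolean product."""
    succ: Dict[int, List[int]] = {}
    for z, y in B:
        succ.setdefault(z, []).append(y)
    # the same (x, y) may be found via many z; the set removes these duplicates
    return {(x, y) for x, z in A for y in succ.get(z, [])}


def join_size(A: Set[Pair], B: Set[Pair]) -> int:
    """Number of tuples (x, z, y) in the join of A(x, z) and B(z, y)."""
    out_degree: Dict[int, int] = {}
    for z, _ in B:
        out_degree[z] = out_degree.get(z, 0) + 1
    return sum(out_degree.get(z, 0) for _, z in A)
```

For two all-ones $3 \times 3$ matrices, `pi_answers` returns 9 pairs while `join_size` reports 27 join tuples.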

As a first observation, we describe cases where there is a way to encode any arbitrary instance of to an instance of , such that no other CQ in the union returns results.

Let $Q = Q_1 \cup \cdots \cup Q_n$ be a UCQ of self-join free CQs, and let $Q_i \in Q$ such that for all $j \neq i$ there is no body-homomorphism from $Q_j$ to $Q_i$. Then $\mathrm{Enum}\langle Q_i \rangle \leq_e \mathrm{Enum}\langle Q \rangle$. Given an instance of $\mathrm{Enum}\langle Q_i \rangle$, we assign each variable of $Q_i$ with a different and disjoint domain by concatenating the variable names to the values in their corresponding relations. We leave the relations that do not appear in the atoms of $Q_i$ empty. Since there is no body-homomorphism from $Q_j$ to $Q_i$, there are now no answers to $Q_j$ over this construction, and the answers to $Q$ are exactly those of $Q_i$.

Proof.

Let . Given an instance of , the construction of assigns every variable of with a different domain. More formally, for every and every tuple we have the tuple in . The relations that do not appear in are left empty. We claim that the results of over the original instance are exactly those of over our construction if we omit the variable names. That is, we define as , and show that .

We first prove that . That is, we show that the results obtained due to the evaluation of in both cases are the same. If , then for every atom in , . By construction, . By defining as , we have . Since , we have that , and this concludes that . The opposite direction is trivial: if , then for every atom in , . By construction, , and therefore .

We now know that . It is left to show that . Assume by contradiction that there exists such that . Since , there exists some such that . Since and , we know that , and therefore . Define as . Since , we know that for every atom in , . By construction, if then is an atom in . Consider . For every atom in , is an atom in . This means that there is a body-homomorphism from to , and achieves a contradiction. ∎
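The encoding used in this proof can be sketched in Python. Tagging each value with its variable name (here as a pair `(variable, value)`), leaving unused relations empty, and stripping the tags again in $\tau$ follow the description above; the naive backtracking evaluator, the data representation, and all names are our own illustrative assumptions.

```python
from typing import Dict, List, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]  # (relation symbol, variables)
Instance = Dict[str, Set[tuple]]


def tag_instance(q_atoms: List[Atom], inst: Instance, schema: List[str]) -> Instance:
    """sigma: tag each value with the variable of its column; unused relations stay empty.
    Assumes q_atoms is self-join free, so every symbol has a single variable tuple."""
    tagged: Instance = {rel: set() for rel in schema}
    for rel, vs in q_atoms:
        tagged[rel] = {tuple(zip(vs, t)) for t in inst.get(rel, set())}
    return tagged


def untag(answer: tuple) -> tuple:
    """tau: drop the variable names again."""
    return tuple(value for _, value in answer)


def evaluate(q_atoms: List[Atom], head: Tuple[str, ...], inst: Instance) -> Set[tuple]:
    """Naive backtracking evaluation of a CQ, for illustration only."""
    answers: Set[tuple] = set()

    def search(i: int, mu: Dict[str, object]) -> None:
        if i == len(q_atoms):
            answers.add(tuple(mu[v] for v in head))
            return
        rel, vs = q_atoms[i]
        for t in inst.get(rel, set()):
            ext = dict(mu)
            if all(ext.setdefault(v, val) == val for v, val in zip(vs, t)):
                search(i + 1, ext)

    search(0, {})
    return answers
```

In the usage example below (in the test), a query using a relation absent from the encoded query gets no answers, whereas a query whose body maps into the encoded query's body, such as $Q_k(x, y) \leftarrow R(x, y), S(y, w)$ for the encoded $Q_i(x, y) \leftarrow R(x, z), S(z, y)$, does return an answer, and its untagged form is not an answer of $Q_i$. This is exactly the situation discussed in Remark 1.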

Lemma 4 implies that if there is an intractable CQ in a union where no other CQ maps to it via a body-homomorphism, then the entire union is intractable. This also captures cases such as a union of CQs where one of them is hard, and the others contain a relation that does not appear in the first.

Using the same reduction, a similar statement with relaxed requirements can be made in case it is sufficient to consider the decision problem.

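Let $Q = Q_1 \cup \cdots \cup Q_n$ be a UCQ of self-join free CQs, and let $Q_i \in Q$ such that for all $j \neq i$ either there is no body-homomorphism from $Q_j$ to $Q_i$ or $Q_i$ and $Q_j$ are body-isomorphic. Then, $\mathrm{Decide}\langle Q_i \rangle \leq \mathrm{Decide}\langle Q \rangle$ via a linear-time many-one reduction. We use the same encoding as in Lemma 4. As before, a CQ with no body-homomorphism to $Q_i$ has no answers. A CQ which is body-isomorphic to $Q_i$ has an answer iff $Q_i$ has an answer. Therefore, $Q_i$ has an answer over the original instance iff $Q$ has an answer over the construction.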

Proof.

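We still need to prove the claim that for body-isomorphic CQs $Q_1$ and $Q_2$ and a database $I$, $Q_1(I) \neq \emptyset$ iff $Q_2(I) \neq \emptyset$. First assume that $Q_1(I) \neq \emptyset$, let $\mu$ be a homomorphism from $Q_1$ into $I$, and let $h$ be a body-homomorphism from $Q_2$ to $Q_1$. For the homomorphism $\mu' = \mu \circ h$, we have that $\mu'|_{\mathrm{free}(Q_2)} \in Q_2(I)$: For every atom $R(\vec{v}) \in \mathrm{atoms}(Q_2)$, we have $R(h(\vec{v})) \in \mathrm{atoms}(Q_1)$ and thus $R(\mu(h(\vec{v}))) \in I$. The other direction can be proven analogously. ∎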

Theorem 2.1 states that deciding whether a cyclic CQ has any answers cannot be done in linear time (assuming hyperclique). Following Lemma 4, if a UCQ $Q$ contains a cyclic CQ $Q_i$ for which the conditions of Lemma 4 are satisfied, then $\mathrm{Decide}\langle Q \rangle$ cannot be solved in linear time either, and thus $\mathrm{Enum}\langle Q \rangle \notin \mathsf{DelayC_{lin}}$.

4.1. Unions of Intractable CQs

We now discuss unions containing only CQs classified as hard according to Theorem 2.1. In the following, intractable CQs refers to self-join-free CQs that are not free-connex. The following lemma can be used to identify a CQ on which we can apply Lemma 4 or Lemma 4.

Let $Q = Q_1 \cup \cdots \cup Q_n$ be a UCQ. There exists a query $Q_i \in Q$ such that for all $Q_j \in Q$ either there is no body-homomorphism from $Q_j$ to $Q_i$ or $Q_i$ and $Q_j$ are body-isomorphic. Consider a longest sequence $Q_{a_1}, \ldots, Q_{a_k}$ such that for every $1 \leq t < k$ there is a body-homomorphism from $Q_{a_{t+1}}$ to $Q_{a_t}$, but no body-homomorphism in the opposite direction. The CQ $Q_{a_k}$ satisfies the conditions of the lemma: For every $Q_j$ not on the sequence, if there is a body-homomorphism from $Q_j$ to $Q_{a_k}$, then there is also one in the opposite direction due to the maximality of the sequence; For every $Q_{a_t}$ on the sequence, there is a body-homomorphism from $Q_{a_k}$ to $Q_{a_t}$, so either there is no body-homomorphism in the opposite direction, or the CQs are body-isomorphic.
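The argument is constructive. Given a table of which body-homomorphisms exist (computed, for example, by brute force), the Python sketch below repeatedly moves to a query that maps into the current one by a body-homomorphism without a body-homomorphism back, which is how the sequence above is extended, and stops at a query with no such predecessor. The boolean-matrix input and the function name are our own assumptions; we also assume the table is reflexive and transitive, which holds because body-homomorphisms compose.

```python
from typing import List


def pick_query(hom: List[List[bool]]) -> int:
    """hom[j][i] is True iff there is a body-homomorphism from Q_j to Q_i.

    Returns an index i such that every Q_j with a body-homomorphism to Q_i
    also receives one from Q_i. Assumes hom is reflexive and transitive.
    """
    n = len(hom)
    current = 0
    for _ in range(n):  # the sequence contains at most n distinct queries
        strict = [j for j in range(n) if hom[j][current] and not hom[current][j]]
        if not strict:
            return current
        current = strict[0]  # append Q_j to the sequence
    raise ValueError("hom is not transitive")
```

For two queries where only $Q_1$ maps into $Q_0$, the sketch moves from $Q_0$ to $Q_1$ and returns index 1.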

Proof.

Consider a longest sequence $Q_{a_1}, \ldots, Q_{a_k}$ such that for every $1 \leq t < k$ there is a body-homomorphism from $Q_{a_{t+1}}$ to $Q_{a_t}$ denoted $h_t$, but no body-homomorphism in the opposite direction. Note that it is not possible that the same query appears twice in the sequence: if $Q_{a_s} = Q_{a_t}$ where $s < t$, then composing $h_{t-1}, \ldots, h_{s+1}$ yields a mapping from $Q_{a_t} = Q_{a_s}$ to $Q_{a_{s+1}}$, in contradiction to the definition of the sequence. Therefore, $k \leq n$, and such a longest sequence exists. We claim that $Q_{a_k}$ satisfies the conditions of the lemma. First consider some