1 Introduction
The Resource Description Framework (RDF) [20] is the W3C standard for representing linked data on the Web. In this model, data is represented as RDF graphs, which consist of collections of triples of internationalised resource identifiers (IRIs). Intuitively, such a triple $(s, p, o)$ represents the fact that a subject $s$ is connected to an object $o$ via a predicate $p$.
SPARQL [26] is the standard query language for RDF graphs. In a seminal paper, Pérez et al. [23] (see also [22]) gave a clean formalisation of the language, which laid the foundations for its theoretical study. Since then, much work has been done on different aspects of the language, such as query evaluation [19, 3, 15, 4, 16], optimisation [17, 24, 14], and expressive power [2, 25, 15, 30, 11], to name a few.
As shown in [23], it is PSPACE-complete to evaluate SPARQL queries. This motivated the introduction of a natural fragment of SPARQL called the well-designed fragment, whose evaluation problem is coNP-complete [23]. More formally, the evaluation problem wdEVAL for well-designed SPARQL is to decide, given a well-designed query $P$, an RDF graph $G$ and a mapping $\mu$, whether $\mu$ belongs to the answer of $P$ over $G$. By now the well-designed fragment is central in the study of SPARQL, and the theory community has devoted considerable effort to understanding fundamental aspects of this fragment (see e.g. [23, 17, 24, 4, 14, 11, 16, 15]). In this paper, we focus on the core fragment of well-designed SPARQL restricted to the AND, OPTIONAL and UNION operators, as defined in [23].
Despite its importance, several basic questions remain open for well-designed SPARQL. As first observed in [17], while the problem wdEVAL is coNP-complete, it becomes tractable, i.e. polynomial-time solvable, for restricted classes of well-designed queries. Indeed, it was shown that wdEVAL is in PTIME for every class of queries satisfying a certain local tractability condition [17]. We emphasise that the above-mentioned result is only briefly discussed in [17], as the focus of the authors is on the static analysis and optimisation of queries rather than the complexity of evaluation. Subsequent works [4, 16] have studied the complexity of evaluation in more depth, but the focus has mainly been on the fragment of SPARQL including the SELECT operator (i.e., projection). In particular, the following fundamental question regarding the core well-designed fragment remains open: which classes of well-designed SPARQL can be evaluated in polynomial time?
Our main contribution is a complete answer to the question posed above. In particular, we introduce a new width measure for well-designed queries called domination width, which is based on the well-known notion of treewidth (see Section 3 for precise definitions). For a class $\mathcal{C}$ of well-designed queries, let us denote by wdEVAL($\mathcal{C}$) the evaluation problem wdEVAL restricted to the class $\mathcal{C}$. Also, we say that a class $\mathcal{C}$ of well-designed queries has bounded domination width if there is a universal constant $k$ such that the domination width of every query in $\mathcal{C}$ is at most $k$. Then, our main technical result is as follows (Theorem 3). Assume that FPT $\neq$ W[1]. Then, for every recursively enumerable class $\mathcal{C}$ of well-designed queries, the problem wdEVAL($\mathcal{C}$) is in PTIME if and only if $\mathcal{C}$ has bounded domination width. The assumption FPT $\neq$ W[1] is widely believed in parameterised complexity (see Section 4 for precise definitions). As we observe in Section 3, one can remove the assumption of $\mathcal{C}$ being recursively enumerable by adopting a stronger assumption than FPT $\neq$ W[1] that involves non-uniform complexity classes.
Our result builds on the classical result by Dalmau et al. [6] and Grohe [9] showing that a recursively enumerable class $\mathcal{C}$ of conjunctive queries (CQs) over schemas of bounded arity is tractable if and only if the cores of the CQs in $\mathcal{C}$ have bounded treewidth. (Recall that a CQ is a first-order query using only conjunctions and existential quantification.)
For the tractability part of our result, we exploit, as in [6], the so-called existential pebble game introduced in [12] (see also [6]). This game provides a polynomial-time relaxation for the problem of checking the existence of homomorphisms, which is a well-known NP-complete problem (see e.g. [5]). Using the existential pebble game, we define a natural relaxation of the standard algorithm from [17] (see also [24]) for evaluating well-designed queries. Then we show that this relaxation correctly solves instances of bounded domination width (Theorem 1).
For the hardness part, we follow a strategy similar to that of [9]. The two main ingredients in our proof are an adaptation of the main construction of [9] to handle distinguished elements or constants (Lemma 2) and an elementary property of well-designed queries of large domination width (Lemma 3).
Finally, we emphasise that our classes of bounded domination width significantly extend the classes that are locally tractable [17], which, as we mentioned above, are the most general tractable restrictions known so far. This is even true in the case of UNION-free well-designed queries. As we discuss in Section 3.2, the notion of domination width for UNION-free queries can be simplified and coincides with a width measure called branch treewidth. Bounding this simpler width measure still strictly generalises local tractability.
2 Preliminaries
RDF Graphs. Let $\mathbf{I}$ be a countably infinite set of IRIs. An RDF triple is a tuple in $\mathbf{I} \times \mathbf{I} \times \mathbf{I}$ and an RDF graph is a finite set of RDF triples. In this paper, we assume that no blank nodes appear in RDF graphs, i.e., we focus on ground RDF graphs.
SPARQL Syntax. SPARQL [26] is the standard query language for RDF. We rely on the formalisation proposed in [23]. We focus on the core fragment of the language given by the operators AND, OPTIONAL (OPT for short), and UNION. (Footnote 1: Additional operators include FILTER and SELECT. We briefly discuss these operators in Section 5.) Let $\mathbf{V}$ be a countably infinite set of variables, disjoint from $\mathbf{I}$. A SPARQL triple pattern (or triple pattern for short) is a tuple in $(\mathbf{I} \cup \mathbf{V}) \times (\mathbf{I} \cup \mathbf{V}) \times (\mathbf{I} \cup \mathbf{V})$. The set of variables from $\mathbf{V}$ appearing in a triple pattern $t$ is denoted by $\text{vars}(t)$. Note that an RDF triple is simply a SPARQL triple pattern $t$ with $\text{vars}(t) = \emptyset$. A SPARQL graph pattern (or graph pattern for short) is recursively defined as follows:

a triple pattern is a graph pattern, and

if $P_1$ and $P_2$ are graph patterns, then $(P_1 \ast P_2)$ is also a graph pattern, for $\ast \in \{\text{AND}, \text{OPT}, \text{UNION}\}$.
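SPARQL Semantics. In order to define the semantics of graph patterns, we again follow the presentation in [23]. A mapping $\mu$ is a partial function from $\mathbf{V}$ to $\mathbf{I}$. We denote by $\text{dom}(\mu)$ the domain of the mapping $\mu$. Two mappings $\mu_1$ and $\mu_2$ are compatible if $\mu_1(?x) = \mu_2(?x)$, for all $?x \in \text{dom}(\mu_1) \cap \text{dom}(\mu_2)$. If $\mu_1$ and $\mu_2$ are compatible mappings then $\mu_1 \cup \mu_2$ denotes the mapping with domain $\text{dom}(\mu_1) \cup \text{dom}(\mu_2)$ such that $(\mu_1 \cup \mu_2)(?x) = \mu_1(?x)$, for all $?x \in \text{dom}(\mu_1)$, and $(\mu_1 \cup \mu_2)(?x) = \mu_2(?x)$, for all $?x \in \text{dom}(\mu_2)$. For a triple pattern $t$ and a mapping $\mu$ such that $\text{vars}(t) \subseteq \text{dom}(\mu)$, we denote by $\mu(t)$ the RDF triple obtained from $t$ by replacing each $?x \in \text{vars}(t)$ by $\mu(?x)$.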
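For an RDF graph $G$ and a graph pattern $P$, the evaluation of $P$ over $G$ is a set of mappings $\llbracket P \rrbracket_G$ defined recursively as follows:

$\llbracket t \rrbracket_G = \{\mu \mid \text{dom}(\mu) = \text{vars}(t) \text{ and } \mu(t) \in G\}$, if $t$ is a triple pattern.

$\llbracket (P_1 \text{ AND } P_2) \rrbracket_G = \{\mu_1 \cup \mu_2 \mid \mu_1 \in \llbracket P_1 \rrbracket_G$, $\mu_2 \in \llbracket P_2 \rrbracket_G$, and $\mu_1$ and $\mu_2$ are compatible$\}$.

$\llbracket (P_1 \text{ OPT } P_2) \rrbracket_G = \llbracket (P_1 \text{ AND } P_2) \rrbracket_G \cup \{\mu_1 \in \llbracket P_1 \rrbracket_G \mid$ there is no $\mu_2 \in \llbracket P_2 \rrbracket_G$ compatible with $\mu_1\}$.

$\llbracket (P_1 \text{ UNION } P_2) \rrbracket_G = \llbracket P_1 \rrbracket_G \cup \llbracket P_2 \rrbracket_G$.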
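The recursive semantics above can be illustrated with a short sketch. The encoding is an assumption made only for this illustration: terms are strings, variables start with `?`, a triple pattern is a 3-tuple of strings, and a compound pattern is a tuple `(op, P1, P2)`. The function names are hypothetical and the evaluation is deliberately naive.

```python
def is_var(term):
    """Illustrative convention: variables are strings starting with '?'."""
    return term.startswith("?")

def compatible(m1, m2):
    """Two mappings are compatible if they agree on shared variables."""
    return all(m2[x] == v for x, v in m1.items() if x in m2)

def dedupe(mappings):
    """Keep one copy of each mapping, since the evaluation is a set."""
    seen, out = set(), []
    for m in mappings:
        key = frozenset(m.items())
        if key not in seen:
            seen.add(key)
            out.append(m)
    return out

def evaluate(pattern, graph):
    """Return the evaluation of a graph pattern over a graph as a list of dicts."""
    if isinstance(pattern[1], str):  # a triple pattern
        result = []
        for triple in graph:
            mu = {}
            if all((mu.setdefault(p, g) == g) if is_var(p) else p == g
                   for p, g in zip(pattern, triple)):
                result.append(mu)
        return dedupe(result)
    op, p1, p2 = pattern
    left, right = evaluate(p1, graph), evaluate(p2, graph)
    if op == "UNION":
        return dedupe(left + right)
    joined = [{**m1, **m2} for m1 in left for m2 in right if compatible(m1, m2)]
    if op == "AND":
        return dedupe(joined)
    if op == "OPT":
        # Keep left mappings that no right mapping can extend.
        unmatched = [m1 for m1 in left
                     if not any(compatible(m1, m2) for m2 in right)]
        return dedupe(joined + unmatched)
    raise ValueError("unknown operator: %r" % (op,))
```

For instance, over the graph `{("a","p","b"), ("b","q","c"), ("d","p","e")}`, the pattern `("OPT", ("?x","p","?y"), ("?y","q","?z"))` yields one mapping that binds all three variables and one that binds only `?x` and `?y`.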
Well-designed SPARQL. A central class of SPARQL graph patterns identified in [23], and also the focus of this paper, is the class of well-designed graph patterns. We say that a graph pattern is UNION-free if it only uses the operators AND and OPT. A UNION-free graph pattern $P$ is well-designed if for every subpattern $P' = (P_1 \text{ OPT } P_2)$ of $P$, it is the case that every variable occurring in $P_2$ but not in $P_1$ does not occur outside $P'$ in $P$. A SPARQL graph pattern $P$ is well-designed if it is of the form $P = P_1 \text{ UNION } \cdots \text{ UNION } P_m$, where each $P_i$ is a UNION-free well-designed graph pattern. (Footnote 2: This top-level use of the UNION operator is known as UNION-normal form [23]. Note that we are implicitly using the fact that UNION is associative.)
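The well-designedness condition for UNION-free patterns can be checked syntactically. The following is a minimal sketch under an assumed encoding (variables are strings starting with `?`, triple patterns are 3-tuples of strings, compound patterns are tuples `(op, P1, P2)`); the function names are hypothetical. A variable occurs outside a subpattern exactly when it has more occurrences in the whole pattern than in that subpattern.

```python
from collections import Counter

def occurrences(pattern):
    """Count variable occurrences over all triple patterns of a pattern."""
    if isinstance(pattern[1], str):  # a triple pattern
        return Counter(term for term in pattern if term.startswith("?"))
    _, left, right = pattern
    return occurrences(left) + occurrences(right)

def is_well_designed(pattern):
    """Check the well-designedness condition for a UNION-free pattern."""
    total = occurrences(pattern)

    def check(sub):
        if isinstance(sub[1], str):
            return True
        op, left, right = sub
        if op == "OPT":
            inside = occurrences(sub)
            left_vars = set(occurrences(left))
            for v in set(occurrences(right)) - left_vars:
                # v occurs in the optional part only; it must not occur outside sub.
                if total[v] > inside[v]:
                    return False
        return check(left) and check(right)

    return check(pattern)
```

With this encoding, `("OPT", ("?x","p","?y"), ("?y","q","?z"))` passes the check, while wrapping it as `("AND", <that pattern>, ("?z","r","?w"))` fails, because `?z` appears only in the optional side and again outside the OPT subpattern.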
Example 1
Consider the following graph patterns:
Note that is welldesigned, while is not. Indeed, in the subpattern of , the variable appears in and not in but does occur outside in .
Well-designed patterns have good properties in terms of query evaluation. More precisely, let wdEVAL be the problem of deciding, given a well-designed graph pattern $P$, an RDF graph $G$ and a mapping $\mu$, whether $\mu \in \llbracket P \rrbracket_G$. It was shown in [23] that wdEVAL is coNP-complete, while the problem is PSPACE-complete for arbitrary SPARQL graph patterns.
2.1 Pattern trees and pattern forests
Besides alleviating the cost of evaluation, another key property of UNION-free well-designed graph patterns is that they can be written in the so-called OPT-normal form [23]. In turn, patterns in OPT-normal form admit a natural tree representation, known as pattern trees [17]. Intuitively, a pattern tree is a rooted tree where each node represents a well-designed pattern using only AND operators, while its tree structure represents the nesting of OPT operators. Consequently, a well-designed graph pattern $P = P_1 \text{ UNION } \cdots \text{ UNION } P_m$ can be represented as a pattern forest [24], i.e., a set of pattern trees $\{\mathcal{T}_1, \dots, \mathcal{T}_m\}$, where $\mathcal{T}_i$ is the pattern tree representation of $P_i$. (Footnote 3: In this paper, we work with a particular type of pattern trees/forests, namely well-designed pattern trees/forests. For simplicity, we sometimes abuse notation and use the terms pattern trees/forests and well-designed pattern trees/forests interchangeably.) Pattern trees/forests are useful for understanding how to evaluate and optimise well-designed patterns, and have been used extensively as a basic tool in the study of well-designed SPARQL (see e.g. [17, 24, 4, 14, 11, 16]). As we show in this work, pattern forests are also fundamental to understanding tractable evaluation of well-designed SPARQL: by imposing restrictions on the pattern forest representation, we can identify and characterise the tractable classes of well-designed graph patterns.
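T-graphs and homomorphisms. A triple pattern graph (or t-graph for short) is a finite set of triple patterns. We denote by $\text{vars}(S)$ the set of variables from $\mathbf{V}$ appearing in the t-graph $S$. Note that an RDF graph is simply a t-graph $S$ with $\text{vars}(S) = \emptyset$. Let $t$ be a triple pattern and $h$ be a partial function from $\mathbf{V}$ to $\mathbf{I} \cup \mathbf{V}$ such that $\text{vars}(t) \subseteq \text{dom}(h)$. We define $h(t)$ to be the triple pattern obtained from $t$ by replacing each $?x \in \text{vars}(t)$ by $h(?x)$. For two t-graphs $S$ and $S'$, we say that a partial function $h$ from $\mathbf{V}$ to $\mathbf{I} \cup \mathbf{V}$ is a homomorphism from $S$ to $S'$ if $\text{dom}(h) = \text{vars}(S)$ and for every $t \in S$, it is the case that $h(t) \in S'$.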
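To make the notion of homomorphism concrete, here is a brute-force backtracking sketch. It assumes, for illustration only, that terms are strings and variables start with `?`; the function name `find_homomorphism` and the optional `fixed` argument (used to pin some variables in advance) are hypothetical. The search is exponential in the worst case, which is consistent with the NP-completeness of the problem.

```python
def is_var(term):
    """Illustrative convention: variables are strings starting with '?'."""
    return term.startswith("?")

def find_homomorphism(source, target, fixed=None):
    """Return a homomorphism from t-graph `source` to t-graph `target`
    as a dict extending `fixed`, or None if there is none."""
    target = set(target)
    patterns = list(source)

    def extend(i, h):
        if i == len(patterns):
            return h
        for cand in target:
            h2 = dict(h)
            # Variables of the source must be mapped consistently;
            # IRIs of the source must be matched exactly.
            if all((h2.setdefault(p, c) == c) if is_var(p) else p == c
                   for p, c in zip(patterns[i], cand)):
                res = extend(i + 1, h2)
                if res is not None:
                    return res
        return None

    return extend(0, dict(fixed or {}))
```

Only variables of the source are treated as variables; terms of the target, including its variables, are matched as constants, as in the definition above.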
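Basics of pattern trees and forests. For an undirected graph $H$, we denote by $V(H)$ its set of nodes. A well-designed pattern tree (or wdPT for short) is a triple $\mathcal{T} = (T, r, \lambda)$ such that

$T$ is a tree rooted at a node $r \in V(T)$,

$\lambda$ is a function that maps each node $n \in V(T)$ to a t-graph, and

the set $\{n \in V(T) \mid ?x \in \text{vars}(\lambda(n))\}$ induces a connected subgraph of $T$, for every $?x \in \mathbf{V}$.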
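Let $\mathcal{T} = (T, r, \lambda)$ be a wdPT. A wdPT $\mathcal{T}' = (T', r', \lambda')$ is a subtree of $\mathcal{T}$ if (i) $T'$ is a subtree of $T$, (ii) $r' = r$, and (iii) $\lambda'(n) = \lambda(n)$, for all $n \in V(T')$. Note that any subtree of $\mathcal{T}$ contains the original root $r$. A child of the subtree $\mathcal{T}'$ is a node $n \in V(T) \setminus V(T')$ such that $n' \in V(T')$, where $n'$ is the parent of $n$ in $T$.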
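For convenience, we fix two functions $\text{pat}(\cdot)$ and $\text{vars}(\cdot)$ as follows. Let $\mathcal{T} = (T, r, \lambda)$ be a wdPT. We define $\text{pat}(n) = \lambda(n)$, for every $n \in V(T)$, and $\text{pat}(\mathcal{T}) = \bigcup_{n \in V(T)} \lambda(n)$. Note that $\text{pat}(n)$ and $\text{pat}(\mathcal{T})$ are t-graphs. We let $\text{vars}(n) = \text{vars}(\text{pat}(n))$, for $n \in V(T)$, and $\text{vars}(\mathcal{T}) = \text{vars}(\text{pat}(\mathcal{T}))$.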
A welldesigned pattern forest (wdPF for short) is a finite set of welldesigned pattern trees.
In [17], it was shown that every wdPT can be translated efficiently into an equivalent wdPT in the so-called NR normal form. A wdPT $\mathcal{T} = (T, r, \lambda)$ is in NR normal form if for every node $n \in V(T)$ with parent $n'$ in $T$, it holds that $\text{vars}(n) \setminus \text{vars}(n') \neq \emptyset$. In this paper, we assume that all wdPTs are in NR normal form.
Well-designed SPARQL and wdPFs. As in the case of SPARQL graph patterns, we denote by $\llbracket \mathcal{T} \rrbracket_G$ (resp., $\llbracket \mathcal{F} \rrbracket_G$) the evaluation of a wdPT $\mathcal{T}$ (resp., wdPF $\mathcal{F}$) over an RDF graph $G$. In [17], for a wdPT $\mathcal{T}$, the set of mappings $\llbracket \mathcal{T} \rrbracket_G$ is defined via a translation to well-designed graph patterns. However, if $\mathcal{T}$ is in NR normal form, then $\llbracket \mathcal{T} \rrbracket_G$ admits a simple characterisation stated in Lemma 1 below. In this paper, we adopt this characterisation as the semantics of wdPTs.
Lemma 1 ([17, 24])
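Let $\mathcal{T}$ be a wdPT in NR normal form, $G$ an RDF graph and $\mu$ a mapping. Then $\mu \in \llbracket \mathcal{T} \rrbracket_G$ iff there exists a subtree $\mathcal{T}'$ of $\mathcal{T}$ such that

$\mu$ is a homomorphism from $\text{pat}(\mathcal{T}')$ to $G$.

there is no child $n$ of $\mathcal{T}'$ and homomorphism $\nu$ from $\text{pat}(n)$ to $G$ compatible with $\mu$.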
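For a wdPF $\mathcal{F} = \{\mathcal{T}_1, \dots, \mathcal{T}_m\}$ and an RDF graph $G$, we define $\llbracket \mathcal{F} \rrbracket_G = \llbracket \mathcal{T}_1 \rrbracket_G \cup \cdots \cup \llbracket \mathcal{T}_m \rrbracket_G$.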
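The characterisation in Lemma 1 suggests a direct membership test for a single wdPT, sketched below. The tree encoding (a dict with hypothetical keys `root`, `children` and `label`) and all function names are assumptions for illustration. The sketch relies on the NR normal form assumption: the candidate subtree is built greedily from the root by adding every node whose variables are all bound by the mapping.

```python
def is_var(term):
    """Illustrative convention: variables are strings starting with '?'."""
    return term.startswith("?")

def variables(tgraph):
    return {x for t in tgraph for x in t if is_var(x)}

def extends(tgraph, graph, partial):
    """True if some homomorphism from tgraph to graph extends `partial`."""
    graph, patterns = set(graph), list(tgraph)

    def go(i, h):
        if i == len(patterns):
            return True
        for cand in graph:
            h2 = dict(h)
            if all((h2.setdefault(p, c) == c) if is_var(p) else p == c
                   for p, c in zip(patterns[i], cand)):
                if go(i + 1, h2):
                    return True
        return False

    return go(0, dict(partial))

def in_answer(tree, graph, mu):
    """Decide whether mu belongs to the evaluation of a wdPT in NR normal form."""
    label, children, root = tree["label"], tree["children"], tree["root"]
    dom = set(mu)
    if not variables(label[root]) <= dom:
        return False
    subtree, frontier, stack = [root], [], [root]
    while stack:
        n = stack.pop()
        for c in children.get(n, []):
            if variables(label[c]) <= dom:
                subtree.append(c)
                stack.append(c)
            else:
                frontier.append(c)  # a child of the candidate subtree
    pat = [t for n in subtree for t in label[n]]
    if variables(pat) != dom:
        return False
    image = {tuple(mu.get(x, x) for x in t) for t in pat}
    if not image <= set(graph):
        return False  # mu is not a homomorphism from pat(T') to G
    for c in frontier:
        shared = {x: mu[x] for x in variables(label[c]) & dom}
        if extends(label[c], graph, shared):
            return False  # mu can be extended to a child, so it is not maximal
    return True
```

Membership in a wdPF then reduces to calling `in_answer` on each tree and taking the disjunction.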
As shown in [17], every UNION-free well-designed graph pattern $P$ can be translated in polynomial time into an equivalent wdPT $\mathcal{T}$, i.e., a wdPT such that $\llbracket P \rrbracket_G = \llbracket \mathcal{T} \rrbracket_G$, for all RDF graphs $G$. Consequently, and as observed in [24], every well-designed graph pattern $P$ can be translated in polynomial time into an equivalent wdPF $\mathcal{F}$. Throughout the paper, we fix a polynomial-time computable function wdpf that maps each well-designed graph pattern to an equivalent wdPF.
2.2 Restrictions of the evaluation problem
Recall that wdEVAL denotes the problem of deciding, given a well-designed graph pattern $P$, an RDF graph $G$ and a mapping $\mu$, whether $\mu \in \llbracket P \rrbracket_G$. In this paper, we study restrictions of wdEVAL given by different classes of admissible patterns. Formally, for a class $\mathcal{C}$ of well-designed graph patterns, we define the problem wdEVAL($\mathcal{C}$) as follows:
Input: a well-designed graph pattern $P \in \mathcal{C}$, an RDF graph $G$ and a mapping $\mu$.
Question: does $\mu \in \llbracket P \rrbracket_G$ hold?
Note that wdEVAL($\mathcal{C}$) is a promise problem, as we are given the promise that $P \in \mathcal{C}$. This allows us to analyse the complexity of evaluating patterns in $\mathcal{C}$ independently of the cost of checking membership in $\mathcal{C}$.
3 A new tractability condition
In this section, we introduce the notion of domination width of a well-designed graph pattern and show our main tractability result: wdEVAL($\mathcal{C}$) is in PTIME, for classes $\mathcal{C}$ of graph patterns of bounded domination width. Before doing so, we need to introduce some terminology.
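A generalised t-graph is a pair $(S, X)$, where $S$ is a t-graph and $X \subseteq \text{vars}(S)$. Consider two generalised t-graphs of the form $(S, X)$ and $(S', X)$. A homomorphism from $(S, X)$ to $(S', X)$ is a homomorphism $h$ from $S$ to $S'$ such that $h(?x) = ?x$, for all $?x \in X$. We write $(S, X) \to (S', X)$ whenever there is a homomorphism from $(S, X)$ to $(S', X)$; otherwise, we write $(S, X) \not\to (S', X)$. Note that the relation $\to$ is transitive, i.e., $(S, X) \to (S', X)$ and $(S', X) \to (S'', X)$ implies $(S, X) \to (S'', X)$.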
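Let $(S, X)$ be a generalised t-graph, $G$ be an RDF graph and $\mu$ be a mapping with $\text{dom}(\mu) = X$. We write $(S, X) \to^{\mu} G$ if there is a homomorphism $h$ from $S$ to $G$ such that $h(?x) = \mu(?x)$, for all $?x \in X$. Notice that $\to$ composes with $\to^{\mu}$, i.e., $(S, X) \to (S', X)$ and $(S', X) \to^{\mu} G$ implies $(S, X) \to^{\mu} G$.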
Below we state several notions and properties for generalised t-graphs. We emphasise that all these properties are well-known for conjunctive queries (CQs) and relational structures, and can be applied in our case as there is a strong correspondence between generalised t-graphs and CQs. Indeed, we can view a generalised t-graph $(S, X)$ as a CQ over a relational schema containing a single ternary relation, where the variables are $\text{vars}(S)$, the free variables are $X$, and the IRIs appearing in $S$ correspond to constants. However, for convenience and consistency with RDF and SPARQL terminology, we shall work directly with generalised t-graphs throughout the paper.
Cores. Let $(S, X)$ and $(S', X)$ be two generalised t-graphs. We say that $(S', X)$ is a subgraph of $(S, X)$ if $S' \subseteq S$, and a proper subgraph if $S' \subseteq S$ but $S' \neq S$. A generalised t-graph $(S, X)$ is a core if there is no homomorphism from $(S, X)$ to one of its proper subgraphs $(S', X)$. We say that $(S', X)$ is a core of $(S, X)$ if $(S', X)$ is a core itself, $(S, X) \to (S', X)$ and $(S', X) \to (S, X)$. As stated below, every generalised t-graph has a unique core (up to renaming of variables), and hence, we can speak of the core of a generalised t-graph.
Proposition 1 (see e.g. [1, 10])
Every generalised t-graph has a unique core (up to renaming of variables).
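Treewidth. The notion of treewidth is a well-known measure of the tree-likeness of an undirected graph (see e.g. [7]). For instance, trees have treewidth $1$, cycles treewidth $2$ and $K_k$, the clique of size $k$, treewidth $k - 1$. Let $H$ be an undirected graph with edge set $E(H)$. A tree decomposition of $H$ is a pair $(T, \beta)$ where $T$ is a tree and $\beta$ is a function that maps each node $t \in V(T)$ to a subset of $V(H)$ such that

for every $u \in V(H)$, the set $\{t \in V(T) \mid u \in \beta(t)\}$ induces a connected subgraph of $T$, and

for every edge $\{u, v\} \in E(H)$, there is a node $t \in V(T)$ with $\{u, v\} \subseteq \beta(t)$.
The width of the decomposition $(T, \beta)$ is $\max\{|\beta(t)| \mid t \in V(T)\} - 1$. The treewidth $\text{tw}(H)$ of the graph $H$ is the minimum width over all its tree decompositions.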
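The two conditions of a tree decomposition are easy to verify mechanically. The sketch below, with a hypothetical function name and input encoding, checks a candidate decomposition and returns its width. It assumes that `tree_edges` really forms a tree over the bag names, and it follows the usual convention that every vertex must appear in at least one bag. It does not compute the treewidth itself, which would require minimising over all decompositions.

```python
def decomposition_width(vertices, edges, tree_edges, bags):
    """Return the width of the decomposition given by `bags` (name -> set of
    vertices) and `tree_edges`, or None if a condition is violated."""
    adj = {t: set() for t in bags}
    for s, t in tree_edges:
        adj[s].add(t)
        adj[t].add(s)

    # Condition 1: the bags containing a vertex form a connected subtree.
    for v in vertices:
        nodes = {t for t, bag in bags.items() if v in bag}
        if not nodes:
            return None
        start = next(iter(nodes))
        seen, stack = {start}, [start]
        while stack:
            t = stack.pop()
            for s in (adj[t] & nodes) - seen:
                seen.add(s)
                stack.append(s)
        if seen != nodes:
            return None

    # Condition 2: every edge is contained in some bag.
    for u, v in edges:
        if not any(u in bag and v in bag for bag in bags.values()):
            return None

    return max(len(bag) for bag in bags.values()) - 1
```

For the cycle on four vertices, the two bags `{1, 2, 3}` and `{1, 3, 4}` joined by one tree edge form a valid decomposition of width 2, matching the treewidth of cycles stated above.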
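Let $(S, X)$ be a generalised t-graph. The Gaifman graph $G(S, X)$ of $(S, X)$ is the undirected graph whose vertex set is $\text{vars}(S) \setminus X$ and whose edge set contains the pairs $\{?x, ?y\}$ such that $?x \neq ?y$ and $\{?x, ?y\} \subseteq \text{vars}(t)$, for some triple pattern $t \in S$. We define the treewidth of $(S, X)$ to be $\text{tw}(S, X) = \text{tw}(G(S, X))$. If $G(S, X)$ has no vertices, i.e., $\text{vars}(S) = X$, or $G(S, X)$ has no edges, we let $\text{tw}(S, X) = 1$.
For a generalised t-graph $(S, X)$, we let $\text{ctw}(S, X) = \text{tw}(S', X)$, where $(S', X)$ is the core of $(S, X)$.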
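As an illustration of the Gaifman graph of a generalised t-graph, the following sketch builds its vertex and edge sets. The encoding (variables as strings starting with `?`, edges as two-element frozensets) and the function name are assumptions made for this example only.

```python
from itertools import combinations

def gaifman_graph(S, X):
    """Return (vertices, edges) of the Gaifman graph of the generalised
    t-graph (S, X): vertices are the variables of S outside X, and two
    vertices are adjacent if they occur together in some triple pattern."""
    X = set(X)
    vertices = {c for t in S for c in t if c.startswith("?")} - X
    edges = set()
    for t in S:
        in_t = sorted({c for c in t if c.startswith("?")} - X)
        for u, v in combinations(in_t, 2):
            edges.add(frozenset((u, v)))
    return vertices, edges
```

Note that variables in $X$ are dropped entirely, so a triple pattern contributes an edge only when it contains two distinct variables outside $X$.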
Example 3
Let and consider the generalised tgraphs and depicted in Figure 1, where and is the tgraph given by the set
Observe that is a core and hence , as its Gaifman graph is the clique of size . On the other hand, the core of is , where
Hence, while .
Existential pebble game. The existential pebble game was introduced by Kolaitis and Vardi [12] to analyse the expressive power of certain Datalog programs. While the original definition deals with relational structures, here we focus on the natural adaptation to the context of generalised tgraphs and RDF graphs.
Let . The existential pebble game is played by the Spoiler and the Duplicator on a generalised tgraph , an RDF graph and a mapping with . During the game, the Spoiler only picks elements from , while the Duplicator picks elements from , where is the set of IRIs appearing in . In the first round, the Spoiler places pebbles on (not necessarily distinct) elements , and the Duplicator responds by placing pebbles on elements . On any further round, the Spoiler removes a pebble and places it on another element . The Duplicator responds by moving the corresponding pebble to an element . If after a particular round, the elements covered by the pebbles are and for the Spoiler and the Duplicator, respectively, then the configuration of the game is if and , for some with ; otherwise, it is the mapping , where and , for every (note that ).
The Duplicator wins the game if he has a winning strategy, that is, he can indefinitely continue playing the game in such a way that the configuration at the end of each round is a mapping $h$ that is a partial homomorphism, i.e., for every triple pattern $t \in S$ with $\text{vars}(t) \subseteq \text{dom}(h)$, it is the case that $h(t) \in G$. If the Duplicator can win the existential $k$-pebble game on $(S, X)$, $G$ and $\mu$, then we write $(S, X) \to_k^{\mu} G$.
Note that if , then for every ,
(1) 
i.e., is a homomorphism from to . Observe also that for every ,
(2) 
In other words, the relation $\to_k^{\mu}$ is a relaxation of $\to^{\mu}$. As we state below, the relaxation given by $\to_k^{\mu}$ has good properties in terms of complexity: while checking the existence of homomorphisms, i.e., $(S, X) \to^{\mu} G$, is a well-known NP-complete problem [5], checking $(S, X) \to_k^{\mu} G$ can be done in polynomial time, for every fixed $k \geq 2$. (Footnote 4: The existential pebble game is known to capture the so-called $k$-consistency test [13], which is a well-known heuristic for solving constraint satisfaction problems (CSPs).)
Proposition 2 ([12]; see also [6])
Let $k \geq 2$. For a given generalised t-graph $(S, X)$, an RDF graph $G$ and a mapping $\mu$ with $\text{dom}(\mu) = X$, checking whether $(S, X) \to_k^{\mu} G$ can be done in polynomial time.
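One standard way to decide the winner of an existential pebble game is a fixpoint computation in the style of the $k$-consistency test: start from all partial homomorphisms on at most $k$ variables and repeatedly discard those that the Duplicator cannot maintain. The sketch below follows that idea; it is not taken from the paper, and the function names and encoding (variables as strings starting with `?`, `mu` as a dict with domain `X`) are assumptions. For fixed `k` the family has polynomial size, but the code is written for clarity rather than efficiency.

```python
from itertools import combinations, product

def is_var(term):
    """Illustrative convention: variables are strings starting with '?'."""
    return term.startswith("?")

def duplicator_wins(k, S, X, graph, mu):
    """Decide whether the Duplicator wins the existential k-pebble game
    on (S, X), graph and mu (a dict whose domain is X)."""
    graph = set(graph)
    iris = sorted({c for t in graph for c in t})
    free = sorted({c for t in S for c in t if is_var(c)} - set(X))

    def partial_hom(nu):
        # Every triple pattern whose variables are all bound must land in graph.
        h = {**mu, **nu}
        return all(tuple(h.get(c, c) for c in t) in graph
                   for t in S if all(c in h for c in t if is_var(c)))

    # All partial homomorphisms defined on at most k variables outside X.
    family = set()
    for size in range(min(k, len(free)) + 1):
        for dom in combinations(free, size):
            for img in product(iris, repeat=size):
                nu = dict(zip(dom, img))
                if partial_hom(nu):
                    family.add(frozenset(nu.items()))

    changed = True
    while changed:
        changed = False
        for f in list(family):
            bound = {a for a, _ in f}
            # Closed under restriction: removing one pebble stays in the family.
            restr_ok = all(f - {item} in family for item in f)
            # Extension: with a spare pebble, any new variable can be answered.
            ext_ok = len(f) >= k or all(
                any(f | {(a, b)} in family for b in iris)
                for a in free if a not in bound)
            if not (restr_ok and ext_ok):
                family.discard(f)
                changed = True

    # The Duplicator wins iff the empty configuration survives.
    return frozenset() in family
```

A directed triangle pattern against a graph consisting of a single 2-cycle shows the relaxation at work: no homomorphism exists, yet the Duplicator wins with 2 pebbles and loses with 3.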
As it turns out, there is a strong connection between existential pebble games and the notion of treewidth. In particular, it was shown by Dalmau et al. [6] that the relations $\to^{\mu}$ and $\to_k^{\mu}$ coincide for generalised t-graphs $(S, X)$ satisfying $\text{ctw}(S, X) \leq k - 1$. (Footnote 5: In [6], it was shown that $\to$ and $\to_k$ coincide for relational structures whose cores have treewidth at most $k - 1$. For Proposition 3, we need a generalisation of the results in [6] that considers relational structures equipped with a set of distinguished elements. Indeed, such distinguished elements correspond to the variables in $X$ and the IRIs appearing in the generalised t-graph $(S, X)$. Such a generalisation follows straightforwardly from the results in [6].)
Proposition 3 ([6])
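Let $k \geq 2$. Let $(S, X)$ be a generalised t-graph, $G$ be an RDF graph and $\mu$ be a mapping with $\text{dom}(\mu) = X$. Suppose that $\text{ctw}(S, X) \leq k - 1$. Then $(S, X) \to^{\mu} G$ if and only if $(S, X) \to_k^{\mu} G$.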
We conclude with two basic properties of the existential pebble game that will be useful for us.
Proposition 4
Let . Let , , , be generalised tgraphs , be an RDF graph and be a mapping with . Then the following hold:

if and , then it is the case that .

if , for all and , for all with , then .
3.1 Domination width
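We start by giving some intuition regarding the notion of domination width. Let $P$ be a well-designed graph pattern, $G$ be an RDF graph and $\mu$ be a mapping. Suppose that $\text{wdpf}(P) = \{\mathcal{T}_1, \dots, \mathcal{T}_m\}$. The natural algorithm for checking $\mu \notin \llbracket P \rrbracket_G$ is as follows (see e.g. [17, 24]): we simply iterate over all $i \in \{1, \dots, m\}$ such that $\mu$ is a potential solution of $\mathcal{T}_i$ over $G$, i.e., there is a subtree $\mathcal{T}'_i$ of $\mathcal{T}_i$ such that $\mu$ is a homomorphism from $\text{pat}(\mathcal{T}'_i)$ to $G$, and we ensure that there is a child of $\mathcal{T}'_i$ where $\mu$ can be extended consistently.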
The key observation is that we can reinterpret the abovedescribed algorithm as follows. We can choose one of the subtrees as above, and associate a collection of generalised tgraphs of the form , where , where is the set of indices such that is a potential solution of over , and is a child of . To avoid conflicts, for every , the variables from that are not in , need to be renamed to fresh variables. Therefore, checking amounts to checking that there is a homomorphism from some element of to , i.e., whether , for some .
The idea behind domination width is to ensure that is always dominated by a subset where each generalised tgraph in has small ctw. The set dominates in the sense that, for every , there is a such that . Therefore, by transitivity of the relation , checking amounts to checking that there is a homomorphism from some element of to . Since generalised tgraphs of small ctw are wellbehaved with respect to the relaxation (see Proposition 3), this will imply that the relaxation of the natural algorithm, described at the beginning of this section, given by replacing homomorphism tests by , correctly decides if . Below we formalise this intuition.
Let be a wdPF. A subtree of is a subtree of some wdPT , for . The support of the subtree contains precisely the indices from such that there is a subtree of satisfying . Note that , for every subtree . Since wdPTs are in NR normal form, whenever , then the witness subtree is unique. For , we denote such a by .
Let be a subtree of . A children assignment for is a function with a nonempty domain that maps every to a child of . We denote by the set of all children assignments for . Observe that if , then it must be the case that , for every . In particular, it could be the case that . The renamed tgraphs assignment associated with maps to a tgraph obtained from by renaming all variables in to new fresh variables. In particular, if and , then
For , we define the tgraph as
We say that a children assignment is valid if for every , we have that
We denote by the set of valid children assignments for . Finally, for the subtree , we define the set of generalised tgraphs associated with as
Example 4
Let . Recall from Example 3 that
Consider the wdPF depicted in Figure 2. For a wdPT and a subset , we denote by the subtree of induced by the set of nodes . Observe that the only subtrees of with a nonempty set are , , , and . Consider first and note that . We have that
with , where and are described by and . Figure 3 illustrates and . Note how we need to rename to a fresh variable in . Observe also that, for instance, the children assignment given by is not valid as and
For , we have that
where . Note that in Figure 1 corresponds to . In the case of , we have that
where . Finally, note that and .
Now we are ready to define domination width.
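Definition 1 (Domination)
Let $\mathcal{G}$ be a set of generalised t-graphs of the form $(S, X)$ with $S \in \mathcal{S}$, where $\mathcal{S}$ is a set of t-graphs and $X$ is a fixed set of variables with $X \subseteq \text{vars}(S)$, for all $S \in \mathcal{S}$. We say that $\mathcal{G}' \subseteq \mathcal{G}$ is a dominating set of $\mathcal{G}$ if for every $(S, X) \in \mathcal{G}$, there exists $(S', X) \in \mathcal{G}'$ such that $(S', X) \to (S, X)$.
We say that $\mathcal{G}$ is $k$-dominated if the set $\{(S, X) \in \mathcal{G} \mid \text{ctw}(S, X) \leq k\}$ is a dominating set of $\mathcal{G}$.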
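The domination condition can be tested directly once the core treewidth of each generalised t-graph is known. The sketch below takes those values as an input list `ctw`, which is an assumption made to keep the example short: computing cores and treewidth is not shown. The function names are hypothetical, and homomorphisms are found by brute force with the variables in `X` fixed to themselves.

```python
def is_var(term):
    """Illustrative convention: variables are strings starting with '?'."""
    return term.startswith("?")

def has_homomorphism(source, target, X):
    """True if (source, X) -> (target, X), i.e. some homomorphism from
    source to target maps every variable in X to itself."""
    target, patterns = set(target), list(source)

    def extend(i, h):
        if i == len(patterns):
            return True
        for cand in target:
            h2 = dict(h)
            if all((h2.setdefault(p, c) == c) if is_var(p) else p == c
                   for p, c in zip(patterns[i], cand)):
                if extend(i + 1, h2):
                    return True
        return False

    return extend(0, {x: x for x in X})

def is_k_dominated(tgraphs, X, ctw, k):
    """tgraphs[i] is the t-graph of the i-th generalised t-graph and ctw[i]
    its core treewidth (assumed to be precomputed by the caller)."""
    small = [S for S, w in zip(tgraphs, ctw) if w <= k]
    return all(any(has_homomorphism(Sp, S, X) for Sp in small)
               for S in tgraphs)
```

In the accompanying check, a single triple pattern with core treewidth 1 dominates a larger t-graph whose Gaifman graph is a triangle, so the pair is 1-dominated even though one of its members has core treewidth 2.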
Definition 2 (Domination width)
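Let $\mathcal{F}$ be a wdPF. The domination width of $\mathcal{F}$, denoted by $\text{dw}(\mathcal{F})$, is the minimum positive integer $k$ such that for every subtree $\mathcal{T}$ of $\mathcal{F}$, the set of generalised t-graphs associated with $\mathcal{T}$ is $k$-dominated.
For a well-designed graph pattern $P$, we define the domination width of $P$ as $\text{dw}(P) = \text{dw}(\text{wdpf}(P))$.
We say that a class $\mathcal{C}$ of well-designed graph patterns has bounded domination width if there is a universal constant $k$ such that $\text{dw}(P) \leq k$, for every $P \in \mathcal{C}$.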
Example 5
The following is our main tractability result.
Theorem 1 (Main tractability)
Let be a class of welldesigned graph patterns of bounded domination width. Then is in PTIME.
Let be a positive integer such that , for all . Fix , RDF graph and mapping . Let and suppo