    # Subsumption of Weakly Well-Designed SPARQL Patterns is Undecidable

Weakly well-designed SPARQL patterns is a recent generalisation of well-designed patterns, which preserve good computational properties but also capture almost all patterns that appear in practice. Subsumption is one of static analysis problems for SPARQL, along with equivalence and containment. In this paper we show that subsumption is undecidable for weakly well-designed patterns, which is in stark contrast to well-designed patterns, and to equivalence and containment.

## Authors

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 SPARQL Patterns

We adopt the formalisation of SPARQL that mostly follows . However, we concentrate on patterns constructed using only basic graph patterns and optional matching.

RDF Graphs   An RDF graph is a labelled graph where nodes can also serve as edge labels. Formally, let be a set of IRIs. Then an RDF triple is a tuple from , where is called subject, predicate, and object. An RDF graph is a finite set of RDF triples.

SPARQL Syntax   Let be an infinite set of variables, disjoint from . A basic (graph) pattern is a possibly empty set of triples from

 (I∪X)×(I∪X)×(I∪X).

An (optional SPARQL graph) patterns are defined by the following grammar, where ranges over basic patterns:

 P ::=B∣(POPTP).

We denote the set of all variables that appear in a pattern .

Note that a given pattern can occur more than once within a larger pattern. In what follows we will need to distinguish between a (sub-)pattern as a possibly repeated building block of another pattern and its occurrences in —that is, unique subtrees in the parse tree. Then, the left (right) argument of an occurrence is the subtree rooted in the left (right) child of the root of in the parse tree, and an occurrence is inside an occurrence if the root of is a descendant of the root of .

A pattern is well-designed (Pérez et al. ) if for every occurrence of an -pattern in the variables from occur in only inside .

Given a pattern , an occurrence in dominates an occurrence if there exists an occurrence of an -pattern such that is inside the left argument of and is inside the right argument. A pattern is weakly well-designed ([5, 6]) if, for each occurrence of an -subpattern , the variables in appear outside only in subpatterns whose occurrences are dominated by .

SPARQL Semantics   The semantics of graph patterns is defined in terms of mappings—that is, partial functions from variables to IRIs. The domain of a mapping is the set of variables on which is defined. Two mappings and are compatible, written , if for all variables . Mapping is subsumed by mapping , written , if and . If , then constitutes a mapping with domain that coincides with on and with on .

Given two sets of mappings and , we define their left outer join operation as follows:

 Ω1to0.0pt⋈Ω2={μ1∪μ2∣μ1∈Ω1,μ2∈Ω2, and μ1∼μ2}∪{μ1∣μ1∈Ω1,μ1≁μ2 % for all μ2∈Ω2}.

Given a graph , the evaluation of a pattern over is defined as follows:

1. if is a basic pattern, then

2. .

A pattern is contained in a pattern if for every graph . Patterns and are equivalent if they contain each other. Pattern is subsumed by , written , if, for every graph , each has such that (Letelier et al. ).

## 2 Pattern Subsumption

###### Theorem 1

The problem of checking whether for weakly well-designed patterns and is undecidable.

Proof.  We prove undecidability by a reduction of a variant of the tiling problem, which is known to be undecidable (see e.g., ). We start by introducing the notation used throughout the proof.

A tiling instance consists of a collection of tile types and edge compatibility relations and  on . Intuitively, means that a tile of type can be placed to the right of a tile of type in a row, while means that can be placed above in a column.

A tiling of the positive plane with is a function , for the set of natural numbers , such that, for all ,

• , and

• .

Tiling is periodic if there exist positive numbers and , called horizontal and vertical periods, respectively, such that for all . A periodic tiling can be seen as a tiling of a torus, since column and row can be “glued” with the left-most column and bottom row, respectively.

Let denote the set of all tiling instances that allow for tilings of the positive plane, and the set of all tiling instances that allow for periodic tilings. To prove undecidability we will use the following fact.

###### Fact 1 (Gurevich and Koryakov )

Sets and are recursively inseparable—that is, there is no recursive set with .

In what follows we first construct, for each tiling instance , weakly well-designed patterns and , and then show that the set

 Φ={(PT,P′T)∣PT⋢P′T}

contains , and is contained in . This will imply, by Fact 1, that (and, hence, the complement of ) cannot be recursive.

Let be a tiling instance with tile types , and compatibility relations and . Let be

 {(c11,hType,inInitRow),(c11,cType,Cell),(c11,hNext,c12),(c11,vNext,c21),(c12,vNext,c22),(?b,bType,Base⊑)   };

so, is a basic pattern with 6 triples, only one of which mentions a variable, . The other pattern has a more complex structure: let be

 \omit\span\omit\span\omit\span\omit\span\omit\span\omit\span\omit(⋯((⋯((⋯(BrootOPTB1h-incompat)OPT⋯OPTBℓh-incompat)OPT B1v-incompat) OPT ⋯ OPT Bmv-incompat) OPT  B1tiling)  OPT  ⋯  OPT  Bntiling)  OPT   Bbase, (1)

where , ,

Having the construction complete, next we show that for any tiling instance in . In particular, on the base of a witnessing periodic tiling we build a graph and a mapping such that , but there is no such that . Assume that has tile types , compatibility relations and , and periodic tiling with the horizontal and vertical periods and , respectively. Let consist of the triples

 (b⊑,tType,Base⊑),(b⋢,tType,Base⋢),

as well as the triples

 (c1j,hType,inInitRow), for each% j=1,…,q,(cij,cType,Cell), for each i=1,…,p and% j=1,…,q,(cij,tType,τ(i,j)), for each i=1,…,p and j=1,…,q,(cij,hNext,ci(j+1)), for each i=1,…,p and j=1,…,q−1,(ciq,hNext,ci1), for each i=1,…,p,(cij,vNext,c(i+1)j), for each i=1,…,p−1 and % j=1,…,q,(cpj,vNext,c1j), for each j=1,…,q.

Let also .

It is immediate to see that . Moreover, assuming that has form (1), consists of mappings sending to one of , to one of , also to one of , while , and to the IRIs accordingly connected to the value of (note that the values of , , and do not depend on each other).

Since the tiling agrees with and , none of basic patterns and has a match in , because each of them requires a pair of horizontally or vertically adjacent cells with incompatible tile types. So, none of the mappings in are extendable to any of and . However, each mapping extends to such that with . In particular, this extension sends to , which implies that . Therefore, and are a witness for the required .

We continue by showing that implies for any tiling instance . In particular, on the base of a graph and mapping witnessing we construct a tiling of the positive plane with . Assume that has tile types as well as compatibility relations and . Since , graph contains triples

 (c11,hType,inInitRow),(c11,cType,Cell),(c11,hNext,c12),(c11,vNext,c21),(c12,vNext,c22),(b⊑,bType,Base⊑)

for the IRI such that . Therefore, assuming that has form (1), contains a mapping sending to . Mapping is extendable to for some ; indeed, if it is not the case, then contains an extension of sending to , because all , , and contain , while matches , which implies contradicting the fact that and are a witness for non-subsumption. Therefore, triples are matched in extending , that is, contains triples

 (c11,cType,Cell),(c11,hNext,c′12),(c′12,hType,inInitRow)

for some IRI . Just for uniformity, assume that . Therefore, contains a mapping sending to (and all other variables same as ). Reasoning in the same way as for , we obtain that has triples

 (c12,cType,Cell),(c12,hNext,c13),(c13,hType,inInitRow)

for some IRI . Continuing like this, we conclude that contains

 (c1j,cType,Cell),(c1(j+1),hNext,c1(j+1))

for all (note that many of these coincide, because is finite).

For each , contains a mapping sending to . As before, this mapping is extendable in to for some . In particular, it is extendable to the triples , , and —that is, contains triples

 (c1j,tType,tk),(c1j,vNext,c2j),(c2j,cType,Cell)

for some IRI (again, if is 1 or 2, then we assume that is the same as in for uniformity). Similarly as before, contains a mapping sending to , from which we have that has triples

 (c2j,tType,tk),(c2j,vNext,c3j),(c3j,cType,Cell)

for some and . Repeating this process, we conclude that contains, for any and ,

 (cij,tType,tij),(cij,vNext,c(i+1)j),(c(i+1)j,cType,Cell)

for some and . Set for each and .

We need to show that is indeed a tiling with . To this end, we first note that contains the triple for all and : we already showed this fact for , and for all other it can be proved very similarly to the reasoning above, based on the fact that contains a mapping sending , , , and to , , , and , respectively. Now, to see that is a tiling with we just note that if there exist horizontally or vertically adjacent tiles that do not agree with or , then there exists or such that or is matched in ; since this basic patterns does not have any variables in common with , any mapping in is then extendable to this BGP and hence contains a mapping sending to , contradicting the fact that graph and mapping are a witness for non-subsumption.

## References

•  Richard Cyganiak, David Wood, and Markus Lanthaler. RDF 1.1 concepts and abstract syntax. W3C recommendation, W3C, February 2014.
•  Yuri Sh. Gurevich and I. O. Koryakov. Remarks on Berger’s paper on the domino problem. Siberian Mathematical Journal, 13(2):319–321, 1972.
•  Steve Harris and Andy Seaborne. SPARQL 1.1 query language. W3C recommendation, W3C, March 2013.
•  Patrick J. Hayes and Peter F. Patel-Schneider. RDF 1.1 semantics. W3C recommendation, W3C, February 2014.
•  Mark Kaminski and Egor V. Kostylev. Beyond well-designed SPARQL. In Wim Martens and Thomas Zeume, editors, Proc. 19th International Conference on Database Theory, ICDT 2016, volume 48 of LIPIcs, pages 5:1–5:18. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2016.
•  Mark Kaminski and Egor V. Kostylev. Complexity and expressive power of weakly well-designed SPARQL. Theory of Computing Systems (ToCS), 62(4):772–809, 2018.
•  Andrés Letelier, Jorge Pérez, Reinhard Pichler, and Sebastian Skritek. Static analysis and optimization of semantic web queries. ACM Trans. Database Syst., 38(4:25), 2013.
•  Jorge Pérez, Marcelo Arenas, and Claudio Gutierrez. Semantics and complexity of SPARQL. ACM Trans. Database Syst., 34(3):16:1–16:45, 2009.
•  François Picalausa and Stijn Vansummeren. What are real SPARQL queries like? In Roberto De Virgilio, Fausto Giunchiglia, and Letizia Tanca, editors, Proc. 3rd International Workshop on Semantic Web Information Management, SWIM 2011, pages 7:1–7:6. ACM, 2011.
•  Reinhard Pichler and Sebastian Skritek. Containment and equivalence of well-designed SPARQL. In Richard Hull and Martin Grohe, editors, Proc. 33rd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2014, pages 39–50. ACM, 2014.
•  Eric Prud’hommeaux and Andy Seaborne. SPARQL query language for RDF. W3C recommendation, W3C, January 2008.
•  Michael Schmidt, Michael Meier, and Georg Lausen. Foundations of SPARQL query optimization. In Luc Segoufin, editor, Proc. 13th International Conference on Database Theory, ICDT 2010, pages 4–33. ACM, 2010.
•  Xiaowang Zhang, Jan Van den Bussche, and François Picalausa. On the satisfiability problem for SPARQL patterns. J. Artif. Intell. Res. (JAIR), 56:403–428, 2016.