
Evaluating Continuous Basic Graph Patterns over Dynamic Linked Data Graphs

In this paper, we investigate the problem of evaluating Basic Graph Patterns (BGP, for short, a subclass of SPARQL queries) over dynamic Linked Data graphs; i.e., Linked Data graphs that are continuously updated. We consider a setting where the updates are continuously received through a stream of messages, and we support both insertions and deletions of triples (updates are handled straightforwardly as a combination of deletions and insertions). In this context, we propose a set of in-memory algorithms that minimize the cached data while efficiently and continuously answering the main subclasses of BGP queries. Queries are submitted to the system once and continuously produce delta answers as the update messages are processed. By consolidating all the historical delta answers, the algorithms ensure that the complete answer of each query can be constructed at any given time.


1 Introduction

Dynamic graphs are graphs that are continuously modified over time, usually through a stream of edge updates (insertions and deletions). Representative examples include social graphs [2], traffic and transportation networks [18], [28], financial transaction networks [26], and sensor networks [27]. When the graph data is rapidly updated, conventional graph management systems are not able to handle such high-velocity data in reasonable time. In such cases, real-time query answering becomes a major challenge.

The RDF data model and the Linked Data paradigm are widely used to structure and publish semantically-enhanced data. Over the last decades, both private and public organizations have increasingly followed this approach to disseminate their data. Due to the emergence and spread of IoT, this approach has attracted further attention from both industry and research communities. SPARQL is the standard query language widely used to query such data.

Querying streaming Linked Data has been extensively investigated in the literature, where the majority of the related work [9, 19] focuses on frameworks for efficiently querying streaming data, mainly by defining certain operators for querying and reasoning over data in sliding windows (i.e., predefined portions of the streaming data). In this work, we focus instead on efficiently handling a continuously updated Linked Data graph (i.e., a dynamic Linked Data graph). In particular, we consider a setting where the updates are continuously received through a stream of messages, and we support both insertions and deletions of triples (updates are handled straightforwardly as a combination of deletions and insertions). In this context, we propose a set of in-memory algorithms that minimize the cached data while efficiently and continuously answering the main subclasses of Basic Graph Patterns (BGP, for short, a subclass of SPARQL queries). Queries are submitted to the system once and continuously produce delta answers as the update messages are processed. The evaluation approach is based on applying an effective decomposition of the given queries, in order to both improve the performance of query evaluation and minimize the cached data. By consolidating all the historical delta answers, the algorithms ensure that the complete answer of each query can be constructed at any given time.

The paper is structured as follows. Section 2 presents the related work. The main concepts and definitions used throughout this work are formally presented in Section 3. Section 4 focuses on defining the main subclasses of BGP queries, while the problem investigated in this work, along with the relevant setting, is formally defined in Section 5. Section 6 includes the main contributions of this work; i.e., the query answering algorithms applied over dynamic Linked Data graphs. Finally, we conclude in Section 7.

2 Related work

The problem of querying and detecting graph patterns over streaming RDF and Linked Data has been extensively investigated in the literature [9, 19]. In this context, multiple settings have been proposed and analysed, such as Data Stream Management Systems (DSMS - e.g., [4, 3, 17, 14, 6]) and Complex Event Processing Systems (CEP - e.g., [13, 1, 25, 22, 12]).

In DSMSs, the streaming data is mainly represented by relational messages and the queries are translated into plans of relational operators. Representative DSMS systems are the following: C-SPARQL [4, 3], CQELS [17], C-SPARQL on S4 [14] and Streaming-SPARQL [6]. The majority of the DSMS approaches use sliding windows to limit the data and focus on proposing a framework for querying the streaming data included in the window. In addition, they mainly use relational-like operators which are evaluated over the data included in the window. Note that this type of system mainly focuses on querying the data within windows and does not provide temporal operators.

On the other hand, CEP systems follow a different approach: they handle the streaming data as an infinite sequence of events which are used to compute composite events. EP-SPARQL [1] is a representative system of this approach, which defines a language for event processing and reasoning. It also supports windowing operators, as well as temporal operators. C-ASP [22] is a hybrid approach combining the windowing mechanism and relational operators of DSMS systems with the rule-based, event-based approach of CEP systems.

In the aforementioned approaches, the streaming data mainly consists of new triples; i.e., no deletion of graph triples is considered, as it is in the setting investigated in this paper. Capturing updates of the streaming data (including deletions delivered by the stream) has been investigated in the context of incremental reasoning (e.g., [23, 10, 5, 29]). Incremental Materialization for RDF Streams (IMaRS) [10, 5] considers streaming input annotated with expiration times, and uses the processing approach of C-SPARQL. Ren et al. [23, 24], on the other hand, focus on more complex ontology languages, and do not consider fixed time windows to estimate the expiration time.

The concept of expiration time is also adopted in [7] for computing subgraph matchings over sliding windows; the authors also use a query decomposition approach based on node degree in graph streams. Fan et al. [11] investigated incremental subgraph matching by finding the set of differences between the original matches and the new matches. In [8], the authors use a tree structure whose root node contains the query graph and whose other nodes contain subqueries of it; the leaf nodes contain the edges of the query, and the structure is used to incrementally apply the streaming changes (partial matches are also maintained). In the same context, Graphflow [15] and TurboFlux [16] present two approaches for the incremental computation of delta matches. Graphflow [15] is based on a worst-case optimal join algorithm to evaluate the matching for each update, while TurboFlux [16] extends the input graph with additional edges, taking into account the form of the query graph; this structure is then used to efficiently find the delta matches.

3 Preliminaries

In this section, we present the basic concepts used throughout this work.

3.1 Data and query graphs

Initially, we define two types of directed, labeled graphs with labeled edges that represent RDF data and Basic Graph Patterns (i.e., a subclass of SPARQL) over the RDF data. In the following, we consider two disjoint infinite sets U and L of URI references and (plain) literals, respectively (in this paper we do not consider typed literals), and an infinite set V of variables, disjoint from both.

A data triple (i.e., an RDF triple without blank nodes) is a triple of the form (s, p, o), where s and p take values from U and o takes values from U ∪ L; i.e., (s, p, o) ∈ U × U × (U ∪ L). In each triple of this form, we say that s is the subject, p is the predicate and o is the object of the triple. A data graph G is defined as a non-empty set of data triples. Similarly, we define a query triple pattern (query triple, for short) as a triple (s, p, o) in (U ∪ V) × U × (U ∪ L ∪ V); i.e., s can be either a URI or a variable from V, and o can be either a URI, a variable, or a literal (to simplify the presentation of our approach, we do not consider predicate variables). A Basic Graph Pattern (BGP), or simply a query (graph), Q is defined as a non-empty set of query triple patterns. In essence, each triple (s, p, o) in the query and data graphs represents a directed edge from s to o which is labeled by p, while s and o represent nodes in the corresponding graphs. The output pattern of a query graph Q is the tuple (X1, ..., Xn), with n ≥ 0, of all the variables appearing in Q w.r.t. a total order over the variables of Q (although there are n! possible output patterns for a query of n variables, we assume a predefined ordering given as part of the query definition). A query Q is said to be a Boolean, or ground, query if n = 0 (i.e., there is no variable in Q). The set of nodes of a data graph G (resp. query graph Q) is denoted as nodes(G) (resp. nodes(Q)), and the set of variables of Q is denoted as vars(Q). In the following, we refer to either a URI or a literal node as a constant node. Furthermore, we say that a data graph G' (resp. query graph Q') is a subgraph of a data (resp. query) graph G (resp. Q) if G' ⊆ G (resp. Q' ⊆ Q).
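To fix notation, the following Python sketch shows one concrete encoding of these definitions: triples as 3-tuples, data graphs and queries as sets of triples, and variables as strings with a leading "?". The encoding and all identifiers are ours, for illustration only.

```python
# Illustrative encoding of the data model (not from the paper):
# a triple is a (subject, predicate, object) tuple, a data graph is a set of
# data triples, and a query is a set of triple patterns whose subject/object
# may be variables, written here as strings prefixed with "?".

def is_variable(term):
    """A term is a variable if it carries a leading '?'."""
    return isinstance(term, str) and term.startswith("?")

def is_ground(query):
    """A query is Boolean (ground) if none of its triples contains a variable."""
    return all(not is_variable(s) and not is_variable(o)
               for (s, p, o) in query)

# Example data graph G and query Q asking who knows whom.
G = {(":alice", ":knows", ":bob"), (":bob", ":knows", ":carol")}
Q = {("?x", ":knows", "?y")}

assert not is_ground(Q)
assert is_ground({(":alice", ":knows", ":bob")})
```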

3.2 Data and query graph decomposition

In this subsection we define the notion of data and query graph decomposition.

A data (resp. query) graph decomposition of a data (resp. query) graph G is an n-tuple of data (resp. query) graphs (G1, ..., Gn), with n ≥ 1, such that:

  1. Gi ⊆ G, for i = 1, ..., n, and

  2. G = G1 ∪ ... ∪ Gn.

Each data (resp. query) graph Gi in a data (resp. query) graph decomposition is called a data (resp. query) graph segment. When, in a data/query graph decomposition, for all pairs Gi, Gj, with i ≠ j, it also holds that Gi ∩ Gj = ∅, i.e., the data (resp. query) graph segments are pairwise disjoint, then the data (resp. query) graph decomposition is said to be non-redundant, and the graph (resp. query) segments obtained form a partition of the triples of the data (resp. query) graph G, called the n-triple partition of G.
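Under the set-of-triples encoding, the two conditions of a decomposition and the disjointness condition of a non-redundant one can be checked directly; the following sketch is illustrative (identifiers are ours).

```python
# Illustrative checks for the decomposition definitions above.
# A graph is a set of triples; a decomposition is a list of such sets.

def is_decomposition(graph, segments):
    """Condition 1: every segment is a subgraph; condition 2: they cover G."""
    return (all(seg <= graph for seg in segments)
            and set().union(*segments) == graph)

def is_non_redundant(segments):
    """Segments must be pairwise disjoint."""
    return all(a.isdisjoint(b) for i, a in enumerate(segments)
               for b in segments[i + 1:])

G = {("a", "p", "b"), ("b", "p", "c"), ("c", "p", "d")}
segs = [{("a", "p", "b")}, {("b", "p", "c"), ("c", "p", "d")}]
assert is_decomposition(G, segs)
assert is_non_redundant(segs)
```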

3.3 Embeddings and query answers

This subsection focuses on describing the main concepts used for query evaluation. To compute the answers of a query Q when it is posed on a data graph G, we consider finding proper mappings from the nodes and edges of Q to the nodes and edges of G. Such mappings are described by the concept of embedding, which is formally defined as follows.

A (total) embedding of a query graph Q in a data graph G is a total mapping e : nodes(Q) → nodes(G) with the following properties:

  1. For each node v ∈ nodes(Q), if v is not a variable then e(v) = v.

  2. For each triple (s, p, o) ∈ Q, the triple (e(s), p, e(o)) is in G.

The tuple (e(X1), ..., e(Xn)), where (X1, ..., Xn) is the output pattern of Q, is said to be an answer (the notion of answer in this paper coincides with the term solution used in SPARQL) to the query Q. The set containing the answers of a query Q over a graph G is denoted as ans(Q, G).

Note that the variable mappings [20, 21] considered for SPARQL evaluation relate to the concept of embedding as follows: if Q is a query pattern, G is a data graph, and there is an embedding e of Q in G, then the mapping μ from vars(Q) to nodes(G) such that μ(X) = e(X) for each variable node X in Q is a variable mapping.

A partial embedding of a query graph Q in a data graph G is a partial mapping e : nodes(Q) → nodes(G) with the following properties:

  1. For each node v ∈ nodes(Q) for which e(v) is defined, if v is not a variable then e(v) = v.

  2. For each triple (s, p, o) ∈ Q for which both e(s) and e(o) are defined, the triple (e(s), p, e(o)) is in G.

In essence, a partial embedding represents a mapping from a subset of the nodes and edges of a query Q to a given data graph G. In other words, partial embeddings represent partial answers to Q, provided that they can be appropriately "combined" with other "compatible" partial embeddings to give "complete answers" (i.e., total embeddings) to the query Q.

Two partial mappings e1 and e2 are said to be compatible if for every node v such that both e1(v) and e2(v) are defined, it holds that e1(v) = e2(v).

Let e1 and e2 be two compatible partial mappings. The join of e1 and e2, denoted as e1 ⋈ e2, is the partial mapping defined as follows: (e1 ⋈ e2)(v) = e1(v) if e1(v) is defined, and (e1 ⋈ e2)(v) = e2(v) otherwise, provided that e2(v) is defined.
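Representing partial embeddings as dictionaries from query nodes to data nodes, compatibility and join take only a few lines; this sketch follows the two definitions above (identifiers are ours, for illustration).

```python
# Illustrative sketch of compatibility and join of partial embeddings:
# two partial mappings are compatible if they agree on every node on which
# both are defined; their join is simply the union of the two mappings.

def compatible(e1, e2):
    """Check that e1 and e2 agree wherever both are defined."""
    return all(e2[v] == x for v, x in e1.items() if v in e2)

def join(e1, e2):
    """Join of two compatible partial mappings (their union as dicts)."""
    assert compatible(e1, e2)
    merged = dict(e1)
    merged.update(e2)
    return merged

e1 = {"?x": ":alice", "?y": ":bob"}
e2 = {"?y": ":bob", "?z": ":carol"}
assert compatible(e1, e2)
assert join(e1, e2) == {"?x": ":alice", "?y": ":bob", "?z": ":carol"}
```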

4 Special forms of queries and query decomposition

4.1 Special forms of queries

We now define two special classes of queries: the generalized star queries (star queries, for short) and the queries with connected variables (var-connected queries, for short).

A query Q is called a generalized star query if there exists a node c ∈ nodes(Q), called the central node of Q and denoted as center(Q), such that for every triple (s, p, o) ∈ Q either s = c or o = c. If the central node of Q is a variable then Q is called a var-centric generalized star query. A var-centric star query is simple if all the nodes adjacent to the central node are constants.

To define the class of var-connected queries, we first define the notion of a generalized path connecting two nodes of a query.

Let Q be a BGP query and v1, vk+1, with v1 ≠ vk+1, be two nodes in nodes(Q). A generalized path between v1 and vk+1 of length k is a sequence of triples t1, ..., tk in Q such that:

  1. there exists a sequence of nodes v1, ..., vk+1 in nodes(Q), where v1, ..., vk+1 are pairwise distinct, and a sequence of predicates p1, ..., pk, not necessarily distinct, and

  2. for each i, with 1 ≤ i ≤ k, either ti = (vi, pi, vi+1) or ti = (vi+1, pi, vi).

The distance between two nodes v and v' in Q is the length of (i.e., the number of triples in) the shortest generalized path between v and v'.

Consider now a BGP query Q with multiple variables (i.e., |vars(Q)| > 1) such that for each pair of variables X, Y of Q there is a generalized path connecting X and Y which does not contain any constant. In such a case, we say that Q is a var-connected query.

Notice that a var-centric query which is not simple is a var-connected query.

4.2 Connected-variable partition of BGP queries

In this section we present a decomposition of BGP queries called the connected-variable partition of a BGP query. The connected-variable partition is a non-redundant decomposition of a BGP query into a set of queries whose form facilitates, as we will see in subsequent sections, their efficient evaluation over a dynamic Linked Data graph.

Let Q be a BGP query and V = vars(Q) be the set of variables in Q. A partition {V1, ..., Vn} of the variables in V is said to be a connected-variable partition of Q if the following hold:

  1. For each Vi and for every two distinct variables X, Y ∈ Vi, there is a generalized path between X and Y in Q whose nodes are variables belonging to Vi.

  2. For each pair Vi, Vj, with i ≠ j, there is no pair of variables X ∈ Vi, Y ∈ Vj such that there is a triple in Q whose subject is one of these variables and whose object is the other.

Let Q be a BGP query, V be the set of variables in Q, and {V1, ..., Vn} be the connected-variable partition of Q. The connected-variable decomposition of Q is a non-redundant decomposition D(Q) of Q containing n non-ground queries Q1, ..., Qn (i.e., queries containing variables), and (possibly) a ground query Qg. These queries are constructed as follows:

  1. For each element Vi, we construct a query Qi containing every triple (s, p, o) ∈ Q such that either s ∈ Vi or o ∈ Vi.

  2. Qg = Q − (Q1 ∪ ... ∪ Qn).

Note that Qg may be empty. In this case Qg is not included in D(Q). Notice also that when, for some i, it holds that |Vi| = 1, then Qi is a simple var-centric star query.
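The partition underlying this construction can be sketched with a simple union-find over the variables of Q: variables that appear together as the subject and object of some triple end up in the same block, and transitivity captures all-variable generalized paths. The sketch below is ours, with illustrative names.

```python
# Illustrative sketch of the connected-variable partition via union-find.
# Variables are strings with a leading "?"; a triple whose subject and object
# are both variables merges their blocks; merges compose transitively, which
# captures generalized paths passing only through variables.

def connected_variable_partition(query):
    parent = {}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    def union(a, b):
        parent[find(a)] = find(b)

    is_var = lambda t: isinstance(t, str) and t.startswith("?")
    for (s, p, o) in query:
        for t in (s, o):
            if is_var(t):
                parent.setdefault(t, t)
        if is_var(s) and is_var(o):
            union(s, o)

    blocks = {}
    for v in parent:
        blocks.setdefault(find(v), set()).add(v)
    return list(blocks.values())

Q = {("?x", ":knows", "?y"), ("?y", ":worksAt", ":acme"),
     ("?z", ":livesIn", ":oslo")}
P = connected_variable_partition(Q)
assert sorted(map(sorted, P)) == [["?x", "?y"], ["?z"]]
```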

Consider the query Q appearing in Fig. 1, together with its set of variables, its connected-variable partition, and the queries of its connected-variable decomposition.

Figure 1: Connected-variable decomposition of a BGP query .

It should be noted that, under certain conditions, i.e., when all members of the connected-variable partition are singletons, all queries in D(Q) containing variables are simple var-centric star queries.

A BGP query Q is said to be loosely-connected if for each pair of distinct variables X, Y in Q, every generalized path between X and Y contains at least one non-variable node.

Let Q be a loosely-connected BGP query and D(Q) be the connected-variable decomposition of Q. Then each non-ground query in D(Q) is a simple var-centric star query.

From Definitions 4.2 and 4.2, we conclude that each member of the connected-variable partition of the set of variables of a loosely-connected BGP query is a singleton. As the queries in D(Q) are obtained by applying Definition 4.2, we see that, by construction, each non-ground query in D(Q) has a variable as its central node, which is either the subject or the object of triples whose object or subject, respectively, is a non-variable. Based on Definition 4.1, we conclude that these queries are simple var-centric star queries.

Consider the loosely-connected BGP query Q appearing in Fig. 2, together with its set of variables, its connected-variable partition, and the queries of its connected-variable decomposition. We can see that each query containing variables is a simple var-centric star query.

Figure 2: Connected-variable decomposition of a loosely connected BGP query .

Let Q be a BGP query and V be the set of variables in Q. Assume that V ≠ ∅ and P is the connected-variable partition of Q. Then, if P is a singleton, we say that Q is a connected-variable query.

5 Continuous pattern matching

The main challenge we investigate in this work is the case where the data graph is continuously changing (a.k.a. a dynamic data graph) through an infinite sequence of updates over the data.

In particular, we initially consider an infinite, ordered sequence S = m1, m2, ..., of update messages, called an update stream, where each mi is of the form (ti, opi, ei) and

  • ti is the time the message is received, with ti < ti+1,

  • opi is an update operation applied over a data graph and takes values from the domain {ins, del}, and

  • ei is the edge to be either inserted or deleted (according to the operation specified by opi).

Note that the operation ins stands for insertion, while del stands for deletion. For example, a message (5, ins, (s, p, o)), which is received at time 5, describes the insertion of the edge (s, p, o) into the data graph. The data graph resulting from applying an update message m = (t, op, e) over a data graph G is the data graph G ∪ {e} if op = ins, and the data graph G − {e} if op = del.

Suppose, now, a data graph G0 at a certain time t0. At time t1, we receive an update message m1 in S. Applying the update on G0, we get an updated graph G1. Similarly, once we receive the update message m2 in S and apply it to G1, we get the graph G2. Continuously applying all the updates that are received through S, we have a data graph that is slightly changing over time. We refer to such a graph as a dynamic graph, and to each data graph at a certain time as a snapshot of it. For simplicity, we denote the snapshot at time t as Gt. Hence, at time tk, the graph snapshot Gk is obtained by successively applying the update messages m1, ..., mk to G0.
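The snapshot semantics above can be sketched directly: each update message carries a timestamp, an operation and a triple, and is applied to the current snapshot in arrival order. The names below are illustrative.

```python
# Illustrative sketch of maintaining the snapshot of a dynamic graph:
# an update message (t, op, edge) either inserts or deletes a single triple.

def apply_update(graph, message):
    """Apply one update message to the snapshot, in place."""
    t, op, triple = message
    if op == "insert":
        graph.add(triple)
    else:  # op == "delete"
        graph.discard(triple)
    return graph

G = set()
stream = [(1, "insert", (":a", ":p", ":b")),
          (2, "insert", (":b", ":p", ":c")),
          (3, "delete", (":a", ":p", ":b"))]
for msg in stream:
    apply_update(G, msg)
assert G == {(":b", ":p", ":c")}
```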

Considering a BGP query Q which is continuously applied on each snapshot of the dynamic graph, we might find different results each time the query is evaluated. In particular, if Gt is the graph snapshot at time t and we receive an insertion message at time t+1, then Gt+1 might contain embeddings of Q that were not included in Gt. In such a case, each such new embedding is a positive embedding. Similarly, if the message received at time t+1 is a deletion, we might have embeddings of Q in Gt that are no longer valid in Gt+1; each such embedding is called a negative embedding. For simplicity, we refer to the set of positive and negative embeddings at time t as the delta embeddings. In the following, we denote the sets of positive, negative and delta embeddings at time t as E+(t), E−(t) and ΔE(t), respectively.

Let us now formally define the problem of continuous pattern matching.

(Problem Definition) Considering a dynamic graph, an update stream S, and a query Q, we want to find, for each time ti, with i ≥ 1, the outputs of all the delta embeddings in ΔE(ti), where

  • ΔE(ti) = E+(ti) ∪ E−(ti),

  • the set of positive embeddings is given as E+(ti) = {e | e is an embedding of Q in the snapshot at time ti but not in the snapshot at time ti−1},

  • the set of negative embeddings is given as E−(ti) = {e | e is an embedding of Q in the snapshot at time ti−1 but not in the snapshot at time ti}.

The following sections are based on the following assumptions:

  • Whenever an update message of the form (t, ins, e) is received, we assume that the edge e is not already in the current snapshot.

  • Whenever an update message of the form (t, del, e) is received, we assume that the edge e is in the current snapshot.
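As a baseline for the problem statement, one can naively re-evaluate the query on consecutive snapshots and diff the answer sets; the algorithms of Section 6 exist precisely to avoid this full re-evaluation. The sketch below uses a toy matcher for a single triple pattern, so it only fixes the intended semantics (all names are ours).

```python
# Naive reference semantics for delta answers: evaluate on both snapshots
# and diff. eval_query is a toy matcher for a single triple pattern only.

def eval_query(pattern, graph):
    s, p, o = pattern
    var = lambda t: t.startswith("?")
    answers = set()
    for (ds, dp, do) in graph:
        if (var(s) or ds == s) and dp == p and (var(o) or do == o):
            # record (variable, value) pairs for each variable position
            answers.add(((s, ds) if var(s) else ())
                        + ((o, do) if var(o) else ()))
    return answers

def delta_answers(pattern, g_prev, g_next):
    """Return (positive, negative) delta answers between two snapshots."""
    prev, nxt = eval_query(pattern, g_prev), eval_query(pattern, g_next)
    return nxt - prev, prev - nxt

g0 = {(":a", ":knows", ":b")}
g1 = g0 | {(":a", ":knows", ":c")}
pos, neg = delta_answers(("?x", ":knows", "?y"), g0, g1)
assert pos == {("?x", ":a", "?y", ":c")} and neg == set()
```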

6 Answering BGP queries over dynamic Linked Data

In this section, we investigate methods for continuously answering BGP queries over dynamic graphs. In particular, we focus on finding the delta embeddings for each message received from an update stream. The sets of embeddings found by each of the algorithms presented in the subsequent sections ensure that, by collecting all the delta embeddings from the beginning of the stream up to time t and applying the corresponding operations (deletions and insertions) in the order they were found, the remaining embeddings describe the answer of the given query over the snapshot at time t. We focus on certain well-used subclasses of BGP queries, defined in the previous sections, which include ground BGP queries, simple var-centric star queries, and loosely-connected BGP queries. In addition, we present a sketch for generalizing our algorithms to connected-variable queries.

6.1 A generic procedure employing the connected-variable partition of a query

Starting our analysis, we initially focus on the intuition behind the evaluation algorithms presented in the subsequent sections. The methodology used to construct the algorithms relies on a proper decomposition of the given query into a set of subqueries. The following lemma shows that, given a connected-variable decomposition of a query, the total embeddings of the subqueries yield a total embedding of the initial query.

Let G be a data graph and Q be a BGP query. Assume that D(Q) = (Q1, ..., Qn) is the connected-variable decomposition of Q. Assume also that e1, ..., en are total embeddings of Q1, ..., Qn, respectively, in G. Then, e1, ..., en are compatible partial embeddings of Q in G and e1 ⋈ ... ⋈ en is a total embedding of Q in G.

From Definitions 3.3 and 3.3, it is easy to see that a total embedding of a subquery of Q is also a partial embedding of Q. Definition 4.2 implies that vars(Qi) ∩ vars(Qj) = ∅ for each pair of queries Qi and Qj, with i ≠ j, in D(Q). Thus, as the queries share no common variables, from Definition 3.3 we conclude that the embeddings e1, ..., en are compatible partial embeddings of Q. Finally, by joining e1, ..., en (as described in Definition 3.3) we get an embedding e of Q in G. This embedding is total, as each node and edge of Q is covered by some Qi, the embeddings e1, ..., en are total embeddings of the corresponding subqueries in G and, as we can easily derive from Definition 4.2, D(Q) covers all nodes and edges of Q.

Let G be a data graph, Q be a BGP query, and D(Q) = (Q1, ..., Qn) be the connected-variable decomposition of Q. Assume that e is a total embedding of Q in G. Then, there are total embeddings e1, ..., en of Q1, ..., Qn, respectively, in G such that e = e1 ⋈ ... ⋈ en.

For each i, with 1 ≤ i ≤ n, we construct the embedding ei obtained by restricting e to the mappings of the nodes and edges in Qi. As, by construction, Qi ⊆ Q and e is a total embedding of Q in G, we conclude that ei is also a total embedding of Qi in G. By construction, e1, ..., en are compatible partial embeddings of Q in G. Besides, Definition 4.2 implies that Q = Q1 ∪ ... ∪ Qn. Therefore, e = e1 ⋈ ... ⋈ en.

Based on Lemma 6.1, the following generic procedure can be used for the evaluation of every BGP query Q:

  1. Q is decomposed by applying Definition 4.2 to obtain its connected-variable decomposition D(Q).

  2. All subqueries in D(Q) are evaluated independently of each other.

  3. The set of answers of the query Q is the cartesian product of the sets of answers of the subqueries in D(Q).

Concerning the second step of the above procedure, it is important to observe that D(Q) may contain the following three different types of queries (subqueries of the query Q), and it is therefore necessary to design algorithms for the evaluation of each query type:

  (a) zero or one ground BGP query,

  (b) a (possibly empty) set of simple var-centric star queries, and

  (c) a (possibly empty) set of connected-variable queries. Notice that, in this type of query, every query triple pattern has at least one variable (as subject or object).

In the following sections, we propose algorithms to evaluate queries belonging to each one of the above query types. However, as an algorithm for the evaluation of queries of type (c) is complex, we present an algorithm that evaluates an intermediate type of queries, namely the loosely-connected queries.

6.2 Evaluating ground BGP queries

The algorithm for the evaluation of a ground BGP query Q is based on the observation that Q is true at the current time whenever, for each triple in Q, an insert message has been received and this triple has not been deleted by a subsequent delete message. To keep track of the triples of Q that are true at time t, we employ an array A of boolean variables, called the triple match state of Q. We also assume an enumeration t1, ..., tn of the triples in Q. A[i] = 1 if the triple ti of Q appears in the current state of the data graph (i.e., an insert message for ti has arrived and ti has not been deleted by a subsequent delete message); otherwise, A[i] = 0. Our algorithm is illustrated in Algorithm 1. Note that the auxiliary functions used in Algorithms 1, 2 and 3 are collectively presented in Algorithm 4.

Procedure eval_ground_BGP_query() Data: : a ground query;
Result: Positive/negative answers;
for  to  do  ; Loop 
       Get the next update message ; R = (, , ); if  or  then
              output : 
              
Procedure process_update_message_on_ground_BGP(, , ) Data: : an update message; : a ground query; : the triple match state of ;
Result: A positive/negative/no_new_embedding message;
if  then
        if  = , for some , with  then
               ; if ground_BGP_eval = true then return ;
else
        if  then
               if  = , for some , with  then
                      if ground_BGP_eval(, ) = true  then
                             ; return
                     else
                            ;
return < ;
Algorithm 1 Algorithm that evaluates ground BGP queries

Notice that the data that Algorithm 1 needs to cache, in order to find any upcoming delta embedding, is minimized to the space required to store the array A.
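The core idea of Algorithm 1 can be condensed as follows: one boolean flag per query triple, with a delta answer emitted only when the conjunction of all flags changes value. This is our own condensed rendering of the idea, not the paper's pseudocode, and the names are illustrative.

```python
# Condensed sketch of the triple-match-state idea behind Algorithm 1.
# A ground query holds exactly when all flags are true; a delta answer is
# emitted only on the transitions not-all-true -> all-true and back.

def make_state(query):
    """One boolean per query triple (the triple match state)."""
    return {t: False for t in query}

def process_message(state, op, triple):
    """Return '+', '-' or None (no new delta answer)."""
    if triple not in state:
        return None                      # triple not in the query: ignore
    was_true = all(state.values())
    state[triple] = (op == "insert")
    is_true = all(state.values())
    if not was_true and is_true:
        return "+"                       # query just became true
    if was_true and not is_true:
        return "-"                       # query just became false
    return None

Q = {(":a", ":p", ":b"), (":b", ":q", ":c")}
st = make_state(Q)
assert process_message(st, "insert", (":a", ":p", ":b")) is None
assert process_message(st, "insert", (":b", ":q", ":c")) == "+"
assert process_message(st, "delete", (":a", ":p", ":b")) == "-"
```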

6.3 Evaluating simple var-centric star queries

The procedure for the evaluation of a simple var-centric star query Q over an input stream is based on the following observation: the central node of Q is a variable X common to all triples in Q (either as the subject or as the object of each triple), while all the other nodes in Q are either URIs or literals. This means that the first edge of the data graph that 'matches' a triple in Q instantiates the variable X to a constant value v. In this way, we get a ground instance of the query whose evaluation can proceed in a similar way as in Subsection 6.2. For the presentation of the evaluation procedure, we consider a list of triples of the form (v, Qv, Av), called the list of ground var-centric instances of Q, where v is an instance of the variable X, Qv is the ground instance of Q obtained by replacing the variable X by v, and Av is the triple match state of Qv. Each time an insert update message results in a new instantiation v of the variable X, a new ground instance of Q is created and a new triple (v, Qv, Av) is added to the list. On the other hand, whenever a delete update message results in the deletion of a triple which is the last true triple in a ground instance Qv, the corresponding triple is removed from the list. The complete algorithm is depicted by Algorithm 2.

Procedure eval_simple_var-centric_star_query() Data: : a var-centric star query;
Result: positive/negative answers;
; Loop 
       Get the next update message ; R = (, , ); if  or  then
              output : 
              
Procedure process_update_message_on_star(, , ) Data: : an update message; : a simple var-centric query; : list of ground var-centric instances of ;
Result: A positive/negative/no_new_embedding message;
if  then
        if  unifies with in , for some , with  then
               let be the value of obtained by this unification; let the instance of obtained by replacing by ; if there is a triple for some  then
                      ; if ground_BGP_eval(, ) = true then return ;
              else
                      for  to  do ; ; ; if  then return ;
else
        if  then
               if  unifies with in , for some , with  then
                      foreach  do
                             if there is a triple such that  then
                                    if ground_BGP_eval(, ) = true then return ; ; foreach  do
                                           if all_zero() = true then remove from ;
return <
Algorithm 2 Algorithm that evaluates simple var-centric star queries

Notice that Algorithm 2 caches only the values instantiating the central node, which typically minimizes the amount of data required in order to find any of the delta embeddings.
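The core idea of Algorithm 2 can be condensed similarly: one triple match state per value instantiating the central variable, created on the first matching edge and discarded when all its flags become false. This is our own condensed rendering, with illustrative names, not the paper's pseudocode.

```python
# Condensed sketch of the list of ground var-centric instances behind
# Algorithm 2: instances maps each value v of the central variable to the
# triple match state of the corresponding ground instance of the query.

def star_process(instances, query, center, op, triple):
    """Process one update message; return a list of ('+'/'-', value) deltas."""
    deltas = []
    for (qs, qp, qo) in query:
        s, p, o = triple
        if p != qp:
            continue
        # unify the central variable with the corresponding data node;
        # all other query terms are constants and must match exactly
        if qs == center and qo == o:
            val = s
        elif qo == center and qs == s:
            val = o
        else:
            continue
        state = instances.setdefault(val, {t: False for t in query})
        was = all(state.values())
        state[(qs, qp, qo)] = (op == "insert")
        now = all(state.values())
        if now and not was:
            deltas.append(("+", val))
        elif was and not now:
            deltas.append(("-", val))
        if op == "delete" and not any(state.values()):
            del instances[val]           # last true triple gone: drop instance
    return deltas

Q = {("?x", ":type", ":Person"), ("?x", ":worksAt", ":acme")}
inst = {}
assert star_process(inst, Q, "?x", "insert", (":alice", ":type", ":Person")) == []
assert star_process(inst, Q, "?x", "insert",
                    (":alice", ":worksAt", ":acme")) == [("+", ":alice")]
```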

Procedure eval_loosely-connected_query() Data: : a loosely connected query;
Result: positive/negative answers;
Let be the connected variable decomposition of ; Let be a partition of s.t. contains the ground queries in and contains the simple var-centric queries in ; for  to  do ; for  to  do
        for  to  do ;
for  to  do ; Loop 
        Get the next update message ; for  to  do
               R = (, , ); if  then
                      ; if  and (i.e. each var centric query in has at least one answer) then
                             foreach tuple in the cartesian product of output patterns of the queries in  do
                                    output : 
                                   
              else if  then
                      if  and  then
                             foreach tuple in the cartesian product of output patterns of the queries in  do
                                    output : 
                                   
                     ;
       for  to  do
               R = (, , ); if  then
                      ; if  and (i.e. each var centric query in has at least one answer) then
                             ; ; foreach tuple in the cartesian product of the sets in  do
                                    output : 
                                   
                            ;
              else if  then
                      if  and (i.e. all queries in have an answer (before the receipt of the message))| then
                             ; ; foreach tuple in the cartesian product of the sets in  do
                                    output : 
                                   
                            ;
                     ;
Algorithm 3 Algorithm that evaluates loosely-connected queries

6.4 Evaluating loosely-connected BGP queries

Let Q be a loosely-connected BGP query and D(Q) be the connected-variable decomposition of Q. From Lemma 4.2, we know that each non-ground query in D(Q) is a simple var-centric star query. Therefore, to evaluate Q it is sufficient to evaluate each query in D(Q) and then compute the cartesian product of the answers of all queries in D(Q) to get the answers of Q. Algorithm 3 presents the continuous evaluation of this type of query. As we can see in the description of this algorithm, the algorithm caches, for each star subquery, all the values instantiating its central node, which typically minimizes the amount of data required to find any of the delta embeddings.
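The combination step of Algorithm 3 can be sketched as follows: when one star subquery gains or loses an answer, the delta answers of the full query are obtained by pairing that value with the current answer sets of the remaining subqueries. This rendering and its names are illustrative.

```python
# Illustrative sketch of propagating a subquery delta to full-query deltas
# via a cartesian product with the other subqueries' current answers.
from itertools import product

def full_query_deltas(sub_answers, changed_index, delta_value, sign):
    """sub_answers: list of answer sets, one per star subquery.
    A '+'/'-' delta with value delta_value on subquery changed_index expands
    to delta answers of the whole query."""
    others = [sorted(a) for i, a in enumerate(sub_answers)
              if i != changed_index]
    deltas = []
    for combo in product(*others):
        ans = list(combo)
        ans.insert(changed_index, delta_value)  # restore subquery order
        deltas.append((sign, tuple(ans)))
    return deltas

answers = [{":alice"}, {":x", ":y"}]
# subquery 0 gains the answer ":bob"
d = full_query_deltas(answers, 0, ":bob", "+")
assert sorted(d) == [("+", (":bob", ":x")), ("+", (":bob", ":y"))]
```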

[Pseudocode of Algorithm 4 omitted: the mathematical symbols were lost in extraction. It defines four helper functions, each of which scans its input and returns false on the first failing entry, and true otherwise: ground_BGP_eval, which checks the triple match state of a ground query; all_zero, which checks whether every counter in the triple match state of a query is zero; a function that, given a list of ground queries and an array of flags indicating which ground queries are true, checks whether all of them hold; and a function that, given a list of simple var-centric queries and an array of sets containing their output values, checks whether every set is non-empty.]

Algorithm 4: Functions used in Algorithms 1, 2 and 3
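The two simplest helpers of Algorithm 4 can be sketched as follows in Python (the function and parameter names are hypothetical, since the symbols were lost in extraction): a ground BGP query currently holds iff every triple pattern is matched by at least one cached triple, and its state can be discarded once no pattern is matched at all.

```python
def ground_bgp_eval(state):
    """A ground BGP query currently holds iff every one of its triple
    patterns is matched; state[i] is the number of cached data triples
    matching the i-th pattern."""
    return all(count > 0 for count in state)

def all_zero(state):
    """True when no pattern of the query is matched, i.e. the cached
    triple match state for this query can be discarded."""
    return all(count == 0 for count in state)
```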

6.5 Evaluating connected-variable queries

In this subsection, we discuss the basic ideas behind an algorithm for evaluating connected-variable queries. As we have seen in Subsection 6.1, in this type of query every triple pattern has at least one variable as its subject or object. This means that the first edge of the data graph (an edge obtained through an insert update message) that matches a triple pattern of the query instantiates all the variables of that pattern, i.e. either a single variable or a pair of variables, with non-variable values. In this way we obtain a (possibly ground) instance of the query. If this instance is a ground, simple var-centric, or loosely-connected query, we evaluate it by applying the algorithm presented in Subsection 6.2, 6.3, or 6.4, respectively. An answer of the instance, combined with the instantiations of the variables of the matched pattern, gives an answer of the original query.
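The instantiation step described above can be sketched as follows; this is an illustrative Python fragment (variables written as strings starting with `?`, queries as lists of subject-predicate-object tuples), not the authors' implementation. Matching one pattern against an incoming triple yields a substitution, which is then applied to every pattern of the query to obtain the (possibly ground) instance query.

```python
def match_pattern(pattern, triple):
    """Try to match one triple pattern (s, p, o), with variables written
    as strings starting with '?', against a concrete data triple.
    Returns a substitution dict, or None on mismatch."""
    subst = {}
    for pat_term, data_term in zip(pattern, triple):
        if isinstance(pat_term, str) and pat_term.startswith('?'):
            # The same variable must not be bound to two different values.
            if subst.setdefault(pat_term, data_term) != data_term:
                return None
        elif pat_term != data_term:
            return None  # ground terms must coincide
    return subst

def instantiate(query, subst):
    """Replace every bound variable in the query's patterns, yielding a
    (possibly ground) instance of the query."""
    return [tuple(subst.get(term, term) for term in pattern)
            for pattern in query]
```

For instance, the arrival of the edge `('alice', 'knows', 'bob')` instantiates both variables of the pattern `('?x', 'knows', '?y')`, turning a connected-variable query into an instance that may fall into one of the simpler classes treated earlier.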

Otherwise, if the instance is not in one of the above forms, we apply Definition 4.2 to it to obtain its connected variable decomposition, and then apply the same evaluation strategy iteratively to all the queries in the decomposition. Notice that, as shown in Lemma 6.1, the set of answers (output patterns) of the instance is the cartesian product of the answers of the queries in its decomposition. As above, an answer obtained in this way is combined with the instantiations of the variables of the matched pattern to give an answer of the original query. An interesting observation is that this iterative decomposition of a query into subqueries, triggered by the instantiation of the variable(s) of a triple pattern, must be recorded, since it has to be undone when a corresponding delete message is received.
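The bookkeeping mentioned above, i.e. remembering which instance queries (and decompositions) were created by which inserted triple, might be sketched as a simple registry; the class and method names here are hypothetical illustrations of the idea, under the assumption that deleting a triple must retract exactly the instances it created.

```python
class InstanceRegistry:
    """Records which query instances were created by which inserted
    triple, so that a later delete of that triple can retract the
    instances and all answers derived from them (hypothetical sketch)."""
    def __init__(self):
        self.by_triple = {}

    def record(self, triple, instance):
        """Remember that `instance` was created by the insertion of `triple`."""
        self.by_triple.setdefault(triple, []).append(instance)

    def retract(self, triple):
        """Return the instances to discard when `triple` is deleted."""
        return self.by_triple.pop(triple, [])
```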

7 Conclusion

In the previous sections, we saw that both ground BGP and simple var-centric star queries can be continuously evaluated while maintaining the minimum amount of data required to efficiently find delta answers. The continuous evaluation of loosely-connected queries combines the evaluation patterns of ground BGP and simple var-centric star queries. Evaluating connected-variable queries, however, is far more complex, since the amount of data that must be maintained is high. As future work, we first plan to improve the continuous evaluation of connected-variable queries and to provide a generic approach for efficiently evaluating every BGP query. In addition, we aim to extend our approach to distributed environments, exploiting the processing power of well-known parallel frameworks.
