Knowledge Engineering for Hybrid Deductive Databases

01/03/2017 ∙ by Dietmar Seipel, et al. ∙ University of Würzburg

Modern knowledge base systems frequently need to combine a collection of databases in different formats: e.g., relational databases, XML databases, rule bases, ontologies, etc. In the deductive database system DDbase, we can manage these different formats of knowledge and reason about them. Even the file systems on different computers can be part of the knowledge base. Often, it is necessary to handle different versions of a knowledge base. E.g., we might want to find out common parts or differences of two versions of a relational database. We will examine the use of abstractions of rule bases by predicate dependency and rule predicate graphs. The proof trees of derived atoms can also help to compare different versions of a rule base. Moreover, it might be possible to have derivations joining rules with other formalisms of knowledge representation. Ontologies have shown their benefits in many applications of intelligent systems, and there have been many proposals for rule languages compatible with the semantic web stack, e.g., SWRL, the semantic web rule language. More recently, ontologies have been used in hybrid systems for specifying the provenance of the different components.

1 Introduction

Relational databases and deductive databases with rule bases have been prominent formalisms of knowledge representation for a long time [3]. A strong research interest in deductive database technology and its applications has reemerged in recent years, leading to what has been called a resurgence [6] or even a renaissance [1] for Datalog. This revival is propelled by important new application areas, such as distributed computations and big–data applications, the success of a first commercial Datalog system, and progress in semantics extensions to support non–monotonic constructs such as default negation and aggregates, a theoretical thread that had actually continued through the years [10].

In recent years, the use of ontologies has shown its benefits in many applications of intelligent systems. There have been many proposals for rule languages compatible with the semantic web stack, e.g., the definition of Swrl (semantic web rule language) originating from RuleML and similar approaches [7]. It is widely agreed that the combination of ontologies with rule–based knowledge is essential for many interesting semantic web tasks, e.g., the realization of semantic web agents and services. Swrl combines a high–level abstract syntax for Horn–like rules with Owl, and a model theoretic semantics is given for the combination of Owl with Swrl rules. An Xml syntax derived from RuleML allows for syntactical compatibility with Owl. However, the increased expressiveness of such ontologies creates new demands on development and maintenance guidelines.

Thus, approaches for evaluating and maintaining hybrid knowledge bases need to be extended and revised to work with different kinds of knowledge including rules, relational databases, Xml documents, and ontologies. In this paper, we are interested in various types of knowledge bases including a collection of relational databases or ontologies, or the hierarchy in a file system. Here, the interaction between the knowledge bases can be very important. E.g., there could be a call from a logical rule to a Bayesian network. The abstraction of knowledge bases is related to the schemas of relational and Xml databases, and to the predicate dependency and rule predicate graphs of deductive databases and rule–based systems. Obviously, w.r.t. versioning, there is a relationship to synchronisation and diff for programs and files. For file systems, well–known characteristics are the size parameters, such as words, lines, characters.

Organization of the Paper.

The rest of this paper is organized as follows: Section 2 shows how rule bases can be abstracted and visualized in DDbase using different types of dependency graphs. Derivations can be explained by proof trees. Section 3 investigates ontologies with rules in Swrl, and it shows how the provenance of ontologies can be modelled and reasoned about. Hybrid knowledge bases can also be queried in combined Prolog statements in DDbase, see Section 4. The paper is concluded with some final remarks.

2 Graphs for Rule Bases

In a deductive database, a logic program can be abstracted by a predicate dependency or a rule predicate graph, and a derivation can be visualized by its proof tree, cf. [3]. It should also be possible to have bottom–up and top–down evaluation in one system. Datalog* can evaluate logic programs with Prolog syntax (extended Datalog programs) in a bottom–up style; it is designed to evaluate embedded Prolog calls in a top–down manner [14].

Dependency Graphs

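We can define two sorts of dependency graphs for a normal logic program $P$. Both reflect which predicates call other predicates in $P$. Let $pred(L)$ be the predicate symbol of a literal $L$. The predicate dependency graph $G^d_P = \langle V^d_P, E^d_P \rangle$ of $P$ is given by:

  • the node set $V^d_P$ is the set of predicate symbols in $P$,

  • $E^d_P$ contains an edge $\langle pred(A), pred(L_i) \rangle$ for every rule $A \leftarrow L_1 \wedge \ldots \wedge L_n$ of $P$ and every $1 \le i \le n$.

The rule predicate graph $G^r_P = \langle V^r_P, E^r_P \rangle$ of $P$ is given by:

  • $V^r_P$ contains a node for every predicate symbol in $P$ and every rule in $P$,

  • $E^r_P$ contains an edge $\langle pred(A), r \rangle$ and an edge $\langle r, pred(L_i) \rangle$ for every rule $r = A \leftarrow L_1 \wedge \ldots \wedge L_n$ of $P$ and every $1 \le i \le n$.

In both graphs, the edges $\langle pred(A), pred(L_i) \rangle$ and $\langle r, pred(L_i) \rangle$, respectively, which come from default negated body literals $L_i = not\ B$, are marked by "not". Figure 1 shows the dependency graphs for a normal rule; the rule predicate graph $G^r_P$ is more refined than the predicate dependency graph $G^d_P$.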
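The two definitions can be sketched in a few lines. The sketch is written in Python purely for illustration (DDbase itself is Prolog–based); the encoding of rules as tuples, the rule name r and the example rule a ← b ∧ not c are assumptions and are not taken from the paper.

```python
# A rule is (head predicate, body); a body literal is (predicate, negated).
# Assumed example rule r: a <- b, not c.
RULES = {"r": ("a", [("b", False), ("c", True)])}

def predicate_dependency_graph(rules):
    """Edges <pred(A), pred(L_i)>; the flag marks default negation."""
    return {(head, pred, neg)
            for head, body in rules.values()
            for pred, neg in body}

def rule_predicate_graph(rules):
    """Edges <pred(A), r> and <r, pred(L_i)>, with one node per rule."""
    edges = set()
    for name, (head, body) in rules.items():
        edges.add((head, name, False))
        for pred, neg in body:
            edges.add((name, pred, neg))
    return edges

pdg = predicate_dependency_graph(RULES)
rpg = rule_predicate_graph(RULES)
print(sorted(pdg))
print(sorted(rpg))
```

Each edge carries a Boolean flag that plays the role of the "not" mark on edges stemming from default negated body literals.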





Figure 1: Predicate Dependency Graph.

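Different definite logic programs can have the same predicate dependency graph but different rule predicate graphs: e.g., the single rule $a \leftarrow b \wedge c$ and the pair of rules $a \leftarrow b$, $a \leftarrow c$ both yield the edges $\langle a, b \rangle$ and $\langle a, c \rangle$ between predicate symbols, but the former has one rule node and the latter has two.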

The predicate dependency graph can be obtained from the rule predicate graph by transitively connecting the head and body predicate symbols and omitting the rule nodes between. Restricted to the predicate symbols, both graphs have the same paths. They are typically used for analysing logic programs of software packages and for refactoring and slicing.
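A minimal Python sketch of this collapsing step follows, under the assumption that rule names never coincide with predicate symbols. The two programs P1 and P2 and the rule names r1 and r2 are illustrative choices, not taken from the paper.

```python
def rule_predicate_edges(rules):
    """Edges head -> rule and rule -> body predicate."""
    edges = set()
    for name, (head, body) in rules.items():
        edges.add((head, name))
        edges.update((name, pred) for pred in body)
    return edges

def collapse(edges, rule_names):
    """Connect head and body predicates directly, dropping rule nodes."""
    heads = {(src, dst) for src, dst in edges if dst in rule_names}
    bodies = {(src, dst) for src, dst in edges if src in rule_names}
    return {(h, p) for h, r in heads for r2, p in bodies if r == r2}

P1 = {"r1": ("a", ["b", "c"])}                 # a <- b, c
P2 = {"r1": ("a", ["b"]), "r2": ("a", ["c"])}  # a <- b.  a <- c.

rpg1, rpg2 = rule_predicate_edges(P1), rule_predicate_edges(P2)
pdg1, pdg2 = collapse(rpg1, set(P1)), collapse(rpg2, set(P2))
print(pdg1 == pdg2, rpg1 == rpg2)  # True False
```

The collapsed graphs coincide although the rule predicate graphs differ, which is the sense in which the rule predicate graph is the finer abstraction.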

Dependency Graphs with Helper Rules

Dependency graphs can help to detect that two logic programs are equivalent apart from helper rules.

Resolving a rule that calls a helper predicate with the corresponding helper rule, using their most general unifier, yields a resolvent that no longer mentions the helper predicate.

Apart from the helper predicate symbol, the set of predicate symbols that is reachable from the head predicate in the predicate dependency or rule predicate graph is the same for the two programs.
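The reachability comparison can be sketched as follows, in Python and on the level of predicate symbols only. The two programs and the predicate names p, q, r, s and h are hypothetical; equal reachable sets are an indication that the programs differ only by helper rules, not a proof of equivalence.

```python
def reachable(rules, start):
    """Predicate symbols reachable from `start` in the dependency graph."""
    seen, todo = set(), [start]
    while todo:
        pred = todo.pop()
        for head, body in rules:
            if head == pred:
                for p in body:
                    if p not in seen:
                        seen.add(p)
                        todo.append(p)
    return seen

# Hypothetical programs: h is a helper predicate abbreviating r, s.
WITH_HELPER = [("p", ["q", "h"]), ("h", ["r", "s"])]
UNFOLDED = [("p", ["q", "r", "s"])]

same = reachable(WITH_HELPER, "p") - {"h"} == reachable(UNFOLDED, "p")
print(same)  # True
```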

Dependency Graphs with Meta–Predicates

In DDbase, we use an extra node for every call of a meta–predicate, so that the predicates called inside it are reflected correctly. The specific difference between normal predicates and meta–predicates in our extension of the rule predicate graph is that there can be several nodes labelled with the same meta–predicate.

The following example contains two such calls, i.e. to the meta–predicates not/1 and findall/3. The predicate ancestor_list/2 derives the list Xs of ancestors of a person X.

ancestor_list(X, [X]) :-
   not(parent(X, _)),
   !.
ancestor_list(X, Xs) :-
   findall( Ys,
      ( parent(X, Y),
        ancestor_list(Y, Ys) ),
      Yss ),
   append(Yss, Xs).

The Prolog program is recursive, since ancestor_list/2 calls itself through findall/3, see Figure 2. After that, the predicate append/2 concatenates the list Yss of derived lists into a flat list Xs.

Figure 2: Rule Predicate Graph with Meta Predicates (node labels: ancestor_list/2, not/1, findall/3, parent/2, append/2).

Obviously, the Prolog program above only terminates for top–down evaluation on acyclic predicates parent/2. The nodes for the meta–predicates are necessary to show that ancestor_list/2 depends on parent/2 and on ancestor_list/2 itself. If we used ordinary predicate dependency or rule predicate graphs, then we would lose this information.
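How calls hidden inside meta–predicates can be collected is sketched below in Python. The encoding of Prolog terms as nested tuples and the table META of goal argument positions are assumptions made for this illustration; they do not describe the DDbase implementation.

```python
# Terms are tuples (functor, arg1, ...); variables are plain strings.
# META maps a meta-predicate to the positions of its goal arguments.
META = {("not", 1): [0], ("findall", 3): [1]}

def calls(goal):
    """Yield the predicates called by a goal, descending into meta-calls."""
    functor, args = goal[0], goal[1:]
    if functor == ",":                      # conjunction of goals
        for sub in args:
            yield from calls(sub)
        return
    yield f"{functor}/{len(args)}"
    for pos in META.get((functor, len(args)), []):
        yield from calls(args[pos])

# Body of the second clause of ancestor_list/2.
body = (",",
        ("findall", "Ys",
         (",", ("parent", "X", "Y"), ("ancestor_list", "Y", "Ys")),
         "Yss"),
        ("append", "Yss", "Xs"))

called = list(calls(body))
print(called)
```

Without the descent into the second argument of findall/3, only findall/3 and append/2 would be reported, and the recursion of ancestor_list/2 would stay invisible.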

Derivations and Proof Trees

We are developing a deductive database system DDbase, which can manage hybrid rule bases and embed Prolog calls into Datalog* rules. The underlying language Datalog* has Prolog syntax; its mixing of bottom–up and top–down evaluation is described in [14]. The following logic program deals with a well–known extension of the route finding problem from deductive databases. In addition to the lengths of derived routes (3rd argument of route/4), we can construct a proof tree (4th argument). An atom prolog:A leads to an embedded top–down call of the goal A in Prolog. The goal (L is N+M) computes the sum L of the path lengths N and M in Prolog. A goal of the form pt(T, ...) constructs a proof tree T.

route(X, Y, L, T) :-
   street(X, Y, L, T1),
   prolog:pt(T, t(route(X, Y, L), e, T1)).
route(X, Y, L, T) :-
   street(X, Z, N, T1), route(Z, Y, M, T2),
   prolog:(L is N+M),
   prolog:pt(T,
      t(route(X, Y, L), r, T1, T2, (L is N+M))).
street('KT', 'Wue', 15, T) :-
   prolog:pt(T, t(street('KT', 'Wue', 15), f1)).
street('Wue', 'Mue', 280, T) :-
   prolog:pt(T, t(street('Wue', 'Mue', 280), f2)).

In DDbase, we have implemented a generalized $T_P$–operator, which evaluates this program bottom–up. The evaluation derives a proof tree in the last argument of the predicates route/4 and street/4. This tree serves as an explanation of the derived atoms, a very useful concept known from expert systems. E.g., the program derives the following atom, whose last argument contains the proof tree; the tree was automatically laid out and visualized in DDbase, see Figure 3.

route(KT, Mue, 295,
   t(route(KT, Mue, 295), r,
      t(street(KT, Wue, 15), f1),
      t(route(Wue, Mue, 280), e,
         t(street(Wue, Mue, 280), f2))))

                      

Figure 3: Proof Tree.
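The bottom–up derivation of routes together with their proof trees can be imitated by a naive fixpoint iteration. The following Python sketch is only an illustration under stated assumptions: it is not the generalized operator of DDbase, it keeps one proof tree per derived fact, it omits the arithmetic goal from the tree, and it terminates only for an acyclic street relation.

```python
STREETS = [("KT", "Wue", 15, "f1"), ("Wue", "Mue", 280, "f2")]

def derive_routes(streets):
    """Naive bottom-up fixpoint; each derived fact maps to a proof tree."""
    street = {(x, y, l): ("t", ("street", x, y, l), f)
              for x, y, l, f in streets}
    # Rule e: every street is a route.
    route = {(x, y, l): ("t", ("route", x, y, l), "e", t)
             for (x, y, l), t in street.items()}
    changed = True
    while changed:
        changed = False
        # Rule r: street(X,Z,N) and route(Z,Y,M) give route(X,Y,N+M).
        for (x, z, n), t1 in street.items():
            for (z2, y, m), t2 in list(route.items()):
                fact = (x, y, n + m)
                if z == z2 and fact not in route:
                    route[fact] = ("t", ("route",) + fact, "r", t1, t2)
                    changed = True
    return route

routes = derive_routes(STREETS)
print(routes[("KT", "Mue", 295)])
```

The nested tuple printed for the route from KT to Mue has the same shape as the derived atom shown above: a node for rule r with the street fact f1 and the sub–route derived by rule e as children.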

3 Ontologies with Rules in Swrl

A hybrid information system can include ontologies of various origins. Before working with such ontologies, and also when designing them, the knowledge engineer has to analyse them and check them for anomalies. In DDbase, we use methods for detecting anomalies in ontologies with rules, such as Swrl ontologies, which we have investigated in [2]. For handling different versions of an ontology, it is also possible to use well–known alignment methods to find out common parts and differences.

3.1 Schema Graph for Xml

Swrl ontologies can be represented in Xml notation:

<swrlx:Ontology swrlx:name="people">
<swrlx:classAtom>
  <owlx:Class owlx:name="person"/>
  <ruleml:var>X</ruleml:var>
</swrlx:classAtom>
<swrlx:classAtom>
  <owlx:IntersectionOf>
    <owlx:Class owlx:name="person"/>
    <owlx:ObjectRestriction owlx:property="parent">
      <owlx:someValuesFrom owlx:class="Physician"/>
    </owlx:ObjectRestriction>
  </owlx:IntersectionOf>
  <ruleml:var>Y</ruleml:var>
</swrlx:classAtom>
<ruleml:imp> ... </ruleml:imp>
</swrlx:Ontology>

In DDbase, a Swrl ontology can be visualized by the schema graph of its Xml representation. Since ontologies – or Xml files in general – can be very large, it is helpful to get a short overview in advance, see Figure 4.

       

Figure 4: Schema Graph for Xml.
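One simple reading of such a schema graph is the set of distinct parent–child pairs of element names. The Python sketch below computes it with the standard library; the sample document is a simplified stand–in for the ontology above, with the namespace prefixes dropped, and this reading of "schema graph" is an assumption rather than the DDbase definition.

```python
import xml.etree.ElementTree as ET

# Simplified sample document (namespace prefixes omitted on purpose).
DOC = """
<Ontology name="people">
  <classAtom>
    <Class name="person"/>
    <var>X</var>
  </classAtom>
  <classAtom>
    <IntersectionOf>
      <Class name="person"/>
    </IntersectionOf>
    <var>Y</var>
  </classAtom>
</Ontology>
"""

def schema_graph(root):
    """Collect the distinct parent-child pairs of element names."""
    edges = set()
    for parent in root.iter():
        for child in parent:
            edges.add((parent.tag, child.tag))
    return edges

edges = schema_graph(ET.fromstring(DOC))
for edge in sorted(edges):
    print(edge)
```

However often an element is repeated in the document, each pair appears only once, which is what keeps the overview short for large files.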

3.2 Rules in Swrl

Swrl ontologies can have rules in RuleMl. E.g., the following rule, which states that the brother of a parent is an uncle, can be represented in RuleMl:

$parent(X,Y) \wedge brother(Y,Z) \Rightarrow uncle(X,Z)$

In the Xml representation, this could look as follows:

<ruleml:imp>
  <ruleml:_body>
    <swrlx:individualPropertyAtom
      swrlx:property="parent">
      <ruleml:var>X</ruleml:var>
      <ruleml:var>Y</ruleml:var>
    </swrlx:individualPropertyAtom>
    <swrlx:individualPropertyAtom
      swrlx:property="brother">
      ...
  </ruleml:_body>
  <ruleml:_head>
    <swrlx:individualPropertyAtom
      swrlx:property="uncle">
      ...
  </ruleml:_head>
</ruleml:imp>

3.3 Syntax and Semantics

The syntax for Swrl in this section abstracts from any exchange syntax for Owl and thus facilitates access to and evaluation of the language. This syntax extends the abstract syntax of Owl described in the Owl Semantics and Abstract Syntax document [11]. This abstract syntax is not particularly readable for rules. Thus, examples will often be given in an informal syntax, which will neither be given an exact syntax nor a mapping to any of the fully–specified syntaxes for Swrl.

The abstract syntax is specified here by means of a version of Extended BNF, very similar to the EBNF notation used for Xml. Terminals are quoted; non-terminals are bold and not quoted. Alternatives are either separated by vertical bars (|) or are given in different productions. Components that can occur at most once are enclosed in square brackets; components that can occur any number of times (including zero) are enclosed in curly braces. Whitespace is ignored in the productions here. Names in the abstract syntax are RDF URI references. The meaning of some constructs in the abstract syntax will be informally described. The formal meaning of these constructs can be defined via an extension of the Owl DL model–theoretic semantics [11].

An Owl ontology in the abstract syntax contains a sequence of axioms and facts. Axioms may be of various kinds, e.g., subClass axioms and equivalentClass axioms. It is proposed to extend this with rule axioms. Similar to what is usual in logic programming, a rule axiom consists of an antecedent (body) and a consequent (head), each of which consists of a (possibly empty) set of atoms. Antecedents and consequents are treated as the conjunctions of their atoms.

rule ::= 'Implies(' { annotation }
          antecedent consequent ')'
antecedent ::= 'Antecedent(' { atom } ')'
consequent ::= 'Consequent(' { atom } ')'

Rules with an empty antecedent can be used to provide unconditional facts; however such unconditional facts are better stated in Owl itself, i.e., without the use of the rule construct. Rules with conjunctive consequents could easily be transformed – via the Lloyd–Topor transformations [8] – into multiple rules each with an atomic consequent. Atoms can be of the form c(X), p(X,Y), same_as(X,Y) or different_from(X,Y), where c is an Owl description, p is an Owl property, and X,Y are either variables, Owl individuals or Owl data values. In the context of Owl Lite, descriptions in atoms of the form c(X) may be restricted to class names. Informally,

  • an atom c(X) holds, if X is an instance of the class description c,

  • an atom p(X,Y) holds, if X is related to Y by property p,

  • an atom same_as(X,Y) holds, if X is interpreted as the same object as Y, and

  • an atom different_from(X,Y) holds if X and Y are interpreted as different objects.

The latter two forms can be seen as syntactic sugar: they are convenient, but do not increase the expressive power of the language. Atoms may refer to individuals, data literals, individual variables or data variables. As in Prolog or Datalog, variables are treated as universally quantified, with their scope limited to a given rule. Only variables occurring in the antecedent of a rule may occur in the consequent (range–restrictedness). This condition does not, in fact, restrict the expressive power of the language, because existentials can already be captured in Owl using someValuesFrom restrictions.
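Both the range–restrictedness condition and the splitting of conjunctive consequents are easy to check mechanically. The following Python sketch assumes that atoms are encoded as tuples (predicate, arguments...) and that variables are strings starting with a capital letter; the predicate relative is hypothetical.

```python
def variables(atom):
    """Arguments starting with a capital letter are variables (Prolog style)."""
    return {arg for arg in atom[1:] if arg[:1].isupper()}

def is_range_restricted(antecedent, consequent):
    """True if every consequent variable also occurs in the antecedent."""
    body_vars = set().union(*(variables(a) for a in antecedent))
    head_vars = set().union(*(variables(a) for a in consequent))
    return head_vars <= body_vars

def split_consequent(antecedent, consequent):
    """Replace a conjunctive consequent by one rule per consequent atom."""
    return [(antecedent, [atom]) for atom in consequent]

body = [("parent", "X", "Y"), ("brother", "Y", "Z")]
head = [("uncle", "X", "Z"), ("relative", "X", "Z")]

print(is_range_restricted(body, head))
print(is_range_restricted([("person", "X")], [("parent", "X", "Y")]))
print(len(split_consequent(body, head)))
```

The second call is rejected because Y occurs only in the consequent; in Owl such an existential would be stated with a someValuesFrom restriction instead.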

While the abstract EBNF syntax is consistent with the Owl specification, and is useful for defining Xml and RDF serialisations, it is rather verbose and not particularly easy to read. Deductive databases therefore often use a relatively informal human readable form. In this syntax, a rule has the form $antecedent \Rightarrow consequent$, where the antecedent is a conjunction of atoms and the consequent is a single atom. In standard convention, variables are indicated by prefixing them with a question mark; in this paper, however, we represent them in Prolog convention with strings starting with a capital letter. Then, a rule asserting that the composition of parent and brother properties implies the uncle property would be written as

$parent(X,Y) \wedge brother(Y,Z) \Rightarrow uncle(X,Z)$

An even simpler rule would be to assert that students are persons:

$Student(X) \Rightarrow Person(X)$

However, this kind of use for rules in Owl just duplicates the Owl subclass facility. It is logically equivalent to write instead Class(Student partial Person) or SubClassOf(Student Person) which would make the information directly available to an Owl reasoner. A very common use for rules is to move property values from one individual to a related individual.
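How a rule such as the uncle rule moves information between related individuals can be sketched by a single forward–chaining step. The Python below is an illustration only; the individuals ann, bob and carl are hypothetical.

```python
# Facts as (property, subject, object) triples; the names are hypothetical.
FACTS = {("parent", "ann", "bob"), ("brother", "bob", "carl")}

def apply_uncle_rule(facts):
    """parent(X,Y) and brother(Y,Z) imply uncle(X,Z)."""
    derived = {("uncle", x, z)
               for (p, x, y) in facts if p == "parent"
               for (q, y2, z) in facts if q == "brother" and y2 == y}
    return facts | derived

result = apply_uncle_rule(FACTS)
print(sorted(result))
```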

3.4 Hybrid Information Systems

For collaborations across disciplines, hybrid information systems that use data and techniques from many different sources, with no preexisting agreement about the semantics of the processes or data, are important. The infrastructure must provide general purpose mechanisms for annotating (i.e., making assertions about), discovering, and reasoning about processes and data. Some of the inferences require additional reasoning beyond that supported by Owl and Swrl. Graphs such as the provenance graph are also very useful here for representing causal relationships. The Open Provenance Model (OPM) defines logical constraints on the provenance graph [9]. Some constraints cannot be expressed in Owl, but can be expressed based on Swrl rules.

PROV is a specification that provides a vocabulary to interchange provenance information. It defines a core data model for the interchange of provenance on the web; it allows for building representations of the entities, people and processes involved in producing a piece of data in the world. The provenance of digital objects represents their origins; the records of a PROV specification can describe the entities and activities involved in producing and delivering or otherwise influencing a given object. Provenance can be used for many purposes, such as understanding how data was collected so it can be meaningfully used, determining ownership and rights over an object, making judgements about information to determine whether to trust it, verifying that the process and steps used to obtain a result comply with given requirements, and reproducing how something was generated.

The Open Provenance Model (OPM) provides a case study for how to use Semantic Web technology and rules to implement semantic metadata. [9] discusses a binding of the OPM written in Owl with rules written in Swrl. This allows for the development of hybrid systems that use Owl, Swrl, and other semantic software, interoperating through a shared space of RDF triples. E.g., four relations between two artifacts (e.g., input and output datasets) can be inferred from the edges that lead from and to the same process.

But several of the key constraints and inferences of the OPM cannot be expressed in Owl and Swrl, due to fundamental limits of the semantics of these languages. E.g., it is not possible to modify the value of an asserted property, or to write a rule discovering the number of times an artifact is used, or to detect a cycle in the provenance graph. Storing the OPM records in triples makes it possible to use other reasoning engines or languages such as Prolog or Datalog to implement queries or inferences.
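Once the provenance records are available as triples, the two queries mentioned above (counting how often an artifact is used, and detecting a cycle) are straightforward in a general purpose language. The following Python sketch uses the OPM edge names used and wasGeneratedBy; the identifiers p1, p2, a1 and a2 and the sample triples are hypothetical.

```python
from collections import Counter

# Hypothetical provenance triples (subject, property, object).
TRIPLES = [
    ("p1", "used", "a1"),
    ("a2", "wasGeneratedBy", "p1"),
    ("p2", "used", "a1"),
    ("p2", "used", "a2"),
]

def usage_counts(triples):
    """Number of 'used' edges pointing at each artifact."""
    return Counter(o for s, p, o in triples if p == "used")

def has_cycle(triples):
    """Depth-first search for a cycle in the provenance graph."""
    graph = {}
    for s, _, o in triples:
        graph.setdefault(s, set()).add(o)
    state = {}  # node -> "open" (on the DFS stack) or "done"

    def visit(node):
        if state.get(node) == "open":
            return True
        if state.get(node) == "done":
            return False
        state[node] = "open"
        found = any(visit(n) for n in graph.get(node, ()))
        state[node] = "done"
        return found

    return any(visit(n) for n in list(graph))

print(usage_counts(TRIPLES)["a1"])
print(has_cycle(TRIPLES))
print(has_cycle(TRIPLES + [("a1", "wasGeneratedBy", "p2")]))
```

The last call adds an edge that makes a1 depend on a process which itself uses a1, so the provenance graph is no longer acyclic.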

Owl and Swrl’s Rdf representations provide a simple and well–understood means of exchanging provenance information with other tools, such as Rdf databases and declarative programming languages. This hybrid system shows that Semantic Web technologies are not only useful for provenance information but also provide a base level of interoperability that can enable loosely–coupled tools with varying levels of capability and expressiveness.

4 Querying Hybrid Knowledge Bases in DDbase

Hybrid knowledge bases can be managed and queried using logic programming techniques. In the deductive database system DDbase, various representations of knowledge can be accessed, see Figure 5. The database query language Datalog* is an extension of Datalog [14], where logic programs in Prolog syntax are evaluated bottom–up, and embedded calls to Prolog are evaluated top–down. Xml data from Xml databases or documents can be stored in a term representation; calls to Xml data based on path expressions are evaluated in Prolog using the query, transformation and update language FnQuery [13].







Figure 5: Hybrid Knowledge Bases in DDbase (data sources: relational database, Xml and Owl documents, Excel (.csv), text file; DB query languages: Sql, Datalog; programming languages: Java, Prolog).

In DDbase, we can compute joins of relational databases and Xml documents in Prolog. The following example is a modified version of the well–known example from the book of Elmasri and Navathe [4]. The atoms for employee/7 could, e.g., be derived using Odbc from a relational database; they could also stem from an ontology; SWI–Prolog [15] offers a loader producing Rdf triples, which can be transformed to Prolog facts easily.

% employee(Name, SSN, Date, SEX, Salary, Super_SSN, DNO)
employee('Borg', 11, date(1927,11,10), 'M', 55000, null, 1).
employee('Wong', 22, date(1945,12,08), 'M', 40000, 11, 5).
employee('Wallace', 33, date(1931,6,20), 'F', 43000, 11, 4).
employee('Smith', 44, date(1955,1,09), 'M', 30000, 22, 5).
...

Additionally, we work with the following Xml version of the table works_on/3; a row represents an employee ESSN working on a project PNO a number of HOURS.

<table name="works_on">
   <row ESSN="11" PNO="20" HOURS="NULL"/>
   <row ESSN="22" PNO="2" HOURS="10.0"/>
   ...
</table>

The following query joins the atoms for employee/7 with the rows in the Xml document works_on.xml in DDbase. The attribute value H of the attribute 'HOURS' of Row is an atom that has to be converted to a number HOURS. Clearly, atom_number/2 fails for H = 'NULL'; this is desired, since Sql also ignores null values in aggregations. The handling of path expressions applied to Xml documents has been described in [13]. The template [DNO, sum(HOURS)] leads to a grouping on the department numbers. For every DNO, the list Xs of all corresponding HOURS is computed, and the sum by sum(Xs, Sum); thus, we obtain a standard result tuple [DNO, Sum].

?- ddbase_aggregate( [DNO, sum(HOURS)],
      ( employee(_, SSN, _,_,_,_, DNO),
        Row := doc('works_on.xml')/row::[@'ESSN'=SSN],
        H := Row@'HOURS', atom_number(H, HOURS) ),
      Tuples ).
Tuples = [[1, 0.0], [4, 115.5], [5, 140.0]]

A query optimizer of DDbase could rearrange the Goal in the second argument of the Prolog atom for ddbase_aggregate/3 by changing the order of the calls to the predicate employee/7 provided by Odbc and the Xml document works_on.xml. It might be best to first completely load the table Employee from the relational database to Prolog using Odbc and to index it on the second argument position, which holds the SSN. Then, in a single pass through the Xml document, the working hours can be obtained using a path expression in FnQuery from the Xml rows, and the corresponding department numbers of the employees can be obtained using the index.
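The evaluation strategy just described (index on SSN, then a single pass over the Xml rows) can be sketched in Python. This is an illustration under stated assumptions, not the DDbase optimizer: the employee tuples are reduced to name, SSN and DNO, and the two rows for ESSN 44 are hypothetical additions. Unlike the query result shown above, the sketch emits no tuple for a department all of whose rows have NULL hours.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# (Name, SSN, DNO); the other columns are omitted in this sketch.
EMPLOYEES = [("Borg", 11, 1), ("Wong", 22, 5),
             ("Wallace", 33, 4), ("Smith", 44, 5)]

# The rows for ESSN 44 are hypothetical.
WORKS_ON = """
<table name="works_on">
  <row ESSN="11" PNO="20" HOURS="NULL"/>
  <row ESSN="22" PNO="2" HOURS="10.0"/>
  <row ESSN="44" PNO="1" HOURS="32.5"/>
  <row ESSN="44" PNO="2" HOURS="7.5"/>
</table>
"""

def hours_per_department(employees, xml_text):
    """Group the working hours by department number and sum them."""
    dno_by_ssn = {ssn: dno for _, ssn, dno in employees}  # index on SSN
    totals = defaultdict(float)
    for row in ET.fromstring(xml_text).iter("row"):       # single pass
        try:
            hours = float(row.get("HOURS"))
        except ValueError:                                # "NULL" is skipped
            continue
        dno = dno_by_ssn.get(int(row.get("ESSN")))
        if dno is not None:
            totals[dno] += hours
    return sorted([dno, total] for dno, total in totals.items())

tuples = hours_per_department(EMPLOYEES, WORKS_ON)
print(tuples)  # [[5, 50.0]]
```

The dictionary plays the role of the index on the SSN argument, so each Xml row is matched with its employee in constant time instead of by a nested scan.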

5 Final Remarks

We have developed a Prolog–based deductive database system DDbase for hybrid knowledge bases. Knowledge sources with different types of knowledge representation – including relational databases, Xml, Swrl knowledge bases – can be managed in DDbase.

The transition from relational databases to deductive databases brings in recursive rules, and thus generalizes the concept of views. Swrl ontologies add further ideas from artificial intelligence; ontologies can be augmented by rules to enhance expressiveness. PROV uses Swrl to model the provenance of digital data.

In DDbase, we are building a system for handling hybrid queries in a deductive database. A query optimizer should extend relational systems, and it should be able to handle relational data and rules together with Xml data and ontologies.

References

  • [1] S. Abiteboul. Datalog: La renaissance. http://www.college-de-france.fr/site/serge-abiteboul/course-2012-05-09-10h00.htm.
  • [2] J. Baumeister, D. Seipel: Anomalies in Ontologies with Rules. Journal of Web Semantics: Science, Services and Agents on the World Wide Web 8 (2010), No. 1, pp. 55–68, doi:http://dx.doi.org/10.1016/j.websem.2009.12.003.
  • [3] S. Ceri, G. Gottlob, L. Tanca, Logic Programming and Databases, Springer, Berlin, 1990.
  • [4] R. Elmasri, S.B. Navathe: Fundamentals of Database Systems. 7th Edition, Benjamin Cummings, 2015.
  • [5] B. Grosof, I. Horrocks, R. Volz, S. Decker. Description Logic Programs: Combining Logic Programs with Description Logic, in Proc. WWW 2003.
  • [6] J.M. Hellerstein, Datalog Redux: Experience and Conjecture. PODS, 2010, doi:http://dx.doi.org/10.1145/1807085.1807087.
  • [7] I. Horrocks, P.F. Patel–Schneider, S. Bechhofer, D. Tsarkov, Owl Rules: A Proposal and Prototype Implementation, Journal of Web Semantics 3 (1) (2005), pp. 23–40, doi:http://dx.doi.org/10.1016/j.websem.2005.05.003.
  • [8] J.W. Lloyd, Foundations of Logic Programming, 2nd Edition, Springer, 1987, doi:http://dx.doi.org/10.1007/978-3-642-83189-8.
  • [9] R.E. McGrath, J. Futrelle, Reasoning About Provenance with Owl and Swrl Rules, AAAI Spring Symposium, 2008.
  • [10] J. Minker, D. Seipel, C. Zaniolo: Logic and Databases: History of Deductive Databases, in the Handbook of the History of Logic, Volume 9, Computational Logic, 2014.
  • [11] P.F. Patel–Schneider, P. Hayes, I. Horrocks, eds. Owl Web Ontology Language Semantics and Abstract Syntax.
  • [12] A. Preece, R. Shinghal, A. Batarekh, Principles and Practice in Verifying Rule–Based Systems, The Knowledge Engineering Review 7 (2) (1992), pp. 115–141, doi:http://dx.doi.org/10.1017/S026988890000624X.
  • [13] D. Seipel: Processing Xml Documents in Prolog. Proc. 17th Workshop on Logic Programming (WLP 2002).
  • [14] D. Seipel: Practical Applications of Extended Deductive Databases in Datalog*. Proc. 23rd Workshop on Logic Programming (WLP 2009).
  • [15] J. Wielemaker, An Overview of the SWI–Prolog Programming Environment, in: Proc. of the 13th International Workshop on Logic Programming Environments (WLPE), 2003.
  • [16] W3C, RDF 1.1 Turtle – W3C Recommendation, http://www.w3.org/TR/turtle/ (February 2014).