Ontology Design Facilitating Wikibase Integration – and a Worked Example for Historical Data

Wikibase – which is the software underlying Wikidata – is a powerful platform for knowledge graph creation and management. However, it has been developed with a crowd-sourced knowledge graph creation scenario in mind, which in particular means that it has not been designed for use case scenarios in which a tightly controlled high-quality schema, in the form of an ontology, is to be imposed, and indeed, independently developed ontologies do not necessarily map seamlessly to the Wikibase approach. In this paper, we provide the key ingredients needed in order to combine traditional ontology modeling with use of the Wikibase platform, namely a set of axiom patterns that bridge the paradigm gap, together with usage instructions and a worked example for historical data.


1 Introduction

When developing a knowledge graph, there are many aspects to consider during its deployment. These include the usability of its interfaces (both human and programmatic), the (re)usability of the data it contains, its accessibility (both in terms of uptime and of its interfaces), its transparency (relating to provenance and trustworthiness), and its persistence (preventing link rot). These characteristics are neatly summarized in the FAIR manifesto Wilkinson et al. (2016). One way of achieving (re)usability of the data is the principled use (i.e., following a structured development methodology) of a schema that describes and documents the relations between concepts in the knowledge graph. With respect to accessibility and persistence, one can consider exposing a SPARQL endpoint and allowing interested parties to query against it. While this is a very flexible approach, it makes the data difficult to explore. Alternatively, one could expose the data through a framework such as Wikibase, which also provides a browsable web interface. In this paper, we explore how the Modular Ontology Modeling methodology (MOMo Shimizu et al. (2021)) can be applied in such a way that the eventual deployment of the graph data to the Wikibase model is seamless.

Modular Ontology Modeling (MOMo) specifies the development of ontology modules for sets of tightly bound key notions that will be included in a given ontology Shimizu et al. (2021). (We focus on the MOMo paradigm because it is closely aligned with our use case, but any pattern-based methodology, such as eXtreme Design Presutti et al. (2009), would work similarly.) When developing a module, it is generally recommended to identify applicable ontology design patterns (ODPs) Gangemi and Presutti (2009) and adapt them to the use case at hand through template-based instantiation Hammar and Presutti (2016). During this process it is good practice to consult existing collections of ODPs, such as those on the ODP Community Wiki (https://ontologydesignpatterns.org/) or in the MODL library Shimizu et al. (2019b).

Wikidata (https://www.wikidata.org/) is one of the largest publicly editable and accessible knowledge bases: an immense, crowd-sourced resource with persistent data that is available for public use and consumption. It contains millions of facts from many different domains, and it is growing constantly. In addition, it serves as the structured data hub for all of Wikimedia’s projects (e.g., Wikipedia, Wikivoyage, Wiktionary, and Wikisource). As such, when modeling an ontology, it makes sense to consider ease of integration with resources like Wikidata, among other Linked Data platforms. Wikibase is the software underlying Wikidata, and it can be used separately from Wikidata for knowledge graph creation and management.

The Enslaved Ontology Shimizu et al. (2020) was modeled using the nascent MOMo methodology and serves as the schema for the knowledge graph that underlies the publicly available knowledge base on the Enslaved Hub (https://enslaved.org/). The Enslaved Hub is an innovative and compelling centralized location for engaging with historical slave trade data from a variety of sources, and it is supported by an underlying installation of the Wikibase platform. During development, we had assumed that it would be relatively easy to adapt the modular Enslaved Ontology to the Wikibase model. Unfortunately, because of differing semantics for validating the data to be put into the knowledge base and an unclear mapping between different notions of provenance, this was not as straightforward as we had expected. The resulting realization of the knowledge graph in Wikibase is conceptually close to, but still markedly different from, the designed Enslaved Ontology Zhou et al. (2020). However, during the course of that work we realized how the gap between ODP-based ontology modeling and the use of Wikibase can be closed, namely by basing the ontology design on new ODPs that are developed specifically for seamless use with Wikibase. The result of our subsequent work is what we present in this paper.

In this paper, we present a library of ontology design patterns that have been specifically engineered to explicitly represent how Wikibase models data “under the hood,” thus ensuring that the ontology is optimally structured for interoperability with Wikibase. Any organization can use this library to model its own internal and proprietary knowledge graphs, and can use the resulting alignment with Wikibase as a tool to augment its knowledge graph or to induce new information into it. To the best of our knowledge, this paper provides the first ODP library that bridges traditional ontology engineering and the use of the Wikibase platform.

The rest of this paper is organized as follows. In Section 2 we describe relevant aspects of Wikibase and how they give rise to mismatches with traditional ontology modeling. In Section 3 we describe our ODP library and how it addresses these mismatches, and in Section 4 we specify axioms over the conceptual diagrams. In Section 5 we provide a case study: a reconstructed Enslaved Ontology. Section 6 discusses related work before we conclude in Section 7.

This paper substantially extends work previously presented at a workshop Eells et al. (2021). Our pattern library is publicly available from our online portal (https://gitlab.cs.ksu.edu/daselab/wikibase-ontology-design-library).

2 Background & Motivation

Figure 1: Wikibase RDF export schematic.

2.1 The Wikibase Model

Briefly, the Wikibase RDF export model (Figure 1) uses reification to attach qualifiers and references (for provenance) to assertional statements. Reification is the practice of turning a property assertion into an instance, so that additional assertions can be attached to it. A common use is to attach a temporal scope to a role; for example, an employee is employed at a company from 2009 to 2014. In Wikibase terminology, this temporal scope would be considered a qualifier. The supporting documentation, perhaps digital tax records, could be considered a reference for this information. (Figure 1 is a redrawing of the figure at https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format, where more information can be found; further details are in Erxleben et al. (2014).)

Figure 2: A closer look at the reification of wdt:R.

Wikibase streamlines how this reification is carried out. The assertion is hashed, and the hash is turned into an instance of wikibase:Statement. The property name is then reused in two different namespaces (p: and ps:) to complete the reification. To continue our example, the assertion would be serialized in RDF as follows (for readability, we do not use the internal Wikibase identifiers QXXXX and PXXXX in this example; the human-readable representations found in the interface come from rdfs:label values):

ex:employee0 p:hasJob s:12345 .
s:12345 ps:hasJob ex:job0 .
ex:employee0 wdt:hasJob ex:job0 .

where s:12345 is the hash node created by the system. This hashing is diagrammatically shown in Figure 2. In Figure 1 the corresponding nodes are Item, Statement and the simple value; note the use of prefixes. It will become clear from our discussion below in Section 3 how exactly this RDF export is produced.
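To make the namespace layout concrete, the following minimal Python sketch takes one assertion and emits the three triples above, plus the rdf:type triple that marks the hash node as a wikibase:Statement. The helper name `reify` and the naming scheme for the statement node (a truncated SHA-1 of the triple) are our own assumptions for illustration; Wikibase's actual identifier generation is not reproduced here.

```python
import hashlib


def reify(subject: str, prop: str, obj: str) -> list[tuple[str, str, str]]:
    """Lay out one assertion the way the Wikibase RDF export does."""
    # Hypothetical node naming: first 8 hex digits of a SHA-1 over the triple.
    digest = hashlib.sha1(f"{subject} {prop} {obj}".encode("utf-8")).hexdigest()[:8]
    stmt = f"s:{digest}"
    return [
        (subject, f"p:{prop}", stmt),             # item -> statement (hash) node
        (stmt, "rdf:type", "wikibase:Statement"),  # the hash node is a Statement
        (stmt, f"ps:{prop}", obj),                 # statement node -> value
        (subject, f"wdt:{prop}", obj),             # direct shortcut triple
    ]


for s, p, o in reify("ex:employee0", "hasJob", "ex:job0"):
    print(s, p, o, ".")
```

Qualifiers and references would then be attached to the returned statement node, never to the wdt: triple.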

2.2 Paradigmatic Conflicts

The Wikibase approach to creating an RDF graph puts limitations on the graph structures that can be created through Wikibase. Some of these restrictions are mild in terms of graph modeling; others, however, forbid graph structures that are sometimes desirable. These restrictions stem primarily from the fact that the Wikibase approach constrains what can be stated about the reification nodes (i.e., the hashes). We give some examples from the Enslaved Ontology.

Figure 3: Enslaved Ontology: AgeRecord representation.

Figure 3 depicts part of a schema diagram for the AgeRecord module in the Enslaved Ontology. The color and shape coding can mostly be ignored for this discussion: boxes represent classes (concepts), ovals represent datatype values, and arrows represent binary relationships (properties). Conceptually, the module represents information about the age of an Agent (person) at a specific point in time. The provenance information (the isDirectlyBasedOn relation) records the origin of the data in the record. Clearly, the elements in the lower row (light blue) are references or qualifiers (in Wikibase terminology).

The upper left of the diagram is what presents the difficulty. The schema reflects the fact that some sources in this application context report age as a number, while others report age categories, such as child or infant, which in this application are drawn from a controlled vocabulary. From a (faithful) data integration perspective, the goal in this case was to make it possible to record either or both types of information.

Now, if we take the light blue boxes as qualifiers and references, then the AgeRecord node would be the hash node in the Wikibase system. However, we would not be able to attach both the numeric value and the controlled vocabulary term to the hash node on an equal footing. The Wikibase approach would force us to pick one of them (e.g., the controlled vocabulary) as the primary relation for the age, while the other (the number, in this case) would become another qualifier. While this is possible, it is arguably not a faithful representation of the conceptual model, which does not (and, in this case, should not) prefer one way of recording age over the other.
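The forced choice can be seen in a minimal Python sketch of the statement shape that Wikibase imposes: each statement carries exactly one primary value, and everything else becomes a flat qualifier or reference. The `Statement` class and the property names `hasAgeCategory` and `hasAgeValue` are hypothetical stand-ins, not the names used in the Enslaved Ontology.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    """A Wikibase-style statement: one primary value plus flat qualifiers."""
    subject: str
    prop: str
    value: str  # the single primary value
    qualifiers: dict[str, list[str]] = field(default_factory=dict)
    references: list[dict[str, str]] = field(default_factory=list)

    def qualify(self, qual_prop: str, qual_value: str) -> None:
        # Qualifier values are plain terms; they cannot carry qualifiers of their own.
        self.qualifiers.setdefault(qual_prop, []).append(qual_value)


# Hypothetical names: the age category is chosen as the primary relation,
# so the numeric age has to be demoted to a qualifier.
age = Statement("ex:person1", "hasAgeCategory", "ex:Child")
age.qualify("hasAgeValue", "9")
age.references.append({"wasDerivedFrom": "ex:source1"})
```

Choosing `hasAgeValue` as the primary relation would simply swap the two roles; either way, one way of recording age is privileged over the other.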

Figure 4: Enslaved Ontology: ParticipantRoleRecord representation.

Figure 4 shows another Enslaved schema diagram, in this case for historical records of participation in an event. The participant role type (intended for use with a controlled vocabulary) indicates the role of the participant, e.g., the role of captain in a slave transport voyage. As before, the light blue boxes are most naturally taken as qualifiers or references. And again, as before, we are left with a decision as to the main relationship, and different arguments can be made. For example, from an agent-centric perspective, one could argue that the relation between Agent and Event is primary, in which case the type of participation would be relegated to a qualifier. However, one may also argue that the type of participation (e.g., whether as enslaved person or buyer in a sale) is of most interest in some situations, in which case the relation between Agent and the participation type is primary, and the event is modeled as a qualifier. While either is possible in principle, the Wikibase model forces us to pick a primary relationship, whereas the original intention for the ontology was to remain impartial in this respect.

Another complication arises from the ontology's decision, in this particular case, to include an inverse role between Event and ParticipantRoleRecord. Since ParticipantRoleRecord will end up being the hash node, this cannot be done in Wikibase: we are forced to use a single direction, here the roleProvidedBy relation, because we have limited control over the direction of relations involving hash nodes.

Figure 5: Enslaved Ontology: InterAgentRelationshipRecord representation.

The situation depicted in Figure 5 is very similar to the one just discussed; here, it concerns recording relationships between agents (persons). For Wikibase, we are forced to decide whether the agent-agent relation is primary, or whether the type of relationship (as viewed from one of the agents) is primary. In addition, there are several options for resolving the ontology's choice to have three relations between agents and the (reified) record; in particular, the symmetry between isRelationshipFrom and isRelationshipTo in the ontology cannot be maintained.

Figure 6: Enslaved Ontology: NameRecord representation.

Figure 6 is an example of a different type: its intended use is recording names and name variants of persons. The most natural candidates for qualifiers and references are again the blue boxes, while the most natural candidate for the primary relation is probably the one between Agent and the string. However, we will have to pick one of the three relations between NameVariant and the string as the primary relation and relegate the other two to qualifiers. Moreover, we will not be able to have two nodes between Agent and the string, as Wikibase produces only one hash node. The natural grouping that the ontology provides is thus lost, unless one takes the relation between NameRecord and the string as the primary relation, which does not come naturally from a modeling perspective. Another attempt to force this into the Wikibase RDF model may be to make name variants qualifiers of the hash node; however, since we cannot attach qualifiers to qualifier relations (or to reference relations), this would prevent us from having, for example, provenance information for name variants.

Regarding axiomatization of an underlying ontology, the Wikibase approach also imposes some restrictions, and we will discuss these issues in more detail later in the paper.

Another way of describing the Wikibase model succinctly would be to use a property graph Hartig (2017), for example through RDF* and SPARQL*. However, there is currently no method for axiomatically describing such graphs. (In our search we did find a corresponding OWL*, https://github.com/cmungall/owlstar; however, it is still in a draft prototype phase.)

It may need emphasizing that we list the above limitations of the Wikibase approach not in order to find fault in it. Rather, the RDF export realization appears to be a very clever use of namespaces for the serialization of reification. Most importantly, though, the Wikibase approach was put in place for a very clearly defined use case, namely to support the crowd-sourced development of Wikidata. As such, it restricts the freedom of the user in choosing graph structure in both explicit and implicit ways, by prompting the user to think (and structure the data) in terms of primary relations with attached qualifiers and references. This appears to be a very natural and, in many cases, very adequate approach. However, these advantages also come with the disadvantage of a certain loss of flexibility, in particular pertaining to the representation of more strongly structured and more fine-grained data.

Now, in application cases where stronger demands are made on graph structure, as the examples above show, we cannot easily transfer an ontology-based schema into the Wikibase format. But rather than having to choose between an ontology modeling approach and the use of Wikibase, in this paper we show how the two can work together more seamlessly, by providing ontology design patterns that capture, and thus cater for, the restrictions imposed by Wikibase.

3 Generic Wikibase Patterns

There are two core deliverables in this manuscript. The first is a library of conceptual design patterns that visually represent a paradigm for developing a schema that aligns immediately with the Wikibase model (and hence with Wikidata). The patterns are designed to be intuitive by condensing, or folding away, many of the Wikibase particulars from the developer; at the same time, they prompt the developer to think about the schema in terms of primary relations with attached qualifiers and references, as discussed above. The second deliverable is the set of expanded patterns, which conform to the structure of the Wikibase RDF export and can be described axiomatically (with a few caveats, which are discussed individually in the following subsections).

These patterns are presented in detail, paired together, in Figures 7-14 and discussed in Section 3.3. The diagrams and their syntax are summarized in Sections 3.1 and 3.2, together with a discussion of what it means to expand the conceptual diagrams. In Section 3.4, we also describe the artifacts associated with this manuscript: a way to validate the resulting serializations of the expanded patterns, specified in ShEx (the Shape Expressions language, https://shex.io/), and WOPL, the Wikibase Ontology Design Pattern Library, which follows the MODL architecture Shimizu et al. (2019b).

3.1 Schema Diagrams

Schema diagrams are intuitive visual representations of the structure of an ontology. In general, however, they are ambiguous: a particular node-edge-node construction can take on many different meanings. Rather, the idea is to communicate that there is some relation between the two concepts, while the exact nature of the relation is deferred to the formal axiomatization, which is expressed in OWL or some other suitable logic Hitzler et al. (2010). For additional reading on the meaning behind schema diagram edges, see Shimizu et al. (2019a); Eberhart et al. (2021). For this paper, we have modified the traditional syntax used in the MOMo Shimizu et al. (2021) methodology to better communicate the important conceptual components when modeling with Wikibase, while acknowledging the underlying complexity of what Wikibase automatically generates during RDF serialization.

3.2 The Graphical Syntax

In our portal (https://gitlab.cs.ksu.edu/daselab/wikibase-ontology-design-library), we provide additional variations of Figure 7 that are print-friendly in black and white, as well as a variant with a color-blind-friendly palette.

Conceptual patterns and their expansions as diagrams

In each diagram (Figures 7-14), we have paired a conceptual diagram (the upper diagram) and its expansion. The conceptual diagram, as we have previously described, is a method for graphically depicting, in a succinct manner, “what is connected to what, and how?” We have attempted to make this clear by pairing the label colors with border colors. That is, an orange label in the conceptual (top) diagram corresponds to the set of node-edge-node constructs with orange coloring or borders in the expanded (bottom) diagram.

Common Diagram Syntax

These colors and shapes are consistent across all diagrams in Figures 7-14.

  • Gold Rounded Rectangles: represent classes (objects).

  • Purple Rounded Rectangles: represent classes belonging to the Wikibase namespace.

  • Yellow Ellipses: represent datatypes.

  • Solid Arrow Heads: represent binary relations. If the target of the arrow is an ellipse, then it is a data property. Otherwise it is an object property.

  • Open Arrow Heads: represent a subclass relation.

  • Dashed Edge Lines: represent an instanceOf relation.

Condensed Diagram Syntax

These colors and shapes are for the upper diagrams in Figures 7-14.

  • Green Rectangles: correspond to a collapsed statement. They hide the underlying wikibase:Statement node and the connecting properties.

  • Orange Rectangles: correspond to a collapsed qualifier.

  • Diamond Arrow Heads: represent a connection to a qualifier.

  • Blue Rectangles: correspond to a collapsed reference.

  • Circle Arrow Heads: represent a connection to a reference.

Expanded Diagram Syntax

These colors and shapes are for the lower diagrams in Figures 7-14.

  • Orange Colored Borders: indicate that the edge or rectangle was originally hidden by the corresponding collapsed label in the diagram above. Orange indicates that the collapsed label is regarded as a Qualifier.

  • Blue Colored Borders: indicate that the edge or rectangle was originally hidden by the corresponding collapsed label in the diagram above. Blue indicates that the collapsed label is regarded as a Reference.

  • Hash Nodes: represent instances that are automatically generated according to some hashing algorithm. In these diagrams they appear in two namespaces: s:, for wikibase:Statement instances, and ref:, for References on a wikibase:Statement.

3.3 Axiomatization

The purpose of this section is to provide a reference axiomatization of the Wikibase model. This allows us to specify axioms over the conceptual diagrams, which we detail in Section 4. The axioms are given in description logic; for a primer on description logic and its notation, see Hitzler et al. (2010). Recall that propertyName, qualifierName, and referenceName are placeholders used to keep the diagrams readable. They are meant to be replaced when these patterns and their expansions are used. When one of them is replaced, all of its occurrences are replaced by the same predicate name, and these occurrences then remain distinguishable by their namespaces.

For example, we may replace wdt:propertyName with wdt:hasName and have the edge point to an xsd:string. At that point, we would replace all instances of “propertyName” with “hasName” across all namespaces in the diagram. We would then use Figure 14 and the accompanying axiomatization. Additional examples can be found in Section 5. Note that we have used the xsd: namespace, which stands for XML Schema Datatypes. These are a W3C standard way of representing data primitives Peterson et al. (2012).
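The renaming step can be sketched in a few lines of Python. The three triples below are an assumed fragment of the expanded string-valued pattern (the exact triples in Figure 14 may differ), and the function name `instantiate` is hypothetical; the point is that the placeholder is replaced in every namespace at once while each prefix is kept.

```python
import re

# Assumed fragment of the expanded pattern for a string-valued data property.
PATTERN = [
    ("wb:Item", "wdt:propertyName", "xsd:string"),
    ("wb:Item", "p:propertyName", "wb:Statement"),
    ("wb:Statement", "ps:propertyName", "xsd:string"),
]


def instantiate(pattern, placeholder, name):
    """Replace `placeholder` by `name` in every prefixed term, keeping the prefix."""
    rx = re.compile(rf"^(\w+):{re.escape(placeholder)}$")
    return [
        tuple(rx.sub(lambda m: f"{m.group(1)}:{name}", term) for term in triple)
        for triple in pattern
    ]


for triple in instantiate(PATTERN, "propertyName", "hasName"):
    print(*triple)
```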

Finally, recall that many of these structures can be directly seen in Figure 1.

In the following sections, we provide the description logic formulation of the axioms (additional information on the syntax and construction can be found in Hitzler et al. (2010); Krötzsch et al. (2012)). For brevity, we have shortened some prefixes and predicates:

  1. wikibase: has been shortened to wb:

  2. propertyName has been shortened to propName

  3. qualifierName has been shortened to qualName

  4. referenceName has been shortened to refName

  5. wasDerivedFrom has been shortened to wDF

Figure 7: The top diagram shows a condensed view of the Wikibase conceptual model. The bottom diagram shows the expansion of this diagram to include the nodes that are automatically generated by the Wikibase framework. We use a diamond arrowhead and orange color to denote Qualifier nodes, and a circle arrowhead and blue color to denote Reference nodes. In the expanded diagram (lower portion), we change the border color of the nodes to indicate the origin of the generated nodes.
Figure 8: Condensed (top) and expanded (bottom) schema diagrams with an xsd:dateTime as a qualifier. Nodes bordered in orange expand from the label node in the top diagram.
Figure 9: Condensed (top) and expanded (bottom) schema diagrams for modeling with xsd:decimal as a qualifier to wdt:propertyName.
Figure 10: Condensed (top) and expanded (bottom) schema diagrams for modeling with xsd:string as a qualifier for wdt:propertyName.
Figure 11: Condensed (top) and expanded (bottom) schema diagrams including a Reference for the assertions made by wdt:propertyName.
Figure 12: Condensed (top) and expanded (bottom) schema diagrams when modeling wdt:propertyName as a data property and the datatype is an xsd:dateTime.
Figure 13: Condensed (top) and expanded (bottom) schema diagrams when modeling wdt:propertyName as a data property and the datatype is an xsd:decimal.
Figure 14: Condensed (top) and expanded (bottom) schema diagrams when modeling wdt:propertyName as a data property and the datatype is an xsd:string.
Figure 15: The reconstructed module for SexRecord from the Enslaved Ontology now using the Wikibase patterns.
Figure 16: The reconstructed module for NameRecord from the Enslaved Ontology now using the Wikibase patterns.
Figure 17: The reconstructed module for InterAgentRelationshipRecord from the Enslaved Ontology now using the Wikibase patterns.
Figure 18: The reconstructed module for OccupationRecord from the Enslaved Ontology now using the Wikibase patterns.
Figure 19: The reconstructed module for ParticipatesInRecord from the Enslaved Ontology now using the Wikibase patterns.
Figure 20: The reconstructed module for AgeRecord from the Enslaved Ontology now using the Wikibase patterns.

3.3.1 Axioms Invariant to Expansion Type

In this section, we discuss those axioms that appear to be invariant under the expansion type, as seen in Figure 7.

$$\exists\,\text{p:propName}.\top \sqsubseteq \text{wb:Item} \qquad (1)$$

Axiom 1 restricts the domain of p:propName to wb:Item.

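$$\top \sqsubseteq \forall\,\text{p:propName}.\text{wb:Statement} \qquad (2)$$
$$\top \sqsubseteq {\leq}1\,\text{p:propName}^{-}.\top \qquad (3)$$
$$\text{wb:Statement} \sqsubseteq \exists\,\text{p:propName}^{-}.\top \qquad (4)$$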

Axiom 2 restricts the range of p:propName to wb:Statement. Recall that wb:Statement instances are reifications of the wdt:propName relation, and carry the qualifier and reference information about a particular triple (i.e., a statement). Axiom 3 states that the inverse of p:propName is functional (i.e., p:propName is inverse functional). Axiom 4 states that every wb:Statement has at least one inverse filler for p:propName (an inverse existential restriction). Together, Axioms 3 and 4 indicate that a wb:Statement has exactly one inverse filler for p:propName. This may also be specified more succinctly using exact cardinality.

$$\exists\,\text{ps:propName}.\top \sqsubseteq \text{wb:Statement} \qquad (5)$$

Axiom 5 restricts the domain of ps:propName to wb:Statement. Essentially, as a part of the reification of wdt:propName, ps:propName will always have an inverse filler of wb:Statement.

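$$\top \sqsubseteq \forall\,\text{ps:propName}.\text{wb:Item} \qquad (6)$$
$$\text{wb:Statement} \sqsubseteq {=}1\,\text{ps:propName}.\text{wb:Item} \qquad (7)$$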

Axiom 6 restricts the range of ps:propName to wb:Item, acting as the other half of the reification for wdt:propName. Axiom 7 states that ps:propName has exactly one filler of type wb:Item, in the same manner as Axiom 4. These functionality and existential restrictions are necessary to mimic how and why these wb:Statement nodes are created. They uniquely connect the wdt:propName to the qualifier and reference information.
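Read as data constraints, Axioms 3, 4 and 7 say that every statement node has exactly one incoming p:propName edge and exactly one outgoing ps:propName edge. The following minimal Python sketch checks this over a list of triples; the function name and the toy graph are hypothetical. Note that it is a closed-world data check in the spirit of shape validation, not OWL reasoning: under OWL's open-world semantics a missing edge is not a contradiction.

```python
from collections import Counter


def violations(triples, prop):
    """Statement nodes of `prop` without exactly one incoming p: and one outgoing ps: edge."""
    incoming = Counter(o for s, p, o in triples if p == f"p:{prop}")   # Axioms 3-4
    outgoing = Counter(s for s, p, o in triples if p == f"ps:{prop}")  # Axiom 7
    nodes = set(incoming) | set(outgoing)
    return sorted(n for n in nodes if incoming[n] != 1 or outgoing[n] != 1)


graph = [
    ("ex:employee0", "p:hasJob", "s:1"), ("s:1", "ps:hasJob", "ex:job0"),
    ("ex:employee1", "p:hasJob", "s:2"),                   # s:2 lacks a ps: edge
    ("ex:employee2", "p:hasJob", "s:3"), ("ex:employee3", "p:hasJob", "s:3"),
    ("s:3", "ps:hasJob", "ex:job1"),                       # s:3 has two p: edges
]
print(violations(graph, "hasJob"))  # ['s:2', 's:3']
```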

$$\exists\,\text{pq:qualName}.\top \sqsubseteq \text{wb:Statement} \qquad (8)$$

Axiom 8 restricts the domain of any pq:qualName to wb:Statement. In the Wikibase model, qualifiers are always attached to the “hash nodes.” The axioms pertaining to range, functionality, etc., are discussed in the later sections specific to each type of qualifier.

3.3.2 Axioms between Items

$$\text{p:propName} \circ \text{ps:propName} \sqsubseteq \text{wdt:propName} \qquad (9)$$

Axiom 9 is a role chain that formalizes the reification of wdt:propName. From the previous axioms, we know that the connecting filler for the reification is of type wb:Statement. Additionally, we can infer the following.

Note that the latter two axioms are written using the inverse form, as we wish to avoid having a role chain appear on the right hand side of the axiom, which cannot be specified in OWL 2.
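Operationally, the role chain in Axiom 9 is a join on the statement node: whenever an item reaches a statement via p:propName and the statement reaches a value via ps:propName, the direct wdt:propName triple follows. A minimal Python sketch of this forward materialization (the function name and the toy graph are hypothetical):

```python
from collections import defaultdict


def infer_direct(triples, prop):
    """Materialize wdt:prop triples via the chain p:prop o ps:prop (Axiom 9)."""
    values = defaultdict(list)  # statement node -> its ps: values
    for s, p, o in triples:
        if p == f"ps:{prop}":
            values[s].append(o)
    return {
        (item, f"wdt:{prop}", value)
        for item, p, stmt in triples if p == f"p:{prop}"
        for value in values.get(stmt, [])
    }


graph = [
    ("ex:employee0", "p:hasJob", "s:1"), ("s:1", "ps:hasJob", "ex:job0"),
    ("ex:employee1", "p:hasJob", "s:2"),  # no ps: edge, so nothing is inferred
]
print(infer_direct(graph, "hasJob"))  # {('ex:employee0', 'wdt:hasJob', 'ex:job0')}
```

The chain only licenses this direction; it does not, by itself, guarantee that every wdt: triple has a reification.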

3.3.3 Axioms Invariant for Qualifiers

Each type of qualifier that Wikibase supports (namely Date, Numeric, and String) has a different structure. In this section, we only discuss those axioms that are invariant across these expansions. For reference, they pertain to the expansions seen in Figures 8-10.

There are two ways to restrict the nature of a qualifier (i.e., the type) for a specific pq:qualName: scoped and un-scoped. Scoped, in this case, means that the type of the Item being qualified matters, whereas un-scoped is a global restriction on pq:qualName.

The scoped restriction follows, in rule format.

Note that in order to convert this rule into description logic, we need a complex, inverse domain restriction. Because a qualifier is attached to a wb:Statement instance, we need a role chain from the statement back to the wb:Item; and since, as previously noted, role chains cannot appear on the right-hand side of an axiom in OWL 2, we express the range restriction in inverse form, as seen in Axiom 10.

(10)

Specifying an un-scoped range restriction on pq:qualName is much simpler (Axiom 11).

(11)

We also note that the domain of pq:qualName is always a wb:Statement.

$$\exists\,\text{pq:qualName}.\top \sqsubseteq \text{wb:Statement} \qquad (12)$$
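The difference between the scoped and the un-scoped restriction can be illustrated with a closed-world check in Python. As an assumption for this sketch, the scoped rule is taken to have the shape "if x has the scoping class, x p:propName s, and s pq:qualName v, then v has the expected datatype"; the function names, the toy literal encoding, and the classes ex:Employee and ex:Contractor are all hypothetical.

```python
def bad_qualifier_values(triples, prop, qual, datatype_of, expected,
                         scope=None, types=None):
    """Return pq:qual values whose datatype differs from `expected`.

    scope=None: un-scoped check, every pq:qual value is tested (cf. Axiom 11).
    scope=C: scoped check, only statements whose item has type C are tested,
    following item -p:prop-> statement -pq:qual-> value (cf. Axiom 10).
    """
    types = types or {}
    # Map each statement node back to the item it reifies.
    item_of = {o: s for s, p, o in triples if p == f"p:{prop}"}
    bad = []
    for s, p, o in triples:
        if p != f"pq:{qual}":
            continue
        if scope is not None and scope not in types.get(item_of.get(s), set()):
            continue  # the scoped restriction does not apply to this statement
        if datatype_of(o) != expected:
            bad.append(o)
    return bad


def datatype_of(term):
    # Toy literal encoding, our own assumption: "lexical^^datatype".
    return term.rsplit("^^", 1)[1] if "^^" in term else None


graph = [
    ("ex:emp0", "p:hasJob", "s:1"),
    ("s:1", "pq:startTime", "2009-01-01T00:00:00Z^^xsd:dateTime"),
    ("ex:emp1", "p:hasJob", "s:2"),
    ("s:2", "pq:startTime", "2009^^xsd:string"),
]
types = {"ex:emp0": {"ex:Employee"}, "ex:emp1": {"ex:Contractor"}}

print(bad_qualifier_values(graph, "hasJob", "startTime", datatype_of, "xsd:dateTime"))
# ['2009^^xsd:string']
print(bad_qualifier_values(graph, "hasJob", "startTime", datatype_of, "xsd:dateTime",
                           scope="ex:Employee", types=types))
# []
```

The scoped variant leaves qualifiers on statements about other kinds of items unconstrained, which is exactly why it needs the chain back to the item.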

3.3.4 Axioms for Date Qualifiers

This section covers the axioms pertaining to the Wikibase model when the qualifier is a datetime. A graphical view of this structure is shown in Figure 8. This expansion creates a construction that incorporates metadata for the datetime, such as the timezone and the temporal reference system, which is done by creating a hash node in the v: namespace.

Axiom 13 indicates that the domain of pq:qualName is always wb:Statement.

$$\exists\,\text{pq:qualName}.\top \sqsubseteq \text{wb:Statement} \qquad (13)$$

Axiom 14 is the specific version of Axiom 10 (scoped range restriction) for the xsd:dateTime qualifier.

(14)

Axiom 15 is the specific version of Axiom 11 (un-scoped range restriction) for the xsd:dateTime qualifier. Based on modeling needs, one would choose only Axiom 14 or Axiom 15.

$$\top \sqsubseteq \forall\,\text{pq:qualName}.\text{xsd:dateTime} \qquad (15)$$

Axiom 16 indicates that the domain of pqv:qualName is restricted to wb:Statement.

$$\exists\,\text{pqv:qualName}.\top \sqsubseteq \text{wb:Statement} \qquad (16)$$

Axiom 17 specifies that the range of pqv:qualName is restricted to wb:TimeValue.

$$\top \sqsubseteq \forall\,\text{pqv:qualName}.\text{wb:TimeValue} \qquad (17)$$

In Axiom 18, we formalize the notion that wb:TimeValue instances are unique: each is the filler of at most one pqv:qualName triple, and the subject of that triple must be a wb:Statement.

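$$\text{wb:TimeValue} \sqsubseteq {\leq}1\,\text{pqv:qualName}^{-}.\top \sqcap \exists\,\text{pqv:qualName}^{-}.\text{wb:Statement} \qquad (18)$$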

We specify that the domain of each of the predicates timeValue, timePrecision, timeTimezone, and timeCalendarModel (in the wikibase namespace) is wb:TimeValue, which appears as a hash node in the v: namespace in Figure 8. This is specified in Axioms 19-22.

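$$\exists\,\text{wb:timeValue}.\top \sqsubseteq \text{wb:TimeValue} \qquad (19)$$
$$\exists\,\text{wb:timePrecision}.\top \sqsubseteq \text{wb:TimeValue} \qquad (20)$$
$$\exists\,\text{wb:timeTimezone}.\top \sqsubseteq \text{wb:TimeValue} \qquad (21)$$
$$\exists\,\text{wb:timeCalendarModel}.\top \sqsubseteq \text{wb:TimeValue} \qquad (22)$$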

We may also specify their ranges, globally, in Axioms 23-26. We also indicate that they have an exact cardinality of one in Axioms 27-30.

(23)
(24)
(25)
(26)
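$$\text{wb:TimeValue} \sqsubseteq {=}1\,\text{wb:timeValue}.\top \qquad (27)$$
$$\text{wb:TimeValue} \sqsubseteq {=}1\,\text{wb:timePrecision}.\top \qquad (28)$$
$$\text{wb:TimeValue} \sqsubseteq {=}1\,\text{wb:timeTimezone}.\top \qquad (29)$$
$$\text{wb:TimeValue} \sqsubseteq {=}1\,\text{wb:timeCalendarModel}.\top \qquad (30)$$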

Finally, it should be noted that whenever an assertion of pq:qualName exists, there is an associated “hash node” that accompanies it. We specify this in Axiom 31.

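$$\exists\,\text{pq:qualName}.\top \sqsubseteq \exists\,\text{pqv:qualName}.\text{wb:TimeValue} \qquad (31)$$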

This ensures that the node carrying the additional contextual data is connected to the relevant qualifier property, and connected back to the statement.
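The date-qualifier expansion can be sketched as a Python function that emits both the simple pq: value and the v: value node with its four wikibase properties, together with a closed-world check of the exactly-one constraints in Axioms 27-30. The function names, the node-naming scheme (a truncated SHA-1), the qualifier name `startTime`, and the calendar term ex:GregorianCalendar are hypothetical; the precision and timezone values are merely example inputs.

```python
import hashlib

TIME_PROPS = ("wb:timeValue", "wb:timePrecision", "wb:timeTimezone",
              "wb:timeCalendarModel")


def date_qualifier(statement, qual, when, precision, timezone, calendar):
    """Triples for one date qualifier: the simple pq: value plus a v: TimeValue node."""
    # Hypothetical node naming: truncated SHA-1 over the value's components.
    key = f"{when}|{precision}|{timezone}|{calendar}".encode("utf-8")
    node = "v:" + hashlib.sha1(key).hexdigest()[:8]
    return [
        (statement, f"pq:{qual}", when),    # simple value (cf. Axioms 13-15)
        (statement, f"pqv:{qual}", node),   # full value node (cf. Axioms 16-17, 31)
        (node, "rdf:type", "wb:TimeValue"),
        (node, "wb:timeValue", when),
        (node, "wb:timePrecision", precision),
        (node, "wb:timeTimezone", timezone),
        (node, "wb:timeCalendarModel", calendar),
    ]


def time_node_ok(triples, node):
    """Closed-world check of Axioms 27-30: each time property occurs exactly once."""
    return all(
        sum(1 for s, p, _ in triples if s == node and p == prop) == 1
        for prop in TIME_PROPS
    )


triples = date_qualifier("s:12345", "startTime", "2009-01-01T00:00:00Z",
                         11, 0, "ex:GregorianCalendar")
print(time_node_ok(triples, triples[1][2]))  # True
```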

3.3.5 Axioms for String Qualifiers

In this section, we cover the axioms pertaining to the Wikibase model when the qualifier is a string. A graphical view of this structure is shown in Figure 10. Axiom 32 indicates that the domain of pq:qualName is always wb:Statement.

$$\exists\,\text{pq:qualName}.\top \sqsubseteq \text{wb:Statement} \qquad (32)$$

Axiom 33 is the specific version of Axiom 10 (scoped range restriction) for the xsd:string qualifier.

(33)

Axiom 34 is the specific version of Axiom 11 (un-scoped range restriction) for the xsd:string qualifier. Based on modeling needs, one would choose only Axiom 33 or Axiom 34.

$$\top \sqsubseteq \forall\,\text{pq:qualName}.\text{xsd:string} \qquad (34)$$

3.3.6 Axioms for Numeric Value Qualifiers

Wikibase uses xsd:decimal together with wikibase:quantityUnit for this purpose. The pattern for using xsd:decimal as a qualifier can be found in Figure 9. Qualifying with a quantity creates a construction that incorporates the unit of the quantity, which is itself a wb:Item. It does this by creating another hash node of type wb:QuantityValue, this time in the v: namespace. This hash node is also referenced via qualName, but in the pqv: namespace, whereas pq:qualName points directly to the amount.

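$$\exists\,\text{pq:qualName}.\top \sqsubseteq \text{wb:Statement} \tag{35}$$
$$\text{wb:Statement} \sqsubseteq \forall\,\text{pq:qualName}.\text{xsd:decimal} \tag{36}$$
$$\text{wb:Statement} \sqsubseteq {\leq}1\,\text{pq:qualName}.\text{xsd:decimal} \tag{37}$$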

Axiom 35 is a domain restriction. That is, the inverse filler (i.e., the domain) is restricted to wb:Statement. Axiom 36 is a scoped range restriction. That is, when the inverse filler (i.e., the domain) is of type wb:Statement, the filler (i.e., the range) of pq:qualName is restricted to xsd:decimal. Axiom 37 states that pq:qualName is functional. That is, a particular wb:Statement node targets at most one xsd:decimal via pq:qualName.
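As an illustration, the following Python sketch builds the triples of one quantity qualifier, following the structure described above (pq: to the amount, pqv: to a wb:QuantityValue hash node carrying amount and unit), and checks Axioms 35-37 in a closed-world fashion. Predicate names follow the text; the node identifiers (`s:1`, `v:q1`) and the unit `ex:Pound` are hypothetical, and decimals are modelled with Python's `Decimal` as an assumption.

```python
from decimal import Decimal


def quantity_qualifier(statement, qual, amount, unit, hash_node):
    """Triples for one quantity qualifier; `hash_node` stands in for a v: node."""
    return {
        (statement, f"pq:{qual}", amount),      # direct link to the amount
        (statement, f"pqv:{qual}", hash_node),  # link to the hash node
        (hash_node, "rdf:type", "wb:QuantityValue"),
        (hash_node, "wb:quantityValue", amount),
        (hash_node, "wb:quantityUnit", unit),
    }


def check_pq_quantity(triples, qual):
    """Closed-world check of Axioms 35-37 for pq:<qual>."""
    statements = {s for s, p, o in triples if p == "rdf:type" and o == "wb:Statement"}
    decimals = {}
    problems = []
    for s, p, o in triples:
        if p != f"pq:{qual}":
            continue
        if s not in statements:              # Axiom 35: domain
            problems.append(f"{s} is not a wb:Statement")
        elif not isinstance(o, Decimal):     # Axiom 36: scoped range
            problems.append(f"{s} has a non-decimal value")
        else:
            decimals.setdefault(s, set()).add(o)
    for s, values in sorted(decimals.items()):
        if len(values) > 1:                  # Axiom 37: functional
            problems.append(f"{s} has {len(values)} decimal values")
    return problems


triples = {("s:1", "rdf:type", "wb:Statement")} | quantity_qualifier(
    "s:1", "qualName", Decimal("12"), "ex:Pound", "v:q1"
)
print(check_pq_quantity(triples, "qualName"))  # []
```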

$$\exists\,\text{pqv:qualName}.\top \sqsubseteq \text{wb:Statement} \tag{38}$$

Axiom 38 is a domain restriction. That is, the inverse filler (i.e., the domain) is restricted to wb:Statement.

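$$\text{wb:Statement} \sqsubseteq \forall\,\text{pqv:qualName}.\text{wb:QuantityValue} \tag{39}$$
$$\text{wb:Statement} \sqsubseteq {\leq}1\,\text{pqv:qualName}.\text{wb:QuantityValue} \tag{40}$$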

Axiom 39 is a scoped range restriction. That is, when the inverse filler (i.e., the domain) is of type wb:Statement, the filler (i.e., the range) of pqv:qualName is restricted to wb:QuantityValue. Axiom 40 states that pqv:qualName is functional. That is, a particular wb:Statement node targets at most one wb:QuantityValue via pqv:qualName.

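$$\exists\,\text{wb:quantityValue}.\top \sqsubseteq \text{wb:QuantityValue} \tag{41}$$
$$\text{wb:QuantityValue} \sqsubseteq \forall\,\text{wb:quantityValue}.\text{xsd:decimal} \tag{42}$$
$$\text{wb:QuantityValue} \sqsubseteq \exists\,\text{wb:quantityValue}.\text{xsd:decimal} \tag{43}$$
$$\text{wb:QuantityValue} \sqsubseteq {\leq}1\,\text{wb:quantityValue}.\text{xsd:decimal} \tag{44}$$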

Axiom 41 is a domain restriction. That is, the inverse filler (i.e., the domain) is restricted to wb:QuantityValue. Axiom 42 is a scoped range restriction. That is, when the inverse filler (i.e., the domain) is of type wb:QuantityValue, the filler (i.e., the range) of wb:quantityValue is restricted to xsd:decimal. Axiom 43 states that wb:quantityValue is existential. That is, there is always at least one filler. Axiom 44 states that wb:quantityValue is functional. That is, a particular wb:QuantityValue node targets at most one xsd:decimal via wb:quantityValue.

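$$\exists\,\text{wb:quantityUnit}.\top \sqsubseteq \text{wb:QuantityValue} \tag{45}$$
$$\text{wb:QuantityValue} \sqsubseteq \forall\,\text{wb:quantityUnit}.\text{wb:Item} \tag{46}$$
$$\text{wb:QuantityValue} \sqsubseteq \exists\,\text{wb:quantityUnit}.\text{wb:Item} \tag{47}$$
$$\text{wb:QuantityValue} \sqsubseteq {\leq}1\,\text{wb:quantityUnit}.\text{wb:Item} \tag{48}$$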

Axiom 45 is a domain restriction. That is, the inverse filler (i.e., the domain) is restricted to wb:QuantityValue. Axiom 46 is a scoped range restriction. That is, when the inverse filler (i.e., the domain) is of type wb:QuantityValue, the filler (i.e., the range) of wb:quantityUnit is restricted to wb:Item. Axiom 47 states that wb:quantityUnit is existential. That is, there is always at least one filler. Axiom 48 states that wb:quantityUnit is functional. That is, a particular wb:QuantityValue node targets at most one wb:Item via wb:quantityUnit.
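Taken together, Axioms 41-48 say that every wb:QuantityValue node has exactly one decimal amount and exactly one unit that is a wb:Item. The Python sketch below checks this over a set of triples. It is a closed-world sketch, as a shape validator would be: a unit not explicitly typed wb:Item is flagged, whereas an OWL reasoner would instead infer that type from Axiom 46. Node names and the `Decimal` encoding of amounts are assumptions for illustration.

```python
from collections import defaultdict
from decimal import Decimal


def instances(triples, cls):
    """Nodes explicitly typed with `cls` via rdf:type."""
    return {s for s, p, o in triples if p == "rdf:type" and o == cls}


def check_quantity_values(triples):
    """Closed-world check of Axioms 41-48 on wb:QuantityValue hash nodes."""
    quantity_nodes = instances(triples, "wb:QuantityValue")
    items = instances(triples, "wb:Item")
    amounts, units = defaultdict(set), defaultdict(set)
    problems = []
    for s, p, o in triples:
        if p not in ("wb:quantityValue", "wb:quantityUnit"):
            continue
        if s not in quantity_nodes:                  # Axioms 41, 45: domain
            problems.append(f"{s} uses {p} but is not a wb:QuantityValue")
        elif p == "wb:quantityValue":
            if isinstance(o, Decimal):
                amounts[s].add(o)
            else:                                    # Axiom 42: scoped range
                problems.append(f"{s} has a non-decimal amount")
        elif o in items:
            units[s].add(o)
        else:                                        # Axiom 46: scoped range
            problems.append(f"{s} has a unit that is not a wb:Item")
    for q in sorted(quantity_nodes):
        if len(amounts[q]) != 1:                     # Axioms 43-44: exactly one amount
            problems.append(f"{q} has {len(amounts[q])} decimal amounts")
        if len(units[q]) != 1:                       # Axioms 47-48: exactly one unit
            problems.append(f"{q} has {len(units[q])} units")
    return problems


# Hypothetical hash node and unit item.
triples = {
    ("v:q1", "rdf:type", "wb:QuantityValue"),
    ("v:q1", "wb:quantityValue", Decimal("12")),
    ("v:q1", "wb:quantityUnit", "ex:Pound"),
    ("ex:Pound", "rdf:type", "wb:Item"),
}
print(check_quantity_values(triples))  # []
```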

3.3.7 Axioms for Statements

Axiom 49 is the specific version of Axiom 10 (scoped range restriction) for statements.

(49)

Axiom 50 is the specific version of Axiom 11 (un-scoped range restriction) for statements. Depending on modeling needs, one would choose either Axiom 49 or Axiom 50, not both.

(50)

3.3.8 Axioms for References

In this section, we cover the axioms pertaining to references in the Wikibase model.

We specify only scoped range and domain restrictions over prov:wasDerivedFrom, in order to avoid unwanted ontological commitments in the event that the PROV Ontology Sahoo et al. (2013) is used elsewhere in a developed ontology; these are Axioms 51 and 52.

(51)
wb:Statement (52)

It is always the case that the domain of pr:refName is a wb:Reference (Axiom 53).

$$\exists\,\text{pr:refName}.\top \sqsubseteq \text{wb:Reference} \tag{53}$$

In the same manner in which we specified the range of pq:qualName (i.e., Axioms 10 and 11), we specify the range of pr:refName in Axioms 54 and 55.

(54)
(55)

Furthermore, if we wish to vary the type of reference based on the domain of the statement assertion, we need to construct an axiom similar to Axiom 54, as in Axiom 56.

(56)

Finally, we specify that a particular reference node is uniquely the target of a Statement node. That is, prov:wasDerivedFrom is restricted by an inverse existential and inverse functionality axiom, which together form Axiom 57.

(57)
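The intent of Axioms 51, 52, and 57 can be illustrated with a closed-world Python check. As an assumption on our part, the sketch reads Axioms 51 and 52 as a scoped domain and scoped range between wb:Statement and wb:Reference, and Axiom 57 as requiring that each wb:Reference node is the target of exactly one wb:Statement via prov:wasDerivedFrom. Triples outside that scope are deliberately ignored, mirroring the avoidance of global commitments on prov:wasDerivedFrom. The node names are hypothetical.

```python
from collections import defaultdict


def instances(triples, cls):
    """Nodes explicitly typed with `cls` via rdf:type."""
    return {s for s, p, o in triples if p == "rdf:type" and o == cls}


def check_references(triples):
    """Closed-world sketch: each wb:Reference is attached to exactly one wb:Statement."""
    statements = instances(triples, "wb:Statement")
    references = instances(triples, "wb:Reference")
    sources = defaultdict(set)
    problems = []
    for s, p, o in triples:
        if p != "prov:wasDerivedFrom":
            continue
        if s in statements and o not in references:      # scoped range
            problems.append(f"{s} derives from {o}, which is not a wb:Reference")
        elif o in references and s not in statements:    # scoped domain
            problems.append(f"{s} derives from reference {o} but is not a wb:Statement")
        elif s in statements:
            sources[o].add(s)
    for ref in sorted(references):
        if len(sources[ref]) != 1:                        # inverse existential + functional
            problems.append(f"{ref} is the target of {len(sources[ref])} statements")
    return problems


triples = {
    ("s:1", "rdf:type", "wb:Statement"),
    ("s:2", "rdf:type", "wb:Statement"),
    ("r:1", "rdf:type", "wb:Reference"),
    ("s:1", "prov:wasDerivedFrom", "r:1"),
}
print(check_references(triples))  # []
```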

3.4 Resources

We have included a series of data shapes for validating materialized triples against the axioms described above. These are expressed in ShEx, the Shape Expression Language (https://shex.io/), which is a structural schema language for RDF graphs; we chose ShEx because it appears to be the Wikidata community’s adopted way of validating data. These shapes are available online.

We have also provided two serializations for this manuscript: an OWL file containing the axiomatization of the Wikibase conceptual model and an OWL file containing the list of axiom patterns, as described in Section 4.

All of these resources are available online under Apache License 2.0 (see https://gitlab.cs.ksu.edu/daselab/wikibase-ontology-design-library). We will provide a persistent URI in the final version of the manuscript.

4 Conceptual Modeling

As previously discussed, the overarching purpose of these patterns, and in particular the revised graphical syntax, is to simplify the discussion surrounding the Wikibase model. The next step is to improve our ability to conceptually reason about these patterns. That is, to be able to specify constraints and restrictions, such as mandating that statements of a certain type must always have a qualifier, again of a certain type.

As such, we have taken the axiom patterns from Eberhart et al. (2021) and, for each such axiom pattern, provided a natural language approximation alongside the axiom patterns modified to suit the Wikibase model. In this way, we can utilize the natural language to do top-level conceptual reasoning and then, when it is time to formalize the model, map these simple, natural language statements into the formal axioms. Our methodology is as follows.

Recall that a triple has the form “Subject Predicate Object”. One frequently encounters this via assertional statements, such as “ex:dog1 ex:hasName “Fido”^^xsd:string”. However, for the purposes of the following discussion, we will use Subject to refer, instead, to the Subject’s Type (i.e., moving up a layer of abstraction). In this way, we would say “ex:Dog ex:hasName xsd:string”, as it would appear as a node-edge-node construction in a schema diagram. In our axioms, we will use ex:Sub, ex:Pred, and ex:Obj, respectively.
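The move from an assertional triple to its type-level counterpart can be sketched in a few lines of Python. This is an illustration only: it assumes each node has exactly one type given in a dictionary, and it encodes literals as (lexical form, datatype) pairs, so that a literal object is lifted to its datatype.

```python
def lift(triple, types):
    """Lift an assertional triple to the schema level used in this section.

    The subject (and an IRI object) is replaced by its type; a literal object,
    encoded as a (lexical form, datatype) pair, is replaced by its datatype.
    Assumes each node has exactly one type in `types`.
    """
    s, p, o = triple
    obj_type = o[1] if isinstance(o, tuple) else types[o]
    return (types[s], p, obj_type)


print(lift(("ex:dog1", "ex:hasName", ("Fido", "xsd:string")), {"ex:dog1": "ex:Dog"}))
# ('ex:Dog', 'ex:hasName', 'xsd:string')
```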

In our natural language approximations, we describe each axiom relative to the statement node, in particular the ex:Pred Statement. We use about to denote the Subject of a Statement and refers to to denote its Object. For example, “An ex:hasName Statement always refers to an ex:Name” is, as we will see below, the natural language approximation of the Range axiom pattern.

We anticipate that these natural language approximations will be used in conjunction with the formal model presented in Section 3. As such, there may be some logical redundancies when the formal model is combined with the suggested axioms below. We do not consider this to be problematic and, indeed, believe that including both can aid human understanding.

Recall that

(58)

Domain: “A Predicate Statement is always about a Subject.”

(59)
(60)

Range: “A Predicate Statement always refers to an Object.”

(61)
(62)

Scoped Domain: “A Predicate Statement that refers to an Object is always about a Subject.”

(63)
(64)

Axiom 63 is one such logically redundant axiom: Axioms 5, 7, and 58 together entail it. This is actually quite useful, as it means that we may specify restrictions on wdt:propName and they remain consistent with the formal model.

Scoped Range: “A Predicate Statement that is about a Subject always refers to an Object.”

ex:Sub (65)

In line with Axiom 63, we should also have an axiom

but it cannot be expressed directly in OWL. However, it can be inferred from Axioms 58 and 65.

Functionality: “A Predicate Statement refers to at most one Item.”

(66)
(67)

Note that there are variations of functionality, which we call scoped and qualified, depending on whether the Predicate Statement is always about certain Subjects or refers only to certain Objects; these are described below.

Inverse Functionality: “A Predicate Statement is about at most one Item.”

(68)
(69)

Scoped Functionality: “A Predicate Statement about a Subject refers to at most one Item.”

ex:Sub (70)
ex:Sub (71)

Qualified Functionality: “A Predicate Statement refers to at most one Object.”

(72)
(73)
(74)

We need to include the third axiom (Axiom 74), which scopes the global functionality statement for ps:propName.

Qualified Scoped Functionality: “A Predicate Statement about a Subject refers to at most one Object.”

ex:Sub (75)
ex:Sub (76)
ex:Sub (77)

We need to include the third axiom (Axiom 77), which scopes the global functionality statement for ps:propName.
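Setting the statement reification aside, the functionality variants differ only in which subjects are constrained (scoping) and which fillers are counted (qualification). The Python sketch below makes this concrete over plain (subject, object) pairs for a single predicate, such as might be read off wdt:propName triples. This flattening, the function name, and the toy data are our own simplifications for illustration, and the check is a closed-world count rather than OWL reasoning.

```python
from collections import defaultdict


def functionality_violations(pairs, types, sub=None, obj=None):
    """Subjects with more than one counted filler for a single predicate.

    sub=None, obj=None -> Functionality
    sub=Sub            -> Scoped Functionality (only Sub subjects are constrained)
    obj=Obj            -> Qualified Functionality (only Obj fillers are counted)
    sub=Sub, obj=Obj   -> Qualified Scoped Functionality
    """
    fillers = defaultdict(set)
    for s, o in pairs:
        if sub is not None and types.get(s) != sub:
            continue  # scoping: subject outside Sub is unconstrained
        if obj is not None and types.get(o) != obj:
            continue  # qualification: filler outside Obj is not counted
        fillers[s].add(o)
    return sorted(s for s, objs in fillers.items() if len(objs) > 1)


# Hypothetical toy data.
types = {"ex:a": "ex:Sub", "ex:b": "ex:Other",
         "ex:x": "ex:Obj", "ex:y": "ex:Obj", "ex:z": "ex:Thing"}
pairs = [("ex:a", "ex:x"), ("ex:a", "ex:z"), ("ex:b", "ex:x"), ("ex:b", "ex:y")]
print(functionality_violations(pairs, types))                      # ['ex:a', 'ex:b']
print(functionality_violations(pairs, types, sub="ex:Sub"))        # ['ex:a']
print(functionality_violations(pairs, types, obj="ex:Obj"))        # ['ex:b']
print(functionality_violations(pairs, types, "ex:Sub", "ex:Obj"))  # []
```

The same data thus violates each variant differently, which is why the choice among them matters when modeling.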

Inverse Qualified Scoped Functionality: “A Predicate Statement that refers to an Object is about at most one Subject.”

ex:Obj (78)
ex:Obj (79)
wb:Statement (80)

Existential: “A Predicate Statement refers to at least one Object.”

ex:Sub (81)
ex:Sub (82)

Inverse Existential: “A Predicate Statement is about at least one Subject.”

ex:Obj (83)
ex:Obj (84)

Note that Inverse Existential axioms only work in conjunction with a domain restriction axiom. In essence, p:propName is inverse functional, but because we have no control over wb:Statement, we cannot state that p:propName is also inverse existential, as that would interfere with every other wb:Statement. We can, however, state that ps:propName is inverse existential, which means that there exists a wb:Statement node, and we can then assume the existence of something that points at that node; yet we cannot simultaneously dictate the type of that node. If, and only if, we also have a domain restriction axiom for p:propName, we can approximate Inverse Existentiality together with the other axioms.
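Because each natural language approximation above is a fixed sentence frame, the approximations can be generated mechanically once Subject, Predicate, and Object are chosen. The Python sketch below fills in templates transcribed from several of the patterns above. The dictionary, the function names, and the example vocabulary (Dog, hasName, Name, hasOwner, Owner) are hypothetical, and the a/an choice is a simple first-letter heuristic.

```python
TEMPLATES = {
    "Domain": "A {pred} Statement is always about {a_sub}.",
    "Range": "A {pred} Statement always refers to {a_obj}.",
    "Scoped Domain": "A {pred} Statement that refers to {a_obj} is always about {a_sub}.",
    "Scoped Range": "A {pred} Statement that is about {a_sub} always refers to {a_obj}.",
    "Qualified Functionality": "A {pred} Statement refers to at most one {obj}.",
    "Qualified Scoped Functionality":
        "A {pred} Statement about {a_sub} refers to at most one {obj}.",
    "Inverse Qualified Scoped Functionality":
        "A {pred} Statement that refers to {a_obj} is about at most one {sub}.",
    "Existential": "A {pred} Statement refers to at least one {obj}.",
    "Inverse Existential": "A {pred} Statement is about at least one {sub}.",
}


def with_article(word):
    """Prefix 'a' or 'an' using a simple first-letter heuristic."""
    article = "an" if word and word[0].lower() in "aeiou" else "a"
    return f"{article} {word}"


def render(pattern, pred, sub, obj):
    """Fill the natural language template for `pattern`."""
    return TEMPLATES[pattern].format(
        pred=pred, sub=sub, obj=obj, a_sub=with_article(sub), a_obj=with_article(obj)
    )


print(render("Range", "hasName", "Dog", "Name"))
# A hasName Statement always refers to a Name.
print(render("Scoped Domain", "hasOwner", "Dog", "Owner"))
# A hasOwner Statement that refers to an Owner is always about a Dog.
```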

5 Case Study: The Enslaved Ontology

Figure 21: The original Enslaved Ontology
Figure 22: A reconstruction of some of the modules from the Enslaved Ontology as mapped into the Wikibase model. Recall that purple rounded rectangles indicate that the class is controlled, i.e., as in a controlled vocabulary.
Figure 23: A reconstruction of some of the modules from the Enslaved Ontology using the Wikibase patterns and modified graphical syntax. Recall that purple rounded rectangles indicate that the class is controlled, i.e., as in a controlled vocabulary.

5.1 The Enslaved Ontology

The Enslaved Ontology serves as the underlying schema and data organization paradigm for the Enslaved Hub. It is not used for reasoning or inference, but as a guide for organizing and integrating the data, and understanding the knowledge base as a whole. As previously discussed, the Enslaved Ontology was developed using a nascent version of the MOMo Methodology and, furthermore, before the decision to use Wikibase as the underlying implementation and infrastructure for serving the data.

The work described herein is a result of the mismatches between the original Enslaved Ontology (whose schema diagram is shown in Figure 21) and how Wikibase stores information.

We have reconstructed the Enslaved Ontology using our modified graphical syntax, resulting in Figure 23. The individual modules appear in Figures 15 through 20. At this time, we have only reconstructed some of the modules, with the remaining modules relegated to future work.

6 Related Work

CIDOC-CRM

The CIDOC conceptual reference model (CIDOC-CRM) is an informational model for representing cultural information Doerr (2003). As mentioned in Section 1, we are particularly interested in the persistence of data, but also in facilitating robust deployment of, and interaction with, the data. While CIDOC-CRM is a domain-standard way of annotating data, which improves the interoperability of the data with other similarly described cultural data, we would still need to design a system capable of serving that data. It is for this reason that we initially chose to align to the Wikibase model. Creating conceptual patterns between Wikibase and CIDOC-CRM is potential future work.

OTTR

Reasonable Ontology Templates (OTTR) Skjæveland et al. (2018) is a methodology for designing templates for concepts. A tutorial can be found online (https://ottr.xyz/). In particular, it allows the schema developer to design a base-level template for a certain concept, which can then be easily and programmatically expanded into the appropriate axiomatization. However, it cannot currently instantiate the expansions from the property graph formulation (our conceptual diagrams). Extending OTTR to work this way, or identifying a sufficiently capable workaround, is also potential future work.

Property Graphs

Property graphs Hartig (2017) allow for the specification of predicates as “first-class citizens.” While this is a natural way of attaching qualifiers and references to wdt:propName assertions, it is not currently possible to specify the more interesting axioms (such as domain and range restrictions) in OWL over such graphs. We look forward to evaluating how RDF* and SPARQL* will be formalized by the upcoming W3C working group, which may provide an additional way of modeling such data. It remains to be seen how semantics may be expressed over such structures.

Open Data to Wikidata

In Faiz et al. (2019), the authors take a pattern-based approach to semi-automatically populating Wikidata from open (tabular) data. This is similar in purpose to our work: persisting data in a transparent manner and utilizing the Wikibase model. However, it departs significantly from our work. Foremost, it is an automatic framework that creates a naive schema from tabular data and attempts to match these entities, and subsequent instance data, to existing entities in Wikidata, whereas our approach is to create a conceptual pattern library for the development of rich schemas.

7 Conclusion

When developing and deploying a knowledge graph, there are many obstacles to a persistent, transparent, and usable resource. One way to overcome these obstacles is to use the Wikibase framework. In this paper, we have represented several common modeling constructions in a graphical syntax that makes it clear how they map into the Wikibase context. This should allow ontology developers to create ontologies (or knowledge graph schemas) that are “Wikibase ready” more quickly, more accurately, and with less effort, thus improving the persistence and accessibility of the deployed knowledge graph.

Future Work

There is certainly additional work to be accomplished in this direction. In particular, we see the following as immediate next steps to take.

  • Identification of frequent CIDOC Doerr (2003) patterns and corresponding translation into Wikibase patterns.

  • Extension of OTTR Skjæveland et al. (2018) to allow for instantiations from the conceptual diagrams.

  • Develop a robust tooling system, or extend an existing one (e.g., CoModIDE Shimizu et al. (2021)), for directly using these “Wikibase-ified” axiom patterns.

  • Create a MODL Shimizu et al. (2019b) of Wikibase-compatible patterns (e.g., by taking each pattern in MODL 1.0 and translating them using the axiom patterns above).

Acknowledgement. The authors acknowledge support by the National Science Foundation under Grant 2032628 EAGER: Open Science in Semantic Web Research and Grant 2033521 A1: KnowWhereGraph: Enriching and Linking Cross-Domain Knowledge Graphs using Spatially-Explicit AI Technologies, as well as the Mellon Foundation through the Enslaved: Peoples of the Historical Slave Trade.

References

  • [1] M. Doerr (2003) The CIDOC conceptual reference module: an ontological approach to semantic interoperability of metadata. AI Mag. 24 (3), pp. 75–92.
  • [2] A. Eberhart, C. Shimizu, S. Chowdhury, Md. K. Sarker, and P. Hitzler (2021) Expressibility of OWL axioms with patterns. In The Semantic Web - 18th International Conference, ESWC 2021, Virtual Event, June 6-10, 2021, Proceedings, R. Verborgh, K. Hose, H. Paulheim, P. Champin, M. Maleshkova, Ó. Corcho, P. Ristoski, and M. Alam (Eds.), Lecture Notes in Computer Science, Vol. 12731, pp. 230–245.
  • [3] A. Eells, L. Zhou, C. Shimizu, P. Hitzler, S. G. Estrecha, and D. Rehberger (2021) Aligning patterns to the Wikibase model. In Proceedings of the 12th Workshop on Ontology Design and Patterns, WOP 2021 at the 20th International Semantic Web Conference (ISWC 2021), October 2021. Available from: https://daselab.cs.ksu.edu/publications/aligning-patterns-wikibase-model
  • [4] F. Erxleben, M. Günther, M. Krötzsch, J. Mendez, and D. Vrandecic (2014) Introducing Wikidata to the linked data web. In The Semantic Web – ISWC 2014 – 13th International Semantic Web Conference, Riva del Garda, Italy, October 19-23, 2014. Proceedings, Part I, P. Mika, T. Tudorache, A. Bernstein, C. Welty, C. A. Knoblock, D. Vrandecic, P. Groth, N. F. Noy, K. Janowicz, and C. A. Goble (Eds.), Lecture Notes in Computer Science, Vol. 8796, pp. 50–65.
  • [5] M. Faiz, G. M. F. Wisesa, A. A. Krisnadhi, and F. Darari (2019) OD2WD: from open data to Wikidata through patterns. In Proceedings of the 10th Workshop on Ontology Design and Patterns (WOP 2019) co-located with 18th International Semantic Web Conference (ISWC 2019), Auckland, New Zealand, October 27, 2019, K. Janowicz, A. A. Krisnadhi, M. Poveda-Villalón, K. Hammar, and C. Shimizu (Eds.), CEUR Workshop Proceedings, Vol. 2459, pp. 2–16.
  • [6] A. Gangemi and V. Presutti (2009) Ontology design patterns. In Handbook on Ontologies, S. Staab and R. Studer (Eds.), International Handbooks on Information Systems, pp. 221–243.
  • [7] K. Hammar and V. Presutti (2016) Template-based content ODP instantiation. In Advances in Ontology Design and Patterns [revised and extended versions of the papers presented at the 7th edition of the Workshop on Ontology and Semantic Web Patterns, WOP@ISWC 2016, Kobe, Japan, 18th October 2016], K. Hammar, P. Hitzler, A. Krisnadhi, A. Lawrynowicz, A. G. Nuzzolese, and M. Solanki (Eds.), Studies on the Semantic Web, Vol. 32, pp. 1–13.
  • [8] O. Hartig (2017) Foundations of RDF* and SPARQL* (an alternative approach to statement-level metadata in RDF). In Proceedings of the 11th Alberto Mendelzon International Workshop on Foundations of Data Management and the Web, Montevideo, Uruguay, June 7-9, 2017, J. L. Reutter and D. Srivastava (Eds.), CEUR Workshop Proceedings, Vol. 1912.
  • [9] P. Hitzler, M. Krötzsch, and S. Rudolph (2010) Foundations of semantic web technologies. Chapman and Hall/CRC Press.
  • [10] M. Krötzsch, F. Simancik, and I. Horrocks (2012) A description logic primer. CoRR abs/1201.4089.
  • [11] D. Peterson, A. Malhotra, S. Gao, P. V. Biron, H. Thompson, and M. Sperberg-McQueen (2012-04) W3C XML schema definition language (XSD) 1.1 part 2: datatypes. W3C Recommendation, W3C. https://www.w3.org/TR/2012/REC-xmlschema11-2-20120405/
  • [12] V. Presutti, E. Daga, A. Gangemi, and E. Blomqvist (2009) eXtreme Design with content ontology design patterns. In Proceedings of the Workshop on Ontology Patterns (WOP 2009), collocated with the 8th International Semantic Web Conference (ISWC-2009), Washington D.C., USA, 25 October, 2009, E. Blomqvist, K. Sandkuhl, F. Scharffe, and V. Svátek (Eds.), CEUR Workshop Proceedings, Vol. 516.
  • [13] S. Sahoo, D. McGuinness, and T. Lebo (2013-04) PROV-O: the PROV ontology. W3C Recommendation, W3C. http://www.w3.org/TR/2013/REC-prov-o-20130430/
  • [14] C. Shimizu, A. Eberhart, N. Karima, Q. Hirt, A. Krisnadhi, and P. Hitzler (2019) A method for automatically generating schema diagrams for modular ontologies. In 1st Iberoamerican Conference on Knowledge Graphs and the Semantic Web. To appear.
  • [15] C. Shimizu, K. Hammar, and P. Hitzler (2021) Modular ontology modeling. Semantic Web. In press.
  • [16] C. Shimizu, Q. Hirt, and P. Hitzler (2019) MODL: A modular ontology design library. In Proceedings of the 10th Workshop on Ontology Design and Patterns (WOP 2019) co-located with 18th International Semantic Web Conference (ISWC 2019), Auckland, New Zealand, October 27, 2019, K. Janowicz, A. A. Krisnadhi, M. Poveda-Villalón, K. Hammar, and C. Shimizu (Eds.), CEUR Workshop Proceedings, Vol. 2459, pp. 47–58.
  • [17] C. Shimizu, P. Hitzler, Q. Hirt, D. Rehberger, S. G. Estrecha, C. Foley, A. M. Sheill, W. Hawthorne, J. Mixter, E. Watrall, R. Carty, and D. Tarr (2020) The Enslaved ontology: peoples of the historic slave trade. J. Web Semant. 63, pp. 100567.
  • [18] M. G. Skjæveland, D. P. Lupp, L. H. Karlsen, and H. Forssell (2018) Practical ontology pattern instantiation, discovery, and maintenance with reasonable ontology templates. In The Semantic Web – ISWC 2018 – 17th International Semantic Web Conference, Monterey, CA, USA, October 8-12, 2018, Proceedings, Part I, D. Vrandecic, K. Bontcheva, M. C. Suárez-Figueroa, V. Presutti, I. Celino, M. Sabou, L. Kaffee, and E. Simperl (Eds.), Lecture Notes in Computer Science, Vol. 11136, pp. 477–494.
  • [19] M. D. Wilkinson, M. Dumontier, et al. (2016-03-15) The FAIR guiding principles for scientific data management and stewardship. Scientific Data 3, pp. 160018.
  • [20] L. Zhou, C. Shimizu, P. Hitzler, A. M. Sheill, S. G. Estrecha, C. Foley, D. Tarr, and D. Rehberger (2020) The Enslaved dataset: A real-world complex ontology alignment benchmark using Wikibase. In CIKM ’20: The 29th ACM International Conference on Information and Knowledge Management, Virtual Event, Ireland, October 19-23, 2020, M. d’Aquin, S. Dietze, C. Hauff, E. Curry, and P. Cudré-Mauroux (Eds.), pp. 3197–3204.