Direct Mappings between RDF and Property Graph Databases

12/04/2019 ∙ by Harsh Thakkar, et al. ∙ 0

Resource Description Framework (RDF) triplestores and Property Graph (PG) database systems are two approaches for data management that are based on modeling, storing and querying graph-like data. Given the heterogeneity between these systems, it becomes necessary to develop methods to allow interoperability among them. While there exist some approaches to exchange data and schema between RDF and PG databases, they lack compatibility and even a solid formal foundation. In this paper, we study the semantic interoperability between RDF and PG databases. Specifically, we present two direct mappings (schema-dependent and schema-independent) for transforming an RDF database into a PG database. We show that the proposed mappings possess the fundamental properties of semantics preservation and information preservation. The existence of both mappings allows us to conclude that the PG data model subsumes the expressiveness or information capacity of the RDF data model.


1 Introduction

RDF [10008] and Graph databases [70101] are two approaches for data management that are based on modeling, storing and querying graph-like data. The database systems based on these models are gaining relevance in the industry due to their use in various application domains where complex data analytics is required [91531].

RDF triplestores and graph database systems are closely related because both are based on graph data models. RDF databases are based on the RDF data model [10008], their standard query language is SPARQL [sparql11], and RDF Schema [rdfschema] makes it possible to describe classes of resources and properties (i.e. the data schema). In contrast, most graph databases are based on the Property Graph (PG) data model, for which there is neither a standard query language nor a standard notion of property graph schema [91372]. Therefore, RDF and PG database systems differ in data model, schema constraints and query language.

Motivation. The term “Interoperability” was introduced in the area of information systems, and is defined as the ability of two or more systems or components to exchange information, and to use the information that has been exchanged [91491]. In the context of data management, interoperability is concerned with the support of applications that share and exchange information across the boundaries of existing databases [91396].

Database interoperability is relevant for several reasons: it promotes data exchange and data integration [91389]; facilitates the reuse of available systems and tools [DBLP:journals/corr/abs-1910-03118, 91396]; enables a fair comparison of database systems by using benchmarks [91027, DBLP:conf/esws/Thakkar17, DBLP:conf/i-semantics/ThakkarKDLA17]; and supports the success of emergent systems and technologies [91396].

Given the heterogeneity between RDF triplestores and graph database systems, and considering their graph-based data models, it becomes necessary to develop methods to allow interoperability among these systems.

The Problem. To the best of our knowledge, research on the interoperability between RDF and PG databases is very limited (cf. Section 5). While there exist some system-specific approaches, most of them are restricted to data transformation and lack solid formal foundations.

Objectives & Contributions. Database interoperability can be divided into syntactic interoperability (i.e. data format transformation), semantic interoperability (i.e. data exchange via schema and instance mappings) and query interoperability (i.e. transformations among different query languages or data accessing methods) [AnglesAMW19].

The main objective of this paper is to study the semantic interoperability between RDF and PG databases. Specifically, we propose two mappings to translate RDF databases into PG databases. We study two desirable properties of these database mappings, named semantics preservation and information preservation. Based on such database mappings, we conclude that the PG data model subsumes the information capacity of the RDF data model.

The remainder of this paper is organized as follows. Section 2 presents a formal background, including definitions related to database mappings, RDF databases, and Property Graph databases. Section 3 presents a schema-dependent database mapping to transform RDF databases into PG databases. Section 4 presents a schema-independent database mapping. Section 5 discusses related work, and Section 6 presents our conclusions.

2 Preliminaries

This section presents a formal background to study the interoperability between RDF and PG databases. In particular, we formalize the notions of database mapping, RDF database, and property graph database.

2.1 Database mappings

In general terms, a database mapping is a method to translate databases from a source database model to a target database model. We can consider two types of database mappings: direct database mappings, which allow an automatic translation of databases without any input from the user [91025]; and manual database mappings, which require additional information (e.g. an ontology) to conduct the database translation. In this paper, we focus on direct database mappings.

Database schema and instance

Let $M$ be a database model. A database schema in $M$ is a set of semantic constraints allowed by $M$. A database instance in $M$ is a collection of data represented according to $M$. A database in $M$ is an ordered pair $D^M = (S^M, I^M)$, where $S^M$ is a schema and $I^M$ is an instance.

Note that the above definition does not establish that the database instance satisfies the constraints defined by the database schema. Given a database instance $I^M$ and a database schema $S^M$, we say that $I^M$ is valid with respect to $S^M$, denoted $I^M \models S^M$, iff $I^M$ satisfies the constraints defined by $S^M$. Given a database $D^M = (S^M, I^M)$, we say that $D^M$ is a valid database iff it satisfies that $I^M \models S^M$.

Schema, instance, and database mapping

A database mapping defines a way to translate databases from a “source” database model to a “target” database model. For the rest of this section, assume that $M_1$ and $M_2$ are the source and the target database models respectively.

Considering that a database includes a schema and an instance, we first define the notions of schema mapping and instance mapping. A schema mapping from $M_1$ to $M_2$ is a function $\mathcal{SM}$ from the set of all database schemas in $M_1$ to the set of all database schemas in $M_2$. Similarly, an instance mapping from $M_1$ to $M_2$ is a function $\mathcal{IM}$ from the set of all database instances in $M_1$ to the set of all database instances in $M_2$.

A database mapping from $M_1$ to $M_2$ is a function $\mathcal{DM}$ from the set of all databases in $M_1$ to the set of all databases in $M_2$. Specifically, a database mapping is defined as the combination of a schema mapping and an instance mapping.

Definition 1 (Database Mapping)

A database mapping is a pair $\mathcal{DM} = (\mathcal{SM}, \mathcal{IM})$ where $\mathcal{SM}$ is a schema mapping and $\mathcal{IM}$ is an instance mapping.

2.1.1 Properties of database mappings

Every data model structures data in a specific way, using a particular abstraction. This abstraction determines the conceptual elements that the data model can represent, i.e. its representation power or information capacity [50604].

Given two database models $M_1$ and $M_2$, the possibility to exchange databases between them depends on their information capacity. Specifically, we say that $M_2$ subsumes the information capacity of $M_1$ iff every database in $M_1$ can be translated to a database in $M_2$. Additionally, we say that $M_1$ and $M_2$ have the same information capacity iff $M_1$ subsumes $M_2$ and $M_2$ subsumes $M_1$.

The information capacity of two database models can be evaluated in terms of a database mapping satisfying some properties. In particular, we consider three properties: computability, semantics preservation, and information preservation.

Assume that $\mathcal{D}_1$ is the set of all databases in a source database model $M_1$, and $\mathcal{D}_2$ is the set of all databases in a target database model $M_2$.

Definition 2 (Computable mapping)

A database mapping $\mathcal{DM} : \mathcal{D}_1 \to \mathcal{D}_2$ is computable if there exists an algorithm $\mathcal{A}$ that, given a database $D \in \mathcal{D}_1$, computes $\mathcal{DM}(D)$.

The property of computability indicates the existence and feasibility of implementing a database mapping from $M_1$ to $M_2$. This property also implies that $M_2$ subsumes the information capacity of $M_1$.

Definition 3 (Semantics preservation)

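A computable database mapping $\mathcal{DM} : \mathcal{D}_1 \to \mathcal{D}_2$ is semantics preserving if for every valid database $D_1 \in \mathcal{D}_1$, there is a valid database $D_2 \in \mathcal{D}_2$ satisfying that $D_2 = \mathcal{DM}(D_1)$.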

Semantics preservation indicates that the output of a database mapping is always a valid database. Specifically, the output database instance satisfies the constraints defined by the output database schema. In this sense, we can say that this property evaluates the correctness of a database mapping.

Definition 4 (Information preservation)

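A database mapping $\mathcal{DM}$ from $\mathcal{D}_1$ to $\mathcal{D}_2$ is information preserving if there is a computable database mapping $\mathcal{DM}^{-1}$ from $\mathcal{D}_2$ to $\mathcal{D}_1$ such that for every database $D$ in $\mathcal{D}_1$, it applies that $D = \mathcal{DM}^{-1}(\mathcal{DM}(D))$.

Information preservation indicates that, for some database mapping $\mathcal{DM}$, there exists an “inverse” database mapping $\mathcal{DM}^{-1}$ which makes it possible to recover a database transformed with $\mathcal{DM}$. Note that the above definition implies the existence of both an “inverse” schema mapping $\mathcal{SM}^{-1}$ and an “inverse” instance mapping $\mathcal{IM}^{-1}$.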

Information preservation is a fundamental property because it guarantees that a database mapping does not lose information [91025]. Moreover, it implies that the information capacity of the target database model subsumes the information capacity of the source database model.
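As a rough illustration of Definitions 1 and 4, the following sketch models a database mapping as a pair of Python functions and tests the round-trip condition on a finite sample. The names `DatabaseMapping` and `is_information_preserving_on`, as well as the toy upper-case and lower-case mappings, are assumptions made for this example and are not part of the paper. A finite sample can only refute information preservation, it cannot prove it.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, Tuple

Database = Tuple[Any, Any]  # (schema, instance)


@dataclass(frozen=True)
class DatabaseMapping:
    """A database mapping as a pair (schema mapping, instance mapping)."""
    schema_mapping: Callable[[Any], Any]
    instance_mapping: Callable[[Any], Any]

    def __call__(self, db: Database) -> Database:
        schema, instance = db
        return (self.schema_mapping(schema), self.instance_mapping(instance))


def is_information_preserving_on(dm, dm_inv, sample: Iterable[Database]) -> bool:
    """Check D == dm_inv(dm(D)) for every database in a finite sample."""
    return all(dm_inv(dm(db)) == db for db in sample)


# Toy mappings (hypothetical): schemas are frozensets of names,
# instances are tuples of lower-case strings.
to_upper = DatabaseMapping(
    schema_mapping=lambda s: frozenset(x.upper() for x in s),
    instance_mapping=lambda i: tuple(r.upper() for r in i),
)
to_lower = DatabaseMapping(
    schema_mapping=lambda s: frozenset(x.lower() for x in s),
    instance_mapping=lambda i: tuple(r.lower() for r in i),
)

sample = [(frozenset({"person"}), ("elon", "tesla"))]
print(is_information_preserving_on(to_upper, to_lower, sample))
```

A mapping that discards the instance fails the same check, because no inverse can recover the dropped data.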

Our goal is to define database mappings between the RDF data model and the Property Graph data model. Hence, next, we will present a formal definition of the notions of instance, schema, and database for them.

2.2 RDF Databases

An RDF database is an approach for data management which is oriented to describe the information about Web resources by using Web models and languages. In this section we describe two fundamental standards used by RDF databases: the Resource Description Framework (RDF) [10008], which is the standard data model to describe the data; and RDF Schema [rdfschema], which is a standard vocabulary to describe the structure of the data.

2.2.1 RDF Graph.

Assume that I and L are two disjoint infinite sets, called IRIs and Literals respectively. IRIs are used as web resource identifiers and Literals are used as values (e.g. strings, numbers or dates). In addition to IRIs and Literals, the RDF data model considers a domain of anonymous resources called Blank Nodes. Based on the work of Hogan et al. [91033], we avoid the use of Blank Nodes, given that their absence does not affect the results presented in this paper. Moreover, we can obtain similar results by replacing Blank Nodes with IRIs (via Skolemisation [91033]).

An RDF triple is a tuple $(s, p, o) \in I \times I \times (I \cup L)$ where $s$ is called the subject, $p$ is the predicate and $o$ is the object. Here, the subject represents a resource (identified by an IRI), the predicate represents a relationship of the resource (identified by an IRI), and the object represents the value of such relationship (which is either an IRI or a literal).

Let $T$ be a set of RDF triples. We use $sub(T)$, $pred(T)$ and $obj(T)$ to denote the sets of subjects, predicates, and objects in $T$ respectively. There are different data formats to encode a set of RDF triples. The following example shows an RDF description encoded using the Turtle data format [10121].

Example 2.1
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix voc: <http://www.example.org/voc/> .
@prefix ex: <http://www.example.org/data/> .
ex:Tesla_Inc rdf:type voc:Organisation ;
             voc:name "Tesla, Inc." ;
             voc:creation "2003-07-01"^^xsd:date .
ex:Elon_Musk rdf:type voc:Person ;
             voc:birthName "Elon Musk" ;
             voc:age "46"^^xsd:int ;
             voc:ceo ex:Tesla_Inc .

The lines beginning with @prefix are prefix definitions and the rest are RDF triples. A prefix definition associates a prefix (e.g. voc) with an IRI (e.g. http://www.example.org/voc/). Hence, a full IRI like http://www.example.org/voc/Person can be abbreviated as a prefixed name voc:Person. We will use $prefix(u)$ and $name(u)$ to extract the prefix and the name of an IRI $u$ respectively. We will consider two types of literals: a literal which consists of a string and a datatype IRI (e.g. "46"^^xsd:int), and a literal which is a plain Unicode string (e.g. "Elon Musk"), which is a synonym for an xsd:string literal (e.g. "Elon Musk"^^xsd:string).

A set of RDF triples can be visualized as a graph where the nodes represent the resources, and the edges represent properties and values. However, the RDF model has a special feature: an IRI can be used as an object and predicate in an RDF graph. For instance, the triple (voc:ceo, rdfs:label, "Chief Executive Officer") can be added to the graph shown in Example 2.1 to include metadata about the property voc:ceo. It implies that an RDF graph is not a traditional graph because it allows edges between edges, and consequently an RDF graph cannot be visualized in a traditional way. Next, we present a formal definition of the RDF data model which supports a traditional graph-based representation.

Definition 5 (RDF Graph)

An RDF graph is defined as a tuple $G^R = (N_R, N_L, E_O, E_D, \alpha_R, \alpha_L, \beta_O, \beta_D, \delta)$ where:

  • $N_R$ is a finite set of nodes representing RDF resources (i.e. resource nodes);

  • $N_L$ is a finite set of nodes representing RDF literals (i.e. literal nodes), satisfying that $N_R \cap N_L = \emptyset$;

  • $E_O$ is a finite set of edges called object property edges;

  • $E_D$ is a finite set of edges called datatype property edges, satisfying that $E_O \cap E_D = \emptyset$ (the names are borrowed from the Web Ontology Language);

  • $\alpha_R : N_R \to I$ is a total one-to-one function that associates each resource node with a single IRI;

  • $\alpha_L : N_L \to L$ is a total one-to-one function that associates each literal node with a single literal;

  • $\beta_O : E_O \to (N_R \times N_R)$ is a total function that associates each object property edge with a pair of resource nodes;

  • $\beta_D : E_D \to (N_R \times N_L)$ is a total function that associates each datatype property edge with a resource node and a literal node;

  • $\delta : (N_R \cup N_L \cup E_O \cup E_D) \to I$ is a partial function that assigns a resource class label to each node or edge.

Note that the function $\delta$ has been defined as partial to support a partial connection between schema and data (which is usual in real RDF datasets). Concerning the issue of an IRI $u$ occurring as both resource and property, note that $u$ will occur as a resource and as a property separately. In such a case, we will have a bipartite graph.

In order to facilitate the transformation of RDF data to Property Graphs, we will assume that every node or edge in an RDF graph defines a resource class. This assumption is shown by the following procedure which allows transforming a set of RDF triples into a formal RDF graph.

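The procedure to create an RDF graph $G^R = (N_R, N_L, E_O, E_D, \alpha_R, \alpha_L, \beta_O, \beta_D, \delta)$ from a set of RDF triples $T$ is defined as follows:

  • For every resource $r \in sub(T) \cup \{o \in I \mid (s, p, o) \in T, p \neq \text{rdf:type}\}$, there is a resource node $n \in N_R$ with $\alpha_R(n) = r$;

    • If $(r, \text{rdf:type}, c) \in T$ then $\delta(n) = c$, else $\delta(n) = \text{rdfs:Resource}$;

  • For every literal $l \in obj(T) \cap L$, there is a literal node $n \in N_L$;

    • If $l$ is a literal of the form value then $\alpha_L(n) = \text{value}$ and $\delta(n) = \text{xsd:string}$;

    • If $l$ is a literal of the form value^^datatype then $\alpha_L(n)$ = value and $\delta(n)$ = datatype;

  • For every triple $(s, p, o) \in T$ where $o \in I$ and $p \neq \text{rdf:type}$, there is an edge $e \in E_O$ with $\delta(e) = p$ and $\beta_O(e) = (n, n')$, such that $\alpha_R(n) = s$ and $\alpha_R(n') = o$;

  • For every triple $(s, p, o) \in T$ where $o \in L$, there is an edge $e \in E_D$ with $\delta(e) = p$ and $\beta_D(e) = (n, n')$, such that $\alpha_R(n) = s$ and $\alpha_L(n') = o$.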
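The procedure above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: triples are plain tuples, IRIs are strings, literals are instances of a hypothetical `Literal` named tuple, each resource has at most one rdf:type triple, and the node and edge identifiers (`n1`, `e1`, ...) are generated in input order.

```python
from collections import namedtuple

# A literal is a value plus a datatype; a plain literal defaults to xsd:string.
Literal = namedtuple("Literal", ["value", "datatype"], defaults=["xsd:string"])

RDF_TYPE = "rdf:type"
RDFS_RESOURCE = "rdfs:Resource"


def build_rdf_graph(triples):
    """Build node and edge tables from (subject, predicate, object) triples."""
    class_of = {s: o for s, p, o in triples if p == RDF_TYPE}
    nodes = {}            # IRI or Literal -> node id
    labels = {}           # node id or edge id -> class label
    object_edges = {}     # edge id -> (source node id, target node id)
    datatype_edges = {}   # edge id -> (source node id, target node id)

    def node_for(term):
        if term not in nodes:
            node_id = f"n{len(nodes) + 1}"
            nodes[term] = node_id
            if isinstance(term, Literal):
                labels[node_id] = term.datatype
            else:
                # Resources without a declared class fall back to rdfs:Resource.
                labels[node_id] = class_of.get(term, RDFS_RESOURCE)
        return nodes[term]

    for s, p, o in triples:
        if p == RDF_TYPE:
            node_for(s)  # a typing triple only labels its subject
            continue
        edge_id = f"e{len(object_edges) + len(datatype_edges) + 1}"
        table = datatype_edges if isinstance(o, Literal) else object_edges
        table[edge_id] = (node_for(s), node_for(o))
        labels[edge_id] = p
    return nodes, labels, object_edges, datatype_edges


# The triples of Example 2.1.
triples = [
    ("ex:Tesla_Inc", RDF_TYPE, "voc:Organisation"),
    ("ex:Tesla_Inc", "voc:name", Literal("Tesla, Inc.")),
    ("ex:Tesla_Inc", "voc:creation", Literal("2003-07-01", "xsd:date")),
    ("ex:Elon_Musk", RDF_TYPE, "voc:Person"),
    ("ex:Elon_Musk", "voc:birthName", Literal("Elon Musk")),
    ("ex:Elon_Musk", "voc:age", Literal("46", "xsd:int")),
    ("ex:Elon_Musk", "voc:ceo", "ex:Tesla_Inc"),
]
nodes, labels, object_edges, datatype_edges = build_rdf_graph(triples)
```

For Example 2.1 this yields two resource nodes, four literal nodes, one object property edge, and four datatype property edges.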

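Hence, the RDF graph obtained from the set of RDF triples shown in Example 2.1 is given as follows, where $N_R$ and $N_L$ are the resource and literal nodes, $E_O$ and $E_D$ the object and datatype property edges, $\alpha_R$ and $\alpha_L$ the identifier functions, $\beta_O$ and $\beta_D$ the endpoint functions, and $\delta$ the class labelling:

$N_R = \{n_1, n_2\}$, $N_L = \{n_3, n_4, n_5, n_6\}$,
$E_O = \{e_1\}$, $E_D = \{e_2, e_3, e_4, e_5\}$,
$\alpha_R(n_1) = \text{ex:Tesla\_Inc}$, $\alpha_R(n_2) = \text{ex:Elon\_Musk}$,
$\alpha_L(n_3) = \text{"Tesla, Inc."}$, $\alpha_L(n_4) = \text{"2003-07-01"}$, $\alpha_L(n_5) = \text{"Elon Musk"}$, $\alpha_L(n_6) = \text{"46"}$,
$\beta_O(e_1) = (n_2, n_1)$,
$\beta_D(e_2) = (n_1, n_3)$, $\beta_D(e_3) = (n_1, n_4)$, $\beta_D(e_4) = (n_2, n_5)$, $\beta_D(e_5) = (n_2, n_6)$,
$\delta(n_1) = \text{voc:Organisation}$, $\delta(n_2) = \text{voc:Person}$, $\delta(n_3) = \text{xsd:string}$, $\delta(n_4) = \text{xsd:date}$, $\delta(n_5) = \text{xsd:string}$, $\delta(n_6) = \text{xsd:int}$,
$\delta(e_1) = \text{voc:ceo}$, $\delta(e_2) = \text{voc:name}$, $\delta(e_3) = \text{voc:creation}$, $\delta(e_4) = \text{voc:birthName}$, $\delta(e_5) = \text{voc:age}$.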

Additionally, Figure 1 shows a graphical representation of the RDF graph described above. The resource nodes are represented as ellipses and literal nodes are presented as rectangles. Each node is labeled with two IRIs: the inner IRI indicates the resource identifier, and the outer IRI indicates a resource class. Each edge is labeled with an IRI that indicates its property class.

Figure 1: Graphical illustration of an RDF graph.

2.2.2 RDF Graph Schema.

RDF Schema (RDFS) [rdfschema] defines a standard vocabulary (i.e., a set of terms, each having a well-defined meaning) which enables the description of resource classes and property classes. From a database perspective, RDF Schema can be used to describe the structure of the data in an RDF database.

In order to describe classes of resources and properties, the RDF Schema vocabulary defines the following terms: rdfs:Class and rdf:Property represent the classes of resources, and properties respectively; rdf:type can be used (as property) to state that a resource is an instance of a class; rdfs:domain and rdfs:range allow to define the domain and range of a property, respectively. Note that rdf: and rdfs: are the prefixes for RDF and RDFS respectively.

An RDF Schema is described using RDF triples, so it can be encoded using RDF data formats. The following example shows an RDF Schema which describes the structure of the data shown in Example 2.1, using the Turtle data format.

Example 2.2
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix voc: <http://www.example.org/voc/> .
voc:Organisation rdf:type rdfs:Class .
voc:name rdf:type rdf:Property ;
         rdfs:domain voc:Organisation ;
         rdfs:range xsd:string .
voc:creation rdf:type rdf:Property ;
             rdfs:domain voc:Organisation ;
             rdfs:range xsd:date .
voc:Person rdf:type rdfs:Class .
voc:birthName rdf:type rdf:Property ;
              rdfs:domain voc:Person ;
              rdfs:range xsd:string .
voc:age rdf:type rdf:Property ;
        rdfs:domain voc:Person ;
        rdfs:range xsd:int .
voc:ceo rdf:type rdf:Property ;
        rdfs:domain voc:Person ;
        rdfs:range voc:Organisation .

Note that a resource class $rc$ is defined by a triple of the form ($rc$ rdf:type rdfs:Class). A property class $pc$ is defined (ideally) by three triples of the form ($pc$ rdf:type rdf:Property), ($pc$ rdfs:domain $rc_1$) and ($pc$ rdfs:range $rc_2$), where $rc_1$ indicates the resource class having property $pc$ (i.e. the domain), and $rc_2$ indicates the resource class determining the value of the property (i.e. the range).

If the range of a property class $pc$ is a resource class (defined by the user), then $pc$ is called an object property (e.g. voc:ceo). If the range is a datatype class (defined by RDF Schema or another vocabulary), then $pc$ is called a datatype property (e.g. voc:birthName). The IRIs xsd:string, xsd:integer and xsd:dateTime are examples of datatypes defined by XML Schema [biron2012]. Let $I_{DT} \subset I$ be the set of RDF datatypes.

Note that the RDF schema presented in Example 2.2 provides a complete description of resource classes and property classes. However, in practice, it is possible to find incomplete or partial RDF schema descriptions. In particular, a property may lack a declared domain or range.

We will assume that a partial schema can be transformed into a total schema. In this sense, we will use the term rdfs:Resource (which, according to the RDF Schema specification [rdfschema], denotes the class of everything) to complete the definition of properties without domain or range. For instance, suppose that our sample RDF Schema does not define the range of the property class voc:ceo. In this case, we include the triple (voc:ceo, rdfs:range, rdfs:Resource) to complete the definition of voc:ceo.
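The completion step can be sketched as a small Python function. The function name `complete_schema` and the tuple encoding of triples are assumptions of this example. It adds a default rdfs:domain or rdfs:range triple for every declared property that lacks one.

```python
RDFS_RESOURCE = "rdfs:Resource"


def complete_schema(schema_triples):
    """Return the triples plus default domain/range triples for each property."""
    properties = sorted({s for s, p, o in schema_triples
                         if p == "rdf:type" and o == "rdf:Property"})
    declared = {(s, p) for s, p, _ in schema_triples}
    completed = list(schema_triples)
    for prop in properties:
        for term in ("rdfs:domain", "rdfs:range"):
            if (prop, term) not in declared:
                # A missing domain or range defaults to the class of everything.
                completed.append((prop, term, RDFS_RESOURCE))
    return completed


# A partial schema: voc:name has no range, voc:ceo has neither domain nor range.
partial = [
    ("voc:name", "rdf:type", "rdf:Property"),
    ("voc:name", "rdfs:domain", "voc:Organisation"),
    ("voc:ceo", "rdf:type", "rdf:Property"),
]
total = complete_schema(partial)
```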

Now, we introduce the notion of RDF Graph Schema as a formal way to represent an RDF schema description. Assume that $I_V \subset I$ is a set that includes the RDF Schema terms rdf:type, rdfs:Class, rdf:Property, rdfs:domain and rdfs:range.

Definition 6 (RDF Graph Schema)

An RDF graph schema is defined as a tuple $S^R = (N_S, E_S, \phi, \varphi)$ where:

  • $N_S$ is a finite set of nodes representing resource classes;

  • $E_S$ is a finite set of edges representing property classes;

  • $\phi : (N_S \cup E_S) \to I$ is a total function that associates each node or edge with an IRI representing a class identifier;

  • $\varphi : E_S \to (N_S \times N_S)$ is a total function that associates each property class with a pair of resource classes.

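Recall that $I_{DT}$ denotes the set of RDF datatypes. Given an RDF Schema description $D$, the procedure to create an RDF graph schema $S^R = (N_S, E_S, \phi, \varphi)$ from $D$ is given as follows:

  1. Let $C$ be the set of resource classes and datatypes occurring in $D$, i.e. every $rc$ such that $(rc, \text{rdf:type}, \text{rdfs:Class}) \in D$, $(pc, \text{rdfs:domain}, rc) \in D$ or $(pc, \text{rdfs:range}, rc) \in D$

  2. For each $rc \in C$, we create $n \in N_S$ with $\phi(n) = rc$

  3. For each pair of triples $(pc, \text{rdfs:domain}, rc_1)$ and $(pc, \text{rdfs:range}, rc_2)$ in $D$, we create $e \in E_S$ with $\phi(e) = pc$ and $\varphi(e) = (n_1, n_2)$, satisfying that $\phi(n_1) = rc_1$, $\phi(n_2) = rc_2$ and $n_1, n_2 \in N_S$.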
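The three steps can be sketched in Python as shown below. The function name `build_schema_graph`, the tuple encoding, and the generated identifiers are assumptions of this example. The sketch also assumes that each property has at most one domain and one range, and that the set of classes consists of every IRI declared as rdfs:Class or used as a domain or range. The voc:ceo entry follows the direction of the voc:ceo triple in Example 2.1 (a person is the CEO of an organisation).

```python
def build_schema_graph(schema_triples):
    """Return (node_of, edges) for an RDF Schema given as (s, p, o) tuples."""
    domains = {s: o for s, p, o in schema_triples if p == "rdfs:domain"}
    ranges = {s: o for s, p, o in schema_triples if p == "rdfs:range"}
    # Step 1: collect resource classes and datatypes.
    classes = {s for s, p, o in schema_triples
               if p == "rdf:type" and o == "rdfs:Class"}
    classes |= set(domains.values()) | set(ranges.values())
    # Step 2: one schema node per class.
    node_of = {c: f"n{i}" for i, c in enumerate(sorted(classes), start=1)}
    # Step 3: one schema edge per property with both a domain and a range.
    edges = {}
    for i, prop in enumerate(sorted(domains.keys() & ranges.keys()), start=1):
        edges[f"e{i}"] = (prop, node_of[domains[prop]], node_of[ranges[prop]])
    return node_of, edges


schema_triples = [
    ("voc:Organisation", "rdf:type", "rdfs:Class"),
    ("voc:Person", "rdf:type", "rdfs:Class"),
    ("voc:name", "rdfs:domain", "voc:Organisation"),
    ("voc:name", "rdfs:range", "xsd:string"),
    ("voc:creation", "rdfs:domain", "voc:Organisation"),
    ("voc:creation", "rdfs:range", "xsd:date"),
    ("voc:birthName", "rdfs:domain", "voc:Person"),
    ("voc:birthName", "rdfs:range", "xsd:string"),
    ("voc:age", "rdfs:domain", "voc:Person"),
    ("voc:age", "rdfs:range", "xsd:int"),
    ("voc:ceo", "rdfs:domain", "voc:Person"),
    ("voc:ceo", "rdfs:range", "voc:Organisation"),
]
node_of, edges = build_schema_graph(schema_triples)
```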

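Following the above procedure, the RDF schema shown in Example 2.2 can be formally described as follows, where $\phi$ gives the class identifier of each schema node or edge and $\varphi$ gives the endpoints of each schema edge:

$N_S = \{n_1, n_2, n_3, n_4, n_5\}$,
$E_S = \{e_1, e_2, e_3, e_4, e_5\}$;
$\phi(n_1) = \text{voc:Organisation}$, $\phi(n_2) = \text{voc:Person}$, $\phi(n_3) = \text{xsd:string}$, $\phi(n_4) = \text{xsd:date}$, $\phi(n_5) = \text{xsd:int}$,
$\phi(e_1) = \text{voc:name}$, $\phi(e_2) = \text{voc:creation}$, $\phi(e_3) = \text{voc:birthName}$, $\phi(e_4) = \text{voc:age}$, $\phi(e_5) = \text{voc:ceo}$,
$\varphi(e_1) = (n_1, n_3)$, $\varphi(e_2) = (n_1, n_4)$, $\varphi(e_3) = (n_2, n_3)$, $\varphi(e_4) = (n_2, n_5)$, $\varphi(e_5) = (n_2, n_1)$.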

Additionally, Figure 2 shows a graphical representation of the RDF schema graph described above.

Figure 2: Graphical illustration of an RDF graph schema.

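Given an RDF graph schema $S^R = (N_S, E_S, \phi, \varphi)$ and an RDF graph $G^R = (N_R, N_L, E_O, E_D, \alpha_R, \alpha_L, \beta_O, \beta_D, \delta)$, we say that $G^R$ is valid with respect to $S^R$, denoted as $G^R \models S^R$, iff:

  1. for each $r \in N_R$, it applies that there is $rc \in N_S$ where $\delta(r) = \phi(rc)$;

  2. for each $e \in E_O$ with $\beta_O(e) = (n, n')$, it applies that there is $pc \in E_S$ with $\varphi(pc) = (rc, rc')$ where $\delta(e) = \phi(pc)$, $\delta(n) = \phi(rc)$, and $\delta(n') = \phi(rc')$.

  3. for each $e \in E_D$ with $\beta_D(e) = (n, n')$, it applies that there is $pc \in E_S$ with $\varphi(pc) = (rc, rc')$ where $\delta(e) = \phi(pc)$, $\delta(n) = \phi(rc)$, and $\delta(n') = \phi(rc')$.

Here, condition (1) validates that every resource node is labeled with a resource class defined by the schema; condition (2) verifies that each object property edge, and the pairs of resource nodes that it connects, are labeled with the corresponding resource classes; and condition (3) verifies that each datatype property edge, and the pairs of nodes that it connects (i.e. a resource node and a literal node), are labeled with the corresponding resource classes.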
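The three conditions can be checked with a few lines of Python. This is a simplified sketch: the function name `is_valid_rdf_graph` and the dictionary encoding are assumptions of this example, each property is assumed to have a single (domain, range) pair, and datatypes are treated as schema classes so that conditions (2) and (3) reduce to the same test.

```python
def is_valid_rdf_graph(schema, graph):
    """Check an RDF graph against an RDF graph schema.

    schema: (set of class IRIs, {property IRI: (domain, range)})
    graph:  ({resource node: class}, {literal node: datatype},
             [(source node, property IRI, target node), ...])
    """
    classes, properties = schema
    resource_nodes, literal_nodes, edges = graph
    # Condition (1): every resource node carries a class known to the schema.
    if any(cls not in classes for cls in resource_nodes.values()):
        return False
    label = {**resource_nodes, **literal_nodes}
    # Conditions (2) and (3): every edge matches the domain and range
    # that the schema declares for its property.
    for source, prop, target in edges:
        if prop not in properties:
            return False
        domain, rng = properties[prop]
        if label[source] != domain or label[target] != rng:
            return False
    return True


schema = (
    {"voc:Organisation", "voc:Person", "xsd:string", "xsd:date", "xsd:int"},
    {
        "voc:name": ("voc:Organisation", "xsd:string"),
        "voc:creation": ("voc:Organisation", "xsd:date"),
        "voc:birthName": ("voc:Person", "xsd:string"),
        "voc:age": ("voc:Person", "xsd:int"),
        "voc:ceo": ("voc:Person", "voc:Organisation"),
    },
)
resource_nodes = {"n1": "voc:Organisation", "n2": "voc:Person"}
literal_nodes = {"n3": "xsd:string", "n4": "xsd:date",
                 "n5": "xsd:string", "n6": "xsd:int"}
edges = [
    ("n2", "voc:ceo", "n1"),
    ("n1", "voc:name", "n3"),
    ("n1", "voc:creation", "n4"),
    ("n2", "voc:birthName", "n5"),
    ("n2", "voc:age", "n6"),
]
print(is_valid_rdf_graph(schema, (resource_nodes, literal_nodes, edges)))
```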

Finally, we present the notion of RDF database.

Definition 7 (RDF Database)

An RDF database is a pair $D^R = (S^R, G^R)$ where $S^R$ is an RDF graph schema and $G^R$ is an RDF graph satisfying that $G^R \models S^R$.

2.3 Property Graph Databases

A Property Graph (PG) is a labeled directed multigraph whose main characteristic is that nodes and edges can contain a set (possibly empty) of name-value pairs referred to as properties. From the point of view of data modeling, each node represents an entity, each edge represents a relationship (between two entities), and each property represents a specific characteristic (of an entity or a relationship).

Figure 3: Graphical illustration of a Property Graph.

Figure 3 presents a graphical representation of a Property Graph. The circles represent nodes, the arrows represent edges, and the boxes contain the properties for nodes and edges.

Currently, there are no standard definitions for the notions of Property Graph and Property Graph Schema. However, we present formal definitions that resemble most of the features supported by current PG database systems.

2.3.1 Property Graph.

Assume that $\mathcal{L}$ is an infinite set of labels (for nodes, edges and properties), $\mathcal{V}$ is an infinite set of (atomic or complex) values, and $\mathcal{T}$ is a finite set of data types (e.g. string, integer, date, etc.). A value in $\mathcal{V}$ will be distinguished as a quoted string. Given a value $v \in \mathcal{V}$, the function $type(v)$ returns the datatype of $v$. Given a set $X$, $\mathcal{P}^+(X)$ denotes the set of non-empty subsets of $X$.

Definition 8 (Property Graph)

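A Property Graph is defined as a tuple $G^P = (N, E, P, \Gamma, \Upsilon, \Sigma, \Delta)$ where:

  • $N$ is a finite set of nodes, $E$ is a finite set of edges, $P$ is a finite set of properties, and $N$, $E$, $P$ are mutually disjoint sets;

  • $\Gamma : (N \cup E) \to \mathcal{L}$ is a total function that associates each node or edge with a label;

  • $\Upsilon : P \to (\mathcal{L} \times \mathcal{V})$ is a total function that assigns a label-value pair to each property;

  • $\Sigma : E \to (N \times N)$ is a total function that associates each edge with a pair of nodes;

  • $\Delta : (N \cup E) \to \mathcal{P}^+(P)$ is a partial function that associates a node or edge with a non-empty set of properties, satisfying that $\Delta(o_1) \cap \Delta(o_2) = \emptyset$ for each pair of distinct objects $o_1, o_2$ in the domain of $\Delta$.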

The above definition supports Property Graphs with the following features: a pair of nodes can have zero or more edges; each node or edge has a single label; each node or edge can have zero or more properties; and a node or edge can have the same label-value pair one or more times.
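To make the constraints of Definition 8 concrete, the following sketch encodes a Property Graph as a Python dataclass and checks its well-formedness. The class name `PropertyGraph`, its field names, and the sample labels are assumptions of this example. The check covers disjointness of the three sets, totality of the label, property, and edge functions, and the rule that no property is shared between two objects.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Set, Tuple


@dataclass
class PropertyGraph:
    nodes: Set[str]
    edges: Set[str]
    properties: Set[str]
    label: Dict[str, str]                   # node or edge -> label
    prop_value: Dict[str, Tuple[str, Any]]  # property -> (label, value)
    endpoints: Dict[str, Tuple[str, str]]   # edge -> (source, target)
    props_of: Dict[str, Set[str]] = field(default_factory=dict)

    def is_well_formed(self) -> bool:
        objects = self.nodes | self.edges
        disjoint = not (self.nodes & self.edges
                        or self.nodes & self.properties
                        or self.edges & self.properties)
        total = (set(self.label) == objects
                 and set(self.prop_value) == self.properties
                 and set(self.endpoints) == self.edges)
        connected = all(s in self.nodes and t in self.nodes
                        for s, t in self.endpoints.values())
        owned = [p for props in self.props_of.values() for p in props]
        exclusive = (set(self.props_of) <= objects
                     and all(self.props_of.values())      # non-empty sets only
                     and set(owned) <= self.properties
                     and len(owned) == len(set(owned)))   # no shared property
        return disjoint and total and connected and exclusive


pg = PropertyGraph(
    nodes={"n1", "n2"},
    edges={"e1"},
    properties={"p1", "p2"},
    label={"n1": "Person", "n2": "Organisation", "e1": "ceo"},
    prop_value={"p1": ("birthName", "Elon Musk"), "p2": ("name", "Tesla, Inc.")},
    endpoints={"e1": ("n1", "n2")},
    props_of={"n1": {"p1"}, "n2": {"p2"}},
)
print(pg.is_well_formed())
```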

On the other hand, the above definition does not support multiple labels for nodes or edges. We have two reasons to justify this restriction. First, this feature is not supported by all graph database systems. Second, it complicates the definition of schema-instance consistency.

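Given two nodes $n_1, n_2 \in N$ and an edge $e \in E$, satisfying that $\Sigma(e) = (n_1, n_2)$, we will use $e = (n_1, n_2)$ as a shorthand representation for $\Sigma(e) = (n_1, n_2)$, where $n_1$ and $n_2$ are called the “source node” and the “target node” of $e$ respectively.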

Hence, the formal description of the Property Graph presented in Figure 3 enumerates its sets of nodes, edges, and properties, followed by the values that the four functions of Definition 8 take on them.

2.3.2 Property Graph Schema.

A Property Graph Schema defines the structure of a PG database. Specifically, it defines types of nodes, types of edges, and the properties for such types.

Figure 4: Graphical illustration of a Property Graph Schema.

For instance, Figure 4 shows a graphical representation of a PG schema. The formal definition of PG schema is presented next.

Definition 9 (Property Graph Schema)

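A Property Graph Schema is defined as a tuple $S^P = (N_T, E_T, P_T, \Theta, \Pi, \Phi, \Psi)$ where:

  • $N_T$ is a finite set of node types;

  • $E_T$ is a finite set of edge types;

  • $P_T$ is a finite set of property types;

  • $\Theta : (N_T \cup E_T) \to \mathcal{L}$ is a total function that assigns a label to each node or edge type;

  • $\Pi : P_T \to (\mathcal{L} \times \mathcal{T})$ is a total function that associates each property type with a property label and a data type;

  • $\Phi : E_T \to (N_T \times N_T)$ is a total function that associates each edge type with a pair of node types;

  • $\Psi : (N_T \cup E_T) \to \mathcal{P}^+(P_T)$ is a partial function that associates a node or edge type with a non-empty set of property types, satisfying that $\Psi(o_1) \cap \Psi(o_2) = \emptyset$ for each pair of distinct objects $o_1, o_2$ in the domain of $\Psi$.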

Hence, the formal description of the Property Graph Schema shown in Figure 4 enumerates its sets of node types, edge types, and property types, followed by the values that the four functions of Definition 9 take on them.

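Given a PG schema $S^P = (N_T, E_T, P_T, \Theta, \Pi, \Phi, \Psi)$ and a PG $G^P = (N, E, P, \Gamma, \Upsilon, \Sigma, \Delta)$, we say that $G^P$ is valid with respect to $S^P$, denoted as $G^P \models S^P$, iff:

  1. for each $n \in N$, it applies that there is $nt \in N_T$ satisfying that:

    • $\Gamma(n) = \Theta(nt)$;

    • for each $p \in \Delta(n)$, there is $pt \in \Psi(nt)$ satisfying that $\Upsilon(p) = (l, v)$ and $\Pi(pt) = (l, type(v))$.

  2. for each $e = (n, n') \in E$, it applies that there is $et \in E_T$ with $\Phi(et) = (nt, nt')$ satisfying that:

    • $\Gamma(e) = \Theta(et)$, $\Gamma(n) = \Theta(nt)$, $\Gamma(n') = \Theta(nt')$;

    • for each $p \in \Delta(e)$, there is $pt \in \Psi(et)$ satisfying that $\Upsilon(p) = (l, v)$ and $\Pi(pt) = (l, type(v))$.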

Here, condition (1a) validates that every node is labeled with a node type defined by the schema; condition (1b) verifies that each node contains the properties defined by its node type; condition (2a) verifies that each edge, and the pairs of nodes that it connects, are labeled with an edge type, and the corresponding node types; and condition (2b) verifies that each edge contains the properties defined by the schema.
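A compact way to see these conditions at work is the following Python sketch. The encoding is an assumption of this example: a schema is a pair of dictionaries keyed by node label and by (edge label, source label, target label), properties are stored as dictionaries (so each object has at most one value per property label), and the hypothetical table `PG_TYPE` plays the role of the function that returns the datatype of a value.

```python
# Hypothetical mapping from Python value types to PG datatype names.
PG_TYPE = {str: "String", int: "Integer", float: "Float", bool: "Boolean"}


def conforms(props, declared):
    """Every property must be declared with a matching datatype."""
    return all(name in declared and PG_TYPE.get(type(value)) == declared[name]
               for name, value in props.items())


def is_valid_pg(schema, graph):
    """Check conditions (1a), (1b), (2a) and (2b) on a small Property Graph."""
    node_types, edge_types = schema
    nodes, edges = graph
    for label, props in nodes.values():
        if label not in node_types or not conforms(props, node_types[label]):
            return False
    for label, source, target, props in edges:
        key = (label, nodes[source][0], nodes[target][0])
        if key not in edge_types or not conforms(props, edge_types[key]):
            return False
    return True


schema = (
    {"Person": {"birthName": "String", "age": "Integer"},
     "Organisation": {"name": "String"}},
    {("ceo", "Person", "Organisation"): {"since": "Integer"}},
)
nodes = {
    "n1": ("Person", {"birthName": "Elon Musk", "age": 46}),
    "n2": ("Organisation", {"name": "Tesla, Inc."}),
}
edges = [("ceo", "n1", "n2", {})]
print(is_valid_pg(schema, (nodes, edges)))
```

Storing the age as the string "46", or reversing the direction of the ceo edge, makes the same check fail.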

Finally, we present the notion of the Property Graph database.

Definition 10 (Property Graph Database)

A Property Graph database is a pair $D^P = (S^P, G^P)$ where $S^P$ is a PG schema and $G^P$ is a PG satisfying that $G^P \models S^P$.

2.4 RDF databases versus PG databases

Upon comparison of RDF graphs and PGs, we see that both share the main characteristics of a traditional labeled directed graph, that is, nodes and edges contain labels, the edges are directed, and multiple edges are possible between a given pair of nodes. However, there are also some differences between them:

  • An RDF graph contains nodes of type resource (whose label is an IRI) and nodes of type Literal (whose label is a value), whereas a PG allows a single type of node;

  • Each node or edge in an RDF graph contains just a single value (i.e. a label), whereas each node or edge in a PG could contain multiple labels and properties respectively;

  • An RDF graph supports multi-value properties, whereas a PG usually supports only mono-value properties;

  • An RDF graph allows edges between edges, a feature which is not supported in a PG (by definition);

  • A node in an RDF graph could be associated with zero or more resource classes, while a node in a PG usually has a single node type.

We consider factors such as the availability of schema information in the source model while developing database mappings. Depending on whether or not the input RDF data has schema, the database mappings can be classified into two types:

(i) schema-dependent: one that generates a target PG schema from the input RDF graph schema, and then transforms the RDF graph into a PG (see Section 3); and (ii) schema-independent: one that creates a generic PG schema (based on predefined structure) and then transforms the RDF graph into a PG (see Section 4). In this paper, we developed these two types of database mappings.

Our research omits two special features of RDF: Blank Nodes and reification (i.e. the description of RDF statements using a specific vocabulary). After an empirical study of different RDF datasets, namely Bio2RDF [belleau2008], Europeana [haslhofer2011], LOD Cache [schmachtenberg2014], Wikidata [vrandevcic2012] and Billion Triple Challenge (BTC) [herrera2019], we noticed that these features are rarely used in RDF data. Table 1 summarizes our analysis of the use of Blank Nodes and reification in the above datasets.

Dataset      Blank Nodes   Reification
Europeana    0%            0%
Bio2RDF      0%            0%
LOD Cache    2.67%         1.3%
Wikidata     0.01%         0%
BTC          12.08%        0.02%
Table 1: Blank Nodes and reification in different datasets. Snapshot: 2019-10-08

3 Schema-dependent Database Mapping

In this section we present a database mapping $\mathcal{DM}_1$ from RDF databases to PG databases. Specifically, the database mapping $\mathcal{DM}_1$ is composed of the schema mapping $\mathcal{SM}_1$ and the instance mapping $\mathcal{IM}_1$.

Recall that $I_{DT}$ is the set of RDF datatypes and $\mathcal{T}$ is the set of PG datatypes. Assume that there is a total function $f : I_{DT} \to \mathcal{T}$ which maps RDF datatypes into PG datatypes. Additionally, assume that $f^{-1} : \mathcal{T} \to I_{DT}$ is the inverse function of $f$, i.e. $f^{-1}$ maps PG datatypes into RDF datatypes.

3.1 Schema mapping

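We define a schema mapping $\mathcal{SM}_1$ which takes an RDF graph schema as input and returns a PG Schema as output.

Definition 11 (Schema Mapping $\mathcal{SM}_1$)

Let $S^R = (N_S, E_S, \phi, \varphi)$ be an RDF schema and $S^P = (N_T, E_T, P_T, \Theta, \Pi, \Phi, \Psi)$ be a Property Graph Schema. The schema mapping $\mathcal{SM}_1(S^R) = S^P$ is defined as follows:

  1. For each $n \in N_S$ satisfying that $\phi(n) \notin I_{DT}$

    • There will be $nt \in N_T$ with $\Theta(nt) = \phi(n)$

  2. For each $e \in E_S$ satisfying that $\varphi(e) = (n_1, n_2)$

    • If $\phi(n_2) \in I_{DT}$ then

      • There will be $pt \in \Psi(nt)$ with $\Pi(pt) = (\phi(e), f(\phi(n_2)))$, where $nt$ corresponds to $n_1$.

    • If $\phi(n_2) \notin I_{DT}$ then

      • There will be $et \in E_T$ with $\Theta(et) = \phi(e)$ and $\Phi(et) = (nt_1, nt_2)$, where $nt_1, nt_2$ correspond to $n_1, n_2$ respectively.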

Hence, the schema mapping $\mathcal{SM}_1$ creates a node type for each resource class (with the exception of RDF datatypes), a property type for each datatype property, and an edge type for each object property.
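The behaviour summarised above can be sketched in Python as follows. This is an illustrative sketch rather than the paper's algorithm: the function name `map_schema`, the dictionary encoding of both schemas, and the table `DATATYPE_MAP` (standing in for the datatype function from RDF datatypes to PG datatypes) are assumptions of this example. The voc:ceo entry follows the direction of the voc:ceo triple in Example 2.1.

```python
# Hypothetical datatype correspondence from RDF datatypes to PG datatypes.
DATATYPE_MAP = {"xsd:string": "String", "xsd:int": "Integer", "xsd:date": "Date"}


def map_schema(classes, properties):
    """Map an RDF graph schema to a PG schema.

    classes:    set of class IRIs (resource classes and datatypes)
    properties: {property IRI: (domain class, range class)}
    returns:    (node types, edge types), each with its property types
    """
    # One node type per resource class that is not a datatype.
    node_types = {c: {} for c in classes if c not in DATATYPE_MAP}
    edge_types = {}
    for prop, (domain, rng) in properties.items():
        if rng in DATATYPE_MAP:
            # Datatype property: becomes a property type of the domain node type.
            node_types.setdefault(domain, {})[prop] = DATATYPE_MAP[rng]
        else:
            # Object property: becomes an edge type between two node types.
            edge_types[(prop, domain, rng)] = {}
    return node_types, edge_types


classes = {"voc:Organisation", "voc:Person", "xsd:string", "xsd:date", "xsd:int"}
properties = {
    "voc:name": ("voc:Organisation", "xsd:string"),
    "voc:creation": ("voc:Organisation", "xsd:date"),
    "voc:birthName": ("voc:Person", "xsd:string"),
    "voc:age": ("voc:Person", "xsd:int"),
    "voc:ceo": ("voc:Person", "voc:Organisation"),
}
node_types, edge_types = map_schema(classes, properties)
```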

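For instance, the Property Graph Schema obtained from the graph schema shown in Figure 2 is given as follows, where $N_T$, $E_T$ and $P_T$ are the node, edge and property types, $\Theta$ assigns labels, $\Pi$ assigns a label and a datatype to each property type, $\Phi$ gives the endpoints of each edge type, and $\Psi$ gives the property types of each node type:

$N_T = \{nt_1, nt_2\}$,
$E_T = \{et_1\}$,
$P_T = \{pt_1, pt_2, pt_3, pt_4\}$,
$\Theta(nt_1) = \text{voc:Organisation}$, $\Theta(nt_2) = \text{voc:Person}$, $\Theta(et_1) = \text{voc:ceo}$,
$\Pi(pt_1) = (\text{voc:name}, f(\text{xsd:string}))$, $\Pi(pt_2) = (\text{voc:creation}, f(\text{xsd:date}))$, $\Pi(pt_3) = (\text{voc:birthName}, f(\text{xsd:string}))$, $\Pi(pt_4) = (\text{voc:age}, f(\text{xsd:int}))$,
$\Phi(et_1) = (nt_2, nt_1)$,
$\Psi(nt_1) = \{pt_1, pt_2\}$, $\Psi(nt_2) = \{pt_3, pt_4\}$.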

3.2 Instance Mapping

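Now, we define the instance mapping $\mathcal{IM}_1$ which takes an RDF graph as input and returns a Property Graph as output.

Definition 12 (Instance Mapping $\mathcal{IM}_1$)

Let $G^R = (N_R, N_L, E_O, E_D, \alpha_R, \alpha_L, \beta_O, \beta_D, \delta)$ be an RDF graph and $G^P = (N, E, P, \Gamma, \Upsilon, \Sigma, \Delta)$ be a Property Graph. The instance mapping $\mathcal{IM}_1(G^R) = G^P$ is defined as follows:

  1. For each $n \in N_R$

    • There will be $n' \in N$ with $\Gamma(n') = \delta(n)$.

    • There will be $p \in P$ with $\Upsilon(p) = (\mathrm{iri}, \alpha_R(n))$, i.e. a property that stores the resource identifier.

    • $p \in \Delta(n')$.

  2. For each $e \in E_D$ satisfying that $\beta_D(e) = (n_1, n_2)$

    • There will be $p \in \Delta(n_1')$ with $\Upsilon(p) = (\delta(e), \alpha_L(n_2))$, where $n_1'$ corresponds to $n_1$.

  3. For each $e \in E_O$ satisfying that $\beta_O(e) = (n_1, n_2)$

    • There will be $e' \in E$ with $\Gamma(e') = \delta(e)$ and $\Sigma(e') = (n_1', n_2')$, where $n_1', n_2'$ correspond to $n_1, n_2$ respectively.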

According to the above definition, the instance mapping $\mathcal{IM}_1$ creates a node in $G^P$ for each resource node, creates a property in $G^P$ for each datatype property edge, and creates an edge in $G^P$ for each object property edge.
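The instance mapping can be sketched in Python along the same lines. Everything specific to the encoding is an assumption of this example: the function name `map_instance`, the dictionary layout, the `iri` property used to keep the resource identifier, and the hypothetical `CONVERTERS` table that turns literal strings into typed values. Because properties are stored in a dictionary, a repeated datatype property on the same resource would be overwritten, which mirrors the mono-value restriction discussed in Section 2.4.

```python
# Hypothetical conversion of literal strings into typed PG values.
CONVERTERS = {"xsd:int": int}


def map_instance(resource_nodes, literal_nodes, object_edges, datatype_edges):
    """Map an RDF graph to a Property Graph.

    resource_nodes: {node id: (IRI, class)}
    literal_nodes:  {node id: (value, datatype)}
    object_edges / datatype_edges: [(source node, property IRI, target node)]
    """
    # One PG node per resource node, labelled with its class and
    # carrying the resource identifier as a property.
    pg_nodes = {
        node: {"label": cls, "properties": {"iri": iri}}
        for node, (iri, cls) in resource_nodes.items()
    }
    # One PG property per datatype property edge.
    for source, prop, target in datatype_edges:
        value, datatype = literal_nodes[target]
        pg_nodes[source]["properties"][prop] = CONVERTERS.get(datatype, str)(value)
    # One PG edge per object property edge.
    pg_edges = [{"label": prop, "source": s, "target": t}
                for s, prop, t in object_edges]
    return pg_nodes, pg_edges


resource_nodes = {"n1": ("ex:Tesla_Inc", "voc:Organisation"),
                  "n2": ("ex:Elon_Musk", "voc:Person")}
literal_nodes = {"n3": ("Tesla, Inc.", "xsd:string"),
                 "n4": ("2003-07-01", "xsd:date"),
                 "n5": ("Elon Musk", "xsd:string"),
                 "n6": ("46", "xsd:int")}
object_edges = [("n2", "voc:ceo", "n1")]
datatype_edges = [("n1", "voc:name", "n3"), ("n1", "voc:creation", "n4"),
                  ("n2", "voc:birthName", "n5"), ("n2", "voc:age", "n6")]
pg_nodes, pg_edges = map_instance(resource_nodes, literal_nodes,
                                  object_edges, datatype_edges)
```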

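For example, the PG obtained from the RDF graph shown in Figure 1 is given as follows, where $\Gamma$ assigns labels, $\Upsilon$ assigns a label-value pair to each property, $\Sigma$ gives the endpoints of each edge, and $\Delta$ gives the properties of each node:

$N = \{n_1, n_2\}$,
$E = \{e_1\}$,
$P = \{p_1, p_2, p_3, p_4, p_5, p_6\}$,
$\Gamma(n_1) = \text{voc:Organisation}$, $\Gamma(n_2) = \text{voc:Person}$,
$\Gamma(e_1) = \text{voc:ceo}$,
$\Upsilon(p_1) = (\mathrm{iri}, \text{"ex:Tesla\_Inc"})$, $\Upsilon(p_2) = (\text{voc:name}, \text{"Tesla, Inc."})$, $\Upsilon(p_3) = (\text{voc:creation}, \text{"2003-07-01"})$, $\Upsilon(p_4) = (\mathrm{iri}, \text{"ex:Elon\_Musk"})$, $\Upsilon(p_5) = (\text{voc:birthName}, \text{"Elon Musk"})$, $\Upsilon(p_6) = (\text{voc:age}, \text{"46"})$,
$\Sigma(e_1) = (n_2, n_1)$,
$\Delta(n_1) = \{p_1, p_2, p_3\}$, $\Delta(n_2) = \{p_4, p_5, p_6\}$.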

3.3 Properties of $\mathcal{DM}_1$

In this section, the database mapping $\mathcal{DM}_1$ will be evaluated with respect to the properties described in Section 2.1.1. Specifically, we will analyze computability, semantics preservation, and information preservation.

Recall that $\mathcal{DM}_1$ is formed by the schema mapping $\mathcal{SM}_1$ and the instance mapping $\mathcal{IM}_1$.

Proposition 1

The database mapping $\mathcal{DM}_1$ is computable.

It is straightforward to see that Definition 11 and Definition 12 can be transformed into algorithms to compute $\mathcal{SM}_1$ and $\mathcal{IM}_1$ respectively.

Lemma 1

The database mapping $\mathcal{DM}_1$ is semantics preserving.

Note that the schema mapping and the instance mapping have been designed to create a Property Graph database that maintains the restrictions defined by the source RDF database. On one side, the schema mapping allows transforming the structural and semantic restrictions from the RDF graph schema to the PG schema. On the other side, any Property Graph generated by the instance mapping will be valid with respect to the generated PG schema.

The semantics preservation property of $\mathcal{DM}_1$ is supported by the following facts:

  • We provide a procedure to create a complete RDF graph schema from a set of RDF triples describing an RDF schema, i.e. each property defines its domain and range resource classes.

  • We provide a procedure to create an RDF graph from a set of RDF triples, satisfying that every node and edge in the graph is associated with a resource class; it allows a complete connection between the RDF instance and the RDF schema.

  • The schema mapping $\mathcal{SM}_1$ creates a node type for each user-defined resource class, a property type for each datatype property, and an edge type for each object property.

  • Similarly, the instance mapping $\mathcal{IM}_1$ creates a node for each resource, a property for each resource-literal edge, and an edge for each resource-resource edge.

Theorem 1

The database mapping $\mathcal{DM}_1$ is information preserving.

In order to prove that $\mathcal{DM}_1$ is information preserving, we need to define a database mapping $\mathcal{DM}_1^{-1}$ which transforms a PG database into an RDF database, satisfying that $D = \mathcal{DM}_1^{-1}(\mathcal{DM}_1(D))$ for any RDF database $D$. Next we define the schema mapping $\mathcal{SM}_1^{-1}$ and the instance mapping $\mathcal{IM}_1^{-1}$.

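Definition 13 (Schema mapping $\mathcal{SM}_1^{-1}$)

Let $S^P = (N_T, E_T, P_T, \Theta, \Pi, \Phi, \Psi)$ be a Property Graph Schema and $S^R = (N_S, E_S, \phi, \varphi)$ be an RDF schema. The schema mapping $\mathcal{SM}_1^{-1}(S^P) = S^R$ is defined as follows: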

  1. Let

  2. For each , we create with

  3. For each with , we create with and satisfying that , and

  4. For each , we create with

    1. For each such , we create with and satisfying that , and

      1. There will be with

In general terms, the schema mapping $\mathcal{SM}_1^{-1}$ creates a resource class for each node type, an object property for each edge type, and a datatype property for each property type. Given a PG schema $S^P = \mathcal{SM}_1(S^R)$, the schema mapping $\mathcal{SM}_1^{-1}$ makes it possible to “recover” all the schema constraints defined by $S^R$, i.e. $\mathcal{SM}_1^{-1}(S^P) = S^R$.

An issue with $\mathcal{SM}_1^{-1}$ is the existence of RDF datatypes which are not supported by PG databases. For example, rdfs:Literal has no equivalent datatype in PG database systems. The solution to this issue is to find a one-to-one correspondence between RDF datatypes and PG datatypes.

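Definition 14 (Instance mapping $\mathcal{IM}_1^{-1}$)

Let $G^P = (N, E, P, \Gamma, \Upsilon, \Sigma, \Delta)$ be a Property Graph and $G^R = (N_R, N_L, E_O, E_D, \alpha_R, \alpha_L, \beta_O, \beta_D, \delta)$ be an RDF graph. The instance mapping $\mathcal{IM}_1^{-1}(G^P) = G^R$ is defined as follows:

  1. For each $n \in N$, there will be $n' \in N_R$ with $\delta(n') = \Gamma(n)$ where

    1. $\alpha_R(n') = v$ such that $p \in \Delta(n)$ and $\Upsilon(p) = (\mathrm{iri}, v)$, i.e. $p$ is the property that stores the resource identifier

    2. For each $p \in \Delta(n)$ where $\Upsilon(p) = (l, v)$ and $l \neq \mathrm{iri}$, there will be $e \in E_D$ and $n'' \in N_L$ with $\delta(e) = l$, $\beta_D(e) = (n', n'')$, $\alpha_L(n'') = v$ and $\delta(n'') = f^{-1}(type(v))$

  2. For each $e \in E$ where $\Sigma(e) = (n_1, n_2)$, there will be $e' \in E_O$ with $\delta(e') = \Gamma(e)$ and $\beta_O(e') = (n_1', n_2')$ such that $n_1', n_2'$ correspond to $n_1, n_2$ respectively.

Hence, the method above transforms each node in $G^P$ into a resource node in $G^R$, each property in $G^P$ into a datatype property in $G^R$, and each edge in $G^P$ into an object property in $G^R$. Given a Property Graph $G^P = \mathcal{IM}_1(G^R)$, the instance mapping $\mathcal{IM}_1^{-1}$ makes it possible to “recover” all the data in $G^R$, i.e. $\mathcal{IM}_1^{-1}(G^P) = G^R$.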
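The inverse direction can be illustrated with a short Python sketch that turns a small Property Graph back into RDF triples. The encoding is an assumption of this example: nodes are dictionaries with a label and a property dictionary, the `iri` property holds the resource identifier, and the hypothetical table `PG_TO_RDF_DATATYPE` stands in for the inverse datatype function. Only string and integer values are covered here.

```python
# Hypothetical inverse datatype correspondence (Python value type -> RDF datatype).
PG_TO_RDF_DATATYPE = {str: "xsd:string", int: "xsd:int"}


def pg_to_triples(pg_nodes, pg_edges):
    """Recover a set of RDF triples from a Property Graph."""
    iri_of = {n: node["properties"]["iri"] for n, node in pg_nodes.items()}
    triples = set()
    for n, node in pg_nodes.items():
        # The node label becomes the class of the resource.
        triples.add((iri_of[n], "rdf:type", node["label"]))
        for name, value in node["properties"].items():
            if name == "iri":
                continue  # the identifier is not a datatype property
            literal = (str(value), PG_TO_RDF_DATATYPE[type(value)])
            triples.add((iri_of[n], name, literal))
    for edge in pg_edges:
        # Each edge becomes an object property triple.
        triples.add((iri_of[edge["source"]], edge["label"], iri_of[edge["target"]]))
    return triples


pg_nodes = {
    "n1": {"label": "voc:Organisation",
           "properties": {"iri": "ex:Tesla_Inc", "voc:name": "Tesla, Inc."}},
    "n2": {"label": "voc:Person",
           "properties": {"iri": "ex:Elon_Musk", "voc:birthName": "Elon Musk",
                          "voc:age": 46}},
}
pg_edges = [{"label": "voc:ceo", "source": "n2", "target": "n1"}]
triples = pg_to_triples(pg_nodes, pg_edges)
```

Comparing the recovered triples with the original ones is a simple way to test the round-trip condition of Definition 4 on concrete data.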