Property Graph Type System and Data Definition Language

10/20/2018 ∙ by Mingxi Wu, et al. ∙ TigerGraph

A property graph manages data as vertices and edges. Each vertex and edge can have a property map that stores ad hoc attributes and their values, and labels can be attached to vertices and edges to group them. While this schema-less methodology is very flexible for data evolution and for managing a rapidly growing number of graph elements, it has two shortcomings: 1) data dependency and 2) limited compression. Both problems can be solved by a schema-based approach. In this paper, we define a type system for modeling property graphs and, based on it, propose the associated data definition language (DDL).


1 Introduction

A property graph manages data as vertices and edges [3, 1, 2]. Each vertex and edge can have a property map that stores ad hoc attributes and their values, and labels can be attached to vertices and edges to group them. While this schema-less methodology is very flexible for data evolution and for managing a rapidly growing number of graph elements, it has two shortcomings: 1) data dependency and 2) limited compression. Both problems can be solved by a schema-based approach.

In this paper, we define a type system for modeling property graphs and, based on it, propose the associated data definition language (DDL).

2 Property Graph

A property graph is a directed graph consisting of vertices and edges. Each vertex and edge can have (property, value) pairs. It is a multigraph, meaning that there can be multiple edges between the same two vertices. Labels can be attached to vertices and edges, where they serve as tags.
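To make the schema-less model concrete, here is a minimal in-memory sketch in Python. The class and method names (`PropertyGraph`, `add_vertex`, `add_edge`, `edges_between`) are hypothetical and say nothing about TigerGraph's storage; the point is that edges are kept in a list, so parallel edges between the same two vertices are allowed, and every element carries its own free-form labels and property map.

```python
from dataclasses import dataclass, field


# Hypothetical sketch of a schema-less property graph; not TigerGraph's storage.
@dataclass
class Element:
    """A vertex or edge: a set of labels plus an ad hoc property map."""
    labels: set = field(default_factory=set)
    props: dict = field(default_factory=dict)


@dataclass
class Edge(Element):
    src: str = ""  # source vertex id
    dst: str = ""  # target vertex id


class PropertyGraph:
    """Nothing constrains which labels or properties an element carries."""

    def __init__(self):
        self.vertices = {}  # vertex id -> Element
        self.edges = []     # a list, so parallel edges are kept (multigraph)

    def add_vertex(self, vid, labels=(), **props):
        self.vertices[vid] = Element(set(labels), dict(props))

    def add_edge(self, src, dst, labels=(), **props):
        self.edges.append(Edge(set(labels), dict(props), src, dst))

    def edges_between(self, src, dst):
        return [e for e in self.edges if e.src == src and e.dst == dst]


g = PropertyGraph()
g.add_vertex("alice", labels={"Person"}, name="Alice")
g.add_vertex("bob", labels={"Person"}, age=42)      # a different, ad hoc property set
g.add_edge("alice", "bob", labels={"knows"}, since=2010)
g.add_edge("alice", "bob", labels={"works_with"})  # a second, parallel edge
print(len(g.edges_between("alice", "bob")))        # 2
```

Because nothing declares which properties `bob` has, an application reading this graph must already know the data, which is the data dependency problem discussed next.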

The traditional property graph is schema-less, which gives it great flexibility: vertex and edge instances can be labeled freely, and their properties can be expanded or shrunk independently of each other.

The drawback of the schema-less style is data dependency. An application written against labels and property maps works only when its developer knows the data. For example, if the data ingestion developer works independently of the application developer, the application developer has no way to know what was ingested (labels, properties) for each graph element; this is the data dependency problem. Another problem of the schema-less style is that it misses the opportunity to compress data further. With a predefined schema, metadata and binary data are separated, and the binary data can be compressed easily because each data record is structured.
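The compression argument can be illustrated with a small sketch, assuming a hypothetical `person` record with three integer fields. Schema-less records, modeled here as JSON, repeat their attribute names in every record; once the schema is stored separately as metadata, each record can be packed into a fixed 7-byte binary layout with Python's `struct` module. This only illustrates separating metadata from data; it is not TigerGraph's storage format.

```python
import json
import struct

# Hypothetical schema: id (uint32), age (uint16), gender code (uint8).
# The schema (metadata) is kept once, apart from the data records.
SCHEMA_FIELDS = ("id", "age", "gender")
RECORD = struct.Struct("<IHB")  # little-endian, no padding: 4 + 2 + 1 = 7 bytes

people = [{"id": i, "age": 20 + i % 50, "gender": i % 2} for i in range(1000)]

# Schema-less: every record repeats its attribute names.
schemaless = b"".join(json.dumps(p).encode() for p in people)

# Schema-based: only the values are stored, in a fixed binary layout.
structured = b"".join(RECORD.pack(*(p[f] for f in SCHEMA_FIELDS)) for p in people)


def decode(blob):
    """Rebuild the records from the binary blob using the separately stored schema."""
    return [dict(zip(SCHEMA_FIELDS, values)) for values in RECORD.iter_unpack(blob)]


print(len(schemaless), len(structured))  # structured is 7000 bytes
```

Fixed-width, uniformly typed records are also a better input for general-purpose compressors than interleaved keys and values.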

We define a type system to make a property graph schema-based.

3 Property Graph Type System

In the TigerGraph system, we define a property graph as a collection of typed vertices and a collection of typed edges that connect the vertices. By typed, we mean that there is a user-predefined schema for each vertex/edge type. In addition, an edge type can be directed or undirected. There can be multiple edges between two vertices, and these edges can be of the same type or of different types.

Readers should not confuse the concepts of type and schema object; they are different.

  • A type is a set of rules that can be named and assigned to a programming construct, such as an object or a chunk of binary data.

  • A schema object is created by DDL and resides in the catalog. Usually, a CREATE DDL statement creates both a type and a schema object of the new type.

Every vertex and edge is of a certain predefined type. This section describes the type system used to model a property graph, which comprises vertex types, edge types, graph types, and label types. Vertex types and edge types are the building blocks of a graph type, and a label type is a semantic tag attachable to any graph element.

3.1 Vertex Type

A vertex type is used to describe the schema of a class of vertex entities. It must have

  • a type name: a unique identifier.

  • a set of attributes (non-empty; at least one attribute serves as the primary key): each attribute has a name and an associated data type. The data type can be any ISO SQL data type or a container type (such as Map<K,V>, List<T>, Set<T>, and Order<T>, where K, V, and T are element types).

  • a primary key: one or more attributes that can uniquely identify a vertex of this type.

  • a built-in label attribute, which encodes the labels (see 3.4).

Optionally, a vertex type can have an auto-assigned primary key.
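The mandatory parts of a vertex type can be checked when the type is defined. The following Python sketch uses hypothetical `Attribute` and `VertexType` classes; data types are kept as opaque strings, the label attribute is a plain set, and the optional auto-assigned primary key is not modeled.

```python
from dataclasses import dataclass, field


# Hypothetical sketch of a vertex type definition and its validation rules.
@dataclass(frozen=True)
class Attribute:
    name: str
    data_type: str  # e.g. "STRING", "INT", "LIST<INT>"; not checked in this sketch


@dataclass
class VertexType:
    name: str
    attributes: list
    primary_key: tuple
    labels: set = field(default_factory=set)  # stands in for the built-in label attribute

    def __post_init__(self):
        names = [a.name for a in self.attributes]
        if not names:
            raise ValueError("a vertex type needs at least one attribute")
        if len(set(names)) != len(names):
            raise ValueError("attribute names must be unique")
        if not self.primary_key:
            raise ValueError("a vertex type needs a primary key")
        unknown = set(self.primary_key) - set(names)
        if unknown:
            raise ValueError(f"primary key uses unknown attributes {sorted(unknown)}")

    def key_of(self, record):
        """Return the primary-key value that identifies one vertex instance."""
        return tuple(record[k] for k in self.primary_key)


person = VertexType(
    "person",
    [Attribute("first_name", "STRING"), Attribute("last_name", "STRING"), Attribute("age", "INT")],
    primary_key=("first_name", "last_name"),
)
print(person.key_of({"first_name": "Ada", "last_name": "Lovelace", "age": 36}))  # ('Ada', 'Lovelace')
```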

3.2 Edge Type

An edge type is used to describe the schema of a class of edge entities. It must have

  • a type name: a unique identifier.

  • a set of attributes (possibly empty): each attribute has a name and an associated data type. The data type can be any ISO SQL data type or a container type (such as Map<K,V>, List<T>, Set<T>, and Order<T>, where K, V, and T are element types).

  • one or more pairs of source vertex type and target vertex type.

  • direction property: directed or undirected. A directed edge type models an asymmetric relationship: every edge instance starts at a source vertex and ends at a target vertex, capturing one-directional semantics. An undirected edge type models a symmetric relationship between the source and target vertex types.

  • a built-in label attribute, which encodes the labels (see 3.4).

Optionally, an edge type may further have

  • a discriminator: a set of attributes that distinguishes edges sharing the same pair of source and target vertex instances. By default, the primary keys of the source and target vertices uniquely identify an edge instance. However, when the same edge type has more than one instance between two vertices, a discriminator further distinguishes them.

  • a reverse edge type: if the edge type is directed, a reverse edge type shares the same schema as the forward edge type, except that its direction is opposite; the reverse edge type has its own unique name. This helps articulate application logic when writing queries, even though the engine can traverse a directed edge both ways.
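The following sketch shows, under stated assumptions, how the discriminator and the reverse edge type might behave. `EdgeType`, `identity`, and `reverse` are hypothetical names, primary keys are passed in as tuples, and treating (a, b) and (b, a) as the same undirected edge is an assumption of this sketch rather than a rule stated above.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical sketch: edge identity with an optional discriminator, plus reverse edge types.
@dataclass(frozen=True)
class EdgeType:
    name: str
    endpoints: frozenset           # allowed (source type, target type) pairs
    directed: bool
    discriminator: tuple = ()      # attribute names; empty means "no discriminator"
    reverse_name: Optional[str] = None

    def reverse(self):
        """Same schema, opposite direction, distinct name."""
        if not self.directed or self.reverse_name is None:
            raise ValueError("only a directed edge type with a reverse name has a reverse")
        flipped = frozenset((tgt, src) for src, tgt in self.endpoints)
        return EdgeType(self.reverse_name, flipped, True, self.discriminator, self.name)

    def identity(self, src_key, tgt_key, attrs):
        """Key of one edge instance: endpoint primary keys plus discriminator values."""
        ends = (src_key, tgt_key)
        if not self.directed:
            # Assumption: (a, b) and (b, a) denote the same undirected edge.
            ends = tuple(sorted(ends))
        return (self.name, *ends, *(attrs[a] for a in self.discriminator))


supervise = EdgeType("supervise", frozenset({("person", "person")}), True,
                     ("connect_day",), "supervised_by")
a = supervise.identity(("ann",), ("bob",), {"connect_day": "2018-01-01"})
b = supervise.identity(("ann",), ("bob",), {"connect_day": "2018-06-01"})
print(a != b)  # True: the discriminator tells the two edges apart
```

Without the discriminator both instances would share the key `("supervise", ("ann",), ("bob",))`, so only one of them could be stored.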

3.3 Graph Type

A graph type specifies a collection of vertex types and edge types that collectively depict the graph schema. It must have

  • a name: a unique identifier.

  • zero or more vertex types.

  • zero or more edge types: if an edge type is included in a graph type, its source and target vertex types must be included in the graph’s vertex type collection.

  • a built-in label attribute, which encodes the labels (see 3.4).

When a graph type has zero vertex types and zero edge types, it is an empty graph type with a user-specified name. This is useful for reserving a graph namespace before the user adds any vertex or edge types.
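The inclusion rule for edge endpoints can be enforced when a graph type is defined. This hypothetical `GraphType` sketch refers to vertex and edge types by name only; an empty graph type simply reserves a name.

```python
from dataclasses import dataclass, field


# Hypothetical sketch: a graph type must contain the endpoint types of its edge types.
@dataclass
class GraphType:
    name: str
    vertex_types: set = field(default_factory=set)
    edge_types: dict = field(default_factory=dict)  # edge type -> {(source type, target type), ...}
    labels: set = field(default_factory=set)         # stands in for the built-in label attribute

    def __post_init__(self):
        for edge, pairs in self.edge_types.items():
            for src, tgt in pairs:
                missing = {src, tgt} - self.vertex_types
                if missing:
                    raise ValueError(
                        f"edge type {edge!r} needs vertex types {sorted(missing)} in graph {self.name!r}")


reserved = GraphType("sandbox")  # empty graph type: only reserves the name
social = GraphType("social", {"person"}, {"friendship": {("person", "person")}})
```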

3.4 Label Type

Labels are semantic tags attached to graph data elements. A label type has

  • a name: a unique identifier

  • a description (optional): a string describing its meaning; it may be empty.

This is useful when a user wants to persist clustering or classification results without changing the vertex, edge, or graph types.

3.5 Type Inheritance

Each of the above types supports type inheritance: a type can have subtypes, and each subtype has exactly one super type, except for label types (see 3.5.4).

3.5.1 Vertex Type Inheritance

A vertex type can be derived from another vertex type. For example, professor is a sub vertex type extended from the person super vertex type. A sub vertex type can inherit only one super vertex type. The sub vertex type will

  • Inherit all of the super vertex type attributes

  • Share the same primary key as the super type. This is important to support polymorphism: in a graph database, an object of a particular type is identified by a unique identifier, so using a super type polymorphically requires the super type and all of its descendant types to share the primary key.

  • When the super type attributes are changed in the super type, all descendant types’ attributes are changed accordingly.

  • The super type attributes cannot be changed directly in its descendant subtype.
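One way to satisfy these rules is to resolve a subtype's attributes through its super type at lookup time instead of copying them, so a change in the super type is seen by every descendant automatically. The `VertexCatalog` class below is a hypothetical sketch of that design, with attributes stored as name-to-type dictionaries.

```python
# Hypothetical sketch: single-inheritance vertex types resolved at lookup time.
class VertexCatalog:
    def __init__(self):
        self.own_attrs = {}  # type name -> {attribute: data type} declared by that type
        self.parent = {}     # type name -> super type name, or None for a root type
        self.pkey = {}       # root type name -> primary-key attribute names

    def create(self, name, attrs, primary_key=(), extends=None):
        if name in self.own_attrs:
            raise ValueError(f"{name!r} already exists")
        if extends is None:
            if not primary_key or not set(primary_key) <= set(attrs):
                raise ValueError("a root vertex type needs a primary key over its attributes")
            self.pkey[name] = tuple(primary_key)
        else:
            if extends not in self.own_attrs:
                raise ValueError(f"unknown super type {extends!r}")
            if primary_key:
                raise ValueError("a subtype shares its super type's primary key")
            clash = set(attrs) & set(self.attributes(extends))
            if clash:
                raise ValueError(f"cannot redefine inherited attributes {sorted(clash)}")
        self.own_attrs[name] = dict(attrs)
        self.parent[name] = extends

    def lineage(self, name):
        """The type itself followed by its ancestors, nearest first."""
        while name is not None:
            yield name
            name = self.parent[name]

    def attributes(self, name):
        # Resolved on every call, so a change in a super type reaches all descendants.
        merged = {}
        for t in reversed(list(self.lineage(name))):
            merged.update(self.own_attrs[t])
        return merged

    def primary_key(self, name):
        *_, root = self.lineage(name)
        return self.pkey[root]

    def alter_add(self, name, attr, data_type):
        if attr in self.attributes(name):
            raise ValueError(f"{attr!r} already exists or is inherited")
        self.own_attrs[name][attr] = data_type

    def alter_drop(self, name, attr):
        if attr not in self.own_attrs[name]:
            raise ValueError(f"{attr!r} is not declared by {name!r}; change it in the super type")
        del self.own_attrs[name][attr]


c = VertexCatalog()
c.create("person", {"name": "STRING", "age": "INT"}, primary_key=("name",))
c.create("professor", {"position": "STRING"}, extends="person")
c.alter_add("person", "ssn", "VARCHAR(9)")  # changed in the super type only
print(sorted(c.attributes("professor")))     # ['age', 'name', 'position', 'ssn']
print(c.primary_key("professor"))            # ('name',)
```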

3.5.2 Edge Type Inheritance

An edge type can also have subtypes. The subtype will

  • Inherit all the super edge type attributes.

  • Inherit the pairs of source vertex type and target vertex type.

  • Inherit the direction property.

  • Inherit the discriminator if there is one.

  • Create its own reverse edge type, whose name must differ from that of the super type’s reverse edge type.

  • When the super type attributes are changed in the super type, all descendant types’ attributes are changed accordingly.

  • When the super type’s source and target vertex type pairs are changed in the super type, the pairs of all descendant types are changed accordingly.

  • The super type attributes and (source, target) vertex pairs cannot be changed directly in the subtype.
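Edge type inheritance can follow the same lookup-based design. In this hypothetical sketch (`EdgeTypeDef` is not part of any real API), only the root super type stores the (source, target) pairs, the direction, and the discriminator, and a directed subtype must bring a reverse edge name not already used by one of its ancestors.

```python
# Hypothetical sketch: subtypes read inherited edge properties from the root super type.
class EdgeTypeDef:
    def __init__(self, name, attrs, endpoints=None, directed=None,
                 discriminator=(), reverse_name=None, parent=None):
        self.name, self.parent, self.own_attrs = name, parent, dict(attrs)
        if parent is None:
            if not endpoints:
                raise ValueError("a root edge type needs (source, target) pairs")
            self._endpoints = set(endpoints)
            self._directed = bool(directed)
            self._discriminator = tuple(discriminator)
        else:
            if endpoints is not None or directed is not None or discriminator:
                raise ValueError("endpoints, direction and discriminator are inherited")
            clash = set(attrs) & set(parent.attributes())
            if clash:
                raise ValueError(f"cannot redefine inherited attributes {sorted(clash)}")
            if self.directed:
                taken = {t.reverse_name for t in parent.lineage()}
                if reverse_name is None or reverse_name in taken:
                    raise ValueError("a directed subtype needs its own reverse edge name")
        self.reverse_name = reverse_name

    def lineage(self):
        node = self
        while node is not None:
            yield node
            node = node.parent

    def root(self):
        *_, last = self.lineage()
        return last

    @property
    def endpoints(self):
        return frozenset(self.root()._endpoints)

    @property
    def directed(self):
        return self.root()._directed

    @property
    def discriminator(self):
        return self.root()._discriminator

    def attributes(self):
        merged = {}
        for t in reversed(list(self.lineage())):
            merged.update(t.own_attrs)
        return merged

    def add_endpoints(self, src, tgt):
        if self.parent is not None:
            raise ValueError("(source, target) pairs can only be changed in the root super type")
        self._endpoints.add((src, tgt))


supervise = EdgeTypeDef("supervise", {"connect_day": "DATETIME"},
                        endpoints={("person", "person")}, directed=True,
                        reverse_name="supervised_by")
mentorship = EdgeTypeDef("mentorship", {"end_day": "DATETIME"},
                         reverse_name="mentored_by", parent=supervise)
supervise.add_endpoints("person", "school")  # changed in the super type only
print(mentorship.directed, sorted(mentorship.endpoints))
```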

3.5.3 Graph Type Inheritance

A graph type can also have subtypes. The subtype will

  • Inherit all the vertex types and edge types in the super graph type.

  • When any type contained in the super graph type changes, the sub graph type changes accordingly.

  • The super graph’s vertex and edge types cannot be changed directly in the sub graph type.
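A minimal sketch of graph type inheritance, again resolving inherited members at lookup time. `GraphTypeDef` is hypothetical, and the endpoint check from Section 3.3 is omitted for brevity.

```python
# Hypothetical sketch: a sub graph type sees its super graph type's members live.
class GraphTypeDef:
    def __init__(self, name, vertex_types=(), edge_types=(), parent=None):
        self.name, self.parent = name, parent
        self.own_vertices, self.own_edges = set(vertex_types), set(edge_types)

    def vertex_types(self):
        inherited = self.parent.vertex_types() if self.parent else set()
        return inherited | self.own_vertices

    def edge_types(self):
        inherited = self.parent.edge_types() if self.parent else set()
        return inherited | self.own_edges

    def drop_edge(self, edge):
        # Inherited members may only be changed in the super graph type.
        if edge not in self.own_edges:
            raise ValueError(f"{edge!r} is not declared in {self.name!r}; change it in the super graph type")
        self.own_edges.remove(edge)


social = GraphTypeDef("social", {"person"}, {"friendship"})
facebook = GraphTypeDef("facebook", edge_types={"alumni_relation"}, parent=social)
social.own_edges.add("follows")       # a change in the super graph type ...
print(sorted(facebook.edge_types()))  # ... is visible in the sub graph type
```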

3.5.4 Label Type Inheritance

A label can also have subtypes. A label subtype forms an inheritance relationship with its super types and, unlike the other types, can inherit multiple super types.
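Because labels allow multiple inheritance, a label's ancestors form a directed acyclic graph rather than a chain. The hypothetical `LabelRegistry` below, which also stores the optional description from Section 3.4, requires super labels to exist before a sub label is created; under that assumption no cycle can arise.

```python
# Hypothetical sketch: labels with multiple inheritance and optional descriptions.
class LabelRegistry:
    def __init__(self):
        self.supers = {}       # label -> tuple of direct super labels
        self.description = {}  # label -> description string, possibly empty

    def create(self, name, description="", extends=()):
        if name in self.supers:
            raise ValueError(f"label {name!r} already exists")
        unknown = [s for s in extends if s not in self.supers]
        if unknown:
            raise ValueError(f"unknown super labels {unknown}")
        # Super labels must already exist, so the inheritance graph stays acyclic.
        self.supers[name] = tuple(extends)
        self.description[name] = description

    def ancestors(self, name):
        """All direct and indirect super labels; shared ancestors appear once."""
        seen, stack = set(), list(self.supers[name])
        while stack:
            label = stack.pop()
            if label not in seen:
                seen.add(label)
                stack.extend(self.supers[label])
        return seen

    def is_a(self, name, other):
        return name == other or other in self.ancestors(name)


labels = LabelRegistry()
labels.create("color", "color super class")
labels.create("car", "car super class")
labels.create("redcar", extends=("color", "car"))
print(sorted(labels.ancestors("redcar")))  # ['car', 'color']
```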

4 DDL for the Property Graph Model

With the above type system in place, we can now discuss the DDL for creating schema objects. In this section, we present the syntax by example only; the full EBNF rules are given in the Appendix.

DDL reserved keywords are case-insensitive, while user-specified identifiers are case-sensitive. For ease of presentation, we write keywords in upper case and user-specified identifiers in lower case.
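Case-insensitive keywords and case-sensitive identifiers are typically handled in the lexer. This Python sketch is an assumption, not the actual DDL lexer: its `KEYWORDS` set is a partial, hypothetical list collected from the examples in this section.

```python
import re

# Hypothetical, partial keyword list gathered from the examples in this section.
KEYWORDS = {
    "CREATE", "DROP", "ALTER", "VERTEX", "EDGE", "GRAPH", "LABEL", "DIRECTED",
    "UNDIRECTED", "FROM", "TO", "PRIMARY", "KEY", "NOT", "NULL", "EXTENDS", "WITH",
    "DISCRIMINATOR", "DESCRIPTION", "CASCADE", "ADD", "STRING", "INT", "DATETIME",
}
TOKEN = re.compile(r'"[^"]*"|\w+|[^\s\w]')


def tokenize(statement):
    """Normalize keywords to upper case; keep identifiers exactly as written."""
    tokens = []
    for tok in TOKEN.findall(statement):
        if tok.startswith('"'):
            tokens.append(("STR", tok[1:-1]))
        elif tok.upper() in KEYWORDS:
            tokens.append(("KW", tok.upper()))
        elif tok[0].isdigit():
            tokens.append(("NUM", tok))
        elif tok[0].isalpha() or tok[0] == "_":
            tokens.append(("ID", tok))
        else:
            tokens.append(("SYM", tok))
    return tokens


a = tokenize("create vertex Person (name string)")
b = tokenize("CREATE VERTEX person (name STRING)")
print(a[:2] == b[:2])  # True: keywords match regardless of case
print(a[2], b[2])      # ('ID', 'Person') ('ID', 'person'): two different identifiers
```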

4.1 Create Statement

A CREATE statement creates both a schema object and a schema type: the schema object is stored in the database’s catalog, and its corresponding type is created implicitly.

  • create vertex

    #a person schema object is created; its type is person vertex type.
    CREATE VERTEX person (name STRING NOT NULL PRIMARY KEY, age INT,
    gender STRING, state STRING)
    
    #another way to specify primary key
    CREATE VERTEX person (first_name STRING NOT NULL,
    last_name STRING NOT NULL, age INT, gender STRING, state STRING,
    PRIMARY KEY(first_name, last_name))
    
    #a professor schema object is created; its type is professor vertex type
    #which is a subtype of person vertex type.
    CREATE VERTEX professor EXTENDS person (position STRING)
    
  • create edge

    #a friendship schema object is created; its type is the friendship edge type.
    CREATE UNDIRECTED EDGE friendship (FROM person, TO person,
    connect_day DATETIME)
    
    #supervise and supervised_by are both created; their types are the
    #supervise and supervised_by edge types, respectively.
    CREATE DIRECTED EDGE supervise (FROM person, TO person,
    connect_day DATETIME) WITH REVERSE_EDGE=supervised_by
    
    #alternatively, declare connect_day as a discriminator so that several
    #supervise edges can exist between the same two persons.
    CREATE DIRECTED EDGE supervise (FROM person, TO person,
    connect_day DATETIME, DISCRIMINATOR (connect_day))
    WITH REVERSE_EDGE=supervised_by
    
    #a mentorship schema object is created; its type is the
    #mentorship edge type, which is a subtype of the supervise
    #edge type.
    CREATE DIRECTED EDGE mentorship EXTENDS supervise (end_day DATETIME)
    WITH REVERSE_EDGE=mentored_by
    
  • create graph

    #the two graph schema objects below are created. Both contain
    #the same schema object of the person vertex type.
    CREATE GRAPH social (person, friendship)
    CREATE GRAPH company (person, supervise)
    
    #below, graph facebook is created by extending social
    #with the alumni_relation edge type.
    CREATE UNDIRECTED EDGE alumni_relation (FROM person, TO person)
    CREATE GRAPH facebook EXTENDS social (alumni_relation)
    
  • create label

    CREATE LABEL color DESCRIPTION "color super class"
    CREATE LABEL car DESCRIPTION "car super class"
    CREATE LABEL redcar EXTENDS color, car
    

4.2 Drop Statement

A DROP statement removes a schema object and its corresponding type from the catalog.

Inheritance imposes the following constraints on dropping:

  • When dropping a supertype A, an error is raised if A has any subtype B.

  • When dropping a subtype B, all data of the supertype A remains.

The common drop statements are:

  • drop vertex

    #DROP VERTEX accepts a CASCADE option.
    #When dropping a vertex type V without the CASCADE keyword,
    #an error is raised if any edge type E references V.
    #With CASCADE, E is modified to reflect the disappearance of V;
    #when the source types (or the target types) of E mention only
    #V, E is dropped.
    DROP VERTEX person CASCADE
    DROP VERTEX person, city, school
    
    #delete all vertex schema objects and their types
    DROP VERTEX *
    
  • drop edge

    #drop the edge type and object
    DROP EDGE friendship, supervise
    #delete all edge types
    DROP EDGE *
    
  • drop graph

    DROP GRAPH social, company
    
  • drop label

    #drop the subtype first; dropping color while redcar extends it raises an error
    DROP LABEL redcar
    DROP LABEL color
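The drop rules above (the subtype check and the CASCADE behavior described in the DROP VERTEX comments) could be implemented roughly as follows. `SchemaCatalog` is a hypothetical sketch; it models each edge type's endpoints as a set of source types and a set of target types, matching the `FROM a | b` form in the Appendix grammar.

```python
# Hypothetical sketch of DROP VERTEX with the subtype check and CASCADE.
class SchemaCatalog:
    def __init__(self):
        self.vertices = {}  # vertex type -> super type name, or None
        self.edges = {}     # edge type -> (set of source types, set of target types)

    def create_vertex(self, name, extends=None):
        self.vertices[name] = extends

    def create_edge(self, name, sources, targets):
        self.edges[name] = (set(sources), set(targets))

    def drop_vertex(self, name, cascade=False):
        if name in self.vertices.values():
            raise ValueError(f"cannot drop {name!r}: it still has subtypes")
        users = sorted(e for e, (s, t) in self.edges.items() if name in s or name in t)
        if users and not cascade:
            raise ValueError(f"{name!r} is referenced by edge types {users}")
        for e in users:
            sources, targets = self.edges[e]
            sources.discard(name)
            targets.discard(name)
            if not sources or not targets:  # that side of E mentioned only V
                del self.edges[e]
        del self.vertices[name]


c = SchemaCatalog()
for v in ("person", "student", "school"):
    c.create_vertex(v)
c.create_edge("attends", {"person", "student"}, {"school"})
c.create_edge("friendship", {"person"}, {"person"})
c.drop_vertex("person", cascade=True)
print(c.edges)  # friendship is gone; attends keeps student as its only source type
```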
    

4.3 Alter Statement

An ALTER statement changes a schema object and its type.

  • alter vertex add/drop attributes

    ALTER VERTEX person ADD (ssn VARCHAR(9))
    ALTER VERTEX person DROP (ssn VARCHAR(9))
    
  • alter edge add/drop attributes

    ALTER EDGE friendship ADD (location VARCHAR(20))
    ALTER EDGE friendship DROP (location VARCHAR(20))
    
  • alter graph add/drop vertex type and edge type

    ALTER GRAPH school ADD VERTEX (professor, student)
    ALTER GRAPH school DROP VERTEX (professor)
    
    #note: the statement below implicitly adds all dependent source
    #and target vertex types to the graph
    ALTER GRAPH school ADD EDGE (teach_class)
    
    #drops only the edge type; its vertex types stay in the graph
    ALTER GRAPH school DROP EDGE (teach_class)
    
    #note: the statement below also drops all dependent edge types from the graph.
    ALTER GRAPH school DROP VERTEX (professor) CASCADE
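A hypothetical sketch of these ALTER GRAPH semantics: adding an edge type pulls in its endpoint vertex types, and dropping a vertex type with CASCADE removes every edge type in the graph that touches it. Raising an error when CASCADE is omitted mirrors the DROP VERTEX rule and is an assumption here; the `teach_class` endpoints (`professor`, `class`) are also made up for illustration.

```python
# Hypothetical sketch of ALTER GRAPH ADD/DROP over globally defined edge types.
class GraphSchema:
    def __init__(self, edge_catalog):
        self.edge_catalog = edge_catalog  # edge type -> set of (source type, target type)
        self.vertex_types, self.edge_types = set(), set()

    def add_vertex(self, *names):
        self.vertex_types.update(names)

    def add_edge(self, *names):
        for edge in names:
            self.edge_types.add(edge)
            for src, tgt in self.edge_catalog[edge]:
                self.vertex_types.update((src, tgt))  # endpoints are added implicitly

    def drop_edge(self, *names):
        self.edge_types.difference_update(names)  # vertex types are left alone

    def drop_vertex(self, *names, cascade=False):
        dependent = {e for e in self.edge_types
                     for src, tgt in self.edge_catalog[e] if src in names or tgt in names}
        if dependent and not cascade:
            raise ValueError(f"edge types {sorted(dependent)} depend on {list(names)}")
        self.edge_types -= dependent
        self.vertex_types.difference_update(names)


school = GraphSchema({"teach_class": {("professor", "class")}})
school.add_vertex("professor", "student")
school.add_edge("teach_class")
print(sorted(school.vertex_types))  # ['class', 'professor', 'student']
school.drop_vertex("professor", cascade=True)
print(school.edge_types)            # set()
```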
    

References

  • [1] Renzo Angles. The property graph database model. In Proceedings of the 12th Alberto Mendelzon International Workshop on Foundations of Data Management, 2018.
  • [2] Renzo Angles, Marcelo Arenas, Pablo Barceló, Peter A. Boncz, George H. L. Fletcher, Claudio Gutierrez, Tobias Lindaaker, Marcus Paradies, Stefan Plantikow, Juan F. Sequeda, Oskar van Rest, and Hannes Voigt. G-CORE: A core for future graph query languages. In Proceedings of SIGMOD, 2018.
  • [3] Marko A. Rodriguez and Peter Neubauer. Constructions from dots and lines. Bulletin of the American Society for Information Science and Technology, 36(6):35–41, 2010.

5 Appendix

CREATE_QB ::= 'create' (CREATE_VERTEX_QB | CREATE_EDGE_QB |
                        CREATE_GRAPH_QB | CREATE_LABEL_QB)

CREATE_VERTEX_QB ::= 'vertex' IDENTIFIER '(' ATTRIBUTE_ELEMENT (',' ATTRIBUTE_ELEMENT)*
                     (',' 'primary' 'key' '(' IDENTIFIER (',' IDENTIFIER)* ')')? ')'

CREATE_EDGE_QB ::= ('directed' | 'undirected') 'edge' IDENTIFIER
                   '(' 'from' ((IDENTIFIER ('|' IDENTIFIER)*) | STAR) ','
                   'to' ((IDENTIFIER ('|' IDENTIFIER)*) | STAR)
                   (',' ATTRIBUTE_ELEMENT)* ')'
                   ('with' KEY_VAL_SUFFIX)?

CREATE_GRAPH_QB ::= 'graph' IDENTIFIER '(' IDENTIFIER (',' IDENTIFIER)* ')'

CREATE_LABEL_QB ::= 'label' IDENTIFIER ('description' STRING_LITERAL)?

DROP_QB ::= 'drop' (DROP_VERTEX_QB | DROP_EDGE_QB | DROP_GRAPH_QB |
                    DROP_LABEL_QB)

DROP_VERTEX_QB ::= 'vertex' IDENTIFIER (',' IDENTIFIER)* 'cascade'?

DROP_EDGE_QB ::= 'edge' IDENTIFIER (',' IDENTIFIER)*

DROP_GRAPH_QB ::= 'graph' IDENTIFIER (',' IDENTIFIER)*

DROP_LABEL_QB ::= 'label' IDENTIFIER (',' IDENTIFIER)*

ALTER_QB ::= 'alter' (ALTER_VERTEX_QB | ALTER_EDGE_QB | ALTER_GRAPH_QB)

ALTER_VERTEX_QB ::= 'vertex' IDENTIFIER (ALTER_ADD_VERTEX_PROP_QB |
                    ALTER_UPDATE_VERTEX_PROP_QB | ALTER_DROP_VERTEX_PROP_QB)

ALTER_EDGE_QB ::= 'edge' IDENTIFIER (ALTER_ADD_EDGE_PROP_QB |
                  ALTER_UPDATE_EDGE_PROP_QB | ALTER_DROP_EDGE_PROP_QB)