Online Analytical Processing on Graph Data

09/03/2019 ∙ by Leticia Gómez, et al. ∙ 0

Online Analytical Processing (OLAP) comprises tools and algorithms that allow querying multidimensional databases. It is based on the multidimensional model, where data can be seen as a cube such that each cell contains one or more measures that can be aggregated along dimensions. In a Big Data scenario, traditional data warehousing and OLAP operations are clearly not sufficient to address current data analysis requirements, for example, social network analysis. Furthermore, OLAP operations and models can expand the possibilities of graph analysis beyond the traditional graph-based computation. Nevertheless, there is not much work on the problem of taking OLAP analysis to the graph data model. This paper proposes a formal multidimensional model for graph analysis, that considers the basic graph data, and also background information in the form of dimension hierarchies. The graphs in this model are node- and edge-labelled directed multi-hypergraphs, called graphoids, which can be defined at several different levels of granularity using the dimensions associated with them. Operations analogous to the ones used in typical OLAP over cubes are defined over graphoids. The paper presents a formal definition of the graphoid model for OLAP, proves that the typical OLAP operations on cubes can be expressed over the graphoid model, and shows that the classic data cube model is a particular case of the graphoid data model. Finally, a case study supports the claim that, for many kinds of OLAP-like analysis on graphs, the graphoid model works better than the typical relational OLAP alternative, and for the classic OLAP queries, it remains competitive.


1 Introduction

Online Analytical Processing (OLAP) [9, 15] comprises tools and algorithms that allow querying multidimensional (MD) databases. In these databases, data are modelled as data cubes, where each cell contains one or more measures of interest that quantify facts. Measure values can be aggregated along dimensions, organized as sets of hierarchies. Traditional OLAP operations are used to manipulate the data cube, for example: aggregation and disaggregation of measure data along the dimensions; selection of a portion of the cube; or projection of the data cube over a subset of its dimensions. The cube is computed after a process called ETL, an acronym for Extract, Transform, and Load, which requires complex and expensive work to carry data from the sources to the MD database, typically a data warehouse (DW). Although OLAP has been used for social network analysis [10, 12], in a “Big Data” scenario, further requirements appear [5]. The classic paper by Cohen et al. [4] describes the so-called MAD skills (standing for Magnetic, Agile and Deep) required for data analytics. In this scenario, more complex analysis tools are required that go beyond classic OLAP [14]. Graphs, and particularly property graphs [8, 13], are becoming increasingly popular to model different kinds of networks (for instance, social networks, sensor networks, and the like). Property graphs underlie the most popular graph databases [1]. Examples of graph databases and graph processing frameworks following this model are Neo4j (http://www.neo4j.com), JanusGraph (http://janusgraph.org/, previously called Titan), and GraphFrames (https://graphframes.github.io/). In addition to traditional graph analytics, it is also interesting for the data scientist to have the possibility of performing OLAP on graphs.

From the discussion above, it follows that, on the one hand, traditional data warehousing and OLAP operations on cubes are clearly not sufficient to address the current data analysis requirements; on the other hand, OLAP operations and models can expand the possibilities of graph analysis beyond the traditional graph-based computation, like shortest-path, centrality analysis and so on. Nevertheless, few proposals in this direction have been presented so far. In addition, most of the existing work addresses homogeneous graphs (that is, graphs where all nodes are of the same type), where the measure of interest is related to the OLAP analysis on the graph topology [3, 17, 18]. Further, existing works only address graphs with binary relationships (see Section 2 for an in-depth discussion on these issues). However, real-world graphs are complex and often heterogeneous: nodes and edges can be of different types, and edges can relate different numbers of entities.

This paper proposes a MD data model for graph analysis, that considers not only the basic graph data, but background information in the form of dimension hierarchies as well. The graphs in this model are node- and edge-labelled directed multi-hypergraphs, called graphoids. In essence, these can be denoted “property hypergraphs”. A graphoid can be defined at several different levels of granularity, using the dimensions associated with them. For this, the Climb operation is available. Over this model, operations like the ones used in typical OLAP on cubes are defined, namely Roll-Up, Drill-Down, Slice, and Dice, as well as other operations for graphoid manipulation, e.g., n-delete (which deletes nodes). The hypergraph model allows a natural representation of facts with different dimensions, since hyperedges can connect a variable number of nodes of different types. A typical example is the analysis of phone calls, the running example that will be used throughout this paper. Here, not only point-to-point calls between two partners can be represented, but also “group calls” between any number of participants. In classic OLAP [9], a group call must be represented by means of a fact table containing a fixed number of columns (e.g., caller, callee, and the corresponding measures). Therefore, when the OLAP analysis for telecommunication information concerns point-to-point calls between two partners, the relational representation (denoted ROLAP) works fine, but when this is not the case, modelling and querying issues appear, which calls for a more natural representation, closer to the original data format. And here is where the hypergraph model comes to the rescue [6]. In summary, the main contributions of the paper are:

  1. A graph data model based on the notion of graphoids;

  2. The definition of a collection of OLAP operations over these graphoids;

  3. A proof that the classical OLAP operations on cubes can be simulated by the OLAP operations defined in the graphoid model and, therefore, that these graphoid-based operations are at least as powerful as the classical OLAP operations on cubes;

  4. A case study and a series of experiments, that give the intuition of a class of problems where the graphoid model works clearly better than relational OLAP, whereas for classic OLAP queries, the graph representation is still competitive with the relational alternative.

In addition to the above, of course all the classic analysis tools from graph theory are supported by the model, although this topic is beyond the scope of this paper.

Remark 1

This paper does not claim that the graphoid model is always more appropriate than the classic relational OLAP representation. Instead, the proposal aims at showing that when a more flexible model is needed, where n-ary relationships between instances are present (and n is variable), the model allows not only for a more natural representation, but also can deliver better performance for some critical queries.    

The remainder of this paper is organized as follows: Section 2 discusses related work. Section 3 presents the graphoid data model. Section 4 presents the OLAP operations on graphoids, while Section 5 shows that the graphoid OLAP operations capture the classic OLAP operations on cubes. Section 6 discusses a case study and presents an experimental analysis. Section 7 concludes the paper.

2 Related Work

The model described in the next sections is based on the notion of property graphs [2]. In this model, nodes and edges (hyperedges, as will be explained later) are labelled with a sequence of attribute-value pairs. It will be assumed that the values of the attributes represent members of dimension levels (i.e., each attribute value is an element in the domain of a dimension level), and thus nodes and edges can be aggregated, provided that an attribute hierarchy is defined over those dimensions. Property graphs are the usual choice in modern graph database models used in practical implementations. Attributes are included in nodes and edges mainly to improve the speed of retrieval of the data directly related to a given node. Here, these attributes are also used to perform OLAP operations.

A key difference between existing works and the proposal introduced in this paper is that the latter supports the notion of OLAP hypergraphs, greatly expanding the possibilities of analysis. This way, instead of binary relationships between nodes, there are n-ary, possibly duplicated relationships, which are typical in Data Warehousing and OLAP. Further, supporting n-ary relationships allows naturally modelling OLAP situations where different facts have a different number of relations, like in the group calls case discussed in Section 1 and studied in Section 6. In other words, the model handles multi-hypergraphs. Also, the paper works over the classic OLAP operations, and formally defines their meaning in a graph context. This approach allows an OLAP user to work with the notion of a data cube at the conceptual level [15], regardless of the kind of underlying data (in this case, graphs), defining OLAP operations in terms of cubes and dimensions rather than in terms of nodes and edges. Finally, the authors have shown the usefulness of this proposal in different scenarios, like trajectory analysis [7] and typical OLAP analysis on social networks [16].

3 Data Model

This section presents the graphoid OLAP data model. First, background dimensions are formally defined, along the lines of the classic OLAP literature. Then, the (hyper)graph data model is introduced.

3.1 Hierarchies and Dimensions

The notions of dimension schema and dimension graph (or dimension instance) that will be used throughout the paper, are introduced first. These concepts are needed to make the paper self-contained, and to understand the examples. The reader is referred to [11] for full details of the underlying OLAP data model.

Definition 1 (Dimension Schema, Hierarchy and Level)

Let D be a name for a dimension. A dimension schema for D is a lattice (a partial order), with a unique top-node, called All (which has only incoming edges), and a unique bottom-node, called Bottom (which has only outgoing edges), such that all maximal-length paths in the graph go from Bottom to All. Any path from Bottom to All in a dimension schema is called a hierarchy of D. Each node in a hierarchy (that is, in a dimension schema) is called a level of D.

The running example used throughout this paper analyses calls between customers, which belong to different companies. For this, as background (contextual) information for the graph data representing calls (to be explained later), there is a Phone dimension, with levels Phone (representing the phone number), Customer, City, Country, and Operator. There is also a Time dimension, with levels Date, Month, and Year. The following examples explain this in detail.
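Definition 1 can be illustrated with a minimal Python sketch that stores a dimension schema as a directed acyclic graph and enumerates its hierarchies as the paths from the bottom level to the top level. The edge structure used below for Phone and Time is an assumption inferred from the level names listed above, and the names `PHONE_SCHEMA`, `TIME_SCHEMA`, and `hierarchies` are hypothetical; the paper does not prescribe any implementation.

```python
# Hypothetical encoding of a dimension schema: each level maps to its parent levels.
# The edges are an assumption inferred from the level names of the running example.
PHONE_SCHEMA = {
    "Phone": ["Customer", "Operator"],  # assumed bottom level: only outgoing edges
    "Customer": ["City"],
    "City": ["Country"],
    "Country": ["All"],
    "Operator": ["All"],
    "All": [],                          # top level: only incoming edges
}
TIME_SCHEMA = {"Date": ["Month"], "Month": ["Year"], "Year": ["All"], "All": []}


def hierarchies(schema, bottom, top="All"):
    """Return every path from the bottom level to the top level."""
    if bottom == top:
        return [[top]]
    paths = []
    for parent in schema[bottom]:
        for tail in hierarchies(schema, parent, top):
            paths.append([bottom] + tail)
    return paths


phone_hierarchies = hierarchies(PHONE_SCHEMA, "Phone")
time_hierarchies = hierarchies(TIME_SCHEMA, "Date")
```

Under these assumed edges, the Phone schema yields two hierarchies (one through Customer, one through Operator) and the Time schema yields a single one.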

Example 1

Figure 1 depicts the dimension schemas for the dimensions Phone and Time, respectively. In addition, there is also a dimension denoted Id, representing identifiers, that will be explained later. In the dimension Phone, it holds that Bottom = Phone, and there are two hierarchies denoted, respectively, as

Phone → Customer → City → Country → All

and

Phone → Operator → All.

The node Customer is an example of a level in the first of the above hierarchies. For the dimension Time, Bottom = Date holds, as well as the hierarchy Date → Month → Year → All.

Figure 1: Dimension schemas for the dimensions Time (a), Phone (b), and Id (identifier) (c).

Definition 2 (Level, Hierarchy, and Dimension Instances)

Let be a dimension with schema , and let be a level in . A level instance of is a non-empty, finite set . If , then is the singleton . If , then is the domain of the dimension , that is, .

A dimension graph (or instance) over the dimension schema is a directed acyclic graph with node set

where the union is taken over all levels in . The edge set of this directed acyclic graph is defined as follows. Let and be two levels of , and let and . Then, only if there is a directed edge from to in , there can be a directed edge in from to .

If is a hierarchy in , then the hierarchy instance (relative to the dimension instance ) is the subgraph of with nodes from , for appearing in . This subgraph is denoted .    

Remark 2

A hierarchy instance is always a (directed) tree, since a hierarchy is a linear lattice. The following terminology is used. If and are two nodes in a hierarchy instance , such that is in the transitive closure of the edge relation of , then it is said that rolls-up to , and denoted by (or if is clear from the context). Example 2 illustrates these concepts.    

Example 2

Consider dimension Phone whose schema is given in Figure 1 (). Associated with this schema, there is an instance where . Also, at the Operator level, . This dimension instance is depicted in Figure 2, which shows, e.g., that phone lines and correspond to the operator .    

Figure 2: An example of a dimension instance for the dimension .

In what follows, “sound” dimension graphs are assumed. In these graphs, rolling up from the Bottom level to the same element along different paths gives the same result [11], as is typical in so-called balanced (or homogeneous) dimensions [15].
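The roll-up relation of Remark 2 and the soundness assumption can be sketched as follows. This is only an illustration under stated assumptions: the dimension instance is stored as a dictionary from each member to its parents (keyed by the parent's level), and all member names (`ph1`, `cust1`, the city and the country) are made up; only the operator names Movistar and Vodafone appear in the paper's examples.

```python
# Hypothetical dimension instance for Phone: each member maps to its parents,
# keyed by the level of the parent. All member values are made up.
INSTANCE = {
    "ph1": {"Operator": "Movistar", "Customer": "cust1"},
    "ph2": {"Operator": "Vodafone", "Customer": "cust1"},
    "cust1": {"City": "Antwerp"},
    "Antwerp": {"Country": "Belgium"},
    "Belgium": {"All": "all"},
    "Movistar": {"All": "all"},
    "Vodafone": {"All": "all"},
}


def rollup(member, target_level, instance):
    """Return the set of members at target_level reachable from member
    through the transitive closure of the instance edges."""
    found = set()
    stack = [member]
    while stack:
        current = stack.pop()
        for level, parent in instance.get(current, {}).items():
            if level == target_level:
                found.add(parent)
            stack.append(parent)
    return found


def is_sound(member, target_level, instance):
    """In a sound instance, every path to target_level ends in the same member."""
    return len(rollup(member, target_level, instance)) <= 1
```

For instance, `rollup("ph1", "Country", INSTANCE)` follows the Customer and City levels, while `rollup("ph1", "All", INSTANCE)` reaches the same top member along both hierarchies, which is what soundness requires.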

3.2 The Base Graph and Graphoids

As a basic data structure for modelling OLAP on graph data, the concept of graphoid is introduced and defined in this section. A graphoid plays the role of a multi-dimensional cuboid in classical OLAP and it is designed to represent the information of the application domain, at a certain level of granularity. Essentially, a graphoid is a node- and edge-labelled directed multi-hypergraph.

In what follows, a collection of dimensions is assumed in the application domain, and their schemas are given. Furthermore, hierarchy instances for all dimensions are given. Finally, assume that a special dimension is given, to represent unique identifiers (Figure 1(c)). The notions of attributes, node types and edge types are defined next.

Attributes

The set of attributes that describe the data is defined as As described in Section 3.1, to each attribute of , a domain is associated, from which the attribute takes values.

Node types

Assume a finite, non-empty set of node types. Elements of are denoted by a string starting with a hashtag. For example, the node type indicates that a node in a graph represents a phone line number. There are also two functions, and defined on . For each node type in , is a natural number, called the arity, that expresses the number of attributes associated with a node of type . Also, is an -tuple of attributes, which are dimensions defined at the level, the first of which is the Identifier dimension. This means that is an element of . The tuple tells which attributes are associated with a node of type , without specifying their levels. Finally, assume that contains no repetition, which is the usual case in practice. The identifier dimension is always used at its Bottom level.

Edge types

Assume the existence of a finite, non-empty set of edge types, which is disjoint from the set . Elements of will also be denoted by a string starting with a hashtag. For example, the edge type #Call indicates that an edge connects nodes that participate in a call. Again, also assume the existence of the functions and on . To each edge type in , is a natural number, called the arity, that expresses the number of attributes associated with an edge of type . Also, is an -tuple of attributes, which are dimensions (at Bottom-level). This means that is an element of . The tuple expresses which attributes are associated with an edge of type , without specifying their levels. Finally, assume that contains no repetition. The identifier dimension (at its Bottom level) may appear, but is not required. If the identifier dimension appears, it occurs only once among the attributes that describe edges of a certain type.

It is now possible to define the notion of graphoid.

Definition 3 (Graphoid)

Let be the identifier dimension. Let dimensions be given with their respective schemas and instances. Let be levels for these respective dimensions. A -graphoid (or graphoid, for short, if the levels are clear from the context) is a 6-tuple , where

  • is a finite, non-empty set, called the set of nodes of ;

  • is a function from to (that associates a unique type with each node of );

  • is a function that maps a node to a string , where and, if , then, for , , if is the dimension . It is assumed that different -values are associated with different nodes, since the first attribute value acts as a node identifier; is denoted the node labelling function;

  • is a subbag of the set , which we call the set of (multi hyper-)edges of (given bags, or sets, A and B, if the number of occurrences of each element in A is less than or equal to its number of occurrences in B, then A is called a subbag of B, also denoted A ⊆ B);

  • is a function from to (that associates a unique type to each edge of ); and

  • is a function that maps a hyperedge to a string , where and, if , then, for , , if is the dimension ; is called the edge labelling function.    

The basic graph data that serves as input data to the graph OLAP process, is called the base graph. A base graph plays the role of a multi-dimensional cube in classical OLAP and is designed to contain all the information of the application domain, at the lowest level of granularity.

Definition 4 (Base graph)

Let dimensions be given with their respective schemas and instances. The -graphoid is called the base graph.    

Example 3

The running example used in this paper is aimed at analysing calls between customers of phone lines; lines correspond to different operators. Examples 1 and 2 showed some of the dimensions used as background information. Next, the call information is shown, represented as a graph. The Phone dimension plays the roles of the calling line and the callee lines (this is called a role-playing dimension in the OLAP literature [15]). The information in the hyperedges reflects the total duration of the calls between two or more phone numbers on a given day. Figure 3 shows an example of a base graph, where is the node set. The nodes in this base graph are all of the same type and represent phones (not persons–a person may have more than one phone). In this example, . The node type has arity . Its first attribute is a node identifier and the second one is a dimensional attribute that represents the phone number, with domain . In the example of Figure 3,

Hyperedges represent phone calls, which most of the time involve two phones, but which may also involve multiple phones, representing so-called “group calls.” So, edges are all of the same type and . In Figure 3, a directed hyperedge from a subset of to a subset of is graphically represented by a coloured node which has incoming arrows (of the same colour) from all elements of and outgoing arrows (again of the same colour) to all elements of . Such a coloured construction is a depiction of the hyperedge , which will be denoted from now on. (The nodes of are called the source nodes of and the nodes of are called the target nodes of . The source and target nodes of are called adjacent to , and the set of the adjacent nodes to is denoted by . Thus, .) For example, the red and purple hyperedges represent two different phone calls from to , made on the same day and of the same duration. This example explains why the model assumes bags rather than sets. The orange hyperedge represents a group call, from to both and . There are six phone calls shown in the figure. So, is the bag . The edge labelling function associates two attributes, with edges of type , namely Date and Duration. Date is a dimensional attribute to which the dimensional hierarchy in Figure 1 is associated. Duration is a measure attribute (which has as an associated aggregation function, in this case, the summation).

Figure 3: Basic phone call data as a base graph.

Note that, although the base graph plays the role of a multi-dimensional cube in classical OLAP (or a fact table in relational OLAP), a key difference is that this cube has a variable number of “axes”, since it can represent facts including a variable number of dimensions. The next example discusses two graphoids whose dimensions are at different levels of granularity. Later it will be explained how these graphoids can be obtained from the base one.
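One possible in-memory representation of a graphoid is sketched below, to make the "bag of directed hyperedges" idea concrete. It is a simplification under stated assumptions, not the paper's implementation: a hyperedge is identified by its source set, target set, type, and label, and the bag is a `collections.Counter` that keeps multiplicities. The class names, the type strings `#Phone` and `#Call`, the phone values, and the dates are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Hyperedge:
    """A directed hyperedge: source nodes, target nodes, a type and a label."""
    sources: frozenset
    targets: frozenset
    etype: str
    label: tuple  # attribute values, e.g. (date, duration)


@dataclass
class Graphoid:
    node_type: dict = field(default_factory=dict)    # node id -> node type
    node_label: dict = field(default_factory=dict)   # node id -> attribute values
    edges: Counter = field(default_factory=Counter)  # bag of hyperedges

    def add_node(self, node_id, ntype, *attrs):
        self.node_type[node_id] = ntype
        # The identifier is always the first attribute of a node label.
        self.node_label[node_id] = (node_id,) + attrs

    def add_edge(self, sources, targets, etype, *attrs):
        edge = Hyperedge(frozenset(sources), frozenset(targets), etype, attrs)
        self.edges[edge] += 1  # duplicates increase the multiplicity


# Made-up base graph: two identical point-to-point calls and one group call.
base = Graphoid()
for node_id, phone in [(1, "ph1"), (2, "ph2"), (3, "ph3")]:
    base.add_node(node_id, "#Phone", phone)
base.add_edge({1}, {2}, "#Call", "2019-09-03", 10)
base.add_edge({1}, {2}, "#Call", "2019-09-03", 10)
base.add_edge({1}, {2, 3}, "#Call", "2019-09-03", 5)

total_calls = sum(base.edges.values())
distinct_calls = len(base.edges)
```

The two identical calls are stored once with multiplicity two, which mirrors the reason the model uses bags rather than sets, and the group call simply has a target set with more than one node.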

Example 4

Continuing with Example 3, consider two available dimensions, namely and . A -graphoid based on the base graph of Figure 3, is shown in Figure 4. Here, in the Phone nodes, the phone numbers have been replaced with their corresponding operator name, at the Phone.Operator level in the dimension Phone (e.g., for , the corresponding operator is Movistar).

Figure 4: A -graphoid, based on the data shown in Figure 3.

Figure 5 shows an alternative -graphoid for the data from Figure 3. This graphoid has as a node set. The nodes with identifiers 12 and 14 represent, respectively, and in the base graph (and also in the graphoid of Figure 4), which belong to the operator Vodafone. Thus, these two nodes were collapsed into one (with identifier 12) and similarly, the nodes and were collapsed into one node (with identifier 13). These operations were possible because these nodes have identical attribute values (apart from the identifier). For the dimension Time, all information in Figure 5 is at the level of Day and all information for the dimension Phone is at the level of Company. These examples show that there can be more than one -graphoids “consistent” with the given base graph. Thus, some kind of normalization is needed. This is studied in the next section.    

Figure 5: An alternative -graphoid, based on the data shown in Figure 3.

Remark 3

Nodes are assumed to represent basic objects in the modelled application world. These objects are given by a number of descriptive attributes. Measure information, typically present in an OLAP setting to quantify facts, is, in this philosophy, represented as attributes on the hyperedges. The call duration is an example of a measure that is placed on edges of the type Call. However, the above definition also allows for node attributes to be dimensions that contain measure information. Consider a slightly modified situation in which an object of type includes an additional attribute that expresses the average (or expected) billing amount for that particular phone number, for example, . In this modified setting, a user may want to compute the average expected billing amount over all phone lines. To answer these kinds of queries, attribute values of certain types of nodes must be averaged (in the example, the attribute). However, in the model presented here, aggregations are only performed on attribute values of hyperedges. Whenever this problem occurs, the representation can be modified as illustrated in Figure 6. On the left-hand side, there is a node that includes the attribute. On the right-hand side, this attribute is brought to the level in its dimension and gets the value . The expected billing information is moved to a new edge of type , where it can be subject to aggregation. The above operation is called the edgification of an attribute in a node of type , and it is denoted by .    

Figure 6: (a) A node with label , where 880 expresses the expected bill. (b) An edgification of this node, where the expected billing information is moved to an edge that is labelled .

3.3 Minimal graphoids

In this section, the notion of minimal -graphoid is defined. This graphoid is obtained collapsing the nodes that have identical labels (apart from the identifier) in the original graphoid. Let be a -graphoid. If the nodes have identical labels, apart from the identifier, denoted , then these nodes are identified, such that only the one with the smallest identifier is preserved, while the others are deleted. So, if the -values of the nodes pairwise satisfy the -relationship, and has the smallest identifier among them, then the nodes are replaced by and then deleted. The expression , for , indicates that represents the nodes in the minimal graph. All edges leaving from or arriving at the nodes are redirected to . For this purpose, the function is defined on subsets of the node set : if , then . Now, the notion of minimal graphoid is defined more formally.

Definition 5 (Minimal graphoid)

Let and be the same as in Definition 3. Let be a -graphoid. The minimal graphoid of is the -graphoid , defined as follows:

  • is the set ;

  • is a function from to , defined as , for each in ;

  • is a function on defined as , for each in ;

  • is a subbag of the set , defined as follows: for each hyperedge in , then a new hyperedge is in ;

  • is a function from to , defined as , for each in ;

  • is a function on and it is defined as , for each in .

Remark 4

The set of nodes of is contracted to the set , therefore each node in has the smallest identifier among all nodes that are mapped to by the -function. For edges, is defined as the bag , which means that for each hyperedge in , there is a corresponding hyperedge in . This means that the cardinalities of the bags and are the same.    

Proposition 1 immediately follows from Definition 5.

Proposition 1

For any -graphoid , its minimal -graphoid always exists and it is unique.    

Example 5

The two -graphoids shown in Figures 4 and 5 in Example 4, correspond to the graph of Figure 3. The graphoid of Figure 5 is the minimal graphoid of Figure 4. In this example, the original nodes and are contracted into one node, namely the node (since it has the smallest identifier of the two). Similarly, the original nodes and are contracted into the node . The original node remains unchanged. Between nodes and , there are two edges (with the same label) in the original graph. They are copied in the minimal graph. The edges between nodes and , and and , respectively, become two edges between the nodes and in the minimal graph. The two hyperedges that involve nodes , and correspond to two hyperedges between the nodes and in the minimal graph.    

For any -graphoid , the result of the minimisation described in this section is denoted , and called the minimisation of .

Remark 5

It is easy to see that the minimal -graphoid of a -graphoid can be computed, in the worst case, in time that is quadratic in the number of nodes and linear in the number of hyperedges. This can be improved, for instance, with an early pruning of the nodes that will not be contracted. Addressing this issue is beyond the scope of this paper.
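The minimisation described in this section can be sketched in a few lines of Python. The sketch makes several assumptions that are not in the paper: node labels are given without the identifier and include the node type, hyperedges are stored in a `Counter` keyed by (sources, targets, label), and nodes are grouped by hashing their labels in a dictionary instead of comparing them pairwise. The function name `minimise` and all data are hypothetical.

```python
from collections import Counter


def minimise(node_labels, edges):
    """Collapse nodes whose labels agree apart from the identifier.

    node_labels: dict node_id -> tuple of attribute values (identifier excluded)
    edges: Counter mapping (sources, targets, label) -> multiplicity,
           where sources and targets are frozensets of node ids
    """
    # The representative of each group is the smallest identifier with that label.
    smallest = {}
    for node_id in sorted(node_labels):
        smallest.setdefault(node_labels[node_id], node_id)
    rep = {node_id: smallest[label] for node_id, label in node_labels.items()}

    new_labels = {r: node_labels[r] for r in set(rep.values())}
    new_edges = Counter()
    for (sources, targets, label), count in edges.items():
        redirected = (
            frozenset(rep[n] for n in sources),
            frozenset(rep[n] for n in targets),
            label,
        )
        new_edges[redirected] += count  # multiplicities are preserved
    return new_labels, new_edges, rep


# Made-up graphoid at the Operator level; "Claro" is a hypothetical operator.
labels = {
    1: ("#Phone", "Movistar"),
    2: ("#Phone", "Vodafone"),
    3: ("#Phone", "Claro"),
    4: ("#Phone", "Vodafone"),
    5: ("#Phone", "Claro"),
}
edges = Counter({
    (frozenset({1}), frozenset({2}), ("d1", 10)): 2,
    (frozenset({1}), frozenset({4}), ("d1", 3)): 1,
    (frozenset({2}), frozenset({3, 5}), ("d1", 7)): 1,
})
new_labels, new_edges, rep = minimise(labels, edges)
```

In this example, nodes 4 and 5 are contracted into nodes 2 and 3, respectively, and the total number of hyperedges (counted with multiplicity) is unchanged, as Remark 4 states.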

4 OLAP Operations on Graphs

In this section, the operations that compose the graph-OLAP language over graphoids are defined. Section 5 will show that these operations can simulate the typical OLAP operations on cubes.

4.1 Climb

The Climb-operation, intuitively, allows graphs to be defined at different levels of granularity, based on the background dimensions.

Definition 6 (Climb)

Assume a -graphoid is given as follows: . Let be a dimension that appears in , and and be levels in the schema of this dimension, such that . Also, let be the corresponding rollup function (at the instance level). Finally, let be a node type that appears in , and be an edge type that appears in .

The node-climb-operation of along the dimension from level to level in all nodes of type , denoted , replaces all attribute values from by the value from , in all nodes of of type , leaving unaltered otherwise.

The edge-climb-operation of along the dimension from level to level in all hyperedges of type , denoted , replaces all attribute values from by the value from , in all edges of of type , leaving unaltered otherwise.    

Example 6

Applying to the graphoid depicted in Figure 3 the operation , results in the graphoid shown in Figure 4.

Remark 6

If a dimension appears in multiple node types and edge types, to apply the Climb-operation on many of them, the shorthand expression can be used. Finally, denotes a climbing, in the dimension , from level to level in all possible node and edge types.    
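A node-climb can be sketched as a pure function over node labels. This is a hedged illustration only: the roll-up function is passed as a plain dictionary, the attribute to climb is identified by its position in the label, and the function name `node_climb`, the type string `#Phone`, and the phone values are hypothetical.

```python
def node_climb(node_labels, node_types, ntype, position, rollup):
    """Replace the attribute at `position` by its roll-up value in all nodes
    of type `ntype`; nodes of other types are left unaltered."""
    climbed = {}
    for node_id, label in node_labels.items():
        if node_types[node_id] == ntype:
            value = rollup[label[position]]
            label = label[:position] + (value,) + label[position + 1:]
        climbed[node_id] = label
    return climbed


# Made-up data: labels are (identifier, phone number).
node_labels = {1: (1, "ph1"), 2: (2, "ph2"), 3: (3, "ph3")}
node_types = {1: "#Phone", 2: "#Phone", 3: "#Phone"}
phone_to_operator = {"ph1": "Movistar", "ph2": "Vodafone", "ph3": "Vodafone"}

climbed = node_climb(node_labels, node_types, "#Phone", 1, phone_to_operator)
```

The identifiers are untouched, so nodes 2 and 3 now carry the same operator value but remain distinct nodes; collapsing them is the job of the minimisation of Section 3.3.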

4.2 Grouping

The Group-operation, both on nodes and on edges, is defined in this section.

Definition 7 (Grouping)

Assume a -graphoid is given as follows: . Let be a dimension that appears in and let and be levels in the schema of this dimension, such that . Let be the corresponding rollup function. Let be a node type that appears in and let be an edge type that appears in .

The node-grouping of along the dimension from level to level in all nodes of type , denoted , is defined as .

The edge-grouping of along the dimension from level to level in all hyperedges of type , denoted , is defined as .    

Example 7

Applying to the graphoid depicted in Figure 4 the operation , results in the graphoid, depicted in Figure 5.    

4.3 Aggregate

In this section, the Aggr-operation on measures stored in edges is defined.

Definition 8 (Aggregate)

Given a minimal -graphoid defined as , let be a dimension that appears in the hyperedges of of type that plays the role of a measure, to which the aggregate function can be applied. The aggregation of the graphoid over the dimension (using the function ), denoted , results in a graphoid over the same and as , with the following modified hyperedge bag . If the hyperedges are all of type and all of type (and if they are the only ones), and if agrees on all of them apart from a possible identifier-attribute, and apart from the dimension , then the hyperedges are replaced by one of them (say ) of the same type and with the same attribute values, apart from the identifier, which is the identifier of , and the value of the attribute , which becomes the value of the aggregation function applied to the values of the attribute of the edges .    

Example 8

Applying the operation to the graphoid , depicted in Figure 5, results in a graphoid where the two edges that connect the nodes and are replaced by one edge with label , which contains, in the measure attribute, the sum of the two durations.    

Remark 7

To aggregate multiple dimensions , using the aggregate functions simultaneously, the notation would be: . Also, for simplicity, only the typical SQL aggregate functions Sum, Max, Min and Count are considered.
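The aggregation of Definition 8 can be sketched as a group-by over hyperedges. The sketch assumes that edges carry no identifier attribute, that the measure is identified by its position in the label, and that the aggregate function is any Python callable over a list of values (`sum`, `max`, `min`, or `len` for Count). The function name `aggregate` and the data are hypothetical.

```python
from collections import defaultdict


def aggregate(edges, measure_pos, func):
    """Merge hyperedges that agree on sources, targets, type and on every
    attribute except the measure at `measure_pos`.

    edges: list of (sources, targets, etype, label), with frozenset endpoints
           and tuple labels
    """
    groups = defaultdict(list)
    for sources, targets, etype, label in edges:
        rest = label[:measure_pos] + label[measure_pos + 1:]
        groups[(sources, targets, etype, rest)].append(label[measure_pos])

    result = []
    for (sources, targets, etype, rest), values in groups.items():
        # Put the aggregated measure back at its original position.
        label = rest[:measure_pos] + (func(values),) + rest[measure_pos:]
        result.append((sources, targets, etype, label))
    return result


# Made-up edges with labels (date, duration).
calls = [
    (frozenset({1}), frozenset({2}), "#Call", ("2019-09-03", 10)),
    (frozenset({1}), frozenset({2}), "#Call", ("2019-09-03", 15)),
    (frozenset({1}), frozenset({2, 3}), "#Call", ("2019-09-03", 5)),
]
summed = aggregate(calls, 1, sum)
counted = aggregate(calls, 1, len)
```

The two point-to-point calls on the same date are merged into one hyperedge whose duration is the sum of the originals, while the group call is kept as a separate hyperedge because its target set differs.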

Remark 8

Although the operations Climb, Group, and Aggr are not present in classic relational OLAP, they are included here for two reasons: first, they can be useful when operating on graphs in practice; second, they simplify the definition of the Roll-Up operation, which could otherwise be unnecessarily difficult to express.

4.4 Roll-Up

The operations defined above allow defining the Roll-Up-operation over dimensions and measures stored in edges, as explained next.

Definition 9 (Roll-Up)

Assume a -graphoid is given as follows: . Let be a dimension that appears in some nodes and/or hyperedges of , that plays the role of a climbing dimension. Let be dimensions that appear in the hyperedges of type of . These dimensions play the role of measure dimensions, and it is assumed that aggregate functions are associated with them. Let be node types appearing in , and let be hyperedge types appearing in . The roll-up of over the dimensions (using the functions ) in hyperedges of type , and over the climbing dimension from level to level in nodes of types and edges of types , denoted

is defined as

Example 9

Applying to the graphoid depicted in Figure 5 the operation , results in the graphoid of Figure 7. The minimisation step in the above implementation of the roll-up operation does nothing, in this case, since the operation is applied to a minimal graphoid.    

Figure 7: The result of the operation applied to the graphoid of Figure 5.

Remark 9

To apply the climbing in the roll-up operation to the nodes and edges of all possible types, the shorthand “” is used as follows: . To aggregate over all edge types, the notation is    
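One way to picture a roll-up is as a pipeline that climbs a dimension, merges nodes that become indistinguishable, and then aggregates parallel edges. The sketch below follows that reading; it is not a transcription of the formal definition. The function name `roll_up`, the dictionary encoding, the attribute name `Phone`, the operator names and the durations are all hypothetical.

```python
from collections import defaultdict

def roll_up(nodes, edges, dim, to_parent, measure, agg=sum):
    """nodes: {node_id: {"type": str, dim: value}}
    edges: list of (src_ids, tgt_ids, edge_type, {measure: number})
    to_parent: rollup map from the current level of `dim` to the target level."""
    # Climb: replace each value of `dim` by its ancestor at the target level.
    climbed = {n: {**a, dim: to_parent[a[dim]]} if dim in a else dict(a)
               for n, a in nodes.items()}
    # Merge: nodes with identical type and attributes collapse into one.
    rep, new_id = {}, {}
    for n, a in climbed.items():
        new_id[n] = rep.setdefault(tuple(sorted(a.items())), n)
    merged_nodes = {r: climbed[r] for r in rep.values()}
    # Aggregate: edges of the same type between the same merged nodes.
    values = defaultdict(list)
    for src, tgt, etype, attrs in edges:
        key = (frozenset(new_id[s] for s in src),
               frozenset(new_id[t] for t in tgt), etype)
        values[key].append(attrs[measure])
    merged_edges = [(s, t, ty, {measure: agg(v)}) for (s, t, ty), v in values.items()]
    return merged_nodes, merged_edges

# Made-up data: three phones, two of which belong to the same operator.
nodes = {1: {"type": "Phone", "Phone": "p1"},
         2: {"type": "Phone", "Phone": "p2"},
         3: {"type": "Phone", "Phone": "p3"}}
edges = [({1}, {3}, "Call", {"Duration": 4}),
         ({2}, {3}, "Call", {"Duration": 6}),
         ({3}, {1}, "Call", {"Duration": 1})]
operator_of = {"p1": "OpA", "p2": "OpA", "p3": "OpB"}
rolled_nodes, rolled_edges = roll_up(nodes, edges, "Phone", operator_of, "Duration")
```

With these inputs, nodes 1 and 2 merge into a single node for `OpA`, and the two calls towards node 3 become one edge carrying the summed duration.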

4.5 Drill-Down

The Drill-Down-operation does the opposite of Roll-Up, taking a graphoid to a finer granularity level along a dimension , called a descending dimension, and also operating over a collection of measures, using the same aggregate functions associated with such measures. (Strictly speaking, this is true for a sequence of roll-up and drill-down operations such that there are no slicing or dicing operations, explained in Sections 4.6 and 4.7, in between. However, for the sake of simplicity, and without loss of generality, in this paper it is assumed that roll-up and drill-down are the inverse of each other.) Note also that descending from a level down to a level along a dimension is equivalent to climbing from the bottom level of , , to the level along . Thus, the drill-down of over the dimensions (using the functions ) in hyperedges of type , and over the descending dimension from level to level in nodes of types and edges of types , denoted

is defined as

Given the above, in what follows the discussion is limited to the Roll-Up-operation.
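The equivalence between descending to a level and climbing to that level from the bottom of the hierarchy can be sketched as follows. The level names, the date members and the helper names `climb_from_bottom` and `drill_down` are assumptions chosen for illustration.

```python
def climb_from_bottom(value, levels, parent, target):
    """Map a bottom-level member to its ancestor at level `target`.
    levels: level names ordered from bottom to top.
    parent: {level: {member: member_at_next_level}}."""
    for level in levels[:levels.index(target)]:
        value = parent[level][value]
    return value

def drill_down(base_values, levels, parent, target):
    # The current (coarser) level is deliberately not a parameter: the result
    # depends only on the bottom-level data and on the target level.
    return [climb_from_bottom(v, levels, parent, target) for v in base_values]

# Hypothetical Time hierarchy: Day -> Month -> Year.
levels = ["Day", "Month", "Year"]
parent = {
    "Day": {"2014-01-01": "2014-01", "2014-02-03": "2014-02"},
    "Month": {"2014-01": "2014", "2014-02": "2014"},
}
days = ["2014-01-01", "2014-02-03"]
months = drill_down(days, levels, parent, "Month")
```

Under this reading, drilling down from Year to Month yields the same members as rolling up from Day to Month, which is why the remainder of the discussion can be restricted to roll-up.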

4.6 Dice

The Dice-operation over a graphoid produces a subgraphoid that satisfies a Boolean condition over the available dimension levels. A “strong” version is also defined, called the s-Dice-operation. In this context, is a Boolean combination of atomic conditions of the form , , and , where is a dimension, is a level in that dimension, and . The expression can be written in disjunctive normal form as

where all are atomic conditions.

Before giving the definition of the Dice-operation, it must be explained what it means for a hyperedge in a graphoid to satisfy , denoted . For this, interpreting conjunction and disjunction in the usual way, it suffices to define for the atomic formulas that appear in . Thus, cannot be evaluated in if the label of does not contain information on dimension at level . Otherwise, can be evaluated in . Let be , or ; is not false in if it can be evaluated in and is true, or if it cannot be evaluated in . The notion of being not false in a node adjacent to (that is, ) is defined analogously. Finally, if is not false in and not false in all .

Definition 10 (Dice)

Assume a -graphoid is given as . Let be a Boolean combination of equality and inequality constraints that involve, on the one hand, dimension levels (equal or higher than in the dimension schemas , respectively), and on the other hand, constants from . The dice over on the condition , denoted produces a subgraphoid of , whose nodes are the nodes of and whose edges satisfy the conditions expressed by . When a hyperedge does not satisfy , the whole hyperedge is deleted from the graph and thus does not belong to . All other edges of belong to . If two edges in have the same set of adjacent nodes and one of them is deleted from in , then both of them are deleted in to obtain the strong dice over on the condition , denoted
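The "not false" semantics and the difference between the two dice variants can be sketched as follows. The condition is assumed to be given in disjunctive normal form as a list of clauses, each a list of atoms `(attribute, operator, constant)`; the three comparison operators, the dictionary encoding and all sample values are assumptions of this sketch.

```python
import operator

# Assumed atomic comparisons between a dimension level and a constant.
OPS = {"=": operator.eq, "<": operator.lt, ">": operator.gt}

def not_false(label, atom):
    """An atom is not false in a label if it cannot be evaluated there
    (the attribute is absent) or if it evaluates to true."""
    key, op, const = atom
    return key not in label or OPS[op](label[key], const)

def satisfies(edge, nodes, dnf):
    # The edge label and the labels of all adjacent nodes must not falsify
    # every atom of at least one clause.
    labels = [edge["attrs"]] + [nodes[n] for n in edge["src"] | edge["tgt"]]
    return any(all(not_false(l, atom) for l in labels for atom in clause)
               for clause in dnf)

def dice(nodes, edges, dnf):
    return [e for e in edges if satisfies(e, nodes, dnf)]

def s_dice(nodes, edges, dnf):
    adjacent = lambda e: frozenset(e["src"] | e["tgt"])
    # Node sets of the edges that plain dice would delete.
    hit = {adjacent(e) for e in edges if not satisfies(e, nodes, dnf)}
    return [e for e in edges if adjacent(e) not in hit]

# Made-up data: two parallel edges between nodes 1 and 2, one edge from 2 to 3.
nodes = {1: {"Operator": "A"}, 2: {"Operator": "B"}, 3: {"Operator": "B"}}
edges = [
    {"src": {1}, "tgt": {2}, "attrs": {"Duration": 3}},
    {"src": {1}, "tgt": {2}, "attrs": {"Duration": 10}},
    {"src": {2}, "tgt": {3}, "attrs": {"Duration": 7}},
]
long_calls = [[("Duration", ">", 5)]]
```

For the condition `long_calls`, `dice` drops only the edge with duration 3, whereas `s_dice` also drops the parallel edge with duration 10 because it shares its adjacent nodes with a deleted edge.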

4.7 Slice

Intuitively, the Slice operation eliminates the references to a dimension in a graphoid. The formal definition follows.

Definition 11 (Slice)

Assume a -graphoid is given as . Let be a dimension that appears in some nodes and/or hyperedges of . Let be dimensions that appear in the hyperedges of . These dimensions play the role of measure dimensions. It is assumed that aggregate functions are associated with them. The slice of the dimension from over the dimensions (using the functions ), denoted is defined as the roll-up operation up to the level over the dimensions (using the functions ). Formally, this slice operation is defined as    
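Since a slice is defined as a roll-up to the top of the sliced dimension, it can be sketched as replacing every value of that dimension by a single top member and then aggregating the measure. The sketch works on edge labels only; the marker `"all"` for the top level, the function name `slice_dimension` and every fact other than the first are assumptions made for illustration.

```python
from collections import defaultdict

def slice_dimension(edge_labels, dim_attr, measure, agg=sum, top="all"):
    """Roll `dim_attr` up to a single top member and aggregate `measure`
    over the labels that become identical."""
    groups = defaultdict(list)
    for attrs in edge_labels:
        climbed = {**attrs, dim_attr: top}
        key = tuple(sorted((k, v) for k, v in climbed.items() if k != measure))
        groups[key].append(climbed[measure])
    return [{**dict(key), measure: agg(vals)} for key, vals in groups.items()]

# The first fact follows the Sales example; the other two are made up.
facts = [
    {"ProductVal": "Lego", "LocationVal": "Antwerp", "TimeVal": "2014-01-01", "SalesVal": 10},
    {"ProductVal": "Lego", "LocationVal": "Antwerp", "TimeVal": "2014-01-02", "SalesVal": 5},
    {"ProductVal": "Lego", "LocationVal": "Brussels", "TimeVal": "2014-01-01", "SalesVal": 7},
]
sliced = slice_dimension(facts, "TimeVal", "SalesVal")
```

After slicing the time dimension, the two Antwerp facts collapse into one label with the summed sales amount, and the Brussels fact is kept with its time value replaced by the top member.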

4.8 Node-delete

The n-Delete-operation over a graphoid deletes all nodes of a certain type and removes the nodes of this type from the source and target sets of all edges. Again, although this operation is not present in classic OLAP, it is needed to simulate the classic OLAP slice operation, as will become clear in Section 5.2.

Definition 12 (Node-delete)

Assume a -graphoid is given as . Given a node type , the node-delete over operation, denoted produces a subgraphoid of , whose nodes of type are deleted, and such that all edges are replaced by edges , where and are and , respectively, minus the nodes of type . The edges remain of the same type and they keep the same label.    

Example 10

When a graphoid contains only nodes of one type, as in Figure 3, the result of the deletion of a node is, obviously, the empty graph. In the graphoid of Figure 9 (explained later), the result of would be a graph with nodes 2 and 3, where a hyperedge containing only these nodes would remain, with label .    
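Node deletion can be sketched in a few lines. The node type names, the function name `n_delete`, and the way the star-shaped hyperedge divides its nodes between source and target are assumptions of this sketch, not details taken from the figures.

```python
def n_delete(nodes, edges, node_type):
    """nodes: {node_id: type_name}; edges: dicts with "src" and "tgt" sets.
    Removes all nodes of `node_type` and strips them from every edge."""
    doomed = {n for n, t in nodes.items() if t == node_type}
    kept_nodes = {n: t for n, t in nodes.items() if n not in doomed}
    # Edges keep their type and label; only their node sets shrink.
    kept_edges = [{**e, "src": e["src"] - doomed, "tgt": e["tgt"] - doomed}
                  for e in edges]
    return kept_nodes, kept_edges

# A star-shaped hyperedge over three nodes of different (hypothetical) types.
nodes = {1: "Product", 2: "Location", 3: "Time"}
edges = [{"type": "Sales", "src": {1, 2, 3}, "tgt": set(), "attrs": {"SalesVal": 10}}]
remaining_nodes, remaining_edges = n_delete(nodes, edges, "Product")
```

Deleting the type of node 1 leaves nodes 2 and 3 connected by a hyperedge that still carries the original label.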

5 Classical OLAP Cubes as a Special Case of OLAP Graphs

This section explains how the classical cube-based OLAP model can be represented in the graphoid OLAP model. It is also shown that the classical OLAP-operations Roll-Up, Drill-Down, Slice and Dice can be simulated by the graphoid OLAP-operations defined in Section 4.

5.1 A Discussion on Modelling Cubes as Graphoids

Figure 8 illustrates a typical example of an OLAP cube with three dimensions: product, location and time. The cube represents sales amounts of products at certain store locations (cities) on certain dates (at the lowest level of granularity). There are several ways of representing this cube in the graphoid model. Figure 9 shows two ways of modelling the fact , which expresses that the sales of Lego in the Antwerp store on January 1st, 2014 amount to 10.

Figure 8: An example of a Sales data cube with one measure: .

Figure 9(a) shows nodes 1, 2 and 3, of types , and , respectively. All of them have only one attribute, to store the values , and ; call those attributes ProductVal, LocationVal and TimeVal, respectively. Further, those attributes are dimensions, with an appropriate dimension schema. The measure information is stored in the hyperedge with label , which has one attribute, namely SalesVal, to store the sale amount (10, in this case). Thus, in this approach, each cell of a data cube is modelled by a “star”-shaped hyperedge.

A more compact representation is shown in Figure 9(b). Here, there is only one node, of type in the graphoid, which represents the data cube. This node is labelled , and has no attribute values (apart from an identifier value). Cell-coordinates and cell-content are stored in hyperedges that form loops around the node. The fact is modelled by a unique hyperedge with label . Thus, cube facts are represented by a hyperedge of type that has four attributes: ProductVal, LocationVal, TimeVal and SalesVal.
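The two representations can be contrasted with a small sketch that encodes the same fact both ways. The attribute names ProductVal, LocationVal, TimeVal and SalesVal come from the text; the type names `Product`, `Location`, `Time`, `Sales`, `Cube` and `Cell`, the date string and the dictionary encoding are assumptions made for illustration.

```python
def star_encoding(product, location, time, sales):
    """One node per coordinate; the measure lives on a hyperedge joining them."""
    nodes = {
        1: {"type": "Product", "ProductVal": product},
        2: {"type": "Location", "LocationVal": location},
        3: {"type": "Time", "TimeVal": time},
    }
    edge = {"type": "Sales", "nodes": {1, 2, 3}, "attrs": {"SalesVal": sales}}
    return nodes, [edge]

def loop_encoding(facts):
    """A single cube node; each fact is a loop edge carrying four attributes."""
    nodes = {1: {"type": "Cube"}}
    edges = [{"type": "Cell", "nodes": {1},
              "attrs": {"ProductVal": p, "LocationVal": l,
                        "TimeVal": t, "SalesVal": s}}
             for p, l, t, s in facts]
    return nodes, edges

fact = ("Lego", "Antwerp", "2014-01-01", 10)
star_nodes, star_edges = star_encoding(*fact)
loop_nodes, loop_edges = loop_encoding([fact])
```

In the star encoding the coordinates are node attributes and can be shared across facts, while in the loop encoding every fact is fully described by the label of a single hyperedge.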

Figure 9: Star-representation of the fact