1 Introduction
Graphs have been widely used to represent real-world entities and relationships. Real-world graphs often contain spatiotemporal information generated by a wide range of hardware devices (e.g., sensors, POS machines, traffic cameras, barcode scanners) and software systems (e.g., web servers). We describe three representative application scenarios in the following.
Application 1: Customer Behavior Tracking and Mining. Understanding customer behaviors is helpful for detecting fraud and providing personalized services. For example, credit card companies track customers’ credit card uses for fraud detection. Internet companies track users’ browsing behaviors to achieve personalized recommendations. In a customer behavior tracking and mining application, people (customers) and locations can be modeled as graph vertices, while an edge linking a person vertex to a location vertex represents the event that the person visits the location at a certain time, as shown in Fig. 1. This forms a spatiotemporal graph. People visiting similar locations at similar timestamps often have similar personal interests. In other words, it is desirable to discover groups of people vertices that have similar edge structures in the spatiotemporal graph.
Application 2: Clone-Plate Car Detection. A clone-plate car displays a fake license plate that has the same license number as another car of the same make and model. In this way, the owner can avoid annual registration and insurance fees, and/or purchase the car without going through the license plate lottery (a measure to address traffic jam problems in major cities in China). However, it is difficult to detect a clone plate, since a query to the car registration database will return a valid result. A promising approach is to exploit the large number of traffic cameras on highways and local roads to detect clone plates. As a car passes by a traffic camera, the camera takes a photo and automatically recognizes the license number on the car plate. Car plates and cameras can be modeled as vertices in a spatiotemporal graph. An edge connecting a car plate vertex to a camera vertex indicates that the camera records the car plate at a certain time. Then, a clone plate is detected if a car plate vertex has two edges whose locations are far apart but whose timestamps are close, such that it is impossible for the car to cover the distance in such a short period of time.
Application 3: Shipment Tracking. Recent years have seen rapid growth of the shipping business. E-commerce sites, such as Amazon and Alibaba, have become increasingly popular. Customers place orders online and the ordered goods are delivered to their doorsteps by shipping companies. Each shipment package carries a barcode. Shipping companies track packages by scanning the barcodes on packages with barcode scanners in regional or local offices. The problem of shipment tracking can be modeled as a spatiotemporal graph. Packages and barcode scanners are represented as vertices. The event that a barcode scanner detects a package is recorded as an edge between the package vertex and the scanner vertex. In this way, the spatiotemporal graph can be used to track shipments and answer status-checking queries.
Challenges for Storing and Querying Spatio-Temporal Graphs. Based on the commonalities of the three applications, we define a formal spatiotemporal graph model, where a graph consists of location vertices (e.g., locations, cameras, barcode scanners), object vertices (e.g., people, car plates, packages), and event edges that connect them. Compared with static graphs, spatiotemporal graphs pose more significant challenges in data volume, data velocity, and query processing:

Data Volume: 10 billion object vertices, 10 million location vertices, and 100 trillion edges. First, the requirement of 10 billion object vertices is based on the fact that there are about 7.5 billion people in the world. Second, according to booking.com, there are about 1.9 million hotels and other accommodations in the world. Suppose there are 5 times more shops, restaurants, and theatres than hotels. Then the total number of locations can be on the order of 10 million. Finally, suppose an object vertex sees about 10 new edges per day. If we store the edges generated in the recent 3 years in the graph, there can be about 100 trillion edges in the graph. The data volume of a spatiotemporal graph can be much larger than that of static graphs.

Data Velocity: up to 1 trillion new edges per day. Compared with static graphs, spatiotemporal graphs must support a large number of new edges per day. Suppose there are an average of 10 new edges per object vertex daily, and peak cases (e.g., Black Friday shopping) can see ten times more activities. This leads to a peak velocity of 100 new edges per object vertex daily, i.e., 1 trillion new edges in total per day.

Query Processing: prohibitive communication cost. The data volume and velocity challenges entail a distributed solution that stores the graph data on a number of machines. However, without careful designs, processing queries can easily lead to significant cross-machine communication. For example, in order to find subgroups of people with similar interests (i.e., visiting similar locations at similar times), it is necessary to combine the spatial information in location vertices, the temporal information in edges, and the properties of person vertices in the computation. As the data volume is huge, the cross-machine communication cost can be prohibitively high.
However, existing graph partitioning solutions [28, 4, 37, 44] focus mostly on static graphs, trying to minimize the number of cut edges across partitions and balance the partition sizes. Unfortunately, these solutions do not take into account spatiotemporal characteristics, which are important for query processing in spatiotemporal graphs. On the other hand, storing and querying spatiotemporal graphs using state-of-the-art distributed graph database systems (e.g., JanusGraph [21]), MPP relational database systems (e.g., Greenplum [16]), big data analytics systems (e.g., Spark [39]), or Hadoop enhanced for spatiotemporal data (i.e., ST-Hadoop [3]) results in poor performance (cf. Section 8).
Our Solution: PAST. In this paper, we propose and evaluate PAST, a framework for efficient PArtitioning and query processing of Spatio-Temporal graphs. We propose diversified partitioning for location vertices, object vertices, and edges. We exploit the multiple replicas of edges to design spatiotemporal partitions and key-temporal partitions. Then we devise a high-throughput edge ingestion algorithm and optimize the processing of spatiotemporal graph queries. Experimental results show that PAST successfully addresses the above challenges. It improves query performance by orders of magnitude compared to state-of-the-art solutions, including JanusGraph, Greenplum, Spark, and ST-Hadoop.
Contributions. The contributions of our work are threefold:

We define a formal model for spatiotemporal graphs, and examine the design goals and challenges for storing and querying spatiotemporal graphs.

We propose PAST, a framework for efficient PArtitioning and query processing of Spatio-Temporal graphs. It consists of a number of interesting features: (i) diversified partitioning for different vertex types and different edge replicas; (ii) graph storage with compression to reduce storage space consumption; (iii) a high-throughput graph ingestion algorithm to meet the challenge of data velocity; (iv) efficient query processing that leverages the different graph partitions; and (v) a cost model to choose the best graph partitions for query evaluation.

We compare the efficiency of PAST with state-of-the-art solutions, including JanusGraph, Greenplum, Spark, and ST-Hadoop, in our experiments. We design a benchmark based on the query workloads of the representative applications. Experimental results show that PAST achieves orders of magnitude better performance than the state-of-the-art solutions.
Paper Organization. The remainder of the paper is organized as follows. Section 2 reviews related literature. Section 3 presents a formal definition of spatiotemporal graphs. Section 4 overviews the system architecture of PAST. Section 5 elaborates the partitioning and storage scheme for spatiotemporal graphs. Section 6 describes the high-throughput edge ingestion support in PAST. Section 7 presents optimizations on spatiotemporal query processing. Section 8 presents experimental results. Section 9 discusses several interesting issues. Finally, Section 10 concludes the paper.
2 Related Work
We consider four solutions in the context of spatiotemporal graphs in Section 2.1, and discuss more related work in Section 2.2.
2.1 Supporting Spatio-Temporal Graphs with Existing Solutions
Distributed Graph Database Systems. Graph database systems (e.g., JanusGraph [21], Titan [43], Neo4j [30], SQLGraph [41], ZipG [24]) often support the property graph data model. We consider distributed graph database systems (e.g., JanusGraph [21], Titan [43]) for supporting spatiotemporal graphs. They store vertices and edges in key-value stores (e.g., Cassandra [8], HBase [18]) and exploit search platforms (e.g., Elasticsearch [12], Solr [38]) as indices for selective data access. Simple graph traversal queries can be handled efficiently. However, they are inefficient for large-scale spatiotemporal graphs because (i) they do not support direct filtering on time or spatial ranges, which are frequently used in spatiotemporal queries, and (ii) for complex queries they need to scan the entire graph and then invoke big data analytics systems (e.g., Spark [39]), which incurs huge I/O overhead.
MPP Relational Database Systems. We can store graph vertices and edges as relational tables in MPP database systems (e.g., Greenplum [16]), and use SQL for querying spatiotemporal graphs. Greenplum supports partitioning on multiple dimensions in sequence: the data is first partitioned by the first partition dimension; then the second partition dimension is applied to each first-level partition to obtain a set of second-level partitions, and so on. The partitions in all dimensions form a tree. MPP database systems can be inefficient for spatiotemporal graphs because (i) there is no support for spatial partitions, and (ii) any query has to start at the root level and follow the tree even if the query does not have filtering predicates on the first partition dimension, potentially incurring significant CPU, disk I/O, and communication overhead.
Big-Data Analytics Systems. General-purpose big-data analytics systems (e.g., Hadoop [17], Spark [39]) support large-scale computation on data stored in underlying storage systems, such as distributed file systems (e.g., HDFS) and distributed key-value stores (e.g., Cassandra [8]). We can store spatiotemporal graphs in the underlying storage systems, and run computation jobs on big-data analytics systems for querying them. However, such a system needs to load the entire graph before processing, which makes it unsuitable for simple graph traversal queries. Moreover, since there is no support for filtering on time or spatial ranges over the underlying graph data, complex queries can incur large unnecessary overhead due to reading the entire graph.
ST-Hadoop. ST-Hadoop [3] is an extension of Hadoop [17] and SpatialHadoop [13]. It represents previous studies that exploit multidimensional indices for supporting spatiotemporal data [31, 6, 34, 1, 26], and provides a scalable solution when the data volume is large. ST-Hadoop organizes data into two-level indices. The first level is based on the temporal dimension, while the second level builds spatial indices. In this way, ST-Hadoop reduces the data accessed for queries with temporal and spatial range predicates, thereby achieving better performance than Hadoop and SpatialHadoop. However, ST-Hadoop has several disadvantages for supporting spatiotemporal graphs. First, ST-Hadoop sacrifices storage for query performance: it replicates its two-level indices into multiple layers with different temporal granularities (e.g., day, month, year), and in each layer the whole data set is replicated and partitioned. Second, ST-Hadoop is inefficient in supporting streaming data ingestion. It needs to sample across all data to estimate the data distribution and compute temporal and spatial boundaries. This basically requires temporarily storing incoming data and then periodically shuffling the data to build the indices. Depending on the temporal granularity, this may incur large temporary storage overhead and huge bursts of computation. Third, there is no support for indexing vertex IDs. Consequently, simple graph traversal queries may incur significant I/O overhead for reading a large amount of data. Finally, ST-Hadoop is based on the MapReduce framework, where intermediate results are written to disks, potentially incurring significant disk I/O overhead.
2.2 Other Related Work
There are a large number of static graph partitioning algorithms in the literature, such as METIS [22], Chaco [4, 19], Scotch [32], PMRSB [5], ParMetis [23], and PT-Scotch [9]. Several recent studies introduce lightweight algorithms for partitioning large-scale dynamic graphs [20, 45, 40, 29]. However, none of these methods leverages the spatiotemporal characteristics of the data, which are important for query processing in spatiotemporal graphs.
Previous bipartite graph partitioning algorithms [10, 14] focus on simultaneous clustering with spectral co-clustering, computing eigenvalues and eigenvectors of Laplacian matrices. However, spatiotemporal graphs often have billions of vertices and trillions of edges, causing huge matrix storage and computation overhead.
Time series databases (e.g., TimescaleDB [42], LittleTable [36]) enhance relational databases to support time-series data by partitioning rows by timestamp. In addition, every partition can be further sorted/partitioned by a specified key. However, there is no efficient support for spatial range predicates, which are important for spatiotemporal graphs.
Spatiotemporal graphs have been employed for video processing [25]. A video consists of a series of frames. Here, a graph vertex corresponds to a segmented region in a video frame. A spatial edge connects two adjacent regions in a frame, while a temporal edge links two corresponding regions in two consecutive frames. In this paper, we define a spatiotemporal graph model based on the representative applications. It is a bipartite graph, which is quite different from the model in the video processing context.
In our previous work, we proposed LogKV [7], a high-throughput, scalable, and reliable event log management system. The ingestion algorithm of PAST is an extension of that of LogKV. However, LogKV focuses mainly on temporal events; there is no support for spatial range predicates. Therefore, LogKV cannot efficiently process the spatiotemporal query workload considered in this paper. Moreover, a preliminary four-page version of this paper overviews the high-level ideas and shows preliminary experimental results [11].
3 Problem Formulation
In this section, we present the formal definition of spatiotemporal graphs and the query workload, and then examine the design challenges.
3.1 Spatiotemporal Graph
Based on the representative applications in Section 1, we define spatiotemporal graphs as follows:
Definition 1 (Spatiotemporal Graph)
A spatiotemporal graph G = (L, O, E) consists of three parts. L is a finite set of location vertices. Every location vertex contains a location property. O is a finite set of object vertices that represent objects being tracked. Every vertex in L ∪ O is assigned a globally unique vertex ID. E is a set of undirected edges. Every edge in E connects an object vertex to a location vertex, and contains a time property.
Examples of location vertices include locations that customers visit in customer behavior tracking and mining, traffic cameras in clone-plate car detection, and barcode scanners in shipment tracking. Examples of object vertices include people in customer behavior tracking and mining, car plates in clone-plate car detection, and packages in shipment tracking. Every location vertex contains a location property such that, given two location vertices l1 and l2, their distance dist(l1, l2) is well defined. For example, if the application is concerned with geographic locations, then the location property consists of the latitude and longitude of the location vertex. An edge contains an object ID, a timestamp, a location ID, and other application-dependent properties.
In essence, a spatiotemporal graph as defined in Definition 1 is an undirected bipartite graph. We do not consider edges between object vertices and edges between location vertices. This abstraction captures the key characteristics and the main challenges of the three representative applications.
3.2 Query Workload
We consider the following four types of queries based on the representative applications:

Q1: object trace. Given an object and a time range, find the list of (object, timestamp, location) triples that represent the locations visited by the object during the time range. For example, Q1 can display the trace of a shipment package or the activities of a customer in a specified period of time.
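As a concrete illustration, Q1 reduces to a filter-and-sort over edge triples. The sketch below assumes edges are available as in-memory (object ID, timestamp, location ID) tuples; in PAST the edges would instead be fetched from the relevant key-temporal subpartitions.

```python
def object_trace(edges, obj_id, t_begin, t_end):
    """Q1: return the (object, timestamp, location) triples of one object
    within [t_begin, t_end], sorted by timestamp.
    edges: iterable of (object_id, timestamp, location_id) triples."""
    hits = [(o, t, l) for (o, t, l) in edges
            if o == obj_id and t_begin <= t <= t_end]
    return sorted(hits, key=lambda e: e[1])
```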

Q2: trace similarity. Given two objects a and b and a time range, compute the similarity of the two object traces during the time range. Consider edge (a, ta, la) in object a’s trace and edge (b, tb, lb) in object b’s trace. The two edges are considered similar if |ta − tb| ≤ δt and dist(la, lb) ≤ δd, where δt and δd are predefined thresholds on time and location distance, respectively. The similarity of the two traces is the count of similar edge pairs in the two traces.
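The edge-pair counting in Q2 can be sketched as follows. The threshold values, the (timestamp, location) trace representation, and the use of Euclidean distance on planar coordinates are illustrative assumptions, not part of the definition above.

```python
import math

# Assumed threshold values for illustration.
DELTA_T = 300.0   # time threshold (e.g., seconds)
DELTA_D = 1000.0  # distance threshold (e.g., meters)

def dist(l1, l2):
    """Euclidean distance between two (x, y) locations."""
    return math.hypot(l1[0] - l2[0], l1[1] - l2[1])

def trace_similarity(trace_a, trace_b):
    """Count similar edge pairs between two traces.
    Each trace is a list of (timestamp, location) tuples."""
    count = 0
    for (ta, la) in trace_a:
        for (tb, lb) in trace_b:
            if abs(ta - tb) <= DELTA_T and dist(la, lb) <= DELTA_D:
                count += 1
    return count
```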

Q3: similar object discovery. Given an object o and a time range, list the objects that have traces similar to o’s trace during the time range, in descending order of trace similarity with o. Q3 can be used to discover people with similar interests in the customer behavior tracking and mining application.

Q4: clone object detection. Given a time range, discover all the clone objects. An object o is a clone object if there exist two incident edges (o, t1, l1) and (o, t2, l2) such that the implied velocity is beyond a predefined threshold: dist(l1, l2)/|t1 − t2| > Vmax. Q4 supports clone-plate car detection. It can also be used by a credit card company to detect duplicate credit cards in the customer behavior tracking and mining application.
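The Q4 check for a single object can be sketched as below. The threshold value and trace representation are illustrative assumptions; one useful observation is that, by the triangle inequality, if every consecutive pair of a time-sorted trace respects Vmax then so does every pair, so checking adjacent edges suffices.

```python
import math

V_MAX = 33.0  # assumed speed threshold (e.g., meters/second)

def dist(l1, l2):
    """Euclidean distance between two (x, y) locations."""
    return math.hypot(l1[0] - l2[0], l1[1] - l2[1])

def is_clone(trace):
    """Q4 check for one object.
    trace: list of (timestamp, location) edges incident to the object.
    Checking consecutive edges of the time-sorted trace is sufficient:
    the triangle inequality bounds the speed of any non-adjacent pair."""
    trace = sorted(trace, key=lambda e: e[0])
    for (t1, l1), (t2, l2) in zip(trace, trace[1:]):
        d = dist(l1, l2)
        if t2 == t1:
            if d > 0:  # seen at two distant places at the same instant
                return True
        elif d / (t2 - t1) > V_MAX:
            return True
    return False
```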
3.3 Understanding the Challenges
Data Volume. The goal is to support 10 billion object vertices, 10 million location vertices, and 100 trillion edges. Suppose the properties of a vertex require at most 100B. Then the object vertices and the location vertices require about 1TB and 1GB of space, respectively. An edge contains at least (object ID, timestamp, location ID). Suppose each field takes 8B. Then an edge takes at least 24B, and 100 trillion edges require 2.4PB of space. The 3-replica redundancy policy requires a total of 7.2PB of storage space. Suppose the disk capacity of a machine is about 10TB. Then 100 trillion edges require on the order of 1,000 machines to store.
Data Velocity. The goal is to support up to 1 trillion new edges per day. This means 1T × 24B = 24TB/day of new ingestion data. A day consists of 86,400 seconds. Thus, this requires the design to support 290MB/s ingestion throughput.
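The arithmetic behind these two estimates can be checked directly. The sketch below redoes the calculation; the unit conventions (decimal prefixes for storage, binary prefixes for throughput) are our assumption, since the text's figures are consistent with mixing the two.

```python
# Back-of-envelope check of the data volume and velocity estimates.
EDGE_BYTES = 24                        # (object ID, timestamp, location ID), 8B each

# Volume: 100 trillion edges, 3 replicas, 10TB disks per machine.
raw_bytes = 100 * 10**12 * EDGE_BYTES  # 2.4 PB before replication
replicated = 3 * raw_bytes             # 7.2 PB with 3-replica redundancy
machines = replicated / (10 * 10**12)  # ~720, i.e., on the order of 1,000

# Velocity: 1 trillion new edges per day over 86,400 seconds.
daily_bytes = 10**12 * EDGE_BYTES          # 24 TB/day
mb_per_s = daily_bytes / 86400 / 10**6     # ~278 MB/s with decimal prefixes
mib_per_s = 24 * 2**40 / 86400 / 2**20     # ~291 MiB/s with binary prefixes,
                                           # matching the ~290MB/s figure above
```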
Query Processing. We would like to minimize cross-machine communication in query processing. A random partition scheme would work poorly for Q1–Q4 because a large amount of unrelated data would need to be read. Since the four queries filter on time ranges, object IDs, and spatial locations, it is desirable to organize the data for efficient access along these dimensions.
4 PAST Overview
We propose PAST, a framework for efficient PArtitioning and query processing of Spatio-Temporal graphs. The system architecture of PAST is shown in Fig. 2. PAST consists of multiple machines connected through a data center network. We assume that the latencies and bandwidths of the data center network are much better than those in wide area networks. There is a coordinator machine and a large number of (e.g., 1,000) worker machines. The coordinator keeps track of the meta information of the graph partitions and coordinates data ingestion. The workers store the graph data, handle incoming updates, and process queries.
Graph Partitioning and Storage. The GraphStore component in Fig. 2 implements PAST’s graph partitioning methods and supports compressed storage of the main graph data. GraphStore utilizes an underlying DB/storage system (e.g., the Cassandra key-value store in our implementation).
We propose diversified partitioning for location vertices and object vertices because they have drastically different characteristics. The number of location vertices is 1/1000 of that of object vertices. As the spatiotemporal graph is a bipartite graph, the average degree of a location vertex is 1,000 times that of an object vertex. As a result, the two vertex types have very different impacts on the communication patterns in query processing.
Edges consume much more space than vertices, and are often the performance-critical factor in query processing. All four queries filter edges with a given time range. Therefore, GraphStore should support time range filters efficiently. Q1, Q2, and Q4 access edges for a given object, two objects, and all objects, respectively. Hence, it would be nice to organize edges according to object IDs. On the other hand, Q3 can be computed more efficiently if edges are stored in spatiotemporal-aware orders, so that GraphStore can filter out a large number of edges that are not relevant to the trace of the specified object. However, these requirements seem contradictory. We solve this problem by taking advantage of the multiple replicas of edges. For fault tolerance, PAST stores multiple replicas of edges (e.g., 3 replicas). Therefore, we propose a spatiotemporal edge partition method and a key-temporal edge partition method for different edge replicas.
High-throughput Streaming Edge Ingestion. New edge updates are streamed in rapidly. As shown in Fig. 2, the IngestStore component maintains a staging buffer for incoming new edges. All the worker machines handle incoming edges in rounds. They keep loosely synchronized clocks. In every round, the IngestStores collect the incoming edges in the current round. At the same time, IngestStores send edges collected in the previous round to their destination GraphStores based on the partitions computed by PAST’s partition methods. We design an efficient algorithm to perform the data ingestion that avoids hot spots in the data shuffling.
Query Processing and Optimization. Given PAST’s partition methods, we design a cost-based query optimizer to choose the best partitions for an input query. Our goal is to reduce cross-machine communication and edge data access as much as possible. The edge partitions divide the spatiotemporal space and key-temporal space into discretized blocks. We perform block-level filtering to avoid reading irrelevant edge data. Then we also take advantage of triangle inequalities for finer-grained filtering if geographic locations are used in the application.
5 Graph Partitioning and Storage
In this section, we describe the partitioning and storage methods for spatiotemporal graphs in PAST.
5.1 Diversified Partitioning
There are two main approaches to graph partitioning in the literature: vertex-based partitioning and edge-based partitioning. In vertex-based partitioning [22], a vertex is the basic partitioning unit. It assigns vertices along with their incident edges to partitions in order to minimize cross-partition edges. However, for a high-degree vertex, which is common in real-world graphs, there will be a large number of cross-partition edges no matter which partition the vertex is assigned to. To address this problem, edge-based partitioning [15] assigns edges to partitions. If the incident edges of a (high-degree) vertex are spread across k partitions, then the scheme chooses one partition to store the main copy of the vertex, and creates a ghost copy of the vertex in every other partition, thereby reducing the cross-partition edges to k − 1 ghost-to-main virtual edges.
We propose diversified partitioning for spatiotemporal graphs. Our solution is inspired by edgebased partitioning. It considers the different properties of location vertices, object vertices, and edges, and the characteristics of spatiotemporal graph queries.
Location Vertex. The 10 million location vertices require about 1GB of space (cf. Section 3.3). A mid-range server machine today is often equipped with 100GB–1TB of main memory. Therefore, all the location vertices can easily fit into the main memory of one machine. On the other hand, location vertices are frequently visited for obtaining the locations of specified location IDs. Therefore, PAST stores all location vertices on every worker machine, and loads them into main memory at system initialization time. In this way, location information can be efficiently accessed locally without cross-machine communication. PAST updates all the worker machines when a location is updated. The update cost is insignificant because location vertices (e.g., shops, hotels, traffic camera locations, shipping services) change very slowly.
Object Vertex. PAST performs hash-based partitioning for object vertices. The scheme is inspired by Redis [35]. Each object vertex is assigned a unique 8-byte vertex ID. We first divide the vertices into 2^14 slots. Given a vertex ID v, PAST computes slot(v) = CRC16(v) mod 2^14, where slot(v) is a 14-bit slot ID. Then, we assign the slots to the worker nodes in a round-robin manner: worker(s) = s mod n, where n is the number of worker nodes. For fault tolerance purposes, we create 3 replicas by storing slot s on workers s mod n, (s + 1) mod n, and (s + 2) mod n.
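A minimal sketch of this slot assignment follows. The CRC16 variant, the cluster size, and the function names are assumptions for illustration; the standard library's `binascii.crc_hqx` (CRC-16/XMODEM, the variant Redis clusters use) stands in for whatever CRC16 PAST implements.

```python
import binascii

NUM_SLOTS = 2**14   # 16,384 slots, as in Redis cluster
NUM_WORKERS = 10    # assumed cluster size
REPLICAS = 3

def slot_of(vertex_id: int) -> int:
    """Map an 8-byte vertex ID to a 14-bit slot via CRC16 (variant assumed)."""
    data = vertex_id.to_bytes(8, "big")
    return binascii.crc_hqx(data, 0) % NUM_SLOTS

def workers_of(slot: int) -> list:
    """Round-robin slot placement with 3 replicas on consecutive workers."""
    return [(slot + i) % NUM_WORKERS for i in range(REPLICAS)]
```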
Edge. An edge contains a triplet (object ID, timestamp, location ID). Generally speaking, queries in spatiotemporal applications often contain filtering predicates on object IDs, time ranges, and locations. Specifically, Q1–Q4 all have time range filters. Q1, Q2, and Q4 will benefit from a data layout where the edges of each object are stored together, while the amount of data accessed by Q3 is reduced if spatiotemporal filtering is efficiently supported. We take advantage of the multiple edge data replicas to design a twofold partitioning strategy, as described in Sections 5.1.1 and 5.1.2.
5.1.1 Skew-aware Spatiotemporal Edge Partitioning
Spatiotemporal edge partitioning first partitions edges according to the spatial dimension so that, given an edge incident to location l, all edges associated with locations near l are likely to be in the same partition. Then, within a spatial partition, it constructs temporal subpartitions, each of which contains edges in a disjoint time range. For the second part, it is straightforward to sort the edges by time and obtain the temporal subpartitions. Therefore, we focus on the spatial partitioning part of the design in the following.
We divide the universe of locations (e.g., the map) into regions by applying a grid. Each grid cell is a region. For simplicity, we consider square cells in this paper and denote the cell width as w. Let region r’s weight W(r) be the number of locations in region r. The distribution of W(r) can be very skewed. For example, there are usually more locations in regions with higher population. More object vertices (e.g., people) may visit such regions, leading to a higher number of event edges. Therefore, we assume that W(r) is proportional to the number of edges incident to location vertices in region r. We choose cell width w such that (i) w ≥ δd, where δd is the distance threshold in Q2 and Q3; and (ii) regions are small compared to the map, i.e., w ≤ εD, where D is the map width and ε is a small constant (e.g., ε = 0.001). We would like to assign regions to worker machines (every worker machine stores a partition; therefore, partition and worker machine are used interchangeably here)
to achieve three goals: (i) balance edge storage space across the worker machines; (ii) ensure that adjacent regions are on the same machine with large probability for reducing communication cost of evaluating spatial predicates; and (iii) enable multiple workers to evaluate a query in parallel for better performance.
In what follows, we propose an UnboundedSpatioMapping algorithm (Algorithm 1) that achieves the first two goals but fails for goal (iii). Then we design a BoundedSpatioMapping algorithm (Algorithm 2), and compute the algorithm parameter k for achieving goal (iii).
Algorithm 1 lists the UnboundedSpatioMapping algorithm. To achieve goal (ii), it employs the Z-order curve: it sorts the regions by their Z-codes (Line 2), and assigns regions in contiguous Z-code intervals to machines (Lines 5–10). In this way, neighboring regions are assigned to the same machine with large probability. To achieve goal (i), the algorithm computes region weights (Line 4) and closes a machine’s interval once its accumulated weight reaches the average machine weight (Line 8). Hence, the per-machine weight exceeds the average by at most one region’s weight, i.e., it is only slightly (e.g., ε = 0.001) above the average machine weight. However, Algorithm 1 fails to achieve goal (iii). For example, in Q3, we can retrieve the list of edges Eo of the given object o, and then use the spatiotemporal edge partition to find all edges similar to any edge in Eo. Suppose that the locations that o has visited lie in an area A (e.g., California) that contains a large number of regions. However, Algorithm 1 may assign the entire area A to a single machine (e.g., our experiments use 10 worker machines). As a result, the partitions do not allow multiple workers to process the task in parallel.
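The core mechanics (bit-interleaved Z-codes, then a greedy contiguous split by accumulated weight) can be sketched as below. This is not the paper's Algorithm 1; the greedy cutoff and the grid-coordinate representation are simplified assumptions.

```python
def z_code(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of grid coordinates (x, y) to get the Z-code."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)       # x bits go to even positions
        z |= ((y >> i) & 1) << (2 * i + 1)   # y bits go to odd positions
    return z

def assign_regions(regions, num_workers):
    """Assign Z-sorted regions to workers in contiguous Z-code intervals,
    closing a worker's interval once it reaches the average weight.
    regions: list of ((x, y), weight) pairs. Returns {cell: worker}."""
    regions = sorted(regions, key=lambda r: z_code(*r[0]))
    target = sum(w for _, w in regions) / num_workers
    assignment, acc, worker = {}, 0.0, 0
    for cell, w in regions:
        assignment[cell] = worker
        acc += w
        if acc >= target and worker < num_workers - 1:
            worker += 1
            acc = 0.0
    return assignment
```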
Algorithm 2 addresses this problem by assigning units of k × k adjacent regions to workers. It is likely that area A contains multiple units and that the units are assigned to different workers. In this way, multiple workers can process Q3 in parallel. However, in doing this, Algorithm 2 may introduce more cross-machine communication for locations near unit boundaries, and the storage space is less balanced because the smallest assignment is a unit of k² regions.
We derive constraints on the parameter k for Algorithm 2 to satisfy all three goals. Consider goal (i). It is easy to show that the weight assigned to a worker exceeds the average by less than the weight of one unit. Therefore, limiting the largest acceptable skew yields an upper bound on the unit size k.
Next, we consider goals (ii) and (iii). We call location l′ a nearby location of l iff their distance dist(l, l′) ≤ δd, where δd ≤ w. We use the Euclidean distance here.
We define SF (spatial factor) to be the probability that a nearby location of a location l resides in the same unit. Suppose locations are uniformly distributed in a unit. Theorem 1 bounds SF.

Theorem 1
Given a location l in unit u and a nearby location l′ of l, we have SF = Pr[l′ ∈ u] ≥ (1 − 2δd/(kw))², where kw is the side length of a unit.
We divide a unit into 9 parts, as illustrated in Fig. 3a. A is an inner square whose sides are δd away from the unit boundary; any nearby location of a point in A remains inside the unit, and P(A) = (1 − 2δd/(kw))². B is a rectangle along one side of the unit; there is a single unit adjacent to B across that side, and we consider the four B parts similarly. C is a δd × δd square in a corner; there are three units (including the one in the diagonal position) adjacent to C, and we also consider the four C parts similarly.
We bound the escape probabilities by using the Chebyshev distance dist∞; note that dist∞(l, l′) ≤ dist(l, l′) ≤ δd. Given a location in B (C), Fig. 3b (Fig. 3c) illustrates the shadow area outside of the unit where its nearby location may reside. Summing the contributions of A, B, and C yields the bound in the theorem.
Lemma 1
To achieve a target spatial factor SF0 < 1, one can set k = max(⌈λδd/w⌉, 1), where λ = 2/(1 − √SF0).
From Theorem 1, we see that if k ≥ λδd/w, then (1 − 2δd/(kw))² ≥ SF0 and the target is achieved.
The valid solution to this inequality is k = ⌈λδd/w⌉, if λδd/w ≥ 1. If λδd/w is smaller than 1, we set k = 1. The lemma combines the two cases.
5.1.2 Key-temporal Edge Partitioning
Key-temporal edge partitioning first constructs key partitions to store the edges of objects in the same slot. It follows the partitioning method of object vertices to obtain the key partitions. That is, given an edge (object ID, timestamp, location ID), it uses the object ID to compute the slot ID s, and then maps s to worker s mod n. Then, within a key partition, it constructs temporal subpartitions. Similar to spatiotemporal partitioning, it divides edges into disjoint time ranges, each of which constitutes a subpartition.
5.2 Compressed Columnar Edge Store
In this section, we focus on the storage of edges in a worker node. GraphStore exploits existing storage / DB systems (e.g., Cassandra in our implementation) as the underlying DB to store edge data.
GraphStore organizes edge data into (row key, edge data) pairs, where the row key uniquely identifies a spatiotemporal / key-temporal subpartition and the edge data contains a compressed list of (object ID, timestamp, location ID, other edge properties) in the subpartition. The row key is a concatenation of the following fields:

node id: uniquely identifies the worker node (our implementation modifies the partitioning function of Cassandra to identify the node id field in row keys for computing Cassandra partitions);

partitioning method: ‘A’ for spatiotemporal partitioning and ‘B’ for key-temporal partitioning;

partition-specific id: region ID in spatiotemporal partitioning or slot ID in key-temporal partitioning;

time range: TimeRange = ⌊timestamp / TRU⌋, where TRU (Time Range Unit) is a configuration parameter used to discretize time.
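The row-key construction above can be sketched as follows. The field widths and the exact string encoding are illustrative assumptions; only the four fields and their order come from the text:

```python
def make_row_key(node_id: int, method: str, part_id: int, timestamp: int,
                 tru_seconds: int = 24 * 3600) -> str:
    """Concatenate the four row-key fields; fixed field widths are an assumption."""
    assert method in ("A", "B")               # 'A' = spatiotemporal, 'B' = key-temporal
    time_range = timestamp // tru_seconds     # discretize time by TRU
    return f"{node_id:04d}{method}{part_id:08d}{time_range:08d}"

# timestamp 90,000 s falls into the second 24-hour TRU, i.e., time range 1
key = make_row_key(node_id=3, method="A", part_id=12345, timestamp=90_000)
```

A fixed-width encoding like this keeps row keys sortable by (node, method, partition, time), so all subpartitions of a region or slot are contiguous in the underlying store.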
For the edge data, we employ a columnar layout for the attributes and then compress the columns. The columnar layout is attractive because (i) it has a good compression ratio and (ii) a query needs to uncompress and access only the relevant columns. For example, we put the object IDs of all edges in the subpartition in an array, then compress the array. Similarly, we obtain the compressed representations of timestamps, location IDs, and other edge properties if they exist. We measure the compression ratios and efficiency of several well-known compression algorithms, and choose LZ4 and Snappy in PAST because of their good performance [33].
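A dependency-free sketch of this columnar layout: PAST uses LZ4/Snappy, but zlib is substituted here only so the example runs with the standard library alone.

```python
import struct, zlib

# Columnar layout sketch: pack each attribute of a subpartition's edges into
# its own array, then compress each column independently. The sample edges
# are made up; the paper's compressors are LZ4/Snappy, not zlib.
edges = [(7, 1000, 55), (7, 1060, 56), (8, 1000, 55)]  # (object ID, timestamp, location ID)

columns = list(zip(*edges))  # one tuple per attribute
compressed = [zlib.compress(struct.pack(f"{len(col)}q", *col)) for col in columns]

# A query touching only timestamps decompresses just that one column.
ts_col = struct.unpack(f"{len(edges)}q", zlib.decompress(compressed[1]))
assert list(ts_col) == [1000, 1060, 1000]
```

Grouping equal-typed values together is exactly what gives the columnar layout its compression advantage over row-at-a-time storage.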
To store the columns in Cassandra, we have two implementations: (i) C: we create multiple tables (or column families) in Cassandra, and store the compressed column of each edge attribute in a separate table; and (ii) R: we concatenate all the compressed columns into a single binary value, and store the value in a single Cassandra table. R essentially implements the PAX layout [2]. We evaluate the two implementations experimentally.
5.3 Fault Tolerance
PAST maintains edge replicas using different partitioning methods. However, this might cause two replicas of the same edge to reside in the same physical machine.
In our design, we check whether such a situation occurs and store the edge to another machine to ensure that replicas are on different machines. Suppose an edge is assigned to worker in spatiotemporal partitioning, and to in key-temporal partitioning. The problem occurs if =. If the condition is true, then PAST will store a copy of the edge to worker .
We compute the extra space required. Suppose that there are n worker nodes and that edges are evenly distributed across the workers. Then the probability that an edge is assigned to a particular worker is 1/n, and the probability that the two replicas collide on the same worker is 1/n. Therefore, the additional storage incurred is 1/n of the total edge data size. When n is large (e.g., 100–1000), the extra space required is negligible.
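A quick Monte Carlo check of this estimate, under the simplifying assumption that the two partitioning methods place an edge on independently and uniformly chosen workers:

```python
import random

# With n workers, the spatiotemporal and key-temporal replicas of an edge
# collide on the same machine with probability about 1/n, so roughly 1/n of
# edges need an extra copy. This simulation only illustrates the analysis.
def collision_fraction(n_workers: int, n_edges: int, seed: int = 0) -> float:
    rng = random.Random(seed)
    collisions = sum(
        rng.randrange(n_workers) == rng.randrange(n_workers)  # ST worker vs. KT worker
        for _ in range(n_edges)
    )
    return collisions / n_edges

frac = collision_fraction(n_workers=100, n_edges=200_000)
assert abs(frac - 1 / 100) < 0.005  # close to the analytical 1/n
```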
6 High-Throughput Edge Ingestion
The data velocity challenge requires PAST to support up to 1 trillion new edges per day. We break down this goal into three subgoals: (i) store up to 1 trillion new edges per day; (ii) apply the proposed partitioning and storage strategy to the new edges as described in Section 5; and (iii) balance the ingestion workload across the worker machines and avoid hot spots as much as possible. While directly reflecting the desired ingestion throughput, subgoal (i) is not sufficient by itself. The other two subgoals are important because subgoal (ii) supports sustained ingestion performance and enables query processing on new edges, and subgoal (iii) improves the scalability of the system.
We propose a high-throughput edge ingestion algorithm as described in Algorithm 3, which extends our previous work on event log processing [7] to support diversified partitioning for spatiotemporal graphs. The high-level picture of the algorithm is as follows. IngestStore on each machine buffers the incoming new edges before shuffling them to GraphStore to implement the desired partitioning strategy. IngestStore employs double buffering. It buffers TRU worth of data while shuffling the previous TRU worth of data to GraphStore on the destination workers at the same time. is a parameter to tolerate instant ingestion bursts at individual IngestStores and avoid hot spots in shuffling.
Algorithm 3 consists of three functions. The first function, IngestStore_AppendNewEdge, is invoked by IngestStore upon receiving a new edge. It appends the new edge to the end of inbuf.
Every TRU time, the coordinator initiates a new round of shuffling by broadcasting a NextRound message with a parameter to all workers. Then IngestStore at each worker invokes the second function, IngestStore_ComputePartition, to compute the partitions for edges in inbuf and copy the edges to outbuf. Note that the coordinator can initiate the round at time in order to tolerate communication delays from event sources for up to time. The function computes the spatiotemporal partition (Lines 7–9) and the key-temporal partition (Lines 10–12) for an edge. Then it checks whether the destination workers of the spatiotemporal and the key-temporal partitions collide. In such cases, it copies the edge to the outbuf of the next worker as discussed in Section 5.3 (Lines 15–16). In the end, the function truncates the inbuf. After the invocation, IngestStore replies with a PartitionDone message to the coordinator. Note that the copy operation is very similar to the partitioning step in the in-memory partitioned hash join algorithm. When the number of destination workers is large, there can be significant TLB and cache misses. We perform multi-pass copying in the spirit of the radix-cluster algorithm [27] for better CPU cache performance.
When all the partitions have been computed at all workers, the coordinator broadcasts a Shuffle message to all workers. Then GraphStore at each worker invokes the third function, GraphStore_Shuffle. It randomly permutes the worker list, then attempts to retrieve edges from IngestStore at every worker in the list (Lines 21–23). If is busy serving another GraphStore, then it puts into the busy list (Line 22). After processing , the function repeatedly processes workers in the busy list until the busy list is empty (Lines 24–29). Upon receiving edge data from all workers, GraphStore sorts and compresses the edges, then stores the compressed data as described in Section 5.2 (Line 30).
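A simplified single-process sketch of the first two functions of Algorithm 3. The buffers are plain lists and the partition functions are illustrative hashes; real PAST shuffles outbuf over the network once per TRU round:

```python
# Hypothetical sketch of the IngestStore side of Algorithm 3. The hash-based
# destination choices are stand-ins for PAST's actual partition functions.
class IngestStore:
    def __init__(self):
        self.inbuf, self.outbuf = [], []   # double buffering: fill inbuf, drain outbuf

    def append_new_edge(self, edge):       # function 1: buffer the incoming edge
        self.inbuf.append(edge)

    def compute_partition(self, n_workers):  # function 2: route edges, then truncate inbuf
        for obj, ts, loc in self.inbuf:
            st_worker = hash(loc) % n_workers   # spatiotemporal destination (illustrative)
            kt_worker = hash(obj) % n_workers   # key-temporal destination (illustrative)
            if st_worker == kt_worker:          # collision: copy to the next worker (Sec. 5.3)
                kt_worker = (kt_worker + 1) % n_workers
            self.outbuf.append((st_worker, kt_worker, (obj, ts, loc)))
        self.inbuf.clear()                      # new edges keep arriving into a fresh inbuf

store = IngestStore()
store.append_new_edge((42, 1000, 7))
store.compute_partition(n_workers=10)
assert store.inbuf == [] and len(store.outbuf) == 1
st_w, kt_w, _ = store.outbuf[0]
assert st_w != kt_w   # the two replicas never share a worker
```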
We re-examine the three subgoals. It is clear that subgoal (ii) is achieved by IngestStore_ComputePartition and GraphStore_Shuffle. For subgoal (iii), the double buffering mechanism and the random permutation are designed to reduce hot spots as much as possible. For subgoal (i), we measure the ingestion throughput experimentally in Section 8.
7 Spatiotemporal Graph Query Processing and Optimization
[Table 2: terms used in the cost model (edge size, time range of a query, #edges ingested per second, #regions or #slots accessed by a query, #regions or #slots in total, cost for reading from disk, network communication factor, cost for invoking the backend, degree of parallelism, penalty factor in (0,1), edge data size in a TRU, and #TRUs covered by a query’s time range), together with the computed costs of Q1–Q4 under spatiotemporal (ST), key-temporal (KT), and KT+ST partitioning.]
Spatiotemporal query processing in PAST has two distinctive features compared to query processing in existing systems. First, there are two edge partitions: spatiotemporal and key-temporal. We study the cost models for choosing edge partitions in Section 7.1. Second, there are predicates on edge similarity, which require similarity joins on the spatial and/or temporal dimensions. We optimize the evaluation of edge similarity in Section 7.2.
7.1 Cost-based Partition Selection
Query processing can exploit the two types of edge partitions to skip accessing a large amount of unrelated data and to reduce crossmachine communication overhead. In this subsection, we derive a cost model for the queries.
Table 2 lists the terms used in the model. Among the terms, , , , , and are constants. is the total number of regions (slots) in spatiotemporal (key-temporal) partitioning. For example, there are 16384 slots and 1048576 regions in our experiments. The time range parameters , , and are determined by a given query. , , , and depend on both the given query and the chosen edge partition type.
The data size accessed by a query can be calculated as follows:
Then, we formulate the total cost for evaluating the query:
where is a penalty factor. The actual degree of parallelism is .
is the cost of reading amount of data from disk: . is the cost of invoking the underlying DB. PAST retrieves all subpartitions of every region / slot with a single invocation. Therefore, the number of invocations is , and . is the cost of communicating a fraction of the retrieved data across machines. . Here, parameter depends on the query evaluation strategy and the network bandwidth.
Given the cost model, we compute the cost of processing Q1–Q4 using different partitions. Table 2 summarizes the computed costs. We set to be 1048576 and 16384 for spatiotemporal and key-temporal partitions, respectively. The degree of parallelism (), the network communication factor (), and the region/slot proportion () differ as different partitions are selected for query execution. There are 10 workers in our experiments. Therefore, .
Q1: object trace. (i) spatiotemporal: Since the given object may visit any spatial location during the time range, every worker reads its own spatiotemporal partition to look for edges that contain object . Every region needs to be examined. No edge data shuffle is necessary. So .
(ii) key-temporal: Only the worker that contains the key-temporal partition of reads ’s slot. There is no data shuffling during computation. So .
Q2: trace similarity. (i) spatiotemporal: Compared with Q1, the query processing strategy is unchanged. Now every worker looks for edges that contain either of the given objects, so the parameters are also unchanged. .
(ii) key-temporal: We perform Q1 and then compute the similarity between the obtained traces. In the worst case, the key-temporal partitions of the two given objects reside in two machines. So .
Q3: similar object discovery. Q3 is a heavyweight query. Both the spatiotemporal and the key-temporal solutions read all data in the time range. Then they perform a join and a group-by operation by shuffling all the retrieved data among workers. Therefore, the network communication factor () is very large. (i) spatiotemporal: . (ii) key-temporal: .
We design an optimized execution strategy for Q3 that combines the two partitions. (iii) key-temporal+spatiotemporal: It first obtains the trace of the given object via Q1, accessing the key-temporal partitions. Then it finds all locations visited by , computes the regions from the locations, and sends the locations to the workers that store the relevant regions. After that, each worker reads the relevant regions from its spatiotemporal partition and looks for edges that are similar to ’s trace. Finally, similar edges are grouped by object to compute the aggregate similarities, which are then sorted to obtain the query result. While the final step performs data shuffling, the amount of data shuffled is small compared to that in (i) and (ii). Therefore, the network communication factor . Suppose the number of regions to access is (), and all workers participate in the computation. So .
Q4: clone object detection. (i) spatiotemporal: All machines participate in the computation. They have to shuffle all the data in the time range. Suppose the network communication factor is . Then for the same time range. Other parameters are .
(ii) key-temporal: Each machine reads every slot in the time range. Since all edges of an object are in the same slot, the velocity computation can be performed locally without data shuffling. Thus, .
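The local velocity check at the core of Q4 can be sketched as follows. The coordinates and the speed threshold are made-up illustration values; real distances would come from camera locations:

```python
# Illustrative Q4 check: within one key-temporal slot, sort an object's edges
# by time and flag a clone if the implied speed between consecutive sightings
# exceeds what any real car could achieve.
def is_clone(sightings, max_speed_kmh=200.0):
    """sightings: list of (timestamp_seconds, x_km, y_km) for one plate."""
    sightings = sorted(sightings)
    for (t1, x1, y1), (t2, x2, y2) in zip(sightings, sightings[1:]):
        dist_km = ((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
        if t2 > t1 and dist_km / ((t2 - t1) / 3600) > max_speed_kmh:
            return True   # impossible to cover the distance in the elapsed time
    return False

assert is_clone([(0, 0, 0), (600, 500, 0)])       # 500 km in 10 minutes: a clone
assert not is_clone([(0, 0, 0), (3600, 100, 0)])  # 100 km in an hour: plausible
```

Because the key-temporal slot holds all sightings of a plate, each worker can run this check on its own slots with no cross-machine shuffling.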
Comparison. From Table 2, we see that for Q1, Q2, and Q4, KT’s cost is lower than ST’s cost: . Therefore, KT is the better partition to use. For Q3, . Therefore, ST+KT should be selected.
In general, for a given query, we can apply the above analysis to compute the cost for different partition types, and choose the best partition or partition combination based on the computed costs.
7.2 Edge Similarity Computation
Optimizing Location Computation. We would like to improve the efficiency of computing . The basic idea is to filter out faraway locations that cannot satisfy the inequality without computing the distance.
Given a set of locations in an area (e.g., a region or a unit), we apply a grid to the area, where a grid cell is a square and region width . It is easy to convert the coordinates of a location into grid coordinates : , . For and , their grid coordinates are and , respectively.
Let , , , and . Therefore, and . According to the triangle inequality, we have the following:
Thus
We use the above inequalities to calculate the lower bound of the distance between two locations. If the lower bound is larger than , then we can avoid computing the actual distance, thereby reducing computation overhead.
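A minimal sketch of this filter, assuming an illustrative cell width and distance threshold (the paper's exact parameter values are not reproduced here):

```python
import math

# Grid-based filter sketch: bucket locations into square cells and use grid
# coordinates to lower-bound the Euclidean distance, skipping the exact
# computation when two locations are provably farther apart than DELTA_D.
W = 1.0        # grid cell width (illustrative)
DELTA_D = 1.5  # distance threshold (illustrative)

def grid_coords(x, y):
    return int(x // W), int(y // W)

def maybe_nearby(p, q):
    """Return False only when the lower bound already exceeds DELTA_D (a safe filter)."""
    (gx1, gy1), (gx2, gy2) = grid_coords(*p), grid_coords(*q)
    # Any two points in cells k columns/rows apart are at least (k-1)*W apart per axis:
    lower = math.hypot(max(abs(gx1 - gx2) - 1, 0), max(abs(gy1 - gy2) - 1, 0)) * W
    return lower <= DELTA_D

assert maybe_nearby((0.1, 0.1), (1.2, 0.3))      # adjacent cells: must check exactly
assert not maybe_nearby((0.1, 0.1), (9.5, 0.1))  # lower bound 8.0 > 1.5: filtered out
```

The filter never rejects a truly nearby pair, since the grid-based bound is always a lower bound on the actual Euclidean distance.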
Optimizing Time Computation. A subpartition contains events in a TRU. For computation, we can use the subpartition row key (which encodes the time range) to reduce data access and computation overhead. Let . If is in the time range , we only need to consider subpartitions with time ranges in . In particular, when TRU, we only need to consider three subpartitions, with time ranges , , and .
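A sketch of this time filter, assuming a simple floor-division TRU encoding (consistent with the row-key discretization in Section 5.2):

```python
# Time-filter sketch: given a query timestamp t and tolerance delta_t, compute
# which TRU-sized subpartitions can possibly contain matching edges. With
# delta_t <= TRU this yields at most three candidate time ranges, as in the text.
TRU = 24 * 3600  # 24-hour Time Range Unit

def candidate_time_ranges(t: int, delta_t: int):
    lo = (t - delta_t) // TRU
    hi = (t + delta_t) // TRU
    return list(range(lo, hi + 1))

# t just past a TRU boundary with delta_t = TRU touches three subpartitions:
assert candidate_time_ranges(t=5 * TRU + 10, delta_t=TRU) == [4, 5, 6]
# a small delta_t well inside a TRU touches only one:
assert candidate_time_ranges(t=5 * TRU + 1000, delta_t=60) == [5]
```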
8 Experimental Evaluation
In this section, we evaluate the performance of PAST. We would like to answer the following questions in the experiments:

Can PAST efficiently support high-throughput edge insertions while achieving the desired graph partitions?

What are the benefits of the proposed techniques, such as the diversified partitioning strategy, cost-based edge partition selection, compressed edge storage, and computation optimization?

How does PAST compare to state-of-the-art systems (e.g., JanusGraph, Greenplum, Spark, and ST-Hadoop)?
8.1 Experimental Setup
Machine Configuration. The experiments are performed on a cluster of 11 machines. Each machine is a Dell PowerEdge blade server equipped with two Intel(R) Xeon(R) E5-2650 v3 2.30 GHz CPUs (10 cores/20 threads, 25.6MB cache), 128GB memory, a 1TB HDD, and a 180GB SSD, running 64-bit Ubuntu 16.04 LTS with the 4.4.0-112-generic Linux kernel. The blade servers are connected through 10Gbps Ethernet. We use Oracle Java 1.8, Cassandra 2.1.9, JanusGraph 0.2.1, Greenplum 5.9.0, Spark 2.2.0, ST-Hadoop 1.2, and Hadoop 2.8.1 in our experiments.
Workload Generation. We generate a synthetic data set of customer shopping events. The goals of supporting 10 billion object vertices and 10 million location vertices are designed for clusters with 1000 machines. Given the machine cluster size in our experiments, we scale down the number of vertices by a factor of 100. Therefore, we generate 100 million object vertices and 100 thousand location vertices.
First, we crawled 450 thousand hotel locations in China from ctrip.com (a popular travel booking web site in China). The locations are distributed across about 1100 areas (cities / counties / districts). We randomly choose 100 thousand of the real-world hotel locations as shopping locations. Second, we generate 100 million customers. As the number of shopping locations and the population of an area are often correlated, we assign customers to areas so that the number of customers is proportional to the number of locations in an area. Third, we would like to generate a data set that covers a 2–3 year period and can be stored in the cluster in the experiments. We assume 40% of people are frequent shoppers and 60% are infrequent shoppers. A frequent shopper and an infrequent shopper visit a randomly chosen shopping location in his or her area every week with probability 0.8 and 0.2, respectively. Edge timestamps are used to compute edge partitions and evaluate queries in our experiments. (Note that the week period is necessary to reduce the total data volume to fit into the cluster storage capacity. Our ingestion experiments send the edge data as fast as possible to saturate the system, disregarding edge timestamps.) Finally, we produce a small number of cloned objects that visit shopping locations in faraway areas. The resulting graph contains 57 billion edges, which cover 800 days.
Parameter Settings. In spatiotemporal partitioning, we apply a grid to the map, obtaining 1048576 regions. Note that there are more hotels in cities than in the countryside; the hotels concentrate in 8660 regions. In key-temporal partitioning, we generate 16384 slots. We set TRU to be 24 hours.
We evaluate Q1–Q4 as described in Section 3.2. By default, we set , , .
State-of-the-art Systems to Compare. We evaluate our proposed solution, PAST, and four state-of-the-art systems in the experiments:

PAST (our proposed solution). We implement PAST in Java. The GraphStore uses Cassandra as the underlying DB backend and stores data on the HDDs. We customize Cassandra’s partitioner to manage the key-to-node mapping.

JanusGraph (a state-of-the-art distributed graph store). A spatiotemporal graph is stored as a property graph. We choose Cassandra as JanusGraph’s storage backend. To facilitate edge retrieval for a given object (location) vertex, we set the vertex ID to be the object ID (location ID).

Greenplum (a state-of-the-art MPP relational DB). Graph data is stored as relational tables in Greenplum. To minimize disk space usage, we create two tables, one for edge data and one for location details with latitude and longitude. The two tables are linked by location ID. Q2, Q3, and Q4 perform join operations. We employ multidimensional partitioning to improve query performance: vertex ID is the first dimension, and time is the second dimension.

Spark+Cassandra (a state-of-the-art big-data analytics system). Graph data is stored in Cassandra. Spark loads the location data into memory at the beginning of execution to reduce the overhead of looking up locations. The loading time is less than 1 second, which is negligible compared with the query execution time. Spark accesses data in Cassandra for query processing, and saves the query results to HDFS.

ST-Hadoop+Spark (a big data system specially optimized for spatiotemporal data). Given the TRU setting in PAST, we set ST-Hadoop’s partition granularity to day. PAST’s spatiotemporal partitioning employs the Z-order curve; therefore, we set ST-Hadoop’s spatial index technique to the Z-order curve. To avoid MapReduce’s overhead of storing intermediate data to disks, we use Spark to read the spatiotemporally indexed data in ST-Hadoop and compute the queries in memory.
8.2 New Edge Ingestion
In this section, we measure the sustained edge ingestion throughput in PAST, and examine the distribution of data across workers.
As described in Section 6, PAST shuffles TRU worth of data in every round. In our experiments, we set to be the number of worker machines. Note that edges are partitioned during ingestion.
Edge Ingestion Throughput and Scalability. The edge ingestion throughput is the number of new edges streamed into all the GraphStores per second. We send edges in the data set as fast as possible in this set of experiments. We observe that the ingestion throughput stabilizes in the third round, as shown in Fig. 4. Given this, we measure the ingestion throughput in the fifth and sixth rounds at the IngestStores, and report the aggregate ingestion throughput as the sustained throughput in Fig. 5.
In Fig. 5, we vary the number of worker machines on the X-axis from 1 to 10 to study the scalability of our solution. The Y-axis reports the sustained ingestion throughput in million edges per second. From the figure, we see that the sustained ingestion throughput increases nearly linearly as the number of workers grows. The PAST design achieves good scalability for new edge ingestion.
Every worker in PAST can support an additional 0.85 million new edges per second. An edge takes 24 bytes in this experiment, so every worker in PAST can support an additional 20MB/s of ingestion bandwidth. Therefore, the design goal of 1 trillion new edges per day (or 290MB/s) for a full-scale spatiotemporal graph can be achieved with about 15 worker machines. This gives a lower bound on the actual number of nodes in a design, whose choice must also consider the performance of query processing.
Edge Ingestion Throughput Evolving Over Time. We measure the number of ingested edges seen by both IngestStores and GraphStores at every minute for about 50 minutes. Fig. 4 shows the average per-machine throughput across all machines. The X-axis is wall-clock time. The Y-axis is the ingestion throughput in million edges per second. The error bars show the standard errors.
The upper figure in Fig. 4 shows the ingestion throughput seen by the IngestStores. IngestStores begin to handle incoming edges at time 0. The ingestion throughput quickly increases at the beginning, and then fluctuates around 0.85 million new edges per second.
The lower figure in Fig. 4 shows the ingestion throughput seen by the GraphStores. GraphStores begin to receive data at time=360s. Their throughput reaches a peak of nearly 108MB/s (4.5 million edges per second) at time=480s. This is because Cassandra starts with empty in-memory buffers for receiving data, and the cost of storing to Cassandra is very low at the beginning. As time goes by, Cassandra moves into more steady states. The peak throughput reduces to 86MB/s (3.6 million edges per second) in the second round. The peak throughput in stable rounds (i.e., round 3 and beyond) is about 48MB/s (2.0 million edges per second).
Note that IngestStores send both keytemporal data and spatiotemporal data. The shuffled data is essentially twice the amount of the incoming edge data. Therefore, the ingestion throughput seen at GraphStores is roughly twice as much as that seen at IngestStores.
Data Distribution. We would like to understand how well our partitioning methods work in terms of data distribution across machines. We measure the data size in each worker for both spatiotemporal edge partitions and key-temporal edge partitions. In Fig. 6, the X-axis shows each worker machine, while the Y-axis reports the data size in GB. From the figure, we see that PAST’s partitioning methods achieve a balanced data distribution. Note that the data size of spatiotemporal partitions is roughly twice as large as that of key-temporal partitions. This is because PAST stores three replicas of the edge data, two of which are in spatiotemporal partitions and one in key-temporal partitions.
8.3 Proposed Features in PAST
In this section, we evaluate the proposed features in PAST, including storage format, edge partition selection, computation optimization, and the spatial mapping algorithms.
8.3.1 Comparison of Two Storage Formats
We evaluate the two storage formats described in Section 5.2: (i) C: the baseline design, where edge properties are stored in separate tables (column families) with compression; and (ii) R: the PAX design, where the compressed columns are concatenated and stored in a single table (column family). Since both store compressed column data, they consume the same amount of disk space. However, their query performance can be different.
Fig. 7 reports the query execution time for Q4 while varying the number of edge properties. Note that Q4 uses only three edge properties (i.e., object ID, timestamp, location ID) in its computation. We perform the experiment on a single machine. The X-axis shows the total number of edge properties, and the Y-axis is the elapsed time.
From the figure, we see that the cost of decompression is negligible. When there are three edge properties, C and R take nearly the same time for reading data from Cassandra because they retrieve the same amount of data from disk. However, as the number of properties increases beyond three, the query performance of R deteriorates: the read time in R increases with the number of properties. This is because C avoids accessing irrelevant properties, while R has to retrieve all properties.
8.3.2 Benefits of Cost-Based Edge Partition Selection
We perform a set of experiments to validate the analysis in Section 7.1. Here, we evaluate Q1, which represents simple graph traversal queries, and Q3, which represents complex queries, using different edge partitions. The query time range is one day.
Table 3 lists the execution time and accessed data size for Q1 and Q3 using the following solutions. (i) ST: a query accesses only spatiotemporal partitions. (ii) KT: a query accesses only key-temporal partitions. (iii) KT+ST for Q3: the optimized execution strategy for Q3, where PAST first accesses key-temporal partitions to retrieve the trace of the given object , then accesses spatiotemporal partitions to compute the objects that are similar to .
From Table 3, we see that for Q1, , and for Q3, . This observation is in accordance with the analysis based on the cost model in Section 7.1. Overall, the best execution plans for Q1 and Q3 achieve factors of 39.5x and 9.7x improvement over the second best plans.
Q1  Q3  
replica  size (MB)  time (s)  size (MB)  time (s) 
KT  0.058  0.47  950  343.92 
ST  950  18.57  950  636.52 
KT+ST  –  –  9  35.63 
Note: only Q3 can be optimized using both edge partitions. 
8.3.3 Benefits of Computation Optimization
We study the effect of the optimization techniques for evaluating edge similarity, as described in Section 7.2. We measure Q3’s execution time while varying the query time range from 1 day to 512 days. We compare three solutions: (i) STOP: the PAST implementation with both location and time computation optimized; (ii) TOP: only time computation is optimized; (iii) NOP: no computation optimization.
#day  1  2  4  8  16  32  64  128  256  512 

read(MB)  9  34  34  89  221  456  963  2000  4005  8134 
#result()  3  7  7  13  27  45  87  174  333  566 
Fig. 8 reports the query execution time on a logarithmic scale. Table 4 lists the read data size and the number of query results for all the experiments in Fig. 8. When the query time range is less than 4 days, the three solutions take nearly the same time. As the query time range increases, the performance difference becomes obvious. Overall, STOP outperforms TOP and NOP by factors of up to 3.1x and 12.8x, respectively. The computation optimization is quite significant.
8.3.4 Effect of Spatial Mapping for Skewed Data
We compare unbounded spatial mapping (Algorithm 1) and bounded spatial mapping (Algorithm 2) using Q3. We consider three choices of b (i.e., 8, 4, and 2) for Algorithm 2. According to Theorem 2, the corresponding lower bounds of SF (spatial factor) are 88%, 77%, and 56%, respectively. (We set the region width to be .) The unbounded algorithm is labeled . PAST employs the optimized execution plan for Q3 as described in Section 7.1.
Fig. 9a shows the distribution of accessed regions for each machine when the query time range is 512 days. The smaller the SF is, the more machines are likely to contain regions used in the query. However, we would like to choose such that is not very small and data locality is maintained for most location computations. Fig. 9b shows the execution time for and . We see that the bounded algorithm achieves a small improvement of 10% over the unbounded algorithm.
8.4 Comparison with Stateoftheart Systems
We compare PAST with state-of-the-art systems, including JanusGraph, Greenplum, Cassandra+Spark, and ST-Hadoop+Spark. We are interested in two aspects: (i) storage space consumption and (ii) query performance.
8.4.1 Storage Space
Table 5 shows the storage space used by all systems. (Data ingestion times for the systems are as follows. PAST: 6h, JanusGraph: 20h, Greenplum: 16h, Cassandra: 46h, ST-Hadoop: 12h.) Note that PAST and Cassandra keep 3 replicas, and Greenplum stores 2 replicas by default. Due to disk space limitations, JanusGraph, Greenplum, and ST-Hadoop store only 1 replica.
PAST  JanusGraph  Greenplum  Cassandra  ST-Hadoop  
size (TB)  1.9  3.4  2.8  3.2  3.1 
#replica  3  1  2  3  1 
From the table, we see that compared with Cassandra, PAST achieves a factor of 1.7x space savings. PAST consumes less space than the other systems even though they store fewer replicas. If we scale the space consumption of JanusGraph, Greenplum, and ST-Hadoop to account for three replicas, then PAST reduces the space consumption of JanusGraph, Greenplum, and ST-Hadoop by factors of 5.4x, 2.2x, and 4.9x, respectively.
Note that JanusGraph stores the graph data in Cassandra. Both Cassandra and JanusGraph employ LZ4 compression. In comparison, PAST achieves much better space consumption because it utilizes a columnar layout for all edges in a subpartition and compresses the columnar edge properties. JanusGraph consumes more space than Cassandra because it stores each edge twice, at both the incoming vertex and the outgoing vertex. PAST and Cassandra take less space than Greenplum to store one replica because of compression.
8.4.2 Query Performance
Fig. 10 and Fig. 11 compare the query performance of all systems, while varying the query time range and the threshold values, i.e., and . The Y-axis is the execution time on a logarithmic scale. We do not run Q3 and Q4 on JanusGraph, as it mainly focuses on simple traversal queries and employs Spark for complex queries. Therefore, Q3 and Q4 on JanusGraph can be represented by Q3 and Q4 on Cassandra+Spark.
There are several missing points in Fig. 10(c) and (d). The experiments corresponding to the missing points ran for over one day without completing. Cassandra+Spark and ST-Hadoop+Spark are overwhelmed by shuffling for Q3 and Q4 with large query time ranges. For Q4, most systems compute the velocity of an object’s edges in sorted time order. In contrast, in Greenplum, the SQL query for Q4 has to read and join all the data to compute velocity, which quickly overwhelms the system.
From Fig. 10 and Fig. 11, we see that PAST achieves 1–4 orders of magnitude better performance than the four existing solutions. The partitioning and query processing schemes in PAST effectively reduce the amount of data accessed from the underlying storage and the data communication cost. The main bottleneck of Cassandra+Spark is the disk I/O for scanning all the data. ST-Hadoop+Spark performs better than Cassandra+Spark because it exploits the spatiotemporal index to reduce the amount of data to read. However, as the query time range increases, ST-Hadoop+Spark’s performance degrades and approaches that of Cassandra+Spark. Greenplum achieves significantly better performance than ST-Hadoop+Spark for the simple queries, Q1 and Q2. This is because Greenplum first partitions by object and then further subpartitions by time. The object partitions fit the needs of Q1 and Q2 well; in contrast, there is no object indexing support in ST-Hadoop.
9 Discussion
Medium-grain Partitions for Object Vertices. Our partitioning scheme for object vertices is based on hash partitioning. However, if we know more about the object vertices, we may design better partitioning schemes. Note that overly fine-grained partitions make it difficult to record the partitions and expensive to compute them. Therefore, we consider medium-grain partitioning based on groups of vertices. We can add a group property to every vertex to record the group of the vertex. Then the partition decision is based on groups rather than individual vertices. The group-to-machine assignment can be maintained in the coordinator node, which also keeps track of the group statistics. The assignment of vertices to groups is application dependent. For example, it would be desirable to assign people with similar behaviors to the same group.
Dynamic Vertex Group Partitions. Every worker node periodically reports group statistics to the coordinator node. Based on the statistics, the coordinator computes the average (space and computation) load per worker machine. It asks worker nodes to migrate vertex groups if the load of a machine deviates from the average beyond a threshold (e.g., ±5%). The vertex groups to migrate can be chosen based on the collected statistics.
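One way the coordinator's rebalancing decision could work is sketched below; this is an assumption-laden illustration (greedy, largest-group-first selection, destinations chosen separately), not PAST's actual policy.

```python
def plan_migrations(group_load, group_worker, threshold=0.05):
    """Illustrative sketch of the coordinator's rebalancing decision.

    `group_load` maps group -> reported load; `group_worker` maps
    group -> current worker. A worker whose total load exceeds
    (1 + threshold) * average gives up groups, largest first, until it
    falls back under the limit.
    """
    load = {}
    for g, w in group_worker.items():
        load[w] = load.get(w, 0.0) + group_load[g]
    avg = sum(load.values()) / len(load)
    limit = avg * (1 + threshold)  # e.g., +5% of the average

    migrations = []  # (group, source_worker); destinations picked separately
    for w in sorted(load):
        groups = sorted((g for g in group_worker if group_worker[g] == w),
                        key=lambda g: group_load[g], reverse=True)
        for g in groups:
            if load[w] <= limit:
                break
            migrations.append((g, w))
            load[w] -= group_load[g]
    return migrations
```

Since the decision operates only on the compact per-group statistics, the coordinator can recompute it cheaply on every reporting round.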
Alternative Implementation as an Index on RDBMSs. We present a standalone implementation of PAST in this paper. Alternatively, we can implement PAST as a spatiotemporal index structure on top of an RDBMS. PAST could then serve a wider range of applications that combine spatiotemporal graphs with traditional relational data: PAST speeds up the parts of a query that involve spatiotemporal graphs and returns a list of vertex or edge IDs, which can then be combined with the outputs of relational queries to compute the final query results.
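A minimal sketch of this combination is shown below, with SQLite standing in for the RDBMS; the `past_lookup` function and the `customers` table are hypothetical placeholders for the PAST index call and the application's relational data.

```python
import sqlite3

def past_lookup(region, time_range):
    # Stand-in for the PAST index: returns the vertex IDs matching the
    # spatiotemporal part of the query (hard-coded here for illustration).
    return [2, 5]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(2, "alice"), (3, "bob"), (5, "carol")])

# Feed the ID list from the spatiotemporal index into the relational query.
ids = past_lookup(region=None, time_range=None)
placeholders = ",".join("?" * len(ids))
rows = conn.execute(
    f"SELECT name FROM customers WHERE id IN ({placeholders})", ids
).fetchall()
# rows now holds only the customers matched by the spatiotemporal index
```

The relational engine evaluates the non-graph predicates as usual, while PAST prunes the spatiotemporal portion down to an ID list before the join.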
More Extended Applications. The ideas in PAST can be applied to solve more general multi-dimensional graph problems, beyond the spatiotemporal dimensions. Moreover, the concept of space can be generalized to a user space, a commodity space, and so on, and the distance measure need not be Euclidean distance; other similarity measures can be supported as well.
10 Conclusion
In conclusion, we define a bipartite graph model for spatiotemporal graphs based on the commonalities of representative real-world applications, i.e., customer behavior tracking and mining, clone-plate car detection, and shipment tracking. We propose and evaluate PAST, a framework for efficient PArtitioning and query processing of SpatioTemporal graphs. Our experimental evaluation shows that PAST meets the requirements of the above applications. Our proposed partitioning and storage methods and algorithm optimizations achieve significant performance improvements: for typical queries on spatiotemporal graphs, PAST outperforms state-of-the-art systems (e.g., JanusGraph, Greenplum, Spark, ST-Hadoop) by 1–4 orders of magnitude.