Distributed graphs: in search of fast, low-latency, resource-efficient, semantics-rich Big-Data processing

11/26/2019
by Miguel E. Coimbra, et al.

Large graphs can be processed on a single high-memory machine or on distributed systems, with a focus either on querying the graph or on executing algorithms through high-level APIs. For systems focused on graph processing, a common use case is to execute algorithms such as PageRank or community detection on top of distributed systems that read from storage (local or distributed), compute, and write results back to storage, in a way akin to a read-eval-write loop. Graph analysis tasks face new hurdles with the additional dimension of evolving data. The systems we detail herein have handled the evolution of data in the form of stream processing, which offers semantics for aggregating results in windows based on element counts or on different definitions of time. However, these semantics have yet to be incorporated into the expressiveness of graph processing itself. We first detail the existing types of graph analysis tasks and then highlight state-of-the-art solutions for different aspects of these tasks. The resulting analysis identifies the need for systems to effectively extend this read-eval-write style of execution by maintaining a graph (or parts of it) in a cluster's memory for reuse, skipping the recurring I/O overhead which is present in all systems.
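To make the closing point of the abstract concrete, the following is a minimal single-process sketch of keeping a graph resident in memory and re-running PageRank after an edge arrives, instead of re-reading the whole graph from storage on every run. It is an illustration under stated assumptions, not the design of any surveyed system: the class name `ResidentGraph`, the `loads` counter, and the plain power-iteration `pagerank` function are all hypothetical, and a real deployment would partition the graph across a cluster.

```python
from collections import defaultdict


def pagerank(adj, damping=0.85, iterations=50):
    """Power-iteration PageRank over an adjacency dict {node: set(successors)}."""
    nodes = set(adj)
    for targets in adj.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-edges is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not adj.get(v))
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for u, targets in adj.items():
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new_rank[v] += share
        rank = new_rank
    return rank


class ResidentGraph:
    """Hypothetical holder that keeps the graph in memory between runs."""

    def __init__(self, edges):
        self.adj = defaultdict(set)
        self.loads = 0  # counts simulated full reads from storage
        self._load(edges)

    def _load(self, edges):
        # Stands in for the "read" step of a read-eval-write loop.
        self.loads += 1
        for u, v in edges:
            self.adj[u].add(v)

    def apply_update(self, u, v):
        # An arriving edge mutates the in-memory graph; nothing is reloaded.
        self.adj[u].add(v)

    def run(self):
        # The "eval" step reuses the resident adjacency structure.
        return pagerank(self.adj)


graph = ResidentGraph([("a", "b"), ("b", "c"), ("c", "a")])
before = graph.run()
graph.apply_update("a", "c")
after = graph.run()
```

In this sketch the second `run()` call operates on the updated graph while `graph.loads` stays at 1, which is the I/O saving the abstract argues for. The computation itself is still a full recomputation; incremental or windowed evaluation would be a further step.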


