Distributed graphs: in search of fast, low-latency, resource-efficient, semantics-rich Big-Data processing

11/26/2019
by Miguel E. Coimbra et al.
Inesc-ID
University of Lisbon

Large graphs can be processed with single high-memory machines or with distributed systems, focusing either on querying the graph or on executing algorithms through high-level APIs. For systems focused on processing graphs, common use cases consist of executing algorithms such as PageRank or community detection on top of distributed systems that read from storage (local or distributed), compute, and write results back to storage, in a manner akin to a read-eval-write loop. Graph analysis tasks face new hurdles with the additional dimension of evolving data. The systems we detail herein have considered the evolution of data in the form of stream processing, which offers semantics for aggregating results over windows defined by element counts or by different notions of time. However, these semantics have yet to be incorporated into the expressiveness of graph processing itself. We first detail the existing types of graph analysis tasks; second, we highlight state-of-the-art solutions for different aspects of these tasks. The resulting analysis identifies the need for systems to effectively extend the aforementioned read-eval-write loop execution by maintaining a graph (or parts of it) in a cluster's memory for reuse, skipping the recurring I/O overhead that is present in all systems.
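To make the contrast concrete, the following is a minimal single-process sketch, not the authors' system or any framework's API. It assumes edges arrive as a stream of (source, destination) pairs, groups them into tumbling windows based either on element counts or on event time, and applies each window to a graph that stays resident in memory, so PageRank (plain power iteration) is recomputed after every window without re-reading the graph from storage. The names `count_windows`, `time_windows`, and `ResidentGraph` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Set, Tuple

Edge = Tuple[str, str]  # hypothetical edge representation: (source, destination)


def count_windows(stream: Iterable[Edge], size: int) -> Iterator[List[Edge]]:
    """Tumbling count-based window: emit every `size` edges (the last one may be partial)."""
    window: List[Edge] = []
    for edge in stream:
        window.append(edge)
        if len(window) == size:
            yield window
            window = []
    if window:
        yield window


def time_windows(stream: Iterable[Tuple[float, Edge]], width: float) -> Iterator[List[Edge]]:
    """Tumbling event-time window of `width` time units; assumes timestamps arrive in order."""
    window: List[Edge] = []
    current = None
    for ts, edge in stream:
        idx = int(ts // width)  # index of the time bucket this edge falls into
        if current is not None and idx != current:
            yield window
            window = []
        current = idx
        window.append(edge)
    if window:
        yield window


class ResidentGraph:
    """Hypothetical graph kept in memory across windows instead of being reloaded from storage."""

    def __init__(self) -> None:
        self.out_edges: Dict[str, Set[str]] = defaultdict(set)
        self.nodes: Set[str] = set()

    def apply(self, edges: Iterable[Edge]) -> None:
        # Incremental update: only the new edges are added; nothing is re-read.
        for src, dst in edges:
            self.out_edges[src].add(dst)
            self.nodes.update((src, dst))

    def pagerank(self, damping: float = 0.85, iterations: int = 20) -> Dict[str, float]:
        n = len(self.nodes)
        rank = {v: 1.0 / n for v in self.nodes}
        for _ in range(iterations):
            # Rank held by dangling nodes (no out-edges) is spread uniformly.
            dangling = sum(rank[v] for v in self.nodes if not self.out_edges.get(v))
            new = {v: (1.0 - damping) / n + damping * dangling / n for v in self.nodes}
            for src, dsts in self.out_edges.items():
                share = rank[src] / len(dsts)
                for dst in dsts:
                    new[dst] += damping * share
            rank = new
        return rank


# Usage: process a small edge stream in count-based windows of two edges.
stream = [("a", "b"), ("b", "c"), ("c", "a"), ("a", "c"), ("d", "a")]
graph = ResidentGraph()
for i, window in enumerate(count_windows(stream, size=2)):
    graph.apply(window)
    ranks = graph.pagerank()
    print(i, {v: round(r, 3) for v, r in sorted(ranks.items())})
```

The design choice being illustrated is the one the abstract argues for: the window operator decides *when* to recompute, while the graph itself persists between windows, so each iteration of the read-eval-write loop pays only for the new edges rather than for reloading the whole graph.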
