Fast Processing of Large Graph Applications Using Asynchronous Architecture

06/29/2017
by Michel A. Kinsy, et al.

Graph algorithms and techniques are increasingly used in scientific and commercial applications to express relations and explore large data sets. Although conventional or commodity computer architectures, such as CPUs and GPUs, handle dense graph algorithms fairly well, they are often inadequate for processing large sparse graph applications. The memory access patterns, memory bandwidth requirements, and on-chip network communications of these applications do not fit the conventional program execution flow. In this work, we propose and design a new architecture for fast processing of large graph applications. To accommodate the lack of spatial and temporal locality in these applications and to support scalable computational models, we design the architecture around two key concepts. (1) The architecture is a multicore processor of independently clocked processing elements. These elements communicate in a self-timed manner and use handshaking to perform synchronization, communication, and sequencing of operations. Because the design is asynchronous, the operating speed of each processing element is determined by actual local latencies rather than global worst-case latencies. We create a specialized ISA to support these operations. (2) The application compilation and mapping process uses a graph clustering algorithm to optimize parallel computation of graph operations and load balancing. Through the clustering process, we make scalability an inherent property of the architecture: task-to-element mapping can be done at the graph node level or at the node cluster level. A prototyped version of the architecture outperforms a comparable CPU by 10-20x across all benchmarks and provides 2-5x better power efficiency when compared to a GPU.
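
The abstract does not say which clustering algorithm or mapping policy the authors use, so the following is only a minimal sketch of the general idea in concept (2): group graph nodes into bounded-size clusters, then assign clusters to processing elements so that load stays balanced. The breadth-first grouping, the degree-based work estimate, and the greedy least-loaded assignment are all stand-in assumptions, and the names `cluster_nodes` and `map_clusters_to_elements` are hypothetical.

```python
import heapq
from collections import deque


def cluster_nodes(adjacency, max_cluster_size):
    """Group nodes into clusters of at most max_cluster_size using BFS.

    Assumption: BFS grouping is a simple stand-in for the (unspecified)
    clustering algorithm; it keeps neighboring nodes together.
    """
    unvisited = set(adjacency)
    clusters = []
    for start in sorted(adjacency):
        if start not in unvisited:
            continue
        cluster = []
        queue = deque([start])
        unvisited.discard(start)
        while queue and len(cluster) < max_cluster_size:
            node = queue.popleft()
            cluster.append(node)
            for neighbor in adjacency[node]:
                # Only claim a neighbor if the cluster still has room for it.
                has_room = len(cluster) + len(queue) < max_cluster_size
                if neighbor in unvisited and has_room:
                    unvisited.discard(neighbor)
                    queue.append(neighbor)
        clusters.append(cluster)
    return clusters


def map_clusters_to_elements(clusters, adjacency, num_elements):
    """Assign each cluster to the currently least-loaded processing element.

    Assumption: work per node is estimated as (degree + 1); heaviest
    clusters are placed first (greedy longest-processing-time heuristic).
    """
    weighted = sorted(
        ((sum(len(adjacency[n]) + 1 for n in cluster), index)
         for index, cluster in enumerate(clusters)),
        reverse=True,
    )
    loads = [(0, element) for element in range(num_elements)]
    heapq.heapify(loads)
    assignment = {}
    for weight, index in weighted:
        load, element = heapq.heappop(loads)
        assignment[index] = element
        heapq.heappush(loads, (load + weight, element))
    return assignment


# Hypothetical example graph: a 4-node path plus a separate 2-node edge.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2], 4: [5], 5: [4]}
clusters = cluster_nodes(adjacency, max_cluster_size=2)
assignment = map_clusters_to_elements(clusters, adjacency, num_elements=2)
```

Setting `max_cluster_size=1` maps individual graph nodes to processing elements, while larger values map node clusters, which mirrors the two mapping granularities the abstract mentions.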

research
07/22/2016

Novel Graph Processor Architecture, Prototype System, and Results

Graph algorithms are increasingly used in applications that exploit larg...
research
12/01/2021

Triangle Counting Accelerations: From Algorithm to In-Memory Computing Architecture

Triangles are the basic substructure of networks and triangle counting (...
research
06/07/2020

Stochastic Automata Network for Performance Evaluation of Heterogeneous SoC Communication

To meet ever increasing demand for performance of emerging System-on-Chi...
research
04/15/2021

pLUTo: In-DRAM Lookup Tables to Enable Massively Parallel General-Purpose Computation

Data movement between main memory and the processor is a significant con...
research
11/01/2017

Dynamic Load Balancing Strategies for Graph Applications on GPUs

Acceleration of graph applications on GPUs has found large interest due ...
research
03/06/2020

Bundle Adjustment on a Graph Processor

Graph processors such as Graphcore's Intelligence Processing Unit (IPU) ...
research
04/06/2019

Ring-Mesh: A Scalable and High-Performance Approach for Manycore Accelerators

There is an increasing number of works addressing the design challenge o...
