Automated Space/Time Scaling of Streaming Task Graph

06/12/2016
by Hossein Omidian, et al.

In this paper, we describe a high-level synthesis (HLS) tool that automatically explores area/throughput trade-offs when implementing streaming task graphs (STGs). Our tool targets a massively parallel processor array (MPPA) architecture, very similar to the Ambric MPPA chip architecture, which is to be implemented as an FPGA overlay. Like the Ambric tools, our HLS tool accepts as input an STG written in a subset of Java and a structural language in the style of a Kahn Process Network (KPN). Unlike the Ambric tools, our HLS tool analyzes the parallelism internal to each Java "node" and evaluates the throughput and area of several possible implementations. It then analyzes the full graph for bottlenecks or excess compute capacity, selects an implementation for each node, and considers replicating or splitting nodes, while either minimizing area (for a fixed throughput target) or maximizing throughput (for a fixed area target). In addition to the traditional node selection and replication methods used in prior work, we have uniquely implemented node combining and splitting to find a better area/throughput trade-off. We present two optimization approaches: a formal ILP formulation and a heuristic solution. Results show that the heuristic is more flexible and can find design points not available to the ILP, thereby achieving superior results.
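To make the "minimize area for a fixed throughput target" mode concrete, the following is a minimal Python sketch of node selection combined with replication. It is not the paper's ILP or heuristic. It assumes a simplified model in which graph throughput equals that of the slowest node, and in which replicating a node `r` times multiplies both its area and its throughput by `r` with no split/join overhead. The node names, implementation names, and all numbers are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# One candidate implementation of a node: (name, area, throughput).
# Units are arbitrary; all values below are made up for illustration.
Impl = Tuple[str, int, float]


def cheapest_choice(impls: List[Impl], target: float) -> Tuple[str, int, int]:
    """Pick the implementation and replica count with the least total area
    that still reaches `target` throughput.

    Assumption: r replicas cost r * area and deliver r * throughput.
    """
    best = None
    for name, area, tput in impls:
        replicas = max(1, math.ceil(target / tput))
        total_area = area * replicas
        if best is None or total_area < best[2]:
            best = (name, replicas, total_area)
    return best


def min_area_for_throughput(
    graph: Dict[str, List[Impl]], target: float
) -> Dict[str, Tuple[str, int, int]]:
    """Under the bottleneck model every node must reach the target on its
    own, so each node can be chosen independently."""
    return {node: cheapest_choice(impls, target) for node, impls in graph.items()}


# Hypothetical two-node graph with two candidate implementations per node.
graph = {
    "fir": [("serial", 1, 0.25), ("unrolled", 3, 1.0)],
    "fft": [("serial", 2, 0.5), ("parallel", 5, 1.0)],
}
plan = min_area_for_throughput(graph, target=1.0)
total_area = sum(area for _, _, area in plan.values())
```

In this toy model the per-node choice is independent, which is why a simple loop suffices. The fixed-area variant, and the node combining and splitting that the abstract describes, couple the nodes together and are not covered by this sketch.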


research
07/29/2019

Pyramid: Machine Learning Framework to Estimate the Optimal Timing and Resource Usage of a High-Level Synthesis Design

The emergence of High-Level Synthesis (HLS) tools shifted the paradigm o...
research
04/13/2021

MELOPPR: Software/Hardware Co-design for Memory-efficient Low-latency Personalized PageRank

Personalized PageRank (PPR) is a graph algorithm that evaluates the impo...
research
12/10/2018

Application-Specific System Processor for the SHA-1 Hash Algorithm

This work proposes an Application-Specific System Processor (ASSP) hardw...
research
08/25/2020

The optimal network throughputs when the model-aware node coexists with other nodes using different MAC protocols

In this document, we give the optimal network throughput when the DR-DLM...
research
07/18/2019

FBLAS: Streaming Linear Algebra on FPGA

Energy efficiency is one of the primary concerns when designing large sc...
research
04/01/2020

Streaming Temporal Graphs: Subgraph Matching

We investigate solutions to subgraph matching within a temporal stream o...
research
01/18/2022

Hardware-Efficient Deconvolution-Based GAN for Edge Computing

Generative Adversarial Networks (GAN) are cutting-edge algorithms for ge...
