Transformations of High-Level Synthesis Codes for High-Performance Computing

by Johannes de Fine Licht, et al.

Specialized hardware architectures promise a major step in performance and energy efficiency over the traditional load/store devices currently employed in large-scale computing systems. The adoption of high-level synthesis (HLS) has greatly increased programmer productivity when designing for such platforms. While this has enabled a wider audience to target specialized hardware, the optimization principles known from software design are no longer sufficient to implement high-performance codes, due to fundamental differences between software and hardware architectures. In this work, we propose a set of optimizing transformations for HLS, targeting scalable and efficient architectures for high-performance computing (HPC) applications. We show how these can be used to efficiently exploit pipelining, on-chip distributed fast memory, and on-chip streaming dataflow, allowing for massively parallel architectures with little off-chip data movement. To quantify the effect of our transformations, we use them to optimize a set of high-throughput FPGA kernels, demonstrating that they are sufficient to scale up parallelism within the hardware constraints of the device. With the transformations covered, we hope to establish a common framework for performance engineers, compiler developers, and hardware developers to tap into the performance potential offered by specialized hardware architectures using HLS.






