FBLAS: Streaming Linear Algebra on FPGA

07/18/2019
by Tiziano De Matteis, et al.

Energy efficiency is one of the primary concerns when designing large-scale computing systems. This makes reconfigurable hardware an attractive alternative to load-store architectures, as it can eliminate expensive control and data movement overheads in computations. In practice, these devices are often not considered in the high-performance computing (HPC) community, due to the steep learning curve and low productivity of hardware design, and the lack of library support for fundamental operations. With the introduction of high-level synthesis (HLS) tools, programming hardware has become more accessible, but optimizing for these architectures requires factoring in new transformations and trade-offs between hardware resources and computational performance. We present FBLAS, an open-source implementation of BLAS for FPGAs. FBLAS is implemented with HLS, enabling reusability, maintainability, and portability across FPGAs, as well as easy integration with existing software and hardware codes. By using the work-depth model, we capture the space/time trade-off of designing linear algebra circuits, allowing modules to be optimized within performance or resource constraints. Module interfaces are designed to natively support streaming communication across on-chip connections, so that modules can be composed to reduce off-chip communication. With the methodologies used to design FBLAS, we hope to set a precedent for FPGA library design, and to contribute to the toolbox of customizable hardware components that HPC codes need in order to start productively targeting reconfigurable platforms.
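
As a rough illustration of the streaming composition described above, the following Python sketch chains two BLAS-style routines so that the intermediate vector is consumed element by element and never stored in full. It is only a software analogy under stated assumptions: Python generators stand in for on-chip channels, and the names `axpy_stream`, `dot_stream`, and `axpy_dot` are hypothetical and are not the FBLAS API.

```python
from typing import Iterable, Iterator


def axpy_stream(alpha: float, xs: Iterable[float], ys: Iterable[float]) -> Iterator[float]:
    """Yield alpha * x + y one element at a time (stand-in for a streaming AXPY module)."""
    for x, y in zip(xs, ys):
        yield alpha * x + y


def dot_stream(xs: Iterable[float], ys: Iterable[float]) -> float:
    """Consume two element streams and accumulate their dot product."""
    acc = 0.0
    for x, y in zip(xs, ys):
        acc += x * y
    return acc


def axpy_dot(alpha: float, w: Iterable[float], v: Iterable[float], u: Iterable[float]) -> float:
    """Compute (w - alpha * v) . u without materializing the intermediate vector."""
    z = axpy_stream(-alpha, v, w)  # lazy: elements are produced only when dot_stream asks for them
    return dot_stream(z, u)


if __name__ == "__main__":
    # z = [1, 2, 3] - 2 * [1, 1, 1] = [-1, 0, 1]; z . [1, 0, 2] = 1.0
    print(axpy_dot(2.0, [1.0, 2.0, 3.0], [1.0, 1.0, 1.0], [1.0, 0.0, 2.0]))
```

In this analogy, running the two routines separately would write the intermediate vector to memory and read it back, while the chained version passes each element straight to the consumer. That is the kind of off-chip traffic the abstract says streaming interfaces are meant to reduce.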

