Transparent Compiler and Runtime Specializations for Accelerating Managed Languages on FPGAs

10/30/2020
by Michail Papadimitriou, et al.

In recent years, heterogeneous computing has emerged as a vital approach to increasing computers' performance and energy efficiency by combining diverse hardware devices, such as Graphics Processing Units (GPUs) and Field Programmable Gate Arrays (FPGAs). The rationale behind this trend is that different parts of an application can be offloaded from the main CPU to diverse devices, which can execute these parts efficiently as co-processors. FPGAs are among the most widely used co-processors, typically employed to accelerate specific workloads thanks to their flexible hardware and energy-efficient characteristics. These characteristics have made them prevalent across a broad spectrum of computing systems, ranging from low-power embedded systems to high-end data centers and cloud infrastructures. However, these hardware characteristics come at the cost of programmability. Developers who write their applications in high-level programming languages (e.g., Java, Python) must familiarize themselves with a hardware description language (e.g., VHDL, Verilog) or, more recently, with heterogeneous programming models (e.g., OpenCL, HLS) in order to exploit the co-processors' capacity and tune the performance of their applications. Currently, the above-mentioned heterogeneous programming models exclusively support compilation from statically compiled languages, such as C and C++. Thus, the integration of heterogeneous co-processors into the software ecosystem of managed programming languages (e.g., Java, Python) is not seamless. In this paper, we rethink the engineering trade-offs that we encountered, in terms of transparency and compilation overheads, while integrating FPGAs into high-level managed programming languages. We present a novel approach that enables runtime code specialization techniques for seamless and high-performance execution of Java programs on FPGAs. The proposed solution is prototyped in the context of the Java programming language and TornadoVM, an open-source programming framework for Java execution on heterogeneous hardware. Finally, we evaluate the proposed solution for FPGA execution against both sequential and multi-threaded Java implementations, showcasing performance speedups of up to 224x and 19.8x, respectively, and up to 13.82x compared to TornadoVM running on an Intel integrated GPU. We also provide a breakdown analysis of the proposed compiler optimizations for FPGA execution, as a means to project their impact with respect to the applications' characteristics.
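
For readers unfamiliar with the programming model, the sketch below illustrates how a plain Java kernel is expressed and offloaded through TornadoVM's task-based API. It is a minimal example assuming the TaskSchedule and @Parallel API documented for TornadoVM releases around the time of the paper; package names and method signatures may differ across versions, and the target device (e.g., an FPGA via TornadoVM's OpenCL backend) is assumed to be selected through the framework's runtime configuration rather than in the code itself.

import uk.ac.manchester.tornado.api.TaskSchedule;
import uk.ac.manchester.tornado.api.annotations.Parallel;

public class VectorAdd {

    // Kernel written as plain Java; @Parallel marks the loop that TornadoVM may parallelize.
    public static void add(float[] a, float[] b, float[] c) {
        for (@Parallel int i = 0; i < c.length; i++) {
            c[i] = a[i] + b[i];
        }
    }

    public static void main(String[] args) {
        int n = 1024;
        float[] a = new float[n];
        float[] b = new float[n];
        float[] c = new float[n];
        java.util.Arrays.fill(a, 1.0f);
        java.util.Arrays.fill(b, 2.0f);

        // The task schedule is JIT-compiled by TornadoVM for the selected device at run time;
        // streamOut copies the result buffer back to the host after execution.
        new TaskSchedule("s0")
                .task("t0", VectorAdd::add, a, b, c)
                .streamOut(c)
                .execute();
    }
}

From the application's point of view, the same Java method can be dispatched to a GPU or an FPGA; only the runtime and device configuration changes, which is the kind of transparency the paper targets.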


