MKPipe: A Compiler Framework for Optimizing Multi-Kernel Workloads in OpenCL for FPGA

02/05/2020
by Ji Liu, et al.

OpenCL for FPGA enables developers to design FPGAs using a programming model similar to that used for processors. Recent works have shown that code optimization at the OpenCL level is important for achieving high computational efficiency. However, existing works either focus primarily on optimizing single kernels or depend solely on channels to design multi-kernel pipelines. In this paper, we propose a source-to-source compiler framework, MKPipe, for optimizing multi-kernel workloads in OpenCL for FPGA. Besides channels, we propose new schemes to enable multi-kernel pipelines. Our optimizing compiler employs a systematic approach to explore the tradeoffs among these optimization methods. To enable more efficient overlapping between kernel executions, we also propose a novel workitem/workgroup-id remapping technique. Furthermore, we propose new algorithms for throughput balancing and resource balancing to tune the optimizations applied to individual kernels in the multi-kernel workloads. Our results show that our compiler-optimized multi-kernels achieve up to 3.6x (1.4x on average) speedup over the baseline, in which the kernels have already been optimized individually.

