Exploring Thread Coarsening on FPGA

08/25/2022
by   Mostafa Eghbali Zarch, et al.
0

Over the past few years, there has been an increased interest in including FPGAs in data centers and high-performance computing clusters along with GPUs and other accelerators. As a result, it has become increasingly important to have a unified, high-level programming interface for CPUs, GPUs and FPGAs. This has led to the development of compiler toolchains to deploy OpenCL code on FPGA. However, the fundamental architectural differences between GPUs and FPGAs have led to performance portability issues: it has been shown that OpenCL code optimized for GPU does not necessarily map well to FPGA, often requiring manual optimizations to improve performance. In this paper, we explore the use of thread coarsening - a compiler technique that consolidates the work of multiple threads into a single thread - on OpenCL code running on FPGA. While this optimization has been explored on CPU and GPU, the architectural features of FPGAs and the nature of the parallelism they offer lead to different performance considerations, making an analysis of thread coarsening on FPGA worthwhile. Our evaluation, performed on our microbenchmarks and on a set of applications from open-source benchmark suites, shows that thread coarsening can yield performance benefits (up to 3-4x speedups) to OpenCL code running on FPGA at a limited resource utilization cost.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
01/04/2021

High-Level FPGA Accelerator Design for Structured-Mesh-Based Explicit Numerical Solvers

This paper presents a workflow for synthesizing near-optimal FPGA implem...
research
01/08/2022

A Compiler Framework for Optimizing Dynamic Parallelism on GPUs

Dynamic parallelism on GPUs allows GPU threads to dynamically launch oth...
research
05/08/2015

FPGA-Based Bandwidth Selection for Kernel Density Estimation Using High Level Synthesis Approach

FPGA technology can offer significantly higher performance at much lower...
research
02/05/2020

MKPipe: A Compiler Framework for Optimizing Multi-Kernel Workloads in OpenCL for FPGA

OpenCL for FPGA enables developers to design FPGAs using a programming m...
research
08/28/2021

Compiler-Driven FPGA Virtualization with SYNERGY

FPGAs are increasingly common in modern applications, and cloud provider...
research
12/18/2022

High-Performance Filters For GPUs

Filters approximately store a set of items while trading off accuracy fo...
research
01/04/2022

On the Influence of the FPGA Compiler Optimization Options on the Success of the Horizontal Attack

This paper reports about the impact of compiler options on the resistanc...

Please sign up or login with your details

Forgot password? Click here to reset