High Throughput Multidimensional Tridiagonal Systems Solvers on FPGAs

We present a design space exploration for synthesizing optimized, high-throughput implementations of multiple multi-dimensional tridiagonal system solvers on FPGAs. Re-evaluating the characteristics of algorithms for the direct solution of tridiagonal systems, we develop a new tridiagonal solver library aimed at implementing high-performance computing applications on Xilinx FPGA hardware. Key new features of the library are (1) the unification of standard state-of-the-art techniques for implementing implicit numerical solvers with a number of novel high-gain optimizations such as vectorization and batching, motivated by multi-dimensional systems in real-world applications, (2) data-flow techniques that provide application-specific optimizations for both 2D and 3D problems, including integration of explicit loops commonplace in real workloads, and (3) the development of an analytic model to explore the design space and obtain rapid performance estimates. The new library provides an order of magnitude better performance for solving large batches of systems compared to Xilinx's current tridiagonal solver library. Two representative applications are implemented using the new solver on a Xilinx Alveo U280 FPGA, demonstrating over 85% [...] compared with a current state-of-the-art GPU library for solving multi-dimensional tridiagonal systems on an Nvidia V100 GPU, analyzing time to solution, bandwidth, and energy consumption. Results show the FPGAs achieving competitive or better runtime performance for a range of multi-dimensional problems compared to the V100 GPU. Additionally, the significant energy savings offered by FPGA implementations, over 30% [...], are quantified. We discuss the algorithmic trade-offs required to obtain good performance on FPGAs, giving insights into the feasibility and profitability of FPGA implementations.
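
As a point of reference for what "direct solution of tridiagonal systems" and "batching" involve, the following is a minimal sketch in plain Python of the Thomas algorithm, a standard direct method, applied to a batch of independent systems. The abstract does not say which direct algorithms the library uses, so this is an illustration under that assumption and not the library's implementation. The names `thomas_solve` and `solve_batch` are hypothetical, and the sketch assumes each system is well conditioned (for example, diagonally dominant), since no pivoting is performed.

```python
def thomas_solve(a, b, c, d):
    """Solve one tridiagonal system with the Thomas algorithm.

    a: sub-diagonal (a[0] is unused), b: main diagonal,
    c: super-diagonal (c[-1] is unused), d: right-hand side.
    No pivoting is done, so the system is assumed to be well conditioned.
    """
    n = len(d)
    cp = [0.0] * n  # modified super-diagonal
    dp = [0.0] * n  # modified right-hand side

    # Forward elimination: remove the sub-diagonal row by row.
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom

    # Backward substitution: recover the unknowns from last to first.
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def solve_batch(systems):
    """Solve a batch of independent systems given as (a, b, c, d) tuples.

    The systems do not depend on each other, which is the property that
    batched hardware implementations can exploit; here they are simply
    solved one after another.
    """
    return [thomas_solve(a, b, c, d) for (a, b, c, d) in systems]


if __name__ == "__main__":
    # One 4x4 system with diagonals (-1, 2, -1), repeated to form a small batch.
    system = ([0.0, -1.0, -1.0, -1.0],
              [2.0, 2.0, 2.0, 2.0],
              [-1.0, -1.0, -1.0, 0.0],
              [0.0, 0.0, 0.0, 5.0])
    for solution in solve_batch([system, system]):
        print(solution)
```

Each system's forward and backward sweeps are inherently sequential, but different systems in the batch are independent. In a multi-dimensional problem, each grid line along the solve direction yields one such system, which is where large batches come from.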


