Efficient FPGA Implementation of Conjugate Gradient Methods for Laplacian System using HLS

03/10/2018
by Sahithi Rampalli, et al.

In this paper, we study FPGA-based pipelined and superscalar designs of two variants of the conjugate gradient (CG) method for solving the Laplacian equation on a discrete grid; the first corresponds to the original conjugate gradient algorithm, and the second to a slightly modified version of it. When the conjugate gradient method is used to solve partial differential equations, matrix-vector operations are required in each iteration; these operations can be implemented as 5-point stencil operations on the grid without explicitly constructing the matrix. We show that a pipelined and superscalar design written in C using high-level synthesis leads to a significant reduction in latency for both methods. Comparing the two, we show that the latter has roughly half the latency of the former for the same degree of superscalarity. This reduction in latency for the newer variant of CG is due to the parallel implementation of the stencil operation on subdomains of the grid, and to the overlap of these stencil operations with dot product operations. In a superscalar design, the domain needs to be partitioned and boundary data needs to be copied, which requires padding. In a 1D partition, the padding latency increases as the number of partitions increases. For a streaming dataflow model, we propose a novel traversal of the grid for 2D domain decomposition that leads to a 2 times reduction in the latency cost of padding compared to 1D partitions. Our implementation is roughly 10 times faster than a software implementation for a linear system of dimension 10000 × 10000.
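To make the matrix-free idea concrete, the following is a minimal software sketch in plain C of the original conjugate gradient iteration, with the matrix-vector product replaced by a 5-point stencil. It is not the authors' HLS design: there are no pipelining or partitioning directives. The grid size `N`, the function names `stencil5`, `dot`, and `cg_laplace`, the zero boundary values, and the sign convention (4 on the diagonal, -1 for each neighbour) are all assumptions chosen for illustration.

```c
#include <assert.h>

#define N 16 /* interior grid points per side; illustrative size, not from the paper */

/* y = A x for the 5-point stencil: 4 on the diagonal, -1 for each neighbour.
   Values outside the grid are treated as zero (assumed boundary condition).
   The matrix A is never formed explicitly. */
static void stencil5(double x[N][N], double y[N][N])
{
    for (int i = 0; i < N; ++i) {
        for (int j = 0; j < N; ++j) {
            double c = 4.0 * x[i][j];
            if (i > 0)     c -= x[i - 1][j];
            if (i < N - 1) c -= x[i + 1][j];
            if (j > 0)     c -= x[i][j - 1];
            if (j < N - 1) c -= x[i][j + 1];
            y[i][j] = c;
        }
    }
}

/* Dot product of two grid functions. */
static double dot(double a[N][N], double b[N][N])
{
    double s = 0.0;
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            s += a[i][j] * b[i][j];
    return s;
}

/* Standard CG for A x = b, starting from x = 0.
   Returns the number of iterations used, or -1 if the residual norm
   is still >= tol after max_iter iterations. */
static int cg_laplace(double b[N][N], double x[N][N], double tol, int max_iter)
{
    static double r[N][N], p[N][N], Ap[N][N];
    assert(tol > 0.0 && max_iter > 0);

    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j) {
            x[i][j] = 0.0;
            r[i][j] = b[i][j]; /* r = b - A*0 */
            p[i][j] = b[i][j];
        }

    double rr = dot(r, r);
    for (int k = 0; k < max_iter; ++k) {
        if (rr < tol * tol)
            return k;
        stencil5(p, Ap);                 /* the only "matrix" operation */
        double alpha = rr / dot(p, Ap);
        for (int i = 0; i < N; ++i)
            for (int j = 0; j < N; ++j) {
                x[i][j] += alpha * p[i][j];
                r[i][j] -= alpha * Ap[i][j];
            }
        double rr_new = dot(r, r);
        double beta = rr_new / rr;
        for (int i = 0; i < N; ++i)
            for (int j = 0; j < N; ++j)
                p[i][j] = r[i][j] + beta * p[i][j];
        rr = rr_new;
    }
    return (rr < tol * tol) ? max_iter : -1;
}
```

In this form each iteration runs one stencil sweep followed by two dot products, and each step waits for the previous one to finish. That sequential dependency is what the modified variant described above relaxes by overlapping stencil and dot product operations.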

research
09/28/2022

LL-GNN: Low Latency Graph Neural Networks on FPGAs for Particle Detectors

This work proposes a novel reconfigurable architecture for low latency G...
research
06/15/2018

FPGA acceleration of Model Predictive Control for Iter Plasma current and shape control

A faster implementation of the Quadratic Programming (QP) solver used in...
research
09/21/2019

Multithreaded Filtering Preconditioner for Diffusion Equation on Structured Grid

A parallel and nested version of a frequency filtering preconditioner is...
research
05/04/2019

New communication hiding conjugate gradient variants

The conjugate gradient algorithm suffers from communication bottlenecks ...
research
01/21/2021

Direct Spatial Implementation of Sparse Matrix Multipliers for Reservoir Computing

Reservoir computing systems rely on the recurrent multiplication of a ve...
research
10/11/2017

Grid peeling and the affine curve-shortening flow

In this paper we study an experimentally-observed connection between two...
research
01/27/2022

On the RTL Implementation of FINN Matrix Vector Compute Unit

FPGA-based accelerators are becoming more popular for deep neural networ...
