A Foray into Efficient Mapping of Algorithms to Hardware Platforms on Heterogeneous Systems

05/15/2016
by Oren Segal, et al.

Heterogeneous computing can potentially offer significant improvements in performance and performance per watt over homogeneous computing, but the question "what is the ideal mapping of algorithms to architectures?" remains open. In the past couple of years, new types of computing devices such as FPGAs have come into general computing use. In this work we attempt to add to the body of scientific knowledge by comparing the kernel performance and performance per watt of seven key algorithms, selected according to Berkeley's dwarf taxonomy. We do so using the Rodinia benchmark suite on three high-end hardware architecture representatives from the CPU, GPU and FPGA families. Our results support some distinct mappings between architecture and performance per watt. Perhaps the most interesting finding is that, for our specific hardware representatives, FPGAs should be considered as alternatives to GPUs and CPUs for several key algorithms: N-body simulations, dense linear algebra and structured grid.

research
01/12/2023

HEP-BNN: A Framework for Finding Low-Latency Execution Configurations of BNNs on Heterogeneous Multiprocessor Platforms

Binarized Neural Networks (BNNs) significantly reduce the computation an...
research
08/19/2020

Evaluating the Performance of NVIDIA's A100 Ampere GPU for Sparse Linear Algebra Computations

GPU accelerators have become an important backbone for scientific high p...
research
10/28/2020

StencilFlow: Mapping Large Stencil Programs to Distributed Spatial Computing Systems

Spatial computing devices have been shown to significantly accelerate st...
research
01/04/2022

TAMM: Tensor Algebra for Many-body Methods

Tensor contraction operations in computational chemistry consume signifi...
research
11/01/2022

Apple Silicon Performance in Scientific Computing

With the release of the Apple Silicon System-on-a-Chip processors, and t...
research
12/21/2018

Towards Automatic Transformation of Legacy Scientific Code into OpenCL for Optimal Performance on FPGAs

There is a large body of legacy scientific code written in languages lik...
research
09/08/2020

On Architecture to Architecture Mapping for Concurrency

Mapping programs from one architecture to another plays a key role in te...
