Energy efficiency of finite difference algorithms on multicore CPUs, GPUs, and Intel Xeon Phi processors

09/27/2017
by Satya P. Jammy, et al.

In addition to the hardware wall-time restrictions commonly seen in high-performance computing systems, it is likely that future systems will also be constrained by energy budgets. In the present work, finite difference algorithms of varying computational and memory intensity are evaluated with respect to both energy efficiency and runtime on an Intel Ivy Bridge CPU node, an Intel Xeon Phi Knights Landing processor, and an NVIDIA Tesla K40c GPU. The conventional approach of storing the discretised derivatives in global arrays for solution advancement is found to be inefficient in terms of both energy consumption and runtime. In contrast, a class of algorithms in which the discretised derivatives are evaluated on-the-fly or stored as thread-/process-local variables (yielding high compute intensity) is optimal with respect to both energy consumption and runtime. On all three hardware architectures considered, the compute-intensive algorithms achieve a factor-of-two speed-up and a factor-of-two energy saving relative to the memory-intensive algorithm. Energy consumption is found to be proportional to runtime, irrespective of the power drawn, and the GPU offers a factor-of-five energy saving over the same algorithm run on a CPU node.
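The contrast between the two algorithm classes can be illustrated with a short sketch. The C fragment below is only a minimal, hypothetical example (the 1D grid, array names, and diffusion-type right-hand side are assumptions, not the paper's actual code-generated kernels): the first routine stores the discretised second derivative to a global work array before using it to form the right-hand side, while the second evaluates the derivative on-the-fly in a loop-local variable, reducing global-memory traffic at the cost of recomputation.

/*
 * Minimal 1D sketch (illustrative only): central-difference evaluation of
 * d2u/dx2 for a diffusion-type right-hand side, written two ways.
 * N, dx, nu and the array names are assumptions for this example.
 */
#include <stdlib.h>

#define N 1024

/* Memory-intensive variant: the discretised derivative is first written to a
 * global work array, then read back in a second loop to form the right-hand
 * side. Every grid point pays extra global-memory traffic for d2udx2[]. */
void rhs_store_derivatives(const double *u, double *d2udx2, double *rhs,
                           double dx, double nu)
{
    for (int i = 1; i < N - 1; ++i)
        d2udx2[i] = (u[i + 1] - 2.0 * u[i] + u[i - 1]) / (dx * dx);

    for (int i = 1; i < N - 1; ++i)
        rhs[i] = nu * d2udx2[i];
}

/* Compute-intensive variant: the derivative is evaluated on-the-fly in a
 * loop-local (thread-local) scalar and never stored to a global array,
 * trading memory traffic for cheap floating-point recomputation. */
void rhs_on_the_fly(const double *u, double *rhs, double dx, double nu)
{
    for (int i = 1; i < N - 1; ++i) {
        const double d2udx2 = (u[i + 1] - 2.0 * u[i] + u[i - 1]) / (dx * dx);
        rhs[i] = nu * d2udx2;
    }
}

int main(void)
{
    double *u    = calloc(N, sizeof *u);
    double *work = calloc(N, sizeof *work);
    double *rhs  = calloc(N, sizeof *rhs);
    const double dx = 1.0 / (N - 1), nu = 0.1;

    for (int i = 0; i < N; ++i)
        u[i] = (double)i * dx;   /* linear initial field */

    rhs_store_derivatives(u, work, rhs, dx, nu);  /* two passes, global work array */
    rhs_on_the_fly(u, rhs, dx, nu);               /* one pass, local scalar        */

    free(u); free(work); free(rhs);
    return 0;
}

On bandwidth-limited hardware the single-pass form trades inexpensive floating-point work for costly global-memory traffic, which is consistent with the runtime and energy behaviour reported above.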

Related research

10/28/2016
Performance evaluation of explicit finite difference algorithms with varying amounts of computational and memory intensity
Future architectures designed to deliver exascale performance motivate t...

01/09/2023
Improving Energy Saving of One-sided Matrix Decompositions on CPU-GPU Heterogeneous Systems
One-sided dense matrix decompositions (e.g., Cholesky, LU, and QR) are t...

05/15/2021
Comparison of HPC Architectures for Computing All-Pairs Shortest Paths. Intel Xeon Phi KNL vs NVIDIA Pascal
Today, one of the main challenges for high-performance computing systems...

04/18/2017
Benchmarking OpenCL, OpenACC, OpenMP, and CUDA: programming productivity, performance, and energy consumption
Many modern parallel computing systems are heterogeneous at their node l...

09/12/2016
An ECM-based energy-efficiency optimization approach for bandwidth-limited streaming kernels on recent Intel Xeon processors
We investigate an approach that uses low-level analysis and the executio...

06/19/2023
From array algebra to energy efficiency on GPUs: Data and hardware shapes with dimension-lifting to optimize memory-processor layouts
We present a new formulation for parallel matrix multiplication (MM) to ...

03/20/2017
Parallel Sort-Based Matching for Data Distribution Management on Shared-Memory Multiprocessors
In this paper we consider the problem of identifying intersections betwe...
