Gravitational octree code performance evaluation on Volta GPU

11/07/2018
by Yohei Miki, et al.

In this study, a gravitational octree code originally optimized for the Fermi, Kepler, and Maxwell GPU architectures is adapted to the Volta architecture. Volta introduces independent thread scheduling, which requires either inserting explicit synchronizations at appropriate locations or enforcing the same implicit synchronizations as on Pascal and earlier architectures by specifying -gencode arch=compute_60,code=sm_70. Performance measurements on Tesla V100, the current flagship GPU by NVIDIA, show that N-body simulations of the Andromeda galaxy model with 2^23 = 8388608 particles took 3.8 × 10^-2 s and 3.3 × 10^-2 s per step in the two cases, respectively. Tesla V100 achieves a 1.4- to 2.2-fold acceleration over Tesla P100, the flagship GPU of the previous generation. The largest observed speed-up of 2.2 exceeds 1.5, the ratio of the theoretical peak performance of the two GPUs. Because the integer units on Volta are independent of the floating-point units, integer and floating-point operations can execute concurrently; this overlap hides the execution time of the integer operations and raises the speed-up above the theoretical peak performance ratio. Tesla V100 can execute an N-body simulation with up to 25 × 2^20 = 26214400 particles, taking 2.0 × 10^-1 s per step. This corresponds to 3.5 TFlop/s, which is 22% of the single-precision theoretical peak performance.
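
The figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below is only a sanity check, not part of the study: the two peak values are assumptions taken from NVIDIA's published single-precision specifications for the SXM2 boards (the abstract does not say which board variants were measured), and all variable names are illustrative.

```python
# Sanity check of the figures quoted in the abstract.
# Values marked ASSUMPTION do not come from the abstract itself.

# Particle counts and step times as stated in the abstract.
n_small = 2**23                      # Andromeda model, 8388608 particles
step_times_small = (3.8e-2, 3.3e-2)  # s per step for the two compilation cases
n_large = 25 * 2**20                 # largest run that fits on Tesla V100
step_time_large = 2.0e-1             # s per step
sustained_tflops = 3.5               # reported sustained performance

# ASSUMPTION: vendor-quoted FP32 peaks for the SXM2 boards, in TFlop/s.
PEAK_V100_TFLOPS = 15.7
PEAK_P100_TFLOPS = 10.6

# Ratio of theoretical peaks (the abstract rounds this to 1.5).
peak_ratio = PEAK_V100_TFLOPS / PEAK_P100_TFLOPS

# Fraction of peak reached by the largest run (the abstract states 22%).
peak_fraction = sustained_tflops / PEAK_V100_TFLOPS

# Derived quantity: floating-point operations per particle per step
# implied by the sustained rate and the step time of the largest run.
flop_per_particle = sustained_tflops * 1e12 * step_time_large / n_large

# Particles advanced per second in the faster of the two small-run cases.
particles_per_second = n_small / min(step_times_small)

print(f"peak ratio V100/P100    : {peak_ratio:.2f}")
print(f"fraction of FP32 peak   : {peak_fraction:.1%}")
print(f"flop per particle/step  : {flop_per_particle:.3g}")
print(f"particles per second    : {particles_per_second:.3g}")
```

Under these assumed peak values, the ratio and the fraction of peak both agree with the rounded numbers in the abstract, so a speed-up of 2.2 does exceed what the peak ratio alone would predict.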


