Performance Portable Back-projection Algorithms on CPUs: Agnostic Data Locality and Vectorization Optimizations

04/27/2021
by   Peng Chen, et al.

Computed Tomography (CT) is a key 3D imaging technology that fundamentally relies on the compute-intensive back-projection operation to generate 3D volumes. GPUs are typically used for back-projection in production CT devices. However, with the rise of power-constrained micro-CT devices and the emergence of CPUs comparable in performance to GPUs, back-projection on CPUs could become favorable. Extracting parallelism for back-projection algorithms is harder on CPUs than on GPUs, because on CPUs parallelism and locality are not explicitly defined and controlled by the programmer, as they are when using CUDA, for instance. We propose a collection of novel back-projection algorithms that reduce the arithmetic computation, robustly enable vectorization, enforce a regular memory access pattern, and maximize data locality. We also implement these algorithms as efficient back-projection kernels that are performance portable over a wide range of CPUs. Performance evaluation on a variety of CPUs from different vendors and generations demonstrates that our back-projection implementation achieves an average speedup of 5.2x over the multi-threaded implementation of the most widely used, and optimized, open library. With a state-of-the-art CPU, we reach performance that rivals top-performing GPUs.
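
To make the terms in the abstract concrete, here is a minimal sketch of a pixel-driven back-projection loop. It is not the paper's algorithm: it assumes a simplified 2D parallel-beam geometry rather than the 3D setting the abstract describes, and the names `backproject`, `sinogram`, and `angles` are hypothetical. It only illustrates two of the ideas the abstract mentions: the detector coordinate is updated incrementally in the inner loop (less arithmetic per pixel), and the inner loop walks one image row contiguously (a regular access pattern, which is the kind of loop a compiler can usually vectorize in a compiled language).

```python
import math


def backproject(sinogram, angles, n):
    """Back-project a (pre-filtered) sinogram onto an n x n image.

    Assumed layout: sinogram[a][d] is the detector value for angle
    index a and detector bin d; angles are in radians.
    """
    n_det = len(sinogram[0])
    c_img = (n - 1) / 2.0      # image center, in pixel units
    c_det = (n_det - 1) / 2.0  # detector center, in bin units
    image = [[0.0] * n for _ in range(n)]

    for a, theta in enumerate(angles):
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        proj = sinogram[a]
        for y in range(n):
            out = image[y]
            # Detector coordinate of the first pixel in this row,
            # computed once per row instead of once per pixel.
            t = (0 - c_img) * cos_t + (y - c_img) * sin_t + c_det
            for x in range(n):
                i0 = math.floor(t)
                # Pixels that fall outside the interpolable detector
                # range are skipped in this sketch.
                if 0 <= i0 < n_det - 1:
                    w = t - i0
                    # Linear interpolation between neighboring bins.
                    out[x] += (1.0 - w) * proj[i0] + w * proj[i0 + 1]
                # Moving one pixel in x shifts t by cos(theta).
                t += cos_t
    return image
```

With a single projection at angle 0, each detector bin is smeared along one image column, which is the basic behavior of back-projection. A production kernel would additionally need explicit SIMD, threading, and cache blocking, none of which this sketch attempts.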

research
09/06/2019

iFDK: A Scalable Framework for Instant High-resolution Image Reconstruction

Computed Tomography (CT) is a widely used technology that requires compu...
research
02/11/2018

Locality Optimized Unstructured Mesh Algorithms on GPUs

Unstructured-mesh based numerical algorithms such as finite volume and f...
research
02/11/2018

Improving Locality of Unstructured Mesh Algorithms on GPUs

To most efficiently utilize modern parallel architectures, the memory ac...
research
05/12/2023

Revisiting Temporal Blocking Stencil Optimizations

Iterative stencils are used widely across the spectrum of High Performan...
research
05/08/2019

Arbitrarily large iterative tomographic reconstruction on multiple GPUs using the TIGRE toolbox

Tomographic image sizes keep increasing over time and while the GPUs tha...
research
09/15/2020

Petascale XCT: 3D Image Reconstruction with Hierarchical Communications on Multi-GPU Nodes

X-ray computed tomography is a commonly used technique for noninvasive i...
research
03/19/2018

Towards Memory Prefetching with Neural Networks: Challenges and Insights

Accurate memory prefetching is paramount for processor performance, and ...
