Evaluating the Cost of Atomic Operations on Modern Architectures

10/19/2020
by   Hermann Schweizer, et al.
0

Atomic operations (atomics) such as Compare-and-Swap (CAS) or Fetch-and-Add (FAA) are ubiquitous in parallel programming. Yet, performance tradeoffs between these operations and various characteristics of such systems, such as the structure of caches, are unclear and have not been thoroughly analyzed. In this paper we establish an evaluation methodology, develop a performance model, and present a set of detailed benchmarks for latency and bandwidth of different atomics. We consider various state-of-the-art x86 architectures: Intel Haswell, Xeon Phi, Ivy Bridge, and AMD Bulldozer. The results unveil surprising performance relationships between the considered atomics and architectural properties such as the coherence state of the accessed cache lines. One key finding is that all the tested atomics have comparable latency and bandwidth even if they are characterized by different consensus numbers. Another insight is that the hardware implementation of atomics prevents any instruction-level parallelism even if there are no dependencies between the issued operations. Finally, we discuss solutions to the discovered performance issues in the analyzed architectures. Our analysis enables simpler and more effective parallel programming and accelerates data processing on various architectures deployed in both off-the-shelf machines and large compute systems.

READ FULL TEXT
research
05/14/2018

A 3D Parallel Algorithm for QR Decomposition

Interprocessor communication often dominates the runtime of large matrix...
research
02/24/2017

An analysis of core- and chip-level architectural features in four generations of Intel server processors

This paper presents a survey of architectural features among four genera...
research
01/19/2021

SynCron: Efficient Synchronization Support for Near-Data-Processing Architectures

Near-Data-Processing (NDP) architectures present a promising way to alle...
research
02/26/2023

Asynchronous Persistence with ASAP

Supporting atomic durability of updates for persistent memories is typic...
research
11/25/2020

Rapid Exploration of Optimization Strategies on Advanced Architectures using TestSNAP and LAMMPS

The exascale race is at an end with the announcement of the Aurora and F...
research
06/15/2011

A Characterization of the SPARC T3-4 System

This technical report covers a set of experiments on the 64-core SPARC T...
research
11/26/2021

Influence of atomic FAA on ParallelFor and a cost model for improvements

This paper focuses on one of the most frequently visited multithreading ...

Please sign up or login with your details

Forgot password? Click here to reset