Influence of atomic FAA on ParallelFor and a cost model for improvements

11/26/2021
by Ran Shuai, et al.

This paper focuses on one of the most frequently used multithreading library interfaces: ParallelFor. The study infers that ParallelFor's end-to-end latency is noticeably affected by how often the atomic fetch-and-add (FAA) operation is called during program execution. This can be explained by ParallelFor's uniform semantics and its use of atomic FAA. To test this hypothesis, a battery of tests was designed and run on diverse platforms. From the collected performance statistics and overall trends, several conclusions are drawn, and a cost model is proposed to improve performance by mitigating the influence of FAA.
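To make the link between FAA frequency and ParallelFor concrete, the following is a minimal C++ sketch, not the paper's implementation. It assumes a common design in which worker threads claim chunks of the iteration range from a shared counter via `fetch_add`. The function signature, the `block` parameter, and the FAA-counting return value are all hypothetical and introduced only for illustration.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical ParallelFor (an assumption, not the paper's code): each worker
// claims `block` consecutive iterations with one atomic fetch-and-add (FAA)
// on a shared counter. Returns the total number of FAA calls on that counter.
std::size_t ParallelFor(std::size_t begin, std::size_t end, std::size_t block,
                        unsigned num_threads,
                        const std::function<void(std::size_t)>& body) {
    if (block == 0) block = 1;  // guard against a zero-sized claim
    std::atomic<std::size_t> next{begin};
    std::vector<std::size_t> faa_per_thread(num_threads, 0);

    auto worker = [&](unsigned tid) {
        std::size_t local_faa = 0;  // counted locally to avoid extra sharing
        for (;;) {
            // One FAA claims the half-open range [start, start + block).
            std::size_t start = next.fetch_add(block, std::memory_order_relaxed);
            ++local_faa;
            if (start >= end) break;  // range exhausted
            std::size_t stop = std::min(start + block, end);
            for (std::size_t i = start; i < stop; ++i) body(i);
        }
        faa_per_thread[tid] = local_faa;
    };

    std::vector<std::thread> threads;
    for (unsigned t = 0; t < num_threads; ++t) threads.emplace_back(worker, t);
    for (auto& th : threads) th.join();

    std::size_t total = 0;
    for (std::size_t c : faa_per_thread) total += c;
    return total;
}
```

Under these assumptions, the number of FAA calls is the number of claimed blocks plus one final failed claim per thread, so a larger `block` directly lowers FAA frequency at the cost of coarser load balancing. Choosing that trade-off is the kind of decision a cost model would inform.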


