Influence of atomic FAA on ParallelFor and a cost model for improvements

11/26/2021 ∙ by Ran Shuai, et al.

This paper focuses on one of the most frequently used multithreading library interfaces, ParallelFor. The study infers that ParallelFor's end-to-end latency is noticeably affected by how often atomic fetch-and-add (FAA) is called during program execution, which can be explained by ParallelFor's uniform semantics and its use of atomic FAA. To test this hypothesis, a battery of tests was designed and run on diverse platforms. From the collected performance statistics and overall trends, several conclusions were drawn, and a cost model is proposed to improve performance by mitigating the influence of FAA.
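To make the link between FAA frequency and ParallelFor concrete, here is a minimal C++ sketch. The abstract does not show the paper's implementation, so everything here is an assumption for illustration: the name `parallel_for`, the `grain` parameter, and the call counter are hypothetical. Worker threads claim blocks of `grain` iterations from a shared counter with one atomic `fetch_add` per block, so a larger `grain` means fewer FAA calls.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical ParallelFor sketch; not the paper's or any library's code.
// Threads claim blocks of `grain` iterations from a shared counter using one
// atomic fetch-and-add (FAA) per block. Returns how many FAA calls were made
// on the shared iteration counter.
template <typename Body>
std::size_t parallel_for(std::size_t n, std::size_t grain,
                         unsigned num_threads, Body body) {
    assert(grain > 0 && num_threads > 0);
    std::atomic<std::size_t> next{0};       // shared iteration counter
    std::atomic<std::size_t> faa_calls{0};  // instrumentation only

    auto worker = [&] {
        std::size_t local_calls = 0;
        for (;;) {
            // One FAA claims the half-open range [begin, begin + grain).
            std::size_t begin = next.fetch_add(grain, std::memory_order_relaxed);
            ++local_calls;
            if (begin >= n) break;  // nothing left to claim
            std::size_t end = std::min(begin + grain, n);
            for (std::size_t i = begin; i < end; ++i) body(i);
        }
        faa_calls.fetch_add(local_calls, std::memory_order_relaxed);
    };

    std::vector<std::thread> threads;
    for (unsigned t = 0; t < num_threads; ++t) threads.emplace_back(worker);
    for (auto& th : threads) th.join();  // join makes all body() writes visible
    return faa_calls.load();
}
```

In this sketch, each thread issues exactly one final FAA that finds no work left, so the total is ceil(n / grain) + num_threads. With n = 1000 and 4 threads, `grain = 1` gives 1004 FAA calls and `grain = 64` gives 20. A larger grain cuts contention on the shared counter but can hurt load balance when iterations have uneven cost, which is the kind of trade-off a cost model would need to weigh.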

