Flexible Support for Fast Parallel Commutative Updates

by Vignesh Balaji, et al.

Privatizing data is a useful strategy for increasing parallelism in a shared-memory multithreaded program. Cores can compute independently on duplicates of shared data, combining their results at the end of their computations. Conventional approaches to privatization, however, rely on explicit static or dynamic memory allocation for duplicated state, increasing memory footprint and contention for cache resources, especially in shared caches. In this work, we describe CCache, a system for on-demand privatization of data manipulated by commutative operations. CCache garners the benefits of privatization without the increase in memory footprint or cache occupancy. Each core in CCache dynamically privatizes commutatively manipulated data, operating on a copy. Periodically or at the end of its computation, the core merges its value with the value resident in memory, and when all cores have merged, the in-memory copy contains the up-to-date value. We describe a low-complexity architectural implementation of CCache that extends a conventional multicore to support on-demand privatization without using additional memory for private copies. We evaluate CCache on several high-value applications, including a random-access key-value store, clustering, breadth-first search, and graph ranking, showing speedups of up to 3.2x.
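To make the privatize-then-merge pattern concrete, the following is a minimal software sketch of the conventional approach that the abstract contrasts CCache with: each worker explicitly allocates a private copy, applies commutative updates (here, addition) without synchronization, and merges into the shared copy at the end. It is not CCache itself, which performs privatization on demand in hardware without allocating these copies. The function name `count_privately` and its parameters are hypothetical, chosen only for illustration.

```python
import threading
from collections import Counter


def count_privately(keys, num_workers=4):
    """Count key occurrences with per-worker private copies merged at the end.

    Illustrative only: this is explicit software privatization, not the
    hardware mechanism described in the abstract.
    """
    shared = Counter()             # stands in for the in-memory copy
    merge_lock = threading.Lock()  # serializes only the merge step

    def worker(chunk):
        private = Counter()        # explicitly allocated private copy
        for key in chunk:
            private[key] += 1      # commutative update, no synchronization
        with merge_lock:
            shared.update(private)  # merge: add private counts into shared

    # Give each worker a disjoint, strided slice of the input.
    chunks = [keys[i::num_workers] for i in range(num_workers)]
    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared
```

Because addition is commutative and associative, the order in which workers merge does not affect the final counts. The cost of this software version is the extra memory for one private `Counter` per worker, which is the footprint the abstract says CCache avoids.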



