The Influence of Malloc Placement on TSX Hardware Transactional Memory

04/17/2015
by Dave Dice, et al.

The hardware transactional memory (HTM) implementation in Intel's i7-4770 "Haswell" processor tracks the transactional read-set in the L1 (level-1), L2 (level-2) and L3 (level-3) caches and the write-set in the L1 cache. Displacement or eviction of read-set entries from the cache hierarchy, or of write-set entries from the L1, results in an abort. We show that the placement policies of dynamic storage allocators, such as those found in common "malloc" implementations, can influence the conflict miss rate in the L1. Conflict misses, sometimes called mapping misses, arise from less-than-ideal associativity and reflect an imbalanced distribution of active memory blocks over the set of available L1 indices. Under transactional execution, conflict misses may manifest as aborts, which represent wasted or futile effort, instead of the simple stall that would occur in normal execution mode. Furthermore, when HTM is used for transactional lock elision (TLE), persistent aborts arising from conflict misses can force the offending thread through the so-called "slow path". The slow path is undesirable because the thread must acquire the lock and run the critical section in normal execution mode. This precludes the concurrent execution of threads in the "fast path" that monitor that same lock and run their critical sections in transactional mode. For a given lock, multiple threads can concurrently use the transactional fast path, but at most one thread can use the non-transactional slow path at any given time, and a thread in the slow path precludes safe concurrent fast-path execution. Aborts arising from placement policies and L1 index imbalance can thus result in loss of concurrency and reduced aggregate throughput.


