JArena: Partitioned Shared Memory for NUMA-awareness in Multi-threaded Scientific Applications

02/20/2019
by   Zhang Yang, et al.

The distributed shared memory (DSM) architecture is widely used in today's computer design to mitigate the ever-widening processing-memory gap, and it inevitably exposes non-uniform memory access (NUMA) to shared-memory parallel applications. Failure to achieve full NUMA-awareness can significantly degrade application performance, especially on today's manycore platforms with tens to hundreds of cores. Yet traditional approaches such as first-touch and memory policy fall short because of false page-sharing, fragmentation, or poor ease of use. In this paper, we propose a partitioned shared memory approach that allows multi-threaded applications to achieve full NUMA-awareness with only minor code changes, and we develop an accompanying NUMA-aware heap manager that eliminates false page-sharing and minimizes fragmentation. Experiments on a 256-core cc-NUMA computing node show that the proposed approach achieves true NUMA-awareness and improves the performance of typical multi-threaded scientific applications by up to 4.3-fold as more cores are used.
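To make "false page-sharing" concrete, the following toy model may help. It is only an illustrative sketch, not JArena's implementation: the class names `BumpHeap` and `PartitionedHeap`, the 4096-byte page size, and the bump-pointer allocation scheme are all assumptions introduced here. It counts pages that end up holding blocks from more than one partition, first with a single shared heap and then with a heap that gives each partition its own page-granular chunks.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def pages_touched(offset, size):
    """Return the set of page indices covered by [offset, offset + size)."""
    first = offset // PAGE_SIZE
    last = (offset + size - 1) // PAGE_SIZE
    return set(range(first, last + 1))


def falsely_shared_pages(allocations):
    """Given (partition, offset, size) blocks, return the pages that hold
    data belonging to more than one partition."""
    owners = {}
    for partition, offset, size in allocations:
        for page in pages_touched(offset, size):
            owners.setdefault(page, set()).add(partition)
    return {page for page, parts in owners.items() if len(parts) > 1}


class BumpHeap:
    """Hypothetical shared heap: blocks are packed back to back,
    regardless of which partition requested them."""

    def __init__(self):
        self.top = 0

    def alloc(self, partition, size):
        offset = self.top
        self.top += size
        return (partition, offset, size)


class PartitionedHeap:
    """Hypothetical partition-aware heap: every partition allocates from
    whole pages of its own, so no page mixes partitions."""

    def __init__(self):
        self.next_page = 0
        self.cursor = {}  # partition -> (next free offset, end of chunk)

    def alloc(self, partition, size):
        offset, end = self.cursor.get(partition, (0, 0))
        if offset + size > end:
            # Current chunk is exhausted: reserve fresh whole pages.
            npages = -(-size // PAGE_SIZE)  # ceiling division
            offset = self.next_page * PAGE_SIZE
            end = offset + npages * PAGE_SIZE
            self.next_page += npages
        self.cursor[partition] = (offset + size, end)
        return (partition, offset, size)


def demo(heap, n_partitions=4, n_blocks=8, size=1000):
    """Interleave allocations from several partitions and report
    which pages are shared between partitions."""
    allocs = [heap.alloc(p, size)
              for _ in range(n_blocks)
              for p in range(n_partitions)]
    return falsely_shared_pages(allocs)


if __name__ == "__main__":
    print("shared heap:", len(demo(BumpHeap())), "falsely shared pages")
    print("partitioned heap:", len(demo(PartitionedHeap())), "falsely shared pages")
```

In this model the partitioned heap avoids mixed pages at the cost of leaving the tail of each page unused when the next block does not fit, which is one simple form of the fragmentation that a real NUMA-aware heap manager has to keep small.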

