SoaAlloc: A Lock-free Hierarchical Bitmap-based Object Allocator for GPUs

10/28/2018
by Matthias Springer, et al.

Designing dynamic memory allocators for GPUs is challenging for two reasons: applications can issue allocation requests in a highly parallel fashion, and both memory access patterns and data layout must be optimized to achieve good memory bandwidth utilization. Despite recent advances in GPU computing, current memory allocators for SIMD architectures are still not well suited to structured data because they fail to incorporate well-known best practices for optimizing memory access. We therefore developed SoaAlloc, a new dynamic object allocator for GPUs. Besides delivering competitive raw (de)allocation performance, SoaAlloc improves the usage of allocated memory with a Structure of Arrays (SOA) data layout, and it achieves low memory fragmentation by managing free and allocated memory blocks efficiently with lock-free, hierarchical bitmaps. In our benchmarks, the SOA layout alone results in a 2x speedup of application code over state-of-the-art allocators. Furthermore, SoaAlloc is the first GPU object allocator that provides a do-all operation, an important recurring pattern in high-performance code in which parallelism is expressed over a set of objects.
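
To make the idea of lock-free, hierarchical bitmaps more concrete, the following is a minimal host-side C++ sketch of a two-level bitmap that hands out slot indices using only atomic bit operations. It is an illustration under stated assumptions, not SoaAlloc's actual data structure or algorithm: the class name `TwoLevelBitmap`, its two-level depth, the fixed capacity of 4096 slots, and the treatment of the top level as a rechecked hint are all hypothetical choices made for this example.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical two-level bitmap tracking up to 64 * 64 = 4096 slots.
// leaf_[w] bit b == 1 : slot (w * 64 + b) is free.
// top_ bit w == 1     : leaf_[w] probably has a free slot (a hint only).
class TwoLevelBitmap {
 public:
  static constexpr int kWordBits = 64;
  static constexpr int kNumSlots = kWordBits * kWordBits;

  TwoLevelBitmap() : top_(~0ULL) {
    for (auto& word : leaf_) word.store(~0ULL);  // every slot starts free
  }

  // Claims a free slot and returns its index, or -1 if none was found.
  int allocate() {
    uint64_t candidates = top_.load();
    while (candidates != 0) {
      int w = lowest_set_bit(candidates);
      uint64_t cur = leaf_[w].load();
      while (cur != 0) {
        int b = lowest_set_bit(cur);
        uint64_t mask = 1ULL << b;
        // Atomically clear the bit; the returned old value tells us
        // whether this thread (and not a competitor) claimed the slot.
        uint64_t prev = leaf_[w].fetch_and(~mask);
        if (prev & mask) {
          if ((prev & ~mask) == 0) {
            // We took the last free bit of this word: clear the hint,
            // then restore it if a concurrent free refilled the word.
            top_.fetch_and(~(1ULL << w));
            if (leaf_[w].load() != 0) top_.fetch_or(1ULL << w);
          }
          return w * kWordBits + b;
        }
        cur = prev & ~mask;  // lost the race for this bit; try another
      }
      candidates &= ~(1ULL << w);  // word is empty; try the next one
    }
    return -1;
  }

  // Marks a previously allocated slot as free again.
  void deallocate(int idx) {
    assert(idx >= 0 && idx < kNumSlots);
    int w = idx / kWordBits;
    leaf_[w].fetch_or(1ULL << (idx % kWordBits));
    top_.fetch_or(1ULL << w);
  }

 private:
  // Index of the least significant set bit; x must be nonzero.
  static int lowest_set_bit(uint64_t x) {
    int i = 0;
    while ((x & 1ULL) == 0) {
      x >>= 1;
      ++i;
    }
    return i;
  }

  std::atomic<uint64_t> top_;
  std::atomic<uint64_t> leaf_[kWordBits];
};
```

In this sketch, a thread that loses a race for a bit simply retries on the remaining bits of the same word, so no thread ever blocks on a lock. The top-level word lets a search skip exhausted leaf words instead of scanning all of them, which is the general motivation for making such a bitmap hierarchical.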

research
10/28/2018

DynaSOAr: A Parallel Memory Allocator for Object-oriented Programming on GPUs with Efficient Memory Access

Object-oriented programming has long been regarded as too inefficient fo...
research
09/20/2018

SoaAlloc: Accelerating Single-Method Multiple-Objects Applications on GPUs

We propose SoaAlloc, a dynamic object allocator for Single-Method Multip...
research
04/17/2021

Ripple : Simplified Large-Scale Computation on Heterogeneous Architectures with Polymorphic Data Layout

GPUs are now used for a wide range of problems within HPC. However, maki...
research
06/08/2021

LLAMA: The Low-Level Abstraction For Memory Access

The performance gap between CPU and memory widens continuously. Choosing...
research
02/16/2023

Updates on the Low-Level Abstraction of Memory Access

Choosing the best memory layout for each hardware architecture is increa...
research
10/16/2020

Combinatorics and Geometry for the Many-ported, Distributed and Shared Memory Architecture

Manycore SoC architectures based on on-chip shared memory are preferred ...
research
10/09/2020

TurboTransformers: An Efficient GPU Serving System For Transformer Models

The transformer is the most critical algorithm innovation of the Nature ...
