PIM-tree: A Skew-resistant Index for Processing-in-Memory

11/18/2022
by Hongbo Kang, et al.

The performance of today's in-memory indexes is bottlenecked by the memory latency/bandwidth wall. Processing-in-memory (PIM) is an emerging approach that potentially mitigates this bottleneck by enabling low-latency memory access whose aggregate memory bandwidth scales with the number of PIM nodes. There is an inherent tension, however, between minimizing inter-node communication and achieving load balance in PIM systems, in the presence of workload skew. This paper presents PIM-tree, an ordered index for PIM systems that achieves both low communication and high load balance, regardless of the degree of skew in the data and the queries. Our skew-resistant index is based on a novel division of labor between the multi-core host CPU and the PIM nodes, which leverages the strengths of each. We introduce push-pull search, which dynamically decides whether to push queries to a PIM-tree node (CPU -> PIM-node) or pull the node's keys back to the CPU (PIM-node -> CPU) based on workload skew. Combined with other PIM-friendly optimizations (shadow subtrees and chunked skip lists), our PIM-tree provides high throughput, (guaranteed) low communication, and (guaranteed) high load balance for batches of point queries, updates, and range scans. We implement the PIM-tree structure, in addition to previously proposed PIM indexes, on the latest PIM system from UPMEM, with 32 CPU cores and 2048 PIM nodes. On workloads with 500 million keys and batches of one million queries, the throughput using PIM-trees is up to 69.7x and 59.1x higher than the two best prior methods. As far as we know, these are the first implementations of an ordered index on a real PIM system.
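The push-pull decision described above can be illustrated with a small Python sketch. It is not the authors' implementation: the flat one-level node model, the `pull_threshold` parameter, the tree-node-to-PIM-node placement, and the function names `plan_push_pull` and `max_pim_load` are all hypothetical. It conveys only the core idea: under skew, a tree node that would receive many queries in a batch is pulled to the CPU instead of having all of those queries pushed to the single PIM node that stores it.

```python
from collections import Counter
from typing import Dict, List, Tuple


def plan_push_pull(query_targets: List[int],
                   pull_threshold: int) -> Tuple[Dict[int, int], Dict[int, int]]:
    """Split a batch into tree nodes to push queries to and tree nodes to pull.

    query_targets[i] is the id of the tree node that query i must visit
    (hypothetical flat model; a real index traverses several levels).
    A tree node hit by more than `pull_threshold` queries is treated as hot:
    its keys are pulled to the CPU once, rather than every query being
    pushed to the one PIM node that stores it.
    """
    counts = Counter(query_targets)
    push = {node: c for node, c in counts.items() if c <= pull_threshold}
    pull = {node: c for node, c in counts.items() if c > pull_threshold}
    return push, pull


def max_pim_load(push: Dict[int, int], node_owner: Dict[int, int]) -> int:
    """Largest number of pushed queries that any single PIM node must serve."""
    load: Counter = Counter()
    for node, c in push.items():
        load[node_owner[node]] += c
    return max(load.values(), default=0)


if __name__ == "__main__":
    # Skewed batch: tree node 3 receives most of the queries.
    targets = [3, 3, 3, 3, 3, 3, 7, 9, 9]
    owner = {3: 0, 7: 1, 9: 2}  # tree node id -> PIM node id (hypothetical placement)

    # Pure push: a threshold equal to the batch size never pulls anything.
    push_all, _ = plan_push_pull(targets, pull_threshold=len(targets))
    push, pull = plan_push_pull(targets, pull_threshold=2)

    print(push, pull)                     # {7: 1, 9: 2} {3: 6}
    print(max_pim_load(push_all, owner))  # 6
    print(max_pim_load(push, owner))      # 2
```

With pure push, the PIM node holding tree node 3 serves 6 of the 9 queries; with a threshold of 2, that tree node is pulled and the busiest PIM node serves only 2. The sketch ignores the communication cost of transferring a pulled node's keys to the CPU, which a real system must weigh against the traffic of the queries it would otherwise push.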

