Near-Memory Address Translation

by Javier Picorel, et al.

Memory and logic integration on the same chip is becoming increasingly cost-effective, creating the opportunity to offload data-intensive functionality to processing units placed inside memory chips. The introduction of memory-side processing units (MPUs) into conventional systems faces virtual memory as its first major obstacle: without efficient hardware support for address translation, MPUs have highly limited applicability. Unfortunately, conventional translation mechanisms fall short of providing fast translations, because contemporary memories exceed the reach of TLBs and make expensive page walks common. In this paper, we are the first to show that the historically important flexibility to map any virtual page to any page frame is unnecessary in today's servers. We find that limiting the associativity of the virtual-to-physical mapping incurs no penalty and, when combined with careful data placement in the MPU's memory, can break the translate-then-fetch serialization, allowing translation and data fetch to proceed independently and in parallel. We propose the Distributed Inverted Page Table (DIPTA), a near-memory structure in which the smallest memory partition keeps the translation information for its data share, ensuring that the translation completes together with the data fetch. DIPTA completely eliminates the performance overhead of translation, achieving speedups of up to 3.81x and 2.13x over conventional translation using 4KB and 1GB pages, respectively.
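To make the idea of a limited-associativity mapping concrete, the following toy model may help. It is only an illustrative sketch, not the DIPTA design itself: the abstract does not specify the table layout, so the class name, the modulo set-index function, the sizes, and the choice to keep one inverted-page-table entry next to each frame are all assumptions made for this example. The point it shows is that the set holding a page can be computed from the virtual page number alone, so the lookup of the translation entry and the lookup of the data target the same small group of frames and need not be serialized.

```python
from dataclasses import dataclass
from typing import List, Optional

PAGE_SHIFT = 12                    # 4KB pages, one of the sizes named in the abstract
NUM_FRAMES = 16                    # toy memory size (assumption)
WAYS = 4                           # associativity of the VA-to-PA mapping (assumption)
NUM_SETS = NUM_FRAMES // WAYS


@dataclass
class Frame:
    vpn: Optional[int] = None      # hypothetical inverted-page-table entry kept with the frame
    data: bytes = b""


class SetAssociativeMemory:
    """Toy memory in which a virtual page may live only in one set of frames."""

    def __init__(self) -> None:
        self.sets: List[List[Frame]] = [
            [Frame() for _ in range(WAYS)] for _ in range(NUM_SETS)
        ]

    @staticmethod
    def set_index(vpn: int) -> int:
        # Assumed index function: low-order bits of the virtual page number.
        return vpn % NUM_SETS

    def map_page(self, vpn: int, data: bytes) -> int:
        """Place a page in a free way of its set and return the frame number."""
        index = self.set_index(vpn)
        for way, frame in enumerate(self.sets[index]):
            if frame.vpn is None:
                frame.vpn, frame.data = vpn, data
                return index * WAYS + way
        raise MemoryError("set is full; a real OS would have to evict a page")

    def access(self, vaddr: int) -> int:
        """Return the byte at vaddr by probing only the frames of one set."""
        vpn = vaddr >> PAGE_SHIFT
        offset = vaddr & ((1 << PAGE_SHIFT) - 1)
        # The set is known from the virtual address alone, so the tag check
        # (translation) and the data read touch the same frames.
        for frame in self.sets[self.set_index(vpn)]:
            if frame.vpn == vpn:
                return frame.data[offset]
        raise KeyError("page fault: page is not mapped")


mem = SetAssociativeMemory()
frame_number = mem.map_page(5, b"hello")
byte_value = mem.access((5 << PAGE_SHIFT) + 1)
```

In this sketch a sequential loop stands in for what hardware could do across all ways at once; how DIPTA actually distributes entries across memory partitions is described in the paper, not here.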




