SPARTA: A Divide and Conquer Approach to Address Translation for Accelerators

by   Javier Picorel, et al.

Virtual memory (VM) is critical to the usability and programmability of hardware accelerators. Unfortunately, implementing accelerator VM efficiently is challenging because the area and power constraints make it difficult to employ the large multi-level TLBs used in general-purpose CPUs. Recent research proposals advocate a number of restrictions on virtual-to-physical address mappings in order to reduce the TLB size or increase its reach. However, such restrictions are unattractive because they forgo many of the original benefits of traditional VM, such as demand paging and copy-on-write. We propose SPARTA, a divide and conquer approach to address translation. SPARTA splits the address translation into accelerator-side and memory-side parts. The accelerator-side translation hardware consists of a tiny TLB covering only the accelerator's cache hierarchy (if any), while the translation for main memory accesses is performed by shared memory-side TLBs. Performing the translation for memory accesses on the memory side allows SPARTA to overlap data fetch with translation, and avoids the replication of TLB entries for data shared among accelerators. To further improve the performance and efficiency of the memory-side translation, SPARTA logically partitions the memory space, delegating translation to small and efficient per-partition translation hardware. Our evaluation on index-traversal accelerators shows that SPARTA virtually eliminates translation overhead, reducing it by over 30x on average (up to 47x) and improving performance by 57 minimal accelerator-side translation hardware, reduces the total number of TLB entries in the system, gracefully scales with memory size, and preserves all key VM functionalities.


Accelerator-level Parallelism

Future applications demand more performance, but technology advances hav...

Scalable and Efficient Virtual Memory Sharing in Heterogeneous SoCs with TLB Prefetching and MMU-Aware DMA Engine

Shared virtual memory (SVM) is key in heterogeneous systems on chip (SoC...

Address Translation Design Tradeoffs for Heterogeneous Systems

This paper presents a broad, pathfinding design space exploration of mem...

A Least-Privilege Memory Protection Model for Modern Hardware

We present a new least-privilege-based model of addressing on which to b...

Coalesced TLB to Exploit Diverse Contiguity of Memory Mapping

The miss rate of TLB is crucial to the performance of address translatio...

Garbage Collection Techniques for Flash-Resident Page-Mapping FTLs

Storage devices based on flash memory have replaced hard disk drives (HD...

Engineering Boolean Matrix Multiplication for Multiple-Accelerator Shared-Memory Architectures

We study the problem of multiplying two bit matrices with entries either...

Please sign up or login with your details

Forgot password? Click here to reset