CXLMemUring: A Hardware Software Co-design Paradigm for Asynchronous and Flexible Parallel CXL Memory Pool Access

09/07/2023
by Yiwei Yang, et al.

CXL has emerged as a technology for expanding memory for both the host CPU and device accelerators through a load/store interface. Extending memory coherency to the PCIe root complex makes hardware-software co-design more flexible, because memory can be accessed coherently using compute capability located near the device. As demand grows for memory capacity at tolerable latency and bandwidth, we need a new hardware-software co-design that offloads synthesized memory operations to the CXL endpoint, the CXL switch, or engines near the CXL root complex such as Intel DSA to fetch data, while the CPU or accelerators carry on with other computation in the background. When the CXL load completes, the data is placed in L1 if it fits, and the in-core ROB is notified through a mailbox so that computation resumes on the saved hardware context. Because the distance (timing window) of the load instruction sequence is not known in advance, long-running jobs require profile-guided code generation that adaptively updates the offloaded code. We propose to evaluate CXLMemUring on a modified BOOMv3 with added in-core logic and with CXL endpoint access simulated using CHI. We will also add a weaker RISC-V core near the endpoint for code offloading, and code generation will be based on program analysis using a traditional profile-guided approach.
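The asynchronous offload-and-resume flow, and the need to choose an unknown load distance by profiling, can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the CXLMemUring hardware or its code generator: `NearEndpointCore`, `DistanceTuner`, `run`, the fixed load latency, and the EWMA update rule are all hypothetical. The mailbox is modeled as a polled completion list, L1 placement and ROB state are not modeled, and the cycle counts are illustrative parameters rather than measured results.

```python
import math
from collections import deque
from dataclasses import dataclass


@dataclass
class OffloadedLoad:
    ctx_id: int    # hypothetical tag naming the hardware context to resume
    addr: int      # index into the modeled CXL memory pool
    ready_at: int  # cycle at which the near-endpoint core finishes the load


class NearEndpointCore:
    """Toy model of an offload engine near the CXL endpoint with a fixed load latency."""

    def __init__(self, memory, latency):
        self.memory = memory
        self.latency = latency
        self.inflight = deque()

    def submit(self, ctx_id, addr, now):
        self.inflight.append(OffloadedLoad(ctx_id, addr, now + self.latency))

    def poll_mailbox(self, now):
        """Return (ctx_id, value) for each load completed by cycle `now`.

        With a fixed latency, loads complete in submission order,
        so popping from the front of the queue is sufficient.
        """
        done = []
        while self.inflight and self.inflight[0].ready_at <= now:
            req = self.inflight.popleft()
            done.append((req.ctx_id, self.memory[req.addr]))
        return done


class DistanceTuner:
    """Profile-guided, adaptively updated estimate of how far ahead to offload loads."""

    def __init__(self, alpha=0.25):
        self.alpha = alpha  # EWMA weight given to the newest profile sample
        self.latency = None
        self.work = None

    def observe(self, latency_cycles, work_cycles_per_iter):
        if self.latency is None:
            self.latency, self.work = latency_cycles, work_cycles_per_iter
        else:
            a = self.alpha
            self.latency = a * latency_cycles + (1 - a) * self.latency
            self.work = a * work_cycles_per_iter + (1 - a) * self.work

    def distance(self):
        # Issue far enough ahead that the load latency is covered by independent work.
        return max(1, math.ceil(self.latency / self.work))


def run(memory, latency, work, distance):
    """Sum `memory`, overlapping offloaded loads with `work` cycles of compute per item.

    Returns (total, stall_cycles).
    """
    core = NearEndpointCore(memory, latency)
    n = len(memory)
    now = issued = total = stalls = 0
    ready = {}
    for i in range(n):
        # Keep loads for items up to i + distance submitted before they are needed.
        while issued < n and issued <= i + distance:
            core.submit(issued, issued, now)
            issued += 1
        # Stall on the mailbox only if the needed value has not arrived yet.
        while i not in ready:
            for ctx_id, value in core.poll_mailbox(now):
                ready[ctx_id] = value
            if i not in ready:
                now += 1
                stalls += 1
        total += ready.pop(i)
        now += work  # independent computation proceeds while later loads are in flight
    return total, stalls


tuner = DistanceTuner()
tuner.observe(latency_cycles=40, work_cycles_per_iter=10)
print(run(list(range(100)), latency=40, work=10, distance=tuner.distance()))  # (4950, 40)
print(run(list(range(100)), latency=40, work=10, distance=1))  # same total, many more stalls
```

The rule `ceil(latency / work per iteration)` is a common prefetch-distance heuristic swapped in here for illustration; once the distance covers the latency, only the initial warm-up stalls remain. The moving average lets the estimate follow a long-running job whose observed latency or per-iteration work drifts over time.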
