ORCA: A Network and Architecture Co-design for Offloading μs-scale Datacenter Applications

03/16/2022
by   Yifan Yuan, et al.

To address the "datacenter tax" and "killer microseconds" problems in datacenter applications, diverse solutions have been proposed, including SmartNIC-based ones. Nonetheless, they often suffer from the high overhead of communication over network and/or PCIe links. To overcome these limitations, this paper proposes ORCA, a holistic network and architecture co-design that leverages current RDMA and emerging cache-coherent off-chip interconnect technologies. Specifically, ORCA consists of four hardware and software components: (1) a unified abstraction of inter- and intra-machine communication, managed by one-sided RDMA writes and cache-coherent memory writes; (2) efficient notification of requests to accelerators, assisted by cache coherence; (3) a cache-coherent accelerator architecture that directly processes requests received by the NIC; and (4) adaptive device-to-host data transfer for modern server memory systems consisting of both DRAM and NVM, exploiting state-of-the-art features in CPUs and PCIe. We prototype ORCA with a commercial system and evaluate three popular datacenter applications: an in-memory key-value store, a chain replication-based distributed transaction system, and deep learning recommendation model inference. The evaluation shows that ORCA provides 30.1 69.1 power efficiency than the current state-of-the-art solutions.
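As a rough illustration of components (1) and (2), the sketch below models a request ring in plain Python: a producer deposits a request directly into a slot (standing in for a one-sided write) and a consumer notices it by checking a per-slot flag. This is a generic single-producer, single-consumer pattern under stated assumptions, not ORCA's actual design. The names `Slot`, `RequestRing`, `one_sided_write`, and `poll` are hypothetical, and no real RDMA or cache-coherence mechanism is involved.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Slot:
    payload: bytes = b""
    valid: bool = False  # hypothetical flag the consumer watches for new requests


class RequestRing:
    """Hypothetical single-producer, single-consumer request ring (illustration only)."""

    def __init__(self, size: int) -> None:
        self.slots: List[Slot] = [Slot() for _ in range(size)]
        self.head = 0  # next slot the producer writes
        self.tail = 0  # next slot the consumer reads

    def one_sided_write(self, payload: bytes) -> bool:
        """Producer side: place a request without involving the consumer."""
        slot = self.slots[self.head]
        if slot.valid:  # slot not yet consumed, so the ring is full
            return False
        slot.payload = payload  # write the data first
        slot.valid = True       # then publish it by setting the flag
        self.head = (self.head + 1) % len(self.slots)
        return True

    def poll(self) -> Optional[bytes]:
        """Consumer side: return the next request, or None if nothing is pending."""
        slot = self.slots[self.tail]
        if not slot.valid:
            return None
        payload = slot.payload
        slot.valid = False  # release the slot back to the producer
        self.tail = (self.tail + 1) % len(self.slots)
        return payload
```

In this model the consumer discovers work only by inspecting the flag; the abstract's point is that cache coherence can assist this notification step in hardware.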


research
09/14/2021

Cohmeleon: Learning-Based Orchestration of Accelerator Coherence in Heterogeneous SoCs

One of the most critical aspects of integrating loosely-coupled accelera...
research
08/04/2019

Analysis and Optimization of I/O Cache Coherency Strategies for SoC-FPGA Device

Unlike traditional PCIe-based FPGA accelerators, heterogeneous SoC-FPGA ...
research
05/03/2023

NVMM cache design: Logging vs. Paging

Modern NVMM is closing the gap between DRAM and persistent storage, both...
research
05/06/2023

Memory Disaggregation: Advances and Open Challenges

Compute and memory are tightly coupled within each server in traditional...
research
05/14/2021

NVCache: A Plug-and-Play NVMM-based I/O Booster for Legacy Systems

This paper introduces NVCache, an approach that uses a non-volatile main...
research
12/09/2020

Virtual-Link: A Scalable Multi-Producer, Multi-Consumer Message Queue Architecture for Cross-Core Communication

Cross-core communication is increasingly a bottleneck as the number of p...
research
01/14/2023

Failure Tolerant Training with Persistent Memory Disaggregation over CXL

This paper proposes TRAININGCXL that can efficiently process large-scale...
