A Case for CXL-Centric Server Processors

05/08/2023
by Albert Cho, et al.

The memory system is a major performance determinant for server processors. Ever-growing core counts and datasets demand higher bandwidth and capacity, as well as lower latency, from the memory system. To keep up with these demands, DDR, the dominant processor interface to memory over the past two decades, has offered higher bandwidth with every generation. However, because each parallel DDR interface requires a large number of on-chip pins, the processor's memory bandwidth is ultimately constrained by its pin count, which is a scarce resource. With limited bandwidth, multiple memory requests typically contend for each memory channel, resulting in significant queuing delays that often overshadow DRAM's service time and degrade performance. We present CoaXiaL, a server design that overcomes memory bandwidth limitations by replacing all DDR interfaces to the processor with the more pin-efficient CXL interface. The widespread adoption and industrial momentum of CXL make such a transition possible, and CXL offers 4× higher bandwidth per pin than DDR at a modest latency overhead. We demonstrate that, for a broad range of workloads, CXL's latency premium is more than offset by its higher bandwidth. Because CoaXiaL distributes memory requests across more channels, it drastically reduces queuing delays and thereby both the average and the variance of memory access latency. Our evaluation with a variety of workloads shows that CoaXiaL improves the performance of manycore throughput-oriented servers by 1.52× on average and by up to 3×.
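The trade-off described above, a fixed latency premium in exchange for lower queuing delay, can be illustrated with a toy queuing model. The sketch below is not the paper's methodology: it assumes each memory channel behaves like an M/M/1 queue, and the service time (40 ns) and CXL latency premium (30 ns) are hypothetical placeholder values chosen only for illustration. The 4× bandwidth factor is the per-pin figure quoted in the abstract, applied here as a 4× reduction in per-channel utilization.

```python
# Toy model: average memory access latency under an M/M/1 queue assumption.
# All numeric constants except the 4x bandwidth factor are hypothetical.

SERVICE_NS = 40.0        # assumed per-request service time (hypothetical)
CXL_PREMIUM_NS = 30.0    # assumed fixed CXL latency overhead (hypothetical)
BANDWIDTH_SCALE = 4.0    # 4x bandwidth per pin, as stated in the abstract


def mm1_queue_wait(service_ns: float, utilization: float) -> float:
    """Mean time spent waiting in an M/M/1 queue (excluding service)."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_ns * utilization / (1.0 - utilization)


def avg_access_latency(service_ns: float, utilization: float,
                       interface_overhead_ns: float = 0.0) -> float:
    """Fixed interface overhead + service time + mean queuing delay."""
    return (interface_overhead_ns + service_ns
            + mm1_queue_wait(service_ns, utilization))


def compare(ddr_utilization: float) -> tuple[float, float]:
    """Return (DDR latency, CXL latency) in ns for the same offered load.

    With 4x the bandwidth, the same load yields 1/4 the utilization per
    channel on the CXL-based system, at the cost of a fixed premium.
    """
    ddr = avg_access_latency(SERVICE_NS, ddr_utilization)
    cxl = avg_access_latency(SERVICE_NS,
                             ddr_utilization / BANDWIDTH_SCALE,
                             CXL_PREMIUM_NS)
    return ddr, cxl


if __name__ == "__main__":
    for load in (0.2, 0.5, 0.8, 0.9):
        ddr, cxl = compare(load)
        print(f"DDR utilization {load:.1f}: DDR {ddr:6.1f} ns, CXL {cxl:6.1f} ns")
```

Under these assumed numbers, the DDR configuration has lower latency at light load, where queuing is negligible and the fixed premium dominates, while the CXL configuration has lower latency once the DDR channels are heavily utilized. This mirrors the abstract's qualitative claim only; the crossover point depends entirely on the placeholder constants.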


