Coherence Traffic in Manycore Processors with Opaque Distributed Directories

11/10/2020
by   Steve Kommrusch, et al.

Manycore processors feature a high number of general-purpose cores designed to work in a multithreaded fashion. Recent manycore processors are kept coherent using scalable distributed directories. A paramount example is the Intel Mesh interconnect, which consists of a network-on-chip interconnecting "tiles", each of which contains computation cores, local caches, and coherence masters. The distributed coherence subsystem must be queried for every out-of-tile access, imposing an overhead on memory latency. This paper studies the physical layout of an Intel Knights Landing processor, with a particular focus on the coherence subsystem, and uncovers the pseudo-random mapping function of physical memory blocks across the pieces of the distributed directory. Leveraging this knowledge, candidate optimizations to improve memory latency through the minimization of coherence traffic are studied. Although these optimizations do improve memory throughput, ultimately this does not translate into performance gains due to inherent overheads stemming from the computational complexity of the mapping functions.
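
To make the idea of a pseudo-random directory mapping concrete, the sketch below hashes a physical address to the tile that would hold its directory entry, then measures the mesh distance from a requesting tile. It is a toy model only: the XOR-fold hash, the tile count `NUM_TILES`, the mesh width `MESH_COLS`, and the function names are all assumptions made for illustration, not the mapping function uncovered in the paper or any documented Intel behavior.

```python
CACHE_LINE_BITS = 6   # 64-byte cache lines
NUM_TILES = 32        # hypothetical tile count (power of two keeps the hash simple)
MESH_COLS = 8         # hypothetical mesh width, giving an 8 x 4 grid


def directory_tile(phys_addr: int) -> int:
    """Toy hash: XOR-fold the cache-line number into log2(NUM_TILES) bits.

    This stands in for the real (opaque) mapping function; it is NOT that function.
    """
    line = phys_addr >> CACHE_LINE_BITS      # drop the byte offset within the line
    bits = NUM_TILES.bit_length() - 1        # 5 bits for 32 tiles
    mask = NUM_TILES - 1
    h = 0
    while line:
        h ^= line & mask                     # fold in the next group of address bits
        line >>= bits
    return h


def mesh_hops(tile_a: int, tile_b: int) -> int:
    """Manhattan distance between two tiles laid out row-major on the mesh."""
    ax, ay = tile_a % MESH_COLS, tile_a // MESH_COLS
    bx, by = tile_b % MESH_COLS, tile_b // MESH_COLS
    return abs(ax - bx) + abs(ay - by)


def local_lines(core_tile: int, base: int, n_lines: int) -> list[int]:
    """Addresses of the cache lines in a buffer whose directory entry sits on core_tile."""
    line_size = 1 << CACHE_LINE_BITS
    return [
        base + i * line_size
        for i in range(n_lines)
        if directory_tile(base + i * line_size) == core_tile
    ]


if __name__ == "__main__":
    addr = 0x12345680
    home = directory_tile(addr)
    print(f"address {addr:#x} -> directory tile {home}, "
          f"{mesh_hops(0, home)} hops from tile 0")
    print("lines homed on tile 0:", [hex(a) for a in local_lines(0, 0, 64)])
```

Under this model, a thread on a given tile could restrict itself to the lines returned by `local_lines` to avoid mesh traversals for directory lookups. Evaluating the hash in software for every access has its own cost, which is the kind of overhead the abstract identifies as offsetting the throughput gains.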


