MemPool-3D: Boosting Performance and Efficiency of Shared-L1 Memory Many-Core Clusters with 3D Integration

12/02/2021
by Matheus Cavalcante, et al.

Three-dimensional integrated circuits promise power, performance, and footprint gains over their 2D counterparts, thanks to drastic reductions in interconnect length enabled by their smaller form factor. We leverage the potential of 3D integration by enhancing MemPool, an open-source many-core design with 256 cores and a shared pool of L1 scratchpad memory connected through a low-latency interconnect. MemPool's baseline 2D design is severely limited by routing congestion and wire propagation delay, which makes it an ideal candidate for 3D integration. In architectural terms, we increase MemPool's scratchpad memory capacity beyond the sweet spot for 2D designs, improving performance in a common digital signal processing kernel. We propose a 3D MemPool design that uses a smart partitioning of the memory resources across two layers to balance the size and utilization of the stacked dies. In this paper, we explore the architectural and technology parameter spaces by analyzing the power, performance, area, and energy efficiency of MemPool instances in 2D and 3D with 1 MiB, 2 MiB, 4 MiB, and 8 MiB of scratchpad memory in a commercial 28 nm technology node. We observe a performance gain of 9.1% when running a matrix multiplication on the MemPool-3D design with 4 MiB of scratchpad memory compared to its MemPool-2D counterpart. In terms of energy efficiency, we can implement the MemPool-3D instance with 4 MiB of L1 memory on an energy budget 15% smaller than its 2D counterpart, and even smaller than that of the MemPool-2D instance with one-fourth of the L1 scratchpad memory capacity.
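To make the 1 MiB to 8 MiB capacity sweep more concrete, the following is a minimal, hypothetical sketch of how a shared L1 scratchpad could be split into banks and addressed by 256 cores. The word width (32 bits), the bank count (four banks per core, 1024 in total), and the pure word-level interleaving are assumptions chosen for illustration, not details stated in the abstract; the helper names `bank_size_bytes` and `map_address` are likewise hypothetical.

```python
"""Hypothetical sketch of a word-interleaved shared L1 scratchpad.

Assumptions (not taken from the abstract): 32-bit words, four banks per
core, and consecutive words mapped to consecutive banks.
"""

WORD_BYTES = 4                          # assumption: 32-bit words
NUM_CORES = 256                         # core count stated in the abstract
BANKING_FACTOR = 4                      # assumption: banks per core
NUM_BANKS = NUM_CORES * BANKING_FACTOR  # 1024 banks under these assumptions

KIB = 1024
MIB = 1024 * KIB


def bank_size_bytes(capacity_bytes: int) -> int:
    """Bytes per bank when the capacity is split evenly across all banks."""
    if capacity_bytes % (NUM_BANKS * WORD_BYTES):
        raise ValueError("capacity must give every bank a whole number of words")
    return capacity_bytes // NUM_BANKS


def map_address(addr: int, capacity_bytes: int) -> tuple[int, int]:
    """Map a byte address to (bank index, word row inside that bank).

    Consecutive words land in consecutive banks (word-level interleaving),
    so cores streaming through an array spread their accesses over many banks.
    """
    if not 0 <= addr < capacity_bytes:
        raise ValueError("address outside the scratchpad")
    word = addr // WORD_BYTES
    return word % NUM_BANKS, word // NUM_BANKS


if __name__ == "__main__":
    # Bank depth for each capacity explored in the abstract.
    for mib in (1, 2, 4, 8):
        size = bank_size_bytes(mib * MIB)
        print(f"{mib} MiB -> {size // KIB} KiB per bank, {size // WORD_BYTES} words deep")
    # Where one example byte address lands in the 4 MiB configuration.
    print(map_address(0x1004, 4 * MIB))
```

Under these assumptions the bank count stays fixed while each bank grows from 1 KiB (256 words) at 1 MiB to 8 KiB (2048 words) at 8 MiB, and byte address `0x1004` in the 4 MiB configuration maps to bank 1, row 1.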

12/05/2020
MemPool: A Shared-L1 Memory Many-Core Cluster with a Low-Latency Interconnect
A key challenge in scaling shared-L1 multi-core clusters towards many-co...

04/14/2020
Energy-Efficient Hardware-Accelerated Synchronization for Shared-L1-Memory Multiprocessor Clusters
The steeply growing performance demands for highly power- and energy-con...

07/16/2022
Spatz: A Compact Vector Processing Unit for High-Performance and Energy-Efficient Shared-L1 Clusters
While parallel architectures based on clusters of Processing Elements (P...

03/30/2023
MemPool: A Scalable Manycore Architecture with a Low-Latency Shared L1 Memory
Shared L1 memory clusters are a common architectural pattern (e.g., in G...

08/30/2016
A near-threshold RISC-V core with DSP extensions for scalable IoT Endpoint Devices
Endpoint devices for Internet-of-Things not only need to work under extr...

09/02/2022
Soft Tiles: Capturing Physical Implementation Flexibility for Tightly-Coupled Parallel Processing Clusters
Modern high-performance computing architectures (Multicore, GPU, Manycor...
