When to use 3D Die-Stacked Memory for Bandwidth-Constrained Big Data Workloads

08/26/2016
by Jason Lowe-Power, et al.

Response time requirements for big data processing systems are shrinking. To meet these strict response time requirements, many big data systems store all or most of their data in main memory to reduce access latency. Main memory capacities have grown, and systems with 2 TB of main memory capacity are available today. However, the rate at which processors can access this data--the memory bandwidth--has not grown at the same rate. In fact, some of these big-memory systems can access less than 10% of their memory capacity in one second (billions of processor cycles). 3D die-stacking is one promising solution to this bandwidth problem, and industry is investing significantly in it. We use a simple back-of-the-envelope-style model to characterize if and when the 3D die-stacked architecture is more cost-effective than current architectures for in-memory big data workloads. We find that die-stacking delivers much higher performance than current systems (up to 256x lower response times) and does not require expensive memory overprovisioning to meet real-time (10 ms) response time service-level agreements. However, the power requirements of die-stacked systems are significantly higher (up to 50x) than current systems, and their memory capacity is lower in many cases. Even in this limited case study, we find 3D die-stacking is not a panacea. Today, die-stacking is the most cost-effective solution under strict SLAs; by reducing the power of the compute chip and increasing memory densities, die-stacking can become cost-effective under other constraints in the future.
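The abstract's bandwidth-limited argument can be illustrated with a back-of-the-envelope calculation of the same style. The sketch below is not the paper's model; the bandwidth and working-set numbers are illustrative assumptions chosen only to show how a bandwidth-bound scan can miss or meet a 10 ms SLA.

```python
# Back-of-the-envelope sketch: bandwidth-limited response time vs. an SLA.
# All numeric values below are illustrative assumptions, not figures from the paper.

def response_time_s(data_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Time to stream a working set once, assuming the query is bandwidth-bound."""
    return data_bytes / bandwidth_bytes_per_s

SLA_S = 10e-3          # 10 ms service-level agreement (from the abstract)
DDR_BW = 100e9         # ~100 GB/s aggregate conventional DRAM bandwidth (assumed)
STACKED_BW = 1e12      # ~1 TB/s aggregate die-stacked DRAM bandwidth (assumed)
WORKING_SET = 2e9      # 2 GB of data scanned per query (assumed)

conventional = response_time_s(WORKING_SET, DDR_BW)    # 0.02 s, misses the 10 ms SLA
die_stacked = response_time_s(WORKING_SET, STACKED_BW) # 0.002 s, meets the 10 ms SLA

print(f"conventional: {conventional * 1e3:.1f} ms (SLA met: {conventional <= SLA_S})")
print(f"die-stacked:  {die_stacked * 1e3:.1f} ms (SLA met: {die_stacked <= SLA_S})")
```

Under these assumed numbers, the conventional system can only shrink its response time by scanning less data per query (i.e., overprovisioning memory and partitioning the dataset), which is the cost trade-off the paper's model quantifies.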

research
09/24/2018

Die-Stacked DRAM: Memory, Cache, or MemCache?

Die-stacked DRAM is a promising solution for satisfying the ever-increas...
research
04/27/2022

Memory-Disaggregated In-Memory Object Store Framework for Big Data Applications

The concept of memory disaggregation has recently been gaining traction ...
research
05/08/2023

A Case for CXL-Centric Server Processors

The memory system is a major performance determinant for server processo...
research
10/26/2017

CODA: Enabling Co-location of Computation and Data for Near-Data Processing

Recent studies have demonstrated that near-data processing (NDP) is an e...
research
03/06/2020

Stretching the capacity of Hardware Transactional Memory in IBM POWER architectures

The hardware transactional memory (HTM) implementations in commercially ...
research
10/22/2019

The Bitlet Model: Defining a Litmus Test for the Bitwise Processing-in-Memory Paradigm

This paper describes an analytical modeling tool called Bitlet that can ...
research
07/17/2017

Performance Implications of NoCs on 3D-Stacked Memories: Insights from the Hybrid Memory Cube

Memories that exploit three-dimensional (3D)-stacking technology, which ...
