Workload-Aware DRAM Error Prediction using Machine Learning

03/17/2020
by Lev Mukhanov, et al.

The aggressive scaling of technology has helped to meet the growing demand for higher memory capacity and density, but it has also made DRAM cells more prone to errors. This has triggered considerable interest in modeling DRAM behavior, either to predict errors in advance or to adjust DRAM circuit parameters for a better trade-off between energy efficiency and reliability. Existing modeling efforts have studied the impact of a few operating parameters and temperature on DRAM reliability using custom FPGA setups, but they neglected the combined effect of workload-specific features, which can be systematically investigated only on a real system. In this paper, we present the results of our study on workload-dependent DRAM error behavior within a real server, considering various operating parameters such as the refresh rate, voltage and temperature. We show that the rate of single- and multi-bit errors may vary across workloads by 8x, indicating that program-inherent features can affect DRAM reliability significantly. Based on this observation, we extract 249 features, such as the memory access rate, the rate of cache misses, the memory reuse time and data entropy, from various compute-intensive, caching and analytics benchmarks. We apply several supervised learning methods to construct the DRAM error behavior model for 72 server-grade DRAM chips using the memory operating parameters and the extracted program-inherent features. Our results show that, with an appropriate choice of program features and supervised learning method, the rate of single- and multi-bit errors can be predicted for a specific DRAM module with an average error of less than 10.5%, in contrast to a conventional workload-unaware error model.
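
As an illustration of one of the program-inherent features named above, the following is a minimal sketch of a data entropy measurement. The abstract does not say how the authors compute data entropy, so the byte-level Shannon entropy used here, and the function name `byte_entropy`, are assumptions made purely for illustration.

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of a byte sequence, in bits per byte (0.0 to 8.0).

    Hypothetical stand-in for a "data entropy" feature; the paper's
    actual definition may differ.
    """
    if not data:
        return 0.0
    counts = Counter(data)  # occurrences of each byte value
    total = len(data)
    # H = -sum(p * log2(p)) over the byte values that actually occur
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    uniform_fill = b"\x00" * 64        # a single repeated value
    all_values = bytes(range(256))     # every byte value exactly once
    print(byte_entropy(uniform_fill))
    print(byte_entropy(all_values))
```

A buffer filled with one repeated value sits at the low end of the scale and a buffer in which every byte value is equally likely sits at the high end, so a feature of this kind could in principle distinguish workloads by the data patterns they store in memory.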
