Bit-Exact ECC Recovery (BEER): Determining DRAM On-Die ECC Functions by Exploiting DRAM Data Retention Characteristics

09/17/2020
by Minesh Patel, et al.

Increasing single-cell DRAM error rates have pushed DRAM manufacturers to adopt on-die error-correction coding (ECC), which operates entirely within a DRAM chip to improve factory yield. The on-die ECC function and its effects on DRAM reliability are considered trade secrets, so only the manufacturer knows precisely how on-die ECC alters the externally-visible reliability characteristics. Consequently, on-die ECC obstructs third-party DRAM customers (e.g., test engineers, experimental researchers), who typically design, test, and validate systems based on these characteristics. To give third parties insight into precisely how on-die ECC transforms DRAM error patterns during error correction, we introduce Bit-Exact ECC Recovery (BEER), a new methodology for determining the full DRAM on-die ECC function (i.e., its parity-check matrix) without hardware tools, prerequisite knowledge about the DRAM chip or on-die ECC mechanism, or access to ECC metadata (e.g., error syndromes, parity information). BEER exploits the key insight that non-intrusively inducing data-retention errors with carefully-crafted test patterns reveals behavior that is unique to a specific ECC function. We use BEER to identify the ECC functions of 80 real LPDDR4 DRAM chips with on-die ECC from three major DRAM manufacturers. We evaluate BEER's correctness in simulation and performance on a real system to show that BEER is effective and practical across a wide range of on-die ECC functions. To demonstrate BEER's value, we propose and discuss several ways that third parties can use BEER to improve their design and testing practices. As a concrete example, we introduce and evaluate BEEP, the first error profiling methodology that uses the known on-die ECC function to recover the number and bit-exact locations of unobservable raw bit errors responsible for observable post-correction errors.
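
The key insight above, that miscorrections under uncorrectable raw errors are specific to one ECC function, can be illustrated with a toy model. The sketch below is not the BEER methodology itself. It assumes a (7,4) single-error-correcting Hamming code in systematic form, and the matrices `H_A` and `H_B` and all function names are hypothetical, chosen only for illustration. It injects the same two raw bit errors under both parity-check matrices and compares the post-correction data errors that an external observer would see.

```python
# Toy model (an assumption for illustration): a (7,4) systematic Hamming code.
# A parity-check matrix is stored as a list of 7 columns, each a 3-bit tuple.
# Columns 0..3 belong to data bits, columns 4..6 to parity bits.

R = 3  # number of parity bits


def make_h(p_columns):
    """Build H = [P | I] from the four data-bit columns."""
    identity = [tuple(int(i == j) for i in range(R)) for j in range(R)]
    return [tuple(c) for c in p_columns] + identity


def xor(a, b):
    return tuple(x ^ y for x, y in zip(a, b))


def syndrome(word, h):
    """XOR together the columns of H selected by the set bits of the word."""
    s = (0,) * R
    for bit, col in zip(word, h):
        if bit:
            s = xor(s, col)
    return s


def encode(data, h):
    """Append parity bits so that the codeword has an all-zero syndrome."""
    k = len(h) - R
    return list(data) + list(syndrome(data, h[:k]))


def decode(word, h):
    """Single-error correction: flip the bit whose column equals the syndrome."""
    s = syndrome(word, h)
    corrected = list(word)
    if any(s) and s in h:
        corrected[h.index(s)] ^= 1
    return corrected[: len(h) - R]


def post_correction_errors(h, raw_error_bits, data):
    """Return the set of data-bit positions that are wrong after decoding."""
    word = encode(data, h)
    for b in raw_error_bits:
        word[b] ^= 1  # inject a raw (pre-correction) bit error
    decoded = decode(word, h)
    return {i for i, (a, b) in enumerate(zip(data, decoded)) if a != b}


# Two hypothetical ECC functions that differ only in the order of data columns.
H_A = make_h([(1, 1, 0), (1, 0, 1), (0, 1, 1), (1, 1, 1)])
H_B = make_h([(1, 1, 1), (1, 1, 0), (1, 0, 1), (0, 1, 1)])

data = (1, 0, 1, 1)
print(sorted(post_correction_errors(H_A, [0, 1], data)))  # [0, 1, 2]
print(sorted(post_correction_errors(H_B, [0, 1], data)))  # [0, 1]
```

With `H_A`, the two raw errors produce a syndrome that matches the column of data bit 2, so the decoder miscorrects that bit and three data errors become visible. With `H_B`, the same raw errors map to a parity-bit column, so the miscorrection stays hidden and only the two original errors are visible. This toy ignores properties of real data-retention errors (for example, their dependence on the stored data pattern) and only shows why observed miscorrection behavior can distinguish one parity-check matrix from another.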
