Addressing multiple bit/symbol errors in DRAM subsystem

08/05/2019
by Ravikiran Yeleswarapu, et al.

As DRAM technology continues to evolve towards smaller feature sizes and higher densities, faults in the DRAM subsystem are becoming more severe. Current servers mostly use CHIPKILL-based schemes to tolerate up to one or two symbol errors per DRAM beat. Multi-symbol errors arising from faults in multiple data buses and chips may not be detected by these schemes. In this paper, we introduce Single Symbol Correction Multiple Symbol Detection (SSCMSD), a novel error-handling scheme that corrects single-symbol errors and detects multi-symbol errors. Our scheme uses a hash in combination with an Error Correcting Code (ECC) to avoid silent data corruptions (SDCs). SSCMSD can also enhance the detection of errors in address bits. We employ a 32-bit CRC along with a Reed-Solomon code to implement SSCMSD for an x4-based DDRx system. Our simulations show that the proposed scheme effectively prevents SDCs in the presence of multiple symbol errors. Our design achieves this without introducing additional READ latency. The scheme requires 19 chips per rank (a storage overhead of 18.75 percent), 76 data bus-lines, and additional hash logic at the memory controller.
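The figures quoted in the abstract, and the role a hash plays next to an ECC, can be illustrated with a minimal Python sketch. It is not the paper's implementation. The 16 data chips are inferred from the stated 18.75 percent overhead. The 64-byte line, the helper names (`storage_overhead`, `bus_lines`, `flip_symbol`, `hash_matches`), and the use of `zlib.crc32` as the 32-bit hash are assumptions made for illustration. The Reed-Solomon correction step is omitted.

```python
import zlib

# Assumptions for illustration only (not taken from the paper's design):
DATA_CHIPS = 16     # inferred from 19 chips and 18.75 percent overhead
TOTAL_CHIPS = 19    # stated in the abstract
BITS_PER_CHIP = 4   # x4 devices, so one 4-bit symbol per chip per beat


def storage_overhead(total_chips: int, data_chips: int) -> float:
    """Fraction of extra chips relative to the chips that hold data."""
    return (total_chips - data_chips) / data_chips


def bus_lines(total_chips: int, bits_per_chip: int) -> int:
    """Number of data bus-lines needed to reach every chip in the rank."""
    return total_chips * bits_per_chip


def flip_symbol(line: bytes, symbol_index: int, mask: int) -> bytes:
    """XOR a 4-bit mask into one 4-bit symbol of the line (models a symbol error)."""
    out = bytearray(line)
    byte_index, rem = divmod(symbol_index, 2)
    shift = 4 if rem == 0 else 0  # even symbols map to the high nibble
    out[byte_index] ^= (mask & 0xF) << shift
    return bytes(out)


def hash_matches(line: bytes, stored_crc: int) -> bool:
    """Recompute the CRC-32 of the line and compare it with the stored value."""
    return zlib.crc32(line) == stored_crc


if __name__ == "__main__":
    print(storage_overhead(TOTAL_CHIPS, DATA_CHIPS))  # 3 / 16
    print(bus_lines(TOTAL_CHIPS, BITS_PER_CHIP))      # 19 * 4

    line = bytes(range(64))          # hypothetical 64-byte cache line
    stored_crc = zlib.crc32(line)    # hash stored alongside the data

    # Two symbol errors in different positions: a case that a
    # single-symbol-correcting code alone could miss or miscorrect.
    corrupted = flip_symbol(flip_symbol(line, 3, 0b1010), 40, 0b0101)
    print(hash_matches(line, stored_crc))       # clean line passes
    print(hash_matches(corrupted, stored_crc))  # corrupted line fails
```

In this sketch the hash acts as an independent check on the data: if the line read back does not reproduce the stored CRC, the read is flagged instead of being returned silently.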

research
11/11/2020

Error-correcting Codes for Short Tandem Duplication and Substitution Errors

Due to its high data density and longevity, DNA is considered a promisin...
research
01/23/2019

Bit Flipping Moment Balancing Schemes for Insertion, Deletion and Substitution Error Correction

In this paper, two moment balancing schemes, namely a variable index sch...
research
01/20/2018

Storage-Class Memory Hierarchies for Scale-Out Servers

With emerging storage-class memory (SCM) nearing commercialization, ther...
research
01/18/2023

Chip Guard ECC: An Efficient, Low Latency Method

Chip Guard is a new approach to symbol-correcting error correction codes...
research
09/26/2021

HARP: Practically and Effectively Identifying Uncorrectable Errors in Memory Chips That Use On-Die Error-Correcting Codes

State-of-the-art techniques for addressing scaling-related main memory e...
research
04/06/2022

Fast Fuzzing for Memory Errors

Greybox fuzzing is a proven effective testing method for the detection o...
research
05/13/2020

Residual Clipping Noise in Multi-layer Optical OFDM: Modeling, Analysis, and Application

Optical orthogonal frequency division multiplexing (O-OFDM) schemes are ...
