Evaluation of STT-MRAM as a Scratchpad for Training in ML Accelerators

08/03/2023
by Sourjya Roy, et al.

Progress in artificial intelligence and machine learning over the past decade has been driven by the ability to train ever-larger deep neural networks (DNNs), leading to a compute demand that far exceeds the growth in hardware performance afforded by Moore's law. Training DNNs is an extremely memory-intensive process, requiring not just the model weights but also the activations and gradients for an entire minibatch to be stored. The need for high-density, low-leakage on-chip memory motivates the exploration of emerging non-volatile memories for training accelerators. Spin-Transfer-Torque MRAM (STT-MRAM) offers several desirable properties for training accelerators, including 3-4x higher density than SRAM, significantly lower leakage power, high endurance, and reasonable access times. On the other hand, MRAM write operations incur high write energy and latency due to the need to ensure reliable switching. In this study, we perform a comprehensive device-to-system evaluation and co-optimization of STT-MRAM for efficient ML training accelerator design. We devise a cross-layer simulation framework to evaluate the effectiveness of STT-MRAM as a scratchpad replacing SRAM in a systolic-array-based DNN accelerator. To address the inefficiency of writes in STT-MRAM, we propose reducing the write voltage and duration. To evaluate the ensuing accuracy-efficiency trade-off, we conduct a thorough analysis of the error tolerance of input activations, weights, and errors during training. We propose heterogeneous memory configurations that enable training to converge with good accuracy. We show that MRAM provides up to a 15-22x improvement in system-level energy across a suite of DNN benchmarks under iso-capacity and iso-area scenarios. Further optimizing the STT-MRAM write operation yields over a 2x improvement in write energy with minimal degradation in application-level training accuracy.
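
The write-relaxation idea above (lower write voltage and shorter pulses in exchange for occasional bit errors) lends itself to a simple simulation experiment: model a write bit-error rate that rises as voltage and pulse duration shrink, and inject matching bit flips into the tensors held in the scratchpad during training. The Python sketch below is a minimal, self-contained illustration of that idea; the function names (write_error_rate, inject_write_errors), the exponential switching model, and all constants are assumptions chosen for illustration, not the device model or simulation framework used in the paper.

import numpy as np


def write_error_rate(v_write, t_pulse, v_c=1.0, t_c=10e-9, delta=60.0):
    """Illustrative STT-MRAM write-failure probability (not the paper's model).

    Switching is assumed to become less reliable as the write voltage
    approaches the critical voltage v_c or the pulse is shortened relative
    to a characteristic switching time t_c; delta is a fitting constant.
    """
    if v_write <= v_c:
        return 1.0  # below the critical voltage, assume the write fails
    overdrive = v_write / v_c - 1.0
    return float(np.exp(-delta * overdrive * (t_pulse / t_c)))


def inject_write_errors(bits, p_err, rng):
    """Flip each stored bit independently with probability p_err, emulating
    unreliable scratchpad writes of activations, weights, or errors."""
    flips = (rng.random(bits.shape) < p_err).astype(bits.dtype)
    return np.bitwise_xor(bits, flips)


# Compare a nominal write against a relaxed (lower-voltage, shorter-pulse) write.
rng = np.random.default_rng(seed=0)
p_nominal = write_error_rate(v_write=1.5, t_pulse=10e-9)
p_relaxed = write_error_rate(v_write=1.2, t_pulse=5e-9)

acts = rng.integers(0, 2, size=(4, 8), dtype=np.uint8)  # toy bit-plane of activations
noisy_acts = inject_write_errors(acts, p_relaxed, rng)

print(f"nominal write BER ~ {p_nominal:.2e}, relaxed write BER ~ {p_relaxed:.2e}")

In a full cross-layer study, the per-tensor error rates would instead come from calibrated device-level simulation and be applied selectively to the activation, weight, and error/gradient scratchpads, which is what makes the heterogeneous memory configurations described in the abstract possible.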
