Refresh Triggered Computation: Improving the Energy Efficiency of Convolutional Neural Network Accelerators

10/15/2019
by Syed M. A. H. Jafri, et al.

Recently, many studies have proposed CNN accelerator architectures with custom computation units that try to improve the energy efficiency and performance of CNNs by minimizing data transfers from DRAM-based main memory. However, in these architectures, DRAM still contributes, on average, half of the overall system energy consumption. A key factor in the high energy consumption of DRAM is the refresh overhead, which is estimated to consume about 40% of the total DRAM energy.

We propose a new mechanism, Refresh Triggered Computation (RTC), that exploits the memory access patterns of CNN applications to reduce the number of refresh operations. RTC mainly uses two techniques to mitigate the refresh overhead. First, Refresh Triggered Transfer (RTT) is based on our new observation that a CNN application accesses a large portion of the DRAM in a predictable and recurring manner. Thus, the read/write accesses inherently refresh the DRAM, and therefore a significant fraction of refresh operations can be skipped. Second, Partial Array Auto-Refresh (PAAR) eliminates the refresh operations to DRAM regions that do not store any data.

We propose three RTC designs, each of which requires a different level of aggressiveness in terms of customization to the DRAM subsystem. All of our designs have small overhead: even the most aggressive RTC design imposes an area overhead of only 0.18%, and the relative overhead shrinks for denser chips. Our experimental evaluation on three well-known CNNs (i.e., AlexNet, LeNet, and GoogleNet) shows that RTC can reduce the DRAM refresh energy by 25% or more, depending on the network. Although we mainly use CNNs in our evaluations, we believe RTC can be applied to a wide range of applications whose memory access patterns remain predictable for a sufficiently long time.
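
To make the mechanism concrete, the sketch below models a toy refresh scheduler at DRAM row granularity. It is a minimal illustration of the two observations above, not the authors' design: the RefreshScheduler class, its method names, the row-level bookkeeping, and the 64 ms retention window are assumptions introduced here for the example.

# Minimal sketch (not the paper's implementation) of the two ideas behind RTC:
# skip refreshes for rows whose data was recently read or written, because the
# access itself restored the cell charge (the RTT observation), and skip
# refreshes for rows that hold no live data at all (the PAAR observation).

RETENTION_WINDOW_MS = 64.0  # assumed worst-case retention time of a DRAM row


class RefreshScheduler:
    def __init__(self, num_rows: int) -> None:
        self.num_rows = num_rows
        self.last_access_ms = [None] * num_rows  # None = row never accessed
        self.allocated = [False] * num_rows      # False = row stores no data
        self.refreshes_issued = 0
        self.refreshes_skipped = 0

    def on_access(self, row: int, now_ms: float) -> None:
        """A read or write activates the row, which implicitly refreshes it."""
        self.allocated[row] = True
        self.last_access_ms[row] = now_ms

    def on_free(self, row: int) -> None:
        """Mark a row as unused so it no longer needs any refresh."""
        self.allocated[row] = False

    def refresh_tick(self, now_ms: float) -> None:
        """Periodic refresh pass that only refreshes rows that truly need it."""
        for row in range(self.num_rows):
            if not self.allocated[row]:
                self.refreshes_skipped += 1   # PAAR-like skip: no data stored
                continue
            last = self.last_access_ms[row]
            if last is not None and now_ms - last < RETENTION_WINDOW_MS:
                self.refreshes_skipped += 1   # RTT-like skip: access refreshed it
                continue
            self.refreshes_issued += 1        # fall back to a regular refresh
            self.last_access_ms[row] = now_ms

Driving such a model with an access trace that re-reads its working set within the retention window, as CNN inference tends to do, counts most refresh operations as skipped, which is the behavior RTC exploits.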

research 02/15/2020
An Energy-Efficient Accelerator Architecture with Serial Accumulation Dataflow for Deep CNNs
Convolutional Neural Networks (CNNs) have shown outstanding accuracy for...

research 05/04/2022
DNA Pre-alignment Filter using Processing Near Racetrack Memory
Recent DNA pre-alignment filter designs employ DRAM for storing the refe...

research 09/18/2020
GrateTile: Efficient Sparse Tensor Tiling for CNN Processing
We propose GrateTile, an efficient, hardware-friendly data storage scheme...

research 01/03/2023
A Theory of I/O-Efficient Sparse Neural Network Inference
As the accuracy of machine learning models increases at a fast rate, so ...

research 06/01/2022
YOLoC: DeploY Large-Scale Neural Network by ROM-based Computing-in-Memory using ResiduaL Branch on a Chip
Computing-in-memory (CiM) is a promising technique to achieve high energ...

research 03/29/2020
Analytical Model of Memory-Bound Applications Compiled with High Level Synthesis
The increasing demand of dedicated accelerators to improve energy effici...

research 05/23/2019
In-DRAM Bulk Bitwise Execution Engine
Many applications heavily use bitwise operations on large bitvectors as ...
