Optimally Scheduling CNN Convolutions for Efficient Memory Access

02/04/2019
by Arthur Stoutchinin, et al.

Embedded inference engines for convolutional neural networks must be parsimonious in memory bandwidth and buffer sizing to meet power and cost constraints. We present an analytical memory bandwidth model for loop-nest optimization, targeting architectures with application-managed buffers, and apply it to optimize the CNN convolution loop-nest. We show that our model is more accurate than previously published models and that it identifies non-trivial dataflow schedules achieving the lowest communication bandwidth under tight local buffering constraints. We demonstrate that these optimal schedules are implementable in practice and that the model is accurate with respect to a real implementation. Moreover, we introduce an accelerator architecture, the Hardware Convolution Block (HWC), which implements the optimal schedules and achieves up to a 14x memory bandwidth reduction compared to a previously published accelerator with a similar memory interface but a non-optimal schedule.
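To make the idea of comparing dataflow schedules concrete, the sketch below is a toy analytical traffic estimator for an output-stationary tiled convolution. The cost terms, tile parameters, and the `conv_traffic` function are illustrative assumptions for demonstration only, not the model from the paper; they merely show how two schedules with the same buffer footprint can incur different external-memory traffic.

```python
# Toy analytical model of external-memory traffic for a tiled CNN
# convolution loop-nest. Illustrative assumption, not the paper's model.
import math

def conv_traffic(H, W, K, C, R, S, Th, Tw, Tk):
    """Estimate words moved to/from external memory for an output-stationary
    schedule that buffers one (Th x Tw x Tk) output tile locally.
    H, W, K: output height, width, channels; C: input channels;
    R, S: filter height, width; Th, Tw, Tk: tile sizes."""
    tiles = math.ceil(H / Th) * math.ceil(W / Tw) * math.ceil(K / Tk)
    # Per tile: input halo region and one slice of weights are reloaded.
    in_words = (Th + R - 1) * (Tw + S - 1) * C
    wt_words = R * S * C * Tk
    # Each output word is written back exactly once.
    return tiles * (in_words + wt_words) + H * W * K

# Two schedules with comparable buffer footprints, different bandwidth:
row_strips = conv_traffic(56, 56, 64, 64, 3, 3, Th=1, Tw=56, Tk=8)
square_tiles = conv_traffic(56, 56, 64, 64, 3, 3, Th=8, Tw=8, Tk=8)
print(row_strips, square_tiles)  # square tiling amortizes the input halo
```

Under this toy cost model, square output tiles reload less input halo per output word than row strips, which is the kind of non-trivial schedule trade-off the paper's model captures analytically.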


