Performance Analysis of DNN Inference/Training with Convolution and non-Convolution Operations

06/29/2023
by Hadi Esmaeilzadeh, et al.

Today's performance analysis frameworks for deep learning accelerators suffer from two significant limitations. First, although modern convolutional neural networks (CNNs) consist of many layer types other than convolution, especially during training, these frameworks largely focus on convolution layers alone. Second, these frameworks are generally targeted at inference and lack support for training operations. This work proposes a novel performance analysis framework, SimDIT, for general ASIC-based systolic hardware accelerator platforms. SimDIT's modeling effort comprehensively covers the convolution and non-convolution operations of both CNN inference and training on a highly parameterizable hardware substrate. SimDIT is integrated with a backend silicon implementation flow and provides detailed end-to-end performance statistics (i.e., data access cost, cycle counts, energy, and power) for executing CNN inference and training workloads. SimDIT-enabled performance analysis reveals that on a 64×64 processing array, non-convolution operations constitute 59.5% of the total runtime. In addition, by optimally distributing available off-chip DRAM bandwidth and on-chip SRAM resources, SimDIT achieves an 18× performance improvement over a generic static resource allocation for ResNet-50 inference.
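To make the kind of analysis described above concrete, here is a minimal first-order sketch of how an analytical model might estimate cycle counts for a convolution layer mapped onto a rows × columns systolic array. This is not SimDIT's actual model; the function name, mapping (output channels over rows, output pixels over columns), and the assumption of no memory stalls or pipeline fill/drain overhead are all hypothetical simplifications for illustration.

```python
from math import ceil

def conv_cycles_systolic(oh, ow, k, c, r, s, rows=64, cols=64):
    """Rough cycle estimate for one conv layer on a rows x cols systolic
    array. Hypothetical first-order model: assumes perfect utilization
    within a tile and ignores memory stalls and pipeline fill/drain.

    oh, ow : output height/width
    k      : number of output channels (filters)
    c      : number of input channels
    r, s   : kernel height/width
    """
    macs_per_output = c * r * s       # reduction length per output element
    outputs = oh * ow                 # output pixels per output channel
    # Tile output channels across array rows, output pixels across columns;
    # each tile needs one cycle per MAC in the reduction.
    tiles = ceil(k / rows) * ceil(outputs / cols)
    return tiles * macs_per_output

# Example: a 112x112 output, 64 filters, 3-channel 7x7 kernel layer
# (shaped like ResNet-50's first convolution) on a 64x64 array.
print(conv_cycles_systolic(112, 112, 64, 3, 7, 7))  # -> 28812
```

A real framework like SimDIT layers data-access cost, SRAM capacity, and DRAM bandwidth constraints on top of such a compute estimate, which is exactly why non-convolution operations and resource allocation end up dominating the reported results.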

Related research:

- 10/01/2020: CARLA: A Convolution Accelerator with a Reconfigurable and Low-Energy Architecture
- 02/19/2021: BPLight-CNN: A Photonics-based Backpropagation Accelerator for Deep Learning
- 05/19/2021: Block Convolution: Towards Memory-Efficient Inference of Large-Scale CNNs on FPGA
- 05/29/2020: A Unified Hardware Architecture for Convolutions and Deconvolutions in CNN
- 05/04/2018: MAESTRO: An Open-source Infrastructure for Modeling Dataflows within Deep Learning Accelerators
- 03/19/2021: Performance Analysis of Deep Learning Workloads on a Composable System
- 04/12/2021: Optimizing the Whole-life Cost in End-to-end CNN Acceleration
