SMAUG: End-to-End Full-Stack Simulation Infrastructure for Deep Learning Workloads

12/10/2019
by Yuan Yao, et al.

In recent years, there have been tremendous advances in hardware acceleration of deep neural networks. However, most of this research has focused on optimizing accelerator microarchitecture for higher performance and energy efficiency on a per-layer basis. We find that for overall single-batch inference latency, the accelerator may only make up 25-40% of the total time, with the rest spent on data movement and in the deep learning software framework. Thus far, it has been very difficult to study end-to-end DNN performance during early-stage design (before RTL is available) because no existing DNN framework supports end-to-end simulation with easy custom hardware accelerator integration. To address this gap in research infrastructure, we present SMAUG, the first DNN framework purpose-built for simulation of end-to-end deep learning applications. SMAUG offers researchers a wide range of capabilities for evaluating DNN workloads, from diverse network topologies to easy accelerator modeling and SoC integration. To demonstrate the power and value of SMAUG, we present case studies showing how overall performance and energy efficiency can be optimized for up to a 1.8-5x speedup over a baseline system without changing any part of the accelerator microarchitecture, and how SMAUG can tune an SoC for a camera-powered deep learning pipeline.
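The Amdahl's-law consequence of that latency breakdown can be sketched numerically. Only the 25-40% accelerator share comes from the abstract; the specific speedup values below are illustrative arithmetic, not results from the paper:

```python
# Amdahl's-law view of why per-layer accelerator tuning alone cannot
# deliver large end-to-end gains when the accelerator accounts for only
# a fraction of single-batch inference latency.

def end_to_end_speedup(accel_fraction: float, accel_speedup: float) -> float:
    """Overall speedup when only the accelerator portion is sped up."""
    return 1.0 / ((1.0 - accel_fraction) + accel_fraction / accel_speedup)

for frac in (0.25, 0.40):
    # Even an infinitely fast accelerator is capped by the time spent
    # outside it (data movement, software framework overhead).
    limit = end_to_end_speedup(frac, float("inf"))
    print(f"accel share {frac:.0%}: 2x faster accel -> "
          f"{end_to_end_speedup(frac, 2):.2f}x overall, "
          f"ideal limit {limit:.2f}x")
```

With a 25% accelerator share, even an infinitely fast accelerator yields at most about 1.33x end-to-end; at 40%, about 1.67x. This is why the case-study speedups of 1.8-5x must come from optimizing the rest of the SoC rather than the accelerator microarchitecture.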


