ANNETTE: Accurate Neural Network Execution Time Estimation with Stacked Models

05/07/2021
by Matthias Wess, et al.

With new accelerator hardware for DNNs, the computing power for AI applications has increased rapidly. However, as DNN algorithms become more complex and optimized for specific applications, latency requirements remain challenging, and it is critical to find the optimal points in the design space. To decouple the architectural search from the target hardware, we propose a time estimation framework that models the inference latency of DNNs on hardware accelerators based on mapping and layer-wise estimation models. The proposed methodology extracts a set of models from micro-kernel and multi-layer benchmarks and generates a stacked model for mapping and network execution time estimation. For evaluation, we compare the estimation accuracy and fidelity of the generated mixed models and statistical models with the roofline model and a refined roofline model. We test the mixed models on the ZCU102 SoC board with DNNDK and on the Intel Neural Compute Stick 2 (NCS2) on a set of 12 state-of-the-art neural networks. The mixed model shows an average estimation error of 3.47% for the DNNDK and 7.44% for the NCS2, outperforming the statistical and analytical layer models for almost all selected networks. For a randomly selected subset of 34 networks of the NASBench dataset, the mixed model reaches a fidelity of 0.988 in Spearman's rank correlation coefficient metric. The code of ANNETTE is publicly available at https://github.com/embedded-machine-learning/annette.


