Tuning Algorithms and Generators for Efficient Edge Inference

07/31/2019
by   Rawan Naous, et al.

A surge in artificial intelligence and autonomous technologies has increased the demand for enhanced edge-processing capabilities. The computational complexity and size of state-of-the-art Deep Neural Networks (DNNs) are rising exponentially with diverse network models and larger datasets. This growth limits the performance scaling and energy efficiency of both distributed and embedded inference platforms. Embedded designs at the edge are constrained by the energy and speed limitations of available processor substrates and by the processor-to-memory communication required to fetch the model coefficients. While many hardware accelerator and network deployment frameworks have been in development, a framework is needed that allows the variety of existing architectures, and those in development, to be expressed in the critical parts of the flow that perform the various optimization steps. Moreover, premature, architecture-blind network selection and optimization diminish the effectiveness of schedule optimizations and hardware-specific mappings. In this paper, we address these issues by creating a cross-layer software-hardware design framework that encompasses network training and model compression and that is aware of, and tuned to, the underlying hardware architecture. This approach leverages the available degrees of DNN structure and sparsity to create a converged network that can be partitioned and efficiently scheduled on the target hardware platform, minimizing data movement and improving the overall throughput and energy efficiency. To further streamline the design, we leverage a high-level, flexible SoC generator platform based on the RISC-V RoCC framework. This integration allows seamless extensions of the RISC-V instruction set and rapid, Chisel-based generator design. Using this approach, we implemented a silicon prototype in a 16 nm TSMC process node, achieving a record processing efficiency of up to 18 TOPS/W.
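As a rough illustration of what hardware-aware structured sparsity can look like, the sketch below zeroes whole weight tiles so that the surviving tiles map cleanly onto a fixed-size compute array. It is not the method of the paper: the function name `prune_tiles`, the tile size, the L1-norm scoring rule, and the `keep_ratio` parameter are all assumptions made for this example.

```python
# Illustrative only: tile-wise magnitude pruning, NOT the paper's algorithm.
# The tile size and keep_ratio are hypothetical parameters for this sketch.

def prune_tiles(weights, tile, keep_ratio):
    """Zero whole tile x tile blocks with the smallest L1 norm.

    weights: list of equal-length rows whose dimensions are multiples of tile.
    Returns (pruned_weights, sorted coordinates of the kept tiles).
    """
    rows, cols = len(weights), len(weights[0])
    assert rows % tile == 0 and cols % tile == 0, "shape must align to tile"

    # Score each tile by the sum of the absolute weights it contains.
    scores = {}
    for r in range(0, rows, tile):
        for c in range(0, cols, tile):
            scores[(r, c)] = sum(
                abs(weights[i][j])
                for i in range(r, r + tile)
                for j in range(c, c + tile)
            )

    # Keep the highest-scoring tiles; at least one tile always survives.
    n_keep = max(1, round(keep_ratio * len(scores)))
    kept = set(sorted(scores, key=scores.get, reverse=True)[:n_keep])

    # Copy the matrix, zeroing every weight whose tile was not kept.
    pruned = [
        [w if ((i // tile) * tile, (j // tile) * tile) in kept else 0.0
         for j, w in enumerate(row)]
        for i, row in enumerate(weights)
    ]
    return pruned, sorted(kept)


W = [
    [1.0, 1.0, 0.1, 0.1],
    [1.0, 1.0, 0.1, 0.1],
    [0.2, 0.2, 3.0, 3.0],
    [0.2, 0.2, 3.0, 3.0],
]
pruned, kept = prune_tiles(W, tile=2, keep_ratio=0.5)
```

Because pruning decisions are made per tile rather than per weight, a scheduler only needs to fetch and process the kept tiles, which is one plausible way that sparsity can reduce data movement on a tiled accelerator.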


research
08/29/2017

CirCNN: Accelerating and Compressing Deep Neural Networks Using Block-Circulant Weight Matrices

Large-scale deep neural networks (DNNs) are both compute and memory inte...
research
04/21/2020

PMEvo: Portable Inference of Port Mappings for Out-of-Order Processors by Evolutionary Optimization

Achieving peak performance in a computer system requires optimizations i...
research
08/17/2021

Edge AI without Compromise: Efficient, Versatile and Accurate Neurocomputing in Resistive Random-Access Memory

Realizing today's cloud-level artificial intelligence functionalities di...
research
06/28/2021

HALF: Holistic Auto Machine Learning for FPGAs

Deep Neural Networks (DNNs) are capable of solving complex problems in d...
research
09/20/2021

Towards Energy-Efficient and Secure Edge AI: A Cross-Layer Framework

The security and privacy concerns along with the amount of data that is ...
research
08/23/2022

Adaptation of MobileNetV2 for Face Detection on Ultra-Low Power Platform

Designing Deep Neural Networks (DNNs) running on edge hardware remains a...
research
02/13/2023

The Framework Tax: Disparities Between Inference Efficiency in Research and Deployment

Increased focus on the deployment of machine learning systems has led to...
