U-Boost NAS: Utilization-Boosted Differentiable Neural Architecture Search

03/23/2022
by Ahmet Caner Yüzügüler, et al.

Optimizing resource utilization on target platforms is key to achieving high performance during DNN inference. While optimizations have been proposed for inference latency, memory footprint, and energy consumption, prior hardware-aware neural architecture search (NAS) methods have omitted resource utilization, preventing DNNs from taking full advantage of target inference platforms. Modeling resource utilization efficiently and accurately is challenging, especially for widely used array-based inference accelerators such as the Google TPU. In this work, we propose a novel hardware-aware NAS framework that optimizes not only for task accuracy and inference latency but also for resource utilization. We also propose and validate a new computational model of resource utilization in inference accelerators. Using the proposed NAS framework together with the proposed resource utilization model, we achieve a 2.8-4x speedup in DNN inference over prior hardware-aware NAS methods while attaining similar or better accuracy in image classification on the CIFAR-10 and ImageNet-100 datasets.
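The abstract says the framework includes a computational model of resource utilization for array-based accelerators such as the TPU, but does not spell that model out. Below is a minimal, hypothetical sketch of the kind of tile-based utilization estimate such a model might build on: it assumes an output-stationary A x A systolic array, ignores pipeline fill/drain, and uses an illustrative default array width of 128; the function name, tiling scheme, and example layer shapes are assumptions for illustration, not the paper's actual formulation.

```python
"""Hedged sketch: a simple utilization estimate for an A x A systolic array.

This is NOT the paper's exact computational model; it is an illustrative
estimate under common assumptions about output-stationary tiling of a
(M x K) x (K x N) matrix multiply (e.g., a fully connected or im2col'd
convolution layer).
"""
import math


def systolic_utilization(M: int, K: int, N: int, array_dim: int = 128) -> float:
    """Estimate compute utilization of a square systolic array.

    Assumptions (hypothetical, for illustration only):
      * The matmul is tiled into ceil(M/array_dim) x ceil(N/array_dim)
        output tiles; each tile occupies the full array for K cycles.
      * Pipeline fill/drain overhead is ignored.
    """
    tiles_m = math.ceil(M / array_dim)
    tiles_n = math.ceil(N / array_dim)
    # Cycles spent, counting every tile as a full-array pass.
    busy_cycles = tiles_m * tiles_n * K
    # MAC operations the array could perform in that time.
    peak_macs = busy_cycles * array_dim * array_dim
    # MAC operations the layer actually needs.
    useful_macs = M * K * N
    return useful_macs / peak_macs


if __name__ == "__main__":
    # A layer whose output dimension (N=96) underfills a 128-wide array
    # wastes about a quarter of the columns, which this estimate captures.
    print(f"util = {systolic_utilization(M=1024, K=256, N=96):.2f}")  # ~0.75
```

In a differentiable NAS setting of the kind the abstract describes, a utilization term like this (made smooth over candidate layer widths) could be combined with the task loss and a latency estimate in one objective, so that the search is steered toward architectures that keep the accelerator array busy; how U-Boost NAS actually smooths and weights these terms is detailed in the full paper, not here.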


Related research

08/01/2021 · FLASH: Fast Neural Architecture Search with Hardware Optimization
Neural architecture search (NAS) is a promising technique to design effi...

12/15/2022 · A Study on the Intersection of GPU Utilization and CNN Inference
There has been significant progress in developing neural network archite...

06/17/2019 · Hardware Aware Neural Network Architectures using FbNet
We implement a differentiable Neural Architecture Search (NAS) method in...

05/21/2020 · AOWS: Adaptive and optimal network width search with latency constraints
Neural architecture search (NAS) approaches aim at automatically finding...

09/12/2023 · Harmonic-NAS: Hardware-Aware Multimodal Neural Architecture Search on Resource-constrained Devices
The recent surge of interest surrounding Multimodal Neural Networks (MM-...

05/20/2019 · DARC: Differentiable ARchitecture Compression
In many learning situations, resources at inference time are significant...

08/28/2020 · Fifer: Tackling Underutilization in the Serverless Era
Datacenters are witnessing a rapid surge in the adoption of serverless f...
