A Study on the Intersection of GPU Utilization and CNN Inference

12/15/2022
by Jack Kosaian, et al.

There has been significant progress in developing neural network architectures that achieve both high predictive performance and high application-level inference throughput (e.g., frames per second). Another metric of increasing importance is GPU utilization during inference: a measure of how well a deployed neural network uses the computational capabilities of the GPU on which it runs. Achieving high GPU utilization is critical to increasing application-level throughput and to ensuring a good return on investment for deploying GPUs. This paper analyzes the GPU utilization of convolutional neural network (CNN) inference. We first survey the GPU utilization of CNNs and show that there is room to improve the GPU utilization of many of them. We then investigate the GPU utilization of networks within a neural architecture search (NAS) search space, and explore how GPU utilization could be used as a metric to accelerate NAS itself. Our study makes the case that there is room to improve the inference-time GPU utilization of CNNs, and that knowledge of GPU utilization has the potential to benefit even applications that do not target utilization itself. We hope that the results of this study will spur future innovation in designing GPU-efficient neural networks.
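The abstract describes GPU utilization only informally. One common way to make it concrete, assumed here rather than taken from the paper, is to divide the arithmetic throughput a network actually achieves (FLOPs per inference times inferences per second) by the GPU's peak arithmetic throughput. The function name `gpu_utilization` and all numbers below are hypothetical and for illustration only.

```python
def gpu_utilization(flops_per_inference: float,
                    inferences_per_second: float,
                    peak_flops_per_second: float) -> float:
    """Return achieved FLOP/s as a fraction of the GPU's peak FLOP/s.

    Assumption: utilization = achieved arithmetic throughput / peak
    arithmetic throughput. FLOP counts and the peak must use the same
    convention (e.g., one multiply-accumulate counted as 2 FLOPs).
    """
    if min(flops_per_inference, inferences_per_second, peak_flops_per_second) <= 0:
        raise ValueError("all inputs must be positive")
    # Total floating-point operations completed per second during inference.
    achieved_flops_per_second = flops_per_inference * inferences_per_second
    return achieved_flops_per_second / peak_flops_per_second


if __name__ == "__main__":
    # Hypothetical CNN: 4 GFLOPs per image, served at 2,000 images/s
    # on a GPU with a hypothetical 125 TFLOP/s peak. Not results from the paper.
    util = gpu_utilization(4e9, 2000, 125e12)
    print(f"utilization: {util:.1%}")  # utilization: 6.4%
```

Under this definition, two networks with the same throughput in frames per second can differ widely in utilization, because a network that needs fewer FLOPs per image leaves more of the GPU's arithmetic capacity unused at the same frame rate.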

