A Unified Optimization Approach for CNN Model Inference on Integrated GPUs

07/03/2019
by   Leyuan Wang, et al.

Modern deep learning applications increasingly push model inference to edge devices, for reasons such as achieving lower latency, relieving the burden on the network connection to the cloud, and protecting user privacy. The Convolutional Neural Network (CNN) is one of the most widely used model families in these applications. Given the high computational complexity of CNN models, it is favorable to execute them on the integrated GPUs of edge devices, which are ubiquitous and offer more compute power and better energy efficiency than the accompanying CPUs. However, programming integrated GPUs efficiently is challenging due to the variety of their architectures and programming interfaces. This paper proposes an end-to-end solution for executing CNN model inference on integrated GPUs at the edge, which uses a unified IR to represent and optimize vision-specific operators on integrated GPUs from multiple vendors, and leverages machine learning-based scheduling search schemes to optimize computationally intensive operators such as convolution. Our solution also provides a fallback mechanism for operators that are not suitable or convenient to run on GPUs. The evaluation results show that, compared to state-of-the-art solutions backed by vendor-provided high-performance libraries on Intel Graphics, the ARM Mali GPU, and the Nvidia integrated Maxwell GPU, our solution achieves similar or better (up to 1.62×) performance on a number of popular image classification and object detection models. In addition, our solution has wider model coverage and is more flexible in embracing new models. It has been adopted in production services at AWS and is open-sourced.
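The scheduling search mentioned in the abstract can be illustrated with a toy sketch: a random search over convolution tile-size candidates, scored by a simple analytic cost model. Everything here is an illustrative assumption, not the paper's actual implementation: the function names, the 16 KB local-memory budget, and the cost model (which penalizes tiles that divide the feature map unevenly and tiles whose halo footprint exceeds the assumed local memory) are all made up for the sketch; real systems of this kind measure or learn the cost of candidate schedules instead.

```python
import random

def conv_cost(tile_h, tile_w, H=56, W=56):
    """Hypothetical analytic cost model for a tiled 3x3 convolution.

    Penalizes tiles that divide the HxW feature map unevenly (wasted
    boundary work) and tiles whose haloed footprint would overflow a
    hypothetical 16 KB local memory (4-byte floats, 1-pixel halo).
    """
    # Wasted work from partial tiles at the feature-map boundary.
    waste = ((-H) % tile_h) * W + ((-W) % tile_w) * H
    # Local-memory footprint of one tile, including the 3x3 halo.
    footprint = 4 * (tile_h + 2) * (tile_w + 2)
    penalty = 1e6 if footprint > 16 * 1024 else 0
    return H * W + waste + penalty

def search_schedule(trials=200, seed=0):
    """Random search over tile-size candidates, keeping the cheapest."""
    rng = random.Random(seed)
    candidates = [1, 2, 4, 7, 8, 14, 16, 28, 32, 56]
    best, best_cost = None, float("inf")
    for _ in range(trials):
        th, tw = rng.choice(candidates), rng.choice(candidates)
        cost = conv_cost(th, tw)
        if cost < best_cost:
            best, best_cost = (th, tw), cost
    return best, best_cost
```

Under this cost model, the search settles on tile sizes that divide the 56×56 feature map evenly, which is the kind of structural preference a learned cost model would discover from measurements rather than have hard-coded.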


research · 09/07/2018
Optimizing CNN Model Inference on CPUs
The popularity of Convolutional Neural Network (CNN) models and the ubiq...

research · 11/21/2016
A Metaprogramming and Autotuning Framework for Deploying Deep Learning Applications
In recent years, deep neural networks (DNNs) have yielded strong result...

research · 02/21/2022
Online Learning for Orchestration of Inference in Multi-User End-Edge-Cloud Networks
Deep-learning-based intelligent services have become prevalent in cyber-...

research · 03/09/2023
GPU-enabled Function-as-a-Service for Machine Learning Inference
Function-as-a-Service (FaaS) is emerging as an important cloud computing...

research · 04/30/2023
Multi-directional Sobel operator kernel on GPUs
Sobel is one of the most popular edge detection operators used in image ...

research · 06/03/2021
Exploiting co-execution with oneAPI: heterogeneity from a modern perspective
Efficiently programming heterogeneous systems is a major challenge, due ...

research · 01/26/2022
Unlocking Personalized Healthcare on Modern CPUs/GPUs: Three-way Gene Interaction Study
Developments in Genome-Wide Association Studies have led to the increasi...
