A Metaprogramming and Autotuning Framework for Deploying Deep Learning Applications

11/21/2016
by   Matthew W. Moskewicz, et al.
0

In recent years, deep neural networks (DNNs), have yielded strong results on a wide range of applications. Graphics Processing Units (GPUs) have been one key enabling factor leading to the current popularity of DNNs. However, despite increasing hardware flexibility and software programming toolchain maturity, high efficiency GPU programming remains difficult: it suffers from high complexity, low productivity, and low portability. GPU vendors such as NVIDIA have spent enormous effort to write special-purpose DNN libraries. However, on other hardware targets, especially mobile GPUs, such vendor libraries are not generally available. Thus, the development of portable, open, high-performance, energy-efficient GPU code for DNN operations would enable broader deployment of DNN-based algorithms. Toward this end, this work presents a framework to enable productive, high-efficiency GPU programming for DNN computations across hardware platforms and programming models. In particular, the framework provides specific support for metaprogramming, autotuning, and DNN-tailored data types. Using our framework, we explore implementing DNN operations on three different hardware targets: NVIDIA, AMD, and Qualcomm GPUs. On NVIDIA GPUs, we show both portability between OpenCL and CUDA as well competitive performance compared to the vendor library. On Qualcomm GPUs, we show that our framework enables productive development of target-specific optimizations, and achieves reasonable absolute performance. Finally, On AMD GPUs, we show initial results that indicate our framework can yield reasonable performance on a new platform with minimal effort.

READ FULL TEXT

page 3

page 4

page 8

page 10

research
04/08/2019

Accelerated Neural Networks on OpenCL Devices Using SYCL-DNN

Over the past few years machine learning has seen a renewed explosion of...
research
03/08/2023

SMaLL: A Software Framework for portable Machine Learning Libraries

Interest in deploying Deep Neural Network (DNN) inference on edge device...
research
06/01/2016

Boda-RTC: Productive Generation of Portable, Efficient Code for Convolutional Neural Networks on Mobile Computing Platforms

The popularity of neural networks (NNs) spans academia, industry, and po...
research
04/05/2016

dMath: A Scalable Linear Algebra and Math Library for Heterogeneous GP-GPU Architectures

A new scalable parallel math library, dMath, is presented in this paper ...
research
07/03/2019

A Unified Optimization Approach for CNN Model Inference on Integrated GPUs

Modern deep learning applications urge to push the model inference takin...
research
06/19/2022

Productive Reproducible Workflows for DNNs: A Case Study for Industrial Defect Detection

As Deep Neural Networks (DNNs) have become an increasingly ubiquitous wo...
research
05/16/2022

AnySeq/GPU: A Novel Approach for Faster Sequence Alignment on GPUs

In recent years, the rapidly increasing number of reads produced by next...

Please sign up or login with your details

Forgot password? Click here to reset