Woodpecker-DL: Accelerating Deep Neural Networks via Hardware-Aware Multifaceted Optimizations

08/11/2020
by   Yongchao Liu, et al.

Accelerating deep model training and inference is crucial in practice. Existing deep learning frameworks usually concentrate on optimizing training speed and pay less attention to inference-specific optimizations. Yet model inference differs from training in its computation: for example, parameters are updated at every gradient step during training but remain fixed during inference. These special characteristics of model inference open new opportunities for optimization. In this paper, we propose a hardware-aware optimization framework, namely Woodpecker-DL (WPK), that accelerates inference through multiple joint optimizations spanning graph optimization, automated search, domain-specific language (DSL) compiler techniques, and system-level exploration. In WPK, we investigate two new automated search approaches, based on a genetic algorithm and on reinforcement learning respectively, to find the best operator code configurations for specific hardware. A customized DSL compiler is attached to these search algorithms to generate efficient code. To create an optimized inference plan, WPK systematically explores high-speed operator implementations from third-party libraries in addition to our automatically generated code, and singles out the best implementation per operator. Extensive experiments show that on a Tesla P100 GPU we achieve a maximum speedup of 5.40x over cuDNN and 1.63x over TVM on individual convolution operators, and run up to 1.18x faster than TensorRT for end-to-end model inference.
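To make the automated search concrete, below is a minimal sketch (not WPK's actual implementation) of a genetic-algorithm search over operator code configurations: configurations such as tile sizes and unroll factors are encoded as genes, candidates are scored by measured latency, and the fittest are recombined and mutated. The search space and the measure_latency stub are hypothetical; a real system would compile each configuration with the DSL compiler and time it on the target GPU.

# Minimal, hypothetical GA sketch for operator config search (not WPK code).
import random

# Hypothetical search space: tile sizes and unroll flag for a conv kernel.
SPACE = {
    "tile_x": [1, 2, 4, 8, 16],
    "tile_y": [1, 2, 4, 8, 16],
    "unroll": [0, 1],
}

def random_config():
    return {k: random.choice(v) for k, v in SPACE.items()}

def measure_latency(config):
    # Placeholder cost model: a real system would compile the operator
    # with the DSL compiler and time it on the target hardware.
    return 1.0 / (config["tile_x"] * config["tile_y"] + config["unroll"])

def crossover(a, b):
    # Uniform crossover: each gene is taken from one of the two parents.
    return {k: random.choice([a[k], b[k]]) for k in SPACE}

def mutate(cfg, rate=0.2):
    # Resample each gene with probability `rate`.
    return {k: (random.choice(SPACE[k]) if random.random() < rate else v)
            for k, v in cfg.items()}

def ga_search(pop_size=20, generations=10, elite=4):
    pop = [random_config() for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=measure_latency)  # lower latency is better
        parents = scored[:elite]
        children = [mutate(crossover(*random.sample(parents, 2)))
                    for _ in range(pop_size - elite)]
        pop = parents + children
    return min(pop, key=measure_latency)

best = ga_search()
print("best config:", best, "latency:", measure_latency(best))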
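The same measure-and-pick principle drives the per-operator backend selection used to assemble the inference plan: benchmark every candidate implementation of an operator and keep the fastest. The sketch below is likewise hypothetical; the candidate callables are stand-ins for real cuDNN, TVM, and WPK-generated kernels, which would all compute the same operator with fixed input shapes.

# Minimal, hypothetical sketch of per-operator implementation selection.
import time

def run_candidate(impl, warmup=3, repeats=10):
    # Warm up, then return average wall-clock time per call.
    for _ in range(warmup):
        impl()
    start = time.perf_counter()
    for _ in range(repeats):
        impl()
    return (time.perf_counter() - start) / repeats

def select_best(candidates):
    # candidates: mapping from backend name to a zero-arg callable that
    # executes one operator (e.g., a conv) with fixed shapes.
    timings = {name: run_candidate(fn) for name, fn in candidates.items()}
    best = min(timings, key=timings.get)
    return best, timings

# Stand-in workloads in place of real backend kernels:
candidates = {
    "cudnn":   lambda: sum(range(1000)),
    "tvm":     lambda: sum(range(2000)),
    "wpk_gen": lambda: sum(range(500)),
}
best, timings = select_best(candidates)
print("best backend:", best, timings)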


