Optimizing CNN Model Inference on CPUs

09/07/2018
by   Yizhi Liu, et al.

The popularity of Convolutional Neural Network (CNN) models and the ubiquity of CPUs mean that better performance of CNN model inference on CPUs can deliver significant gains to a large number of users. The current approach to improving the performance of CNN model inference on CPUs relies on a hardware-specific library of low-level operations, such as Intel MKL-DNN, plus a few basic model-level optimizations. This is restrictive and misses the opportunity to optimize the end-to-end inference pipeline as a whole. This paper proposes a more comprehensive approach to CNN model inference on CPUs that employs a full-stack, systematic scheme of operation-level and model-level optimizations coupled with efficient data layout transformations. Experiments show that our solution achieves up to 2.81x better latency for CNN model inference on an 18-core Intel Platinum 8000-series CPU compared to state-of-the-art implementations using Intel MKL-DNN.
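To make the "efficient data layout transformations" concrete: a common CPU-oriented optimization is to re-block the channel dimension of an NCHW activation tensor into an NCHW[x]c layout, so that SIMD lanes read contiguous channel data. The sketch below is a hypothetical NumPy illustration of that idea (the function name, block size, and tensor shapes are our own choices, not taken from the paper):

```python
import numpy as np

def nchw_to_nchwc(tensor, c_block):
    """Re-block NCHW into NCHW[x]c: split C into (C // c_block, c_block)
    and move the small channel block innermost, so vectorized kernels
    can load c_block contiguous values per SIMD instruction."""
    n, c, h, w = tensor.shape
    assert c % c_block == 0, "channel count must be divisible by the block size"
    return (tensor
            .reshape(n, c // c_block, c_block, h, w)
            .transpose(0, 1, 3, 4, 2))  # -> N, C_outer, H, W, c_block

# Example: 16 channels blocked by 8 (matching, e.g., an 8-wide vector unit).
x = np.arange(2 * 16 * 4 * 4, dtype=np.float32).reshape(2, 16, 4, 4)
y = nchw_to_nchwc(x, c_block=8)
print(y.shape)  # (2, 2, 4, 4, 8)
```

In the blocked layout, `y[n, co, h, w, :]` holds channels `co*8 .. co*8+7` of the original tensor at one spatial position, which is exactly the access pattern a vectorized convolution inner loop wants.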


Related research

01/24/2018
Intel nGraph: An Intermediate Representation, Compiler, and Executor for Deep Learning
The Deep Learning (DL) community sees many novel topologies published ea...

07/03/2019
A Unified Optimization Approach for CNN Model Inference on Integrated GPUs
Modern deep learning applications urge to push the model inference takin...

03/08/2022
A Compilation Flow for the Generation of CNN Inference Accelerators on FPGAs
We present a compilation flow for the generation of CNN inference accele...

02/20/2019
DNNVM: End-to-End Compiler Leveraging Heterogeneous Optimizations on FPGA-based CNN Accelerators
The convolutional neural network (CNN) has become a state-of-the-art met...

02/06/2013
A Standard Approach for Optimizing Belief Network Inference using Query DAGs
This paper proposes a novel, algorithm-independent approach to optimizin...

11/01/2022
Strategies for Optimizing End-to-End Artificial Intelligence Pipelines on Intel Xeon Processors
End-to-end (E2E) artificial intelligence (AI) pipelines are composed of ...

06/13/2018
SIMD Vectorization for the Lennard-Jones Potential with AVX2 and AVX-512 Instructions
This work describes the SIMD vectorization of the force calculation of t...
