Optimizing CNN Model Inference on CPUs

09/07/2018

The popularity of Convolutional Neural Network (CNN) models and the ubiquity of CPUs imply that better performance of CNN model inference on CPUs can deliver significant gains to a large number of users. The current approach to improving the performance of CNN model inference on CPUs relies on a hardware-specific library of low-level operations, such as Intel MKL-DNN, combined with some basic model-level optimizations. This approach is restrictive and misses the opportunity to optimize the end-to-end inference pipeline as a whole. This paper proposes a more comprehensive approach to CNN model inference on CPUs that employs a full-stack, systematic scheme of operation-level and model-level optimizations coupled with efficient data layout transformations. Experiments show that our solution achieves up to 2.81x better latency for CNN model inference on an 18-core Intel Platinum 8000-series CPU compared to the state-of-the-art implementations using Intel MKL-DNN.
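To make "data layout transformation" concrete, here is a minimal sketch in plain Python. The abstract does not say which layouts are used, so the channel-blocked layout shown here (often written NCHW[x]c, where the channel dimension is split into blocks of x so that x consecutive channel values sit next to each other in memory) is an assumption chosen for illustration. The function names `nchw_to_blocked` and `blocked_to_nchw` are hypothetical and are not taken from the paper or from any library.

```python
def nchw_to_blocked(data, n, c, h, w, block):
    """Reorder a flat NCHW buffer into a flat channel-blocked buffer.

    Illustrative only: assumes the channel count c is divisible by block.
    Output order is (n, c // block, h, w, block).
    """
    if c % block != 0:
        raise ValueError("channel count must be divisible by block size")
    c_outer = c // block
    out = [0] * len(data)
    for ni in range(n):
        for ci in range(c):
            # Split the channel index into (outer block, position in block).
            co, c_in = divmod(ci, block)
            for hi in range(h):
                for wi in range(w):
                    src = ((ni * c + ci) * h + hi) * w + wi
                    dst = (((ni * c_outer + co) * h + hi) * w + wi) * block + c_in
                    out[dst] = data[src]
    return out


def blocked_to_nchw(data, n, c, h, w, block):
    """Inverse of nchw_to_blocked: restore the plain NCHW order."""
    if c % block != 0:
        raise ValueError("channel count must be divisible by block size")
    c_outer = c // block
    out = [0] * len(data)
    for ni in range(n):
        for ci in range(c):
            co, c_in = divmod(ci, block)
            for hi in range(h):
                for wi in range(w):
                    src = (((ni * c_outer + co) * h + hi) * w + wi) * block + c_in
                    dst = ((ni * c + ci) * h + hi) * w + wi
                    out[dst] = data[src]
    return out


# One image, four channels, 1x2 spatial size, channel block of 2.
tensor = list(range(8))
blocked = nchw_to_blocked(tensor, n=1, c=4, h=1, w=2, block=2)
restored = blocked_to_nchw(blocked, n=1, c=4, h=1, w=2, block=2)
```

Each such conversion copies the whole tensor, so a pipeline that converts at every operation boundary pays that cost repeatedly. This is one plausible reason to plan layouts across the whole model rather than per operation.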
