Applying the Roofline model for Deep Learning performance optimizations

09/23/2020
by Jacek Czaja et al.

In this paper we present a methodology for automatically creating Roofline models for Non-Uniform Memory Access (NUMA) architectures, using an Intel Xeon platform as an example. Finally, we present an evaluation of highly efficient deep learning primitives as implemented in the Intel oneDNN library.
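The core of a Roofline analysis like the one described above is that attainable throughput is bounded by either the compute ceiling or by memory bandwidth times arithmetic intensity. A minimal sketch of that bound is below; the peak-FLOPS and bandwidth figures are illustrative assumptions, not measurements from the paper:

```python
def roofline(peak_gflops, mem_bw_gbs, arithmetic_intensity):
    """Attainable GFLOP/s under the Roofline model:
    min(compute ceiling, memory bandwidth * arithmetic intensity)."""
    return min(peak_gflops, mem_bw_gbs * arithmetic_intensity)

# Illustrative (assumed) machine parameters for a hypothetical Xeon socket:
PEAK_GFLOPS = 3000.0   # compute ceiling, GFLOP/s
MEM_BW_GBS = 100.0     # DRAM bandwidth, GB/s

# Ridge point: the arithmetic intensity (FLOP/byte) at which a kernel
# stops being memory-bound and becomes compute-bound.
ridge = PEAK_GFLOPS / MEM_BW_GBS

for ai in (1.0, ridge, 100.0):
    print(f"AI={ai:6.1f} FLOP/byte -> "
          f"{roofline(PEAK_GFLOPS, MEM_BW_GBS, ai):.1f} GFLOP/s")
```

On a NUMA system, the effective `MEM_BW_GBS` differs between local and remote memory accesses, which is why the paper builds per-NUMA-node Roofline models rather than a single one.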

Related research

- 01/24/2018 · Intel nGraph: An Intermediate Representation, Compiler, and Executor for Deep Learning. "The Deep Learning (DL) community sees many novel topologies published ea…"
- 02/06/2020 · PolyScientist: Automatic Loop Transformations Combined with Microkernels for Optimization of Deep Learning Primitives. "At the heart of deep learning training and inferencing are computational…"
- 04/28/2019 · Softmax Optimizations for Intel Xeon Processor-based Platforms. "Softmax is a popular normalization method used in machine learning. Deep l…"
- 09/25/2017 · Deep Learning Based Cryptographic Primitive Classification. "Cryptovirological augmentations present an immediate, incomparable threa…"
- 11/06/2015 · Evaluation of the Intel Xeon Phi and NVIDIA K80 as accelerators for two-dimensional panel codes. "To predict the properties of fluid flow over a solid geometry is an impo…"
- 05/16/2017 · Intel RealSense Stereoscopic Depth Cameras. "We present a comprehensive overview of the stereoscopic Intel RealSense …"
- 01/10/2022 · Studying the Potential of Automatic Optimizations in the Intel FPGA SDK for OpenCL. "High Level Synthesis (HLS) tools, like the Intel FPGA SDK for OpenCL, im…"
