Deep Learning Models on CPUs: A Methodology for Efficient Training

06/20/2022
by Quchen Fu, et al.

GPUs have been favored for training deep learning models because of their highly parallel architecture, and as a result most studies of training optimization focus on GPUs. There is often a trade-off, however, between cost and efficiency when choosing hardware for training. In particular, CPU servers would be attractive if training on CPUs were more efficient, since they incur fewer hardware-upgrade costs and make better use of existing infrastructure. This paper makes several contributions to research on training deep learning models on CPUs. First, it presents a method for optimizing the training of deep learning models on Intel CPUs, along with a toolkit called ProfileDNN, which we developed to improve performance profiling. Second, we describe a generic training optimization method that guides our workflow, and we explore several case studies in which we identified performance issues and then optimized the Intel Extension for PyTorch, yielding an overall 2x training performance increase for the RetinaNet-ResNext50 model. Third, we show how to leverage the visualization capabilities of ProfileDNN, which enabled us to pinpoint bottlenecks and create a custom focal loss kernel that is two times faster than the official reference PyTorch implementation.
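The abstract does not include code, but the optimization workflow it describes centers on the Intel Extension for PyTorch (IPEX). The following is a minimal sketch of how IPEX is typically applied to a CPU training loop; the toy model, SGD optimizer, and bfloat16 mixed-precision settings are illustrative assumptions, not the paper's RetinaNet-ResNext50 configuration.

```python
import torch
import intel_extension_for_pytorch as ipex  # pip install intel_extension_for_pytorch

# Illustrative model and optimizer; the paper trains RetinaNet-ResNext50,
# but any torch.nn.Module is handed to ipex.optimize the same way.
model = torch.nn.Sequential(
    torch.nn.Linear(128, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 10),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = torch.nn.CrossEntropyLoss()

model.train()
# ipex.optimize applies CPU-specific optimizations (operator fusion,
# weight layout changes, optional bf16 mixed precision) and returns
# the optimized model/optimizer pair for training.
model, optimizer = ipex.optimize(model, optimizer=optimizer, dtype=torch.bfloat16)

for _ in range(10):  # toy training loop on random data
    data = torch.randn(32, 128)
    target = torch.randint(0, 10, (32,))
    optimizer.zero_grad()
    # CPU autocast pairs with dtype=torch.bfloat16 passed above.
    with torch.cpu.amp.autocast():
        output = model(data)
        loss = criterion(output, target)
    loss.backward()
    optimizer.step()
```

Profiling such a loop before and after `ipex.optimize` (for example with `torch.profiler`, or with the paper's ProfileDNN toolkit) is the kind of comparison the described workflow relies on.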
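The custom focal loss kernel itself is not reproduced in the abstract. For context, the quantity it accelerates is the sigmoid focal loss of Lin et al. (2017), the loss used by RetinaNet; the plain-PyTorch sketch below mirrors the reference computation (as in torchvision.ops.sigmoid_focal_loss) and is not the paper's optimized CPU kernel. The alpha and gamma defaults are the standard ones from the focal loss paper.

```python
import torch

def sigmoid_focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Reference sigmoid focal loss for binary targets in {0, 1}.

    This is the computation the paper's custom kernel speeds up; it mirrors
    the stock PyTorch/torchvision reference, not the optimized kernel.
    """
    p = torch.sigmoid(logits)
    ce = torch.nn.functional.binary_cross_entropy_with_logits(
        logits, targets, reduction="none")
    p_t = p * targets + (1 - p) * (1 - targets)  # probability of the true class
    loss = ce * (1 - p_t) ** gamma               # down-weight easy examples
    if alpha >= 0:
        alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
        loss = alpha_t * loss
    return loss.mean()

# Toy usage: 8 anchors, 4 classes, random logits and binary targets.
logits = torch.randn(8, 4)
targets = torch.randint(0, 2, (8, 4)).float()
print(sigmoid_focal_loss(logits, targets))
```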
