Scheduling Computation Graphs of Deep Learning Models on Manycore CPUs

07/16/2018
by Linpeng Tang, et al.

For a deep learning model, efficient execution of its computation graph is key to achieving high performance. Previous work has focused on improving the performance of individual nodes in the computation graph while ignoring parallelization of the graph as a whole. However, we observe that running multiple operations simultaneously without interference is critical to executing parallelizable small operations efficiently. Attempts to execute computation graphs in parallel in existing deep learning frameworks typically incur heavy resource contention among concurrent operations, leading to inferior performance on manycore CPUs. To address these issues, we propose Graphi, a generic, high-performance execution engine that efficiently executes a computation graph in parallel on manycore CPUs. Specifically, Graphi minimizes interference on both software and hardware resources, discovers the best parallel setting with a profiler, and further optimizes graph execution with critical-path-first scheduling. Our experiments show that parallel execution consistently outperforms sequential execution. Training times on four different neural networks with Graphi are 2.1x to 9.5x faster than with TensorFlow on a 68-core Intel Xeon Phi processor.
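The abstract does not give Graphi's algorithm in detail, but critical-path-first scheduling is a standard list-scheduling idea: rank each operation by the longest remaining path of execution cost to a sink, and whenever a worker is free, dispatch the ready operation with the highest rank. The sketch below illustrates it on a made-up toy graph; the node names and costs are hypothetical (in Graphi, the graph would come from the model and the costs from its profiler).

```python
import heapq
from collections import defaultdict
from functools import lru_cache

# Toy computation graph: op -> ops that consume its output.
# Names and costs are illustrative only, not from the paper.
graph = {
    "conv1": ["relu1"],
    "relu1": ["conv2", "conv3"],
    "conv2": ["add"],
    "conv3": ["add"],
    "add": [],
}
cost = {"conv1": 4, "relu1": 1, "conv2": 3, "conv3": 2, "add": 1}

@lru_cache(maxsize=None)
def cp_length(op):
    """Critical-path length: op's own cost plus the longest path to a sink."""
    return cost[op] + max((cp_length(s) for s in graph[op]), default=0)

def simulate(num_workers):
    """Greedy list scheduling: whenever a worker is free, dispatch the
    ready op with the longest remaining critical path. Returns makespan."""
    indeg = defaultdict(int)
    for succs in graph.values():
        for s in succs:
            indeg[s] += 1
    ready = [(-cp_length(op), op) for op in graph if indeg[op] == 0]
    heapq.heapify(ready)
    running = []  # min-heap of (finish_time, op)
    clock = 0
    while ready or running:
        while ready and len(running) < num_workers:
            _, op = heapq.heappop(ready)
            heapq.heappush(running, (clock + cost[op], op))
        clock, op = heapq.heappop(running)  # advance to next completion
        for s in graph[op]:
            indeg[s] -= 1
            if indeg[s] == 0:
                heapq.heappush(ready, (-cp_length(s), s))
    return clock

print(simulate(1), simulate(2))  # prints "11 9"
```

On this toy graph, two workers finish in 9 time units versus 11 sequentially, because the scheduler runs the independent `conv2` and `conv3` concurrently and prefers the more critical `conv2` when a worker must choose.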


Related research

- Nimble: Lightweight and Parallel GPU Task Scheduling for Deep Learning (12/04/2020)
- Runtime Concurrency Control and Operation Scheduling for High Performance Neural Network Training (10/21/2018)
- Improving the Expressiveness of Deep Learning Frameworks with Recursion (09/04/2018)
- RLgraph: Flexible Computation Graphs for Deep Reinforcement Learning (10/21/2018)
- A Generic Performance Model for Deep Learning in a Distributed Environment (05/19/2023)
- Efficient Embedding of MPI Collectives in MXNET DAGs for scaling Deep Learning (02/20/2018)
- CHAOS: A Parallelization Scheme for Training Convolutional Neural Networks on Intel Xeon Phi (02/25/2017)
