Runtime Concurrency Control and Operation Scheduling for High Performance Neural Network Training

10/21/2018
by Jiawen Liu, et al.

Training a neural network typically relies on a machine learning framework such as TensorFlow or Caffe2. These frameworks employ a dataflow model in which training is represented as a directed graph: operations are implemented by the framework as primitives and appear as nodes in the graph. Training a neural network model in a dataflow-based framework therefore involves a large number of fine-grained operations with diverse memory access patterns and computation intensity. Managing and scheduling these operations is challenging, because we must decide how many threads to use for each operation (concurrency control) and schedule the operations for good hardware utilization and system throughput. In this paper, we extend an existing runtime system (the TensorFlow runtime) to enable automatic concurrency control and scheduling of operations. We explore performance modeling to predict the performance of operations under varying degrees of thread-level parallelism; our performance model is highly accurate and lightweight. Leveraging this model, our runtime system employs a set of scheduling strategies that co-run operations to improve hardware utilization and system throughput. Our runtime system demonstrates a substantial performance benefit: compared with the recommended configurations for concurrency control and operation scheduling in TensorFlow, our approach improves performance (execution time) by 33% on average (up to 49%) across neural network models, and comes close to the optimal performance obtained manually by the user.
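The concurrency-control baseline referenced in the abstract is TensorFlow's session-wide thread-pool configuration, which fixes a single intra-op and inter-op thread count for every operation in the graph. Below is a minimal sketch of that baseline using the TensorFlow 1.x API (contemporary with the paper); the thread counts shown are illustrative placeholders, not values from the paper:

```python
import tensorflow as tf  # TensorFlow 1.x API

# Stock TensorFlow exposes two global concurrency knobs: how many threads
# a single op kernel may use, and how many ops may execute concurrently.
config = tf.ConfigProto(
    intra_op_parallelism_threads=8,  # threads inside one operation
    inter_op_parallelism_threads=2,  # operations the scheduler may co-run
)

with tf.Session(config=config) as sess:
    # Build and run the training graph as usual. Every operation inherits
    # the same global thread budget, regardless of whether it is
    # memory-bound or compute-bound.
    ...
```

This one-size-fits-all budget is exactly what the paper's extended runtime replaces: it predicts each operation's execution time at different thread counts with a lightweight performance model, then chooses per-operation concurrency and co-run partners automatically.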

