Training Kinetics in 15 Minutes: Large-scale Distributed Training on Videos

10/01/2019
by Ji Lin, et al.

Deep video recognition is more computationally expensive than image recognition, especially on large-scale datasets like Kinetics [1]. Training scalability is therefore essential for handling a large number of videos. In this paper, we study the factors that limit the training scalability of video networks. We identify three bottlenecks: data loading (data movement from disk to GPU), communication (data movement over the network), and computation (FLOPs). We propose three design guidelines to improve scalability: (1) fewer FLOPs and hardware-friendly operators to increase computation efficiency; (2) fewer input frames to reduce data movement and increase data loading efficiency; (3) smaller model size to reduce network traffic and increase networking efficiency. Following these guidelines, we design a new operator, the Temporal Shift Module (TSM), that is efficient and scalable for distributed training. The TSM model achieves 1.8x higher throughput than previous I3D models. We scale up the training of the TSM model to 1,536 GPUs, with a mini-batch of 12,288 video clips (98,304 images), without losing accuracy. With this hardware-aware model design, we are able to scale up training on the Summit supercomputer and reduce the training time on the Kinetics dataset from 49 hours 55 minutes to 14 minutes 13 seconds, achieving a top-1 accuracy of 74.0%, faster than previous 3D video models and with higher accuracy.
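
The figures quoted in the abstract can be cross-checked with a little arithmetic. The sketch below is not taken from the paper: the variable names are illustrative, and it assumes the mini-batch is split evenly across GPUs and that every clip has the same number of frames. It derives the frames per clip, the clips per GPU, and the overall wall-clock speed-up from the stated numbers only.

```python
# Figures as quoted in the abstract (variable names are illustrative).
num_gpus = 1536
clips_per_batch = 12_288
images_per_batch = 98_304

# Assumption: every clip contributes the same number of frames.
frames_per_clip = images_per_batch // clips_per_batch

# Assumption: the mini-batch is divided evenly across all GPUs.
clips_per_gpu = clips_per_batch // num_gpus

# Training times converted to seconds.
baseline_seconds = 49 * 3600 + 55 * 60   # 49 hours 55 minutes
scaled_seconds = 14 * 60 + 13            # 14 minutes 13 seconds

# Ratio of the two wall-clock times.
speedup = baseline_seconds / scaled_seconds

print(f"frames per clip: {frames_per_clip}")
print(f"clips per GPU:   {clips_per_gpu}")
print(f"speed-up:        {speedup:.1f}x")
```

Under these assumptions each clip has 8 frames and each GPU processes 8 clips per step, and the reduction in training time is roughly 210x. The abstract does not state how many GPUs the 49 hour 55 minute baseline used, so a scaling efficiency cannot be derived from these numbers alone.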

research
07/30/2018

Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes

Synchronized stochastic gradient descent (SGD) optimizers with data para...
research
09/27/2021

TSM: Temporal Shift Module for Efficient and Scalable Video Understanding on Edge Device

The explosive growth in video streaming requires video understanding at ...
research
01/24/2019

Large-Batch Training for LSTM and Beyond

Large-batch training approaches have enabled researchers to utilize larg...
research
10/02/2019

Accelerating Data Loading in Deep Neural Network Training

Data loading can dominate deep neural network training time on large-sca...
research
05/18/2018

Scanner: Efficient Video Analysis at Scale

A growing number of visual computing applications depend on the analysis...
research
07/15/2021

Real-Time Violence Detection Using CNN-LSTM

Violence rates however have been brought down about 57 the past 4 decade...
research
08/10/2017

Distributed Training Large-Scale Deep Architectures

Scale of data and scale of computation infrastructures together enable t...