Demystifying Developers' Issues in Distributed Training of Deep Learning Software

12/12/2021
by Diandian Gu, et al.

Deep learning (DL) has become pervasive across a wide spectrum of today's software systems and applications. The rich features of these DL-based software applications (i.e., DL software) usually rely on powerful DL models. To train powerful DL models on large datasets efficiently, developers commonly parallelize and distribute the computation and memory over multiple devices during training, a practice known as distributed training. However, existing efforts in the software engineering (SE) research community mainly focus on issues in the general process of training DL models. In contrast, to the best of our knowledge, the issues that developers encounter in distributed training have not been well studied. Given the surging importance of distributed training in the current practice of developing DL software, this paper fills this knowledge gap and presents the first comprehensive study of developers' issues in distributed training. To this end, we extract and analyze 1,054 real-world developers' issues in distributed training from Stack Overflow and GitHub, two commonly used data sources for studying software issues. We construct a fine-grained taxonomy of fault symptoms consisting of 30 categories and summarize common fix patterns for different symptoms. Based on the results, we suggest actionable implications and research avenues that can potentially facilitate the future development of distributed training.

