What Do Developers Ask About ML Libraries? A Large-scale Study Using Stack Overflow

by Md Johirul Islam, et al.

Modern software systems increasingly include machine learning (ML) as an integral component. However, we do not yet understand the difficulties that software developers face when learning about ML libraries and using them within their systems. To that end, this work reports on a detailed (manual) examination of 3,243 highly-rated Q&A posts related to ten ML libraries, namely TensorFlow, Keras, scikit-learn, Weka, Caffe, Theano, MLlib, Torch, Mahout, and H2O, on Stack Overflow, a popular online technical Q&A forum. We classify these questions into seven typical stages of an ML pipeline to understand the correlation between the library and the stage. We then study the questions and perform statistical analysis to address four research objectives: finding the most difficult stage, understanding the nature of the problems, understanding the nature of the libraries, and studying whether the difficulties stayed consistent over time. Our findings reveal the urgent need for software engineering (SE) research in this area. Both static and dynamic analyses are mostly absent and badly needed to help developers find errors earlier. While there has been some early research on debugging, much more work is needed. API misuses are prevalent and API design improvements are sorely needed. Finally, and somewhat surprisingly, there is a prevalent tug of war between providing higher levels of abstraction and the need to understand the behavior of the trained model.




What Kinds of Contracts Do ML APIs Need?

Recent work has shown that Machine Learning (ML) programs are error-pron...

What Causes Exceptions in Machine Learning Applications? Mining Machine Learning-Related Stack Traces on Stack Overflow

Machine learning (ML), including deep learning, has recently gained trem...

Do Not Take It for Granted: Comparing Open-Source Libraries for Software Development Effort Estimation

In the past two decades, several Machine Learning (ML) libraries have be...

Dazed and Confused: What's Wrong with Crypto Libraries?

Recent studies have shown that developers have difficulties in using cry...

Characterizing and Understanding Software Security Vulnerabilities in Machine Learning Libraries

The application of machine learning (ML) libraries has been tremendously...

Studying Logging Practice in Machine Learning-based Applications

Logging is a common practice in traditional software development. Severa...

StackOverflow vs Kaggle: A Study of Developer Discussions About Data Science

Software developers are increasingly required to understand fundamental ...
