On Using Machine Learning to Identify Knowledge in API Reference Documentation

07/23/2019
by Davide Fucci, et al.

Using API reference documentation like JavaDoc is an integral part of software development. Previous research introduced a grounded taxonomy that organizes API documentation knowledge into 12 types, including knowledge about the Functionality, Structure, and Quality of an API. We study how well modern text classification approaches can automatically identify documentation containing specific knowledge types. We compared conventional machine learning (k-NN and SVM) and deep learning approaches trained on manually annotated Java and .NET API documentation (n = 5,574). When classifying the knowledge types individually (i.e., with multiple binary classifiers), the best AUPRC was up to 87%. The deep learning and SVM classifiers seem complementary. For four knowledge types (Concept, Control, Pattern, and Non-Information), SVM clearly outperforms deep learning, which, on the other hand, is more accurate for identifying the remaining types. When considering multiple knowledge types at once (i.e., multi-label classification), deep learning outperforms naïve baselines and traditional machine learning, achieving a MacroAUC up to 79%. We also evaluated classifiers using embeddings pre-trained on generic text corpora and StackOverflow but did not observe significant improvements. Finally, to assess the generalizability of the classifiers, we re-tested them on a different, unseen Python documentation dataset. Classifiers for Functionality, Concept, Purpose, Pattern, and Directive seem to generalize from Java and .NET to Python documentation. The accuracy related to the remaining types seems API-specific. We discuss our results and how they inform the development of tools for supporting developers sharing and accessing API knowledge. Published article: https://doi.org/10.1145/3338906.3338943
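To make the two reported metrics concrete, the following is a minimal sketch of how a per-type AUPRC and a MacroAUC could be computed from classifier scores. It is not the authors' evaluation code. It assumes that AUPRC is estimated by average precision and that MacroAUC denotes the unweighted mean of per-label ROC AUC; the function names and the toy labels and scores are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision, a common step-wise estimate of AUPRC.

    Assumption: tied scores are ranked in input order (no special tie handling).
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    # Rank documents from the highest to the lowest classifier score.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("need both positive and negative examples")
    # A tie between a positive and a negative counts as half a correct pair.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_auc(labels: Dict[str, List[int]], scores: Dict[str, List[float]]) -> float:
    """Unweighted mean of per-label ROC AUC (assumed meaning of MacroAUC)."""
    return sum(roc_auc(labels[k], scores[k]) for k in labels) / len(labels)


# Hypothetical toy data: four documentation units, two knowledge types.
toy_labels = {"Functionality": [1, 0, 1, 0], "Concept": [0, 1, 1, 0]}
toy_scores = {"Functionality": [0.9, 0.8, 0.7, 0.1], "Concept": [0.2, 0.6, 0.9, 0.4]}

for knowledge_type in toy_labels:
    ap = average_precision(toy_labels[knowledge_type], toy_scores[knowledge_type])
    print(f"{knowledge_type}: AUPRC (average precision) = {ap:.3f}")
print(f"MacroAUC = {macro_auc(toy_labels, toy_scores):.3f}")
```

In the binary setting each knowledge type gets its own score list and its own AUPRC, whereas the multi-label setting summarizes all types with a single macro-averaged value, so a weak classifier for one type lowers the aggregate regardless of how frequent that type is.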

