A Unified Active Learning Framework for Annotating Graph Data with Application to Software Source Code Performance Prediction

04/06/2023
by Peter Samoaa, et al.

Most machine learning and data analytics applications, including performance engineering in software systems, require large amounts of labelled data, which may not be available in advance. Acquiring annotations is challenging because it often demands significant time, effort, and computational resources. To address this, we develop a unified active learning framework specialized for software performance prediction. We begin by parsing the source code into an Abstract Syntax Tree (AST) and augmenting it with data-flow and control-flow edges, which converts the tree representation of the source code into a Flow-Augmented AST (FA-AST) graph. From this graph representation, we construct various graph embeddings (unsupervised and supervised) into a latent space. Given such an embedding, the framework becomes task-agnostic, since active learning can be performed with any regression method and any query strategy suited to regression. Within this framework, we investigate the impact of using different levels of information for active and passive learning, e.g., partially available labels and unlabeled test data. Our approach aims to make better use of the investment in AI models that predict software performance (execution time) from the structure of the source code. Our real-world experiments reveal that respectable performance can be achieved by querying labels for only a small subset of all the data.
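To make the pool-based setting concrete, the following is a minimal sketch of active learning for regression on fixed embeddings. It is not the authors' implementation: the abstract does not name a regressor or a query strategy, so the k-nearest-neighbour regressor, the farthest-first (input-space diversity) query rule, the 2-D toy embeddings, and all function names here are assumptions chosen only for illustration. The `oracle` callable stands in for the expensive labelling step, such as measuring execution time.

```python
import math
import random


def knn_predict(train_x, train_y, query, k=3):
    """Average the labels of the k labelled embeddings nearest to `query`."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def farthest_first_query(pool_x, labelled, unlabelled):
    """Return the unlabelled index whose nearest labelled embedding is farthest away."""
    def gap(i):
        return min(math.dist(pool_x[i], pool_x[j]) for j in labelled)
    return max(unlabelled, key=gap)


def active_learning(pool_x, oracle, n_initial=3, budget=10, seed=0):
    """Label a random initial set, then query `budget` more labels one at a time."""
    rng = random.Random(seed)
    labelled = rng.sample(range(len(pool_x)), n_initial)
    labels = {i: oracle(i) for i in labelled}
    unlabelled = [i for i in range(len(pool_x)) if i not in labels]
    for _ in range(budget):
        if not unlabelled:
            break
        pick = farthest_first_query(pool_x, labelled, unlabelled)
        labels[pick] = oracle(pick)  # the expensive step, e.g. running a benchmark
        labelled.append(pick)
        unlabelled.remove(pick)
    return labelled, labels


def make_toy_pool(n=200, seed=1):
    """Hypothetical stand-in data: random 2-D 'embeddings' and a smooth target."""
    rng = random.Random(seed)
    xs = [(rng.random(), rng.random()) for _ in range(n)]
    ys = [2.0 * a + math.sin(3.0 * b) for a, b in xs]
    return xs, ys


if __name__ == "__main__":
    xs, ys = make_toy_pool()
    picked, labels = active_learning(xs, oracle=lambda i: ys[i], budget=20)
    train_x = [xs[i] for i in picked]
    train_y = [labels[i] for i in picked]
    rest = [i for i in range(len(xs)) if i not in labels]
    mae = sum(abs(knn_predict(train_x, train_y, xs[i]) - ys[i]) for i in rest) / len(rest)
    print(f"labelled {len(picked)} of {len(xs)} points, MAE on the rest: {mae:.3f}")
```

Because the loop only touches the embedding vectors and the oracle, either function could be swapped for another regressor or another query rule without changing the rest, which is the sense in which a fixed embedding makes such a loop task-agnostic.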

research
02/03/2018

A deep tree-based model for software defect prediction

Defects are common in software systems and can potentially cause various...
research
05/15/2017

Active Learning for Graph Embedding

Graph embedding provides an efficient solution for graph analysis by con...
research
06/10/2022

StructCoder: Structure-Aware Transformer for Code Generation

There has been a recent surge of interest in automating software enginee...
research
01/10/2019

ALFAA: Active Learning Fingerprint Based Anti-Aliasing for Correcting Developer Identity Errors in Version Control Data

Graphs of developer networks are important for software engineering rese...
research
04/05/2021

Automated Performance Testing Based on Active Deep Learning

Generating tests that can reveal performance issues in large and complex...
research
06/17/2022

Evaluating the Impact of Source Code Parsers on ML4SE Models

As researchers and practitioners apply Machine Learning to increasingly ...
research
05/11/2022

CV4Code: Sourcecode Understanding via Visual Code Representations

We present CV4Code, a compact and effective computer vision method for s...
