A Survey on Machine Learning Techniques for Source Code Analysis

10/18/2021
by Tushar Sharma, et al.

Context: Advances in machine learning have encouraged researchers to apply these techniques to a myriad of software engineering tasks that rely on source code analysis, such as testing and vulnerability detection. The large number of resulting studies makes it difficult for the community to understand the current landscape. Objective: We aim to summarize the current knowledge in the area of applied machine learning for source code analysis. Method: We investigate studies belonging to twelve categories of software engineering tasks and the corresponding machine learning techniques, tools, and datasets that have been applied to solve them. To do so, we carried out an extensive literature search and identified 364 primary studies published between 2002 and 2021. We summarize our observations and findings with the help of the identified studies. Results: Our findings suggest that the use of machine learning techniques for source code analysis tasks is consistently increasing. We synthesize the commonly used steps and the overall workflow for each task, and summarize the machine learning techniques employed. Additionally, we collate a comprehensive list of available datasets and tools usable in this context. Finally, we summarize the perceived challenges in this area, which include the availability of standard datasets, reproducibility and replicability, and hardware resources.


