A Differential Testing Approach for Evaluating Abstract Syntax Tree Mapping Algorithms

02/27/2021
by Yuanrui Fan, et al.

Abstract syntax tree (AST) mapping algorithms are widely used to analyze changes in source code. Despite their foundational role, little effort has been made to evaluate the accuracy of AST mapping algorithms, i.e., the extent to which an algorithm captures the evolution of code. We observe that a program element often has only one best-mapped program element. Based on this observation, we propose a hierarchical approach that automatically compares the similarity of the statements and tokens mapped by different algorithms. By performing the comparison, we determine whether each of the compared algorithms generates inaccurate mappings for a statement or its tokens. We invite 12 external experts to determine, for 200 statements, whether three commonly used AST mapping algorithms generate accurate mappings for a statement and its tokens. Based on the experts' feedback, we observe that our approach achieves a precision of 0.98–1.00 and a recall of 0.65–0.75. Furthermore, we conduct a large-scale study with a dataset of ten Java projects, containing a total of 263,165 file revisions. Our approach determines that GumTree, MTDiff and IJM generate inaccurate mappings for 20 file revisions, respectively. Our experimental results show that state-of-the-art AST mapping algorithms still need improvement.
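
The differential idea in the abstract can be illustrated with a minimal sketch. It assumes each algorithm maps a source statement to at most one target statement, and it flags any algorithm whose mapped statement is less similar to the source than the best mapping found by another algorithm. The similarity measure (Python's `difflib.SequenceMatcher` over token lists), the function names, and the algorithm labels `algo_a` and `algo_b` are illustrative assumptions, not the measure or tooling used in the paper. The hierarchical token-level comparison is not modeled here.

```python
from difflib import SequenceMatcher


def similarity(src_tokens, tgt_tokens):
    """Token-sequence similarity in [0, 1]; a stand-in measure, not the paper's."""
    return SequenceMatcher(None, src_tokens, tgt_tokens).ratio()


def flag_inaccurate(src_tokens, mappings):
    """Return the algorithms whose mapping for one statement looks inaccurate.

    mappings: dict from a (hypothetical) algorithm name to the token list of
    the statement it mapped the source statement to, or None if unmapped.
    An algorithm is flagged when another algorithm found a strictly more
    similar target, following the "one best-mapped element" observation.
    """
    scores = {
        name: similarity(src_tokens, tgt) if tgt is not None else 0.0
        for name, tgt in mappings.items()
    }
    best = max(scores.values())
    return sorted(name for name, score in scores.items() if score < best)


# Usage: two hypothetical algorithms disagree on where a statement went.
source = ["int", "x", "=", "foo", "(", ")", ";"]
candidates = {
    "algo_a": ["int", "x", "=", "foo", "(", "1", ")", ";"],
    "algo_b": ["return", "y", ";"],
}
flagged = flag_inaccurate(source, candidates)
```

Because the comparison is relative, this sketch can only report a mapping as inaccurate when some other algorithm produced a better one; when all algorithms agree, nothing is flagged even if the shared mapping is wrong. That is consistent with a differential approach reaching high precision with lower recall.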
