A systematic literature review on source code similarity measurement and clone detection: techniques, applications, and challenges

by Morteza Zakeri Nasrabadi, et al.

Measuring and evaluating source code similarity is a fundamental software engineering activity with a broad range of applications, including but not limited to code recommendation and the detection of duplicate code, plagiarism, malware, and code smells. This paper presents a systematic literature review and meta-analysis of code similarity measurement and evaluation techniques to shed light on the existing approaches and their characteristics in different applications. We initially found over 10,000 articles by querying four digital libraries and ended up with 136 primary studies in the field. The studies were classified according to their methodology, programming languages, datasets, tools, and applications. A deep investigation reveals 80 software tools, working with eight different techniques on five application domains. Nearly 49% of the tools work on Java programs and 37% support C and C++, while there is no support for many programming languages. A noteworthy point was the existence of 12 datasets related to source code similarity measurement and duplicate code, of which only eight were publicly accessible. The main challenges in the field are the lack of reliable datasets, empirical evaluations, hybrid methods, and focus on multi-paradigm languages. Emerging applications of code similarity measurement concentrate on the development phase in addition to maintenance.


