The Development and Prospect of Code Clone

02/17/2022
by   Xunhui Zhang, et al.
NetEase, Inc

The application of code clone technology accelerates code search, improves code reuse efficiency, and assists in software quality assessment and code vulnerability detection. However, code clones also introduce software quality issues and increase the cost of software maintenance. As an important research field in software engineering, code clone has been extensively studied, and work on various sub-research fields has emerged, including code clone detection, code clone evolution, and code clone analysis. However, a comprehensive exploration of the entire field of code clone, together with an analysis of the trend of each sub-research field, is still lacking. This paper collects related work on code clones from the past ten years. In summary, the contributions of this paper are to: (1) summarize and classify the sub-research fields of code clone, and explore the relative popularity and relation of these sub-research fields; (2) analyze the overall research trend of code clone and of each sub-research field; (3) compare and analyze the difference between academia and industry regarding code clone research; (4) construct a network of researchers and identify the major contributors in the code clone research field; (5) statistically analyze the list of popular conferences and journals. Popular research directions in the future include clone visualization, clone management, etc. For clone detection techniques, researchers can optimize the scalability and execution efficiency of the methods, target particular clone detection tasks and contextual environments, or continue to apply the technology to other related research fields.


1. Introduction

Code clone refers to two or more identical or similar source code fragments in a codebase (Chen et al., 2019). As an important research topic in software engineering, code clone enhances software development efficiency and aids software quality assessment and software vulnerability discovery. However, code clone is also a bad smell that brings defects, which increases software maintenance costs and leads to software quality degradation. It also introduces intellectual property protection issues. Therefore, both industry and academia have paid close attention to the code clone problem. Many studies focus on code clones, including clone detection (Cordy and Roy, 2011; Sajnani et al., 2016), clone evolution (Bazrafshan, 2012; Bouma, 2012), clone visualization (Honda et al., 2019; Murakami et al., 2015), clone refactoring (Baars and Oprescu, 2019; Alwaqfi, 2017), etc. In response to these research problems, many literature reviews have emerged. However, these reviews mainly focus on individual sub-research areas such as clone detection (Rattan et al., 2013), clone evolution (Pate et al., 2013), and clone visualization (Hammad et al., 2020). There is a lack of reviews that scientifically summarize the sub-research areas of the whole code clone field and analyze each sub-research area’s development trend. To address this problem, this paper collects and organizes the work on code clone from the past ten years. We extract the basic information of the articles, including the date of publication, title, abstract, keywords, author and affiliation information, etc. Then we classify the articles into sub-research areas by card sorting and finally analyze the development of each sub-research area of code clone by statistical methods.
We also construct an author cooperation network from the perspective of researchers, explore the research pattern of code cloning, find the researchers who have made outstanding research contributions to the research of code clone and their affiliations, and analyze the difference of researcher cooperation among countries. The main contributions of this paper are as follows:

  1. We collected 1,294 papers related to code clone in the last decade (from 2011 to 2020);

  2. A manual classification of code clone related sub-research areas was conducted, and a public dataset was formed, containing information on topics and related articles of sub-research areas;

  3. A popularity analysis of code cloning sub-research areas was conducted to explore the changes of code clone as a whole and of each sub-research area in the past ten years;

  4. We compared and analyzed the changes of industrial and academic interest in each sub-research area of code clone and the overall interest in the sub-research area;

  5. We constructed a network of collaborative relationships among code clone researchers, analyzed the characteristics of collaborative relationships, identified outstanding contributors and popular research institutions, and explored the differences of collaborative relationships in different countries.

Section 2 of this paper introduces the research background, summarizes the systematic literature review articles related to code cloning and the differences between this paper and related works, and presents the research questions of this paper. Section 3 introduces collecting, screening, and data extraction of clone-related articles based on a systematic literature review. Section 4 describes the classification method of code clone sub-research areas based on card sorting. Section 5 presents the results of classification of code clone sub-research areas, trends of each category, the attention of industry and academia, analysis of author collaboration networks, and analysis of popular journals and conferences. Section 6 discusses the paper’s conclusions and gives an outlook for future research in the field of code cloning. Finally, Section 7 concludes the paper.

2. Background

Although there have been many surveys on code cloning, most have focused on a single sub-research area. Table 1 summarizes the systematic literature reviews related to code cloning and compares them on two aspects: the “basic information” (the time of publication, the number of articles included, and the years covered by the article) and the “research point”, which refers to the issue in the field of code cloning that the article focuses on. The research points largely belong to “code clone detection”. For example, Ain et al. (Ain et al., 2019a) focus on code clone detection methods, the related data, and intermediate data representations involved in the methods. Although there are related works that focus on sub-research area classification and popularity analysis (Auch et al., 2020; Rattan et al., 2013; Zibran and Roy, 2012; Bharti and Singh, 2020a; Shippey et al., 2012; Su and Zhang, 2018), these works have limitations. Some focus on a specific sub-research area, e.g., software similarity (Auch et al., 2020). Others were published long ago, and since then the research area has broadened and the popularity may have changed (Rattan et al., 2013; Zibran and Roy, 2012; Shippey et al., 2012; Su and Zhang, 2018). Still others lack an analysis of the development trend of sub-research areas (Bharti and Singh, 2020a). At the same time, we found that the classification of code cloning in the relevant studies was based on research experience. The classification results varied widely, and a classification of code cloning sub-research areas based on scientific research methods was lacking. Finally, all the related studies lacked an analysis of the attention paid to code cloning in practice.

Review Date Paper num Time range
(Auch et al., 2020) 2020 136 2002-2019
(Pate et al., 2013) 2013 30 -2011
(Rattan et al., 2013) 2013 213 1997-2011
(Ali and Sulaiman, 2014) 2014 7 -2012
(Ain et al., 2019a) 2019 54 2013-2018
(Wang et al., 2017) 2017 61 1997-2016
(Rattan and Kaur, 2016) 2016 30 1996-2015
(Patil et al., 2014) 2014 20 2010-2014
(Svajlenko and Roy, 2020) 2020 198 2011-2017
(Chochlov, 2017) 2017 177 2011-
(Paiva and Figueiredo, ) 2014 65 -2014
(Hammad et al., 2020) 2020 68 -2020
(Zibran and Roy, 2012) 2012 262 1994-2011
(Wahlroos and others, 2019) 2019 32 2001-2018
(Bharti and Singh, 2020a) 2020 27 1998-2017
(Shippey et al., 2012) 2012 220 2007-2011
(Ain et al., 2019b) 2019 54 2013-2018
(Mondal et al., 2020) 2020 97 1998-2017
(Chatterji, 2014) 2014 39 -2014
(Su and Zhang, 2018) 2018 575 -2013
Research points compared across the reviews: Sub-research area; Data; Clone method; Code representation; Incremental detection; Granularity; Tool; Open source tool; Tool dependency; Clone type; Program language; Tool implementation; IDE; Tool build metrics; Evaluation metrics; Similarity measurement; Clone mapping; Evolution pattern; Genealogy extraction; GUI support; Visualization; Refactor pattern
A mark means the paper focused on the research point.
Table 1. Collection of basic information and focus of code clone related systematic literature reviews

For these reasons, the following research questions are posed in this paper:

RQ1: What sub-research areas exist for code cloning?

Code cloning is a popular research area that has received much attention from industrial and academic researchers, but a classification of code cloning sub-research areas and related research topics based on scientific research methods is lacking. We use card sorting to classify and summarize nearly ten years of research on code cloning, obtain sub-research areas and related topics, and make the results of article classification available in the form of an open dataset. Answering this question produces a summary of the work in code cloning and facilitates subsequent research.

RQ2: What is the development trend of code cloning in general and in each sub-research area?

The research in code cloning has evolved over a long period, covering various aspects of research, including improvement of detection techniques, maintenance of software quality, and improvement of software development efficiency. Exploring the development trend of code cloning as a whole and each sub-research area can help subsequent researchers to grasp the current status and future development trend of each area and then carry out subsequent research in a targeted manner.

RQ3: What is the difference between industry and academic interest in code cloning?

Code cloning, as an issue rooted in software engineering practice, has attracted wide attention from industry. Understanding the industry’s attention and how it changes can help subsequent researchers understand the development in the field of practice and clarify the code cloning issues in software development, so that theory and practice are closely combined.

RQ4: In what form do authors collaborate on research?

From the researcher’s point of view, building a collaboration network and exploring the active researchers, research teams, and collaboration patterns can help subsequent researchers follow the relevant research work and build collaborative relationships.

RQ5: Which journals and conferences do code clone papers tend to be published in?

From the researcher’s perspective, the analysis of popular conferences and journals can help subsequent researchers to submit papers in a targeted way. At the same time, they can participate in and follow the work of related research conferences and journals to accelerate the research process.

3. Systematic literature review

This paper follows the standardized steps of systematic literature review in software engineering (Keele and others, 2007), consisting of five main steps: online search, paper selection, recursive snowballing, quality assessment, and data extraction (Figure 1).

Figure 1. Pipeline of systematic literature review

3.1. Online search

3.1.1. Search method selection

The current online search methods for systematic literature reviews are divided into two main types: keyword-based search (Keele and others, 2007) and target venue-based search (de Paulo Sobrinho et al., 2018). Both search strategies have their drawbacks. The keyword-based search relies mainly on keywords and search engine selection, where the formation of search terms depends on the research questions and authors’ experience (Pate et al., 2013; Bandi et al., 2013; Zhang et al., 2011). This may result in the omission of keyword synonyms, and the selection of search databases and publishers may also omit some relevant articles. For the venue-based search, the selection of targets is crucial, and the absence of target conferences and journals will lead to the loss of a large amount of relevant literature. Although the recursive snowballing method can help reduce the number of missing articles (Jalali and Wohlin, 2012), it still cannot guarantee that the collected set of papers is complete.

Because of the above factors, we finally chose the keyword-based search approach for the following reasons:

  1. Sobrinho et al. (de Paulo Sobrinho et al., 2018) used the venue-based search method, but in the recursive snowballing phase, their newly discovered 85 articles appeared in 60 uncovered venues;

  2. From WikiCFP (http://www.wikicfp.com/cfp/), we found that there are thousands of journals and conferences under the computer science category, so it is not easy to select target venues to achieve full coverage of papers;

  3. We can optimize our search strategy by combining the keywords defined by the authors in the related work and the selected search database for the keyword-based search method.

3.1.2. Define search string

In order to avoid the loss of search keywords, we collected some search keywords from related work, and the results are shown in Table 2. We can divide the search keywords into two parts:

  1. Subject: ‘code’, ‘software’, ‘application’

  2. Action: ‘clone’, ‘cloning’, ‘copy’, ‘duplicate’, ‘duplication’, ‘similarity’, ‘sibling’

Therefore, we combine the keywords in the subject and action and connect them with OR logic to form the final search string as shown below:

“code clone” OR “code cloning” OR “code copy” OR “code duplicate” OR “code duplication” OR “code similarity” OR “code sibling” OR “software clone” OR “software cloning” OR “software copy” OR “software duplicate” OR “software duplication” OR “software similarity” OR “software sibling” OR “application clone” OR “application cloning” OR “application copy” OR “application duplicate” OR “application duplication” OR “application similarity” OR “application sibling”
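The combination step can be sketched in a few lines of Python. This is only an illustration of how the subject and action keywords listed above produce the search string; the function names are hypothetical, and straight quotes are used instead of typographic ones.

```python
from itertools import product

# Keyword groups taken from the text above.
SUBJECTS = ["code", "software", "application"]
ACTIONS = ["clone", "cloning", "copy", "duplicate",
           "duplication", "similarity", "sibling"]


def build_phrases(subjects, actions):
    """Return every '<subject> <action>' phrase in subject-major order."""
    return [f"{s} {a}" for s, a in product(subjects, actions)]


def build_search_string(phrases):
    """Quote each phrase and connect the phrases with OR."""
    return " OR ".join(f'"{p}"' for p in phrases)


phrases = build_phrases(SUBJECTS, ACTIONS)
search_string = build_search_string(phrases)
print(len(phrases))      # number of subject/action combinations
print(search_string)
```

Three subjects combined with seven actions give the 21 quoted phrases of the search string.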

Paper Search keywords or strings
(Rattan et al., 2013) clone; software; code
(Ali and Sulaiman, 2014) code clone; code clone tools; code clone prevention; code clone prevention mechanism; code clone management
(Pate et al., 2013) ((‘code’ OR ‘software’ OR ‘application’) AND (‘clone’ OR ‘cloning’ OR ‘copy’ OR ‘duplicate’ OR ‘duplication’ OR ‘similarity’) AND (‘change’ OR ‘evolution’ OR ‘genealogy’ OR ‘maintenance’ OR ‘management’ OR ‘tracking’))
(Hordijk et al., 2009) code; software; clone; clones; duplication
(Ain et al., 2019b) code clone detection; text-based techniques; tree-based techniques; metric based techniques; PDG based techniques; hybrid techniques; machine learning techniques

(Patil et al., 2014) ((‘code cloning’ OR ‘software code cloning’ OR ‘code cloning tool’) AND (‘code copy’ OR ‘duplicate code’ OR ‘duplication’ OR ‘code similarity’))
(de Paulo Sobrinho et al., 2018) code siblings; copy-and-paste; duplicate code; near-miss clones
Table 2. Search keywords or string in related papers

3.1.3. Select search database

Based on the search databases and engines selected in related works (Rattan et al., 2013; Keele and others, 2007; Bandi et al., 2013; Baqais and Alshayeb, 2020), the databases searched in this paper include ACM, IEEE Xplore, Scopus of Elsevier, Springer, Web of Science, Google Scholar, and Wiley Online Library.

3.1.4. Search tools and methods

To quickly and accurately retrieve different online resources, we use various tools. For ACM, Springer, and Web of Science, we used the open source Chrome plugin lit-automation/chrome-plugin from GitHub (https://github.com/lit-automation/chrome-plugin). For Google Scholar, we used Publish or Perish (Harzing, 2010). For IEEE Xplore, Wiley Online Library, and Scopus, we manually searched the query strings on the official websites.

Google Scholar, IEEE Xplore, and Scopus have a limit on the maximum search string length, so we divide the search string into two separate parts as follows:

  1. Substring 1: “code clone” OR “code cloning” OR “code copy” OR “code duplicate” OR “code duplication” OR “code similarity” OR “code sibling” OR “software clone” OR “software cloning” OR “software copy” OR “software duplicate” OR “software duplication”

  2. Substring 2: “software similarity” OR “software sibling” OR “application clone” OR “application cloning” OR “application copy” OR “application duplicate” OR “application duplication” OR “application similarity” OR “application sibling”

For the Google Scholar search, at most 1000 results are returned per query (Harzing, 2012; Gusenbauer, 2019). To cover all relevant articles, we split the search by year so that each search yields fewer than 1000 results. For example, the Google Scholar results for 2020 exceeded 1000, so we split substring 1 into two strings.
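The two splitting rules described above (by query length and by result count) can be sketched as follows. This is a minimal illustration, not the procedure the authors ran: the length limit `max_chars`, the helper names, and the `fake_count` stand-in for a search engine's result counter are all assumptions made for the example.

```python
from itertools import product

SUBJECTS = ["code", "software", "application"]
ACTIONS = ["clone", "cloning", "copy", "duplicate",
           "duplication", "similarity", "sibling"]
PHRASES = [f"{s} {a}" for s, a in product(SUBJECTS, ACTIONS)]


def join_or(phrases):
    """Quote each phrase and connect the phrases with OR."""
    return " OR ".join(f'"{p}"' for p in phrases)


def split_by_length(phrases, max_chars):
    """Greedily pack phrases into groups whose OR-query fits max_chars."""
    chunks, current = [], []
    for phrase in phrases:
        if current and len(join_or(current + [phrase])) > max_chars:
            chunks.append(current)
            current = []
        current.append(phrase)
    if current:
        chunks.append(current)
    return chunks


def plan_searches(phrases, year, count_results, cap=1000):
    """Halve the phrase group until each (year, group) search is under cap."""
    if len(phrases) == 1 or count_results(phrases, year) < cap:
        return [(year, phrases)]
    mid = len(phrases) // 2
    return (plan_searches(phrases[:mid], year, count_results, cap)
            + plan_searches(phrases[mid:], year, count_results, cap))


def fake_count(phrases, year):
    """Hypothetical stand-in for a search engine's hit count."""
    return 100 * len(phrases)


chunks = split_by_length(PHRASES, max_chars=256)  # 256 is an assumed limit
plan = [search
        for chunk in chunks
        for year in range(2011, 2021)
        for search in plan_searches(chunk, year, fake_count)]
print(len(chunks), len(plan))
```

In practice the count would come from the search engine itself, and the split points would follow whatever limit each engine enforces.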

3.1.5. Search results

Through the above search strategy, we collected 35,945 papers (search date: 2020-09-13): Google Scholar (28,803), ACM (1,978), IEEE Xplore (644), Web of Science (741), Scopus (1,730), Springer (1,440), and Wiley Online Library (609).

3.2. Paper selection

This section sets the inclusion and exclusion rules for paper selection and then filters the papers by manual screening. Here we set up the rules as follows:

  1. De-duplication. Different search engines and search databases may retrieve the same article. Here we remove duplicate papers according to the title (7,901 articles were removed);

  2. Removal of irrelevant articles. We removed articles that were not related to code cloning by manual checking based on title, abstract, keywords, and full text (24,982 articles were removed);

  3. Removal of non-Chinese or non-English articles (523 articles were removed);

  4. Removal of non-academic research papers. We only retained academic research and removed work such as patents, conference presentations, slides, etc. (1,101 articles were removed);

  5. Removal of work for which the full text could not be found. For subsequent data extraction, we removed work for which the full text could not be found (51 articles were removed);

  6. Retention of papers published between 2011 and 2020. In this paper, we only investigate the development of code cloning in the last decade (346 articles were removed).

The above steps resulted in a total of 1,041 remaining papers.
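The selection funnel can be checked with simple arithmetic. The sketch below uses the counts reported in Sections 3.1.5 and 3.2; the title normalization in `dedupe_by_title` is an assumption for illustration, since the text only states that duplicates were removed according to the title.

```python
import re

# Hits per source, as reported in Section 3.1.5.
SEARCH_HITS = {
    "Google Scholar": 28803, "ACM": 1978, "IEEE Xplore": 644,
    "Web of Science": 741, "Scopus": 1730, "Springer": 1440,
    "Wiley Online Library": 609,
}

# Papers removed by each rule, in the order listed in Section 3.2.
REMOVED = [
    ("duplicates", 7901),
    ("irrelevant", 24982),
    ("non-Chinese or non-English", 523),
    ("non-academic", 1101),
    ("no full text", 51),
    ("outside 2011-2020", 346),
]


def normalize_title(title):
    """Lower-case and strip punctuation (an assumed normalization)."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def dedupe_by_title(titles):
    """Keep the first occurrence of each normalized title."""
    seen, unique = set(), []
    for title in titles:
        key = normalize_title(title)
        if key not in seen:
            seen.add(key)
            unique.append(title)
    return unique


total = sum(SEARCH_HITS.values())
remaining = total
for rule, count in REMOVED:
    remaining -= count
    print(f"after removing {rule}: {remaining}")
```

The sum over sources reproduces the 35,945 collected papers, and subtracting the six removal counts leaves the 1,041 papers reported above.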

3.3. Recursive snowballing

Since keyword-based search methods have the potential problem of missing synonyms when constructing search strings, to solve this problem, we use the recursive snowballing method to retrieve missing papers in the references of related papers (Jalali and Wohlin, 2012). Compared with the non-recursive snowballing method (de Paulo Sobrinho et al., 2018), this method can obtain more relevant articles and thus circumvent the problem of missing papers. Each stage of recursive snowballing consists of two main steps:

  1. Automatic reference extraction. Here we use CERMINE (Tkaczyk et al., 2015) to extract the reference list of articles automatically;

  2. Determining whether the articles are relevant according to the paper selection rules defined in Section 3.2.

Based on the above steps, no new relevant articles appear after 5 iterations. The paper acquisition and filtering for each iteration are shown in Table 3.
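The iteration can be sketched as a fixed-point loop over reference lists. This is a minimal sketch under stated assumptions: `get_references` is a hypothetical stand-in for the reference extraction step (done with CERMINE in the paper), `is_relevant` stands in for the manual selection rules of Section 3.2, and the toy citation graph is invented for the example.

```python
def recursive_snowball(seed_ids, get_references, is_relevant, max_rounds=10):
    """Follow references of relevant papers until a round adds nothing new.

    Returns the set of accepted papers and the number of newly accepted
    papers in each round.
    """
    accepted = set(seed_ids)
    seen = set(seed_ids)
    frontier = set(seed_ids)
    new_per_round = []
    for _ in range(max_rounds):
        candidates = set()
        for paper_id in frontier:
            candidates.update(get_references(paper_id))
        candidates -= seen              # drop papers already examined
        seen |= candidates
        new = {c for c in candidates if is_relevant(c)}
        new_per_round.append(len(new))
        if not new:                     # fixed point reached
            break
        accepted |= new
        frontier = new                  # only new relevant papers are expanded
    return accepted, new_per_round


# Toy citation graph (hypothetical paper ids).
REFS = {"A": ["B", "X"], "B": ["C"], "C": ["A"], "X": ["D"]}
RELEVANT = {"A", "B", "C", "D"}

accepted, new_per_round = recursive_snowball(
    ["A"],
    get_references=lambda pid: REFS.get(pid, []),
    is_relevant=lambda pid: pid in RELEVANT,
)
print(sorted(accepted), new_per_round)
```

In the toy graph, paper D is never reached because it is cited only by the irrelevant paper X, which mirrors the fact that only the references of related papers are followed.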

Stage 1 2 3 4 5 Total
# extracted papers(-) 21797 4461 535 41 6 26840
# duplications(-) 15289 3100 339 26 5 18759
# unrelated papers(-) 5217 1112 166 7 0 6502
# non-Chinese or non-English papers(-) 20 1 0 0 0 21
# non-research papers(-) 45 10 0 0 0 55
# papers without full text(-) 74 9 1 0 0 84
# papers published before 2011(-) 928 203 27 7 1 1166
# new related papers 224 26 2 1 0 253
Note Number indicates the number of papers
(-) indicates the operation of filtering papers
Table 3. The selection of papers for each stage during recursive snowballing

After recursive snowballing, we obtained 253 new articles. Combined with the 1,041 articles retained in Section 3.2, this gives 1,294 articles related to code cloning.
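The figures in Table 3 can be cross-checked stage by stage: the number of new related papers in a stage equals the extracted papers minus everything filtered out in that stage. The short sketch below only restates the table values; the variable names are chosen for the example.

```python
# Per-stage values from Table 3 (stages 1 to 5).
EXTRACTED = [21797, 4461, 535, 41, 6]
FILTERED = {
    "duplications": [15289, 3100, 339, 26, 5],
    "unrelated": [5217, 1112, 166, 7, 0],
    "non-Chinese or non-English": [20, 1, 0, 0, 0],
    "non-research": [45, 10, 0, 0, 0],
    "without full text": [74, 9, 1, 0, 0],
    "published before 2011": [928, 203, 27, 7, 1],
}

# New related papers per stage = extracted - all filtered papers.
new_related = [
    EXTRACTED[stage] - sum(counts[stage] for counts in FILTERED.values())
    for stage in range(len(EXTRACTED))
]
snowball_total = sum(new_related)
overall_total = 1041 + snowball_total   # 1,041 papers come from Section 3.2

print(new_related, snowball_total, overall_total)
```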

3.4. Data extraction

To assist the subsequent paper topic extraction, sub-research area classification, popularity analysis, industrial-academic research analysis, and author collaboration network analysis, we extracted paper and author information, respectively, and the extracted data included the following:

  1. Paper information: venue name, publication date, title, abstract, and keywords;

  2. Author information: author’s name, affiliation, country, email addresses, and order.
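One possible way to hold the extracted fields is a pair of record types, sketched below. The class and field names are assumptions made for illustration (the paper does not specify a storage format), and the sample values are placeholders rather than real data.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Author:
    name: str
    affiliation: str
    country: str
    email: str
    order: int  # position in the author list, 1 = first author


@dataclass
class Paper:
    venue: str
    publication_date: str  # kept as text, e.g. "2015-06"
    title: str
    abstract: str
    keywords: List[str] = field(default_factory=list)
    authors: List[Author] = field(default_factory=list)

    def ordered_author_names(self):
        """Return author names sorted by their position on the paper."""
        return [a.name for a in sorted(self.authors, key=lambda a: a.order)]


# Placeholder record, not taken from the dataset.
example = Paper(
    venue="Example Venue",
    publication_date="2015-06",
    title="An example clone detection paper",
    abstract="Placeholder abstract.",
    keywords=["code clone", "clone detection"],
    authors=[
        Author("Second Author", "Example University", "CN", "b@example.org", 2),
        Author("First Author", "Example Lab", "CA", "a@example.org", 1),
    ],
)
print(example.ordered_author_names())
```

Keeping the author order and affiliation per paper is what later allows the industrial-academic comparison and the collaboration network to be derived from the same records.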

4. Sub-research area classification method

Paper Sub-research areas
(Rattan et al., 2013) clone evolution; clone analysis; impact of clones on software quality; clone detection in websites; cloning in related areas; clone detection in aspect oriented programming
(Zibran and Roy, 2012) clone analysis; clone detection; clone management; clone detection tool evaluation
(Bharti and Singh, 2020a) clone management; clone detection; clone visualization; clone refactoring; clone tracking; linked clone editing; software quality control
(Shippey et al., 2012) clone detection; clone management; clone evolution; clone removal; clone defects; clone visualization; evaluation of clone detection; clone taxonomies; multiple version of clone; plagiarism; copyright infringement; product lines; aspect mining; quality analysis; clone merging; origin analysis; program understanding; other
(Su and Zhang, 2018) clone detection; clone analysis; clone maintenance and management; survey and tool evaluation
Table 4. Classification of sub research areas of code clone in related work

Related work has yielded different classification results for studies in the code cloning sub-research areas, as shown in Table 4. To address the inconsistency in classification, this paper uses “card sorting” (Spencer, 2009; Zimmermann, 2016), a scientific classification method used in software engineering, to classify the code clone sub-research areas based on a more comprehensive collection of code clone related studies from the last decade. The topics included and their correspondence to papers are published as datasets on the GitLink platform (https://www.gitlink.org.cn/Nigel/jos_code_clone_trend_future/). In this paper, the classification of code clone sub-research areas based on the card sorting method consists of the following steps:

  1. Preparation: The first four authors carried out the card sorting together, first adding cards to each relevant article based on its title, keywords, abstract information, and even full text. The information on the cards can be summarized or extracted according to the authors’ experience or relevant information. The card information includes two parts: the English name of the topic and the Chinese description information (to facilitate the understanding of the topic and speed up the merging of new articles).

  2. Card classification: There are two main methods of card classification. One checks the classification consistency after separate parallel coding; the other is single-threaded common classification (Begel and Zimmermann, 2014). We adopted the latter method, discussing and resolving disagreements during the classification process (Campbell et al., 2013), which can quickly construct domain knowledge, update cognition, and form consistent conclusions promptly. We used the open coding approach (OCA) (Zimmermann, 2016), which generates sub-research areas during the sorting process. The authors integrate topics that focus on the same research area or similar research questions according to the topic description in the cards. E.g., the topic “license violation detection” is described as “Detection of open source license violations based on clone detection”; “bug localization” is described as “locating bugs in code using code cloning techniques”. These topics are applications of clone detection techniques in other fields, so they were merged and replaced by the card “other fields based on clone detection technique.” The final integration yielded the first-level cards (the sub-research areas) and, under each of them, the initial cards (the topics included in that sub-research area).
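The outcome of the merging step can be pictured as a two-level mapping from sub-research area to topics to papers. The sketch below is only an illustration of that data structure: the two topics under “other fields based on clone detection technique” come from the example above, while the third topic, the paper ids, and the function name are hypothetical. The merging decisions themselves were made manually by the authors.

```python
from collections import defaultdict

# Topic (initial card) -> sub-research area (first-level card).
TOPIC_TO_AREA = {
    "license violation detection":
        "other fields based on clone detection technique",
    "bug localization":
        "other fields based on clone detection technique",
    # Hypothetical topic, added only to show a second area.
    "token-based detection": "clone detection technique",
}


def group_by_area(paper_topics, topic_to_area):
    """Build area -> topic -> set of paper ids from per-paper topic cards."""
    areas = defaultdict(lambda: defaultdict(set))
    for paper_id, topics in paper_topics.items():
        for topic in topics:
            area = topic_to_area.get(topic, "unsorted")
            areas[area][topic].add(paper_id)
    return areas


# Hypothetical paper ids with their topic cards.
PAPER_TOPICS = {
    "P1": ["license violation detection"],
    "P2": ["bug localization", "token-based detection"],
    "P3": ["token-based detection"],
}

areas = group_by_area(PAPER_TOPICS, TOPIC_TO_AREA)
for area, topics in areas.items():
    print(area, {t: sorted(p) for t, p in topics.items()})
```

Because a paper may carry several topic cards, it can appear under more than one sub-research area, as P2 does here.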

5. Result

This section presents the results to answer the five research questions posed in Section 2.

5.1. RQ1: What sub-research areas exist for code cloning?

5.1.1. Popularity analysis of sub-research areas

We obtained the classification results with the card sorting method, as shown in Table 5. According to Bharti et al. (Bharti and Singh, 2020b), clone management covers various fields such as clone retrieval, clone visualization, clone refactoring, clone detection, etc. Therefore, in this paper, only papers related to the proposed clone management framework are integrated under the clone management category while classifying. To facilitate the subsequent research, we set up a separate category of clone survey, which intersects with other categories, i.e., papers related to clone survey belong to other sub-research areas.

From Table 5, we can see that clone detection has been the most popular sub-research area in the past 10 years. It is the basis for other research areas, including clone analysis, clone evolution, etc. Although clone detection relies on many validation datasets and methods, there are few relevant articles in the two research areas of clone datasets and evaluation. Many clone detection methods rely on the same large-scale datasets and validation algorithms. Based on clone detection methods, many analytical papers and two subfields of clone evolution and clone refactoring have been derived, mainly analyzing and addressing software quality and health. At the same time, we can find that clone detection techniques penetrate many other software-related fields, with 90 research works focusing on other research areas based on clone detection techniques.

Category # papers Paper description
clone detection technique 749 Related papers propose new clone detection methods.
clone analysis 183 Related papers have conducted empirical studies based on clone detection methods or targeting the field of code cloning (does not include analytical articles dedicated to other sub-research areas).
clone evolution 143 Related papers focus on the changes in code cloning during the software lifecycle.
clone refactoring 110 Related papers focus on the refactoring of code clones.
other fields based on clone detection technique 90 Related papers use clone detection technology to solve problems in other areas.
survey and tutorial 87 Related papers summarize and analyze relevant articles or technical reports from previous fields.
clone visualization 52 The main contribution of the related papers is the proposed visualization method or tool for code cloning.
clone evaluation 28 Related papers describe metrics, tools, methods, etc., for evaluating clone detection techniques.
benchmark 21 Related papers construct or evaluate datasets on code clone detection.
clone management 14 Related papers focus on the macro-level of code clone management (e.g., proposing frameworks, mechanisms, concepts, etc.)
Table 5. Classification of sub research areas of code clone

5.1.2. Correlation analysis of sub-research areas

In our classification, a paper may belong to more than one category, and Figure 2 shows the intersection of papers related to different sub-research areas. For the sake of presentation, only sets containing at least 2 articles are included in the intersection, and the full image has been uploaded to GitLink (https://www.gitlink.org.cn/Nigel/jos_code_clone_trend_future/tree/master/upsetplot.png). The figure shows that the survey and tutorial articles mainly summarize clone detection techniques (65 out of 87 articles are related to the discussion of clone detection methods). All other sub-research areas, except for “other fields based on clone detection technique”, also have surveys or tutorials. This paper analyzes the relevant surveys in Section 5.1.3 and summarizes the “other fields based on clone detection technique” in detail. In addition, we found that clone evolution has a strong correlation with clone visualization, clone analysis, and clone refactoring, respectively. Each of these intersections contains more than 10 related papers, which indicates that visualization support based on clone evolution and software quality-oriented clone analysis and refactoring are the main research directions of clone evolution.

Figure 2. Intersection plot of different sub research fields
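The counts behind such an intersection plot can be computed by tallying the exact set of categories assigned to each paper. The sketch below is an illustration with invented toy data; the function name and the paper ids are assumptions, and only the threshold of at least 2 articles comes from the text.

```python
from collections import Counter


def category_intersections(paper_categories, min_papers=2):
    """Count papers per exact category combination, keeping frequent ones."""
    counts = Counter(frozenset(cats) for cats in paper_categories.values())
    return {combo: n for combo, n in counts.items() if n >= min_papers}


DETECTION = "clone detection technique"
SURVEY = "survey and tutorial"

# Toy assignment of categories to hypothetical paper ids.
PAPER_CATEGORIES = {
    "p1": {DETECTION},
    "p2": {DETECTION, SURVEY},
    "p3": {DETECTION, SURVEY},
    "p4": {"clone evolution", "clone visualization"},
    "p5": {DETECTION},
}

intersections = category_intersections(PAPER_CATEGORIES)
for combo, n in sorted(intersections.items(), key=lambda kv: -kv[1]):
    print(sorted(combo), n)
```

Each key is an exclusive combination, so a paper tagged with two categories is counted once under that pair rather than once under each category.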

5.1.3. Analysis of related themes in sub-research areas

In card sorting, we set the topics to which the related research belonged. We then integrated the topics to finally form the correspondence between the sub-research areas and their topics (see the description of the card sorting steps in Section 4 for the specific process). Figure 3 shows the related research topics of “other fields based on clone detection technique”, which we present in detail due to the lack of detailed analysis of this area in the related research work. The topic association diagram of all research areas has been uploaded to GitLink as a tree diagram (https://www.gitlink.org.cn/Nigel/jos_code_clone_trend_future/tree/master/class_topic_relation_tree.html).

Figure 3. Related topics of other fields based on clone detection technique

We found that the 90 articles in “other fields based on clone detection technique” were classified into 41 topics, of which code search (19 related papers), malware detection (13 related papers), and vulnerability detection (10 related papers) are the three most popular. Because clone detection calculates the similarity between code fragments, it can be used in software engineering as a class of methods to assist in code reuse and software quality management. Code cloning therefore has both advantages and disadvantages. On the one hand, it can increase code maintenance costs (Monden et al., 2002), lead to the propagation of software vulnerabilities (Lozano et al., 2007), reduce code readability (Koschke, 2008), etc. On the other hand, code clone techniques can improve the efficiency of software reuse (Keivanloo et al., 2012), speed up software development (Tonscheidt, 2015), detect software vulnerabilities (Li et al., 2016), predict code errors (Kamei et al., 2011), etc.

The statistics and analysis of topics contained in sub-research areas can help subsequent researchers quickly build domain knowledge.

5.2. RQ2: What is the development trend of code cloning in general and in each sub-research area?

We analyzed the development trend of code cloning as a whole and of each sub-research area in chronological order. The results are shown in Figure 4 and Figure 5 (the HTML code of the images is available on GitLink: https://www.gitlink.org.cn/Nigel/jos_code_clone_trend_future/tree/master/overall_hotness_change_trend.html and https://www.gitlink.org.cn/Nigel/jos_code_clone_trend_future/tree/master/sub-fields_hotness_change_trend.html). In terms of overall popularity, research on code cloning developed rapidly from 2011 to 2012 and then peaked. The subsequent research popularity showed a zigzag development trend. Since the search for related papers was conducted in September 2020, we do not have the complete data for 2020, so we do not consider the popularity of articles related to code cloning in 2020 for the time being. Among the sub-research areas, we find that the popularity of clone detection is similar to the overall trend, which is probably because the articles related to clone detection account for a considerable proportion of all articles.

Figure 4. The trend of the overall perspective of code clone
Figure 5. The trend of sub-research fields of code clone

Looking at the other sub-research areas, we found that the number of articles on clone analysis, clone evolution, and clone refactoring decreased significantly in 2015 (the opposite of the trend for clone detection). However, interest in these areas recovered to a certain extent in the following one or two years, probably because clone detection is the basis of the other sub-research areas: the new clone detection methods proposed in 2015 facilitated their subsequent development. Over the whole period, however, these three sub-research areas resemble clone detection in showing an overall decreasing trend.

From the overall trend, the three sub-research areas of clone visualization, clone dataset, and clone management have maintained a stable or rising trend in the past one or two years and have not declined. We think the possible reasons are as follows:

  1. Datasets are the basis on which clone detection methods are formed and improved, so work on datasets tends to advance when clone detection methods reach a bottleneck. A newly proposed dataset suggests that clone detection methods will make significant progress for some programming languages or in some aspects;

  2. Clone visualization and clone management frameworks are applications of clone detection methods to software maintenance and management, so the development of related work lags somewhat behind that of clone detection methods.

5.3. RQ3: What is the difference between industry and academic interest in code cloning?

We categorized the papers that included authors affiliated with industry as articles of interest to industry and those that included only academic research institutions as articles of interest to academia.

We found that 112 (8.66%) of the papers related to code cloning in the past 10 years involved industry, indicating that industry has some interest in the field. We compared the percentage of relevant articles in each sub-research area between industry and academia (Figure 6). The figure shows that industry pays more attention to clone management, clone evaluation, clone refactoring, and clone analysis (these areas account for a higher percentage of its articles), whereas its attention to clone detection is relatively low. This suggests that industry is more concerned with the impact of code cloning on software quality and with how to refactor and manage code clones.
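The classification rule and the reported share can be sketched as follows. This is a minimal illustration, assuming each paper is represented by a list of per-author affiliation labels; the labels "industry" and "academia" and the function names are hypothetical and are not taken from the study's tooling. The total of 1,294 papers is the figure given in the conclusion.

```python
def involves_industry(affiliations):
    """A paper counts as industry-related if any author has an industry affiliation."""
    return any(label == "industry" for label in affiliations)


def industry_share(papers):
    """Return the percentage of papers with at least one industry-affiliated author."""
    if not papers:
        return 0.0
    hits = sum(1 for affiliations in papers if involves_industry(affiliations))
    return 100.0 * hits / len(papers)


# Toy data (hypothetical): three papers, one of which has an industry co-author.
toy_papers = [
    ["academia", "academia"],
    ["academia", "industry"],
    ["academia"],
]
toy_share = industry_share(toy_papers)

# The share reported in the text: 112 industry-related papers out of 1,294.
reported_share = round(100.0 * 112 / 1294, 2)
print(f"toy share: {toy_share:.2f}%, reported share: {reported_share}%")
```

Under this rule a paper with mixed authorship counts toward industry, which is why the industry category can include papers led by academic groups.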

Comparing the change in the research intensity of code cloning between industry and academia over time (Figure 7), we find that industry (2012) reached the peak of research intensity sooner than academia (2014). The industry has shown a warming trend for code cloning research in the last 2 to 3 years, while academia has a downward trend in the overall research intensity. The possible reason for this phenomenon is that the problem of code cloning was discovered by industry from a practical problem. Then academia paid attention to the problem and continued to propose new ideas, methods, and results, which were adopted and applied in industry. Therefore, the industry continues to pay attention to the problems related to code cloning.

Figure 6. Comparison between industry and academic research hotness of code clone
Figure 7. Change in the number of industry and academic research papers over time

5.4. RQ4: In what form do authors collaborate on research?

Figure 8 shows the collaboration network of authors in code cloning. Nodes indicate authors. Edges indicate that two authors have collaborated in at least one paper. Node size is positively correlated with the number of articles the author has participated in. The size of the author’s name corresponding to the node is positively correlated with the node size. The color indicates the country and region to which the author belongs. The same color indicates the same country or region, and the color depth of the edge is positively correlated with the number of papers co-authored by the related author.

Figure 8. Co-author network of code clone research field
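The construction of such a network can be sketched as follows. This is a minimal illustration, assuming each paper is given as a list of author names; the function name and the toy author labels are hypothetical, and the sketch does not reproduce the layout or coloring done for Figure 8.

```python
from collections import Counter
from itertools import combinations


def build_coauthor_network(papers):
    """Return (node_size, edge_weight) for an undirected co-author network.

    node_size[a] is the number of papers author a took part in;
    edge_weight[(a, b)] is the number of papers a and b co-authored (a < b).
    """
    node_size = Counter()
    edge_weight = Counter()
    for authors in papers:
        unique = sorted(set(authors))  # ignore duplicate names within one paper
        node_size.update(unique)
        edge_weight.update(combinations(unique, 2))
    return node_size, edge_weight


# Toy data (hypothetical author labels).
papers = [["A", "B", "C"], ["A", "B"], ["D"]]
node_size, edge_weight = build_coauthor_network(papers)

# Share of authors who appear on exactly one paper.
single_paper_share = sum(1 for n in node_size.values() if n == 1) / len(node_size)
print(node_size, edge_weight, single_paper_share)
```

Node sizes drive the node and label size in the figure, and edge weights drive the edge color depth.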

We found that 72.4% (1,568) of the authors contributed to only one article, while 4.9% (106) contributed to five or more articles related to code cloning. The most prolific author is Chanchal K. Roy (the largest node in Figure 8), who contributed to 95 articles related to code cloning. Thus, although code cloning is a popular research area with a large number of researchers involved in its subfields, only a small number of researchers focus on this area and continue to contribute research results.

We analyzed the characteristics of the authors’ cooperative network and found that the average clustering coefficient ($C$) of the whole network was 0.913 (Latapy, 2008) and the average path length ($L$) was 4.906 (Brandes, 2001). Following the method proposed by Telesford et al. (Telesford et al., 2011) for identifying small-world complex networks, we used Gephi (Bastian et al., 2009) to generate an equivalent random network based on the connection probability, and obtained an average clustering coefficient ($C_{rand}$) of 0.001 and an average path length ($L_{rand}$) of 6.332 for this random network. On this basis, we calculated the small-world coefficient of the authors’ cooperative network, $\sigma = (C / C_{rand}) / (L / L_{rand})$ (Humphries et al., 2006; Humphries and Gurney, 2008). In summary, the cooperative network of code cloning authors satisfies the characteristic of a small-world network, i.e., $\sigma > 1$, and the authors exhibit clustering. In addition, the authors’ research activity is characterized by “high intensity but low persistence”: although a large number of researchers are involved in research related to code cloning, only a few of them remain continuously interested in the field.
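The small-world check can be sketched numerically. The snippet below assumes the coefficient is the ratio form of Humphries and Gurney (2008), namely the clustering ratio divided by the path-length ratio, where each ratio compares the observed network with the equivalent random network; the function and variable names are hypothetical. Because the inputs are the rounded values reported in the text, the result indicates only the order of magnitude.

```python
def small_world_sigma(c, l, c_rand, l_rand):
    """Small-world coefficient: (C / C_rand) / (L / L_rand).

    c, l           : average clustering coefficient and average path length
                     of the observed network
    c_rand, l_rand : the same quantities for an equivalent random network
    """
    if c_rand <= 0 or l <= 0 or l_rand <= 0:
        raise ValueError("c_rand, l and l_rand must be positive")
    return (c / c_rand) / (l / l_rand)


# Rounded values reported in the text.
sigma = small_world_sigma(c=0.913, l=4.906, c_rand=0.001, l_rand=6.332)
is_small_world = sigma > 1
print(f"sigma = {sigma:.1f}, small world: {is_small_world}")
```

With these inputs the clustering ratio is in the hundreds while the path-length ratio is below one, so the coefficient lies far above the threshold of 1.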

In terms of the countries to which the research institutions belong, the top countries were China, India, the United States, Canada, and Japan. We computed the average node degree and graph density of the co-author subgraph for each country separately (see Table 6). In terms of paper output (node size), the team from the University of Saskatchewan in Canada (representative researchers: Chanchal K. Roy, Kevin A. Schneider) and the joint team of Osaka University and Nanzan University in Japan (representative researchers: Katsuro Inoue, Shinji Kusumoto, Yoshiki Higo, Masami Noro) have paid continuous attention to code cloning and made many contributions to the field. In comparison, the graph density of the subgraphs of China, India, and the United States is relatively low, indicating that collaboration among authors in these countries is not strong. However, the average node degree of China's subgraph is higher than that of India and the United States, indicating that China has some highly productive teams or active authors, which raises its overall average degree of collaboration.

| Metric | China | India | The United States | Canada | Japan |
| --- | --- | --- | --- | --- | --- |
| Percentage of authors (%) | 24.46 | 15.64 | 10.66 | 6.32 | 6.05 |
| Average node degree | 3.694 | 1.917 | 2.476 | 3.956 | 4.137 |
| Subgraph density | 0.007 | 0.006 | 0.011 | 0.029 | 0.032 |
Table 6. Proportion of authors from research institutions of different countries and analysis of co-author networks
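The two graph metrics in Table 6 can be sketched as follows. The snippet assumes the standard definitions for an undirected simple graph with n nodes and m edges, namely average degree 2m/n and density 2m/(n(n-1)); the function name and toy data are hypothetical. Under these definitions density equals the average degree divided by n-1, which explains why a country with many authors can have a high average degree but a low density.

```python
def subgraph_stats(edges, authors):
    """Return (average degree, density) of the subgraph induced by `authors`.

    edges   : iterable of (u, v) co-author pairs in the full network
    authors : the authors belonging to one country
    """
    authors = set(authors)
    # Keep each undirected edge once, and only if both endpoints are in the subgraph.
    kept = {
        frozenset((u, v))
        for u, v in edges
        if u != v and u in authors and v in authors
    }
    n, m = len(authors), len(kept)
    if n < 2:
        return 0.0, 0.0
    avg_degree = 2 * m / n
    density = 2 * m / (n * (n - 1))
    return avg_degree, density


# Toy data (hypothetical): a path a-b-c-d, restricted to authors a, b, c.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("b", "a")]
avg_degree, density = subgraph_stats(edges, ["a", "b", "c"])
print(avg_degree, density)
```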

To help future researchers better track the popular researchers and teams in the relevant sub-research areas, we ranked the researchers and research teams in each sub-research area separately according to the number of published articles (Table 7).

| Sub-research area | Popular researchers [affiliated teams] (# publications) |
| --- | --- |
| clone detection | Chanchal K. Roy [University of Saskatchewan] (25); Yoshiki Higo [Osaka University] (13); Oscar Karnalim [University of Newcastle, Maranatha Christian University] (12) |
| clone evolution | Chanchal K. Roy [University of Saskatchewan] (29); Kevin A. Schneider [University of Saskatchewan] (25); Manishankar Mondal [University of Saskatchewan] (22) |
| clone refactoring | Katsuro Inoue [Osaka University] (8); Chanchal K. Roy [University of Saskatchewan] (7); Masami Noro [Nanzan University] (6) |
| other fields based on clone detection technique | Mohammad Reza Farhadi [Concordia University] (5); James R. Cordy [Queen’s University] (5); Katsuro Inoue [Osaka University] (5) |
| survey and tutorial | Ritu Garg [Indira Gandhi Delhi Technical University for Women] (4); Chanchal K. Roy [University of Saskatchewan] (3); Dhavleesh Rattan [Punjabi University] (2) |
| clone visualization | Chanchal K. Roy [University of Saskatchewan] (7); Kevin A. Schneider [University of Saskatchewan] (6); Katsuro Inoue [Osaka University] (5) |
| clone evaluation | Jeffrey Svajlenko [University of Saskatchewan] (12); Chanchal K. Roy [University of Saskatchewan] (11); Matthew Stephan [Queen’s University, Miami University] (4) |
| benchmark | Jeffrey Svajlenko [University of Saskatchewan] (3); Alan Charpentier [University of Bordeaux] (3); Chanchal K. Roy [University of Saskatchewan] (3) |
| clone management | Minhaz F. Zibran [University of Saskatchewan, University of New Orleans] (3); Hamid Abdul Basit [Lahore University of Management Sciences] (2); Jan Harder [University of Bremen] (2) |
Table 7. Popular researchers and research teams in various sub-research areas

The results show that the popular researchers and teams differ across sub-research areas. However, researchers from the University of Saskatchewan team and the Osaka University team are active in several sub-research areas.

5.5. RQ5: Which journals and conferences do code clone papers tend to be published in?

Figure 9 shows the top 10 most popular journals and conferences, including the number of relevant articles published, the percentage of articles, and the names of the corresponding journals and conferences. The results show that many articles come from the International Workshop on Software Clones, a workshop under ICSME (International Conference on Software Maintenance and Evolution), which had been held 15 times as of 2021. Future researchers may consider this workshop when submitting papers or tracking related conferences.

Figure 9. The submission distribution of popular journals and conferences

6. Discussion

The popularity analysis shows that overall research on code cloning has a decreasing trend. However, there is an increasing trend for clone visualization, clone management, and related research analysis. At the same time, we found that industry has gradually increased its attention to code clone-related research since 2017, and that it pays more attention to clone management, evaluation, and refactoring than academia does. Future researchers can focus on clone visualization, enrich the relevant tools, improve the relevant management systems, and promote the transfer of research results into practice.

As a supporting technology, code clone detection has been the most popular sub-research area of code cloning. For this area, the optimization directions include accuracy, scalability, and execution efficiency (Sajnani, 2016). Regarding accuracy, we found that existing clone detection methods are very effective at detecting various types of clones (Wu et al., 2020). For this reason, many researchers have started to analyze the performance of existing clone detection algorithms on particular clone problems, such as large-gap clones (Wang et al., 2018) and large-variance clones (Nakagawa et al., 2021), and to propose corresponding solutions. At the same time, a large amount of related work has started to focus on optimizing the execution efficiency of clone detection methods (Li et al., 2020). Many studies also focus on clone detection and analysis for special kinds of code, e.g., smart contracts (Liu et al., 2019). Future researchers who wish to optimize existing clone detection algorithms can start from execution efficiency and scalability, or consider optimizing clone detection techniques in special contexts.

As a key technology, clone detection plays an important role in various fields such as code recommendation, malicious code detection, and software quality assessment. Although there is a decreasing trend of research on “other fields based on clone detection technique”, we believe that a large amount of research on code analysis, software reuse, etc., can use clone detection as a problem-solving method or optimization technique.

7. Conclusion

As an important research area of software engineering, code cloning has received much attention from researchers. Many related studies have been conducted to explore various sub-research areas of code cloning. However, there is a lack of a comprehensive introduction to, and popularity analysis of, each sub-research area. Following the detailed steps of a systematic literature review, this paper collected 1,294 research papers on code cloning from the past 10 years, divided the research on code cloning into 10 sub-research areas by card sorting, and carried out the following analyses:

  1. We explored the overall popularity of sub-research areas and the intersection of related research areas and uncovered the key research directions and topics of sub-research areas;

  2. We analyzed the change of the overall popularity of code cloning and each sub-research area over time and found the decline of the overall popularity and the difference of the development trend of each sub-research area;

  3. We compared the attention that industry and academia pay to the sub-research areas of code cloning and how that attention changes over time, and found that industry leans towards clone management and software quality maintenance;

  4. We constructed a network of author collaborations, explored the “small world” and “high intensity but low persistence” characteristics of code cloning research, discovered the popular researchers and research teams, and analyzed the research institutions in different countries;

  5. We statistically analyzed the popular conferences and journals, which are helpful for the follow-up researchers to submit papers and follow up related research.

This paper discloses the collected papers and classification results, which, together with the results of this paper, can help subsequent researchers quickly build domain knowledge, understand related work, and track research hotspots as well as popular researchers, teams, conferences, and journals. This paper analyzes the research field of code cloning from a macroscopic perspective. Future research work can be guided by the findings and data of this paper to explore each sub-research area more deeply.

References

  • Q. U. Ain, W. H. Butt, M. W. Anwar, F. Azam, and B. Maqbool (2019a) Recent advancements in code clone detection–techniques and tools. IEEE Access 7. Cited by: Table 1, §2.
  • Q. U. Ain, W. H. Butt, M. W. Anwar, F. Azam, and B. Maqbool (2019b) A systematic review on code clone detection. IEEE access 7, pp. 86121–86144. Cited by: Table 1, Table 2.
  • A. M. Ali and S. Sulaiman (2014) A systematic literature review of code clone prevention approaches. International Journal of Software Engineering and Technology 1 (1). Cited by: Table 1, Table 2.
  • A. Alwaqfi (2017) A refactoring technique for large groups of software clones. Ph.D. Thesis, Concordia University. Cited by: §1.
  • M. Auch, M. Weber, P. Mandl, and C. Wolff (2020) Similarity-based analyses on software applications: a systematic literature review. Journal of Systems and Software 168, pp. 110669. Cited by: Table 1, §2.
  • S. Baars and A. Oprescu (2019) Towards automated refactoring of code clones in object-oriented programming languages. Technical report EasyChair. Cited by: §1.
  • A. Bandi, B. J. Williams, and E. B. Allen (2013) Empirical evidence of code decay: a systematic mapping study. In 2013 20th Working Conference on Reverse Engineering (WCRE), pp. 341–350. Cited by: §3.1.1, §3.1.3.
  • A. A. B. Baqais and M. Alshayeb (2020) Automatic software refactoring: a systematic literature review. Software Quality Journal 28 (2), pp. 459–502. Cited by: §3.1.3.
  • M. Bastian, S. Heymann, and M. Jacomy (2009) Gephi: an open source software for exploring and manipulating networks. In Proceedings of the international AAAI conference on web and social media, Vol. 3, pp. 361–362. Cited by: §5.4.
  • S. Bazrafshan (2012) Evolution of near-miss clones. In 2012 IEEE 12th International Working Conference on Source Code Analysis and Manipulation, pp. 74–83. Cited by: §1.
  • A. Begel and T. Zimmermann (2014) Analyze this! 145 questions for data scientists in software engineering. In Proceedings of the 36th International Conference on Software Engineering, pp. 12–23. Cited by: item 2.
  • S. Bharti and H. Singh (2020a) Proactively managing clones inside an ide: a systematic literature review. International Journal of Computers and Applications, pp. 1–20. Cited by: Table 1, §2, Table 4.
  • S. Bharti and H. Singh (2020b) Proactively managing clones inside an ide: a systematic literature review. International Journal of Computers and Applications, pp. 1–20. Cited by: §5.1.1.
  • G. Bouma (2012) Studying the effects of code clone size on clone evolution. Cited by: §1.
  • U. Brandes (2001) A faster algorithm for betweenness centrality. Journal of mathematical sociology 25 (2), pp. 163–177. Cited by: §5.4.
  • J. L. Campbell, C. Quincy, J. Osserman, and O. K. Pedersen (2013) Coding in-depth semistructured interviews: problems of unitization and intercoder reliability and agreement. Sociological methods & research 42 (3), pp. 294–320. Cited by: item 2.
  • D. Chatterji (2014) Empirical investigation of causes and effects of code clones. The University of Alabama. Cited by: Table 1.
  • Q. Chen, S. Li, M. Yan, and X. Xia (2019) Code clone detection: a literature review. Journal of software 30 (4), pp. 962–980. Cited by: §1.
  • M. Chochlov (2017) State-of-the-art report on clone detection. Cited by: Table 1.
  • J. R. Cordy and C. K. Roy (2011) The nicad clone detector. In 2011 IEEE 19th International Conference on Program Comprehension, pp. 219–220. Cited by: §1.
  • E. V. de Paulo Sobrinho, A. De Lucia, and M. de Almeida Maia (2018) A systematic literature review on bad smells–5 w’s: which, when, what, who, where. IEEE Transactions on Software Engineering 47 (1), pp. 17–66. Cited by: item 1, §3.1.1, §3.3, Table 2.
  • M. Gusenbauer (2019) Google scholar to overshadow them all? comparing the sizes of 12 academic search engines and bibliographic databases. Scientometrics 118 (1), pp. 177–214. Cited by: §3.1.4.
  • M. Hammad, H. A. Basit, S. Jarzabek, and R. Koschke (2020) A systematic mapping study of clone visualization. Computer Science Review 37, pp. 100266. Cited by: §1, Table 1.
  • A. W. Harzing (2012) The publish or perish book: your guide to effective and responsible citation analysis. International Review of Research in Open & Distance Learning 13 (3), pp. 314–315. Cited by: §3.1.4.
  • A. Harzing (2010) The publish or perish book. Tarma Software Research Pty Limited Melbourne, Australia. Cited by: §3.1.4.
  • H. Honda, S. Tokui, K. Yokoi, E. Choi, N. Yoshida, and K. Inoue (2019) CCEvovis: a clone evolution visualization system for software maintenance. In 2019 IEEE/ACM 27th International Conference on Program Comprehension (ICPC), pp. 122–125. Cited by: §1.
  • W. Hordijk, M. L. Ponisio, and R. Wieringa (2009) Review of code clone articles. University of Twente, The Netherlands, http://eprints.eemcs.utwente.nl/12257/01/TR-CTIT-08-33.pdf, accessed on May 18, pp. 23. Cited by: Table 2.
  • M. D. Humphries, K. Gurney, and T. J. Prescott (2006) The brainstem reticular formation is a small-world, not scale-free, network. Proceedings of the Royal Society B: Biological Sciences 273 (1585), pp. 503–511. Cited by: §5.4.
  • M. D. Humphries and K. Gurney (2008) Network ’small-world-ness’: a quantitative method for determining canonical network equivalence. PloS one 3 (4), pp. e0002051. Cited by: §5.4.
  • S. Jalali and C. Wohlin (2012) Systematic literature studies: database searches vs. backward snowballing. In Proceedings of the 2012 ACM-IEEE international symposium on empirical software engineering and measurement, pp. 29–38. Cited by: §3.1.1, §3.3.
  • Y. Kamei, H. Sato, A. Monden, S. Kawaguchi, H. Uwano, M. Nagura, K. Matsumoto, and N. Ubayashi (2011) An empirical study of fault prediction with code clone metrics. In 2011 Joint Conference of the 21st International Workshop on Software Measurement and the 6th International Conference on Software Process and Product Measurement, pp. 55–61. Cited by: §5.1.3.
  • S. Keele et al. (2007) Guidelines for performing systematic literature reviews in software engineering. Technical report, Ver. 2.3, EBSE Technical Report, EBSE. Cited by: §3.1.1, §3.1.3, §3.
  • I. Keivanloo, C. Forbes, and J. Rilling (2012) Similarity search plug-in: clone detection meets internet-scale code search. In 2012 4th International Workshop on Search-Driven Development: Users, Infrastructure, Tools, and Evaluation (SUITE), pp. 21–22. Cited by: §5.1.3.
  • R. Koschke (2008) Frontiers of software clone management. In 2008 Frontiers of Software Maintenance, pp. 119–128. Cited by: §5.1.3.
  • M. Latapy (2008) Main-memory triangle computations for very large (sparse (power-law)) graphs. Theoretical computer science 407 (1-3), pp. 458–473. Cited by: §5.4.
  • G. Li, Y. Wu, C. K. Roy, J. Sun, X. Peng, N. Zhan, B. Hu, and J. Ma (2020) SAGA: efficient and large-scale detection of near-miss clones with gpu acceleration. In 2020 IEEE 27th International Conference on Software Analysis, Evolution and Reengineering (SANER), pp. 272–283. Cited by: §6.
  • H. Li, H. Kwon, J. Kwon, and H. Lee (2016) CLORIFI: software vulnerability discovery using code clone verification. Concurrency and Computation: Practice and Experience 28 (6), pp. 1900–1917. Cited by: §5.1.3.
  • H. Liu, Z. Yang, Y. Jiang, W. Zhao, and J. Sun (2019) Enabling clone detection for ethereum via smart contract birthmarks. In 2019 IEEE/ACM 27th International Conference on Program Comprehension (ICPC), pp. 105–115. Cited by: §6.
  • A. Lozano, M. Wermelinger, and B. Nuseibeh (2007) Evaluating the harmfulness of cloning: a change based experiment. In Fourth International Workshop on Mining Software Repositories (MSR’07: ICSE Workshops 2007), pp. 18–18. Cited by: §5.1.3.
  • M. Mondal, C. K. Roy, and K. A. Schneider (2020) A survey on clone refactoring and tracking. Journal of Systems and Software 159, pp. 110429. Cited by: Table 1.
  • A. Monden, D. Nakae, T. Kamiya, S. Sato, and K. Matsumoto (2002) Software quality analysis by code clones in industrial legacy software. In Proceedings Eighth IEEE Symposium on Software Metrics, pp. 87–94. Cited by: §5.1.3.
  • H. Murakami, Y. Higo, and S. Kusumoto (2015) ClonePacker: a tool for clone set visualization. In 2015 IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER), pp. 474–478. Cited by: §1.
  • T. Nakagawa, Y. Higo, and S. Kusumoto (2021) NIL: large-scale detection of large-variance clones. In Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 830–841. Cited by: §6.
  • A. Paiva and E. Figueiredo. Do concern metrics support code clone detection? Cited by: Table 1.
  • J. R. Pate, R. Tairas, and N. A. Kraft (2013) Clone evolution: a systematic review. Journal of software: Evolution and Process 25 (3), pp. 261–283. Cited by: §1, Table 1, §3.1.1, Table 2.
  • R. V. Patil, L. V. Patil, S. V. Shinde, and S. Joshi (2014) Software code cloning detection and future scope development-latest short review. In International Conference on Recent Advances and Innovations in Engineering (ICRAIE-2014), pp. 1–4. Cited by: Table 1, Table 2.
  • D. Rattan, R. Bhatia, and M. Singh (2013) Software clone detection: a systematic review. Information and Software Technology 55 (7), pp. 1165–1199. Cited by: §1, Table 1, §2, §3.1.3, Table 2, Table 4.
  • D. Rattan and J. Kaur (2016) Systematic mapping study of metrics based clone detection techniques. In Proceedings of the International Conference on Advances in Information Communication Technology & Computing, pp. 1–7. Cited by: Table 1.
  • H. Sajnani (2016) Large-scale code clone detection. Ph.D. Thesis, University of California, Irvine. Cited by: §6.
  • H. Sajnani, V. Saini, J. Svajlenko, C. K. Roy, and C. V. Lopes (2016) Sourcerercc: scaling code clone detection to big-code. In Proceedings of the 38th International Conference on Software Engineering, pp. 1157–1168. Cited by: §1.
  • T. Shippey, D. Bowes, B. Chrisianson, and T. Hall (2012) A mapping study of software code cloning. In 16th International Conference on Evaluation & Assessment in Software Engineering (EASE 2012), pp. 274–278. Cited by: Table 1, §2, Table 4.
  • D. Spencer (2009) Card sorting: designing usable categories. Rosenfeld Media. Cited by: §4.
  • X. Su and F. Zhang (2018) A survey for management-oriented code clone research. Chinese Journal of Computers 41 (3), pp. 24. Cited by: Table 1, §2, Table 4.
  • J. Svajlenko and C. K. Roy (2020) A survey on the evaluation of clone detection performance and benchmarking. arXiv preprint arXiv:2006.15682. Cited by: Table 1.
  • Q. K. Telesford, K. E. Joyce, S. Hayasaka, J. H. Burdette, and P. J. Laurienti (2011) The ubiquity of small-world networks. Brain connectivity 1 (5), pp. 367–375. Cited by: §5.4.
  • D. Tkaczyk, P. Szostek, M. Fedoryszak, P. J. Dendek, and Ł. Bolikowski (2015) CERMINE: automatic extraction of structured metadata from scientific literature. International Journal on Document Analysis and Recognition (IJDAR) 18 (4), pp. 317–335. Cited by: item 1.
  • K. Tonscheidt (2015) Leveraging code clone detection for the incremental migration of cloned product variants to a software product line: an explorative study. Bachelorarbeit, Otto-von-Guericke-Universität Magdeburg, pp. 4–16. Cited by: §5.1.3.
  • K. Wahlroos et al. (2019) Software plagiarism detection using n-grams. Cited by: Table 1.
  • K. Wang, L. Zhang, and S. Yan (2017) A study on code clone evolution analysis. In 2017 8th IEEE International Conference on Software Engineering and Service Science (ICSESS), pp. 340–345. Cited by: Table 1.
  • P. Wang, J. Svajlenko, Y. Wu, Y. Xu, and C. K. Roy (2018) CCAligner: a token based large-gap clone detector. In Proceedings of the 40th International Conference on Software Engineering, pp. 1066–1077. Cited by: §6.
  • Y. Wu, D. Zou, S. Dou, S. Yang, W. Yang, F. Cheng, H. Liang, and H. Jin (2020) SCDetector: software functional clone detection based on semantic tokens analysis. In Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering, pp. 821–833. Cited by: §6.
  • M. Zhang, T. Hall, and N. Baddoo (2011) Code bad smells: a review of current knowledge. Journal of Software Maintenance and Evolution: research and practice 23 (3), pp. 179–202. Cited by: §3.1.1.
  • M. F. Zibran and C. K. Roy (2012) The road to software clone management: a survey. Dept. Comput. Sci., Univ. of Saskatchewan, Saskatoon, SK, Tech. Rep 3. Cited by: Table 1, §2, Table 4.
  • T. Zimmermann (2016) Card-sorting: from text to themes. In Perspectives on data science for software engineering, pp. 137–141. Cited by: item 2, §4.