Apples, Oranges & Fruits – Understanding Similarity of Software Projects Through The Lens of Dissimilar Artifacts

03/02/2021
by A Eashaan Rao, et al.

The growing availability of open source projects has made it easier for developers to reuse existing software artifacts and build new software on top of them. However, the notion of similarity is hard to pin down because it varies from developer to developer. Some developers search for repositories with similar source code, while others look for repositories with similar requirements or issues. Existing approaches tend to find similar projects by comparing artifacts of the same type: source code to source code, API usage to API usage, documentation to documentation, and so on. Yet even when two artifacts of the same type are dissimilar, two artifacts of different types may still be similar. Hence, in this paper, we aim to answer the question: can we find similarity of software repositories through dissimilar artifacts? To this end, we conduct an experiment on three repositories (two similar projects and one different project), comparing both similar and dissimilar artifacts (documentation, commits, and source code). We observed similarities between dissimilar artifacts such as commits, source code, and README files, in the context of both the similar and the different repositories.
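
To make the idea of comparing dissimilar artifacts concrete, the following is a minimal sketch, not the authors' method. The abstract does not say which similarity measure the experiment uses, so a plain bag-of-words cosine similarity is assumed here. The function names (`tokenize`, `cosine`, `cross_artifact_similarity`) and the two toy repositories are hypothetical and exist only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Count lowercase word tokens, splitting camelCase and snake_case identifiers."""
    # Insert a space at camelCase boundaries: "parseConfig" -> "parse Config".
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    # Underscores and punctuation are not letters, so they act as separators.
    return Counter(token.lower() for token in re.findall(r"[A-Za-z]+", spaced))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cross_artifact_similarity(repo_a: dict, repo_b: dict) -> dict:
    """Score every (artifact type in A, artifact type in B) pair, same-type or not."""
    return {
        (kind_a, kind_b): cosine(tokenize(text_a), tokenize(text_b))
        for kind_a, text_a in repo_a.items()
        for kind_b, text_b in repo_b.items()
    }


# Hypothetical toy data: each repository maps an artifact type to its text.
repo_a = {
    "readme": "A small library to parse JSON config files",
    "commits": "Fix config parser crash on empty file",
    "source": "def parseConfig(path): return json.load(open(path))",
}
repo_b = {
    "readme": "Tool for loading application settings from disk",
    "commits": "Handle missing settings file gracefully",
    "source": "def load_config(file): return json.load(open(file))",
}

scores = cross_artifact_similarity(repo_a, repo_b)
for (kind_a, kind_b), score in sorted(scores.items()):
    print(f"{kind_a:8s} vs {kind_b:8s}: {score:.3f}")
```

Pairs such as `("commits", "source")` are the dissimilar-artifact comparisons the abstract refers to, while pairs such as `("readme", "readme")` are the same-type comparisons used by existing approaches. A real study would likely need a stronger representation than raw token counts.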
