A Study of Potential Code Borrowing and License Violations in Java Projects on GitHub

02/12/2020
by   Yaroslav Golubev, et al.

With an ever-increasing amount of open source software, the popularity of services like GitHub that facilitate code reuse, and common misconceptions about the licensing of open source software, the problem of license violations in code is becoming more and more prominent. In this study, we compile an extensive corpus of popular Java projects from GitHub, search it for code clones, and perform an original analysis of possible code borrowing and license violations at the level of code fragments. We chose Java because of its popularity in industry, where the plagiarism problem is especially relevant because of possible legal action. We analyze and discuss the distribution of 94 different discovered and manually evaluated licenses in files and projects, differences in the licensing of files, the distribution of potential code borrowing between licenses, various types of possible license violations, the most violated licenses, etc. Studying possible license violations in specific blocks of code, we have discovered that 29.6 code borrowing and 9.4

research
07/09/2021

On the Nature of Code Cloning in Open-Source Java Projects

Code cloning plays a very important role in open-source software enginee...
research
11/26/2020

On the diversity and frequency of code related to mathematical formulas in real-world Java projects

In this paper, the term formula code refers to fragments of source code ...
research
08/02/2019

The Technical Debt Dataset

Technical Debt analysis is increasing in popularity as nowadays research...
research
06/21/2022

An Empirical Study On Correlation between Readme Content and Project Popularity

Readme in GitHub repositories serves as a preliminary source of informat...
research
09/03/2023

Who Made This Copy? An Empirical Analysis of Code Clone Authorship

Code clones are code snippets that are identical or similar to other sni...
research
10/11/2019

Design Smell Analysis for Developing and Established Open Source Java Software

Software design smells are design attributes which violate the fundament...
research
01/06/2023

Codepod: A Namespace-Aware, Hierarchical Jupyter for Interactive Development at Scale

Jupyter is a browser-based interactive development environment that has ...
