Authorship Identification of Source Code Segments Written by Multiple Authors Using Stacking Ensemble Method

12/11/2022
by Parvez Mahbub et al.

Source code segment authorship identification is the task of identifying the author of a source code segment through supervised learning. It is of great importance in plagiarism detection, digital forensics, and several other law enforcement applications. However, when a source code segment is written by multiple authors, typical author identification methods no longer work. Here, we propose an author identification technique that uses a stacking ensemble classifier and can predict the authorship of source code segments even when they are written by multiple authors. The proposed technique is built upon several deep neural networks, random forests, and support vector machine classifiers. We show that a single classification technique is no longer sufficient for identifying the author group, and that a deep neural network-based stacking ensemble method can enhance the accuracy significantly. The performance of the proposed technique has been compared with some existing methods that only deal with source code segments written by exactly one author. Despite the harder task of identifying the authorship of segments written by multiple authors, our proposed technique achieves promising identification accuracy compared with these related works.
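To make the two-level architecture concrete, here is a minimal sketch of how a stacking ensemble for author-group identification could be assembled with scikit-learn. It is illustrative only and is not the authors' implementation. The following are all assumptions, not details from the paper: character n-gram TF-IDF features, the hyperparameters, scikit-learn's `MLPClassifier` standing in for the deep neural networks, a neural meta-learner, encoding each author group as a single label such as `alice+bob`, the hypothetical helper names `author_group_label`, `build_stacking_model`, and `toy_corpus`, and the toy snippets.

```python
# Stacking-ensemble sketch for author-group identification (illustrative only).
# Assumptions, not taken from the paper: scikit-learn models, character n-gram
# TF-IDF features, hyperparameters, and author groups encoded as single labels.
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC


def author_group_label(authors):
    """Hypothetical encoding: one class per set of co-authors, e.g. {"bob", "alice"} -> "alice+bob"."""
    return "+".join(sorted(authors))


def build_stacking_model(seed=0):
    """Hypothetical helper: TF-IDF features feeding a two-level stacking ensemble."""
    # Level-0 (base) learners: a small neural network, a random forest and a linear SVM.
    base_learners = [
        ("mlp", MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=2000, random_state=seed)),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=seed)),
        ("svm", SVC(kernel="linear", random_state=seed)),
    ]
    # Level-1 (meta) learner: a neural network trained on the base learners'
    # out-of-fold outputs (class probabilities for MLP/RF, decision scores for the SVM).
    stack = StackingClassifier(
        estimators=base_learners,
        final_estimator=MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=seed),
        cv=3,  # small fold count so the tiny toy corpus still works
        stack_method="auto",
    )
    # Character n-grams pick up naming, spacing, and indentation habits.
    return make_pipeline(TfidfVectorizer(analyzer="char", ngram_range=(2, 4)), stack)


def toy_corpus():
    """Hypothetical toy data: four snippets per author group."""
    alice = [
        "def total_sum(values):\n    return sum(values)\n",
        "def mean_value(values):\n    return sum(values) / len(values)\n",
        "def max_item(items):\n    return max(items)\n",
        "def count_words(text):\n    return len(text.split())\n",
    ]
    bob = [
        "int getTotal(int[] arr){\n\tint s=0;for(int x:arr)s+=x;return s;}\n",
        "int getMax(int[] arr){\n\tint m=arr[0];for(int x:arr)m=Math.max(m,x);return m;}\n",
        "boolean isEmpty(String str){\n\treturn str.length()==0;}\n",
        "int countItems(List<Integer> lst){\n\treturn lst.size();}\n",
    ]
    # Mixed snippets: Python syntax (Alice's habit) with Bob's camelCase names and tabs.
    both = [
        "def getTotal(values):\n\treturn sum(values)\n",
        "def getMax(items):\n\treturn max(items)\n",
        "def isEmpty(text):\n\treturn len(text) == 0\n",
        "def countWords(text):\n\treturn len(text.split())\n",
    ]
    snippets = alice + bob + both
    labels = (
        [author_group_label({"alice"})] * len(alice)
        + [author_group_label({"bob"})] * len(bob)
        + [author_group_label({"alice", "bob"})] * len(both)
    )
    return snippets, labels


if __name__ == "__main__":
    snippets, labels = toy_corpus()
    model = build_stacking_model().fit(snippets, labels)
    print(model.predict(["def getMean(values):\n\treturn sum(values) / len(values)\n"]))
```

Fitting the meta-learner on out-of-fold predictions, which is what the `cv` argument controls, keeps it from simply trusting whichever base learner memorizes the training snippets best. On a real corpus, the features, fold count, and network sizes would all need tuning.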
