ICodeNet – A Hierarchical Neural Network Approach for Source Code Author Identification

01/30/2021
by Pranali Bora, et al.

With the open-source revolution, source code is now more easily accessible than ever. This has, however, made it easier for malicious users and institutions to copy code without regard for its license or credit to the original author. Source code author identification is therefore a critical task. In this paper, we propose ICodeNet, a hierarchical neural network that can be used for source code file-level tasks. ICodeNet processes source code in image format and is employed here for per-file author identification. It consists of an ImageNet-trained VGG encoder followed by a shallow neural network, which is based on either a CNN or an LSTM. Different variations of the model are evaluated on a source code author classification dataset. We also compare our image-based hierarchical neural network model with a simple image-based CNN architecture and with text-based CNN and LSTM models to highlight its novelty and efficiency.
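The two-level data flow described in the abstract (an encoder applied to image input, followed by a shallow network that produces a file-level result) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the character-to-pixel rendering, the splitting of the image into fixed-height strips, and the function names are all hypothetical. In particular, `encode_patch` is a simple averaging stand-in for the ImageNet-trained VGG encoder and `aggregate` is an averaging stand-in for the shallow CNN or LSTM; neither is a neural network.

```python
from typing import List

Image = List[List[float]]  # rows of grayscale intensities in [0, 1]


def code_to_image(source: str, width: int = 16) -> Image:
    """Hypothetical rendering: one row per line, one pixel per character."""
    rows = []
    for line in source.splitlines():
        pixels = [min(ord(ch), 255) / 255.0 for ch in line[:width]]
        pixels += [0.0] * (width - len(pixels))  # pad short lines with blanks
        rows.append(pixels)
    return rows


def split_into_patches(image: Image, patch_height: int = 4) -> List[Image]:
    """Cut the image into fixed-height strips, zero-padding the last one."""
    if not image:
        return []
    width = len(image[0])
    patches = []
    for start in range(0, len(image), patch_height):
        patch = image[start:start + patch_height]
        patch += [[0.0] * width for _ in range(patch_height - len(patch))]
        patches.append(patch)
    return patches


def encode_patch(patch: Image, feature_dim: int = 4) -> List[float]:
    """Stand-in for the pretrained encoder (assumption): band-wise mean."""
    width = len(patch[0])
    if width % feature_dim != 0:
        raise ValueError("width must be divisible by feature_dim")
    band = width // feature_dim
    features = []
    for i in range(feature_dim):
        values = [row[c] for row in patch
                  for c in range(i * band, (i + 1) * band)]
        features.append(sum(values) / len(values))
    return features


def aggregate(patch_features: List[List[float]]) -> List[float]:
    """Stand-in for the shallow CNN/LSTM (assumption): mean over patches."""
    if not patch_features:
        return []
    n = len(patch_features)
    return [sum(column) / n for column in zip(*patch_features)]


def file_embedding(source: str) -> List[float]:
    """File -> image -> patches -> per-patch features -> file-level vector."""
    patches = split_into_patches(code_to_image(source))
    return aggregate([encode_patch(p) for p in patches])


sample = "def add(a, b):\n    return a + b\n" * 3
embedding = file_embedding(sample)
```

In a real system the file-level vector would feed a classifier over candidate authors, and both stand-ins would be replaced by trained networks; the sketch only shows how per-patch features are computed first and then combined into one per-file representation.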


