DeepSCC: Source Code Classification Based on Fine-Tuned RoBERTa

10/03/2021
by Guang Yang, et al.

Classifying the programming language of a code snippet is a common step in software engineering tasks, such as predicting programming language tags for code snippets from Stack Overflow. In this study, we propose DeepSCC, a novel method that uses a fine-tuned RoBERTa model to classify the programming language of source code. Our empirical study uses a corpus collected from Stack Overflow, which contains 224,445 pairs of code snippets and their corresponding language types. We compare DeepSCC against nine state-of-the-art baselines from the fields of source code classification and neural text classification in terms of four performance measures (Accuracy, Precision, Recall, and F1), and the results show that DeepSCC is competitive.
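To make the four performance measures concrete, here is a minimal sketch of how they could be computed for a multi-class language classifier. The function name `classification_metrics` and the toy labels are hypothetical. The choice of macro averaging (an unweighted mean over classes) is also an assumption, since the abstract does not say how the per-class scores are aggregated.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy plus macro-averaged precision, recall, and F1.

    Macro averaging is an assumption made for this sketch; the abstract
    does not specify the averaging scheme.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()

    # Count true positives, false positives, and false negatives per class.
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1
            fn[truth] += 1

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        p = tp[label] / p_den if p_den else 0.0
        r = tp[label] / r_den if r_den else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy example with hypothetical labels (not data from the paper).
y_true = ["python", "python", "java", "java"]
y_pred = ["python", "java", "java", "java"]
metrics = classification_metrics(y_true, y_pred)
```

On a corpus with unevenly represented languages, macro averaging gives each language equal weight regardless of how many snippets it has. A weighted average would instead follow the class distribution.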

research
09/21/2018

SCC: Automatic Classification of Code Snippets

Determining the programming language of a source code file has been cons...
research
09/21/2018

Predicting the Programming Language of Questions and Snippets of StackOverflow Using Natural Language Processing

Stack Overflow is the most popular Q&A website among software developers...
research
03/22/2021

psc2code: Denoising Code Extraction from Programming Screencasts

In this paper, we propose an approach named psc2code to denoise the proc...
research
06/16/2022

The Case for a Wholistic Serverless Programming Paradigm and Full Stack Automation for AI and Beyond – The Philosophy of Jaseci and Jac

In this work, the case is made for a wholistic top-down re-envisioning o...
research
08/05/2020

An Evolver program for weighted Steiner trees

We present an algorithm to find near-optimal weighted Steiner minimal tr...
research
12/06/2020

NaturalCC: A Toolkit to Naturalize the Source Code Corpus

We present NaturalCC, an efficient and extensible toolkit to bridge the ...
research
02/26/2021

On the Naming of Methods: A Survey of Professional Developers

This paper describes the results of a large (+1100 responses) survey of ...
