Is this Snippet Written by ChatGPT? An Empirical Study with a CodeBERT-Based Classifier

07/18/2023
by Phuong T. Nguyen, et al.

Since its launch in November 2022, ChatGPT has gained popularity among users, especially programmers who use it as a tool to solve development problems. However, while it offers a practical solution to programming problems, ChatGPT should mainly be used as a supporting tool (e.g., in software education) rather than as a replacement for humans. Thus, automatically detecting source code generated by ChatGPT is necessary, and tools for identifying AI-generated content may need to be adapted to work effectively with source code. This paper presents an empirical study that investigates the feasibility of automatically identifying AI-generated code snippets and the factors that influence this ability. To this end, we propose a novel approach called GPTSniffer, which builds on top of CodeBERT to detect source code written by AI. The results show that GPTSniffer can accurately classify whether code is human-written or AI-generated, and that it outperforms two baselines, GPTZero and OpenAI Text Classifier. The study also shows how similar training data or a classification context with paired snippets helps to boost classification performance.

research
12/14/2018

Supporting software documentation with source code summarization

Source code summarization is a process of generating summaries that desc...
research
05/22/2023

The "code" of Ethics: A Holistic Audit of AI Code Generators

AI-powered programming language generation (PLG) models have gained incr...
research
03/16/2021

From Innovations to Prospects: What Is Hidden Behind Cryptocurrencies?

The great influence of Bitcoin has promoted the rapid development of blo...
research
03/15/2023

Practices and Challenges of Using GitHub Copilot: An Empirical Study

With the advances in machine learning, there is a growing interest in AI...
research
12/11/2022

Authorship Identification of Source Code Segments Written by Multiple Authors Using Stacking Ensemble Method

Source code segment authorship identification is the task of identifying...
research
03/15/2018

Using StackOverflow content to assist in code review

An important goal for programmers is to minimize cost of identifying and...
research
08/18/2022

An Empirical Evaluation of Competitive Programming AI: A Case Study of AlphaCode

AlphaCode is a code generation system for assisting software developers ...
