Evaluating the Code Quality of AI-Assisted Code Generation Tools: An Empirical Study on GitHub Copilot, Amazon CodeWhisperer, and ChatGPT

04/21/2023
by   Burak Yetiştiren, et al.
0

Context: AI-assisted code generation tools have become increasingly prevalent in software engineering, offering the ability to generate code from natural language prompts or partial code inputs. Notable examples of these tools include GitHub Copilot, Amazon CodeWhisperer, and OpenAI's ChatGPT. Objective: This study aims to compare the performance of these prominent code generation tools in terms of code quality metrics, such as Code Validity, Code Correctness, Code Security, Code Reliability, and Code Maintainability, to identify their strengths and shortcomings. Method: We assess the code generation capabilities of GitHub Copilot, Amazon CodeWhisperer, and ChatGPT using the benchmark HumanEval Dataset. The generated code is then evaluated based on the proposed code quality metrics. Results: Our analysis reveals that the latest versions of ChatGPT, GitHub Copilot, and Amazon CodeWhisperer generate correct code 65.2 of the time, respectively. In comparison, the newer versions of GitHub CoPilot and Amazon CodeWhisperer showed improvement rates of 18 7 smells, was found to be 8.9 minutes for ChatGPT, 9.1 minutes for GitHub Copilot, and 5.6 minutes for Amazon CodeWhisperer. Conclusions: This study highlights the strengths and weaknesses of some of the most popular code generation tools, providing valuable insights for practitioners. By comparing these generators, our results may assist practitioners in selecting the optimal tool for specific tasks, enhancing their decision-making process.

READ FULL TEXT

page 23

page 33

page 34

page 35

research
08/09/2023

No Need to Lift a Finger Anymore? Assessing the Quality of Code Generation by ChatGPT

Large language models (LLMs) have demonstrated impressive capabilities a...
research
04/25/2023

AI-assisted coding: Experiments with GPT-4

Artificial intelligence (AI) tools based on large language models have a...
research
05/16/2023

A Preliminary Analysis on the Code Generation Capabilities of GPT-3.5 and Bard AI Models for Java Functions

This paper evaluates the capability of two state-of-the-art artificial i...
research
08/20/2021

An Empirical Cybersecurity Evaluation of GitHub Copilot's Code Contributions

There is burgeoning interest in designing AI-based systems to assist hum...
research
02/01/2023

On the Robustness of Code Generation Techniques: An Empirical Study on GitHub Copilot

Software engineering research has always being concerned with the improv...
research
08/05/2023

LLM is Like a Box of Chocolates: the Non-determinism of ChatGPT in Code Generation

There has been a recent explosion of research on Large Language Models (...
research
07/28/2020

SoK: All You Ever Wanted to Know About x86/x64 Binary Disassembly But Were Afraid to Ask

Disassembly of binary code is hard, but necessary for improving the secu...

Please sign up or login with your details

Forgot password? Click here to reset