Unmasking the giant: A comprehensive evaluation of ChatGPT's proficiency in coding algorithms and data structures

07/10/2023
by Sayed Erfan Arefin, et al.

The transformative influence of Large Language Models (LLMs) is profoundly reshaping the Artificial Intelligence (AI) technology domain. Notably, ChatGPT distinguishes itself among these models, demonstrating remarkable performance in multi-turn conversations and exhibiting code proficiency across an array of languages. In this paper, we carry out a comprehensive evaluation of ChatGPT's coding capabilities based on what is to date the largest catalog of coding challenges. Our focus is on the Python programming language and on problems centered on data structures and algorithms, two topics at the very foundations of Computer Science. We evaluate ChatGPT for its ability to generate correct solutions to the problems fed to it, its code quality, and the nature of the run-time errors thrown by its code. Where ChatGPT code executes successfully but fails to solve the problem at hand, we look into patterns in the test cases passed in order to gain insight into how wrong ChatGPT code is in these situations. To infer whether ChatGPT might have directly memorized some of the data that was used to train it, we methodically design an experiment to investigate this phenomenon. Making comparisons with human performance whenever feasible, we investigate all of the above questions in the context of both of its underlying learning models (GPT-3.5 and GPT-4), across a vast array of sub-topics within the main topics, and on problems of varying degrees of difficulty.
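The evaluation criteria named above (correct solutions, run-time errors, and the share of test cases passed when the code runs but is wrong) can be illustrated with a minimal Python sketch. This is not the authors' harness: the function `evaluate`, the verdict labels, and the example problem are all hypothetical names chosen for illustration, and a real judge would also enforce time and memory limits.

```python
from collections import Counter


def evaluate(solution, test_cases):
    """Run a candidate solution on (args, expected) pairs and summarize the outcome.

    Hypothetical helper for illustration only; not the harness used in the paper.
    """
    passed = 0
    errors = Counter()
    for args, expected in test_cases:
        try:
            if solution(*args) == expected:
                passed += 1
        except Exception as exc:
            # Record the kind of run-time error the code threw.
            errors[type(exc).__name__] += 1

    total = len(test_cases)
    if errors:
        verdict = "runtime_error"
    elif passed == total:
        verdict = "accepted"
    else:
        verdict = "wrong_answer"

    return {
        "verdict": verdict,
        "pass_rate": passed / total if total else 0.0,
        "errors": dict(errors),
    }


def buggy_max(nums):
    """Stand-in for generated code: wrong when every element is negative."""
    best = 0
    for n in nums:
        best = max(best, n)
    return best


cases = [
    (([1, 2, 3],), 3),
    (([-5, -2],), -2),  # the only case that exposes the bug
    (([7],), 7),
    (([0, -1],), 0),
]
result = evaluate(buggy_max, cases)
print(result)
```

Here the candidate runs without error but fails one of four test cases, so the pass rate gives a rough measure of how wrong the code is, which is the kind of signal the abstract describes examining.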
