Large Language Models (GPT) Struggle to Answer Multiple-Choice Questions about Code

03/09/2023
by Jaromír Šavelka, et al.

We analyzed the effectiveness of three generative pre-trained transformer (GPT) models in answering multiple-choice question (MCQ) assessments, often involving short snippets of code, from introductory and intermediate programming courses at the postsecondary level. This emerging technology has stirred countless discussions of its potential uses (e.g., exercise generation, code explanation) as well as its misuses in programming education (e.g., cheating). However, the capabilities and limitations of GPT models in reasoning about and/or analyzing code in educational settings remain under-explored. We evaluated three of OpenAI's GPT models on formative and summative MCQ assessments from three Python courses (530 questions in total). We found that MCQs containing code snippets are answered less successfully than those containing only natural language. Questions that require filling in a blank in the code or completing a natural language statement about the snippet are handled rather successfully, whereas MCQs that require analysis of and/or reasoning about the code (e.g., what is true or false about the snippet, or what its output is) appear to be the most challenging. Educators can leverage these findings to adapt their instructional practices and assessments in programming courses, so that GPT becomes a valuable assistant for learners rather than a source of confusion and/or a potential hindrance in the learning process.
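To make the kind of comparison described above concrete (accuracy on MCQs with versus without code, and accuracy by question type), here is a minimal Python sketch of how graded answers might be tallied. It assumes each model reply has already been reduced to a single option letter. The `MCQResult` record, its field names, the category labels, and the toy records are all hypothetical illustrations; they are not the authors' code or data from the study.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class MCQResult:
    """One graded MCQ. Field names are hypothetical, chosen for illustration."""
    question_type: str  # hypothetical labels, e.g. "fill_in_blank", "output_prediction"
    has_code: bool      # does the question contain a code snippet?
    answer_key: str     # correct option letter, e.g. "B"
    model_answer: str   # option letter assumed to be already extracted from the reply


def accuracy_by(results, key):
    """Return {group: fraction correct}, grouping results by key(result)."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for r in results:
        group = key(r)
        totals[group] += 1
        # Normalize case and surrounding whitespace before comparing letters
        if r.model_answer.strip().upper() == r.answer_key.strip().upper():
            correct[group] += 1
    return {g: correct[g] / totals[g] for g in totals}


# Made-up toy records (NOT data from the study), only to show the mechanics.
toy = [
    MCQResult("fill_in_blank", True, "A", "A"),
    MCQResult("fill_in_blank", True, "C", "c"),
    MCQResult("output_prediction", True, "B", "D"),
    MCQResult("output_prediction", True, "A", "A"),
    MCQResult("natural_language", False, "D", "D"),
]

print(accuracy_by(toy, key=lambda r: r.has_code))       # → {True: 0.75, False: 1.0}
print(accuracy_by(toy, key=lambda r: r.question_type))
```

Passing the grouping as a `key` function lets the same tally answer both questions (code vs. no code, and per question type) without duplicating the scoring logic.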
