Generative AI for Programming Education: Benchmarking ChatGPT, GPT-4, and Human Tutors

06/29/2023
by Tung Phung, et al.

Generative AI and large language models hold great promise for enhancing computing education by powering next-generation educational technologies for introductory programming. Recent works have studied these models for different scenarios relevant to programming education; however, these works are limited because they typically consider already outdated models or only specific scenarios. Consequently, there is a lack of systematic studies that benchmark state-of-the-art models across a comprehensive set of programming education scenarios. In our work, we systematically evaluate two models, ChatGPT (based on GPT-3.5) and GPT-4, and compare their performance with that of human tutors across a variety of scenarios. We evaluate using five introductory Python programming problems and real-world buggy programs from an online platform, and assess performance using expert-based annotations. Our results show that GPT-4 drastically outperforms ChatGPT (based on GPT-3.5) and comes close to human tutors' performance for several scenarios. These results also highlight settings where GPT-4 still struggles, providing exciting future directions for developing techniques to improve the performance of these models.

research
07/30/2023

Evaluating ChatGPT and GPT-4 for Visual Programming

Generative AI and large language models have the potential to drasticall...
research
05/12/2023

Generative AI: Implications and Applications for Education

The launch of ChatGPT in November 2022 precipitated a panic among some e...
research
08/15/2023

Large Language Models in Introductory Programming Education: ChatGPT's Performance and Implications for Assessments

This paper investigates the performance of the Large Language Models (LL...
research
05/09/2022

A Transparency Index Framework for AI in Education

Numerous AI ethics checklists and frameworks have been proposed focusing...
research
11/07/2022

Automatic Creativity Measurement in Scratch Programs Across Modalities

Promoting creativity is considered an important goal of education, but c...
research
09/19/2023

Learning from Teaching Assistants to Program with Subgoals: Exploring the Potential for AI Teaching Assistants

With recent advances in generative AI, conversational models like ChatGP...
research
06/15/2023

Thrilled by Your Progress! Large Language Models (GPT-4) No Longer Struggle to Pass Assessments in Higher Education Programming Courses

This paper studies recent developments in large language models' (LLM) a...
