StudentEval: A Benchmark of Student-Written Prompts for Large Language Models of Code

06/07/2023
by Hannah McLean Babe, et al.

Code LLMs are being rapidly deployed and there is evidence that they can make professional programmers more productive. Current benchmarks for code generation measure whether models generate correct programs given an expert prompt. In this paper, we present a new benchmark containing multiple prompts per problem, written by a specific population of non-expert prompters: beginning programmers. StudentEval contains 1,749 prompts for 48 problems, written by 80 students who have only completed one semester of Python programming. Our students wrote these prompts while working interactively with a Code LLM, and we observed very mixed success rates. We use StudentEval to evaluate 5 Code LLMs and find that StudentEval is a better discriminator of model performance than existing benchmarks. We analyze the prompts and find significant variation in students' prompting techniques. We also find that nondeterministic LLM sampling could mislead students into thinking that their prompts are more (or less) effective than they actually are, which has implications for how to teach with Code LLMs.
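The paper's observation about nondeterministic sampling can be made concrete: a single completion per prompt gives a student a noisy all-or-nothing signal, while averaging correctness over many sampled completions yields a stable estimate of how effective the prompt actually is. A minimal sketch of that idea, using a toy stand-in `toy_generate` for a real Code LLM call (the function names here are illustrative, not from the paper):

```python
import random

def estimate_pass_rate(generate, tests, n_samples=20):
    """Estimate how often a prompt yields a correct program.

    `generate` is a zero-argument callable standing in for one
    nondeterministic LLM completion; `tests` checks the result.
    """
    passes = sum(tests(generate()) for _ in range(n_samples))
    return passes / n_samples

# Toy stand-in: a "model" that returns a correct function 60% of the time.
def toy_generate():
    if random.random() < 0.6:
        return lambda x: x + 1      # correct completion
    return lambda x: x - 1          # buggy completion

def toy_tests(fn):
    return fn(3) == 4

random.seed(0)
rate = estimate_pass_rate(toy_generate, toy_tests, n_samples=200)
# A single sample would report 0% or 100%; the estimate is near 0.6.
```

A student who sees one lucky (or unlucky) completion may conclude the prompt is better (or worse) than this underlying pass rate suggests, which is the pedagogical hazard the abstract describes.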

Related research

- 08/17/2022. MultiPL-E: A Scalable and Extensible Approach to Benchmarking Neural Code Generation. Large language models have demonstrated the ability to generate both nat...
- 02/10/2021. SQLRepair: Identifying and Repairing Mistakes in Student-Authored SQL Queries. Computer science educators seek to understand the types of mistakes that...
- 08/20/2022. Security Implications of Large Language Model Code Assistants: A User Study. Advances in Deep Learning have led to the emergence of Large Language Mo...
- 10/13/2019. Google Summer of Code: Student Motivations and Contributions. Several open source software (OSS) projects expect to foster newcomers' ...
- 05/19/2018. On Portrait of a Specialist in Open Data. The article is written to identify the requirements for Open Data Specia...
- 09/16/2023. ChatGPT-4 with Code Interpreter can be used to solve introductory college-level vector calculus and electromagnetism problems. We evaluated ChatGPT 3.5, 4, and 4 with Code Interpreter on a set of col...
- 07/31/2023. Promptly: Using Prompt Problems to Teach Learners How to Effectively Utilize AI Code Generators. With their remarkable ability to generate code, large language models (L...
