Measuring Coding Challenge Competence With APPS

05/20/2021
by Dan Hendrycks, et al.

While programming is one of the most broadly applicable skills in modern society, machine learning models still cannot code solutions to basic problems. Despite its importance, there has been surprisingly little work on evaluating code generation, and it can be difficult to assess code generation performance rigorously. To meet this challenge, we introduce APPS, a benchmark for code generation. Unlike prior work in more restricted settings, our benchmark measures the ability of models to take an arbitrary natural language specification and generate satisfactory Python code. Similar to how companies assess candidate software developers, we then evaluate models by checking their generated code on test cases. Our benchmark includes 10,000 problems, which range from having simple one-line solutions to being substantial algorithmic challenges. We fine-tune large language models on both GitHub and our training set, and we find that the prevalence of syntax errors is decreasing exponentially as models improve. Recent models such as GPT-Neo can pass approximately 20% of the test cases of introductory problems, so we find that machine learning models are now beginning to learn how to code. As the social significance of automatic code generation increases over the coming years, our benchmark can provide an important measure for tracking advancements.
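To make the evaluation idea concrete, here is a minimal sketch of checking generated code on test cases, assuming each problem supplies pairs of standard input and expected standard output. This is not the benchmark's own harness: the names `run_candidate`, `fraction_passed`, `GOOD_SOURCE`, and `BUGGY_SOURCE`, the timeout value, and the whitespace-insensitive comparison are all illustrative assumptions.

```python
import subprocess
import sys
from typing import List, Optional, Tuple


def run_candidate(source: str, stdin_text: str, timeout_s: float = 10.0) -> Optional[str]:
    """Run candidate Python source in a fresh interpreter.

    Returns captured stdout, or None if the program times out or exits
    with a non-zero status (syntax errors and uncaught exceptions included).
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", source],
            input=stdin_text,
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return None
    if result.returncode != 0:
        return None
    return result.stdout


def fraction_passed(source: str, test_cases: List[Tuple[str, str]]) -> float:
    """Fraction of (stdin, expected stdout) pairs the candidate gets right.

    Assumption: outputs match if they agree token by token, ignoring
    differences in whitespace.
    """
    if not test_cases:
        return 0.0
    passed = 0
    for stdin_text, expected in test_cases:
        actual = run_candidate(source, stdin_text)
        if actual is not None and actual.split() == expected.split():
            passed += 1
    return passed / len(test_cases)


# Hypothetical problem: read two integers and print their sum.
TEST_CASES = [("1 2\n", "3\n"), ("10 -4\n", "6\n")]

GOOD_SOURCE = "a, b = map(int, input().split())\nprint(a + b)\n"
# Buggy candidate: only correct when both inputs are non-negative.
BUGGY_SOURCE = "a, b = map(int, input().split())\nprint(abs(a) + abs(b))\n"
```

Calling `fraction_passed(GOOD_SOURCE, TEST_CASES)` and `fraction_passed(BUGGY_SOURCE, TEST_CASES)` scores the two candidates on the same tests. A stricter criterion would count a problem as solved only when every test case passes. Running each candidate in a separate process keeps a crash or an infinite loop from taking the evaluator down, but it is not a security sandbox for untrusted code.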


