Execution-Based Evaluation for Open-Domain Code Generation

12/20/2022
by Zhiruo Wang, et al.

To extend the scope of coding queries to more realistic settings, we propose ODEX, the first open-domain, execution-based natural language (NL) to code generation dataset. ODEX has 945 NL-Code pairs spanning 79 diverse libraries, along with 1,707 human-written test cases for execution. Our NL-Code pairs are harvested from StackOverflow forums to encourage natural and practical coding queries, and are then carefully rephrased to ensure intent clarity and prevent potential data memorization. Moreover, ODEX supports intents in four natural languages: English, Spanish, Japanese, and Russian. ODEX reveals behavioral differences between top-performing code LMs: Codex performs better on open-domain queries, whereas CodeGen strikes a better balance between open- and closed-domain queries. ODEX corroborates the merits of execution-based evaluation over metrics without execution, but also shows that the two kinds of metrics are complementary. Even powerful models such as CodeGen-6B achieve a pass rate of only 11.96 on their top-1 prediction, suggesting plenty of headroom for improvement. We release ODEX to facilitate research into open-domain problems for the code generation community.
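
To make the idea of execution-based evaluation concrete, the following is a minimal sketch, not the ODEX harness itself. It assumes each problem carries one predicted code snippet and a string of assertion-style test cases; the names `passes_tests`, `pass_rate_top1`, and the toy problems are hypothetical and are not drawn from the dataset.

```python
from typing import Dict, List


def passes_tests(candidate_code: str, test_code: str) -> bool:
    """Return True if the candidate runs and every assertion in test_code holds."""
    namespace: Dict[str, object] = {}
    try:
        exec(candidate_code, namespace)  # define the predicted function(s)
        exec(test_code, namespace)       # run the test assertions against them
    except Exception:
        # Syntax errors, runtime errors, and failed assertions all count as a failure.
        return False
    return True


def pass_rate_top1(problems: List[Dict[str, str]]) -> float:
    """Percentage of problems whose single (top-1) prediction passes all its tests."""
    if not problems:
        return 0.0
    passed = sum(passes_tests(p["prediction"], p["tests"]) for p in problems)
    return 100.0 * passed / len(problems)


# Hypothetical toy problems for illustration only (not from ODEX).
toy_problems = [
    {
        "prediction": "def add(a, b):\n    return a + b\n",
        "tests": "assert add(1, 2) == 3\nassert add(-1, 1) == 0\n",
    },
    {
        # Deliberately wrong prediction: returns the input unchanged.
        "prediction": "def reverse(s):\n    return s\n",
        "tests": "assert reverse('abc') == 'cba'\n",
    },
]

toy_score = pass_rate_top1(toy_problems)
```

Here one of the two toy predictions passes, so `toy_score` is 50.0. The sketch runs untrusted code directly with `exec` for brevity; a real harness would presumably isolate each run in a separate process with a timeout, and would need the relevant third-party libraries installed for open-domain queries.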

