CoderEval: A Benchmark of Pragmatic Code Generation with Generative Pre-trained Models

02/01/2023
by Hao Yu, et al.

Code generation models based on the pre-training and fine-tuning paradigm are increasingly being explored by both academia and industry, resulting in well-known industrial models such as Codex, CodeGen, and PanGu-Coder. To validate the performance of these models, several benchmarks (e.g., AiXBench and HumanEval) have been proposed, but they include only cases of generating standalone functions, i.e., functions that invoke or access only built-in functions and standard libraries. However, standalone functions constitute only about 30% of the functions in real open-source projects. To assess a model's performance on pragmatic code generation (i.e., code generation for real settings of open-source or proprietary code), in this paper we propose CoderEval, a benchmark for pragmatic code generation with generative pre-trained models. Compared with the widely used HumanEval benchmark from OpenAI, CoderEval can assess model performance on pragmatic code generation beyond just generating standalone functions. By evaluating three publicly available models (CodeGen, PanGu-Coder, and Codex) on CoderEval, we analyze and discuss the current progress and future directions of pragmatic code generation with generative pre-trained models.
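The standalone vs. non-standalone distinction is easiest to see in code. Below is a minimal Python sketch contrasting a standalone function (the HumanEval-style case, depending only on built-ins and the standard library) with a non-standalone one that calls project-defined APIs. The `User` class and `repo.insert` call are hypothetical, invented purely for illustration; they are not drawn from the CoderEval benchmark itself.

```python
import hashlib
from dataclasses import dataclass


def sha1_hex(text: str) -> str:
    """Standalone: uses only built-ins and the standard library,
    so a HumanEval-style benchmark can test it in isolation."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


# --- Hypothetical project-level context (assumed for illustration) ---
@dataclass
class User:
    name: str
    email: str

    def is_valid(self) -> bool:
        return bool(self.name) and "@" in self.email


def save_user(user: User, repo) -> bool:
    """Non-standalone: depends on a project-defined type (User) and an
    injected repository object (`repo.insert` is a hypothetical project
    API). Generating such functions requires file- or project-level
    context, which is the setting CoderEval targets."""
    if not user.is_valid():
        raise ValueError("invalid user")
    return repo.insert(user)
```

A model prompted only with a signature and docstring can plausibly complete `sha1_hex`, but it has no basis to guess `User.is_valid` or `repo.insert` without broader project context; functions of this second kind are roughly the 70% of real project code that standalone-only benchmarks leave unmeasured.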


Related research

06/15/2022 · NatGen: Generative pre-training by "Naturalizing" source code
Pre-trained Generative Language models (e.g. PLBART, CodeT5, SPT-Code) f...

08/31/2023 · BioCoder: A Benchmark for Bioinformatics Code Generation with Contextual Pragmatic Knowledge
Pre-trained language models like ChatGPT have significantly improved cod...

02/19/2020 · CodeBERT: A Pre-Trained Model for Programming and Natural Languages
We present CodeBERT, a bimodal pre-trained model for programming languag...

08/30/2022 · Deep Generative Modeling on Limited Data with Regularization by Nontransferable Pre-trained Models
Deep generative models (DGMs) are data-eager. Essentially, it is because...

05/22/2023 · DUMB: A Benchmark for Smart Evaluation of Dutch Models
We introduce the Dutch Model Benchmark: DUMB. The benchmark includes a d...

08/19/2023 · On-the-fly Improving Performance of Deep Code Models via Input Denoising
Deep learning has been widely adopted to tackle various code-based tasks...

09/05/2023 · nanoT5: A PyTorch Framework for Pre-training and Fine-tuning T5-style Models with Limited Resources
State-of-the-art language models like T5 have revolutionized the NLP lan...
