xCodeEval: A Large Scale Multilingual Multitask Benchmark for Code Understanding, Generation, Translation and Retrieval

The ability to solve problems is a hallmark of intelligence and has been an enduring goal in AI. AI systems that can create programs as solutions to problems or assist developers in writing programs can increase productivity and make programming more accessible. Recently, pre-trained large language models have shown impressive abilities in generating new code from natural language descriptions, repairing buggy code, translating code between languages, and retrieving relevant code segments. However, the evaluation of these models has often been performed in a scattered way, on only one or two specific tasks, in a few languages, at a partial granularity (e.g., function) level, and in many cases without proper training data. Even more concerning is that in most cases the evaluation of generated code has been done in terms of mere lexical overlap rather than actual execution, whereas the semantic similarity (or equivalence) of two code segments depends only on their “execution similarity”, i.e., producing the same output for a given input.
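
To make the idea of execution-based evaluation concrete, here is a minimal sketch (not from the paper) of how a generated solution can be judged by running it on a test case instead of measuring lexical overlap with a reference. The helper name `passes_test` and the file/test values are illustrative assumptions only.

```python
import subprocess

def passes_test(source_path: str, test_input: str, expected_output: str,
                timeout: float = 5.0) -> bool:
    """Hypothetical execution-based check: run a candidate Python program
    on one test input and compare its output with the expected output.
    Two lexically different programs count as equivalent on this test
    if they print the same result for the same input."""
    try:
        result = subprocess.run(
            ["python", source_path],   # execute the candidate solution
            input=test_input,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False  # non-terminating or too-slow programs fail the test
    # Normalize trailing whitespace before comparison, as judges typically do.
    return result.stdout.strip() == expected_output.strip()

# Example (assumed files/values): passes_test("candidate.py", "2 3\n", "5\n")
```

A benchmark built on this principle scores a model by the fraction of hidden test cases its generated programs pass, rather than by token-level similarity to a reference solution.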
