MCoNaLa: A Benchmark for Code Generation from Multiple Natural Languages

03/16/2022
by Zhiruo Wang, et al.

Applications at the intersection of natural and programming languages, such as code generation and code summarization, have burgeoned recently, but they are usually English-centric. This creates a barrier for program developers who are not proficient in English. To mitigate this gap in technology development across languages, we propose a multilingual dataset, MCoNaLa, to benchmark code generation from natural language commands extending beyond English. Following the methodology of the English Code/Natural Language Challenge (CoNaLa) dataset, we annotated a total of 896 NL-code pairs in three languages: Spanish, Japanese, and Russian. We present a quantitative evaluation of performance on the MCoNaLa dataset using state-of-the-art code generation systems. While the difficulties vary across these three languages, all systems lag significantly behind their English counterparts, revealing the challenges in adapting code generation to new languages.


research
08/17/2022

MultiPL-E: A Scalable and Extensible Approach to Benchmarking Neural Code Generation

Large language models have demonstrated the ability to generate both nat...
research
08/27/2021

Lyra: A Benchmark for Turducken-Style Code Generation

Code generation is crucial to reduce manual software development efforts...
research
08/27/2020

DAVE: Deriving Automatically Verilog from English

While specifications for digital systems are provided in natural languag...
research
12/20/2022

Execution-Based Evaluation for Open-Domain Code Generation

To extend the scope of coding queries to more realistic settings, we pro...
research
10/26/2022

Multi-lingual Evaluation of Code Generation Models

We present MBXP, an execution-based code completion benchmark in 10+ pro...
research
04/03/2022

MSCCD: Grammar Pluggable Clone Detection Based on ANTLR Parser Generation

For various reasons, programming languages continue to multiply and evol...
research
12/07/2020

What Meaning-Form Correlation Has to Compose With

Compositionality is a widely discussed property of natural languages, al...
