Measuring The Impact Of Programming Language Distribution

02/03/2023
by Gabriel Orlanski, et al.

Current benchmarks for evaluating neural code models focus on only a small subset of programming languages, excluding many popular languages such as Go or Rust. To ameliorate this issue, we present the BabelCode framework for execution-based evaluation of any benchmark in any language. BabelCode enables new investigations into the qualitative performance of models' memory, runtime, and individual test case results. Additionally, we present a new code translation dataset called Translating Python Programming Puzzles (TP3), derived from the Python Programming Puzzles (Schuster et al. 2021) benchmark, that involves translating expert-level Python functions to any language. With both BabelCode and the TP3 benchmark, we investigate whether balancing the distributions of 14 languages in a training dataset improves a large language model's performance on low-resource languages. Training a model on a balanced corpus results in, on average, 12.34% higher pass@k across all tasks and languages compared to the baseline. We find that this strategy achieves 66.48% better pass@k on low-resource languages at the cost of only a 12.94% decrease on high-resource languages. In our three translation tasks, this strategy yields, on average, 30.77% better low-resource pass@k.
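The abstract reports results as pass@k but does not say how the metric is computed. As a minimal sketch, assuming the commonly used unbiased estimator (draw n samples per problem, count the c samples that pass every test case, and estimate the chance that at least one of k drawn samples passes), the calculation could look like the following. The function names `pass_at_k` and `mean_pass_at_k` are hypothetical and are not taken from BabelCode.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: number of samples generated for the problem
    c: number of those samples that passed all test cases
    k: sample budget being evaluated (1 <= k <= n)

    Assumption: this is the standard unbiased estimator,
    1 - C(n - c, k) / C(n, k); the abstract does not confirm
    which estimator the authors used.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Fewer than k failing samples: every draw of k contains a pass.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given (n, c) pairs per problem."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For example, `mean_pass_at_k([(10, 3), (10, 0)], k=1)` averages the per-problem estimates for two problems with 10 samples each.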


