MultiCoder: Multi-Programming-Lingual Pre-Training for Low-Resource Code Completion

12/19/2022
by Zi Gong, et al.

Code completion is a valuable research topic in both academia and industry. Recently, large-scale mono-programming-lingual (MonoPL) pre-trained models have been proposed to boost code completion performance. However, code completion for low-resource programming languages (PLs) remains difficult under the data-driven paradigm, even though many developers use such languages. Moreover, few studies have explored the effects of multi-programming-lingual (MultiPL) pre-training on code completion, especially its impact on low-resource PLs. To this end, we propose MultiCoder, which enhances low-resource code completion via MultiPL pre-training and MultiPL Mixture-of-Experts (MoE) layers. We further propose a novel PL-level MoE routing strategy (PL-MoE) to improve code completion on all PLs. Experimental results on CodeXGLUE and MultiCC demonstrate that 1) MultiCoder significantly outperforms the MonoPL baselines on low-resource PLs, and 2) the PL-MoE module further boosts performance on six PLs. In addition, we analyze the effects of the proposed method in detail and explore its effectiveness in a variety of scenarios.
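The PL-level routing idea behind PL-MoE, dispatching each input to an expert keyed on its programming language rather than on learned per-token gating, can be sketched in a few lines. The PyTorch snippet below is a minimal, hypothetical illustration under that assumption; the class name, dimensions, and the one-expert-per-PL layout are ours, not the paper's implementation.

import torch
import torch.nn as nn

class PLMoELayer(nn.Module):
    """Hypothetical sketch: one feed-forward expert per programming language."""

    def __init__(self, d_model: int, d_ff: int, num_pls: int):
        super().__init__()
        # One expert (a standard transformer feed-forward block) per PL.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, d_ff),
                nn.GELU(),
                nn.Linear(d_ff, d_model),
            )
            for _ in range(num_pls)
        )

    def forward(self, hidden: torch.Tensor, pl_ids: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, d_model); pl_ids: (batch,) integer language IDs.
        out = torch.empty_like(hidden)
        for pl in pl_ids.unique():
            mask = pl_ids == pl  # all sequences of this PL share one expert
            out[mask] = self.experts[int(pl)](hidden[mask])
        return out

# Usage: a mixed-language batch (e.g. 0 = Python, 1 = Go).
layer = PLMoELayer(d_model=256, d_ff=1024, num_pls=6)
hidden = torch.randn(4, 128, 256)
pl_ids = torch.tensor([0, 1, 0, 1])
out = layer(hidden, pl_ids)  # shape (4, 128, 256)

Because the routing key is the language ID, every token of a sequence is handled by the same expert, which keeps routing deterministic and needs no load-balancing loss, at the cost of per-token flexibility.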


research
08/19/2023

Knowledge Transfer from High-Resource to Low-Resource Programming Languages for Code LLMs

Over the past few years, Large Language Models of Code (Code LLMs) have ...
research
07/24/2023

The potential of LLMs for coding with low-resource and domain-specific programming languages

This paper presents a study on the feasibility of using large language m...
research
04/08/2022

Hierarchical Softmax for End-to-End Low-resource Multilingual Speech Recognition

Low resource speech recognition has been long-suffering from insufficien...
research
02/03/2023

Measuring The Impact Of Programming Language Distribution

Current benchmarks for evaluating neural code models focus on only a sma...
research
05/31/2023

MetaXLR – Mixed Language Meta Representation Transformation for Low-resource Cross-lingual Learning based on Multi-Armed Bandit

Transfer learning for extremely low resource languages is a challenging ...
research
10/26/2022

Multi-lingual Evaluation of Code Generation Models

We present MBXP, an execution-based code completion benchmark in 10+ pro...
research
08/26/2023

ZC3: Zero-Shot Cross-Language Code Clone Detection

Developers introduce code clones to improve programming productivity. Ma...
