Improving Code Autocompletion with Transfer Learning

05/12/2021
by Wen Zhou et al.

Software language models have achieved promising results predicting code completion usages, and several industry studies have described successful IDE integrations. Recently, accuracy in autocompletion prediction improved 12.8% from training on a real-world dataset collected from programmers' IDE activity. But what if limited examples of IDE autocompletion in the target programming language are available for model training? In this paper, we investigate the efficacy of pretraining autocompletion models on non-IDE, non-autocompletion, and different-language example code sequences. We find that these unsupervised pretrainings improve model accuracy by over 50% on very small fine-tuning datasets and over 10% on 50k labeled examples. We confirm the real-world impact of these pretrainings in an online setting through A/B testing on thousands of IDE autocompletion users, finding that pretraining is responsible for increases of up to 6.63% in autocompletion usage.
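
The recipe the abstract describes is the standard transfer-learning pattern: pretrain a language model with a next-token objective on plentiful unlabeled code, then fine-tune the same weights on the scarcer IDE autocompletion data. The sketch below is a minimal illustration of that pattern only; the tiny GRU model, synthetic token data, and all hyperparameters are illustrative stand-ins, not the paper's actual transformer models or datasets.

```python
# Sketch: pretrain a small causal language model on unlabeled code
# token sequences, then fine-tune it on autocompletion sequences.
# Everything here (model size, data, learning rates) is a toy stand-in.
import torch
import torch.nn as nn

VOCAB, DIM = 1000, 64  # toy vocabulary and hidden size

class TinyCodeLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        self.rnn = nn.GRU(DIM, DIM, batch_first=True)
        self.head = nn.Linear(DIM, VOCAB)

    def forward(self, tokens):               # tokens: (batch, seq)
        h, _ = self.rnn(self.embed(tokens))
        return self.head(h)                  # logits: (batch, seq, vocab)

def lm_step(model, opt, batch):
    """One next-token-prediction step; the same objective serves
    both the pretraining and the fine-tuning phase."""
    logits = model(batch[:, :-1])
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, VOCAB), batch[:, 1:].reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

model = TinyCodeLM()

# Phase 1: unsupervised pretraining on plentiful non-IDE code
# (e.g., committed source files); random token ids stand in here.
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(100):
    lm_step(model, opt, torch.randint(0, VOCAB, (32, 16)))

# Phase 2: fine-tuning the same weights on scarce IDE autocompletion
# sequences, with a smaller learning rate to limit drift from the
# pretrained representation.
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
for _ in range(10):
    lm_step(model, opt, torch.randint(0, VOCAB, (8, 16)))
```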
