Multilingual Code Co-Evolution Using Large Language Models

by Jiyang Zhang, et al.

Many software projects implement APIs and algorithms in multiple programming languages. Maintaining such projects is tiresome, as developers have to ensure that any change (e.g., a bug fix or a new feature) is propagated, promptly and without errors, to the implementations in the other programming languages. In the world of ever-changing software, using rule-based translation tools (i.e., transpilers) or machine learning models to translate code from one language to another provides limited value, because translating the entire codebase each time is not how developers work. In this paper, we target a novel task: translating code changes from one programming language to another using large language models (LLMs). We design and implement the first LLM, dubbed Codeditor, to tackle this task. Codeditor explicitly models code changes as edit sequences and learns to correlate changes across programming languages. To evaluate Codeditor, we collect a corpus of 6,613 aligned code changes from 8 pairs of open-source software projects that implement similar functionality in two programming languages (Java and C#). Results show that Codeditor outperforms state-of-the-art approaches by a large margin on all commonly used automatic metrics. Our work also reveals that Codeditor is complementary to existing generation-based models, and that combining them yields even better performance.
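To make the idea of modeling a code change as an edit sequence concrete, the following is a minimal sketch in Python. It is not Codeditor's actual representation, which the abstract does not specify; the function names `edit_sequence` and `apply_edits`, the whitespace tokenization, and the tuple layout are all assumptions chosen for illustration. It uses the standard-library `difflib.SequenceMatcher` to derive token-level edits between an old and a new version of a Java statement.

```python
import difflib


def edit_sequence(old_tokens, new_tokens):
    """Hypothetical helper: describe a change as (op, start, end, replacement) edits.

    `start` and `end` index into old_tokens; `replacement` holds the new tokens.
    """
    matcher = difflib.SequenceMatcher(a=old_tokens, b=new_tokens, autojunk=False)
    return [
        (tag, i1, i2, new_tokens[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # unchanged spans carry no edit
    ]


def apply_edits(old_tokens, edits):
    """Hypothetical helper: replay the edits on old_tokens to rebuild the new version."""
    result = list(old_tokens)
    # Apply from right to left so earlier indices stay valid.
    for _tag, start, end, replacement in reversed(edits):
        result[start:end] = replacement
    return result


# An assumed example change: a bug fix that relaxes a comparison in Java.
old_java = "if ( x > 0 ) return x ;".split()
new_java = "if ( x >= 0 ) return x ;".split()

edits = edit_sequence(old_java, new_java)
rebuilt = apply_edits(old_java, edits)
print(edits)
print(" ".join(rebuilt))
```

The point of the sketch is only that an edit sequence is far shorter than the full new version of the code, which is what makes translating the change, rather than the whole file, attractive. How such edits are encoded for a model and aligned with the corresponding C# change is beyond what this example covers.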



