Unsupervised Translation of Programming Languages

06/05/2020
by Marie-Anne Lachaux, et al.

A transcompiler, also known as a source-to-source translator, is a system that converts source code from one high-level programming language (such as C++ or Python) to another. Transcompilers are primarily used for interoperability, and to port codebases written in an obsolete or deprecated language (e.g. COBOL, Python 2) to a modern one. They typically rely on handcrafted rewrite rules, applied to the abstract syntax tree of the source code. Unfortunately, the resulting translations often lack readability, fail to respect the target language conventions, and require manual modifications in order to work properly. The overall translation process is time-consuming and requires expertise in both the source and target languages, making code-translation projects expensive. Although neural models significantly outperform their rule-based counterparts in the context of natural language translation, their applications to transcompilation have been limited due to the scarcity of parallel data in this domain. In this paper, we propose to leverage recent approaches in unsupervised machine translation to train a fully unsupervised neural transcompiler. We train our model on source code from open source GitHub projects, and show that it can translate functions between C++, Java, and Python with high accuracy. Our method relies exclusively on monolingual source code, requires no expertise in the source or target languages, and can easily be generalized to other programming languages. We also build and release a test set composed of 852 parallel functions, along with unit tests to check the correctness of translations. We show that our model outperforms rule-based commercial baselines by a significant margin.
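
The idea of checking a translation with unit tests, rather than by comparing its text to a reference, can be illustrated with a minimal sketch. This is not the paper's evaluation harness: the helper `passes_unit_tests`, the three example functions, and the test inputs are all hypothetical, and both the reference and the candidate translations are written in Python here for simplicity, whereas the paper's test set pairs functions across C++, Java, and Python.

```python
from typing import Any, Callable, Iterable, Tuple


def passes_unit_tests(
    reference: Callable[..., Any],
    candidate: Callable[..., Any],
    test_inputs: Iterable[Tuple[Any, ...]],
) -> bool:
    """Return True if candidate matches reference on every test input.

    Hypothetical helper: a translation counts as correct only if it
    returns the same output as the reference for all inputs.
    """
    for args in test_inputs:
        expected = reference(*args)
        try:
            actual = candidate(*args)
        except Exception:
            # A translation that crashes on any input is treated as incorrect.
            return False
        if actual != expected:
            return False
    return True


# Hypothetical reference function, standing in for the original source.
def sum_of_squares_ref(n: int) -> int:
    return sum(i * i for i in range(1, n + 1))


# Hypothetical correct translation: different surface form, same behavior.
def sum_of_squares_ok(n: int) -> int:
    total = 0
    for i in range(1, n + 1):
        total += i * i
    return total


# Hypothetical faulty translation: off-by-one error in the loop bound.
def sum_of_squares_bad(n: int) -> int:
    total = 0
    for i in range(1, n):
        total += i * i
    return total


# Assumed test inputs; each tuple holds the arguments for one call.
TEST_INPUTS = [(0,), (1,), (5,), (10,)]

if __name__ == "__main__":
    print(passes_unit_tests(sum_of_squares_ref, sum_of_squares_ok, TEST_INPUTS))
    print(passes_unit_tests(sum_of_squares_ref, sum_of_squares_bad, TEST_INPUTS))
```

A behavior-based check of this kind accepts a translation that looks different from the reference but computes the same result, and rejects one that looks similar but is wrong.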

