Syntax and Domain Aware Model for Unsupervised Program Translation

02/08/2023
by Fang Liu, et al.

Interest in software migration is growing as software and society evolve. Manually migrating projects between languages is error-prone and expensive. In recent years, researchers have begun to explore automatic program translation using supervised deep learning techniques that learn from large-scale parallel code corpora. However, parallel resources are scarce in the programming language domain, and collecting bilingual data manually is costly. To address this issue, several unsupervised program translation systems have been proposed. However, these systems still rely on huge amounts of monolingual source code for training, which is very expensive. Moreover, these models do not perform well when translating languages that were not seen during pre-training. In this paper, we propose SDA-Trans, a syntax and domain-aware model for program translation, which leverages syntax structure and domain knowledge to enhance cross-lingual transfer ability. SDA-Trans adopts unsupervised training on a smaller-scale corpus that includes Python and Java monolingual programs. Experimental results on function translation tasks between Python, Java, and C++ show that SDA-Trans outperforms many large-scale pre-trained models, especially when translating unseen languages.


research
08/26/2021

AVATAR: A Parallel Corpus for Java-Python Program Translation

Program translation refers to migrating source code from one programming...
research
06/05/2020

Unsupervised Translation of Programming Languages

A transcompiler, also known as source-to-source translator, is a system ...
research
02/21/2023

On ML-Based Program Translation: Perils and Promises

With the advent of new and advanced programming languages, it becomes im...
research
12/13/2022

ERNIE-Code: Beyond English-Centric Cross-lingual Pretraining for Programming Languages

Software engineers working with the same programming language (PL) may s...
research
10/11/2021

Using Document Similarity Methods to create Parallel Datasets for Code Translation

Translating source code from one programming language to another is a cr...
research
02/07/2023

J-Parallelio – automatic parallelization framework for Java virtual machine code

Manual translation of the algorithms from sequential version to its para...
research
06/21/2023

A Chain of AI-based Solutions for Resolving FQNs and Fixing Syntax Errors in Partial Code

API documentation, technical blogs and programming Q&A sites contain n...
