Summarize and Generate to Back-translate: Unsupervised Translation of Programming Languages

05/23/2022
by   Wasi Uddin Ahmad, et al.
0

Back-translation is widely known for its effectiveness for neural machine translation when little to no parallel data is available. In this approach, a source-to-target model is coupled with a target-to-source model trained in parallel. The target-to-source model generates noisy sources, while the source-to-target model is trained to reconstruct the targets and vice versa. Recent developments of multilingual pre-trained sequence-to-sequence models for programming languages have been very effective for a broad spectrum of downstream software engineering tasks. Hence, it is compelling to train them to build programming language translation systems via back-translation. However, these models cannot be further trained via back-translation since they learn to output sequences in the same language as the inputs during pre-training. As an alternative, we propose performing back-translation via code summarization and generation. In code summarization, a model learns to generate natural language (NL) summaries given code snippets. In code generation, the model learns to do the opposite. Therefore, target-to-source generation in back-translation can be viewed as target-to-NL-to-source generation. We show that our proposed approach performs competitively with state-of-the-art methods.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
03/10/2021

Unified Pre-training for Program Understanding and Generation

Code summarization and generation empower conversion between programming...
research
06/27/2023

Constructing Multilingual Code Search Dataset Using Neural Machine Translation

Code search is a task to find programming codes that semantically match ...
research
10/11/2021

Using Document Similarity Methods to create Parallel Datasets for Code Translation

Translating source code from one programming language to another is a cr...
research
06/15/2021

Code to Comment Translation: A Comparative Study on Model Effectiveness Errors

Automated source code summarization is a popular software engineering re...
research
02/25/2019

Lost in Machine Translation: A Method to Reduce Meaning Loss

A desideratum of high-quality translation systems is that they preserve ...
research
03/23/2021

RPT: Effective and Efficient Retrieval of Program Translations from Big Code

Program translation is a growing demand in software engineering. Manual ...
research
02/15/2018

Teaching Machines to Code: Neural Markup Generation with Visual Attention

We present a deep recurrent neural network model with soft visual attent...

Please sign up or login with your details

Forgot password? Click here to reset