Knowledge Transfer for Pseudo-code Generation from Low Resource Programming Language

03/16/2023
by Ankita Sontakke, et al.

Generating pseudo-code descriptions of legacy source code for software maintenance is a labor-intensive manual task. Recent encoder-decoder language models have shown promise for automating pseudo-code generation for high-resource programming languages such as C++, but they rely heavily on the availability of a large code-pseudocode corpus. Soliciting such pseudocode annotations for code written in legacy programming languages (PLs) is time-consuming and costly, and requires a thorough understanding of the source PL. In this paper, we focus on transferring the knowledge acquired by a code-to-pseudocode neural model trained on a high-resource PL (C++) using parallel code-pseudocode data. We aim to transfer this knowledge to a legacy PL (C) for which no PL-pseudocode parallel training data is available. To achieve this, we use an Iterative Back Translation (IBT) approach with a novel test-case-based filtration strategy to adapt the trained C++-to-pseudocode model into a C-to-pseudocode model. We observe an improvement of 23.27% in the success rate of the generated C codes through back translation over successive IBT iterations, illustrating the efficacy of our approach.
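As a rough illustration of the idea in the abstract, one round of back translation with test-case filtering might be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `ibt_round`, the callables `to_pseudocode`, `to_c`, and `passes_tests`, and the choice to keep the (original C, generated pseudocode) pair are all hypothetical stand-ins for the trained models and the test harness.

```python
from typing import Callable, List, Tuple

# Hypothetical sketch: every name here is an illustrative assumption,
# not an API or procedure taken from the paper.
Program = Tuple[str, str]  # (program id, C source code)


def ibt_round(
    c_programs: List[Program],
    to_pseudocode: Callable[[str], str],       # stand-in for the code-to-pseudocode model
    to_c: Callable[[str], str],                # stand-in for the pseudocode-to-code model
    passes_tests: Callable[[str, str], bool],  # stand-in: runs a program's test cases
) -> Tuple[List[Tuple[str, str]], float]:
    """Run one back-translation round and filter the results by test cases.

    Returns the retained (C code, pseudocode) pairs and the fraction of
    regenerated C programs that passed their tests.
    """
    kept: List[Tuple[str, str]] = []
    for prog_id, c_code in c_programs:
        pseudo = to_pseudocode(c_code)   # forward translation: C -> pseudocode
        regenerated = to_c(pseudo)       # back translation: pseudocode -> C
        # Keep the synthetic pair only if the regenerated C passes the tests.
        if passes_tests(prog_id, regenerated):
            kept.append((c_code, pseudo))
    success_rate = len(kept) / len(c_programs) if c_programs else 0.0
    return kept, success_rate
```

In an iterative setup, the retained pairs would presumably be used to fine-tune the models before the next round, and the success rate could be tracked across rounds; how the paper actually performs these steps is not specified in the abstract.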


