Neuro-symbolic Zero-Shot Code Cloning with Cross-Language Intermediate Representation

04/26/2023
by Krishnam Hasija, et al.

In this paper, we define a neuro-symbolic approach to the task of finding semantically similar clones of programs written in the legacy programming language COBOL, without any COBOL training data. We define a meta-model that is instantiated to give an Intermediate Representation (IR), in the form of Abstract Syntax Trees (ASTs), that is common to programs in C and COBOL. We linearize the IRs using Structure Based Traversal (SBT) to create sequential inputs. We then fine-tune UniXcoder, the best-performing model for zero-shot cross-programming-language code search, on the Code Cloning task using the SBT IRs of C code pairs available in the CodeNet dataset. This allows us to learn latent representations for the IRs of the C programs that are transferable to the IRs of the COBOL programs. With this fine-tuned UniXcoder, we obtain a performance improvement of 12.85 MAP@2 over the pre-trained UniXcoder model, in a zero-shot setting, on the COBOL test split synthesized from the CodeNet dataset. This demonstrates the efficacy of our meta-model based approach in facilitating cross-programming-language transfer.
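To make the linearization step concrete, the following is a minimal sketch of Structure Based Traversal over a toy tree. The abstract does not specify the node types of the meta-model, so the `Node` class and the labels (`Assign`, `Var`, `Add`, `Const`) are hypothetical stand-ins, and the bracketed form follows the commonly used SBT scheme in which each subtree is wrapped as `( label ... ) label`. It is an illustration of the idea, not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical IR node: a label plus ordered children (an assumption, not the paper's meta-model)."""
    label: str
    children: List["Node"] = field(default_factory=list)


def sbt(node: Node) -> List[str]:
    """Linearize a tree with Structure Based Traversal.

    Each subtree is emitted as: "(" label <children...> ")" label,
    so the nesting of the tree can be recovered from the flat sequence.
    """
    tokens = ["(", node.label]
    for child in node.children:
        tokens.extend(sbt(child))
    tokens.extend([")", node.label])
    return tokens


# Toy tree for a statement of the shape `x = y + 1` (labels are illustrative only).
tree = Node("Assign", [Node("Var"), Node("Add", [Node("Var"), Node("Const")])])
sequence = " ".join(sbt(tree))
print(sequence)
```

Because the closing bracket repeats the node label, two trees with different shapes produce different sequences, which is what lets a sequence model consume the IR without losing its structure. If the C and COBOL front ends emit the same node labels, equivalent programs in either language yield comparable token sequences.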


