Evaluating and Optimizing the Effectiveness of Neural Machine Translation in Supporting Code Retrieval Models: A Study on the CAT Benchmark

08/09/2023
by   Hung Phan, et al.

Neural Machine Translation (NMT) is widely applied in software engineering tasks. Its effectiveness for code retrieval depends on its ability to learn a mapping from a sequence of tokens in the source language to a sequence of tokens in the target language. While NMT performs well in pseudocode-to-code translation, it may struggle to translate natural language queries into source code on newly curated, real-world code documentation/implementation datasets. In this work, we analyze the performance of NMT in natural language-to-code translation on the newly curated CAT benchmark, which includes optimized versions of three Java datasets (TLCodeSum, CodeSearchNet, and Funcom) and one Python dataset (PCSD). Our evaluation shows that NMT has low accuracy on this task, as measured by the CrystalBLEU and METEOR metrics. To relieve NMT of the burden of learning a complex representation of source code, we propose ASTTrans Representation, a tailored representation of an Abstract Syntax Tree (AST) that uses a subset of non-terminal nodes. We show that the classical NMT approach performs significantly better when learning ASTTrans Representation than when learning code tokens, with up to 36 Moreover, we leverage ASTTrans Representation to build combined code search processes on top of the state-of-the-art code search processes that use GraphCodeBERT and UniXcoder. Our NMT models that learn ASTTrans Representation can boost the Mean Reciprocal Rank of these state-of-the-art code search processes by up to 3.08
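To make two ideas from the abstract concrete, the sketch below shows (a) one possible way to reduce a program to a short sequence of non-terminal AST node types and (b) how Mean Reciprocal Rank is computed. It is an illustration only, not the authors' ASTTrans implementation: it uses Python's standard `ast` module, and the function names, the `max_depth` cutoff, and the rule "keep a node only if it has child nodes" are all assumptions introduced here.

```python
import ast
from typing import List


def nonterminal_sequence(source: str, max_depth: int = 3) -> List[str]:
    """Pre-order list of AST node type names, restricted to shallow non-terminals.

    Assumption (not from the paper): a node counts as non-terminal if it has
    at least one child AST node, and only nodes up to max_depth are kept.
    """
    tree = ast.parse(source)
    sequence: List[str] = []

    def visit(node: ast.AST, depth: int) -> None:
        if depth > max_depth:
            return
        children = list(ast.iter_child_nodes(node))
        if children:  # skip leaf nodes such as identifiers and operators
            sequence.append(type(node).__name__)
        for child in children:
            visit(child, depth + 1)

    visit(tree, 0)
    return sequence


def mean_reciprocal_rank(ranks: List[int]) -> float:
    """MRR over queries, given the 1-based rank of the correct candidate per query."""
    return sum(1.0 / rank for rank in ranks) / len(ranks)


if __name__ == "__main__":
    snippet = "def f(x):\n    return x + 1\n"
    print(nonterminal_sequence(snippet, max_depth=2))
    print(mean_reciprocal_rank([1, 2, 4]))
```

A sequence like this is much shorter and less varied than the raw token stream, which is the intuition behind asking an NMT model to predict it instead of code tokens. How the predicted sequence is matched against candidates and combined with GraphCodeBERT or UniXcoder scores is not specified in the abstract and is not modeled here.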
