Multi-Modal Attention Network Learning for Semantic Source Code Retrieval

09/30/2019
by   Yao Wan, et al.
0

Code retrieval techniques and tools have been playing a key role in facilitating software developers to retrieve existing code fragments from available open-source repositories given a user query. Despite the existing efforts in improving the effectiveness of code retrieval, there are still two main issues hindering them from being used to accurately retrieve satisfiable code fragments from large-scale repositories when answering complicated queries. First, the existing approaches only consider shallow features of source code such as method names and code tokens, but ignoring structured features such as abstract syntax trees (ASTs) and control-flow graphs (CFGs) of source code, which contains rich and well-defined semantics of source code. Second, although the deep learning-based approach performs well on the representation of source code, it lacks the explainability, making it hard to interpret the retrieval results and almost impossible to understand which features of source code contribute more to the final results. To tackle the two aforementioned issues, this paper proposes MMAN, a novel Multi-Modal Attention Network for semantic source code retrieval. A comprehensive multi-modal representation is developed for representing unstructured and structured features of source code, with one LSTM for the sequential tokens of code, a Tree-LSTM for the AST of code and a GGNN (Gated Graph Neural Network) for the CFG of code. Furthermore, a multi-modal attention fusion layer is applied to assign weights to different parts of each modality of source code and then integrate them into a single hybrid representation. Comprehensive experiments and analysis on a large-scale real-world dataset show that our proposed model can accurately retrieve code snippets and outperforms the state-of-the-art methods.

READ FULL TEXT
research
03/24/2021

deGraphCS: Embedding Variable-based Flow Graph for Neural Code Search

With the rapid increase in the amount of public code repositories, devel...
research
08/10/2021

SynCoBERT: Syntax-Guided Multi-Modal Contrastive Pre-Training for Code Representation

Code representation learning, which aims to encode the semantics of sour...
research
03/18/2022

M2TS: Multi-Scale Multi-Modal Approach Based on Transformer for Source Code Summarization

Source code summarization aims to generate natural language descriptions...
research
12/02/2020

CRaDLe: Deep Code Retrieval Based on Semantic Dependency Learning

Code retrieval is a common practice for programmers to reuse existing co...
research
07/26/2019

Scalable Source Code Similarity Detection in Large Code Repositories

Source code similarity are increasingly used in application development ...
research
05/26/2018

Splitting source code identifiers using Bidirectional LSTM Recurrent Neural Network

Programmers make rich use of natural language in the source code they wr...
research
06/22/2023

Feature Mixing for Writer Retrieval and Identification on Papyri Fragments

This paper proposes a deep-learning-based approach to writer retrieval a...

Please sign up or login with your details

Forgot password? Click here to reset