M2TS: Multi-Scale Multi-Modal Approach Based on Transformer for Source Code Summarization

03/18/2022
by Yuexiu Gao, et al.

Source code summarization aims to generate natural language descriptions of code snippets. Many existing studies learn the syntactic and semantic knowledge of code snippets from their token sequences and Abstract Syntax Trees (ASTs). They use the learned code representations as input to code summarization models, which then generate summaries describing the source code. Traditional models either traverse ASTs into sequences or split ASTs into paths as input. However, the former loses the structural properties of ASTs, and the latter destroys their overall structure. Comprehensively capturing the structural features of ASTs when learning code representations for source code summarization therefore remains a challenging open problem. In this paper, we propose M2TS, a Multi-scale Multi-modal approach based on the Transformer for source code Summarization. M2TS uses a multi-scale AST feature extraction method that captures the structure of ASTs more completely and accurately at multiple local and global levels. To complement the semantic information missing from ASTs, we also extract code token features and combine them with the AST features using a cross-modality fusion method that not only fuses the syntactic and contextual semantic information of source code but also highlights the key features of each modality. We conduct experiments on two Java datasets and one Python dataset, and the results demonstrate that M2TS outperforms current state-of-the-art methods. We release our code at https://github.com/TranSMS/M2TS.
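To make the cross-modality fusion idea concrete, here is a minimal NumPy sketch of one common way to fuse two feature sequences: each modality attends to the other with scaled dot-product attention, and the attended features are added back to the originals. This is an illustrative sketch under assumed shapes and a hypothetical `cross_modal_fusion` helper, not the paper's actual M2TS architecture, which the abstract does not specify at this level of detail.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_fusion(ast_feats, tok_feats):
    """Illustrative cross-modality fusion (hypothetical helper).

    ast_feats: (n_ast_nodes, d) features extracted from the AST.
    tok_feats: (n_tokens, d)    features extracted from the token sequence.
    Each modality attends over the other, and the attended context is
    added residually, so each output stream carries both syntactic and
    contextual semantic information.
    """
    d = ast_feats.shape[-1]
    # AST queries attend over token keys/values (scaled dot-product).
    ast_ctx = softmax(ast_feats @ tok_feats.T / np.sqrt(d)) @ tok_feats
    # Token queries attend over AST keys/values.
    tok_ctx = softmax(tok_feats @ ast_feats.T / np.sqrt(d)) @ ast_feats
    return ast_feats + ast_ctx, tok_feats + tok_ctx

# Toy usage: 5 AST-node vectors and 7 token vectors, dimension 8.
rng = np.random.default_rng(0)
fused_ast, fused_tok = cross_modal_fusion(
    rng.normal(size=(5, 8)), rng.normal(size=(7, 8))
)
```

The residual form keeps each modality's own features dominant while mixing in attended context from the other stream; a real model would add learned projection matrices and multiple heads.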


