What does Transformer learn about source code?

07/18/2022
by   Kechi Zhang, et al.

In source code processing, transformer-based representation models have proven highly effective and achieve state-of-the-art (SOTA) performance on many tasks. Although transformer models process source code as a sequence, evidence suggests that they may also capture structural information (e.g., in the syntax tree, data flow, control flow, etc.). We propose the aggregated attention score, a method for investigating the structural information learned by the transformer. We also put forward the aggregated attention graph, a new way to automatically extract program graphs from pre-trained models. We evaluate our methods from multiple perspectives. Furthermore, based on our empirical findings, we use the automatically extracted graphs to replace manually designed graphs in the Variable Misuse task. Experimental results show that the automatically extracted semantic graphs are meaningful and effective, providing a new perspective for understanding and using the information contained in the model.

research
02/14/2022

What Do They Capture? – A Structural Analysis of Pre-Trained Language Models for Source Code

Recently, many pre-trained language models for source code have been pro...
research
09/07/2022

AutoPruner: Transformer-Based Call Graph Pruning

Constructing a static call graph requires trade-offs between soundness a...
research
12/29/2020

SIT3: Code Summarization with Structure-Induced Transformer

Code summarization (CS) is becoming a promising area in recent natural l...
research
05/11/2022

CV4Code: Sourcecode Understanding via Visual Code Representations

We present CV4Code, a compact and effective computer vision method for s...
research
05/23/2022

AdaptivePaste: Code Adaptation through Learning Semantics-aware Variable Usage Representations

In software development, it is common for programmers to copy-paste code...
research
06/01/2021

On using distributed representations of source code for the detection of C security vulnerabilities

This paper presents an evaluation of the code representation model Code2...
research
02/17/2022

Transformer for Graphs: An Overview from Architecture Perspective

Recently, Transformer model, which has achieved great success in many ar...
