An Exploratory Study on Code Attention in BERT

04/05/2022
by Rishab Sharma, et al.

Many recent approaches in software engineering introduce deep neural models based on the Transformer architecture or use Transformer-based Pre-trained Language Models (PLMs) trained on code. Although these models achieve state-of-the-art results on many downstream tasks, such as code summarization and bug detection, they build on Transformers and PLMs that have mainly been studied in the Natural Language Processing (NLP) field. Current studies apply the reasoning and practices from NLP to these models of code, despite the differences between natural languages and programming languages, and there is only limited literature explaining how code is modeled. Here, we investigate the attention behavior of PLMs on code and compare it with natural language. We pre-trained BERT, a Transformer-based PLM, on code and explored what kind of information it learns, both semantic and syntactic. We ran several experiments to analyze the attention values of code constructs on each other and what BERT learns in each layer. Our analyses show that BERT pays more attention to syntactic entities, specifically identifiers and separators, in contrast to the [CLS] token, which is the most attended token in NLP. This observation motivated us to leverage identifiers to represent the code sequence instead of the [CLS] token when used for code clone detection. Our results show that employing embeddings from identifiers increases the performance of BERT by 605%. When identifiers' embeddings are used in CodeBERT, a code-based PLM, the performance improves by 21-24%. These findings can benefit the research community by encouraging code-specific representations instead of the common embeddings used in NLP, and they open new directions for developing smaller models with similar performance.
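To make the proposed representation concrete, the sketch below illustrates the two ideas in the abstract using the Hugging Face transformers API: inspecting how much attention the [CLS]-style token receives compared to identifier tokens, and pooling identifier-token embeddings instead of the [CLS] embedding to represent a code snippet. This is a minimal sketch, not the paper's implementation; the "microsoft/codebert-base" checkpoint, the mean-pooling choice, and the hard-coded identifier positions are illustrative assumptions.

```python
# Minimal sketch (assumptions noted above, not the paper's code):
# (1) per-layer attention received by the [CLS]-style token vs. identifier tokens,
# (2) identifier-based pooling as an alternative to the [CLS] embedding.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModel.from_pretrained("microsoft/codebert-base")

code = "def add(a, b): return a + b"
inputs = tokenizer(code, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs, output_attentions=True)

hidden = outputs.last_hidden_state[0]      # (seq_len, hidden_dim)
attentions = outputs.attentions            # one (1, heads, seq, seq) tensor per layer

# Hypothetical positions of the identifier sub-tokens ("add", "a", "b");
# in practice these would come from a parser (e.g., tree-sitter) aligned
# with the tokenizer's offsets.
identifier_positions = [1, 3, 5]

# (1) Attention received in the first layer, averaged over heads.
layer0 = attentions[0][0].mean(dim=0)      # (seq_len, seq_len)
attn_to_cls = layer0[:, 0].mean()          # token 0 is <s>, RoBERTa's [CLS] analogue
attn_to_identifiers = layer0[:, identifier_positions].mean()

# (2) Sequence representations for a downstream task such as clone detection.
cls_repr = hidden[0]                                        # common NLP practice
identifier_repr = hidden[identifier_positions].mean(dim=0)  # identifier-based alternative
```

For clone detection, the identifier-based vectors of two snippets would then be compared (for example, with cosine similarity or a classifier head) in place of their [CLS] vectors.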

