HELoC: Hierarchical Contrastive Learning of Source Code Representation

03/27/2022
by Xiao Wang, et al.
Abstract syntax trees (ASTs) play a crucial role in source code representation. However, because an AST typically contains many nodes and has a deep hierarchy, it is challenging to learn its hierarchical structure effectively. In this paper, we propose HELoC, a hierarchical contrastive learning model for source code representation. To learn the AST hierarchy effectively, we use contrastive learning to train the network to predict the level of each AST node and to learn the hierarchical relationships between nodes in a self-supervised manner, so that the representation vectors of nodes with greater differences in AST level lie farther apart in the embedding space. With such vectors, the structural similarity between code snippets can be measured more precisely. For the learning process, we design a novel GNN, the Residual Self-attention Graph Neural Network (RSGNN), which enables HELoC to focus on embedding the local structure of an AST while also capturing its overall structure. HELoC is self-supervised and, after pre-training, can be applied to many downstream source code tasks such as code classification, code clone detection, and code clustering. Our extensive experiments demonstrate that HELoC outperforms state-of-the-art source code representation models.
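
The idea of pushing nodes with larger AST level differences farther apart can be illustrated with a small sketch. This is not the HELoC implementation: it uses Python's standard `ast` module as a stand-in parser, and the helper names `node_levels` and `level_margin_loss`, the `base_margin` parameter, and the choice of a triplet-style hinge loss whose margin scales with the level gap are all assumptions made for illustration.

```python
import ast
import math
from collections import deque


def node_levels(source):
    """Return a dict mapping each AST node to its level (root = 0).

    Levels are assigned by a breadth-first traversal of the tree
    produced by Python's built-in parser (a stand-in for any AST).
    """
    tree = ast.parse(source)
    levels = {}
    queue = deque([(tree, 0)])
    while queue:
        node, depth = queue.popleft()
        levels[node] = depth
        for child in ast.iter_child_nodes(node):
            queue.append((child, depth + 1))
    return levels


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def level_margin_loss(anchor, positive, negative, level_gap, base_margin=1.0):
    """Triplet-style hinge loss with a level-dependent margin (assumed form).

    `positive` is the embedding of a node near the anchor's level and
    `negative` the embedding of a node `level_gap` levels away. The
    required separation grows linearly with `level_gap`, so nodes that
    differ more in level are pushed farther apart.
    """
    margin = base_margin * level_gap
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)
```

In a self-supervised setup along these lines, the levels from `node_levels` would act as free labels: no human annotation is needed because they come directly from the parse tree. The embeddings themselves would come from a learned encoder (RSGNN in the paper), which this sketch does not model.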

research
08/10/2021

SynCoBERT: Syntax-Guided Multi-Modal Contrastive Pre-Training for Code Representation

Code representation learning, which aims to encode the semantics of sour...
research
03/14/2023

Implant Global and Local Hierarchy Information to Sequence based Code Representation Models

Source code representation with deep learning techniques is an important...
research
04/17/2022

Addressing Leakage in Self-Supervised Contextualized Code Retrieval

We address contextualized code retrieval, the search for code snippets h...
research
03/16/2023

All4One: Symbiotic Neighbour Contrastive Learning via Self-Attention and Redundancy Reduction

Nearest neighbour based methods have proved to be one of the most succes...
research
06/05/2023

CONCORD: Clone-aware Contrastive Learning for Source Code

Deep Learning (DL) models to analyze source code have shown immense prom...
research
03/08/2022

Self-supervised Social Relation Representation for Human Group Detection

Human group detection, which splits crowd of people into groups, is an i...
research
06/09/2023

Agent market orders representation through a contrastive learning approach

Due to the access to the labeled orders on the CAC40 data from Euronext,...
