CODE-MVP: Learning to Represent Source Code from Multiple Views with Contrastive Pre-Training

05/04/2022
by Xin Wang, et al.

Recent years have witnessed increasing interest in code representation learning, which aims to encode the semantics of source code as distributed vectors. Various works have proposed representing the complex semantics of source code from different views, including plain text, the Abstract Syntax Tree (AST), and several kinds of code graphs (e.g., Control/Data Flow Graphs). However, most of them consider only a single view of source code in isolation, ignoring the correspondences among different views. In this paper, we propose to integrate different views, together with the natural-language description of the source code, into a unified framework with Multi-View contrastive Pre-training, and name our model CODE-MVP. Specifically, we first extract multiple code views using compiler tools, and then learn the complementary information among them under a contrastive learning framework. Inspired by type checking in compilation, we also design a fine-grained type inference objective for pre-training. Experiments on three downstream tasks over five datasets demonstrate the superiority of CODE-MVP over several state-of-the-art baselines. For example, we achieve gains of 2.4/2.3/1.1 in MRR/MAP/Accuracy on the natural language code retrieval, code similarity, and code defect detection tasks, respectively.
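
The multi-view contrastive idea in the abstract can be illustrated with a small, self-contained sketch. It is not the CODE-MVP implementation: the abstract does not specify the encoder or the exact loss, so everything here is an assumption made for illustration. Python's standard `tokenize` and `ast` modules stand in for the compiler tools, `embed` is a hypothetical hashed bag-of-symbols encoder in place of a trained model, and `info_nce` is the commonly used InfoNCE loss, in which the two views of the same program form a positive pair and the views of other programs in the batch act as negatives.

```python
import ast
import io
import math
import tokenize


def text_view(source):
    """Plain-text view: the lexical tokens of the program."""
    keep = (tokenize.NAME, tokenize.OP, tokenize.NUMBER, tokenize.STRING)
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [t.string for t in tokens if t.type in keep]


def ast_view(source):
    """AST view: node type names in breadth-first (ast.walk) order."""
    return [type(node).__name__ for node in ast.walk(ast.parse(source))]


def embed(symbols, dim=16):
    """Hypothetical stand-in encoder (an assumption, not the paper's model):
    a hashed bag of symbols, L2-normalized."""
    vec = [0.0] * dim
    for s in symbols:
        vec[sum(ord(c) for c in s) % dim] += 1.0  # deterministic bucket
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE loss: positives[i] is the positive for anchors[i];
    every other row of `positives` is an in-batch negative."""
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [sum(x * y for x, y in zip(a, p)) / temperature
                  for p in positives]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(anchors)


# Two toy programs; each contributes one (text view, AST view) positive pair.
programs = [
    "def add(a, b):\n    return a + b\n",
    "def first(xs):\n    for x in xs:\n        return x\n",
]
text_embeddings = [embed(text_view(p)) for p in programs]
ast_embeddings = [embed(ast_view(p)) for p in programs]
loss = info_nce(text_embeddings, ast_embeddings)
print(f"contrastive loss between text and AST views: {loss:.4f}")
```

In an actual pre-training setup, the loss would be minimized by updating a learned encoder so that views of the same program move closer together than views of different programs. Further views (for example, control- or data-flow graphs, or the natural-language description) could be added as extra positive pairs in the same way.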


