Soft-Labeled Contrastive Pre-training for Function-level Code Representation

10/18/2022
by Xiaonan Li, et al.

Code contrastive pre-training has recently achieved significant progress on code-related tasks. In this paper, we present SCodeR, a Soft-labeled contrastive pre-training framework with two positive sample construction methods to learn function-level Code Representation. Considering the relevance among code snippets in a large-scale corpus, soft-labeled contrastive pre-training obtains fine-grained soft labels in an iterative adversarial manner and uses them to learn better code representations. Positive sample construction is another key component of contrastive pre-training. Previous works use transformation-based methods such as variable renaming to generate semantically equivalent positive samples. However, the generated code usually has a highly similar surface form to the original, which misleads the model into focusing on superficial code structure rather than code semantics. To encourage SCodeR to capture semantic information, we build positive samples from code comments and abstract syntax sub-trees of the code. We conduct experiments on four code-related tasks over seven datasets. Extensive results show that SCodeR achieves new state-of-the-art performance on all of them, demonstrating the effectiveness of the proposed pre-training method.
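The soft-labeled objective can be read as a cross-entropy between a relevance distribution over in-batch candidates and the model's similarity distribution, in place of the hard one-hot targets of standard InfoNCE. Below is a minimal sketch of that idea, assuming PyTorch; the function name soft_labeled_contrastive_loss, the temperature value, and the way the soft targets are produced here are illustrative assumptions, not the authors' exact formulation (in SCodeR the soft labels come from an iterative adversarial procedure rather than the random example targets used below).

# Minimal sketch of a soft-labeled contrastive loss (assumes PyTorch).
# anchor_emb and positive_emb are L2-normalized function-level code
# embeddings of shape (batch, dim); soft_labels is a (batch, batch)
# row-stochastic matrix giving the relevance of every in-batch candidate
# to each anchor (a one-hot matrix recovers standard InfoNCE).
import torch
import torch.nn.functional as F

def soft_labeled_contrastive_loss(anchor_emb, positive_emb, soft_labels, temperature=0.05):
    logits = anchor_emb @ positive_emb.t() / temperature   # (batch, batch) similarity scores
    log_probs = F.log_softmax(logits, dim=-1)               # model's distribution over candidates
    return -(soft_labels * log_probs).sum(dim=-1).mean()    # cross-entropy against soft targets

# Example: 4 anchors, hard diagonal targets softened toward related in-batch samples.
emb_a = F.normalize(torch.randn(4, 768), dim=-1)
emb_p = F.normalize(torch.randn(4, 768), dim=-1)
targets = F.softmax(torch.eye(4) * 5.0 + torch.rand(4, 4), dim=-1)
loss = soft_labeled_contrastive_loss(emb_a, emb_p, targets)

With one-hot targets this reduces to the usual InfoNCE loss; soft targets simply spread probability mass over related code snippets in the batch instead of treating them all as hard negatives.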


Related research

01/26/2022 - CodeRetriever: Unimodal and Bimodal Contrastive Learning
In this paper, we propose the CodeRetriever model, which combines the un...

09/14/2022 - SPACE-2: Tree-Structured Semi-Supervised Contrastive Pre-training for Task-Oriented Dialog Understanding
Pre-training methods with contrastive learning objectives have shown rem...

05/04/2022 - CODE-MVP: Learning to Represent Source Code from Multiple Views with Contrastive Pre-Training
Recent years have witnessed increasing interest in code representation l...

01/18/2023 - Towards a Holistic Understanding of Mathematical Questions with Contrastive Pre-training
Understanding mathematical questions effectively is a crucial task, whic...

06/05/2023 - Unsupervised Dense Retrieval with Relevance-Aware Contrastive Pre-Training
Dense retrievers have achieved impressive performance, but their demand ...

01/24/2022 - Text and Code Embeddings by Contrastive Pre-Training
Text embeddings are useful features in many applications such as semanti...

02/17/2022 - Augment with Care: Contrastive Learning for the Boolean Satisfiability Problem
Supervised learning can improve the design of state-of-the-art solvers f...
