MetaTPTrans: A Meta Learning Approach for Multilingual Code Representation Learning

06/13/2022
by Weiguo Pian, et al.

Representation learning of source code is essential for applying machine learning to software engineering tasks. Learning code representations across different programming languages has been shown to be more effective than learning from single-language datasets, since the additional training data in multi-language datasets improves the model's ability to extract language-agnostic information from source code. However, when trained on multi-language datasets, existing multi-language models focus only on learning parameters shared across languages and overlook language-specific information, which is crucial for downstream tasks. To address this problem, we propose MetaTPTrans, a meta learning approach for multilingual code representation learning. MetaTPTrans generates different parameters for the feature extractor according to the programming language of the input source code snippet, enabling the model to learn both language-agnostic and language-specific information. Experimental results show that MetaTPTrans significantly improves the F1 score of state-of-the-art approaches by up to 2.40 percentage points for code summarization, a language-agnostic task, and the Top-1 (Top-5) prediction accuracy by up to 7.32 (13.15) percentage points for code completion, a language-specific task.
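
The language-conditioned parameter generation described above can be illustrated with a small hypernetwork-style sketch in PyTorch. This is a minimal sketch under assumptions: the module name, dimensions, and the choice of generating a per-language linear projection over shared encoder states are illustrative only and are not taken from the paper, whose feature extractor is Transformer-based and may generate parameters for other components.

```python
import torch
import torch.nn as nn

class LanguageConditionedProjection(nn.Module):
    """Illustrative sketch: a meta network maps a language id to the weights
    of a linear projection inside the feature extractor, so each programming
    language gets language-specific parameters while the rest of the encoder
    stays shared across languages. Names and shapes are assumptions."""

    def __init__(self, num_languages: int, lang_dim: int, d_model: int):
        super().__init__()
        self.lang_embedding = nn.Embedding(num_languages, lang_dim)
        # Meta learner: generates a (d_model x d_model) weight matrix and a
        # d_model bias from the language embedding.
        self.weight_generator = nn.Linear(lang_dim, d_model * d_model)
        self.bias_generator = nn.Linear(lang_dim, d_model)
        self.d_model = d_model

    def forward(self, token_states: torch.Tensor, lang_id: torch.Tensor) -> torch.Tensor:
        # token_states: (batch, seq_len, d_model); lang_id: (batch,)
        lang = self.lang_embedding(lang_id)                            # (batch, lang_dim)
        W = self.weight_generator(lang).view(-1, self.d_model, self.d_model)
        b = self.bias_generator(lang).unsqueeze(1)                     # (batch, 1, d_model)
        # Apply the language-specific projection to the shared token features.
        return torch.bmm(token_states, W) + b


# Usage sketch: project shared encoder outputs with language-specific parameters.
proj = LanguageConditionedProjection(num_languages=4, lang_dim=32, d_model=256)
states = torch.randn(8, 120, 256)      # outputs of a shared Transformer encoder
lang_ids = torch.randint(0, 4, (8,))   # e.g. 0=Python, 1=Java, 2=Ruby, 3=Go
out = proj(states, lang_ids)           # (8, 120, 256)
```

The design choice this sketch highlights is that only a small set of parameters is generated per language, so the bulk of the model remains shared and benefits from all languages in the multi-language dataset.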

