CodeTrans: Towards Cracking the Language of Silicon's Code Through Self-Supervised Deep Learning and High Performance Computing

04/06/2021
by Ahmed Elnaggar, et al.

Currently, a growing number of mature natural language processing applications make people's lives more convenient. Such applications are built from source code, the language of software engineering. However, applications that understand source code in order to ease the software engineering process remain under-researched. At the same time, the transformer model, especially in combination with transfer learning, has proven to be a powerful technique for natural language processing tasks. These breakthroughs point to a promising direction for processing source code and cracking software engineering tasks. This paper describes CodeTrans, an encoder-decoder transformer model for the software engineering domain, and explores the effectiveness of encoder-decoder transformer models on six software engineering tasks, including thirteen sub-tasks. We also investigate the effect of different training strategies: single-task learning, transfer learning, multi-task learning, and multi-task learning with fine-tuning. CodeTrans outperforms the state-of-the-art models on all the tasks. To expedite future work in the software engineering domain, we have published the pre-trained CodeTrans models at https://github.com/agemagician/CodeTrans


