Exploring Data Augmentation for Code Generation Tasks

02/05/2023
by Pinzhen Chen, et al.

Advances in natural language processing, such as transfer learning from pre-trained language models, have also shaped how models are trained for programming language tasks. Previous research primarily explored code pre-training and extended it through multi-modality and multi-tasking, yet the data for downstream tasks remain modest in size. Focusing on data utilization for downstream tasks, we propose and adapt augmentation methods that yield consistent improvements in code translation and summarization by up to 6.9% and 7.5%, respectively, and show benefits in output code style and numeric consistency. We also discuss test data imperfections.
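The abstract does not spell out the augmentation methods themselves, so as a purely illustrative aside, the sketch below shows one common semantics-preserving augmentation for code data: renaming local identifiers to create additional training variants of the same program. It is a minimal Python sketch, assuming Python 3.9+ for ast.unparse; the names IdentifierRenamer and augment are hypothetical and not taken from the paper.

import ast  # standard-library parser; ast.unparse requires Python 3.9+


class IdentifierRenamer(ast.NodeTransformer):
    """Rename function parameters and locally bound variables to generic placeholders."""

    def __init__(self):
        self.mapping = {}  # original name -> placeholder name

    def _placeholder(self, name):
        if name not in self.mapping:
            self.mapping[name] = f"var_{len(self.mapping)}"
        return self.mapping[name]

    def visit_arg(self, node):
        # Function parameters are safe to rename consistently within the function.
        node.arg = self._placeholder(node.arg)
        return node

    def visit_Name(self, node):
        # Rename a name when it is being bound here (Store context) or was bound earlier;
        # untouched names (builtins, globals) are left alone. The function name itself
        # is a plain string on the FunctionDef node, so call sites remain valid.
        if isinstance(node.ctx, ast.Store) or node.id in self.mapping:
            node.id = self._placeholder(node.id)
        return node


def augment(source: str) -> str:
    """Return a semantically equivalent variant of `source` with renamed identifiers."""
    tree = ast.parse(source)
    tree = IdentifierRenamer().visit(tree)
    return ast.unparse(tree)


if __name__ == "__main__":
    original = (
        "def add(first, second):\n"
        "    total = first + second\n"
        "    return total\n"
    )
    # Prints the renamed variant:
    # def add(var_0, var_1):
    #     var_2 = var_0 + var_1
    #     return var_2
    print(augment(original))

Applied to the small example above, augment rewrites the parameters and local variable while preserving behavior; such a variant can be paired with the original target (a translation or a summary) to enlarge a downstream training set.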
