CoditT5: Pretraining for Source Code and Natural Language Editing

08/10/2022
by Jiyang Zhang, et al.

Pretrained language models have been shown to be effective in many software-related generation tasks; however, they are not well-suited for editing tasks as they are not designed to reason about edits. To address this, we propose a novel pretraining objective which explicitly models edits and use it to build CoditT5, a large language model for software-related editing tasks that is pretrained on large amounts of source code and natural language comments. We fine-tune it on various downstream editing tasks, including comment updating, bug fixing, and automated code review. By outperforming standard generation-based models, we demonstrate the generalizability of our approach and its suitability for editing tasks. We also show how a standard generation model and our edit-based model can complement one another through simple reranking strategies, with which we achieve state-of-the-art performance for the three downstream editing tasks.
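
As a rough illustration of what "explicitly modeling edits" can mean, the sketch below derives a token-level edit plan between an old and a new code snippet, rather than treating the new snippet as a sequence to generate from scratch. This is not CoditT5's actual pretraining objective or output format, which the abstract does not specify. The function name `edit_plan`, the whitespace tokenization, and the use of Python's standard `difflib` are assumptions made only for this example.

```python
from difflib import SequenceMatcher


def edit_plan(old_tokens, new_tokens):
    """Return (operation, removed_tokens, added_tokens) triples.

    Hypothetical helper for illustration only: it lists the changes that
    turn old_tokens into new_tokens and skips the unchanged spans.
    """
    plan = []
    matcher = SequenceMatcher(a=old_tokens, b=new_tokens, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged span, not part of the edit
        plan.append((tag, old_tokens[i1:i2], new_tokens[j1:j2]))
    return plan


if __name__ == "__main__":
    # Assumed toy example: a one-token bug fix.
    old = "return a + b ;".split()
    new = "return a - b ;".split()
    print(edit_plan(old, new))
```

An edit-based model would be trained to predict something like this plan (possibly together with the edited sequence), whereas a standard generation model predicts only the full new sequence.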

research
04/08/2022

BioBART: Pretraining and Evaluation of A Biomedical Generative Language Model

Pretrained language models have served as important backbones for natura...
research
08/01/2023

CodeBPE: Investigating Subtokenization Options for Large Language Model Pretraining on Source Code

Recent works have widely adopted large language model pretraining for so...
research
06/08/2020

Copy that! Editing Sequences by Copying Spans

Neural sequence-to-sequence models are finding increasing use in editing...
research
03/14/2023

Casual Source Code Editing

There has been substantial research undertaken on the role of computatio...
research
06/10/2023

Automated Code Editing with Search-Generate-Modify

Code editing is essential in evolving software development. Many automat...
research
05/13/2023

CodeT5+: Open Code Large Language Models for Code Understanding and Generation

Large language models (LLMs) pretrained on vast source code have achieve...
research
06/13/2022

Memory-Based Model Editing at Scale

Even the largest neural networks make errors, and once-correct predictio...
