CodeT5+: Open Code Large Language Models for Code Understanding and Generation

05/13/2023
by Yue Wang, et al.

Large language models (LLMs) pretrained on vast amounts of source code have achieved prominent progress in code intelligence. However, existing code LLMs have two main limitations in terms of architecture and pretraining tasks. First, they often adopt a specific architecture (encoder-only or decoder-only) or rely on a unified encoder-decoder network for different downstream tasks. The former paradigm is limited by inflexibility in applications, while in the latter the model is treated as a single system for all tasks, leading to suboptimal performance on a subset of tasks. Second, they often employ a limited set of pretraining objectives which might not be relevant to some downstream tasks and hence result in substantial performance degradation. To address these limitations, we propose “CodeT5+”, a family of encoder-decoder LLMs for code in which component modules can be flexibly combined to suit a wide range of downstream code tasks. Such flexibility is enabled by our proposed mixture of pretraining objectives, which mitigates the pretrain-finetune discrepancy. These objectives cover span denoising, contrastive learning, text-code matching, and causal LM pretraining tasks, on both unimodal and bimodal multilingual code corpora. Furthermore, we propose to initialize CodeT5+ with frozen off-the-shelf LLMs instead of training from scratch to efficiently scale up our models, and we explore instruction-tuning to align the models with natural language instructions. We extensively evaluate CodeT5+ on over 20 code-related benchmarks in different settings, including zero-shot, finetuning, and instruction-tuning. We observe state-of-the-art (SoTA) performance on various code-related tasks, such as code generation and completion, math programming, and text-to-code retrieval. In particular, our instruction-tuned CodeT5+ 16B achieves new SoTA results on the HumanEval code generation task against other open code LLMs.
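
As a concrete usage illustration (not taken from the paper itself), the minimal sketch below shows how a released CodeT5+ checkpoint might be loaded for code generation with the Hugging Face transformers seq2seq API. The checkpoint name "Salesforce/codet5p-220m", the prompt, and the use of AutoModelForSeq2SeqLM are assumptions about the public release; the larger CodeT5+ variants ship custom model code and may require trust_remote_code=True and a different model class.

```python
# Minimal sketch: loading a CodeT5+ checkpoint for code generation.
# Assumes the "Salesforce/codet5p-220m" checkpoint on Hugging Face and the
# standard transformers seq2seq API (an assumption, not the paper's own code);
# larger CodeT5+ variants may need trust_remote_code=True and other classes.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

checkpoint = "Salesforce/codet5p-220m"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# Seq2seq generation: the encoder reads the (partial) code or natural-language
# prompt, and the decoder generates a continuation.
prompt = "def bubble_sort(arr):"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The same encoder-decoder checkpoint can, per the abstract, also be used with only its encoder (e.g., for text-to-code retrieval) or only its decoder (e.g., for completion), which is the modular flexibility the pretraining mixture is designed to support.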

