CodeTF: One-stop Transformer Library for State-of-the-art Code LLM

05/31/2023
by Nghi D. Q. Bui, et al.

Code intelligence plays a key role in transforming modern software engineering. Recently, deep learning-based models, especially Transformer-based large language models (LLMs), have demonstrated remarkable potential in tackling code intelligence tasks by leveraging massive open-source code data and programming language features. However, developing and deploying such models often requires expertise in both machine learning and software engineering, which creates a barrier to model adoption. In this paper, we present CodeTF, an open-source Transformer-based library for state-of-the-art Code LLMs and code intelligence. Following the principles of modular design and an extensible framework, we design CodeTF with a unified interface to enable rapid access and development across different types of models, datasets, and tasks. Our library supports a collection of pretrained Code LLMs and popular code benchmarks, and includes a standardized interface for training and serving Code LLMs efficiently, as well as data features such as language-specific parsers and utility functions for extracting code attributes. We describe the design principles, the architecture, and the key modules and components, and compare CodeTF with related library tools. Finally, we hope CodeTF can bridge the gap between machine learning/generative AI and software engineering, providing a comprehensive open-source solution for developers, researchers, and practitioners.

research
12/20/2022

A Survey on Pretrained Language Models for Neural Code Intelligence

As the complexity of modern software continues to escalate, software eng...
research
05/19/2023

Chemellia: An Ecosystem for Atomistic Scientific Machine Learning

Chemellia is an open-source framework for atomistic machine learning in ...
research
07/10/2022

Open-source software for electrical engineering applications requiring consideration of electrodynamics: elecode

The work presents elecode, open-source software for various electrical e...
research
07/10/2023

COMEX: A Tool for Generating Customized Source Code Representations

Learning effective representations of source code is critical for any Ma...
research
08/20/2021

Fex: Assisted Identification of Domain Features from C Programs

Modern software typically performs more than one functionality. These fu...
research
11/04/2019

Learning based Methods for Code Runtime Complexity Prediction

Predicting the runtime complexity of a programming code is an arduous ta...
research
09/19/2023

A Configurable Library for Generating and Manipulating Maze Datasets

Understanding how machine learning models respond to distributional shif...
