NaturalCC: A Toolkit to Naturalize the Source Code Corpus

12/06/2020
by   Yao Wan, et al.
0

We present NaturalCC, an efficient and extensible toolkit to bridge the gap between natural language and programming language, and facilitate the research on big code analysis. Using NaturalCC, researchers both from natural language or programming language communities can quickly and easily reproduce the state-of-the-art baselines and implement their approach. NaturalCC is built upon Fairseq and PyTorch, providing (1) an efficient computation with multi-GPU and mixed-precision data processing for fast model training, (2) a modular and extensible framework that makes it easy to reproduce or implement an approach for big code analysis, and (3) a command line interface and a graphical user interface to demonstrate each model's performance. Currently, we have included several state-of-the-art baselines across different tasks (e.g., code completion, code comment generation, and code retrieval) for demonstration. The video of this demo is available at https://www.youtube.com/watch?v=q4W5VSI-u3E t=25s.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
10/16/2021

AugmentedCode: Examining the Effects of Natural Language Resources in Code Retrieval Models

Code retrieval is allowing software engineers to search codes through a ...
research
04/20/2020

Incorporating External Knowledge through Pre-training for Natural Language to Code Generation

Open-domain code generation aims to generate code in a general-purpose p...
research
10/03/2021

DeepSCC: Source Code Classification Based on Fine-Tuned RoBERTa

In software engineering-related tasks (such as programming language tag ...
research
08/28/2023

CodeMark: Imperceptible Watermarking for Code Datasets against Neural Code Completion Models

Code datasets are of immense value for training neural-network-based cod...
research
02/19/2023

SanskritShala: A Neural Sanskrit NLP Toolkit with Web-Based Interface for Pedagogical and Annotation Purposes

We present a neural Sanskrit Natural Language Processing (NLP) toolkit n...
research
08/11/2021

Natural Language-Guided Programming

In today's software world with its cornucopia of reusable software libra...
research
07/13/2022

DocCoder: Generating Code by Retrieving and Reading Docs

Natural-language-to-code models learn to generate a code snippet given a...

Please sign up or login with your details

Forgot password? Click here to reset