PyMT5: multi-mode translation of natural language and Python code with transformers

10/07/2020
by Colin B. Clement, et al.

Simultaneously modeling source code and natural language has many exciting applications in automated software development and understanding. Pursuant to achieving such technology, we introduce PyMT5, the Python method text-to-text transfer transformer, which is trained to translate between all pairs of Python method feature combinations: a single model that can both predict whole methods from natural language documentation strings (docstrings) and summarize code into docstrings of any common style. We present an analysis and modeling effort of a large-scale parallel corpus of 26 million Python methods and 7.7 million method-docstring pairs, demonstrating that for docstring and method generation, PyMT5 outperforms similarly-sized auto-regressive language models (GPT2) which were English pre-trained or randomly initialized. On the CodeSearchNet test set, our best model predicts 92.1% syntactically correct method bodies, achieves a BLEU score of 8.59 for method generation and 16.3 for docstring generation (summarization), and achieves a ROUGE-L F-score of 24.8 for method generation and 36.7 for docstring generation.
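To make the notion of "method features" and "syntactically correct method bodies" concrete, here is a minimal sketch using only Python's standard `ast` module (Python 3.9+ for `ast.unparse`). It is not the authors' preprocessing pipeline: the helper names `split_method` and `is_syntactically_valid` are hypothetical, and the choice of signature, docstring, and body as the three features is an assumption made for illustration.

```python
import ast


def split_method(source: str) -> dict:
    """Split the first function found in `source` into three text features.

    Hypothetical helper for illustration; returns the signature,
    the docstring (or None), and the body without the docstring.
    """
    tree = ast.parse(source)
    func = next(
        node
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    )
    docstring = ast.get_docstring(func)
    # When a docstring exists it is the first statement of the body; skip it.
    body_nodes = func.body[1:] if docstring is not None else func.body
    return {
        "signature": f"def {func.name}({ast.unparse(func.args)}):",
        "docstring": docstring,
        "body": "\n".join(ast.unparse(node) for node in body_nodes),
    }


def is_syntactically_valid(source: str) -> bool:
    """Return True if `source` parses as Python, one simple way to
    check whether a generated method is syntactically correct."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


example = 'def add(a, b=1):\n    """Add two numbers."""\n    return a + b\n'
parts = split_method(example)
print(parts["signature"])
print(parts["docstring"])
print(parts["body"])
```

Under this framing, docstring generation corresponds to mapping the signature and body to the docstring, method generation corresponds to mapping the docstring to the signature and body, and a validity check like the one above could be applied to each generated method.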


research
07/07/2017

A parallel corpus of Python functions and documentation strings for automated code documentation and code generation

Automated documentation of programming source code and automated code ge...
research
01/18/2021

Teach me how to Label: Labeling Functions from Natural Language with Text-to-text Transformers

Annotated data has become the most important bottleneck in training accu...
research
02/21/2021

Automatic Code Generation using Pre-Trained Language Models

Recent advancements in natural language processing have le...
research
01/19/2022

GAP-Gen: Guided Automatic Python Code Generation

Automatic code generation from natural language descriptions can be high...
research
05/29/2021

CoDesc: A Large Code-Description Parallel Dataset

Translation between natural language and source code can help software d...
research
05/23/2018

Learning to Mine Aligned Code and Natural Language Pairs from Stack Overflow

For tasks like code synthesis from natural language, code retrieval, and...
research
07/17/2023

A Lightweight Framework for High-Quality Code Generation

In recent years, the use of automated source code generation utilizing t...
