Unified Pre-training for Program Understanding and Generation

03/10/2021
by Wasi Uddin Ahmad, et al.

Code summarization and generation enable conversion between programming language (PL) and natural language (NL), while code translation supports the migration of legacy code from one PL to another. This paper introduces PLBART, a sequence-to-sequence model capable of performing a broad spectrum of program and language understanding and generation tasks. PLBART is pre-trained on an extensive collection of Java and Python functions and associated NL text via denoising autoencoding. Experiments on code summarization in the English language, code generation, and code translation in seven programming languages show that PLBART outperforms or rivals state-of-the-art models. Moreover, experiments on discriminative tasks, e.g., program repair, clone detection, and vulnerable code detection, demonstrate PLBART's effectiveness in program understanding. Furthermore, analysis reveals that PLBART learns program syntax, style (e.g., identifier naming conventions), and logical flow (e.g., an if block inside an else block is equivalent to an else if block), all of which are crucial to program semantics, and thus excels even with limited annotations.
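To make "denoising autoencoding" concrete, the following is a minimal sketch of how one training pair could be built: the input is a corrupted token sequence and the target is the original sequence. The abstract does not say which noise functions are used, so this sketch assumes simple token masking. The `<mask>` symbol, the `mask_tokens` helper, and the `mask_prob` value are hypothetical choices for illustration only.

```python
import random

MASK = "<mask>"  # hypothetical mask symbol, not taken from the paper


def mask_tokens(tokens, mask_prob=0.35, seed=0):
    """Build one (noisy_input, target) pair for denoising autoencoding.

    Each token is independently replaced by MASK with probability
    mask_prob (an assumed value). The target is the uncorrupted sequence.
    """
    rng = random.Random(seed)  # seeded so the corruption is reproducible
    noisy = [MASK if rng.random() < mask_prob else tok for tok in tokens]
    return noisy, list(tokens)


if __name__ == "__main__":
    # Whitespace tokenization is a simplification; real models use subwords.
    source = "def add ( a , b ) : return a + b".split()
    noisy, target = mask_tokens(source)
    print("input :", " ".join(noisy))
    print("target:", " ".join(target))
```

A sequence-to-sequence model trained this way reads the corrupted input with its encoder and learns to emit the original sequence with its decoder. The same pairing applies whether the sequence is a code function or its associated NL text.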


