Code Execution with Pre-trained Language Models

05/08/2023
by Chenxiao Liu, et al.

Code execution is a fundamental aspect of programming language semantics that reflects the exact behavior of the code. However, most pre-trained models for code intelligence ignore the execution trace and only rely on source code and syntactic structures. In this paper, we investigate how well pre-trained models can understand and perform code execution. We develop a mutation-based data augmentation technique to create a large-scale and realistic Python dataset and task for code execution, which challenges existing models such as Codex. We then present CodeExecutor, a Transformer model that leverages code execution pre-training and curriculum learning to enhance its semantic comprehension. We evaluate CodeExecutor on code execution and show its promising performance and limitations. We also demonstrate its potential benefits for code intelligence tasks such as zero-shot code-to-code search and text-to-code generation. Our analysis provides insights into the learning and generalization abilities of pre-trained models for code execution.
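
The abstract does not specify which mutation operators are used or how an execution trace is encoded. As a rough illustration only, the sketch below assumes a single hypothetical mutation operator (swapping `+` and `-` in arithmetic expressions) and a simple trace format: the line number plus the `repr` of every variable after that line runs, recorded with Python's `sys.settrace`. The names `SwapArithmetic`, `mutate`, and `trace_execution` are hypothetical and are not taken from the CodeExecutor pipeline. To keep the example short, only module-level code is traced, so the bodies of any user-defined functions the program calls are not included. It requires Python 3.9+ because it uses `ast.unparse`.

```python
import ast
import sys


class SwapArithmetic(ast.NodeTransformer):
    """Hypothetical mutation operator: turn every `+` into `-` and every `-` into `+`."""

    SWAP = {ast.Add: ast.Sub, ast.Sub: ast.Add}

    def visit_BinOp(self, node: ast.BinOp) -> ast.BinOp:
        self.generic_visit(node)  # mutate nested sub-expressions first
        new_op = self.SWAP.get(type(node.op))
        if new_op is not None:
            node.op = new_op()
        return node


def mutate(source: str) -> str:
    """Return the source code of a mutated copy of `source` (ast.unparse needs 3.9+)."""
    return ast.unparse(SwapArithmetic().visit(ast.parse(source)))


def trace_execution(source: str) -> list[tuple[int, dict[str, str]]]:
    """Run module-level `source` and record (line number, repr of each variable)
    after each executed line. Only the top-level module frame is traced."""
    code = compile(source, "<program>", "exec")
    trace: list[tuple[int, dict[str, str]]] = []
    pending = None  # line that has started running but whose effects are not recorded yet

    def tracer(frame, event, arg):
        nonlocal pending
        if frame.f_code is not code:
            return None  # do not trace any frame other than the program's module frame
        # The next "line" (or the final "return") event marks the end of the pending line,
        # so the variables visible now are the state *after* that line ran.
        # A missing or zero line number is skipped defensively.
        if event in ("line", "return") and pending:
            state = {k: repr(v) for k, v in frame.f_locals.items() if not k.startswith("__")}
            trace.append((pending, state))
        pending = frame.f_lineno if event == "line" else None
        return tracer

    sys.settrace(tracer)
    try:
        exec(code, {})
    finally:
        sys.settrace(None)
    return trace


if __name__ == "__main__":
    program = "x = 5\ny = x + 3\nfor i in range(2):\n    y = y - i\n"
    mutant = mutate(program)  # computes y = x - 3, then adds i inside the loop
    print(mutant)
    for line_no, state in trace_execution(mutant):
        print(line_no, state)
```

In the mutated program, the recorded state after line 2 shows `y` as `'2'` rather than the `'8'` the original would produce. This is the kind of behavioral difference between near-identical programs that a model must predict from source code alone to solve a code-execution task.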
