Tuning Models of Code with Compiler-Generated Reinforcement Learning Feedback

05/25/2023
by Abhinav Jain et al.

Large Language Models (LLMs) pre-trained on code have recently emerged as the dominant approach to program synthesis. However, the code that these models produce can violate basic language-level invariants, leading to lower performance in downstream tasks. We address this issue through an approach, called RLCF, that further trains a pre-trained LLM using feedback from a code compiler. RLCF views the LLM as an RL agent that generates code step by step and receives: (i) compiler-derived feedback on whether the code it generates passes a set of correctness checks; and (ii) feedback from a different LLM on whether the generated code is similar to a set of reference programs in the training corpus. Together, these feedback mechanisms help the generated code remain within the target distribution while passing all static correctness checks. RLCF is model- and language-agnostic. We empirically evaluate it on the MBJP and MathQA tasks for Java. Our experiments show that RLCF significantly raises the odds that an LLM-generated program compiles, is executable, and produces the right output on tests, often allowing LLMs to match the performance of 2x-8x larger LLMs.
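As a rough illustration of how the two feedback signals might be combined into a scalar reward for the RL agent, consider the Python sketch below. The function names, the `javac` invocation, and the weighting coefficients `alpha` and `beta` are assumptions made for illustration; the paper's actual reward shaping and training loop are not reproduced here.

```python
# A minimal sketch of an RLCF-style reward (hypothetical names and weights;
# not the paper's exact formulation). A generated Java program earns a
# grounding reward only if it passes the compiler's static checks, plus a
# score from a discriminator LLM estimating how close the sample is to the
# reference programs in the training corpus.

import subprocess
import tempfile
from pathlib import Path


def compiles(java_source: str) -> bool:
    """Return True if javac accepts the program (static correctness checks).

    Assumes the source defines a class named Main (or no public class),
    since javac requires the file name to match the public class name.
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "Main.java"
        src.write_text(java_source)
        result = subprocess.run(["javac", str(src)], capture_output=True, text=True)
        return result.returncode == 0


def rlcf_reward(java_source: str, discriminator_score: float,
                alpha: float = 1.0, beta: float = 1.0) -> float:
    """Combine compiler feedback with a discriminator LLM's similarity score.

    discriminator_score is assumed to lie in [0, 1]: the probability, under
    a separate LLM, that the sample resembles a reference program. alpha and
    beta are illustrative weighting coefficients, not values from the paper.
    """
    compiler_reward = 1.0 if compiles(java_source) else -1.0
    return alpha * compiler_reward + beta * discriminator_score


if __name__ == "__main__":
    sample = "class Main { public static void main(String[] args) {} }"
    print(rlcf_reward(sample, discriminator_score=0.8))
```

In an actual RLCF training loop, a scalar reward along these lines would drive a policy-gradient update over the LLM's step-by-step generation of the program.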


Related research

07/27/2023 - PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback
Large Language Models for Code (Code LLM) are flourishing. New and power...

06/05/2023 - LexGPT 0.1: pre-trained GPT-J models with Pile of Law
This research aims to build generative language models specialized for t...

05/17/2023 - LeTI: Learning to Generate from Textual Interactions
Finetuning pre-trained language models (LMs) enhances the models' capabi...

09/10/2020 - JCoffee: Using Compiler Feedback to Make Partial Code Snippets Compilable
Static program analysis tools are often required to work with only a sma...

09/11/2023 - Large Language Models for Compiler Optimization
We explore the novel application of Large Language Models to code optimi...

07/05/2022 - CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning
Program synthesis or code generation aims to generate a program that sat...

05/16/2023 - Generate Compilers from Hardware Models!
Compiler backends should be automatically generated from hardware design...
