LeTI: Learning to Generate from Textual Interactions

05/17/2023
by   Xingyao Wang, et al.

Fine-tuning pre-trained language models (LMs) enhances their capabilities. Prior techniques fine-tune a pre-trained LM on input-output pairs (e.g., instruction fine-tuning), or with numerical rewards that gauge the quality of its outputs (e.g., reinforcement learning from human feedback). We explore LMs' potential to learn from textual interactions (LeTI) that not only check their correctness with binary labels, but also pinpoint and explain errors in their outputs through textual feedback. Our investigation focuses on the code generation task, where the model produces code pieces in response to natural language instructions. This setting invites a natural and scalable way to acquire textual feedback: the error messages and stack traces from code execution using a Python interpreter. LeTI iteratively fine-tunes the model, using the LM objective, on a concatenation of natural language instructions, LM-generated programs, and textual feedback, which is only provided when the generated program fails to solve the task. Prepended to this fine-tuning text, a binary reward token differentiates correct and buggy solutions. On MBPP, a code generation dataset, LeTI substantially improves the performance of two base LMs of different scales. LeTI requires no ground-truth outputs for training and even outperforms a fine-tuned baseline that does. LeTI's strong performance generalizes to other datasets: trained on MBPP, it achieves comparable or better performance than the base LMs on unseen problems in HumanEval. Furthermore, compared to binary feedback, we observe that textual feedback leads to improved generation quality and sample efficiency, achieving the same performance with fewer than half of the gradient steps. LeTI is equally applicable to natural language tasks that can be formulated as code generation, which we verify empirically on event argument extraction.
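The data-construction step described above can be sketched in a few lines: execute the generated program against a test, capture the stack trace on failure, and concatenate a reward token, the instruction, the program, and (for failures only) the textual feedback. This is a minimal illustration, not the paper's implementation; the reward-token strings and helper names are assumptions.

```python
import traceback

# Hypothetical reward-token strings; the paper only specifies that a
# binary reward token is prepended to the fine-tuning text.
REWARD_TOKENS = {True: "<|good|>", False: "<|bad|>"}


def run_with_feedback(program: str, test: str) -> tuple[bool, str]:
    """Execute the generated program plus its test case.

    Returns (success, feedback), where feedback is the error message
    and stack trace from the Python interpreter when execution fails.
    """
    try:
        exec(program + "\n" + test, {})
        return True, ""
    except Exception:
        return False, traceback.format_exc()


def build_training_text(instruction: str, program: str, test: str) -> str:
    """Concatenate reward token, instruction, program, and (on failure) feedback."""
    ok, feedback = run_with_feedback(program, test)
    parts = [REWARD_TOKENS[ok], instruction, program]
    if not ok:
        # Textual feedback is included only when the program fails.
        parts.append(feedback)
    return "\n".join(parts)
```

For example, a buggy solution to "write add(a, b)" would yield a training text that starts with the failure token and ends with the `AssertionError` traceback, while a correct solution carries only the success token, instruction, and code.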


