PAC Prediction Sets for Large Language Models of Code

02/17/2023
by Adam Khakhar, et al.

Prediction sets have recently been shown to be a promising strategy for quantifying the uncertainty of deep neural networks in a way that provides theoretical guarantees. However, existing techniques have largely targeted settings where the label space is simple, so that prediction sets can be arbitrary subsets of labels. In structured prediction problems, where the label space is exponentially large, even a prediction set containing a small fraction of all labels can itself be exponentially large. For code generation, we propose a solution that restricts attention to prediction sets that can be represented compactly as partial programs, which are programs with portions replaced by holes. Given a trained code generation model, our algorithm leverages the abstract syntax tree of the programming language to generate a set of programs such that the correct program is in the set with high confidence. Valuable applications of our algorithm include a Codex-style code generator that leaves holes in the uncertain parts of the generated code, yielding a partial program with theoretical guarantees. We evaluate our approach on PICARD (a T5 model for SQL semantic parsing) and Codex (a GPT model for over a dozen programming languages, including Python), demonstrating that it generates compact PAC prediction sets. This is the first research contribution that generates PAC prediction sets for generative code models.
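
To make the notion of a partial program concrete, the following is a minimal Python sketch, not the authors' algorithm. It replaces low-confidence expression subtrees of an abstract syntax tree with a hole, then checks whether a candidate program belongs to the set that the partial program represents. The placeholder name `__HOLE__`, the per-expression confidence scores, the fixed threshold, and the helpers `punch_holes` and `covers` are all assumptions made for illustration. The paper's method would instead choose which subtrees to remove so that the PAC guarantee holds.

```python
import ast

HOLE = "__HOLE__"  # hypothetical placeholder identifier standing for a hole


class HolePuncher(ast.NodeTransformer):
    """Replace expression subtrees whose (assumed) confidence is below a threshold."""

    def __init__(self, source, confidence, threshold):
        self.source = source
        self.confidence = confidence  # maps source text of an expression to a score
        self.threshold = threshold

    def visit(self, node):
        if isinstance(node, ast.expr):
            segment = ast.get_source_segment(self.source, node)
            # Expressions without a score are treated as fully confident.
            if self.confidence.get(segment, 1.0) < self.threshold:
                hole = ast.Name(id=HOLE, ctx=ast.Load())
                return ast.copy_location(hole, node)
        return super().visit(node)


def punch_holes(source, confidence, threshold):
    """Return the source of a partial program with uncertain parts replaced by holes."""
    tree = ast.parse(source)
    tree = HolePuncher(source, confidence, threshold).visit(tree)
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


def covers(partial, full):
    """True if `full` can be obtained from `partial` by filling each hole with a subtree."""
    if isinstance(partial, ast.Name) and partial.id == HOLE:
        return True  # a hole matches any subtree
    if isinstance(partial, ast.AST):
        if type(partial) is not type(full):
            return False
        return all(
            covers(getattr(partial, field, None), getattr(full, field, None))
            for field in partial._fields
        )
    if isinstance(partial, list):
        return (
            isinstance(full, list)
            and len(partial) == len(full)
            and all(covers(p, f) for p, f in zip(partial, full))
        )
    return partial == full  # identifiers, constants, None


# Illustrative input: the scores below are made up, not model outputs.
source = "result = price * quantity + tax_rate(region)"
scores = {"tax_rate(region)": 0.4}
partial_source = punch_holes(source, scores, threshold=0.8)
print(partial_source)
print(covers(ast.parse(partial_source), ast.parse(source)))
```

A single partial program stands for every program obtainable by filling its holes, which is why such a set can be exponentially large and still be written down compactly. The sketch requires Python 3.9 or later for `ast.unparse`.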

