Type Prediction With Program Decomposition and Fill-in-the-Type Training

05/25/2023
by   Federico Cassano, et al.
0

TypeScript and Python are two programming languages that support optional type annotations, which are useful but tedious to introduce and maintain. This has motivated automated type prediction: given an untyped program, produce a well-typed output program. Large language models (LLMs) are promising for type prediction, but there are challenges: fill-in-the-middle performs poorly, programs may not fit into the context window, generated types may not type check, and it is difficult to measure how well-typed the output program is. We address these challenges by building OpenTau, a search-based approach for type prediction that leverages large language models. We propose a new metric for type prediction quality, give a tree-based program decomposition that searches a space of generated types, and present fill-in-the-type fine-tuning for LLMs. We evaluate our work with a new dataset for TypeScript type prediction, and show that 47.4 overall rate of 3.3 type errors per file. All code, data, and models are available at: https://github.com/GammaTauAI/opentau.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
12/09/2021

Towards Neural Functional Program Evaluation

This paper explores the capabilities of current transformer-based langua...
research
08/16/2021

Program Synthesis with Large Language Models

This paper explores the limits of the current generation of large langua...
research
02/17/2023

PAC Prediction Sets for Large Language Models of Code

Prediction sets have recently been shown to be a promising strategy for ...
research
02/23/2023

Do Machine Learning Models Produce TypeScript Types That Type Check?

Type migration is the process of adding types to untyped code to gain as...
research
09/14/2022

Typesafe Coordinate Systems in High-Throughput Sequencing Applications

High-throughput sequencing file formats and tools encode coordinate inte...
research
05/18/2023

Generalized Planning in PDDL Domains with Pretrained Large Language Models

Recent work has considered whether large language models (LLMs) can func...
research
05/30/2019

MathQA: Towards Interpretable Math Word Problem Solving with Operation-Based Formalisms

We introduce a large-scale dataset of math word problems and an interpre...

Please sign up or login with your details

Forgot password? Click here to reset