Large Scale Generation of Labeled Type Data for Python

01/28/2022
by Ibrahim Abdelaziz, et al.

Dynamically typed languages such as Python have recently gained unprecedented popularity. Although these languages remove the need for mandatory type annotations, types still play a critical role in understanding programs and preventing runtime errors. An attractive option is to infer types automatically, obtaining static guarantees without writing annotations by hand. Existing inference techniques rely mostly on static typing tools such as PyType for direct type inference; more recently, neural type inference has been proposed. Neural type inference is data hungry, however, and depends on labeled data collected from static typing tools, which are poor at inferring user-defined types. Furthermore, developer-written type annotations in these languages are quite sparse. In this work, we propose novel techniques for generating high-quality types using 1) information retrieval techniques that extract types from well-documented libraries and 2) usage patterns obtained by analyzing a large repository of programs. Our results show that these techniques are more precise than static tools and address their weaknesses, and that they can be useful for generating a large labeled dataset for type inference by machine learning methods. Our techniques achieve F1 scores of 0.52-0.58, compared to 0.06 for static typing tools, and we use them to generate over 37,000 types for over 700 modules.
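To make the idea of extracting types from library documentation concrete, here is a minimal sketch. It is not the method used in this work, which the abstract does not detail. It assumes NumPy-style docstrings with a "Returns" section, and the function name `return_type_from_docstring` and the sample docstring are hypothetical, chosen only for illustration.

```python
import re
from typing import Optional


def return_type_from_docstring(doc: str) -> Optional[str]:
    """Return the type named in a NumPy-style 'Returns' section, if any.

    Illustrative assumption: the section is a 'Returns' header, a dashed
    underline, and then an entry of the form 'name : type' or just 'type'.
    """
    lines = [line.strip() for line in doc.splitlines()]
    for i in range(len(lines) - 2):
        # A NumPy-style section is a header followed by a dashed underline.
        if lines[i] == "Returns" and re.fullmatch(r"-{3,}", lines[i + 1]):
            entry = lines[i + 2]
            # Split 'name : type'; fall back to the whole entry otherwise.
            _, sep, after = entry.partition(" : ")
            return (after if sep else entry) or None
    return None


# Hypothetical docstring used only to exercise the sketch.
EXAMPLE_DOC = """
Compute the mean of the values.

Returns
-------
result : pandas.DataFrame
    The averaged frame.
"""

print(return_type_from_docstring(EXAMPLE_DOC))
```

A real pipeline would also need to resolve the extracted name to a fully qualified class and handle other documentation styles; this sketch only shows that documentation text can serve as a source of type labels.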

research
06/27/2021

PYInfer: Deep Learning Semantic Type Inference for Python Variables

Python type inference is challenging in practice. Due to its dynamic pro...
research
04/06/2019

Type-Level Computations for Ruby Libraries

Many researchers have explored ways to bring static typing to dynamic la...
research
01/12/2021

Type4Py: Deep Similarity Learning-Based Type Inference for Python

Dynamic languages, such as Python and Javascript, trade static typing fo...
research
04/06/2020

Typilus: Neural Type Hints

Type inference over partial contexts in dynamically typed languages is c...
research
09/13/2020

Advanced Graph-Based Deep Learning for Probabilistic Type Inference

Dynamically typed languages such as JavaScript and Python have emerged a...
research
08/04/2023

TIPICAL – Type Inference for Python In Critical Accuracy Level

Type inference methods based on deep learning are becoming increasingly ...
research
08/28/2020

Effectiveness of Annotation-Based Static Type Inference

Benefits of static type systems are well-known: they offer guarantees th...
