An Exploratory Study of Ad Hoc Parsers in Python

04/19/2023
by   Michael Schröder, et al.

Background: Ad hoc parsers are pieces of code that use common string functions like split, trim, or slice to effectively perform parsing. Whether it is handling command-line arguments, reading configuration files, parsing custom file formats, or any number of other minor string processing tasks, ad hoc parsing is ubiquitous, yet poorly understood.

Objective: This study aims to reveal the common syntactic and semantic characteristics of ad hoc parsing code in real-world Python projects. Our goal is to understand the nature of ad hoc parsers in order to inform future program analysis efforts in this area.

Method: We plan to conduct an exploratory study based on large-scale mining of open-source Python repositories from GitHub. We will use program slicing to identify program fragments related to ad hoc parsing and analyze these parsers and their surrounding contexts across 9 research questions using 25 initial syntactic and semantic metrics. Beyond descriptive statistics, we will attempt to identify common parsing patterns by cluster analysis.
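To make the notion of an ad hoc parser from the Background concrete, the following is a minimal illustrative sketch and is not taken from the study. The function name `parse_setting` and the `key = value  # comment` line format are assumptions chosen for the example. It combines the three kinds of string operations the abstract mentions: slicing, splitting, and trimming (`strip` in Python).

```python
from typing import Tuple


def parse_setting(line: str) -> Tuple[str, int]:
    """Hypothetical ad hoc parser for lines such as 'timeout = 30  # seconds'."""
    # Slice: drop a trailing comment, if one is present.
    comment_start = line.find("#")
    if comment_start != -1:
        line = line[:comment_start]
    # Split: separate key and value at the first '=' only.
    # Raises ValueError if the line contains no '='.
    key, value = line.split("=", 1)
    # Trim: remove surrounding whitespace, then convert the value.
    # int() raises ValueError if the value is not an integer literal.
    return key.strip(), int(value.strip())
```

The input language this function accepts is never written down as a grammar. It is implied by the sequence of string operations and by the points at which they raise exceptions, and that implicit structure is what makes such code hard to analyze.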

research
02/24/2020

A Mechanised Semantics for HOL with Ad-hoc Overloading

Isabelle/HOL augments classical higher-order logic with ad-hoc overloadi...
research
03/28/2023

Dias: Dynamic Rewriting of Pandas Code

In recent years, dataframe libraries, such as pandas have exploded in po...
research
06/03/2019

On Modelling the Avoidability of Patterns as CSP

Solving avoidability problems in the area of string combinatorics often ...
research
02/02/2022

Grammars for Free: Toward Grammar Inference for Ad Hoc Parsers

Ad hoc parsers are everywhere: they appear any time a string is split, l...
research
03/02/2018

Unifacta: Profiling-driven String Pattern Standardization

Data cleaning is critical for effective data analytics on many real-worl...
research
07/10/2021

Assessing Data Efficiency in Task-Oriented Semantic Parsing

Data efficiency, despite being an attractive characteristic, is often ch...
research
05/11/2023

Semantic uncertainty guides the extension of conventions to new referents

A long tradition of studies in psycholinguistics has examined the format...
