Concrete Syntax with Black Box Parsers

02/01/2019
by   Rodin Aarssen, et al.
0

Context: Meta programming consists for a large part of matching, analyzing, and transforming syntax trees. Many meta programming systems process abstract syntax trees, but this requires intimate knowledge of the structure of the data type describing the abstract syntax. As a result, meta programming is error-prone, and meta programs are not resilient to evolution of the structure of such ASTs, requiring invasive, fault-prone change to these programs. Inquiry: Concrete syntax patterns alleviate this problem by allowing the meta programmer to match and create syntax trees using the actual syntax of the object language. Systems supporting concrete syntax patterns, however, require a concrete grammar of the object language in their own formalism. Creating such grammars is a costly and error-prone process, especially for realistic languages such as Java and C++. Approach: In this paper we present Concretely, a technique to extend meta programming systems with pluggable concrete syntax patterns, based on external, black box parsers. We illustrate Concretely in the context of Rascal, an open-source meta programming system and language workbench, and show how to reuse existing parsers for Java, JavaScript, and C++. Furthermore, we propose Tympanic, a DSL to declaratively map external AST structures to Rascal's internal data structures. Tympanic allows implementors of Concretely to solve the impedance mismatch between object-oriented class hierarchies in Java and Rascal's algebraic data types. Both the algebraic data type and AST marshalling code is automatically generated. Knowledge: The conceptual architecture of Concretely and Tympanic supports the reuse of pre-existing, external parsers, and their AST representation in meta programming systems that feature concrete syntax patterns for matching and constructing syntax trees. As such this opens up concrete syntax pattern matching for a host of realistic languages for which writing a grammar from scratch is time consuming and error-prone, but for which industry-strength parsers exist in the wild. Grounding: We evaluate Concretely in terms of source lines of code (SLOC), relative to the size of the AST data type and marshalling code. We show that for real programming languages such as C++ and Java, adding support for concrete syntax patterns takes an effort only in the order of dozens of SLOC. Similarly, we evaluate Tympanic in terms of SLOC, showing an order of magnitude of reduction in SLOC compared to manual implementation of the AST data types and marshalling code. Importance: Meta programming has applications in reverse engineering, reengineering, source code analysis, static analysis, software renovation, domain-specific language engineering, and many others. Processing of syntax trees is central to all of these tasks. Concrete syntax patterns improve the practice of constructing meta programs. The combination of Concretely and Tympanic has the potential to make concrete syntax patterns available with very little effort, thereby improving and promoting the application of meta programming in the general software engineering context.

READ FULL TEXT
POST COMMENT

Comments

There are no comments yet.

Authors

page 7

page 12

page 13

page 23

03/23/2021

PSIMiner: A Tool for Mining Rich Abstract Syntax Trees from Code

The application of machine learning algorithms to source code has grown ...
07/13/2021

Mining Idioms in the Wild

Existing code repositories contain numerous instances of code patterns t...
02/14/2020

Reusing Static Analysis across Different Domain-Specific Languages using Reference Attribute Grammars

Context: Domain-specific languages (DSLs) enable domain experts to speci...
10/23/2020

Adding Interactive Visual Syntax to Textual Code

Many programming problems call for turning geometrical thoughts into cod...
08/24/2020

MyPDDL: Tools for efficiently creating PDDL domains and problems

The Planning Domain Definition Language (PDDL) is the state-of-the-art l...
03/27/2018

Towards Zero-Overhead Disambiguation of Deep Priority Conflicts

**Context** Context-free grammars are widely used for language prototypi...
05/06/2019

A Semi-Automatic Approach for Syntax Error Reporting and Recovery in Parsing Expression Grammars

Error recovery is an essential feature for a parser that should be plugg...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.