Practical LR Parser Generation

09/17/2022
by Joe Zimmerman, et al.

Parsing is a fundamental building block in modern compilers, and for industrial programming languages it is a surprisingly involved task. There are known approaches to generating parsers automatically, but the prevailing consensus is that automatic parser generation is not practical for real programming languages: LL and LALR parsers are considered far too restrictive in the grammars they support, and LR parsers are often considered too inefficient in practice. As a result, virtually all modern languages use recursive-descent parsers written by hand, a lengthy and error-prone process that dramatically raises the barrier to new programming language development. In this work we demonstrate that, contrary to the prevailing consensus, we can have the best of both worlds: for a very general, practical class of grammars (a strict superset of Knuth's canonical LR), we can generate parsers automatically, and both the resulting parser code and the generation procedure itself are highly efficient. This advance relies on several new ideas, including novel automata optimization procedures; a new grammar transformation ("CPS"); per-symbol attributes; recursive-descent actions; and an extension of canonical LR parsing, which we call XLR, that endows shift/reduce parsers with the power of bounded nondeterministic choice. With these ingredients, we can automatically generate efficient parsers for virtually all programming languages that are intuitively easy to parse. We support this claim experimentally by implementing the new algorithms in a new software tool called langcc and running them on syntax specifications for Golang 1.17.8 and Python 3.9.12. The tool handles both languages automatically, and the generated code, when run on standard codebases, is 1.2x faster than the corresponding hand-written parser for Golang and 4.3x faster than the CPython parser.
