Pika parsing: reformulating packrat parsing as a dynamic programming algorithm solves the left recursion and error recovery problems

05/13/2020
by Luke A. D. Hutchison, et al.

A recursive descent parser is built from a set of mutually recursive functions, where each function directly implements one of the nonterminals of a grammar. A packrat parser uses memoization to reduce the time complexity of recursive descent parsing from exponential to linear in the length of the input. Recursive descent parsers are extremely simple to write, but they suffer from two significant problems: (i) left-recursive grammars cause the parser to get stuck in infinite recursion, and (ii) it can be difficult or impossible to optimally recover the parse state and continue parsing after a syntax error. Both problems are solved by the pika parser, a novel reformulation of packrat parsing as a dynamic programming algorithm, which requires parsing the input in reverse: bottom-up and right to left, rather than top-down and left to right. This reversed parsing order enables pika parsers to handle grammars that use either direct or indirect left recursion to achieve left associativity, which simplifies grammar writing. It also enables optimal recovery from syntax errors, a crucial property for IDEs and compilers. Pika parsing maintains the linear-time performance characteristics of packrat parsing as a function of input length. The pika parser was benchmarked against the widely used Parboiled2 and ANTLR parsing libraries. It performed significantly better than the other parsers on an expression grammar, although for a complex grammar implementing the Java language specification it incurred a large constant performance penalty per input character. Therefore, if performance is important, pika parsing is mostly useful for simple to moderate-sized grammars, or for very large inputs when the alternative parsers do not scale linearly in the length of the input. Several new insights into precedence, associativity, and left recursion are presented.
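To make the contrast between top-down and bottom-up, right-to-left evaluation concrete, here is a small Python sketch. It is not the paper's implementation: it is a toy recognizer written under stated assumptions. The grammar encoding, the rule names (`Digit`, `Num`, `Plus`, `Sum`, `Expr`), and the functions `match`, `top_down`, and `bottom_up` are all hypothetical. `Expr <- Expr '+' Num / Num` is directly left-recursive. The same `match` function is used both ways: `top_down` fetches subclause results by recursing, as naive recursive descent does, while `bottom_up` reads them from a memo table that is filled from the last input position to the first. As a simplification (and not the paper's method), `bottom_up` re-evaluates every rule at each position until no entry changes instead of scheduling triggered parent clauses with a priority queue, and it keeps a new match only if it is longer than the memoized one. It builds no parse tree and does no error recovery.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical toy grammar: rule name -> (kind, argument).
# Terminals are listed first so that iterating in order is bottom-up.
# Expr is directly left-recursive: Expr <- Expr '+' Num / Num
GRAMMAR = {
    "Digit": ("charset", "0123456789"),
    "Num": ("oneormore", "Digit"),
    "Plus": ("lit", "+"),
    "Sum": ("seq", ["Expr", "Plus", "Num"]),
    "Expr": ("first", ["Sum", "Num"]),
}

Lookup = Callable[[str, int], Optional[int]]


def match(name: str, pos: int, text: str, lookup: Lookup) -> Optional[int]:
    """Match rule `name` at `pos`, fetching subclause results via `lookup`.
    Returns the match length, or None if the rule does not match."""
    kind, arg = GRAMMAR[name]
    if kind == "lit":
        return len(arg) if text.startswith(arg, pos) else None
    if kind == "charset":
        return 1 if pos < len(text) and text[pos] in arg else None
    if kind == "seq":
        total = 0
        for sub in arg:
            sub_len = lookup(sub, pos + total)
            if sub_len is None:
                return None
            total += sub_len
        return total
    if kind == "first":  # PEG ordered choice: first matching alternative wins
        for sub in arg:
            sub_len = lookup(sub, pos)
            if sub_len is not None:
                return sub_len
        return None
    if kind == "oneormore":  # X+ treated as X followed by an optional X+
        head = lookup(arg, pos)
        if head is None:
            return None
        return head + (lookup(name, pos + head) or 0)
    raise ValueError(f"unknown rule kind: {kind}")


def top_down(text: str, start: str = "Expr") -> Optional[int]:
    """Naive recursive descent: each subclause lookup is a recursive call."""
    def lookup(sub: str, pos: int) -> Optional[int]:
        return match(sub, pos, text, lookup)
    return lookup(start, 0)


def bottom_up(text: str) -> Dict[Tuple[str, int], int]:
    """Fill a memo table of (rule, position) -> match length, right to left.
    Lookups only read the table, so they never recurse."""
    memo: Dict[Tuple[str, int], int] = {}

    def lookup(sub: str, pos: int) -> Optional[int]:
        return memo.get((sub, pos))

    for pos in range(len(text), -1, -1):  # right to left
        changed = True
        while changed:  # naive fixpoint loop (simplification, see lead-in)
            changed = False
            for name in GRAMMAR:
                new_len = match(name, pos, text, lookup)
                old_len = memo.get((name, pos))
                # Assumption: keep a new match only if it is strictly longer,
                # which lets the left-recursive Expr grow and guarantees termination.
                if new_len is not None and (old_len is None or new_len > old_len):
                    memo[(name, pos)] = new_len
                    changed = True
    return memo


text = "1+22+3"
print(bottom_up(text)[("Expr", 0)])  # 6: Expr spans the whole input

try:
    top_down(text)
except RecursionError:
    print("top-down descent recursed forever on the left-recursive Expr rule")
```

At position 0 the loop first matches `Expr` as just `Num` (`1`), then as `Expr '+' Num` over `1+22`, then over `1+22+3`. Each longer `Sum` is built on the previous `Expr` match at the same position, which is why the grouping that emerges is left-associative. Everything to the right of position 0 is already final when position 0 is processed.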
