A closer look at TDFA

06/03/2022
by Angelo Borsotti, et al.

We present an algorithm for regular expression parsing and submatch extraction based on tagged deterministic finite automata (TDFA). The algorithm works with different disambiguation policies. We give detailed pseudocode for the algorithm, covering important practical optimizations. All transformations from a regular expression to an optimized automaton are illustrated with a step-by-step example. We consider both ahead-of-time and just-in-time determinization and describe variants of the algorithm suited to each setting. We provide benchmarks showing that the algorithm is very fast in practice. Our research is based on two independent implementations: the open-source lexer generator RE2C and an experimental Java library.
