From Regular Expression Matching to Parsing

04/09/2018
by Philip Bille, et al.

Given a regular expression R and a string Q, the regular expression matching problem is to determine whether Q is a member of the language generated by R. The classic textbook algorithm by Thompson [C. ACM 1968] constructs and simulates a non-deterministic finite automaton in O(nm) time and O(m) space, where n and m are the lengths of the string and the regular expression, respectively. Assuming the strong exponential time hypothesis, Backurs and Indyk [FOCS 2016] showed that this result is nearly optimal. However, for most applications determining membership is insufficient: we need to compute how the string matches, i.e., to identify or replace matches or submatches in the string. Using backtracking, we can extend Thompson's algorithm to solve this problem, called regular expression parsing, in the same asymptotic time but with a blowup in space to Ω(nm). Surprisingly, all existing approaches suffer the same or a similar quadratic blowup in space, and no known solution for regular expression parsing significantly narrows this gap between matching and parsing. In this paper, we close this gap and present a new algorithm for regular expression parsing using O(nm) time and O(n + m) space. To achieve our result, we develop a novel divide-and-conquer approach similar in spirit to the classic divide-and-conquer technique by Hirschberg [C. ACM 1975] for computing a longest common subsequence of two strings in quadratic time and linear space. We show how to carefully decompose the problem to handle cyclic interactions in the automaton, leading to a subproblem construction of independent interest. Finally, we generalize our techniques to convert other existing state-set transition algorithms for matching to parsing using only linear space.
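To make the cited inspiration concrete, the following is a minimal Python sketch of the classic Hirschberg divide-and-conquer technique for longest common subsequence. It is not the parsing algorithm of the paper, whose details are not given in the abstract; it only illustrates how a quadratic-time dynamic program can recover a full solution while keeping a single table row in memory. The function names `lcs_last_row` and `hirschberg_lcs` are illustrative choices, not taken from the source.

```python
def lcs_last_row(a, b):
    """Return the last row of the LCS-length table of a against b.

    Only two rows are kept at a time, so the working space is O(len(b)).
    Entry j of the result is the LCS length of a and b[:j].
    """
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev


def hirschberg_lcs(a, b):
    """Return one longest common subsequence of the strings a and b."""
    if not a or not b:
        return ""
    if len(a) == 1:
        # Base case: a single character either occurs in b or it does not.
        return a if a in b else ""

    mid = len(a) // 2
    n = len(b)

    # LCS lengths of the first half of a against every prefix of b.
    left = lcs_last_row(a[:mid], b)
    # LCS lengths of the second half of a against every suffix of b,
    # obtained by running the same routine on the reversed strings.
    right = lcs_last_row(a[mid:][::-1], b[::-1])

    # Pick the split point of b that maximizes the combined LCS length.
    split = max(range(n + 1), key=lambda j: left[j] + right[n - j])

    # Solve the two independent subproblems and concatenate the answers.
    return hirschberg_lcs(a[:mid], b[:split]) + hirschberg_lcs(a[mid:], b[split:])


if __name__ == "__main__":
    print(hirschberg_lcs("AGGTAB", "GXTXAYB"))
```

The key design point is that each recursive step discards the table and keeps only the chosen split, trading a constant factor in time for linear space. The abstract describes applying a similar idea to automaton simulation, where cyclic interactions between states require a more careful decomposition than this string-only sketch shows.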


research
07/10/2019

Sparse Regular Expression Matching

We present the first algorithm for regular expression matching that can ...
research
06/03/2022

A closer look at TDFA

We present an algorithm for regular expression parsing and submatch extr...
research
02/16/2018

Online LZ77 Parsing and Matching Statistics with RLBWTs

Lempel-Ziv 1977 (LZ77) parsing, matching statistics and the Burrows-Whee...
research
12/14/2020

A New Approach to Regular Indeterminate Strings

In this paper we propose a new, more appropriate definition of regular a...
research
08/06/2020

Fine-Grained Complexity of Regular Expression Pattern Matching and Membership

The currently fastest algorithm for regular expression pattern matching ...
research
12/18/2018

CPEG: A Typed Tree Construction from Parsing Expression Grammars with Regex-Like Captures

CPEG is an extended parsing expression grammar with regex-like capture a...
