Grammars for Document Spanners

03/15/2020
by Liat Peterfreund, et al.

A new grammar-based language for defining information extractors from textual content, based on the document spanners framework of Fagin et al., is proposed. While previously studied languages for document spanners are mainly built upon regex formulas, which are regular expressions extended with variables, this new language is based on context-free grammars. The expressiveness of these grammars is compared with previously studied classes of spanners, and the complexity of their evaluation is discussed. An enumeration algorithm is presented that outputs the results with constant delay after preprocessing that is cubic in the length of the input document.
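As an illustration only, the following Python sketch evaluates a toy grammar-based spanner by brute force; it is not the authors' enumeration algorithm. It assumes, as is common for regex formulas, that a span for a variable x is encoded by inserting variable markers into the document (written here as the hypothetical tokens "<x" and "x>"), and it uses a hypothetical grammar in Chomsky normal form that accepts exactly the marked documents in which x captures a substring a^n b^n with n >= 1. This is a non-regular pattern, so no regex formula defines this spanner. Spans are represented as 0-based half-open index pairs, which is also an assumption of the sketch. Each candidate marker placement is tested with the CYK algorithm.

```python
"""Naive evaluation of a toy grammar-based (context-free) spanner.

Hypothetical encoding: a span for variable x is marked by inserting the
tokens "<x" and "x>" into the document; the grammar below (in Chomsky
normal form) accepts exactly the marked documents in which x captures
a substring a^n b^n with n >= 1.
"""
from typing import Dict, List, Set, Tuple

# Terminal rules A -> t, indexed by the terminal t.
TERMINAL_RULES: Dict[str, Set[str]] = {
    "a": {"Ta", "Any"},
    "b": {"Tb", "Any"},
    "<x": {"Open"},
    "x>": {"Close"},
}

# Binary rules (head, left, right) standing for head -> left right.
BINARY_RULES: List[Tuple[str, str, str]] = [
    ("C", "Ta", "Tb"),      # C -> a b
    ("C", "Ta", "D"),       # C -> a C b, split as a D
    ("D", "C", "Tb"),       # D -> C b
    ("E", "C", "Close"),    # E -> C x>
    ("M", "Open", "E"),     # M -> <x C x>
    ("Any", "Any", "Any"),  # Any derives every nonempty string over {a, b}
    ("S", "Open", "E"),     # S -> M (unit rule inlined)
    ("S", "Any", "M"),      # S -> u M
    ("S", "M", "Any"),      # S -> M v
    ("S", "Any", "N"),      # S -> u M v
    ("N", "M", "Any"),      # N -> M v
]


def cyk_accepts(tokens: List[str], start: str = "S") -> bool:
    """Return True iff the grammar derives the token sequence (CYK, O(n^3))."""
    n = len(tokens)
    if n == 0:
        return False
    # table[i][j] holds the nonterminals that derive tokens[i:j].
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, tok in enumerate(tokens):
        table[i][i + 1] = set(TERMINAL_RULES.get(tok, set()))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                left, right = table[i][k], table[k][j]
                if not left or not right:
                    continue
                for head, b, c in BINARY_RULES:
                    if b in left and c in right:
                        table[i][j].add(head)
    return start in table[0][n]


def extract_spans(document: str) -> List[Tuple[int, int]]:
    """Try every marker placement; keep each span (i, j) the grammar accepts."""
    n = len(document)
    spans = []
    for i in range(n + 1):
        for j in range(i, n + 1):
            ref_word = (list(document[:i]) + ["<x"] + list(document[i:j])
                        + ["x>"] + list(document[j:]))
            if cyk_accepts(ref_word):
                spans.append((i, j))
    return spans


if __name__ == "__main__":
    print(extract_spans("aabb"))  # [(0, 4), (1, 3)]
```

The cost here is dominated by rerunning CYK for each of the O(n^2) candidate spans, giving O(n^5) time for a document of length n. The algorithm described in the abstract instead preprocesses the document once in cubic time and then enumerates the results with constant delay.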

Related research
03/14/2018

Constant delay algorithms for regular document spanners

Regular expressions and automata models with capture variables are core ...
07/24/2018

Constant-Delay Enumeration for Nondeterministic Document Spanners

We consider the information extraction approach known as document spanne...
01/14/2019

Complexity Bounds for Relational Algebra over Document Spanners

We investigate the complexity of evaluating queries in Relational Algebr...
01/03/2022

Efficient enumeration algorithms for annotated grammars

We introduce annotated grammars, an extension of context-free grammars w...
02/20/2020

The Complexity of Aggregates over Extractions by Regular Expressions

Regular expressions with capture variables, also known as "regex formula...
09/14/2022

On the Intersection of Context-Free and Regular Languages

The Bar-Hillel construction is a classic result in formal language theor...
