Annotated Document Spanners

08/30/2019
by Johannes Doleschal, et al.

We introduce annotated document spanners, which are document spanners that can annotate their output tuples with elements from a semiring. Such spanners are useful for modeling soft constraints, which are popular in practical information extraction tools. We introduce a finite automaton model for such spanners, which generalizes vset-automata and weighted automata, and prove that this model is closed under the relational algebra operations (union, projection, and natural join) that have been considered in the work on provenance in databases. Concerning selection, we generalize a characterization of Fagin et al., proving that a string relation R is recognizable if and only if the regular spanners are closed under selection using R. Finally, we consider evaluation and enumeration problems for annotated document spanners and provide a number of tractability and intractability results. For achieving tractability, fundamental properties of the underlying semiring, such as positivity, are crucial.
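
To make the semiring annotations concrete, the following is a minimal sketch of the standard provenance-style semantics for union, projection, and natural join on annotated span relations. It is not the automaton model of the paper; the names `Semiring`, `COUNTING`, `VITERBI`, `tup`, `union`, `project`, and `join` are hypothetical and chosen only for illustration. An annotated relation is assumed to be a dictionary from tuples (variable-to-span assignments) to semiring elements.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    """A semiring (K, add, mul, zero, one); illustrative helper, not from the paper."""
    zero: Any
    one: Any
    add: Callable[[Any, Any], Any]
    mul: Callable[[Any, Any], Any]


# Two common example semirings (assumed here for illustration).
COUNTING = Semiring(0, 1, lambda a, b: a + b, lambda a, b: a * b)
VITERBI = Semiring(0.0, 1.0, max, lambda a, b: a * b)


def tup(**spans):
    """Build a hashable tuple assigning a span (start, end) to each variable."""
    return frozenset(spans.items())


def union(K, r, s):
    """Union: annotations of equal tuples are combined with the semiring addition."""
    out = dict(r)
    for t, a in s.items():
        out[t] = K.add(out.get(t, K.zero), a)
    return out


def project(K, r, keep):
    """Projection: tuples that collapse onto the same image have their annotations added."""
    out = {}
    for t, a in r.items():
        image = frozenset((v, span) for v, span in t if v in keep)
        out[image] = K.add(out.get(image, K.zero), a)
    return out


def join(K, r, s):
    """Natural join: compatible tuples are merged and their annotations multiplied."""
    out = {}
    for t1, a1 in r.items():
        d1 = dict(t1)
        for t2, a2 in s.items():
            d2 = dict(t2)
            # Tuples are compatible if they agree on every shared variable.
            if all(d1[v] == d2[v] for v in d1.keys() & d2.keys()):
                merged = frozenset({**d1, **d2}.items())
                out[merged] = K.add(out.get(merged, K.zero), K.mul(a1, a2))
    return out
```

Swapping `COUNTING` for `VITERBI` changes only how annotations are combined, not which tuples are produced, which is the sense in which the semiring is a parameter of the spanner.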

research
08/30/2019

Weight Annotation in Information Extraction

The framework of document spanners abstracts the task of information ext...
research
10/26/2020

A Purely Regular Approach to Non-Regular Core Spanners

The regular spanners (characterised by vset-automata) are closed under t...
research
03/14/2018

Constant delay algorithms for regular document spanners

Regular expressions and automata models with capture variables are core ...
research
12/21/2017

Recursive Programs for Document Spanners

A document spanner models a program for Information Extraction (IE) as a...
research
09/25/2022

Constant-delay enumeration for SLP-compressed documents

We study the problem of enumerating results from a query over a compress...
research
01/14/2019

Complexity Bounds for Relational Algebra over Document Spanners

We investigate the complexity of evaluating queries in Relational Algebr...
research
07/10/2018

Streamable Regular Transductions

Motivated by real-time monitoring and data processing applications, we d...
