Document Spanners for Extracting Incomplete Information: Expressiveness and Complexity

07/04/2017
by Francisco Maturana, et al.

Rule-based information extraction has recently received considerable attention from the database community, with several languages appearing in the last few years. Although information extraction systems are intended to deal with semistructured data, all language proposals introduced so far are designed to output relations, which makes them incapable of handling incomplete information. To remedy this, we propose extending information extraction languages with the ability to use mappings, which allows us to work with documents that have missing or optional parts. Using this approach, we simplify the semantics of regex formulas and extraction rules, two previously defined methods for extracting information, extend them with the ability to handle incomplete data, and study how they compare in terms of expressive power. We also study the computational properties of these languages, focusing on the query enumeration problem as well as on satisfiability and containment.
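To make the contrast between relations and mappings concrete, here is a minimal sketch in Python. It is only a loose analogy, not the paper's formalism: it uses the standard `re` module rather than regex formulas, and the pattern, the variable names `name` and `email`, and the sample document are all hypothetical. Each match yields a mapping from variables to spans, and a variable whose optional part is absent is simply left out of the mapping instead of forcing a value into a fixed-width tuple.

```python
import re

# Hypothetical pattern: a capitalized name, optionally followed by
# an email address in angle brackets. The email part is optional.
PATTERN = re.compile(r"(?P<name>[A-Z][a-z]+)(?: <(?P<email>[^>]+)>)?")


def extract_mappings(document):
    """Return one mapping per match, from variable name to (start, end) span.

    A variable whose group did not take part in the match is omitted,
    so each mapping is a partial function over the variables.
    """
    results = []
    for match in PATTERN.finditer(document):
        mapping = {
            var: match.span(var)
            for var in PATTERN.groupindex
            if match.group(var) is not None
        }
        results.append(mapping)
    return results


doc = "Alice <alice@example.org>, Bob"
mappings = extract_mappings(doc)
for mapping in mappings:
    # Show each assigned variable together with the text its span covers.
    print({var: doc[start:end] for var, (start, end) in mapping.items()})
```

In this sketch the first mapping assigns both `name` and `email`, while the second assigns only `name`. A relation-based output would need every tuple to have the same columns, so the second match would either be dropped or padded with a placeholder. Note also that `finditer` returns only leftmost, non-overlapping matches, so it does not reproduce the semantics studied in the paper.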
