Detecting Opportunities for Differential Maintenance of Extracted Views

07/04/2020
by Besat Kassaie, et al.

Semi-structured and unstructured data management is challenging, but many of the problems encountered are analogous to problems already addressed in the relational context. In the area of information extraction, for example, the shift from engineering ad hoc, application-specific extraction rules towards using expressive languages such as CPSL and AQL creates opportunities to propose solutions that can be applied to a wide range of extraction programs. In this work, we focus on extracted view maintenance, a problem that is well-motivated and thoroughly addressed in the relational setting. In particular, we formalize and address the problem of keeping extracted relations consistent with source documents that can be arbitrarily updated. We formally characterize three classes of document updates, namely those that are irrelevant, autonomously computable, and pseudo-irrelevant with respect to a given extractor. Finally, we propose algorithms to detect pseudo-irrelevant document updates with respect to extractors that are expressed as document spanners, a model of information extraction inspired by SystemT.
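As a point of reference, the simplest way to decide whether an update leaves an extracted relation unchanged is to re-run the extractor on the updated document and compare the results. The sketch below illustrates only that naive baseline, which is the full recomputation that differential maintenance aims to avoid. It is not the paper's algorithm. The extractor is assumed to be a plain regular expression returning character spans, and the names `make_regex_extractor`, `apply_update`, and `is_irrelevant_by_recomputation` are hypothetical helpers introduced here for illustration.

```python
import re
from typing import Callable, Set, Tuple

# A span is a half-open [start, end) pair of character offsets.
Span = Tuple[int, int]
Extractor = Callable[[str], Set[Span]]


def make_regex_extractor(pattern: str) -> Extractor:
    """Build a toy extractor (assumption: one regex, one span per match)."""
    compiled = re.compile(pattern)

    def extract(document: str) -> Set[Span]:
        return {match.span() for match in compiled.finditer(document)}

    return extract


def apply_update(document: str, start: int, end: int, replacement: str) -> str:
    """Replace the text in [start, end) with `replacement`."""
    return document[:start] + replacement + document[end:]


def is_irrelevant_by_recomputation(
    extract: Extractor, document: str, updated: str
) -> bool:
    """Naive check: the update is irrelevant for this document if the
    extracted set of spans is identical before and after the update."""
    return extract(document) == extract(updated)


phone = make_regex_extractor(r"\d{3}-\d{4}")
doc = "call 555-1234 now"

# Editing text after the match leaves every extracted span untouched.
after_edit = apply_update(doc, 14, 17, "today")

# Editing text before the match shifts the span, so the relation changes
# even though the matched string itself is the same.
before_edit = apply_update(doc, 0, 4, "telephone")
```

Under these assumptions, the edit after the match is reported as irrelevant, while the edit before the match is not, because the span offsets move even though the matched text is unchanged. Note that this check is per document; a static test that holds for all documents, without re-running the extractor, is the harder problem the abstract describes.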

research
03/30/2022

DBSP: Automatic Incremental View Maintenance for Rich Query Languages

Incremental view maintenance has been for a long time a central problem ...
research
06/20/2022

Business Document Information Extraction: Towards Practical Benchmarks

Information extraction from semi-structured documents is crucial for fri...
research
11/27/2018

A Concept-Centered Hypertext Approach to Case-Based Retrieval

The goal of case-based retrieval is to assist physicians in the clinical...
research
01/14/2018

DCDistance: A Supervised Text Document Feature extraction based on class labels

Text Mining is a field that aims at extracting information from textual ...
research
12/11/2018

Deep Reader: Information extraction from Document images via relation extraction and Natural Language

Recent advancements in the area of Computer Vision with state-of-art Neu...
research
07/04/2017

Document Spanners for Extracting Incomplete Information: Expressiveness and Complexity

Rule-based information extraction has lately received a fair amount of a...
research
09/30/2013

Semi-structured data extraction and modelling: the WIA Project

Over the last decades, the amount of data of all kinds available electro...