Split-Correctness in Information Extraction

10/08/2018
by Johannes Doleschal, et al.

Programs for extracting structured information from text, namely information extractors, often operate separately on document segments obtained from a generic splitting operation such as sentences, paragraphs, k-grams, HTTP requests, and so on. An automated detection of this behavior of extractors, which we refer to as split-correctness, would allow text analysis systems to devise query plans with parallel evaluation on segments for accelerating the processing of large documents. Other applications include the incremental evaluation on dynamic content, where re-evaluation of information extractors can be restricted to revised segments, and debugging, where developers of information extractors are informed about potential boundary crossing of different semantic components. We propose a new formal framework for split-correctness within the formalism of document spanners. Our preliminary analysis studies the complexity of split-correctness over regular spanners. We also discuss different variants of split-correctness, for instance, in the presence of black-box extractors with so-called split constraints.
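As an informal illustration of the idea, the following minimal Python sketch compares an extractor's output on a whole document with the union of its outputs on the segments produced by a splitter, with segment-local positions shifted back to document positions. All names here (`line_splitter`, `distance_extractor`, `title_extractor`, `evaluate_on_segments`, `agrees_on`) are hypothetical and are not taken from the paper, and the regular expressions merely stand in for regular spanners. The sketch tests a single document only, whereas split-correctness as a decision problem concerns all documents.

```python
import re

def line_splitter(doc):
    """Hypothetical splitter: (start, end) spans of the non-empty lines of doc."""
    return [(m.start(), m.end()) for m in re.finditer(r"[^\n]+", doc)]

def distance_extractor(text):
    """Toy extractor: spans such as '12 km'. The pattern cannot match a newline."""
    return {(m.start(), m.end()) for m in re.finditer(r"\d+ km", text)}

def title_extractor(text):
    """Toy extractor: 'Dr.' followed by a name. \\s may match a newline,
    so a match can cross a line boundary."""
    return {(m.start(), m.end()) for m in re.finditer(r"Dr\.\s+\w+", text)}

def evaluate_on_segments(extractor, splitter, doc):
    """Run the extractor on each segment separately and shift the
    resulting spans back to positions in the full document."""
    result = set()
    for start, end in splitter(doc):
        for i, j in extractor(doc[start:end]):
            result.add((start + i, start + j))
    return result

def agrees_on(extractor, splitter, doc):
    """True if whole-document and per-segment evaluation coincide on doc.
    This is a test on one document, not a proof for all documents."""
    return extractor(doc) == evaluate_on_segments(extractor, splitter, doc)
```

Under these assumptions, `agrees_on(distance_extractor, line_splitter, doc)` holds for a document such as `"Ran 12 km.\nThen 5 km more."`, while `title_extractor` disagrees with its segment-wise evaluation on `"See Dr.\nSmith today"`, because the only match crosses a segment boundary. The second case is the kind of boundary crossing that the debugging application would report to a developer.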


