ASET: Ad-hoc Structured Exploration of Text Collections [Extended Abstract]

03/09/2022
by Benjamin Hättasch, et al.
In this paper, we propose ASET, a new system that lets users perform structured exploration of text collections in an ad-hoc manner. ASET uses a new two-phase approach: it first extracts a superset of information nuggets from the texts using existing extractors such as named entity recognizers, and then uses embeddings to match these extractions to a structured table definition requested by the user. Our evaluation shows that ASET can thus extract high-quality structured data from real-world text collections without the need to design extraction pipelines upfront.
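
The two-phase idea can be illustrated with a minimal sketch. Everything in it is a hypothetical stand-in rather than ASET's actual implementation: the regex-based extractors replace real extractors such as named entity recognizers, the bag-of-words vectors replace learned embeddings, and the names `extract_nuggets`, `embed`, `cosine`, and `fill_row` are invented for this example.

```python
import math
import re
from collections import Counter


def extract_nuggets(text):
    """Phase 1 (toy): collect a superset of (label, value) candidates.

    Assumption: two regexes stand in for real extractors.
    """
    nuggets = []
    for m in re.finditer(r"\b\d{4}\b", text):
        nuggets.append(("date year", m.group()))
    for m in re.finditer(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b", text):
        nuggets.append(("person name", m.group()))
    return nuggets


def embed(phrase):
    """Toy 'embedding': a bag-of-words count vector (assumption)."""
    return Counter(phrase.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def fill_row(text, columns):
    """Phase 2 (toy): match extractions to the requested columns.

    For each column, pick the nugget whose label is most similar to the
    column name; leave the cell empty (None) if nothing is similar.
    """
    nuggets = extract_nuggets(text)
    row = {}
    for column in columns:
        col_vec = embed(column)
        best_value, best_score = None, 0.0
        for label, value in nuggets:
            score = cosine(col_vec, embed(label))
            if score > best_score:
                best_value, best_score = value, score
        row[column] = best_value
    return row


if __name__ == "__main__":
    print(fill_row("Ada Lovelace was born in 1815.", ["name", "birth year"]))
```

The point of the sketch is the separation of concerns: extraction runs once without knowing the target table, and the table definition is only needed at matching time, which is what makes the exploration ad hoc.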

research
02/16/2022

A Survey of Ad Hoc Teamwork: Definitions, Methods, and Open Problems

Ad hoc teamwork is the well-established research problem of designing ag...
research
10/04/2021

Structured abbreviation expansion in context

Ad hoc abbreviations are commonly found in informal communication channe...
research
01/24/2022

HC4: A New Suite of Test Collections for Ad Hoc CLIR

HC4 is a new suite of test collections for ad hoc Cross-Language Informa...
research
05/01/2010

Joint Structured Models for Extraction from Overlapping Sources

We consider the problem of jointly training structured models for extrac...
research
09/06/2021

Text-to-Table: A New Way of Information Extraction

We study a new problem setting of information extraction (IE), referred ...
research
09/28/2021

PSI: Constructing ad-hoc Simplices to Interpolate High-Dimensional Unstructured Data

Interpolating unstructured data using barycentric coordinates becomes in...
research
03/02/2018

Unifacta: Profiling-driven String Pattern Standardization

Data cleaning is critical for effective data analytics on many real-worl...
