CQE: A Comprehensive Quantity Extractor

05/15/2023
by Satya Almasian, et al.

Quantities are essential in documents to describe factual information. They are ubiquitous in application domains such as finance, business, medicine, and science in general. Compared to other information extraction tasks, surprisingly few works describe methods for properly extracting and representing quantities in text. In this paper, we present a comprehensive quantity extraction framework for text data. It efficiently detects combinations of values and units, the behavior of a quantity (e.g., rising or falling), and the concept a quantity is associated with. Our framework makes use of dependency parsing and a dictionary of units, and it normalizes and standardizes the detected quantities. Using a novel dataset for evaluation, we show that our open source framework outperforms other systems and, to the best of our knowledge, is the first to detect concepts associated with identified quantities. The code and data underlying our framework are available at https://github.com/vivkaz/CQE.
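
To make the idea of pairing values with units from a dictionary more concrete, the following is a minimal sketch and not the CQE implementation or its API. It assumes a tiny hand-written unit dictionary and uses a regular expression in place of the dependency parsing the abstract describes; the names `UNIT_DICTIONARY`, `Quantity`, and `extract_quantities` are hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical, deliberately tiny unit dictionary:
# surface form (lowercase) -> normalized unit name.
UNIT_DICTIONARY = {
    "km": "kilometre",
    "kilometers": "kilometre",
    "kg": "kilogram",
    "kilograms": "kilogram",
    "%": "percent",
    "percent": "percent",
    "dollars": "dollar",
}


@dataclass(frozen=True)
class Quantity:
    value: float  # numeric value after removing thousands separators
    unit: str     # normalized unit name taken from the dictionary


# A number (optional thousands separators and decimals), optional
# whitespace, then a candidate unit token made of letters or '%'.
_PATTERN = re.compile(r"(\d+(?:,\d{3})*(?:\.\d+)?)\s*([A-Za-z%]+)")


def extract_quantities(text):
    """Return value/unit pairs whose unit token is in the dictionary."""
    found = []
    for number, token in _PATTERN.findall(text):
        unit = UNIT_DICTIONARY.get(token.lower())
        if unit is None:
            # The token after the number is not a known unit; skip it.
            continue
        found.append(Quantity(float(number.replace(",", "")), unit))
    return found


if __name__ == "__main__":
    print(extract_quantities("Sales rose 5% to 1,200 dollars."))
```

A pattern-based sketch like this only finds a unit that directly follows its number. It does not detect the behavior of a quantity or the concept it refers to, which is where the abstract's dependency parsing comes in.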


research
04/21/2023

KitchenScale: Learning to predict ingredient quantities from recipe contexts

Determining proper quantities for ingredients is an essential part of co...
research
02/15/2023

Automated Reasoning for Physical Quantities, Units, and Measurements in Isabelle/HOL

Formal verification of cyber-physical and robotic systems requires that ...
research
07/08/2022

Lessons from Deep Learning applied to Scholarly Information Extraction: What Works, What Doesn't, and Future Directions

Understanding key insights from full-text scholarly articles is essentia...
research
08/31/2019

Quantity Tagger: A Latent-Variable Sequence Labeling Approach to Solving Addition-Subtraction Word Problems

An arithmetic word problem typically includes a textual description cont...
research
05/05/2015

Mining Measured Information from Text

We present an approach to extract measured information from text (e.g., ...
research
04/28/2023

CED: Catalog Extraction from Documents

Sentence-by-sentence information extraction from long documents is an ex...
research
10/22/2022

A Discipline of Programming with Quantities

In scientific and engineering applications, physical quantities embodied...
