Measurement Context Extraction from Text: Discovering Opportunities and Gaps in Earth Science

10/11/2017
by Kyle Hundman, et al.

We propose Marve, a system for extracting measurement values, units, and related words from natural language text. Marve uses conditional random fields (CRF) to identify measurement values and units, followed by a rule-based system to find related entities, descriptors, and modifiers within a sentence. Sentence tokens are represented by an undirected graphical model, and rules are based on part-of-speech and word dependency patterns connecting values and units to contextual words. Marve is unique in its focus on measurement context, and early experiments demonstrate its ability to generate high-precision extractions with strong recall. We also discuss Marve's role in refining measurement requirements for NASA's proposed HyspIRI mission, a hyperspectral infrared imaging satellite that will study the world's ecosystems. In general, our work with HyspIRI demonstrates the value of semantic measurement extractions in characterizing quantitative discussion contained in large corpora of natural language text. These extractions accelerate broad, cross-cutting research and expose scientists to new algorithmic approaches and experimental nuances. They also facilitate identification of scientific opportunities enabled by HyspIRI, leading to more efficient scientific investment and research.
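The two-stage idea in the abstract (first find values and units, then follow part-of-speech and dependency patterns to contextual words) can be illustrated with a minimal Python sketch. It is not Marve's implementation. A regular expression over a small hypothetical unit list stands in for the CRF tagger, the dependency parse is hand-annotated rather than produced by a parser, and the single rule shown (unit as object of a preposition attached to a noun) is an assumed example of such a pattern. The names `UNITS`, `Token`, `extract_value_units`, and `find_context` are all hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical unit list. Marve uses a CRF for this step; a regex is a
# deliberately simple stand-in so the sketch stays self-contained.
UNITS = sorted(["nm", "km", "m", "days"], key=len, reverse=True)
VALUE_UNIT = re.compile(
    r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>"
    + "|".join(re.escape(u) for u in UNITS)
    + r")\b"
)


def extract_value_units(text):
    """Stage 1 (stand-in): return (value, unit) string pairs found in text."""
    return [(m.group("value"), m.group("unit")) for m in VALUE_UNIT.finditer(text)]


@dataclass
class Token:
    text: str
    pos: str   # coarse part-of-speech tag
    dep: str   # dependency label relative to the head
    head: int  # index of the head token (the root points to itself)


def find_context(tokens, unit_idx):
    """Stage 2 (assumed rule): unit is the object of a preposition whose
    head is a noun; that noun is the measured entity, and its adjectival
    modifiers are descriptors."""
    unit = tokens[unit_idx]
    if unit.dep != "pobj":
        return None
    prep = tokens[unit.head]
    if prep.pos != "ADP":
        return None
    entity_idx = prep.head
    entity = tokens[entity_idx]
    if entity.pos != "NOUN":
        return None
    descriptors = [
        t.text for t in tokens if t.head == entity_idx and t.dep == "amod"
    ]
    return {"entity": entity.text, "descriptors": descriptors}


# Illustrative sentence with a hand-annotated parse (not parser output).
SENTENCE = "The sensor has a spatial resolution of 60 m."
TOKENS = [
    Token("The", "DET", "det", 1),
    Token("sensor", "NOUN", "nsubj", 2),
    Token("has", "VERB", "ROOT", 2),
    Token("a", "DET", "det", 5),
    Token("spatial", "ADJ", "amod", 5),
    Token("resolution", "NOUN", "dobj", 2),
    Token("of", "ADP", "prep", 5),
    Token("60", "NUM", "nummod", 8),
    Token("m", "NOUN", "pobj", 6),
]

pairs = extract_value_units(SENTENCE)
context = find_context(TOKENS, unit_idx=8)
```

For the example sentence, `pairs` holds the value "60" with the unit "m", and `context` links that measurement to the entity "resolution" with the descriptor "spatial". A real pipeline would obtain the tags and heads from a dependency parser and would need many more patterns than this one.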


