Scalable Semantic Querying of Text

05/03/2018
by Xiaolan Wang, et al.

We present KOKO, a system that takes declarative information extraction to a new level by incorporating advances in natural language processing into its extraction language. KOKO is novel in that its extraction language simultaneously supports conditions on the surface of the text and on the structure of the dependency parse trees of sentences, thereby allowing for more refined extractions. KOKO also supports conditions that tolerate linguistic variation in how concepts are expressed, and it can aggregate evidence from the entire document in order to filter extractions. To scale up, KOKO exploits a multi-indexing scheme and heuristics for efficient extraction. We extensively evaluate KOKO over publicly available text corpora. We show that KOKO's indices take up the least space and are notably faster and more effective than a number of prior indexing schemes. Finally, we demonstrate KOKO's scalability on a corpus of 5 million Wikipedia articles.
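To make the idea of combining surface and structural conditions concrete, here is a minimal Python sketch. It is not KOKO's extraction language or implementation: the `Token` class, the hand-annotated example sentence, its dependency labels, and the helper names `subtree_span` and `extract_objects` are all assumptions made for illustration. The sketch extracts the direct-object phrase of a given verb (a condition on the dependency tree) only when that phrase contains a given word (a condition on the surface text).

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    head: int  # index of the syntactic head; -1 marks the root
    dep: str   # dependency label (hand-assigned for this toy example)


# Hypothetical, hand-annotated parse of "Anna ate a delicious chocolate cake ."
SENTENCE = [
    Token("Anna", 1, "nsubj"),
    Token("ate", -1, "root"),
    Token("a", 5, "det"),
    Token("delicious", 5, "amod"),
    Token("chocolate", 5, "compound"),
    Token("cake", 1, "dobj"),
    Token(".", 1, "punct"),
]


def subtree_span(tokens, i):
    """Return (start, end) indices covering token i and all of its descendants."""
    members = {i}
    changed = True
    while changed:
        changed = False
        for j, tok in enumerate(tokens):
            if tok.head in members and j not in members:
                members.add(j)
                changed = True
    return min(members), max(members)


def extract_objects(tokens, verb, surface_word):
    """Extract direct-object phrases of `verb` whose text contains `surface_word`."""
    results = []
    for i, tok in enumerate(tokens):
        # Structural condition: the token is a direct object attached to the verb.
        if tok.dep == "dobj" and tok.head >= 0 and tokens[tok.head].text == verb:
            start, end = subtree_span(tokens, i)
            phrase = [t.text for t in tokens[start:end + 1]]
            # Surface condition: the phrase must contain the requested word.
            if surface_word in phrase:
                results.append(" ".join(phrase))
    return results


matched = extract_objects(SENTENCE, "ate", "delicious")
unmatched = extract_objects(SENTENCE, "ate", "vanilla")
```

In this sketch, `matched` holds the phrase "a delicious chocolate cake", while `unmatched` is empty because the surface condition fails. A real system would obtain the parse from a dependency parser and, as the abstract notes, would rely on indexing rather than scanning every sentence.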


