
Scalable Semantic Querying of Text

by Xiaolan Wang, et al.

We present the KOKO system, which takes declarative information extraction to a new level by incorporating advances in natural language processing into its extraction language. KOKO is novel in that its extraction language simultaneously supports conditions on the surface text and on the structure of the dependency parse tree of sentences, thereby allowing for more refined extractions. KOKO also supports conditions that are forgiving to linguistic variation in how concepts are expressed, and allows evidence to be aggregated from the entire document in order to filter extractions. To scale up, KOKO exploits a multi-indexing scheme and heuristics for efficient extraction. We extensively evaluate KOKO over publicly available text corpora. We show that KOKO's indices take up less space, and are notably faster and more effective, than a number of prior indexing schemes. Finally, we demonstrate KOKO's scale-up on a corpus of 5 million Wikipedia articles.
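The abstract's central idea is combining conditions on the surface text with conditions on the dependency parse of a sentence. The paper's actual query language is not shown here; the following is a minimal stdlib-only Python sketch of that combination on a hand-built toy parse (the `Token` class, the dependency labels, and the `extract_acquisitions` function are all illustrative assumptions, not KOKO's API).

```python
from dataclasses import dataclass

@dataclass
class Token:
    text: str   # surface form
    dep: str    # dependency relation to the head token
    head: int   # index of head token in the sentence (-1 for root)

# Hand-built parse of "Google acquired DeepMind" (labels are assumed,
# following Universal Dependencies conventions).
sent = [
    Token("Google", "nsubj", 1),
    Token("acquired", "ROOT", -1),
    Token("DeepMind", "dobj", 1),
]

def extract_acquisitions(tokens):
    """Combine a surface condition (the word 'acquired') with a
    dependency condition (nsubj/dobj attached to that verb)."""
    results = []
    for i, t in enumerate(tokens):
        if t.text.lower() == "acquired":  # surface condition
            # dependency conditions: children of the matched verb
            subjects = [x.text for x in tokens
                        if x.head == i and x.dep == "nsubj"]
            objects = [x.text for x in tokens
                       if x.head == i and x.dep == "dobj"]
            for s in subjects:
                for o in objects:
                    results.append((s, o))
    return results

print(extract_acquisitions(sent))  # [('Google', 'DeepMind')]
```

A purely surface-level pattern would also match sentences where "acquired" has a different grammatical role; requiring the `nsubj`/`dobj` attachments is what the abstract means by a "more refined" extraction.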

