Location Name Extraction from Targeted Text Streams using Gazetteer-based Statistical Language Models

08/10/2017
by Hussein S. Al-Olimat, et al.

Extracting location names from informal and unstructured texts requires identifying referent boundaries and partitioning compound names in the presence of variation in location referents. Instead of analyzing semantic, syntactic, and/or orthographic features, our Location Name Extraction tool (LNEx) exploits a region-specific statistical language model to evaluate whether an observed n-gram in targeted Twitter text is a legitimate location name variant. LNEx handles abbreviations and automatically filters and augments the location names in gazetteers from OpenStreetMap, Geonames, and DBpedia. Consistent with Carroll [4], LNEx addresses two kinds of location name contractions, category ellipsis and location ellipsis, which produce alternate forms of location names (i.e., Nameheads). The modified gazetteers and abbreviation dictionaries help detect the boundaries of multi-word location names, delimiting them in text using n-gram statistics. We evaluated the extent to which an augmented and filtered region-specific gazetteer can successfully extract location names from a targeted text stream. Using 4,500 event-specific tweets from three targeted streams covering different flooding disasters, we compared LNEx against eight state-of-the-art taggers. LNEx improved the average F-Score by 98-145%, convincingly outperforming these taggers on the three manually annotated Twitter streams. Furthermore, LNEx is capable of stream processing.
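
The abstract combines three mechanisms: gazetteer augmentation through category and location ellipsis, abbreviation expansion, and a gazetteer-based n-gram language model that judges whether a tweet n-gram is a legitimate location name and where its boundaries lie. The Python sketch below illustrates one way these pieces could fit together; it is not LNEx's implementation. The category and location word lists, the abbreviation dictionary, the unsmoothed maximum-likelihood bigram model, the score threshold, and the greedy longest-match boundary rule are all assumptions made for illustration, as are the names GazetteerLM, variants, and extract.

```python
import math
import re
from collections import Counter

# Hypothetical word lists and abbreviation dictionary (not LNEx's actual resources).
CATEGORY_WORDS = {"street", "road", "airport", "station", "bridge", "lake"}
LOCATION_WORDS = {"chennai", "houston"}
ABBREVIATIONS = {"st": "street", "rd": "road"}


def normalize(text):
    """Lowercase, split on non-alphanumerics, and expand known abbreviations."""
    return [ABBREVIATIONS.get(t, t) for t in re.findall(r"[a-z0-9]+", text.lower())]


def variants(name):
    """Full name plus assumed category-ellipsis and location-ellipsis forms."""
    toks = tuple(normalize(name))
    out = {toks}
    # Category ellipsis (assumed rule): drop a trailing category word, "X Station" -> "X".
    if len(toks) > 1 and toks[-1] in CATEGORY_WORDS:
        out.add(toks[:-1])
    # Location ellipsis (assumed rule): drop a leading place qualifier, "Chennai X" -> "X",
    # unless only a bare category word would remain.
    rest = toks[1:]
    if len(toks) > 1 and toks[0] in LOCATION_WORDS and not (len(rest) == 1 and rest[0] in CATEGORY_WORDS):
        out.add(rest)
    return out


class GazetteerLM:
    """Bigram model trained on augmented gazetteer names, with <s>/</s> boundary markers."""

    def __init__(self, names):
        self.contexts = Counter()  # how often each token precedes another token
        self.bigrams = Counter()
        for name in names:
            for v in variants(name):
                seq = ("<s>",) + v + ("</s>",)
                self.contexts.update(seq[:-1])
                self.bigrams.update(zip(seq, seq[1:]))

    def score(self, tokens):
        """Geometric mean of MLE bigram probabilities over <s> tokens </s>; 0.0 if any bigram is unseen."""
        seq = ("<s>",) + tuple(tokens) + ("</s>",)
        logp = 0.0
        for prev, cur in zip(seq, seq[1:]):
            count = self.bigrams[(prev, cur)]
            if count == 0:
                return 0.0
            logp += math.log(count / self.contexts[prev])
        return math.exp(logp / (len(seq) - 1))


def extract(text, lm, threshold=0.2, max_n=4):
    """Score every n-gram up to max_n tokens, then keep the longest non-overlapping accepted spans."""
    tokens = normalize(text)
    candidates = []
    for i in range(len(tokens)):
        for n in range(1, max_n + 1):
            if i + n > len(tokens):
                break
            s = lm.score(tokens[i:i + n])
            if s >= threshold:
                candidates.append((n, s, i))
    # Longer spans first, then higher score, then leftmost position.
    candidates.sort(key=lambda c: (-c[0], -c[1], c[2]))
    taken, found = set(), []
    for n, _, i in candidates:
        if taken.isdisjoint(range(i, i + n)):
            taken.update(range(i, i + n))
            found.append((i, " ".join(tokens[i:i + n])))
    return [span for _, span in sorted(found)]


if __name__ == "__main__":
    lm = GazetteerLM(["Chennai Central Station", "Anna Salai Road", "Adyar River"])
    print(extract("Water logging at Central Station and Anna Salai Rd, avoid Adyar River bridge", lm))
    # ['central station', 'anna salai road', 'adyar river']
```

In this toy run, "Central Station" is matched only through the location-ellipsis variant of "Chennai Central Station", and "Rd" is matched only after abbreviation expansion. The end-of-name marker </s> is what keeps a fragment such as "Chennai" from being accepted on its own, and preferring the longest accepted span is one simple way to resolve overlaps such as "central" versus "central station"; the paper's own scoring and boundary rules may differ.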


