Inferring the location of authors from words in their texts

12/20/2016
by Max Berggren, et al.

For computational dialectology and other geographically bound text analysis tasks, texts must be annotated with their own location or that of their authors. Many texts are locatable through explicit labels, but most have no explicit annotation of place. This paper describes a series of experiments to determine how positionally annotated microblog posts can be used to learn location-indicating words, which can then be used to locate blog texts and their authors. A Gaussian distribution is used to model the locational qualities of words. We introduce the notion of placeness to describe how locational a word is. We find that the most useful results come from combining three steps: modelling each word's distribution with several Gaussians so that it can account for several locations, defining a filter that picks out words with high placeness based on their local distributional context, and aggregating locational information in a centroid for each text. The results are applied to data in the Swedish language.
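The pipeline outlined in the abstract can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation: it fits a single Gaussian per word rather than several, treats latitude and longitude as planar coordinates, and uses a hypothetical placeness score of 1 / (1 + variance), since the abstract does not give the paper's definition. The function names, the thresholds `min_placeness` and `min_count`, and the toy posts are all assumptions made for the example.

```python
from collections import defaultdict


def fit_word_gaussians(posts):
    """Fit one isotropic Gaussian per word from geotagged posts.

    posts: iterable of (text, (lat, lon)) pairs.
    Returns a dict mapping word -> (mean_lat, mean_lon, variance, count).
    Assumption: one Gaussian per word; the paper uses several.
    """
    coords = defaultdict(list)
    for text, (lat, lon) in posts:
        # Count each word once per post.
        for word in set(text.lower().split()):
            coords[word].append((lat, lon))

    models = {}
    for word, pts in coords.items():
        n = len(pts)
        mean_lat = sum(p[0] for p in pts) / n
        mean_lon = sum(p[1] for p in pts) / n
        # Mean squared distance to the word's mean position, in degrees squared.
        var = sum((p[0] - mean_lat) ** 2 + (p[1] - mean_lon) ** 2 for p in pts) / n
        models[word] = (mean_lat, mean_lon, var, n)
    return models


def placeness(variance):
    """Hypothetical placeness score: close to 1 for tightly clustered words."""
    return 1.0 / (1.0 + variance)


def locate_text(text, models, min_placeness=0.5, min_count=2):
    """Estimate a text's location as a placeness-weighted centroid.

    Words that are unseen, too rare, or too dispersed are filtered out.
    Returns (lat, lon), or None if no word passes the filter.
    """
    total = lat_sum = lon_sum = 0.0
    for word in text.lower().split():
        if word not in models:
            continue
        mean_lat, mean_lon, var, n = models[word]
        score = placeness(var)
        if n < min_count or score < min_placeness:
            continue
        total += score
        lat_sum += score * mean_lat
        lon_sum += score * mean_lon
    if total == 0.0:
        return None
    return (lat_sum / total, lon_sum / total)


# Toy geotagged posts (invented for illustration).
posts = [
    ("fika vid slussen och", (59.32, 18.07)),
    ("slussen idag och", (59.32, 18.08)),
    ("avenyn och regn", (57.70, 11.97)),
    ("avenyn ikväll och", (57.70, 11.98)),
]
models = fit_word_gaussians(posts)
estimate = locate_text("vi ses vid slussen och", models)
```

In this toy data, "och" occurs in every post and is spread across both cities, so its low placeness removes it from the centroid, while "slussen" is tightly clustered and drives the estimate. A fuller version would replace the single Gaussian with a mixture per word and measure distances on the sphere instead of in raw degrees.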


research
04/17/2021

Customized determination of stop words using Random Matrix Theory approach

The distances between words calculated in word units are studied and com...
research
01/06/2018

Explorations in an English Poetry Corpus: A Neurocognitive Poetics Perspective

This paper describes a corpus of about 3000 English literary texts with ...
research
07/09/2020

DISCO PAL: Diachronic Spanish Sonnet Corpus with Psychological and Affective Labels

Nowadays, there are many applications of text mining over corpus from di...
research
11/14/2016

Lost in Space: Geolocation in Event Data

Extracting the "correct" location information from text data, i.e., dete...
research
05/11/2023

Towards a Computational Analysis of Suspense: Detecting Dangerous Situations

Suspense is an important tool in storytelling to keep readers engaged an...
research
07/30/2015

Information-theoretical analysis of the statistical dependencies among three variables: Applications to written language

We develop the information-theoretical concepts required to study the st...
research
01/30/2023

Using n-aksaras to model Sanskrit and Sanskrit-adjacent texts

Despite – or perhaps because of – their simplicity, n-grams, or contiguo...
