Network analysis of named entity co-occurrences in written texts

09/17/2015
by Diego R. Amancio, et al.

The use of methods borrowed from statistics and physics to analyze written texts has allowed the discovery of unprecedented patterns of human behavior and cognition by establishing links between model features and language structure. While current models have been useful to unveil patterns via analysis of syntactic and semantic networks, only a few works have probed the relevance of investigating the structure arising from the relationship between relevant entities such as characters, locations and organizations. In this study, we represent entities appearing in the same context as a co-occurrence network, where links are established according to a null model based on random, shuffled texts. Computational simulations performed on novels revealed that the proposed model displays interesting topological features, such as the small-world property, characterized by high values of the clustering coefficient. The effectiveness of our model was verified in a practical pattern recognition task in real networks. When compared with traditional word adjacency networks, our model yielded better results in identifying unknown references in texts. Because the proposed representation plays a complementary role in characterizing unstructured documents via topological analysis of named entities, we believe that it could be useful to improve the characterization of written texts (and related systems), especially if combined with traditional approaches based on statistical and deeper paradigms.
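The idea of linking entities only when their co-occurrence exceeds what shuffled texts produce can be illustrated with a minimal sketch. The abstract does not give the exact linking criterion, so everything below is an assumption made for illustration: contexts are lists of tokens, entities are single tokens, the null model shuffles all tokens and re-cuts them into contexts of the original sizes, and a link is kept when the observed count lies at least `z_threshold` standard deviations above the shuffled mean. The function names and parameters (`entity_network`, `n_shuffles`, `z_threshold`) are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations
from statistics import mean, pstdev


def cooccurrence_counts(contexts, entities):
    """Count how many contexts contain each unordered pair of entities."""
    counts = Counter()
    for tokens in contexts:
        present = sorted(set(tokens) & entities)
        counts.update(combinations(present, 2))
    return counts


def shuffled_contexts(contexts, rng):
    """Shuffle all tokens, then cut them back into contexts of the original sizes."""
    tokens = [t for ctx in contexts for t in ctx]
    rng.shuffle(tokens)
    out, start = [], 0
    for ctx in contexts:
        out.append(tokens[start:start + len(ctx)])
        start += len(ctx)
    return out


def entity_network(contexts, entities, n_shuffles=200, z_threshold=2.0, seed=0):
    """Return {(entity_a, entity_b): z_score} for pairs that co-occur more
    often than in shuffled versions of the text (assumed z-score criterion)."""
    rng = random.Random(seed)
    entities = set(entities)
    observed = cooccurrence_counts(contexts, entities)

    # Collect the null distribution of counts for every observed pair.
    null_samples = {pair: [] for pair in observed}
    for _ in range(n_shuffles):
        null = cooccurrence_counts(shuffled_contexts(contexts, rng), entities)
        for pair in observed:
            null_samples[pair].append(null.get(pair, 0))

    edges = {}
    for pair, count in observed.items():
        mu = mean(null_samples[pair])
        sigma = pstdev(null_samples[pair])
        if sigma > 0:
            z = (count - mu) / sigma
        else:
            # Degenerate null distribution: link only if strictly above it.
            z = float("inf") if count > mu else 0.0
        if z >= z_threshold:
            edges[pair] = z
    return edges
```

The resulting edge dictionary could then be loaded into any graph library to measure quantities such as the clustering coefficient; the threshold and the number of shuffles are tuning choices in this sketch, not values taken from the paper.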

research
12/29/2014

Probing the topological properties of complex networks modeling short written texts

In recent years, graph theory has been widely employed to probe several ...
research
06/25/2016

Word sense disambiguation via bipartite representation of complex networks

In recent years, concepts and methods of complex networks have been empl...
research
02/04/2015

Authorship recognition via fluctuation analysis of network topology and word intermittency

Statistical methods have been widely employed in many practical natural ...
research
02/22/2016

Temporal Network Analysis of Literary Texts

We study temporal networks of characters in literature focusing on "Alic...
research
05/18/2019

Semantic flow in language networks

In this study we propose a framework to characterize documents based on ...
research
12/04/2015

Topic segmentation via community detection in complex networks

Many real systems have been modelled in terms of network concepts, and w...
research
10/20/2016

Authorship Attribution Based on Life-Like Network Automata

The authorship attribution is a problem of considerable practical and te...