Topic segmentation via community detection in complex networks

12/04/2015
by Henrique F. de Arruda, et al.

Many real systems have been modelled in terms of network concepts, and written texts are a particular example of information networks. In recent years, the use of network methods to analyze language has led to several interesting findings, including the proposition of novel models to explain the emergence of fundamental universal patterns. While syntactical networks, one of the most prevalent networked models of written texts, display both scale-free and small-world properties, this representation fails to capture other textual features, such as the organization in topics or subjects. In this context, we propose a novel network representation whose main purpose is to capture the semantic relationships between words in a simple way. To do so, we link all words co-occurring in the same semantic context, which is defined in a threefold way. We show that the proposed representation favours the emergence of communities of semantically related words, and that this feature may be used to identify relevant topics. The proposed methodology to detect topics was applied to segment selected Wikipedia articles. We have found that, in general, our methods outperform traditional bag-of-words representations, which suggests that a high-level textual representation may be useful to study semantic features of texts.
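The core idea of the abstract, linking words that share a context and then looking for communities, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how the three context definitions work or which community detection method is used, so the sketch assumes that each context is simply a list of tokens (for example, one paragraph) and substitutes a basic deterministic label propagation for the community detection step. The function names `build_cooccurrence_network` and `detect_communities`, and the toy data, are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_network(contexts):
    """Link every pair of distinct words that share a context.

    Assumption: a context is a list of tokens (e.g. one paragraph).
    The edge weight counts how many contexts contain both words.
    """
    weights = defaultdict(int)
    for words in contexts:
        for u, v in combinations(sorted(set(words)), 2):
            weights[(u, v)] += 1
    adjacency = defaultdict(dict)
    for (u, v), w in weights.items():
        adjacency[u][v] = w
        adjacency[v][u] = w
    return dict(adjacency)


def detect_communities(adjacency, max_rounds=100):
    """Group words with a simple weighted label propagation.

    Stand-in technique: each node repeatedly adopts the label carrying
    the largest total edge weight among its neighbours. Nodes are visited
    in sorted order and ties go to the alphabetically smallest label,
    so the result is deterministic.
    """
    labels = {node: node for node in adjacency}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(adjacency):
            scores = defaultdict(int)
            for neighbour, weight in adjacency[node].items():
                scores[labels[neighbour]] += weight
            if not scores:
                continue
            top = max(scores.values())
            candidates = sorted(l for l, s in scores.items() if s == top)
            # Keep the current label when it is already among the best.
            if labels[node] not in candidates:
                labels[node] = candidates[0]
                changed = True
        if not changed:
            break
    groups = defaultdict(list)
    for node, label in labels.items():
        groups[label].append(node)
    return sorted(sorted(members) for members in groups.values())


# Toy contexts (made up for illustration): three about graphs, two about grammar.
contexts = [
    ["network", "node", "edge"],
    ["node", "edge", "graph"],
    ["network", "graph", "edge"],
    ["verb", "noun", "syntax"],
    ["noun", "syntax", "grammar"],
]

network = build_cooccurrence_network(contexts)
for community in detect_communities(network):
    print(community)
```

Each community returned here plays the role of a candidate topic. A real pipeline would also need tokenization, stopword removal, and a more robust community detection method, none of which the abstract specifies.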


