On the role of words in the network structure of texts: application to authorship attribution

05/11/2017
by Camilo Akimushkin, et al.

Well-established automatic analyses of texts mainly consider frequencies of linguistic units, e.g. letters, words and bigrams, while methods based on co-occurrence networks consider the structure of texts regardless of the node labels (i.e. the semantics of the words). In this paper, we reconcile these distinct viewpoints by introducing a generalized similarity measure to compare texts which accounts for both the network structure of texts and the role of individual words in the networks. We use the similarity measure for authorship attribution of three collections of books, each composed of 8 authors and 10 books per author. High accuracy rates were obtained, with typical values from 90% [...] the same collections. These accuracies are also higher than those obtained when only the topology of the networks is taken into account. We conclude that the different properties of specific words on the macroscopic scale structure of a whole text are as relevant as their frequency of appearance; conversely, considering the identity of nodes brings further knowledge about a piece of text represented as a network.
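
To make the idea of comparing texts through both network structure and word identity more concrete, the following is a minimal illustrative sketch and not the generalized similarity measure defined in the paper. It assumes a simple adjacency co-occurrence network (consecutive words are linked), uses node degree as the only topological property, and compares two texts by the cosine similarity of their word-labelled degree vectors. The function names `cooccurrence_degrees` and `labelled_similarity` are hypothetical.

```python
import math
import re
from collections import defaultdict


def cooccurrence_degrees(text):
    """Map each word to its degree in an adjacency co-occurrence network.

    Assumption: two words are linked if they appear next to each other.
    """
    words = re.findall(r"[a-z']+", text.lower())
    neighbours = defaultdict(set)
    for left, right in zip(words, words[1:]):
        if left != right:  # skip self-loops from repeated words
            neighbours[left].add(right)
            neighbours[right].add(left)
    return {word: len(adjacent) for word, adjacent in neighbours.items()}


def labelled_similarity(text_a, text_b):
    """Cosine similarity between degree vectors indexed by word label.

    Illustrative stand-in only: the degree of each word is matched
    against the degree of the *same* word in the other text, so both
    topology and node identity contribute to the score.
    """
    deg_a = cooccurrence_degrees(text_a)
    deg_b = cooccurrence_degrees(text_b)
    shared = deg_a.keys() & deg_b.keys()
    dot = sum(deg_a[w] * deg_b[w] for w in shared)
    norm_a = math.sqrt(sum(v * v for v in deg_a.values()))
    norm_b = math.sqrt(sum(v * v for v in deg_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # at least one text produced no edges
    return dot / (norm_a * norm_b)
```

In this sketch, two texts that use the same words in structurally similar roles score close to 1, while texts with no vocabulary in common score 0 regardless of how similar their unlabelled network shapes are. A purely topological comparison would ignore the word labels, and a purely frequency-based one would ignore the network.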
