Text authorship identified using the dynamics of word co-occurrence networks

07/29/2016
by Camilo Akimushkin, et al.

The identification of authorship in disputed documents still requires human expertise, which is now unfeasible for many tasks owing to the large volumes of text and numbers of authors in practical applications. In this study, we introduce a methodology based on the dynamics of word co-occurrence networks representing written texts to classify a corpus of 80 texts by 8 authors. The texts were divided into sections with an equal number of linguistic tokens, from which time series were created for 12 topological metrics. The series were shown to be stationary (p-value > 0.05), which permits the use of distribution moments as learning attributes. With an optimized supervised learning procedure using a Radial Basis Function Network, 68 out of 80 texts were correctly classified, i.e. a remarkable 85% success rate. Purely dynamic network metrics were thus found to characterize authorship, opening the way for the description of texts in terms of small evolving networks. Moreover, the approach introduced allows for comparison of texts with diverse characteristics in a simple, fast fashion.
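
The feature-extraction idea in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: words are linked when they are adjacent, only two stand-in metrics (average degree and average clustering coefficient) replace the 12 topological metrics, only the first two moments are computed, and a trailing section shorter than the chosen size is dropped. All function names are hypothetical. The stationarity test and the Radial Basis Function Network classifier are not reproduced here.

```python
import re
from collections import defaultdict
from statistics import mean, pstdev


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def split_sections(tokens, size):
    """Cut tokens into consecutive sections of exactly `size` tokens.

    A trailing remainder shorter than `size` is dropped (an assumption).
    """
    return [tokens[i:i + size] for i in range(0, len(tokens) - size + 1, size)]


def cooccurrence_network(tokens):
    """Undirected network linking each pair of distinct adjacent words."""
    adj = defaultdict(set)
    for a, b in zip(tokens, tokens[1:]):
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def average_degree(adj):
    """Mean number of neighbours per node (0.0 for an empty network)."""
    return mean(len(nbrs) for nbrs in adj.values()) if adj else 0.0


def average_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    coeffs = []
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            coeffs.append(0.0)
            continue
        nbr_list = list(nbrs)
        # Count links that exist between pairs of this node's neighbours.
        links = sum(
            1
            for i, u in enumerate(nbr_list)
            for v in nbr_list[i + 1:]
            if v in adj[u]
        )
        coeffs.append(2.0 * links / (k * (k - 1)))
    return mean(coeffs) if coeffs else 0.0


# Stand-in metrics; the paper uses 12 topological metrics not listed here.
METRICS = {"avg_degree": average_degree, "avg_clustering": average_clustering}


def dynamic_features(text, section_size):
    """Build one metric time series per section, then summarise by moments."""
    sections = split_sections(tokenize(text), section_size)
    if not sections:
        raise ValueError("text is shorter than one section")
    features = {}
    for name, metric in METRICS.items():
        series = [metric(cooccurrence_network(s)) for s in sections]
        features[name + "_mean"] = mean(series)
        features[name + "_std"] = pstdev(series)
    return features
```

Calling `dynamic_features(text, section_size)` on each text yields one fixed-length attribute vector per text, whatever the text's length, which is the kind of input a supervised classifier needs.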


