Authorship Attribution Based on Life-Like Network Automata

10/20/2016
by Jeaneth Machicao, et al.

Authorship attribution is a problem of considerable practical and technical interest. Several methods have been designed to infer the authorship of disputed documents in multiple contexts. While traditional statistical methods based solely on word counts and related measurements have provided a simple yet effective solution in particular cases, they are prone to manipulation. Recently, texts have been successfully modeled as networks, where words are represented by nodes linked according to textual similarity measurements. Such models are useful for identifying informative topological patterns for the authorship recognition task. However, there is no consensus on which measurements should be used. We therefore propose a novel method to characterize text networks that considers both topological and dynamical aspects of networks. Using concepts and methods from cellular automata theory, we devised a strategy to capture informative spatio-temporal patterns from this model. In our experiments, the method outperformed traditional analysis relying only on topological measurements. Remarkably, we found that the results depend on pre-processing steps (such as lemmatization), a factor that has mostly been disregarded in related works. The optimized results obtained here pave the way for a better characterization of textual networks.
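
To make the idea of running a Life-like automaton on a text network more concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the linking rule, the update rule, or the features, so the adjacency-based linking in `build_word_network`, the density ranges `birth` and `survive`, and the `alive_fraction` feature are all hypothetical choices made for this example.

```python
import random
from collections import defaultdict


def build_word_network(text):
    """Assumption: link each word to the word that follows it (undirected)."""
    words = text.lower().split()
    neighbors = defaultdict(set)
    for a, b in zip(words, words[1:]):
        if a != b:  # skip self-loops from repeated words
            neighbors[a].add(b)
            neighbors[b].add(a)
    return dict(neighbors)


def step(neighbors, state, birth, survive):
    """One synchronous update of every node.

    Assumption: a dead node becomes alive when the fraction of alive
    neighbors lies in `birth`; an alive node stays alive when that
    fraction lies in `survive`. Both are inclusive (low, high) ranges.
    """
    new_state = {}
    for node, nbrs in neighbors.items():
        density = sum(state[n] for n in nbrs) / len(nbrs)
        low, high = survive if state[node] else birth
        new_state[node] = 1 if low <= density <= high else 0
    return new_state


def evolve(neighbors, steps, birth=(0.3, 0.6), survive=(0.2, 0.7), seed=0):
    """Run the automaton from a seeded random initial state."""
    rng = random.Random(seed)
    state = {node: rng.randint(0, 1) for node in sorted(neighbors)}
    history = [state]
    for _ in range(steps):
        state = step(neighbors, state, birth, survive)
        history.append(state)
    return history


def alive_fraction(history):
    """Per-node share of time steps spent alive (a simple temporal feature)."""
    return {
        node: sum(s[node] for s in history) / len(history)
        for node in history[0]
    }
```

A possible use is to build one network per text, call `evolve` with a fixed rule and seed, and pass summaries of the resulting history (such as the values from `alive_fraction`) to a classifier as features. Which rules and summaries are informative for authorship is exactly what the abstract says must be determined experimentally.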


research
04/09/2015

Concentric network symmetry grasps authors' styles in word adjacency networks

Several characteristics of written texts have been inferred from statist...
research
02/04/2015

Authorship recognition via fluctuation analysis of network topology and word intermittency

Statistical methods have been widely employed in many practical natural ...
research
12/29/2014

Probing the topological properties of complex networks modeling short written texts

In recent years, graph theory has been widely employed to probe several ...
research
08/05/2017

Extractive Multi Document Summarization using Dynamical Measurements of Complex Networks

Due to the large amount of textual information available on Internet, it...
research
07/28/2015

Classifying informative and imaginative prose using complex networks

Statistical methods have been widely employed in recent years to grasp m...
research
09/17/2015

Network analysis of named entity co-occurrences in written texts

The use of methods borrowed from statistics and physics to analyze writt...
research
06/30/2015

A complex network approach to stylometry

Statistical methods have been widely employed to study the fundamental p...
