Classifying informative and imaginative prose using complex networks

07/28/2015
by Henrique F. de Arruda, et al.

Statistical methods have been widely employed in recent years to grasp many language properties. The application of such techniques has improved several linguistic applications, including machine translation, automatic summarization and document classification. In the latter, many approaches have emphasized the semantic content of texts, as is the case with bag-of-words language models. This approach has certainly yielded reasonable performance. However, some potential features, such as the structural organization of texts, have been used in only a few studies. In this context, we probe how features derived from textual structure analysis can be effectively employed in a classification task. More specifically, we performed a supervised classification aimed at discriminating informative from imaginative documents. Using a networked model that describes the local topological/dynamical properties of function words, we achieved an accuracy rate of up to 95%, which is much higher than that of similar networked approaches. A systematic analysis of feature relevance revealed that symmetry and accessibility measurements are among the most prominent network measurements. Our results suggest that these measurements could be used in related language applications, as they play a complementary role in characterizing texts.
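
As a rough illustration of the kind of networked model mentioned above, the sketch below builds a word adjacency network from a text and extracts two local measurements for a handful of function words. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function-word list, the tokenization, and the helper names are hypothetical, and node degree and local clustering are used only as simple stand-ins for the symmetry and accessibility measurements named in the abstract.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical, tiny function-word list used only for illustration.
FUNCTION_WORDS = {"the", "of", "and", "a", "in", "to"}


def build_adjacency_network(text):
    """Link each pair of consecutive words with an undirected edge."""
    # Naive tokenization (an assumption): split on whitespace, trim punctuation.
    tokens = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    tokens = [w for w in tokens if w]
    neighbors = defaultdict(set)
    for left, right in zip(tokens, tokens[1:]):
        if left != right:  # skip self-loops from repeated words
            neighbors[left].add(right)
            neighbors[right].add(left)
    return neighbors


def local_clustering(neighbors, node):
    """Fraction of a node's neighbor pairs that are themselves linked."""
    adjacent = neighbors.get(node, set())
    k = len(adjacent)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(adjacent, 2) if v in neighbors[u])
    return 2.0 * links / (k * (k - 1))


def function_word_features(text, function_words=FUNCTION_WORDS):
    """Map each function word to (degree, local clustering coefficient)."""
    neighbors = build_adjacency_network(text)
    features = {}
    for word in sorted(function_words):
        degree = len(neighbors.get(word, ()))
        features[word] = (degree, local_clustering(neighbors, word))
    return features


if __name__ == "__main__":
    for word, values in function_word_features("The cat and the dog.").items():
        print(word, values)
```

Concatenating these per-word values in a fixed order gives one feature vector per document, which could then be passed to any supervised classifier to separate informative from imaginative texts.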

