Labelled network subgraphs reveal stylistic subtleties in written texts

05/01/2017
by Vanessa Q. Marinho, et al.

The vast amount of available data and the increase in computational capacity have allowed texts to be analyzed from several perspectives, including their representation as complex networks. Nodes of the network represent words, and edges represent some relationship between them, usually word co-occurrence. Even though networked representations have been applied to several tasks, such approaches are not usually combined with traditional models relying upon statistical paradigms. Because networked models are able to grasp textual patterns, we devised a hybrid classifier, called labelled subgraphs, that combines the frequency of common words with small structures found in the topology of the network, known as motifs. Our approach is illustrated in two contexts: authorship attribution and translationese identification. In the former, a set of novels written by different authors is analyzed. To identify translationese, texts from the Canadian Hansard and the European Parliament were classified as original or translated. Our results suggest that labelled subgraphs are able to represent texts and that they should be further explored in other tasks, such as the analysis of text complexity, language proficiency, and machine translation.
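
The idea of pairing common-word frequency with small network structures can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that edges link adjacent words, that the labels are simply the most frequent words, and that a single motif type is counted: the directed chain a -> w -> c with the labelled word w in the middle. The function names `build_network` and `labelled_chain_features` and the parameter `n_labels` are hypothetical.

```python
from collections import Counter, defaultdict


def build_network(tokens):
    """Directed co-occurrence network: an edge a -> b if b directly follows a."""
    succ = defaultdict(set)
    for a, b in zip(tokens, tokens[1:]):
        if a != b:  # ignore self-loops from repeated words
            succ[a].add(b)
    return succ


def labelled_chain_features(tokens, n_labels=2):
    """For each of the n_labels most frequent words w, return its frequency
    and the number of directed chains a -> w -> c (a != c) in the network.

    Assumption for illustration: chains are counted whether or not a and c
    are also linked to each other, so this is a simplification of motif
    counting, not a full induced-subgraph census.
    """
    freq = Counter(tokens)
    labels = [w for w, _ in freq.most_common(n_labels)]
    succ = build_network(tokens)

    # Invert the successor map to get predecessors of each word.
    pred = defaultdict(set)
    for a, targets in succ.items():
        for b in targets:
            pred[b].add(a)

    features = {}
    for w in labels:
        chains = sum(1 for a in pred[w] for c in succ[w] if a != c)
        features[w] = {"frequency": freq[w], "chains_as_middle": chains}
    return features


tokens = "the cat saw the dog and the dog saw the cat".split()
print(labelled_chain_features(tokens, n_labels=1))
```

In a classification setting, one such feature dictionary per text could be flattened into a vector and passed to any standard classifier. The choice of motif types and of which words serve as labels is left open here.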


research
06/30/2016

Representation of texts as complex networks: a mesoscopic approach

Statistical techniques that analyze texts, referred to as text analytics...
research
02/04/2015

Authorship recognition via fluctuation analysis of network topology and word intermittency

Statistical methods have been widely employed in many practical natural ...
research
01/14/2021

Estimation of the Frequency of Occurrence of Italian Phonemes in Text

The purpose of this project was to derive a reliable estimate of the fre...
research
05/11/2017

On the role of words in the network structure of texts: application to authorship attribution

Well-established automatic analyses of texts mainly consider frequencies...
research
05/29/2017

On the "Calligraphy" of Books

Authorship attribution is a natural language processing task that has be...
research
07/29/2016

Text authorship identified using the dynamics of word co-occurrence networks

The identification of authorship in disputed documents still requires hu...
research
02/04/2016

Complex Networks of Words in Fables

In this chapter we give an overview of the application of complex networ...
