A joint text mining-rank size investigation of the rhetoric structures of the US Presidents' speeches

by Valerio Ficcadenti, et al.

This work presents a text mining approach and its use for an in-depth analysis of the messages delivered by politicians. Specifically, we carry out an expert systems-based exploration of the rhetoric dynamics of a large collection of US Presidents' speeches, ranging from Washington to Trump. Speeches are viewed as complex expert systems whose structures can be effectively analyzed through rank-size laws. The methodological contribution of the paper is twofold. First, we develop a text mining-based procedure for constructing the dataset, using a web scraping routine on the Miller Center website, the repository collecting the speeches. Second, we explore the implicit structure of the discourse data by implementing a rank-size procedure over the individual speeches, with the words of each speech ranked by frequency. The scientific significance of the proposed combination of text mining and rank-size approaches lies in its flexibility and generality, which make it reproducible across a wide set of expert systems and text mining contexts. The usefulness of the proposed method and of the subsequent speech analysis is demonstrated by the findings themselves. In terms of impact, the analysis supports conclusions of a social, political and linguistic nature on how 45 United States Presidents delivered political messages between April 30, 1789 and February 28, 2017. It shows some remarkable regularities, not only within a given speech but also across different speeches. Moreover, from a purely methodological perspective, the contribution suggests possible ways of generating a linguistic decision-making algorithm.
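As a rough illustration of what a rank-size procedure over a single speech can look like, the sketch below counts word frequencies, ranks them, and fits a pure power law f(r) = a * r^(-b) by least squares on log-log values. The tokenizer, the function names `rank_size` and `fit_power_law`, and the choice of a pure power law are all assumptions made for this example; the abstract does not specify the functional form or preprocessing the authors use.

```python
import math
import re
from collections import Counter


def rank_size(text):
    """Return (rank, frequency) pairs, most frequent word first.

    Hypothetical helper: lowercases the text and treats runs of
    letters and apostrophes as words (an assumed tokenization).
    """
    words = re.findall(r"[a-z']+", text.lower())
    freqs = sorted(Counter(words).values(), reverse=True)
    return list(enumerate(freqs, start=1))


def fit_power_law(pairs):
    """Fit log f = log a - b * log r by ordinary least squares.

    Returns (a, b). Needs at least two distinct ranks. The pure
    power law is an assumed form, chosen only for illustration.
    """
    xs = [math.log(r) for r, _ in pairs]
    ys = [math.log(f) for _, f in pairs]
    n = len(pairs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Toy input, not taken from any actual speech.
sample = "the people of the nation and the government of the people"
pairs = rank_size(sample)
a, b = fit_power_law(pairs)
```

On a real corpus, `rank_size` would be applied to each speech separately, so that the fitted parameters can be compared across speeches and Presidents.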


Words ranking and Hirsch index for identifying the core of the hapaxes in political texts

This paper deals with a quantitative analysis of the content of official...

Pbm: A new dataset for blog mining

Text mining is becoming vital as Web 2.0 offers collaborative content cr...

Evolving linguistic divergence on polarizing social media

Language change is influenced by many factors, but often starts from syn...

Markov Chain Monte Carlo for generating ranked textual data

This paper faces a central theme in applied statistics and information s...

A Statistical Model of Word Rank Evolution

The availability of large linguistic data sets enables data-driven appro...

ChatGPT-4 Outperforms Experts and Crowd Workers in Annotating Political Twitter Messages with Zero-Shot Learning

This paper assesses the accuracy, reliability and bias of the Large Lang...

Political Text Scaling Meets Computational Semantics

During the last fifteen years, text scaling approaches have become a cen...
