A Semi-Automated Approach for Information Extraction, Classification and Analysis of Unstructured Data

10/20/2019
by Alberto Purpura, et al.

In this paper, we show how Quantitative Narrative Analysis (QNA) and simple Natural Language Processing (NLP) techniques can be applied to extract and categorize data, using as a case study the Diary of the former President of the Italian Republic (PoR), Giorgio Napolitano. The Diary records all of his institutional meetings. Properly handled, this information allows an analysis of how the PoR used his so-called soft powers to influence the Italian political system during his first mandate. We propose using simple yet very effective NLP techniques, such as Regular Expressions and Named Entity Recognition, to extract information from the Diary. We then propose an innovative way to organize the extracted data, relying on the methodological framework of QNA. Finally, we show how to analyze the structured data at different levels of detail using PC-ACE (Program for Computer-Assisted Coding of Events), a software tool developed specifically for this task and for data visualization.
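To make the regular-expression step concrete, the following is a minimal sketch of how a single diary line might be turned into a structured record. The abstract does not specify the Diary's actual layout, so the entry format (`DD/MM/YYYY HH:MM - Meeting with <name>, <role>`), the `Meeting` record, and the helper names `parse_entry` and `count_by_role` are all assumptions made for illustration, not the authors' implementation. The sample names are invented.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional

# ASSUMPTION: a hypothetical entry layout, not the real Diary format:
#   "DD/MM/YYYY HH:MM - Meeting with <name>, <role>"
ENTRY_RE = re.compile(
    r"^(?P<date>\d{2}/\d{2}/\d{4})\s+"
    r"(?P<time>\d{2}:\d{2})\s+-\s+"
    r"Meeting with (?P<name>[^,]+),\s*(?P<role>.+)$"
)


@dataclass(frozen=True)
class Meeting:
    """Hypothetical structured record for one institutional meeting."""
    date: str
    time: str
    name: str
    role: str


def parse_entry(line: str) -> Optional[Meeting]:
    """Return a Meeting if the line matches the assumed layout, else None."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        return None
    return Meeting(
        date=match["date"],
        time=match["time"],
        name=match["name"].strip(),
        role=match["role"].strip(),
    )


def count_by_role(lines: Iterable[str]) -> Counter:
    """Count parsed meetings per role, skipping lines that do not match."""
    meetings = (parse_entry(line) for line in lines)
    return Counter(m.role for m in meetings if m is not None)


# Invented sample lines, for illustration only.
SAMPLE = [
    "12/03/2008 10:30 - Meeting with Mario Rossi, Minister",
    "12/03/2008 12:00 - Meeting with Anna Bianchi, Ambassador",
    "13/03/2008 09:15 - Meeting with Luca Verdi, Minister",
    "Note: no public engagements.",
]
```

Calling `count_by_role(SAMPLE)` tallies the three matching lines by role and ignores the free-text note. In practice, lines that fall outside any fixed pattern are where a Named Entity Recognition model would likely be needed to recover names and roles.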
