A Semi-Automated Approach for Information Extraction, Classification and Analysis of Unstructured Data

10/20/2019
by Alberto Purpura, et al.

In this paper, we show how Quantitative Narrative Analysis (QNA) and simple Natural Language Processing (NLP) techniques can be applied to extract and categorize data, using as a case study the Diary of the former President of the Italian Republic (PoR), Giorgio Napolitano. The Diary records all of his institutional meetings. Properly handled, this information supports an analysis of how the PoR used his so-called soft powers to influence the Italian political system during his first mandate. We first propose using simple yet very effective NLP techniques, such as Regular Expressions and Named Entity Recognition, to extract information from the Diary. We then propose an innovative way to organize the extracted data, relying on the methodological framework of QNA. Finally, we show how to analyze the structured data at different levels of detail using PC-ACE (Program for Computer-Assisted Coding of Events), a software tool developed specifically for this task and for data visualization.
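To make the extraction step concrete, here is a minimal sketch of how a regular expression could turn one diary line into a subject-action-object record of the kind Quantitative Narrative Analysis works with. It is not the authors' pipeline. The entry layout (`DD/MM/YYYY HH:MM - Meeting with NAME, ROLE`), the English action verbs, and the names `ENTRY_RE`, `Triplet`, and `parse_entry` are all assumptions made for illustration; the real Diary is in Italian and its format is not given in the abstract. A Named Entity Recognition model would be needed to handle free-form participant names, which this sketch does not attempt.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical entry layout (an assumption, not the real Diary format):
#   "12/03/2007 10:30 - Meeting with Romano Prodi, President of the Council"
ENTRY_RE = re.compile(
    r"^(?P<date>\d{2}/\d{2}/\d{4})\s+"          # day/month/year
    r"(?P<time>\d{2}:\d{2})\s+-\s+"             # hour:minute, then a dash
    r"(?P<action>Meeting|Phone call|Audience) with\s+"
    r"(?P<name>[^,]+?)"                         # participant, up to a comma
    r"(?:,\s*(?P<role>.+))?$"                   # optional role after the comma
)


@dataclass
class Triplet:
    """A subject-action-object record with a few modifiers."""
    subject: str
    action: str
    obj: str
    role: Optional[str]
    date: str
    time: str


def parse_entry(line: str) -> Optional[Triplet]:
    """Return a Triplet for a well-formed line, or None if it does not match."""
    m = ENTRY_RE.match(line.strip())
    if m is None:
        return None
    return Triplet(
        subject="PoR",                  # the diary owner is always the subject
        action=m["action"].lower(),
        obj=m["name"].strip(),
        role=m["role"],
        date=m["date"],
        time=m["time"],
    )


if __name__ == "__main__":
    sample = "12/03/2007 10:30 - Meeting with Romano Prodi, President of the Council"
    print(parse_entry(sample))
```

Lines that do not match return `None` rather than raising, so they can be collected and reviewed by hand, which fits the semi-automated approach the abstract describes.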

research
03/27/2017

A Tidy Data Model for Natural Language Processing using cleanNLP

The package cleanNLP provides a set of fast tools for converting a textu...
research
02/27/2021

Automated Generation of Interorganizational Disaster Response Networks through Information Extraction

When a disaster occurs, maintaining and restoring community lifelines su...
research
10/22/2020

An Analysis of Simple Data Augmentation for Named Entity Recognition

Simple yet effective data augmentation techniques have been proposed for...
research
11/22/2022

Smart Agriculture : A Novel Multilevel Approach for Agricultural Risk Assessment over Unstructured Data

Detecting opportunities and threats from massive text data is a challeng...
research
07/11/2022

Learning Mutual Fund Categorization using Natural Language Processing

Categorization of mutual funds or Exchange-Traded-funds (ETFs) have long...
research
10/02/2012

A Semantic Approach for Automatic Structuring and Analysis of Software Process Patterns

The main contribution of this paper, is to propose a novel semantic appr...
