Text Mining for Processing Interview Data in Computational Social Science

11/28/2020
by Jussi Karlgren, et al.

We use commercially available text analysis technology to process interview text data from a computational social science study. We find that topical clustering and terminological enrichment make it convenient to explore and quantify the responses. This makes it possible to generate and test hypotheses and to compare textual and non-textual variables, and it saves analyst effort. We encourage studies in social science to use text analysis, especially for exploratory open-ended studies. We discuss how replicability requirements are met by text analysis technology. We note that the most recent learning models are not designed with transparency in mind, and that research requires a model to be editable and its decisions to be explainable. The tools available today, such as the one used in the present study, are not built for processing interview texts. While many of the variables under consideration are quantifiable using lexical statistics, we find that some interesting and potentially valuable features are difficult or impossible to automate reliably at present. We note that there are some potentially interesting applications for traditional natural language processing mechanisms, such as named entity recognition and anaphora resolution, in this application area. We conclude with a suggestion for language technologists to investigate the challenge of processing interview data comprehensively, especially the interplay between question and response, and we encourage social science researchers not to hesitate to use text analysis tools, especially for the exploratory phase of processing interview data.
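As an illustration of what "quantifiable using lexical statistics" can mean in practice, the following is a minimal sketch, not the commercial tool or the method used in the study. It computes how often a set of terms of interest occurs per 1,000 tokens in a group of interview responses. The function names `tokenize` and `term_rates`, the simple regular-expression tokenizer, and the sample responses are all assumptions made for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (a crude, assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def term_rates(responses, terms):
    """Return the occurrences per 1,000 tokens of each term across all responses.

    `responses` is a list of response strings; `terms` is a set of lowercase terms.
    Both are hypothetical inputs chosen for illustration.
    """
    counts = Counter()
    total_tokens = 0
    for response in responses:
        tokens = tokenize(response)
        total_tokens += len(tokens)
        counts.update(token for token in tokens if token in terms)
    if total_tokens == 0:
        return {term: 0.0 for term in terms}
    return {term: 1000.0 * counts[term] / total_tokens for term in terms}


if __name__ == "__main__":
    # Invented sample responses, used only to show the call shape.
    sample = ["I trust my neighbours", "Trust is hard"]
    print(term_rates(sample, {"trust", "neighbours"}))
```

Rates of this kind can be computed per respondent group and then compared against non-textual variables; features such as the interplay between question and response are not captured by counts like these.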

research
03/27/2017

A Tidy Data Model for Natural Language Processing using cleanNLP

The package cleanNLP provides a set of fast tools for converting a textu...
research
04/19/2022

Councils in Action: Automating the Curation of Municipal Governance Data for Research

Large scale comparative research into municipal governance is often proh...
research
05/23/2022

A Natural Language Processing Pipeline for Detecting Informal Data References in Academic Literature

Discovering authoritative links between publications and the datasets th...
research
08/02/2019

DELTA: A DEep learning based Language Technology plAtform

In this paper we present DELTA, a deep learning based language technolog...
research
02/28/2018

Computational International Relations: What Can Programming, Coding and Internet Research Do for the Discipline?

Computational Social Science emerged as a highly technical and popular d...
research
08/17/2022

Transformer Encoder for Social Science

High-quality text data has become an important data source for social sc...
research
04/01/2023

What Does the Indian Parliament Discuss? An Exploratory Analysis of the Question Hour in the Lok Sabha

The TCPD-IPD dataset is a collection of questions and answers discussed ...
