Scholastic: Graphical Human-AI Collaboration for Inductive and Interpretive Text Analysis

08/12/2022
by Matt-Heun Hong, et al.

Interpretive scholars generate knowledge from text corpora by manually sampling documents, applying codes, and refining and collating codes into categories until meaningful themes emerge. Given a large corpus, machine learning could help scale this data sampling and analysis, but prior research shows that experts are generally concerned about algorithms potentially disrupting or driving interpretive scholarship. We take a human-centered design approach to addressing concerns around machine-assisted interpretive research to build Scholastic, which incorporates a machine-in-the-loop clustering algorithm to scaffold interpretive text analysis. As a scholar applies codes to documents and refines them, the resulting coding schema serves as structured metadata which constrains hierarchical document and word clusters inferred from the corpus. Interactive visualizations of these clusters can help scholars strategically sample documents further toward insights. Scholastic demonstrates how human-centered algorithm design and visualizations employing familiar metaphors can support inductive and interpretive research methodologies through interactive topic modeling and document clustering.
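
The idea of a coding schema constraining document clusters can be illustrated with a minimal sketch. This is not the clustering algorithm used by Scholastic, which the abstract does not specify; it swaps in a simple agglomerative clustering over bag-of-words Jaccard similarity. The assumptions are that documents sharing a scholar-applied code must stay together and that clusters carrying different codes are never merged. The function names, sample documents, and codes are all hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sets of words."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def constrained_clusters(docs, codes, n_clusters):
    """Cluster documents while respecting codes applied by the scholar.

    docs: dict mapping document id -> text
    codes: dict mapping document id -> code (uncoded documents are absent)
    n_clusters: stop merging once this many clusters remain
    Returns a list of (code or None, [document ids]) tuples.
    """
    words = {d: set(text.lower().split()) for d, text in docs.items()}

    # Seed clusters: one per code (must-link), one per uncoded document.
    seeds = {}
    for d in docs:
        code = codes.get(d)
        key = ("code", code) if code is not None else ("doc", d)
        seeds.setdefault(key, []).append(d)
    clusters = [
        (key[1] if key[0] == "code" else None, members)
        for key, members in seeds.items()
    ]

    def vocab(members):
        return set().union(*(words[d] for d in members))

    while len(clusters) > n_clusters:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            (code_i, docs_i), (code_j, docs_j) = clusters[i], clusters[j]
            if code_i is not None and code_j is not None:
                continue  # cannot-link: two coded clusters stay separate
            score = jaccard(vocab(docs_i), vocab(docs_j))
            if best is None or score > best[0]:
                best = (score, i, j)
        if best is None:
            break  # only coded clusters remain; nothing may be merged
        _, i, j = best
        (code_i, docs_i), (code_j, docs_j) = clusters[i], clusters[j]
        merged = (code_i if code_i is not None else code_j, docs_i + docs_j)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


# Hypothetical corpus: the scholar has coded only two of five documents.
docs = {
    "d1": "patients describe long waiting times at the clinic",
    "d2": "nurses report staff shortages and burnout",
    "d3": "waiting times at the clinic frustrate patients",
    "d4": "burnout among nurses drives staff turnover",
    "d5": "the clinic waiting room is crowded",
}
codes = {"d1": "access", "d2": "workforce"}
result = constrained_clusters(docs, codes, n_clusters=2)
grouped = {code: sorted(members) for code, members in result}
```

In this sketch the uncoded documents are absorbed into whichever coded cluster they most resemble, which loosely mirrors how a partial coding schema could guide the grouping of documents the scholar has not yet read.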

research
01/10/2020

Inductive Document Network Embedding with Topic-Word Attention

Document network embedding aims at learning representations for a struct...
research
11/15/2019

Assigning Medical Codes at the Encounter Level by Paying Attention to Documents

The vast majority of research in computer assisted medical coding focuse...
research
01/06/2021

User Ex Machina : Simulation as a Design Probe in Human-in-the-Loop Text Analytics

Topic models are widely used analysis techniques for clustering document...
research
09/19/2023

Interactive Distillation of Large Single-Topic Corpora of Scientific Papers

Highly specific datasets of scientific literature are important for both...
research
01/20/2023

Transforming Unstructured Text into Data with Context Rule Assisted Machine Learning (CRAML)

We describe a method and new no-code software tools enabling domain expe...
research
02/21/2016

Interactive Storytelling over Document Collections

Storytelling algorithms aim to 'connect the dots' between disparate docu...
research
12/15/2020

Efficient Clustering from Distributions over Topics

There are many scenarios where we may want to find pairs of textually si...