Integrating topic modeling and word embedding to characterize violent deaths

06/28/2021
by Alina Arseniev-Koehler, et al.

There is an escalating need for methods to identify latent patterns in text data from many domains. We introduce a new method to identify topics in a corpus and represent documents as topic sequences. Discourse Atom Topic Modeling draws on advances in theoretical machine learning to integrate topic modeling and word embedding, capitalizing on the distinct capabilities of each. We first identify a set of vectors ("discourse atoms") that provide a sparse representation of an embedding space. Atom vectors can be interpreted as latent topics: Through a generative model, atoms map onto distributions over words; one can also infer the topic that generated a sequence of words. We illustrate our method with a prominent example of underutilized text: the U.S. National Violent Death Reporting System (NVDRS). The NVDRS summarizes violent death incidents with structured variables and unstructured narratives. We identify 225 latent topics in the narratives (e.g., preparation for death and physical aggression); many of these topics are not captured by existing structured variables. Motivated by known patterns in suicide and homicide by gender, and recent research on gender biases in semantic space, we identify the gender bias of our topics (e.g., a topic about pain medication is feminine). We then compare the gender bias of topics to their prevalence in narratives of female versus male victims. Results provide a detailed quantitative picture of reporting about lethal violence and its gendered nature. Our method offers a flexible and broadly applicable approach to model topics in text data.
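
The two operations the abstract describes for an atom vector, mapping it onto a distribution over words and scoring its gender bias, can be illustrated with a small sketch. This is not the authors' implementation. It assumes a log-linear generative model in which the probability of a word given an atom is proportional to the exponential of their inner product, and it assumes gender bias is measured as cosine similarity with a "she minus he" direction. The toy vectors and the names `softmax_over_vocab`, `cosine`, and `gender_direction` are hypothetical.

```python
import math

def softmax_over_vocab(atom, word_vectors):
    """Return p(word | atom), assumed proportional to exp(<atom, word vector>)."""
    scores = {w: sum(a * b for a, b in zip(atom, v)) for w, v in word_vectors.items()}
    top = max(scores.values())  # subtract the max score for numerical stability
    exps = {w: math.exp(s - top) for w, s in scores.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}

def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical 3-dimensional embedding; real embeddings have hundreds of dimensions.
word_vectors = {
    "pills": [0.9, 0.1, 0.3],
    "overdose": [0.8, 0.2, 0.2],
    "fight": [-0.7, 0.6, 0.1],
    "she": [0.5, -0.1, 0.8],
    "he": [0.4, -0.1, -0.8],
}

# Hypothetical atom vector standing in for one learned topic.
atom = [1.0, 0.0, 0.2]

# Assumed gender direction: the difference between two gendered word vectors.
gender_direction = [a - b for a, b in zip(word_vectors["she"], word_vectors["he"])]

word_probs = softmax_over_vocab(atom, word_vectors)
top_words = sorted(word_probs, key=word_probs.get, reverse=True)[:2]
gender_bias = cosine(atom, gender_direction)  # positive = leans feminine here

print(top_words, round(gender_bias, 3))
```

In this toy setup, the words with the highest probability under the atom serve as its topic label, and the sign of `gender_bias` indicates which pole of the assumed gender direction the atom leans toward.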


research
06/09/2016

Generative Topic Embedding: a Continuous Representation of Documents (Extended Version with Proofs)

Word embedding maps words into a low-dimensional continuous embedding sp...
research
03/03/2022

Representing Mixtures of Word Embeddings with Mixtures of Topic Embeddings

A topic model is often formulated as a generative model that explains ho...
research
11/24/2017

Semantic Map of Sexism: Topic Modelling of Everyday Sexism Project Entries

The Everyday Sexism Project documents everyday examples of sexism report...
research
11/24/2017

Continuous Semantic Topic Embedding Model Using Variational Autoencoder

This paper proposes the continuous semantic topic embedding model (CSTEM...
research
12/11/2019

Unwanted Advances in Higher Education: Uncovering Sexual Harassment Experiences in Academia with Text Mining

Sexual harassment in academia is often a hidden problem because victims ...
research
01/05/2017

Crime Topic Modeling

The classification of crime into discrete categories entails a massive l...
research
11/24/2020

Gender bias in magazines oriented to men and women: a computational approach

Cultural products are a source to acquire individual values and behaviou...
