WMDecompose: A Framework for Leveraging the Interpretable Properties of Word Mover's Distance in Sociocultural Analysis

10/14/2021
by Mikael Brunila, et al.

Despite the increasing popularity of NLP in the humanities and social sciences, advances in model performance and complexity have been accompanied by concerns about interpretability and explanatory power for sociocultural analysis. One popular model that balances complexity and legibility is Word Mover's Distance (WMD). Ostensibly adopted for its interpretability, WMD has nonetheless been used and further developed in ways that frequently discard its most interpretable aspect: namely, the word-level distances required for translating one set of words into another. To address this gap, we introduce WMDecompose: a model and Python library that 1) decomposes document-level distances into their constituent word-level distances, and 2) subsequently clusters words to induce thematic elements, such that useful lexical information is retained and summarized for analysis. To illustrate its potential in a social scientific context, we apply it to a longitudinal social media corpus to explore the interrelationship between conspiracy theories and conservative American discourses. Finally, because of the full WMD model's high time complexity, we additionally suggest a method of sampling document pairs from large datasets in a reproducible way, with tight bounds that prevent extrapolation of unreliable results due to poor sampling practices.
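
The idea of decomposing a document-level distance into word-level distances can be illustrated with a minimal sketch. This is not the WMDecompose library's API; the function name `decompose_wmd`, the toy two-dimensional embeddings, and the example documents are all invented for illustration. It also assumes two documents of equal length with uniform word weights, in which case an optimal transport plan can be found as a one-to-one matching, here by brute force over permutations. A realistic implementation would use pretrained embeddings and a proper optimal transport solver.

```python
from itertools import permutations
from math import dist

# Toy 2-D "embeddings", invented for illustration only.
EMBEDDINGS = {
    "president": (1.0, 0.0),
    "speaks": (0.0, 1.0),
    "media": (1.0, 1.0),
    "leader": (0.9, 0.1),
    "greets": (0.1, 0.9),
    "press": (1.0, 0.8),
}


def decompose_wmd(doc_a, doc_b, embeddings):
    """Return (total_distance, flows) for two equal-length word lists.

    Each word carries mass 1/n. With uniform weights and equal lengths,
    some optimal transport plan is a one-to-one matching, so we search
    all permutations (feasible only for very short documents).
    """
    if len(doc_a) != len(doc_b):
        raise ValueError("this sketch assumes documents of equal length")
    n = len(doc_a)

    # Pairwise Euclidean distances between word vectors.
    cost = [[dist(embeddings[a], embeddings[b]) for b in doc_b] for a in doc_a]

    # Cheapest one-to-one matching of words in doc_a to words in doc_b.
    best = min(
        permutations(range(n)),
        key=lambda p: sum(cost[i][p[i]] for i in range(n)),
    )

    # Word-level contributions: mass (1/n) times distance travelled.
    flows = [(doc_a[i], doc_b[j], cost[i][j] / n) for i, j in enumerate(best)]
    total = sum(contribution for _, _, contribution in flows)
    return total, flows


total, flows = decompose_wmd(
    ["president", "speaks", "media"],
    ["leader", "greets", "press"],
    EMBEDDINGS,
)
for source, target, contribution in flows:
    print(f"{source} -> {target}: {contribution:.4f}")
print(f"document-level distance: {total:.4f}")
```

The word-level contributions sum to the document-level distance, so the per-word terms show which word pairs account for most of the distance between the two documents.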


research
05/20/2017

Mixed Membership Word Embeddings for Computational Social Science

Word embeddings improve the performance of NLP systems by revealing the ...
research
12/01/2019

Speeding up Word Mover's Distance and its variants via properties of distances between embeddings

The Word Mover's Distance (WMD) proposed in Kusner et al. [ICML,2015] is...
research
04/05/2020

Domain-based Latent Personal Analysis and its use for impersonation detection in social media

Zipf's law defines an inverse proportion between a word's ranking in a g...
research
01/08/2021

Graph-of-Tweets: A Graph Merging Approach to Sub-event Identification

Graph structures are powerful tools for modeling the relationships betwe...
research
12/02/2019

Learning Word Ratings for Empathy and Distress from Document-Level User Responses

Despite the excellent performance of black box approaches to modeling se...
research
02/03/2023

Improving Interpretability via Explicit Word Interaction Graph Layer

Recent NLP literature has seen growing interest in improving model inter...
