Dialectograms: Machine Learning Differences between Discursive Communities

02/11/2023
by Thyge Enggaard, et al.

Word embeddings provide an unsupervised way to understand differences in word usage between discursive communities. A number of recent papers have focused on identifying words that are used differently by two or more communities. But word embeddings are complex, high-dimensional spaces, and a focus on identifying differences captures only a fraction of their richness. Here, we take a step towards leveraging the richness of the full embedding space by using word embeddings to map out how words are used differently. Specifically, we describe the construction of dialectograms, an unsupervised way to visually explore the characteristic ways in which each community uses a focal word. Based on these dialectograms, we provide a new measure of the degree to which words are used differently, one that overcomes the tendency of existing measures to pick out low-frequency or polysemous words. We apply our methods to explore the discourses of two US political subreddits and show how they identify stark affective polarisation around politicians and political entities, differences in the assessment of proper political action, and disagreement about whether certain issues require political intervention at all.
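To make concrete the kind of existing measure the abstract refers to, here is a minimal sketch of one common baseline for identifying words used differently: rotate one community's embedding space onto the other's with orthogonal Procrustes, then rank words by cosine distance. This is not the dialectogram construction or the new measure, neither of which is specified in the abstract. The vocabulary, the random toy vectors, and the function names `align_procrustes` and `cosine_distances` are all assumptions made for illustration.

```python
import numpy as np

def align_procrustes(source, target):
    """Rotate `source` onto `target` with the orthogonal matrix R
    that minimises ||source @ R - target|| (Frobenius norm)."""
    u, _, vt = np.linalg.svd(source.T @ target)
    return source @ (u @ vt)

def cosine_distances(a, b):
    """Row-wise cosine distance between two equally shaped matrices."""
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return 1.0 - np.sum(a_unit * b_unit, axis=1)

# Toy data (hypothetical): 200 shared words with 4-dimensional vectors.
rng = np.random.default_rng(0)
vocab = ["climate"] + [f"word{i}" for i in range(199)]
emb_a = rng.normal(size=(len(vocab), 4))

# Community B's space is a rotated copy of community A's space ...
rotation, _ = np.linalg.qr(rng.normal(size=(4, 4)))
emb_b = emb_a @ rotation

# ... except for one focal word, whose vector is flipped to
# simulate divergent usage between the two communities.
focal = vocab.index("climate")
emb_b[focal] = -emb_b[focal]

aligned_a = align_procrustes(emb_a, emb_b)
distances = cosine_distances(aligned_a, emb_b)

# Words ranked from most to least differently used.
ranking = sorted(zip(vocab, distances), key=lambda pair: -pair[1])
print(ranking[:3])
```

A single distance per word says that usage differs but not how, and the abstract notes that such measures tend to favour low-frequency or polysemous words; those are the gaps the dialectograms are meant to address.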

research
07/22/2019

Learning dynamic word embeddings with drift regularisation

Word usage, meaning and connotation change throughout time. Diachronic w...
research
09/05/2017

Using k-way Co-occurrences for Learning Word Embeddings

Co-occurrences between two words provide useful insights into the semant...
research
08/22/2019

ViCo: Word Embeddings from Visual Co-occurrences

We propose to learn word embeddings from visual co-occurrences. Two word...
research
10/15/2018

Poincaré GloVe: Hyperbolic Word Embeddings

Words are not created equal. In fact, they form an aristocratic graph wi...
research
08/28/2023

CommunityFish: A Poisson-based Document Scaling With Hierarchical Clustering

Document scaling has been a key component in text-as-data applications f...
research
10/21/2022

Discovering Differences in the Representation of People using Contextualized Semantic Axes

A common paradigm for identifying semantic differences across social and...
research
10/09/2022

Cross-strait Variations on Two Near-synonymous Loanwords xie2shang1 and tan2pan4: A Corpus-based Comparative Study

This study attempts to investigate cross-strait variations on two typica...
