Characterizing English Variation across Social Media Communities with BERT

02/12/2021
by Li Lucy, et al.

Much previous work characterizing language variation across Internet social groups has focused on the types of words used by these groups. We extend this type of study by employing BERT to characterize variation in the senses of words as well, analyzing two months of English comments in 474 Reddit communities. The specificity of different sense clusters to a community, combined with the specificity of a community's unique word types, is used to identify cases where a social group's language deviates from the norm. We validate our metrics using user-created glossaries and draw on sociolinguistic theories to connect language variation with trends in community behavior. We find that communities with highly distinctive language are medium-sized, and their loyal and highly engaged users interact in dense networks.
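
As a rough illustration of what "specificity of a word type to a community" could mean in practice, here is a minimal sketch. The abstract does not say which measure is used, so the choice of normalized pointwise mutual information (NPMI) is an assumption, and the function name `npmi_by_community` and the toy corpus are hypothetical. The sketch covers word types only; sense clusters derived from BERT embeddings could be scored the same way by counting cluster labels instead of tokens.

```python
import math
from collections import Counter


def npmi_by_community(corpus):
    """Score how specific each word is to each community.

    corpus: dict mapping community name -> list of tokens.
    Returns a dict mapping (community, word) -> NPMI in [-1, 1].
    NPMI is an assumed measure here, not one confirmed by the abstract.
    """
    per_comm = {c: Counter(tokens) for c, tokens in corpus.items()}

    # Word counts pooled over all communities.
    overall = Counter()
    for counts in per_comm.values():
        overall.update(counts)
    total = sum(overall.values())

    scores = {}
    for c, counts in per_comm.items():
        p_c = sum(counts.values()) / total  # share of all tokens from c
        for w, n in counts.items():
            p_wc = n / total                # joint probability of (w, c)
            p_w = overall[w] / total        # marginal probability of w
            if p_wc >= 1.0:
                # Degenerate corpus with one word and one community:
                # NPMI is undefined, so report no association.
                scores[(c, w)] = 0.0
                continue
            pmi = math.log(p_wc / (p_w * p_c))
            scores[(c, w)] = pmi / -math.log(p_wc)
    return scores


# Toy example: "x" appears only in community "a", "y" mostly in "b".
toy = {"a": ["x", "x", "y"], "b": ["y", "y", "z"]}
toy_scores = npmi_by_community(toy)
```

In the toy corpus, `toy_scores[("a", "x")]` is positive because "x" occurs only in community "a", while `toy_scores[("a", "y")]` is negative because "y" is more typical of "b".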

research
06/15/2018

Semantic Variation in Online Communities of Practice

We introduce a framework for quantifying semantic variation of common wo...
research
08/08/2022

Characterizing Social Movement Narratives in Online Communities: The 2021 Cuban Protests on Reddit

Social movements are dominated by storytelling, as narratives play a key...
research
06/30/2021

When the Echo Chamber Shatters: Examining the Use of Community-Specific Language Post-Subreddit Ban

Community-level bans are a common tool against groups that enable online...
research
12/04/2017

#anorexia, #anarexia, #anarexyia: Characterizing Online Community Practices with Orthographic Variation

Distinctive linguistic practices help communities build solidarity and d...
research
10/23/2020

HateBERT: Retraining BERT for Abusive Language Detection in English

In this paper, we introduce HateBERT, a re-trained BERT model for abusiv...
research
02/19/2019

Using Crowdsourcing to Identify a Proxy of Socio-Economic status

Social Media provides researchers with an unprecedented opportunity to g...
research
07/10/2019

Exploiting user-frequency information for mining regionalisms from Social Media texts

The task of detecting regionalisms (expressions or words used in certain...