Measuring Linguistic Diversity During COVID-19

04/03/2021
by Jonathan Dunn, et al.

Computational measures of linguistic diversity help us understand the linguistic landscape using digital language data. The contribution of this paper is to calibrate measures of linguistic diversity using restrictions on international travel resulting from the COVID-19 pandemic. Previous work has mapped the distribution of languages using geo-referenced social media and web data. The goal of that work, however, has been to describe the corpora themselves rather than to make inferences about the underlying populations. This paper shows that a difference-in-differences method based on the Herfindahl-Hirschman Index can identify the bias in digital corpora that is introduced by non-local populations. These methods tell us where significant changes have taken place and whether those changes increase or decrease diversity. This is an important step in aligning digital corpora like social media with the real-world populations that have produced them.
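As a rough illustration of the two ingredients named in the abstract, the sketch below computes a Herfindahl-Hirschman Index (the sum of squared shares) over per-language counts and then a basic difference-in-differences estimate. The function names, the split into "treated" and "control" periods, and all counts are hypothetical and chosen only for illustration; the abstract does not specify the paper's exact design, so this should not be read as the authors' implementation.

```python
def hhi(language_counts):
    """Herfindahl-Hirschman Index: the sum of squared language shares.

    Ranges from 1/k (k equally used languages) to 1.0 (a single language),
    so a LOWER value means MORE linguistic diversity.
    """
    total = sum(language_counts.values())
    if total <= 0:
        raise ValueError("language_counts must contain at least one observation")
    return sum((n / total) ** 2 for n in language_counts.values())


def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Basic difference-in-differences on a scalar outcome.

    Change in the treated comparison minus change in the control comparison.
    """
    return (treated_post - treated_pre) - (control_post - control_pre)


# Hypothetical counts of posts per language for one place (not real data).
# "Treated": the same months before and during travel restrictions.
# "Control": the same months across two earlier years (assumed baseline drift).
before_restrictions = {"en": 700, "es": 200, "de": 100}
during_restrictions = {"en": 850, "es": 120, "de": 30}
baseline_year_1 = {"en": 690, "es": 210, "de": 100}
baseline_year_2 = {"en": 700, "es": 200, "de": 100}

effect = diff_in_diff(
    hhi(before_restrictions),
    hhi(during_restrictions),
    hhi(baseline_year_1),
    hhi(baseline_year_2),
)
print(f"Change in HHI beyond baseline drift: {effect:+.4f}")
```

Under these assumptions, a positive estimate means that language use became more concentrated (less diverse) than baseline drift alone would predict, and a negative estimate means it became more diverse.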

research
08/21/2023

Comparing Measures of Linguistic Diversity Across Social Media Language Data and Census Data at Subnational Geographic Areas

This paper describes a preliminary study on the comparative linguistic e...
research
04/02/2020

Mapping Languages and Demographics with Georeferenced Corpora

This paper evaluates large georeferenced corpora, taken from both web-cr...
research
06/09/2022

Corpus Similarity Measures Remain Robust Across Diverse Languages

This paper experiments with frequency-based corpus similarity measures a...
research
04/03/2021

Representations of Language Varieties Are Reliable Given Corpus Similarity Measures

This paper measures similarity both within and between 84 language varie...
research
06/01/2020

Independent Component Analysis for Trustworthy Cyberspace during High Impact Events: An Application to Covid-19

Social media has become an important communication channel during high i...
research
07/25/2023

Diversity and Language Technology: How Techno-Linguistic Bias Can Cause Epistemic Injustice

It is well known that AI-based language technology – large language mode...
research
09/20/2022

Register Variation Remains Stable Across 60 Languages

This paper measures the stability of cross-linguistic register variation...