Crowdsourcing Dialect Characterization through Twitter

07/26/2014
by   Bruno Gonçalves, et al.

We perform a large-scale analysis of diatopic language variation using geotagged microblogging datasets. By collecting all Twitter messages written in Spanish over more than two years, we build a corpus from which a carefully selected list of concepts allows us to characterize Spanish varieties on a global scale. A cluster analysis proves the existence of well-defined macroregions sharing common lexical properties. Remarkably, we find that the Spanish language is split into two superdialects: an urban speech used across major American and Spanish cities, and a diverse form that encompasses rural areas and small towns. The latter can be further clustered into smaller varieties with a stronger regional character.

11/16/2015

Learning about Spanish dialects through Twitter

This paper maps the large-scale variation of the Spanish language by emp...

02/22/2017

Dialectometric analysis of language variation in Twitter

In the last few years, microblogging platforms such as Twitter have give...

10/12/2021

A large scale lexical and semantic analysis of Spanish language variations in Twitter

Dialectometry is a discipline devoted to studying the variations of a la...

06/29/2020

Is Japanese gendered language used on Twitter ? A large scale study

This study analyzes the usage of Japanese gendered language on Twitter. ...

05/15/2020

Analyzing Temporal Relationships between Trending Terms on Twitter and Urban Dictionary Activity

As an online, crowd-sourced, open English-language slang dictionary, the...

12/19/2019

Developing a Multi-Platform Speech Recording System Toward Open Service of Building Large-Scale Speech Corpora

This paper briefly reports our ongoing attempt at the development of a m...

08/01/2021

Geolocation differences of language use in urban areas

The explosion in the availability of natural language data in the era of...