Global Syntactic Variation in Seven Languages: Towards a Computational Dialectology

04/03/2021
by Jonathan Dunn, et al.

The goal of this paper is to provide a complete representation of regional linguistic variation on a global scale. To this end, the paper focuses on removing three constraints that have previously limited work within dialectology/dialectometry. First, rather than assuming a fixed and incomplete set of variants, we use Computational Construction Grammar to provide a replicable and falsifiable set of syntactic features. Second, rather than assuming a specific area of interest, we use global language mapping based on web-crawled and social media datasets to determine the selection of national varieties. Third, rather than looking at a single language in isolation, we model seven major languages together using the same methods: Arabic, English, French, German, Portuguese, Russian, and Spanish. Results show that models for each language are able to robustly predict the region-of-origin of held-out samples better using Construction Grammars than using simpler syntactic features. These global-scale experiments are used to argue that new methods in computational sociolinguistics are able to provide more generalized models of regional variation that are essential for understanding language variation and change at scale.
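As a rough illustration of what "predicting the region-of-origin of held-out samples" involves, the sketch below trains a classifier on per-sample feature vectors and scores it on samples it has not seen. It is a minimal sketch under stated assumptions, not the paper's method: the nearest-centroid classifier is a stand-in chosen for brevity, the region labels `US` and `NZ` are hypothetical, and the three-dimensional vectors are invented values standing in for construction frequencies.

```python
from collections import defaultdict
from math import sqrt


def centroid(vectors):
    """Return the element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def train(samples):
    """Build one centroid per region from (region, feature_vector) pairs."""
    by_region = defaultdict(list)
    for region, vec in samples:
        by_region[region].append(vec)
    return {region: centroid(vecs) for region, vecs in by_region.items()}


def predict(model, vec):
    """Return the region whose centroid is closest (Euclidean) to vec."""
    def dist(a, b):
        return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(model, key=lambda region: dist(model[region], vec))


def accuracy(model, held_out):
    """Return the fraction of held-out samples assigned to their true region."""
    correct = sum(predict(model, vec) == region for region, vec in held_out)
    return correct / len(held_out)


# Invented toy data: each vector stands in for the relative frequencies
# of three syntactic features in one sample (assumption, not paper data).
train_samples = [
    ("US", [0.9, 0.1, 0.3]),
    ("US", [0.8, 0.2, 0.4]),
    ("NZ", [0.2, 0.9, 0.5]),
    ("NZ", [0.1, 0.8, 0.6]),
]
held_out = [
    ("US", [0.85, 0.15, 0.35]),
    ("NZ", [0.15, 0.85, 0.55]),
]

model = train(train_samples)
print(accuracy(model, held_out))
```

Comparing two feature sets in this setup would mean running the same train-and-score loop once per feature representation and comparing the held-out accuracies.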

research
04/11/2019

Modeling Global Syntactic Variation in English Using Dialect Classification

This paper evaluates global-scale dialect identification for 14 national...
research
04/03/2021

Finding Variants for Construction-Based Dialectometry: A Corpus-Based Approach to Regional CxGs

This paper develops a construction-based dialectometry capable of identi...
research
10/21/2022

Graphemic Normalization of the Perso-Arabic Script

Since its original appearance in 1991, the Perso-Arabic script represent...
research
01/11/2016

The Effects of Age, Gender and Region on Non-standard Linguistic Variation in Online Social Networks

We present a corpus-based analysis of the effects of age, gender and reg...
research
01/08/2019

Computational Register Analysis and Synthesis

The study of register in computational language research has historicall...
research
11/16/2020

A Probabilistic Approach in Historical Linguistics Word Order Change in Infinitival Clauses: from Latin to Old French

This research offers a new interdisciplinary approach to the field of Li...
research
02/17/2023

False perspectives on human language: why statistics needs linguistics

A sharp tension exists about the nature of human language between two op...
