Finding Variants for Construction-Based Dialectometry: A Corpus-Based Approach to Regional CxGs

04/03/2021
by Jonathan Dunn, et al.

This paper develops a construction-based dialectometry capable of identifying previously unknown constructions and measuring the degree to which a given construction is subject to regional variation. The central idea is to learn a grammar of constructions (a CxG) using construction grammar induction and then to use these constructions as features for dialectometry. This offers a method for measuring the aggregate similarity between regional CxGs without limiting in advance the set of constructions subject to variation. The learned CxG is evaluated on how well it describes held-out test corpora, while dialectometry is evaluated on how well it can model regional varieties of English. The method is tested using two distinct datasets: first, the International Corpus of English, representing eight outer circle varieties; second, a web-crawled corpus representing five inner circle varieties. Results show that the method (1) produces a grammar with stable quality across sub-sets of a single corpus that is (2) capable of distinguishing between regional varieties of English with a high degree of accuracy, thus (3) supporting dialectometric methods for measuring the similarity between varieties of English and (4) measuring the degree to which each construction is subject to regional variation. This is important for cognitive sociolinguistics because it operationalizes the idea that competition between constructions is organized at the functional level, so that dialectometry needs to represent as much of the available functional space as possible.
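
As a rough illustration of using constructions as features for dialectometry, the sketch below represents each regional variety as a vector of relative construction frequencies. It then computes an aggregate similarity between varieties and a per-construction variation score. Everything here is an assumption made for illustration: the variety names, construction identifiers, and counts are invented, and cosine similarity and the standard deviation of relative frequencies are stand-ins chosen for simplicity, not the measures used in the paper.

```python
import math
from collections import Counter

# Hypothetical toy data: variety names, construction IDs, and counts are
# invented for illustration and do not come from the paper.
counts = {
    "variety_A": Counter({"cx_ditransitive": 30, "cx_passive": 10, "cx_way": 5}),
    "variety_B": Counter({"cx_ditransitive": 28, "cx_passive": 12, "cx_way": 5}),
    "variety_C": Counter({"cx_ditransitive": 5, "cx_passive": 30, "cx_way": 10}),
}

# Shared feature space: every construction observed in any variety.
vocabulary = sorted({cx for counter in counts.values() for cx in counter})


def relative_frequencies(counter, vocabulary):
    """Turn raw construction counts into a relative-frequency vector."""
    total = sum(counter.values())
    return [counter[cx] / total for cx in vocabulary]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


vectors = {
    name: relative_frequencies(counter, vocabulary)
    for name, counter in counts.items()
}


def variation(construction):
    """Population standard deviation of one construction's relative
    frequency across varieties (an assumed proxy for regional variation)."""
    i = vocabulary.index(construction)
    values = [vec[i] for vec in vectors.values()]
    mean = sum(values) / len(values)
    return math.sqrt(sum((x - mean) ** 2 for x in values) / len(values))


# Aggregate similarity between each pair of varieties.
names = sorted(vectors)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(a, b, round(cosine(vectors[a], vectors[b]), 3))

# Constructions ranked from most to least regionally variable.
for cx in sorted(vocabulary, key=variation, reverse=True):
    print(cx, round(variation(cx), 3))
```

In this toy data, variety_A and variety_B come out as more similar to each other than either is to variety_C, and the ditransitive construction varies more across varieties than the way-construction. Because the feature space is simply every construction in the learned grammar, no construction is excluded from the comparison in advance.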

04/03/2021

Global Syntactic Variation in Seven Languages: Towards a Computational Dialectology

The goal of this paper is to provide a complete representation of region...
10/12/2021

Learned Construction Grammars Converge Across Registers Given Increased Exposure

This paper measures the impact of increased exposure on whether learned ...
04/11/2019

Modeling the Complexity and Descriptive Adequacy of Construction Grammars

This paper uses the Minimum Description Length paradigm to model the com...
09/06/2018

Evaluating Syntactic Properties of Seq2seq Output with a Broad Coverage HPSG: A Case Study on Machine Translation

Sequence to sequence (seq2seq) models are often employed in settings whe...
04/11/2019

Modeling Global Syntactic Variation in English Using Dialect Classification

This paper evaluates global-scale dialect identification for 14 national...
11/16/2015

Learning about Spanish dialects through Twitter

This paper maps the large-scale variation of the Spanish language by emp...
04/19/2021

Production vs Perception: The Role of Individuality in Usage-Based Grammar Induction

This paper asks whether a distinction between production-based and perce...