A Kernel Independence Test for Geographical Language Variation

01/25/2016
by Dong Nguyen, et al.

Quantifying the degree of spatial dependence for linguistic variables is a key task in the analysis of dialectal variation. However, existing approaches have two important drawbacks. First, they are based on parametric models of dependence, which limits their power when the underlying parametric assumptions are violated. Second, they are not applicable to all types of linguistic data: some approaches apply only to frequencies, others only to boolean indicators of whether a linguistic variable is present. We present a new method for measuring geographical language variation that addresses both problems. Our approach builds on reproducing kernel Hilbert space (RKHS) representations for nonparametric statistics, and takes the form of a test statistic that is computed from pairs of individual geotagged observations, without aggregation into predefined geographical bins. We compare this test with prior work using synthetic data as well as a diverse set of real datasets: a corpus of Dutch tweets, a Dutch syntactic atlas, and a dataset of letters to the editor in North American newspapers. The proposed test is shown to support robust inferences across a broad range of scenarios and types of data.
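To make the idea of a pairwise, bin-free kernel test concrete, the following is a minimal sketch of one widely used RKHS-based independence statistic, the Hilbert-Schmidt Independence Criterion (HSIC), with a permutation test for significance. The abstract does not name the statistic or its settings, so the choice of HSIC, the Gaussian kernels, the bandwidth arguments, and all function names here are assumptions for illustration, not the authors' implementation.

```python
import math
import random


def gaussian_gram(points, bandwidth):
    """Gram matrix with entries exp(-||p_i - p_j||^2 / (2 * bandwidth^2))."""
    n = len(points)
    gram = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            gram[i][j] = gram[j][i] = math.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return gram


def double_center(gram):
    """Return H K H with H = I - (1/n) 1 1^T, for a symmetric Gram matrix."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    grand_mean = sum(row_means) / n
    return [[gram[i][j] - row_means[i] - row_means[j] + grand_mean
             for j in range(n)] for i in range(n)]


def hsic_statistic(k_centered, l_gram, order):
    """Biased HSIC estimate (1/n^2) * trace(K H L H), with L reindexed by `order`."""
    n = len(order)
    total = 0.0
    for i in range(n):
        k_row = k_centered[i]
        l_row = l_gram[order[i]]
        for j in range(n):
            total += k_row[j] * l_row[order[j]]
    return total / (n * n)


def hsic_permutation_test(locations, values, loc_bandwidth, val_bandwidth,
                          num_permutations=200, seed=0):
    """Return (observed HSIC, permutation p-value).

    `locations` and `values` are equal-length lists of numeric tuples, one per
    geotagged observation. The bandwidths are assumed to be chosen by the caller.
    """
    rng = random.Random(seed)
    k_centered = double_center(gaussian_gram(locations, loc_bandwidth))
    l_gram = gaussian_gram(values, val_bandwidth)
    order = list(range(len(locations)))
    observed = hsic_statistic(k_centered, l_gram, order)
    exceed = 0
    for _ in range(num_permutations):
        # Shuffling which value goes with which location simulates independence.
        rng.shuffle(order)
        if hsic_statistic(k_centered, l_gram, order) >= observed:
            exceed += 1
    p_value = (1 + exceed) / (1 + num_permutations)
    return observed, p_value
```

In this sketch each observation is passed as a tuple, so a frequency or a boolean indicator can both be encoded as a one-element tuple, for example (0.37,) or (1.0,), alongside a (longitude, latitude) pair. Only the kernel on the linguistic variable would need to change for other data types. No geographical binning is involved because both Gram matrices are built directly from pairs of individual observations.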
