Scalable GWR: A linear-time algorithm for large-scale geographically weighted regression with polynomial kernels

05/01/2019
by   Daisuke Murakami, et al.

While a number of studies have developed fast geographically weighted regression (GWR) algorithms for large samples, none of them achieves the linear-time estimation that is considered requisite for big data analysis in machine learning, geostatistics, and related domains. Against this backdrop, this study proposes a scalable GWR (ScaGWR) for large datasets. The key development is the calibration of the model through a pre-compression of the matrices and vectors whose size depends on the sample size, prior to the execution of leave-one-out cross-validation (LOOCV), which is the heaviest computational step in conventional GWR. With this pre-compression, the computation time of the proposed GWR extension increases linearly with sample size, whereas conventional GWR algorithms take at most quad-quadratic-order time. As a result, the ScaGWR can be calibrated with more than one million samples without parallelization. Moreover, the ScaGWR estimator can be regarded as an empirical Bayesian estimator that is more stable than the conventional GWR estimator. This study compares the ScaGWR with the conventional GWR in terms of estimation accuracy, predictive accuracy, and computational efficiency using a Monte Carlo simulation, and then applies both methods to a residential land analysis in the Tokyo Metropolitan Area. The code for ScaGWR is available in the R package scgwr, and will be incorporated into another R package, GWmodel.
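
The pre-compression idea can be illustrated with a minimal Python sketch. It is not the authors' implementation (that is the R package scgwr), and it rests on several assumptions not stated in the abstract: the local weights are taken to be a polynomial `alpha + sum_p b**p * K**p` of a fixed Gaussian base kernel `K`, the function names and the parameters `alpha`, `b`, and `degree` are hypothetical, and the leave-one-out exclusion is omitted. For brevity the base kernel here is dense, so this sketch's own pre-compression step is not linear-time. The point it shows is that once the per-site moment matrices are stored for each kernel power, trying new values of `alpha` and `b` needs only small K-by-K operations per site.

```python
import numpy as np


def base_kernel(coords, bandwidth):
    """Dense Gaussian base kernel with a fixed bandwidth (assumed form)."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return np.exp(-((d / bandwidth) ** 2))


def precompress(X, y, K, degree):
    """Store the moments X'K^p X and X'K^p y per site for p = 1..degree."""
    n, k = X.shape
    A0 = X.T @ X                      # global moments (constant weight term)
    c0 = X.T @ y
    A = np.empty((degree, n, k, k))
    c = np.empty((degree, n, k))
    Kp = np.ones_like(K)
    for p in range(degree):
        Kp = Kp * K                   # elementwise power K ** (p + 1)
        A[p] = np.einsum("ij,ja,jb->iab", Kp, X, X)
        c[p] = Kp @ (X * y[:, None])
    return A0, c0, A, c


def local_coefficients(A0, c0, A, c, alpha, b):
    """Combine the stored moments for given (alpha, b) and solve per site."""
    powers = b ** np.arange(1, A.shape[0] + 1)
    M = alpha * A0 + np.einsum("p,piab->iab", powers, A)
    m = alpha * c0 + np.einsum("p,pia->ia", powers, c)
    return np.linalg.solve(M, m[..., None])[..., 0]


def direct_coefficients(X, y, K, alpha, b, degree):
    """Reference: weighted least squares at each site, without compression."""
    W = alpha + sum(b ** p * K ** p for p in range(1, degree + 1))
    return np.stack(
        [np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y)) for w in W]
    )
```

In a calibration loop, `precompress` would be called once, and `local_coefficients` would then be evaluated repeatedly for candidate `(alpha, b)` pairs without revisiting the full sample.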


