Visualization of Big Spatial Data using Coresets for Kernel Density Estimates

09/13/2017
by Yan Zheng, et al.

The size of large, geo-located datasets has reached scales where visualizing every data point is inefficient. Random sampling reduces the size of a dataset, yet it can introduce unwanted error. We describe a subsampling method for spatial data suited to creating kernel density estimates from very large data and demonstrate that it results in less error than random sampling. We also introduce a method that ensures thresholding of low values based on sampled data does not omit any region above the desired threshold. We demonstrate the effectiveness of our approach on both artificial and real-world large geospatial datasets.
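
Below is a minimal illustrative sketch in Python of the comparison the abstract describes: a Gaussian kernel density estimate built from a uniform random subsample is evaluated against one built from the full point set, and the worst-case (L-infinity) error over a grid of query locations is reported. This is the random-sampling baseline, not the authors' coreset construction; the synthetic dataset, bandwidth, sample size, and grid are all placeholder assumptions.

    import numpy as np

    def kde(points, queries, bandwidth):
        # Gaussian KDE: average kernel value of each query location
        # against all data points.
        d2 = ((queries[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-d2 / (2 * bandwidth ** 2)).mean(axis=1)

    rng = np.random.default_rng(0)
    full = rng.normal(size=(10_000, 2))      # stand-in "big" spatial dataset
    idx = rng.choice(len(full), size=500, replace=False)
    sample = full[idx]                       # plain uniform random subsample

    # Evaluate both estimates on a coarse grid of query locations.
    xs, ys = np.meshgrid(np.linspace(-3, 3, 30), np.linspace(-3, 3, 30))
    grid = np.column_stack([xs.ravel(), ys.ravel()])

    err = np.abs(kde(full, grid, 0.5) - kde(sample, grid, 0.5)).max()
    print(f"L-infinity error, random sample vs. full KDE: {err:.4f}")

A coreset for kernel density estimates is a weighted subsample chosen so that this L-infinity error is provably bounded; the paper's claim is that such a subsample achieves lower error than the uniform random sample used above at the same size.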

Related research

09/09/2023
Correcting sampling biases via importance reweighting for spatial modeling
In machine learning models, the estimation of errors is often complex du...

01/29/2023
Neural Relation Graph for Identifying Problematic Data
Diagnosing and cleaning datasets are crucial for building robust machine...

05/09/2017
Spatial Random Sampling: A Structure-Preserving Data Sketching Tool
Random column sampling is not guaranteed to yield data sketches that pre...

04/16/2017
Random Walk Sampling for Big Data over Networks
It has been shown recently that graph signals with small total variation...

12/24/2018
bigMap: Big Data Mapping with Parallelized t-SNE
We introduce an improved unsupervised clustering protocol specially suit...

05/08/2018
Parallel Computation of PDFs on Big Spatial Data Using Spark
We consider big spatial data, which is typically produced in scientific ...

05/11/2021
BikNN: Anomaly Estimation in Bilateral Domains with k-Nearest Neighbors
In this paper, a novel framework for anomaly estimation is proposed. The...
