Spatial Random Sampling: A Structure-Preserving Data Sketching Tool

05/09/2017
by   Mostafa Rahmani, et al.

Random column sampling is not guaranteed to yield data sketches that preserve the underlying structure of the data, and it may not sample sufficiently from less-populated data clusters. Adaptive sampling can often provide accurate low-rank approximations, yet it may fall short of producing descriptive data sketches, especially when the cluster centers are linearly dependent. Motivated by these limitations, this paper introduces a novel randomized column sampling tool dubbed Spatial Random Sampling (SRS), in which data points are sampled based on their proximity to randomly sampled points on the unit sphere. The most compelling feature of SRS is that the probability of sampling from a given data cluster is proportional to the surface area the cluster occupies on the unit sphere, independent of the size of the cluster population. Although it is fully randomized, SRS is shown to provide descriptive and balanced data representations. The proposed idea addresses a pressing need in data science and holds potential to inspire many novel approaches for the analysis of big data.
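The sampling rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `normalize` and `spatial_random_sample` are hypothetical, and the sketch assumes that random points on the unit sphere are obtained from isotropic Gaussian vectors, that "proximity" means the largest inner product with the unit-normalized data point (equivalently, the smallest angle), and that repeated selections are kept as they occur.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit Euclidean norm, i.e. project it onto the unit sphere."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def spatial_random_sample(points, num_draws, seed=None):
    """Return indices of data points chosen by proximity to random directions.

    Assumption (not taken from the paper's text): each random direction is an
    isotropic Gaussian vector, and the selected point is the one whose
    unit-normalized version has the largest inner product with that direction.
    """
    rng = random.Random(seed)
    unit_points = [normalize(p) for p in points]
    dim = len(unit_points[0])
    selected = []
    for _ in range(num_draws):
        # Only the direction of the Gaussian vector matters for the argmax,
        # so it does not need to be normalized explicitly.
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        best = max(
            range(len(unit_points)),
            key=lambda i: sum(a * b for a, b in zip(unit_points[i], direction)),
        )
        selected.append(best)
    return selected


if __name__ == "__main__":
    # Toy data: a large cluster near (1, 0) and a small one near (0, 1).
    large = [[1.0, 0.001 * (i % 10)] for i in range(500)]
    small = [[0.001 * (i % 5), 1.0] for i in range(5)]
    indices = spatial_random_sample(large + small, num_draws=200, seed=0)
    from_small = sum(1 for i in indices if i >= len(large))
    print("draws from the small cluster:", from_small, "of", len(indices))
```

In this toy setup the two clusters attract roughly equal ranges of directions on the circle, so the draws should split roughly evenly even though one cluster holds 100 times more points than the other. Uniform random column sampling would instead pick from the small cluster only about 1% of the time.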
