Adaptive Sampling for Rapidly Matching Histograms

08/20/2017
by Stephen Macke, et al.

In exploratory data analysis, analysts often need to identify histograms that possess a specific distribution among a large class of candidate histograms, e.g., to find histograms of countries whose income distribution is most similar to that of Greece. This distribution could be a new one that the user is curious about, or a known distribution from an existing histogram visualization. At present, this identification process is brute-force, requiring the manual generation and evaluation of a large number of histograms. We present FastMatch: an end-to-end architecture for interactively retrieving, from a large collection of histograms, the histogram visualizations that are most similar to a user-specified target. The primary technical contribution underlying FastMatch is HistSim, a sublinear, theoretically sound, sampling-based algorithm for identifying the top-k closest histograms under ℓ_1 distance. While HistSim can be used independently, within FastMatch we couple it with a novel system architecture that is aware of practical considerations, employing block-based sampling policies and asynchronous statistics and computation, and building on lightweight sampling engines developed in recent work. In our experiments on several real-world datasets, FastMatch obtains near-perfect accuracy with up to 100× speedups over less sophisticated approaches.
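
To make the matching problem concrete, the following is a minimal sketch of top-k histogram matching under ℓ_1 distance, with an optional uniform row sample standing in for the sampling step. It is not HistSim itself: the abstract does not give the algorithm's details, so the function names (`normalize`, `l1_distance`, `top_k_closest`), the `(candidate, bin)` row layout, and the fixed `sample_size` are all assumptions made for illustration, and this naive version carries none of HistSim's theoretical guarantees.

```python
import random
from collections import Counter


def normalize(counts, bins):
    """Turn raw per-bin counts into a probability vector over `bins`."""
    total = sum(counts.get(b, 0) for b in bins)
    if total == 0:
        return [0.0 for _ in bins]
    return [counts.get(b, 0) / total for b in bins]


def l1_distance(p, q):
    """l1 distance between two equal-length probability vectors."""
    return sum(abs(a - b) for a, b in zip(p, q))


def top_k_closest(rows, target, bins, k, sample_size=None, seed=0):
    """Return the k candidates whose (estimated) histograms are closest to `target`.

    rows        -- list of (candidate, bin) tuples (hypothetical layout)
    target      -- normalized target histogram, aligned with `bins`
    sample_size -- if given, estimate histograms from a uniform sample of rows
                   (a naive stand-in for adaptive sampling, assumed here)
    """
    if sample_size is not None and sample_size < len(rows):
        rows = random.Random(seed).sample(rows, sample_size)

    # Build one count histogram per candidate from the (sampled) rows.
    counts = {}
    for candidate, b in rows:
        counts.setdefault(candidate, Counter())[b] += 1

    # Rank candidates by distance of their normalized histogram to the target.
    dists = {c: l1_distance(normalize(cnt, bins), target) for c, cnt in counts.items()}
    return sorted(dists, key=dists.get)[:k]
```

For example, with rows drawn from a table of (country, income bracket) pairs and a target built from Greece's normalized income histogram, `top_k_closest(rows, target, bins, k=5, sample_size=10_000)` would return the five countries whose sampled histograms are nearest to the target. A candidate that receives no sampled rows is simply omitted, which is one of the gaps a principled method would need to handle.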

research
07/01/2022

Scalable MCMC Sampling for Nonsymmetric Determinantal Point Processes

A determinantal point process (DPP) is an elegant model that assigns a p...
research
09/23/2019

Perfect Sampling of graph k-colorings for k> 3Δ

We give an algorithm for perfect sampling from the uniform distribution ...
research
06/10/2019

Big Variates: Visualizing and identifying key variables in a multivariate world

Big Data involves both a large number of events but also many variables....
research
08/30/2019

Composite likelihood methods for histogram-valued random variables

Symbolic data analysis has been proposed as a technique for summarising ...
research
02/23/2023

Adaptive Sampling for Probabilistic Forecasting under Distribution Shift

The world is not static: This causes real-world time series to change ov...
research
11/29/2019

Location histogram privacy by sensitive location hiding and target histogram avoidance/resemblance (extended version)

A location histogram is comprised of the number of times a user has visi...
research
09/04/2018

Automated bird sound recognition in realistic settings

We evaluated the effectiveness of an automated bird sound identification...
