A Simple Data Structure for Maintaining a Discrete Probability Distribution

02/11/2023
by Daniel Allendorf, et al.

We revisit the following problem: given a set of indices S = {1, …, n} and weights w_1, …, w_n ∈ ℝ_{>0}, provide samples from S with distribution p(i) = w_i / W, where W = ∑_j w_j is the normalizing constant. In the static setting, there is a simple data structure due to Walker, called the Alias Table, that allows samples to be drawn in constant time. A more challenging task is to maintain the distribution in a dynamic setting, where elements may be added or removed, or weights may change over time; here, existing solutions restrict the permissible weights, require rebuilding the associated data structure after a number of updates, or are rather complex. In this paper, we describe, analyze, and engineer a simple data structure for maintaining a discrete probability distribution in the dynamic setting. Construction of the data structure for an arbitrary distribution takes time O(n), sampling takes expected time O(1), and updates of size Δ = O(W / n) can be processed in time O(1). To evaluate the efficiency of the data structure, we conduct an experimental study. The results suggest that the dynamic sampling performance is comparable to that of the static Alias Table, with a minor slowdown.
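
For orientation, the static baseline mentioned above can be sketched in a few lines of Python. This is a minimal illustration of a textbook Alias Table (built with the commonly used small/large worklist construction), not the dynamic data structure the paper proposes. The function names `build_alias_table` and `sample` are hypothetical, and indices are 0-based here rather than 1, …, n.

```python
import random


def build_alias_table(weights):
    """Build a static alias table in O(n) time (illustrative sketch)."""
    n = len(weights)
    total = sum(weights)  # W in the abstract's notation
    # Rescale so the average weight is 1; each of the n buckets then has capacity 1.
    scaled = [w * n / total for w in weights]
    prob = [0.0] * n          # probability of keeping the bucket's own index
    alias = list(range(n))    # fallback index stored in each bucket
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s = small.pop()
        l = large.pop()
        prob[s] = scaled[s]
        alias[s] = l
        # The large item donates the mass needed to fill bucket s.
        scaled[l] -= 1.0 - scaled[s]
        (small if scaled[l] < 1.0 else large).append(l)
    # Leftovers are full buckets up to floating-point rounding.
    for i in small + large:
        prob[i] = 1.0
    return prob, alias


def sample(prob, alias, rng=random):
    """Draw one index in O(1): pick a uniform bucket, then flip a biased coin."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]


prob, alias = build_alias_table([1.0, 2.0, 3.0, 4.0])
draw = sample(prob, alias, random.Random(0))  # some index in {0, 1, 2, 3}
```

Each bucket holds at most two indices, which is why a sample costs one uniform bucket choice plus one coin flip. Changing a single weight generally invalidates many buckets in this static layout, which is the limitation the dynamic setting has to address.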

