Optimal Dynamic Subset Sampling: Theory and Applications

05/30/2023
by   Lu Yi, et al.

We study the fundamental problem of sampling independent events, called subset sampling. Specifically, consider a set of n events S = {x_1, …, x_n}, where each event x_i has an associated probability p(x_i). The subset sampling problem aims to sample a subset T ⊆ S such that every x_i is independently included in T with probability p(x_i). A naive solution is to flip a coin for each event, which takes O(n) time. The goal, however, is to develop data structures that allow drawing a sample in time proportional to the expected output size μ = ∑_{i=1}^{n} p(x_i), which can be significantly smaller than n in many applications. The subset sampling problem serves as an important building block in many tasks and has been the subject of extensive research for more than a decade. However, most existing subset sampling approaches assume a static setting, in which neither the events in S nor their associated probabilities can change over time. In a dynamic setting, these algorithms incur either large query time or large update time, even though time-evolving events with changing probabilities are ubiquitous in real life. Designing efficient dynamic subset sampling algorithms is therefore a pressing need and still an open problem. In this paper, we propose ODSS, the first optimal dynamic subset sampling algorithm. The expected query time and update time of ODSS are both optimal, matching the lower bounds of the subset sampling problem. We present a nontrivial theoretical analysis to demonstrate the superiority of ODSS. We also conduct comprehensive experiments to empirically evaluate the performance of ODSS. Moreover, we apply ODSS to a concrete application: influence maximization. We empirically show that ODSS can improve the complexities of existing influence maximization algorithms on large real-world evolving social networks.
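To make the problem statement concrete, here is a minimal Python sketch. It is not ODSS and is not taken from the paper. The function names are hypothetical. The first function is the naive O(n) coin-flip baseline described above. The second illustrates what "time proportional to the expected output size" can look like in the simplified special case where every event has the same probability p, using the standard geometric-skip technique. That equal-probability restriction is an assumption of this sketch only.

```python
import math
import random


def naive_subset_sample(probs, rng=random):
    """Flip one independent coin per event.

    Runs in O(n) time no matter how few events end up selected.
    Returns the indices i of the sampled events x_i.
    """
    return [i for i, p in enumerate(probs) if rng.random() < p]


def geometric_skip_sample(n, p, rng=random):
    """Subset sampling when all n events share the same probability p.

    Assumption of this sketch: equal probabilities. Instead of flipping
    n coins, jump straight from one selected index to the next; the gap
    between consecutive selections is geometrically distributed.
    Expected running time is O(1 + n * p), i.e. O(1 + mu).
    """
    if p <= 0.0:
        return []
    if p >= 1.0:
        return list(range(n))
    log_q = math.log1p(-p)  # log(1 - p), strictly negative here
    out = []
    i = -1  # index of the most recently selected event
    while True:
        u = 1.0 - rng.random()  # uniform in (0, 1], so log(u) is defined
        gap = math.log(u) / log_q  # floor(gap) = failures before next success
        if gap >= n - i - 1:  # next selection would land past the last index
            return out
        i += int(gap) + 1
        out.append(i)
```

For example, `geometric_skip_sample(1_000_000, 1e-5)` does work proportional to the roughly ten indices it returns, whereas `naive_subset_sample` on the same input would flip a million coins. Handling arbitrary, changing probabilities p(x_i) within the same bound is the harder problem the paper addresses.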


research
07/21/2023

Subset Sampling and Its Extensions

This paper studies the subset sampling problem. The input is a set 𝒮 of ...
research
02/07/2018

Optimal data structures for stochastic driven simulations

Simulations where we have some prior information on the probability dist...
research
03/19/2019

Independent Range Sampling, Revisited Again

We revisit the range sampling problem: the input is a set of points wher...
research
04/17/2021

Budgeted Influence and Earned Benefit Maximization with Tags in Social Networks

Given a social network, where each user is associated with a selection c...
research
10/28/2021

Better Sum Estimation via Weighted Sampling

Given a large set U where each item a∈ U has weight w(a), we want to est...
research
11/28/2018

Attendance Maximization for Successful Social Event Planning

Social event planning has received a great deal of attention in recent y...
research
01/28/2023

Leveraging Importance Weights in Subset Selection

We present a subset selection algorithm designed to work with arbitrary ...
