Fast Gumbel-Max Sketch and its Applications

02/10/2023
by Yuanming Zhang, et al.

The well-known Gumbel-Max Trick for sampling elements from a categorical distribution (or, more generally, a non-negative vector) and its variants have been widely used in areas such as machine learning and information retrieval. To sample a random element i with probability proportional to its positive weight v_i, the Gumbel-Max Trick first computes a Gumbel random variable g_i for each positive-weight element i, and then selects the element i with the largest value of g_i + ln v_i. Recent applications, including similarity estimation and weighted cardinality estimation, require generating k independent Gumbel-Max variables from high-dimensional vectors. However, the traditional Gumbel-Max Trick is computationally expensive for large k (e.g., hundreds or even thousands). To solve this problem, we propose a novel algorithm, FastGM, which reduces the time complexity from O(kn^+) to O(k ln k + n^+), where n^+ is the number of positive elements in the vector of interest. FastGM stops computing Gumbel random variables early for many elements, especially those with small weights. Experiments on a variety of real-world datasets demonstrate that FastGM is orders of magnitude faster than state-of-the-art methods without sacrificing accuracy or incurring additional expenses.
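
For readers unfamiliar with the baseline, the traditional Gumbel-Max Trick described in the abstract can be sketched as follows. This is a minimal illustration of the O(kn^+) baseline only, not of FastGM itself; the function names `gumbel_max_sample` and `naive_gumbel_max_sketch` are hypothetical and do not come from the paper.

```python
import math
import random


def gumbel_max_sample(weights, rng):
    """Return an index drawn with probability proportional to its positive weight.

    Hypothetical helper illustrating the traditional Gumbel-Max Trick.
    Returns None if the vector has no positive element.
    """
    best_index, best_score = None, -math.inf
    for i, v in enumerate(weights):
        if v <= 0:
            continue  # only positive-weight elements take part
        u = rng.random()
        while u == 0.0:  # avoid log(0); random() lies in [0, 1)
            u = rng.random()
        g = -math.log(-math.log(u))  # standard Gumbel(0, 1) variable g_i
        score = g + math.log(v)      # g_i + ln v_i
        if score > best_score:
            best_index, best_score = i, score
    return best_index


def naive_gumbel_max_sketch(weights, k, rng):
    """Draw k independent Gumbel-Max samples; costs O(k * n^+) time."""
    return [gumbel_max_sample(weights, rng) for _ in range(k)]


if __name__ == "__main__":
    rng = random.Random(0)
    sketch = naive_gumbel_max_sketch([1.0, 0.0, 3.0], 10, rng)
    print(sketch)  # indices 0 and 2 only; index 2 is about three times as likely
```

Each of the k samples touches every positive element once, which is where the O(kn^+) cost of the baseline comes from.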
