A Scalable Shannon Entropy Estimator

06/02/2022
by Priyanka Golia, et al.

We revisit the well-studied problem of estimating the Shannon entropy of a probability distribution, now given access to a probability-revealing conditional sampling oracle. In this model, the oracle takes as input the representation of a set $S$ and returns a sample from the distribution obtained by conditioning on $S$, together with the probability of that sample in the distribution. Our work is motivated by applications of such algorithms in Quantitative Information Flow (QIF) analysis for programming-language-based security, where information-theoretic quantities capture the effort an adversary must expend to obtain confidential information. These applications demand accurate measurements when the entropy is small. Existing algorithms that do not use conditional samples require a number of queries that scales inversely with the entropy, which is unacceptable in this regime; indeed, a lower bound by Batu et al. (STOC 2002) established that no algorithm using only sampling and evaluation oracles can obtain acceptable performance. On the other hand, prior work in the conditional sampling model by Chakraborty et al. (SICOMP 2016) obtained only a high-order polynomial query complexity, $\mathcal{O}\left(\frac{m^7}{\epsilon^8}\log\frac{1}{\delta}\right)$ queries, to obtain additive $\epsilon$-approximations on a domain of size $\mathcal{O}(2^m)$. We obtain multiplicative $(1+\epsilon)$-approximations using only $\mathcal{O}\left(\frac{m}{\epsilon^2}\log\frac{1}{\delta}\right)$ queries to the probability-revealing conditional sampling oracle. Moreover, our bound comes with small, explicit constants, and we demonstrate that our algorithm achieves a substantial improvement in practice over the previous state-of-the-art methods for entropy estimation in QIF.
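To make the query model concrete, the following is a minimal Python sketch under stated assumptions, not the paper's algorithm. The distribution is an explicit dictionary over bit strings (so the domain has size $2^m$), the set $S$ is represented by a membership predicate, and the oracle is assumed to report the sample's probability in the original, unconditioned distribution. The names `conditional_oracle`, `naive_entropy_estimate` and `exact_entropy` are hypothetical and introduced only for illustration. The estimator is the plain Monte Carlo average of $\log_2(1/D(x))$, which queries only the trivial set $S$ = whole domain and therefore never benefits from conditioning.

```python
import math
import random
from typing import Callable, Dict, Hashable, Tuple

# Hypothetical representation: an explicit finite distribution {outcome: probability}.
Dist = Dict[Hashable, float]


def conditional_oracle(dist: Dist, in_set: Callable[[Hashable], bool],
                       rng: random.Random) -> Tuple[Hashable, float]:
    """Toy probability-revealing conditional sampling oracle (assumed interface).

    Draws x from `dist` conditioned on S = {x : in_set(x)} and returns x
    together with dist[x], its probability in the unconditioned distribution.
    """
    support = [x for x in dist if in_set(x)]
    weights = [dist[x] for x in support]
    if sum(weights) <= 0:
        raise ValueError("conditioning set S has zero probability")
    # random.Random.choices normalizes the weights, which realizes conditioning on S.
    x = rng.choices(support, weights=weights, k=1)[0]
    return x, dist[x]


def naive_entropy_estimate(dist: Dist, num_samples: int,
                           rng: random.Random) -> float:
    """Monte Carlo estimate of H(D) = E_{x ~ D}[log2(1 / D(x))], in bits.

    Only the trivial set S = whole domain is queried; this is a baseline
    illustration, not the conditional-sampling algorithm of the paper.
    """
    total = 0.0
    for _ in range(num_samples):
        _, p = conditional_oracle(dist, lambda _x: True, rng)
        total += -math.log2(p)
    return total / num_samples


def exact_entropy(dist: Dist) -> float:
    """Exact Shannon entropy in bits, used to check the estimate."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


if __name__ == "__main__":
    m = 3
    uniform = {format(i, f"0{m}b"): 1 / 2**m for i in range(2**m)}
    print(exact_entropy(uniform))                                  # 3.0
    print(naive_entropy_estimate(uniform, 100, random.Random(0)))  # 3.0

    # Low-entropy example: one heavy outcome, the remaining mass spread thinly.
    skewed = {s: (0.9 if s == "000" else 0.1 / 7) for s in uniform}
    print(exact_entropy(skewed), naive_entropy_estimate(skewed, 20000, random.Random(1)))
```

For the uniform distribution every sample contributes exactly $m$ bits, so the estimate equals the true entropy. For skewed, low-entropy distributions the per-sample values $\log_2(1/D(x))$ vary widely relative to their mean, which is the small-entropy regime the abstract identifies as hard for estimators that do not use conditional samples.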


Related research

07/17/2018

Anaconda: A Non-Adaptive Conditional Sampling Algorithm for Distribution Testing

We investigate distribution testing with access to non-adaptive conditio...

10/08/2021

Bipartite Independent Set Oracles and Beyond: Can it Even Count Triangles in Polylogarithmic Queries?

Beame et al. [ITCS 2018] introduced and used the Bipartite Independent S...

07/19/2022

Identity Testing for High-Dimensional Distributions via Entropy Tensorization

We present improved algorithms and matching statistical and computationa...

11/22/2021

Sublinear quantum algorithms for estimating von Neumann entropy

Entropy is a fundamental property of both classical and quantum systems,...

11/20/2016

Dealing with Range Anxiety in Mean Estimation via Statistical Queries

We give algorithms for estimating the expectation of a given real-valued...

08/07/2016

A General Characterization of the Statistical Query Complexity

Statistical query (SQ) algorithms are algorithms that have access to an ...

01/13/2022

Faster Counting and Sampling Algorithms using Colorful Decision Oracle

In this work, we consider d-Hyperedge Estimation and d-Hyperedge Sample ...
