Non-Stochastic CDF Estimation Using Threshold Queries

01/13/2023
by Princewill Okoroafor, et al.

Estimating the empirical distribution of a scalar-valued data set is a fundamental task. In this paper, we tackle the problem of estimating an empirical distribution in a setting with two challenging features. First, the algorithm does not directly observe the data; instead, it only asks a limited number of threshold queries about each sample. Second, the data are not assumed to be independent and identically distributed; instead, we allow for an arbitrary process generating the samples, including an adaptive adversary. These considerations are relevant, for example, when modeling a seller experimenting with posted prices to estimate the distribution of consumers' willingness to pay for a product: offering a price and observing a consumer's purchase decision is equivalent to asking a single threshold query about their value, and the distribution of consumers' values may be non-stationary over time, as early adopters may differ markedly from late adopters. Our main result quantifies, to within a constant factor, the sample complexity of estimating the empirical CDF of a sequence of elements of [n], up to ε additive error, using one threshold query per sample. The complexity depends only logarithmically on n, and our result can be interpreted as extending the existing logarithmic-complexity results for noisy binary search to the more challenging setting where noise is non-stochastic. On the way to designing our algorithm, we consider a more general model in which the algorithm is allowed to make a limited number of simultaneous threshold queries on each sample. We solve this problem using Blackwell's Approachability Theorem and the exponential weights method. As a side result of independent interest, we characterize the minimum number of simultaneous threshold queries required by deterministic CDF estimation algorithms.
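
To make the query model concrete, the following is a minimal sketch of the setting described above, not the paper's algorithm. It assumes a fixed sequence of hidden samples in [n] and a naive baseline that asks one uniformly random threshold per sample and averages the answers per threshold. The function names (`threshold_query`, `empirical_cdf`, `naive_estimate`) are hypothetical and chosen for illustration. Because each threshold is asked about only roughly a 1/n fraction of the samples, this baseline's sample usage grows with n and makes no claim to the logarithmic dependence stated in the abstract.

```python
import random
from bisect import bisect_right


def threshold_query(x, t):
    """The only feedback available: is the hidden sample x at most t?"""
    return x <= t


def empirical_cdf(samples, n):
    """Ground truth F(t) for t = 1..n, computed with full access to the data."""
    ordered = sorted(samples)
    return [bisect_right(ordered, t) / len(ordered) for t in range(1, n + 1)]


def naive_estimate(samples, n, rng):
    """Naive baseline (an assumption, not the paper's method):
    ask one uniformly random threshold per sample, then report, for each
    threshold t, the fraction of 'yes' answers among the samples asked about t.
    Returns None for a threshold that was never asked."""
    asked = [0] * (n + 1)
    yes = [0] * (n + 1)
    for x in samples:
        t = rng.randint(1, n)          # exactly one threshold query per sample
        asked[t] += 1
        yes[t] += threshold_query(x, t)
    return [yes[t] / asked[t] if asked[t] else None for t in range(1, n + 1)]


if __name__ == "__main__":
    rng = random.Random(0)
    n = 5
    # A non-stationary sequence: early samples are small, late samples are large.
    samples = [1] * 5000 + [2] * 5000 + [4] * 5000 + [5] * 5000
    truth = empirical_cdf(samples, n)
    estimate = naive_estimate(samples, n, rng)
    for t, (f, e) in enumerate(zip(truth, estimate), start=1):
        print(f"t={t}  true F(t)={f:.3f}  estimate={e:.3f}")
```

In the posted-price reading of the abstract, the threshold `t` plays the role of the offered price and the answer to the query corresponds to the observed purchase decision; the sample value itself is never revealed to the estimator.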

