
A Minimax Theory for Adaptive Data Analysis

by Yu-Xiang Wang, et al.

In adaptive data analysis, the user makes a sequence of queries on the data, where at each step the choice of query may depend on the results of previous steps. The releases are often randomized to reduce overfitting for such adaptively chosen queries. In this paper, we propose a minimax framework for adaptive data analysis. Assuming Gaussianity of queries, we establish the first sharp minimax lower bound on the squared error, on the order of O(√(k)σ^2/n), where k is the number of queries asked and σ^2/n is the ordinary signal-to-noise ratio for a single query. Our lower bound is based on the construction of an approximately least favorable adversary who picks a sequence of queries that are most likely to be affected by overfitting. This approximately least favorable adversary uses only one level of adaptivity, suggesting that the minimax risk for 1-step adaptivity with k-1 initial releases and the minimax risk for k-step adaptivity are of the same order. The key technical component of the lower bound proof is a reduction to finding the convoluting distribution that optimally obfuscates the sign of a Gaussian signal. Our lower bound construction also yields a transparent and elementary proof of the matching upper bound, as an alternative to the approach of Russo and Zou (2015), who used information-theoretic tools to obtain the same upper bound. We believe that the proposed framework opens up opportunities to obtain theoretical insights for many other settings of adaptive data analysis, which would extend the idea to more practical realms.
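The one-step adaptivity described in the abstract can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the paper's construction: the function name `one_step_adaptive_error`, the sign-weighted final query, the additive Gaussian release noise, and the particular noise level used in the usage example are all illustrative choices. It asks k-1 zero-mean Gaussian queries, then forms one adaptive query whose weights are the signs of the released answers, and reports the mean squared error of the final release.

```python
import numpy as np


def one_step_adaptive_error(n, k, sigma=1.0, noise_std=0.0, trials=200, seed=0):
    """Mean squared error of the k-th release under a sign-based adaptive query.

    Hypothetical setup (an assumption for illustration): every query has true
    value 0, and each release is the empirical mean plus independent Gaussian
    noise with standard deviation `noise_std`.
    """
    rng = np.random.default_rng(seed)
    sq_errors = []
    for _ in range(trials):
        # k-1 initial, non-adaptive queries: columns of i.i.d. N(0, sigma^2) data.
        data = rng.normal(0.0, sigma, size=(n, k - 1))
        released = data.mean(axis=0) + rng.normal(0.0, noise_std, size=k - 1)

        # Adaptive query: a unit-norm combination aligned with the released
        # signs, so its per-sample variance is still sigma^2 and its true
        # value is still 0.
        weights = np.sign(released) / np.sqrt(k - 1)
        final_release = (data @ weights).mean() + rng.normal(0.0, noise_std)

        sq_errors.append(final_release ** 2)
    return float(np.mean(sq_errors))


if __name__ == "__main__":
    n, k, sigma = 1000, 101, 1.0
    single_query = sigma ** 2 / n
    # Illustrative noise level with variance sqrt(k) * sigma^2 / n (an assumption).
    noise_std = (k ** 0.25) * sigma / np.sqrt(n)

    print("single-query level sigma^2/n:", single_query)
    print("adaptive, exact releases:    ", one_step_adaptive_error(n, k, sigma))
    print("adaptive, noisy releases:    ", one_step_adaptive_error(n, k, sigma, noise_std))
```

With exact releases, the final query picks up the accumulated sampling error of all k-1 earlier answers, so its squared error grows roughly linearly in k relative to σ^2/n. Adding release noise makes the signs less informative to the adversary at the cost of a noisier answer, which is the trade-off the abstract's √(k)σ^2/n rate refers to.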




Testing convexity of functions over finite domains

We establish new upper and lower bounds on the number of queries require...

Linear Models are Most Favorable among Generalized Linear Models

We establish a nonasymptotic lower bound on the L_2 minimax risk for a c...

Challenges in Bayesian Adaptive Data Analysis

Traditional statistical analysis requires that the analysis process and ...

Minimax Optimal Quantization of Linear Models: Information-Theoretic Limits and Efficient Algorithms

We consider the problem of quantizing a linear model learned from measur...

Exact minimax risk for linear least squares, and the lower tail of sample covariance matrices

The first part of this paper is devoted to the decision-theoretic analys...

Adaptive Data Analysis in a Balanced Adversarial Model

In adaptive data analysis, a mechanism gets n i.i.d. samples from an unk...

The Everlasting Database: Statistical Validity at a Fair Price

The problem of handling adaptivity in data analysis, intentional or not,...