Near-Optimal Model Discrimination with Non-Disclosure

12/04/2020
by   Dmitrii M. Ostrovskii, et al.

Let θ_0, θ_1 ∈ ℝ^d be the population risk minimizers associated with some loss ℓ: ℝ^d × 𝒵 → ℝ and two distributions ℙ_0, ℙ_1 on 𝒵. We pose the following question: given i.i.d. samples from ℙ_0 and ℙ_1, what sample sizes are sufficient and necessary to distinguish between the two hypotheses θ^* = θ_0 and θ^* = θ_1 for a given θ^* ∈ {θ_0, θ_1}? Taking the first steps towards answering this question in full generality, we first consider the case of a well-specified linear model with squared loss. Here we provide matching upper and lower bounds on the sample complexity, showing it to be min{1/Δ^2, √r/Δ} up to a constant factor, where Δ is a measure of separation between ℙ_0 and ℙ_1 and r is the rank of the design covariance matrix. This bound is dimension-independent, and rank-independent for large enough separation. We then extend this result in two directions: (i) to the general parametric setup in the asymptotic regime; (ii) to generalized linear models in the small-sample regime n ≤ r and under weak moment assumptions. In both cases, we derive sample complexity bounds of a similar form, even under misspecification. Our testing procedures access θ^* only through a certain functional of the empirical risk. Moreover, the number of observations that allows us to reach statistical confidence in our tests does not allow us to "resolve" the two models, that is, to recover θ_0 and θ_1 up to O(Δ) prediction accuracy. These two properties make our framework applicable in tasks where one would like to identify a prediction model, which may be proprietary, while guaranteeing that the model cannot actually be inferred by the identifying agent.
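The stated bound min{1/Δ^2, √r/Δ} can be illustrated numerically. The sketch below (hypothetical helper, not from the paper) evaluates the two terms and shows which regime dominates: for separation Δ ≥ 1/√r the bound is 1/Δ^2 and hence rank-independent, while for smaller Δ the rank-dependent term √r/Δ is the minimum.

```python
import math

def sample_complexity_bound(delta: float, r: int) -> float:
    """Sample complexity, up to a constant factor, for discriminating
    two well-specified linear models with separation delta and design
    covariance of rank r: min{1/delta^2, sqrt(r)/delta}."""
    return min(1.0 / delta**2, math.sqrt(r) / delta)

# Crossover at delta = 1/sqrt(r): above it, 1/delta^2 is the minimum
# (rank-independent regime); below it, sqrt(r)/delta is the minimum.
r = 100
for delta in (0.5, 0.1, 0.01):
    print(delta, sample_complexity_bound(delta, r))
```

For r = 100 the crossover sits at Δ = 0.1; the loop above shows the bound switching from 1/Δ^2 to √r/Δ as Δ drops below that threshold.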


