Approximating a Target Distribution using Weight Queries

06/24/2020
by   Nadav Barak, et al.

A basic assumption in classical learning and estimation is the availability of a random sample from the target distribution. In domain adaptation, this assumption is replaced with the availability of a sample from a source distribution, together with a smaller or unlabeled sample from the target distribution. In this work, we consider a setting in which no random sampling from the target distribution is possible. Instead, given a large data set, it is possible to query the probability (weight) of a data point, or of a set of data points, according to the target distribution. This can be the case when access to the target distribution is mediated, e.g., by specific measurements or by user relevance queries. We propose an algorithm for finding a reweighting of the data set that approximates the target distribution weights, using a limited number of target weight queries. The weighted data set may then be used in estimation and learning tasks, as a proxy for a sample from the target distribution. Given a hierarchical tree structure over the data set, which induces a class of weight functions, we prove that the algorithm approximates the best possible function in this class, and we upper-bound the number of weight queries it uses. In experiments, we demonstrate the advantage of the proposed algorithm over several baselines. A Python implementation of the proposed algorithm and all experiments can be found at https://github.com/Nadav-Barak/AWP.
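To make the setting above concrete, the following is a minimal sketch of how a set of tree nodes and a few weight queries can define a reweighting of a data set. It is not the proposed algorithm and does not use the API of the linked repository. The names `Node`, `reweight`, `l1_error`, and `weight_query`, the toy target weights, and the choice to spread each queried weight uniformly over the points of a node are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    """A node of a hypothetical hierarchical tree over data-point indices."""
    points: List[int]
    children: List["Node"] = field(default_factory=list)


def reweight(nodes: List[Node],
             weight_query: Callable[[List[int]], float]) -> Dict[int, float]:
    """Issue one weight query per node and spread the answer uniformly
    over the node's points (an illustrative assumption)."""
    weights: Dict[int, float] = {}
    for node in nodes:
        total = weight_query(node.points)
        for i in node.points:
            weights[i] = total / len(node.points)
    return weights


def l1_error(weights: Dict[int, float], target: List[float]) -> float:
    """Sum of absolute differences between assigned and target weights."""
    return sum(abs(weights[i] - target[i]) for i in range(len(target)))


# Toy target weights; in the setting above they are hidden and can only
# be reached through weight queries.
target = [0.4, 0.1, 0.25, 0.25]
num_queries = 0


def weight_query(points: List[int]) -> float:
    """Simulated oracle: target weight of a set of points."""
    global num_queries
    num_queries += 1
    return sum(target[i] for i in points)


# A small tree: the root splits into {0, 1} and {2, 3}; {0, 1} splits again.
leaf0, leaf1 = Node([0]), Node([1])
left = Node([0, 1], [leaf0, leaf1])
right = Node([2, 3])
root = Node([0, 1, 2, 3], [left, right])

# Coarse choice: two nodes, two queries.
coarse = reweight([left, right], weight_query)
coarse_queries = num_queries

# Finer choice: split `left` into its children, three queries.
num_queries = 0
fine = reweight([leaf0, leaf1, right], weight_query)
fine_queries = num_queries

print(coarse_queries, l1_error(coarse, target))
print(fine_queries, l1_error(fine, target))
```

In this toy example, the coarse choice assigns weight 0.25 to every point and misses the imbalance between points 0 and 1, while splitting one node recovers the target weights at the cost of one extra query. The trade-off between the number of queries and the quality of the approximation is the one described above.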


research
03/30/2021

Continuous Weight Balancing

We propose a simple method by which to choose sample weights for problem...
research
01/27/2018

Variance-Optimal Offline and Streaming Stratified Random Sampling

Stratified random sampling (SRS) is a fundamental sampling technique tha...
research
02/21/2019

The Arboricity Captures the Complexity of Sampling Edges

In this paper, we revisit the problem of sampling edges in an unknown gr...
research
08/16/2022

Uncertainty-guided Source-free Domain Adaptation

Source-free domain adaptation (SFDA) aims to adapt a classifier to an un...
research
01/13/2023

Non-Stochastic CDF Estimation Using Threshold Queries

Estimating the empirical distribution of a scalar-valued data set is a b...
research
11/09/2022

Conformal Frequency Estimation with Sketched Data under Relaxed Exchangeability

A flexible method is developed to construct a confidence interval for th...
research
05/11/2020

Target-Independent Domain Adaptation for WBC Classification using Generative Latent Search

Automating the classification of camera-obtained microscopic images of W...
